transformers/docs/source/en/model_doc
Arthur 25b7f27234
Add llama4 (#37307)
* remove one of the last deps

* update fast image processor after refactor

* styling

* more quality of life improvements

* nit

* update

* cleanups

* some cleanups

* vllm updates

* update fake image token

* [convert] Fix typo

* [convert] Strip extraneous bytes from shards

* [convert] Minor fixes

* [convert] Use num_experts

* multi-image fixes in modeling + processor

* fixup size

* 128 experts

* Use default rope

* Unfuse mlp

* simplify a lot inputs embeds merging

* remove .item() 👀

* fix from review

* Address feedback

* Use None "default" for rope_scaling. Add eot.

* set seed

* return aspect ratios and bug fixes

* Moe 128 rebased (#8)

* 128 experts

* Use default rope

* Unfuse mlp

* Address feedback

* Use None "default" for rope_scaling. Add eot.

* Meta/llama quant compat (#7)

* add quant compatible model & conversion code for llama4

* fix a few issues

* fix a few issues

* minor type mapping fix

---------

Co-authored-by: Lu Fang <fanglu@fb.com>

* use a new config parameter to determine which model definition to use for MoE

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Lu Fang <fanglu@fb.com>

* un-comment write_tokenizer from converting script

* remove un-used imports

* [llama4] Pop aspect_ratios from image processor output in Llama4Processor

Signed-off-by: Jon Swenson <jmswen@gmail.com>

* Fix parameter_count name

* Update src/transformers/models/llama4/configuration_llama4.py

* nit

* Add changes for no_rope, moe_layers, chunked attention. Just need to test all

* Update src/transformers/models/llama4/image_processing_llama4_fast.py

* nit

* fix post merge with main

* support flex attention

* fixes

* fix

* add layer

* small updates

* rebase and delete llm_compressor

* nit

* [llama4/mm] Add back <|image|> token that delimits global tile

* [llama4/mm] Fix Llama 4 image processing unit tests

* add explicit dtype

Signed-off-by: Jon Swenson <jmswen@gmail.com>

* sdpa works

* comment todo small

* fix model loading

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* revert

* nits

* small fix for TP on 1 node

* Read new params from config

* Add <|eom|>

* lol don't know how this got here

* adding fp8

* Save processor, fix chat template

* style

* Add boi/eoi tokens

We don't use them.

* fixes for now flex seems to work :)

* updates

* nits

* updates

* missing keys

* add context parallel

* update

* update

* fix

* nits

* add worldsize and make eager attn work for vision

* Ignore new key present in base models

* add tp_plan

* fix nope

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* minor fix

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* Clean up Llama4 vision model

* current updates

* add support for `attn_temperature_tuning`

* add floor scale

* add missing attn scales

* push what works, dirty trick for the device synch

* oups

* Fix pad_token_id

See
https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files
Confirmed in the original codebase.

* fix causallml loading

* rm

* fix tied-weights

* fix sdpa

* push current version

* should work with both short and long

* add compressed_tensors & fix fbgemm tp

* Fix flex impl

* style

* chunking

* try to revert the potentially breaking change

* fix auto factory

* fix shapes in general

* rm processing

* commit cache utils cleanup

* Fix context length

* fix

* allocate

* update tp_plan

* fix SDPA!

* Add support for sparse `Llama4TextMoe` layer from the kernel hub

* cleanup

* better merge

* update

* still broken fixing now

* nits

* revert print

* Write max_position_embeddings and max_model_length

* Update modeling_llama4.py

* Save attention_chunk_size

* Sync eos terminators

* Read initializer_range

* style

* remove `dict`

* fix

* eager should use `chunked_attention_mask`

* revert

* fixup

* fix config

* Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"

This reverts commit ccda19f050, reversing
changes made to a515579aed.

* Fix typo and remove warning with compiled flex and chunked prefill

* Fix MoE vs FF (#41)

* fix

* Use correct no_rope_layers if provided one is empty list

* update tests

* fix

* skipping some tests

* fix fp8 loading

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* fix text generation pipeline

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* eager needs 4D mask

* fix

* Some cleanup

* fix

* update

* fix

* replace correctly module

* patch

* modulelist

* update

* update

* clean up

* Don't move to `cuda:0` in distributed mode

* restrict to compressed tensors for now

* rm print

* Docs!

* Fixes

* Update docs/source/en/model_doc/llama4.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fixes

* cuda graph fix

* revert some stuff

* fixup

* styling

* Update src/transformers/models/llama4/modeling_llama4.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* commit licence, cleanup here and there and style

* more styling changes

* fix dummies

* fix and clean docstrings

* remove comment

* remove warning

* Only fast image processor is supported

* nit

* trigger CI

* fix issue with flex encoder

* fix dynamic cache

* Code quality

* Code quality

* fix more tests for now

* Code quality

* Code quality

* Nuke bunch of failing stuff

* Code quality

* Code quality

* cleanup removal of slow image processor

* ruff fix fast image processor

* fix

* fix styling

* Docs

* Repo consistency

* Repo consistency

* fix sliding window issue

* separate llama cache

* styling

* Repo consistency

* Repo consistency

* push what works

* L4 Repo consistency

* Docs

* fix last last alst alst alst alstsaltlsltlaslt

---------

Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Keyun Tong <tongkeyun@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: Jon Swenson <jmswen@gmail.com>
Co-authored-by: jmswen <jmswen@users.noreply.github.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
Co-authored-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: drisspg <drisspguessous@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-05 22:02:22 +02:00
albert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
align.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
altclip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
aria.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
audio-spectrogram-transformer.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
auto.md Add auto model for image-text-to-text (#32472) 2024-10-08 14:26:43 +02:00
autoformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
aya_vision.md Add aya (#36521) 2025-03-04 12:24:33 +01:00
bamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bark.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bart.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
barthez.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bartpho.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
beit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bert-generation.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bert-japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bert.md [docs] Model docs (#36469) 2025-03-21 15:35:22 -07:00
bertweet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
big_bird.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bigbird_pegasus.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
biogpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
blenderbot-small.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
blenderbot.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
blip-2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
blip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bloom.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bort.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bridgetower.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bros.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
byt5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
camembert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
canine.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
chameleon.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
chinese_clip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
clap.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
clip.md Updated the model card for CLIP (#37040) 2025-04-02 14:57:38 -07:00
clipseg.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
clvp.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
code_llama.md chore: Update model doc for code_llama (#37115) 2025-04-03 10:09:41 -07:00
codegen.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
cohere.md Update model card for Cohere (#37056) 2025-04-03 09:51:40 -07:00
cohere2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
colpali.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
conditional_detr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
convbert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
convnext.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
convnextv2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
cpm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
cpmant.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
ctrl.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
cvt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dab-detr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dac.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
data2vec.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dbrx.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deberta-v2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deberta.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
decision_transformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deepseek_v3.md [WIP] add deepseek-v3 (#35926) 2025-03-28 15:56:59 +01:00
deformable_detr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deit.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
deplot.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
depth_anything_v2.md Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies (#32550) 2024-10-22 15:50:54 +02:00
depth_anything.md Update model card for Depth Anything (#37065) 2025-04-04 11:36:05 -07:00
depth_pro.md fix typos in the docs directory (#36639) 2025-03-11 09:41:41 -07:00
deta.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
detr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dialogpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
diffllama.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dinat.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dinov2_with_registers.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
dinov2.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
distilbert.md Updated model card for distilbert (#37157) 2025-04-04 15:22:46 -07:00
dit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
donut.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
dpr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dpt.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
efficientformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
efficientnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
electra.md Update model card for electra (#37063) 2025-04-03 10:45:35 -07:00
emu3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
encodec.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
encoder-decoder.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
ernie_m.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
ernie.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
esm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
falcon_mamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
falcon.md Update falcon model card (#37184) 2025-04-02 17:30:37 -07:00
falcon3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fastspeech2_conformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
flan-t5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
flan-ul2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
flaubert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
flava.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
focalnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fsmt.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
funnel.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fuyu.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gemma.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gemma2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gemma3.md [docs] Attention mask image (#36970) 2025-03-26 10:11:34 -07:00
git.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
glm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
glpn.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
got_ocr2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt_bigcode.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt_neo.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt_neox_japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt_neox.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt-sw3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt2.md [GPT2] Add SDPA support (#31172) 2024-06-19 09:40:57 +02:00
gptj.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gptsan-japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
granite.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
granitemoe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
granitemoeshared.md add shared experts for upcoming Granite 4.0 language models (#35894) 2025-02-14 16:55:28 +01:00
granitevision.md Update Granite Vision Model Path / Tests (#35998) 2025-02-03 20:06:03 +01:00
graphormer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
grounding-dino.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
groupvit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
helium.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
herbert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
hiera.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
hubert.md [MINOR:TYPO] Update hubert.md (#36733) 2025-03-17 09:07:51 -07:00
ibert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
idefics.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
idefics2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
idefics3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
ijepa.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
imagegpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
informer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
instructblip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
instructblipvideo.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
jamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
jetmoe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
jukebox.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
kosmos-2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
layoutlm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
layoutlmv2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
layoutlmv3.md Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
layoutxlm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
led.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
levit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
lilt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
llama.md [docs] Attention mask image (#36970) 2025-03-26 10:11:34 -07:00
llama2.md [docs] Attention mask image (#36970) 2025-03-26 10:11:34 -07:00
llama3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
llama4.md Add llama4 (#37307) 2025-04-05 22:02:22 +02:00
llava_next_video.md fix typos in the docs directory (#36639) 2025-03-11 09:41:41 -07:00
llava_next.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
llava_onevision.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
llava.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
longformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
longt5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
luke.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
lxmert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
m2m_100.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
madlad-400.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mamba2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
marian.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
markuplm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mask2former.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
maskformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
matcha.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mbart.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mctct.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mega.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
megatron_gpt2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
megatron-bert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mgp-str.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mimi.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mistral.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mistral3.md Fix Mistral3 tests (#36797) 2025-03-18 13:08:12 -04:00
mixtral.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mllama.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mluke.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mms.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mobilebert.md mobilebert model card update (#37256) 2025-04-04 14:28:35 -07:00
mobilenet_v1.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mobilenet_v2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mobilevit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mobilevitv2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
modernbert.md Update Model Card for ModernBERT (#37052) 2025-04-03 10:14:02 -07:00
moonshine.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
moshi.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mpnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mra.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mt5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
musicgen_melody.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
musicgen.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mvp.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
myt5.md [WIP] Add Tokenizer for MyT5 Model (#31286) 2024-10-06 10:33:16 +02:00
nat.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nemotron.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nezha.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nllb-moe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nllb.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nougat.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nystromformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
olmo.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
olmo2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
olmoe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
omdet-turbo.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
oneformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
open-llama.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
openai-gpt.md Update OpenAI GPT model card (#37255) 2025-04-04 15:25:16 -07:00
opt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
owlv2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
owlvit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
paligemma.md [docs] Attention mask image (#36970) 2025-03-26 10:11:34 -07:00
patchtsmixer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
patchtst.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pegasus_x.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pegasus.md Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
perceiver.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
persimmon.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
phi.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
phi3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
phi4_multimodal.md [Phi4] add multimodal chat template (#36996) 2025-04-03 09:52:09 +02:00
phimoe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
phobert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pix2struct.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pixtral.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
plbart.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
poolformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pop2piano.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
prompt_depth_anything.md Add Prompt Depth Anything Model (#35401) 2025-03-20 16:12:44 +00:00
prophetnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pvt_v2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pvt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
qdqbert.md Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
qwen2_5_vl.md feat: updated model card for qwen_2.5_vl (#37099) 2025-04-03 09:13:26 -07:00
qwen2_audio.md [qwen2 audio] remove redundant code and update docs (#36282) 2025-03-20 10:54:51 +00:00
qwen2_moe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
qwen2_vl.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
qwen2.md Updated model card for Qwen2 (#37192) 2025-04-02 18:10:41 -07:00
qwen3_moe.md Adding Qwen3 and Qwen3MoE (#36878) 2025-03-31 09:50:49 +02:00
qwen3.md Adding Qwen3 and Qwen3MoE (#36878) 2025-03-31 09:50:49 +02:00
rag.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
realm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
recurrent_gemma.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
reformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
regnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
rembert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
resnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
retribert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
roberta-prelayernorm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
roberta.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
roc_bert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
roformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
rt_detr_v2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
rt_detr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
rwkv.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
sam.md Create and Expose SamVisionModel as public for better accessibility (#36493) 2025-03-31 11:45:07 +02:00
seamless_m4t_v2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
seamless_m4t.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
segformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
seggpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
sew-d.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
sew.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
shieldgemma2.md Adding links to ShieldGemma 2 technical report (#37247) 2025-04-03 16:26:29 +01:00
siglip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
siglip2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
smolvlm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
speech_to_text_2.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
speech_to_text.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
speech-encoder-decoder.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
speecht5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
splinter.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
squeezebert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
stablelm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
starcoder2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
superglue.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
superpoint.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
swiftformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
swin.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
swin2sr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
swinv2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
switch_transformers.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
t5.md Updated T5 model card with standardized format (#37261) 2025-04-04 15:23:09 -07:00
t5v1.1.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
table-transformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tapas.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tapex.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
textnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
time_series_transformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
timesformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
timm_wrapper.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
trajectory_transformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
transfo-xl.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
trocr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tvlt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tvp.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
udop.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
ul2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
umt5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
unispeech-sat.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
unispeech.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
univnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
upernet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
van.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
video_llava.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
videomae.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
vilt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vipllava.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vision-encoder-decoder.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vision-text-dual-encoder.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
visual_bert.md Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
vit_hybrid.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vit_mae.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
vit_msn.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
vit.md [docs] Model docs (#36469) 2025-03-21 15:35:22 -07:00
vitdet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vitmatte.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vitpose.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vits.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vivit.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
wav2vec2_phoneme.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
wav2vec2-bert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
wav2vec2-conformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
wav2vec2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
wavlm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
whisper.md [docs] Model docs (#36469) 2025-03-21 15:35:22 -07:00
xclip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xglm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlm-prophetnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlm-roberta-xl.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlm-roberta.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlm-v.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xls_r.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlsr_wav2vec2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xmod.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
yolos.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
yoso.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
zamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
zamba2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
zoedepth.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00