transformers/docs/source/en/model_doc
Jaeyong Sung 583db52bc6
Add Dia model (#38405)
* add dia model

* add tokenizer files

* cleanup some stuff

* brut copy paste code

* rough cleanup of the modeling code

* nuke some stuff

* more nuking

* more cleanups

* updates

* add multiLayerEmbedding vectorization

* nits

* more modeling simplifications

* updates

* update rope

* update rope

* just fixup

* update configuration files

* more cleanup!

* default config values

* update

* forgotten comma

* another comma!

* update, more cleanups

* just more nits

* more config cleanups

* time for the encoder

* fix

* small nit

* nits

* n

* refacto a bit

* cleanup

* update cv script

* fix last issues

* fix last nits

* styling

* small fixes

* just run 1 generation

* fixes

* nits

* fix conversion

* fix

* more fixes

* full generate

* ouf!

* fixes!

* updates

* fix

* fix cvrt

* fixup

* nits

* delete wrong test

* update

* update

* test tokenization

* let's start changing things bit by bit - fix encoder step

* removing custom generation, moving to GenerationMixin

* add encoder decoder attention masks for generation

* mask changes, correctness checked against ad29837 in dia repo

* refactor a bit already --> next cache

* too important not to push :)

* minimal cleanup + more todos

* make main overwrite modeling utils

* add cfg filter & eos filter

* add eos countdown & delay pattern

* update eos countdown

* add max step eos countdown

* fix tests

* fix some things

* fix generation with testing

* move cfg & eos stuff to logits processor

* make RepetitionPenaltyLogitsProcessor flexible

- can accept 3D scores like (batch_size, channel, vocab)

* fix input_ids concatenation dimension in GenerationMixin for flexibility

* Add DiaHangoverLogitsProcessor and DiaExponentialDecayLengthPenalty classes; refactor logits processing in DiaForConditionalGeneration to utilize new configurations and improve flexibility.

* Add stopping criteria

* refactor

* move delay pattern from processor to modeling like musicgen.

- add docs
- change eos countdown to eos delay pattern

* fix processor & fix tests

* refactor types

* refactor imports

* format code

* fix docstring to pass ci

* add docstring to DiaConfig & add DiaModel to test

* fix docstring

* add docstring

* fix some bugs

* check

* porting / merging results from other branch - IMPORTANT: it very likely breaks generation, the goal is to have a proper forward path first

* experimental testing of left padding for first channel

* whoops

* Fix merge to make generation work

* fix cfg filter

* add position ids

* add todos, break things

* revert changes to generation --> we will force 2d but go 3d on custom stuff

* refactor a lot, change prepare decoder ids to work with left padding (needs testing), add todos

* some first fixes to get to 10. in generation

* some more generation fixes / adjustment

* style + rope fixes

* move cfg out, simplify a few things, more todos

* nit

* start working on custom logit processors

* nit

* quick fixes

* cfg top k

* more refactor of logits processing, needs a decision if gen config gets the new attributes or if we move it to config or similar

* lets keep changes to core code minimal, only eos scaling is questionable atm

* simpler eos delay logits processor

* that was for debugging :D

* proof of concept rope

* small fix on device mismatch

* cfg fixes + delay logits max len

* transformers rope

* modular dia

* more cleanup

* keep modeling consistently 3D, generate handles 2D internally

* decoder starts with bos if nothing

* post processing prototype

* style

* lol

* force sample / greedy + fixes on padding

* style

* fixup tokenization

* nits

* revert

* start working on dia tests

* fix a lot of tests

* more test fixes

* nit

* more test fixes + some features to simplify code more

* more cleanup

* forgot that one

* autodocs

* small consistency fixes

* fix regression

* small fixes

* dia feature extraction

* docs

* wip processor

* fix processor order

* processing goes brrr

* transpose before

* small fix

* fix major bug but needs now a closer look into the custom processors esp cfg

* small thing on logits

* nits

* simplify indices and shifts

* add simpler version of padding tests back (temporarily)

* add logit processor tests

* starting tests on processor

* fix mask application during generation

* some fixes on the weights conversion

* style + fixup logits order

* simplify conversion

* nit

* remove padding tests

* nits on modeling

* hmm

* fix tests

* trigger

* probably gonna be reverted, just a quick design around audio tokenizer

* fixup typing

* post merge + more typing

* initial design for audio tokenizer

* more design changes

* nit

* more processor tests and style related things

* add to init

* protect import

* not sure why tbh

* add another protect

* more fixes

* wow

* it aint stopping :D

* another missed type issue

* ...

* change design around audio tokenizer to prioritize init and go for auto - in regards to the review

* change to new causal mask function + docstrings

* change ternary

* docs

* remove todo, i dont think its essential tbh

* remove pipeline as current pipelines do not fit in the current scheme, same as csm

* closer to wrapping up the processor

* text to audio, just for demo purposes (will likely be reverted)

* check if it's this

* save audio function

* ensure no grad

* fixes on prefixed audio, hop length is used via preprocess dac, device fixes

* integration tests (tested locally on a100) + some processor utils / fixes

* style

* nits

* another round of smaller things

* docs + some fixes (generate one might be big)

* mystery solved

* small fix on conversion

* add abstract audio tokenizer, change init check to abstract class

* nits

* update docs + fix some processing :D

* change inheritance scheme for audio tokenizer

* delete dead / unnecessary code in copied generate loop

* last nits on new pipeline behavior (+ todo on tests) + style

* trigger

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
2025-06-26 11:04:23 +00:00
albert.md Remove merge conflict artifacts in Albert model doc (#38849) 2025-06-16 14:21:18 -07:00
align.md Updated the Model docs - for the ALIGN model (#38072) 2025-05-28 09:19:09 -07:00
altclip.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
arcee.md Add Arcee model support (#38621) 2025-06-24 15:05:29 +02:00
aria.md Updated Aria model card (#38472) 2025-06-05 14:36:54 -07:00
audio-spectrogram-transformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
auto.md Add Dia model (#38405) 2025-06-26 11:04:23 +00:00
autoformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
aya_vision.md Updated aya_vision.md (#38749) 2025-06-16 10:46:30 -07:00
bamba.md Update bamba model card (#38853) 2025-06-18 16:01:25 -07:00
bark.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bart.md New bart model card (#37858) 2025-05-27 11:51:41 -07:00
barthez.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bartpho.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
beit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bert-generation.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bert-japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bert.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
bertweet.md Updated BERTweet model card. (#37981) 2025-05-27 11:51:22 -07:00
big_bird.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bigbird_pegasus.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
biogpt.md Update BioGPT model card (#38214) 2025-05-23 13:03:47 -07:00
bit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bitnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
blenderbot-small.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
blenderbot.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
blip-2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
blip.md Update blip model card (#38513) 2025-06-20 13:46:19 -07:00
bloom.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bort.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bridgetower.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bros.md No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
byt5.md Standardize ByT5 model card format (#38699) 2025-06-09 15:02:50 -07:00
camembert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
canine.md New canine model card (#38631) 2025-06-10 09:30:05 -07:00
chameleon.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
chinese_clip.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
clap.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
clip.md Updated the model card for CLIP (#37040) 2025-04-02 14:57:38 -07:00
clipseg.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
clvp.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
code_llama.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
codegen.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
cohere.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
cohere2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
colpali.md Add ColQwen2 to 🤗 transformers (#35778) 2025-06-02 12:58:01 +00:00
colqwen2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
conditional_detr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
convbert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
convnext.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
convnextv2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
cpm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
cpmant.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
csm.md [CSM] update model id (#38211) 2025-05-27 17:03:55 +02:00
ctrl.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
cvt.md Update CvT documentation with improved usage examples and additional … (#38731) 2025-06-17 10:30:03 -07:00
d_fine.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dab-detr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dac.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
data2vec.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dbrx.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deberta-v2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deberta.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
decision_transformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deepseek_v3.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deformable_detr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deplot.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
depth_anything_v2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
depth_anything.md Update model card for Depth Anything (#37065) 2025-04-04 11:36:05 -07:00
depth_pro.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deta.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
detr.md No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
dia.md Add Dia model (#38405) 2025-06-26 11:04:23 +00:00
dialogpt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
diffllama.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dinat.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dinov2_with_registers.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dinov2.md Add usage example for DINOv2 (#37398) 2025-05-01 08:54:22 -07:00
distilbert.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
dit.md [Docs] New DiT model card (#38721) 2025-06-12 10:26:50 -07:00
donut.md Add Fast Image Processor for Donut (#37081) 2025-04-14 16:24:01 +02:00
dots1.md [Model] add dots1 (#38143) 2025-06-25 11:38:25 +02:00
dpr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dpt.md 36978 | Fast image processor for DPT model (#37481) 2025-06-18 17:33:29 +00:00
efficientformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
efficientnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
electra.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
emu3.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
encodec.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
encoder-decoder.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
ernie_m.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
ernie.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
esm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
falcon_h1.md [MODEL] Add Falcon H1 (#38249) 2025-05-21 10:43:11 +02:00
falcon_mamba.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
falcon.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
falcon3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fastspeech2_conformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
flan-t5.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
flan-ul2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
flaubert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
flava.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
fnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
focalnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
fsmt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
funnel.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
fuyu.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
gemma.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
gemma2.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
gemma3.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
git.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
glm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
glm4.md Add glm4 (#37388) 2025-04-09 14:02:04 +02:00
glm4v.md GLM-4.1V Model support (#38431) 2025-06-25 10:43:05 +02:00
glpn.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
got_ocr2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
gpt_bigcode.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
gpt_neo.md New gpt neo model card (#38505) 2025-06-04 09:56:47 -07:00
gpt_neox_japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt_neox.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt-sw3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt2.md Aligning modeling code for GPT2 to work with vLLM (fallback) (#36934) 2025-05-02 09:55:16 +02:00
gptj.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gptsan-japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
granite_speech.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
granite.md Update granite.md (#37791) 2025-05-27 12:55:15 -07:00
granitemoe.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
granitemoehybrid.md Add GraniteMoeHybrid support for 4.0 (#37658) 2025-05-06 06:47:43 +02:00
granitemoeshared.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
granitevision.md Update Granite Vision Model Path / Tests (#35998) 2025-02-03 20:06:03 +01:00
graphormer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
grounding-dino.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
groupvit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
helium.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
herbert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
hgnet_v2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
hiera.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
hubert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
ibert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
idefics.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
idefics2.md Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
idefics3.md Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
ijepa.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
imagegpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
informer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
instructblip.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
instructblipvideo.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
internvl.md 🔴 Video processors as a separate class (#35206) 2025-05-12 11:55:51 +02:00
jamba.md 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
janus.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
jetmoe.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
jukebox.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
kosmos-2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
kyutai_speech_to_text.md [Kyutai-STT] correct model type + model id (#39035) 2025-06-25 16:09:00 +00:00
layoutlm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
layoutlmv2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
layoutlmv3.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
layoutxlm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
led.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
levit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
lightglue.md Add LightGlue model (#31718) 2025-06-17 18:10:23 +02:00
lilt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
llama.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
llama2.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
llama3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
llama4.md Add llama4 (#37307) 2025-04-05 22:02:22 +02:00
llava_next_video.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
llava_next.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
llava_onevision.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
llava.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
longformer.md Fix broken tag in Longformer model card (#38828) 2025-06-16 07:44:40 -07:00
longt5.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
luke.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
lxmert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
m2m_100.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
madlad-400.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mamba.md Simplify and update trl examples (#38772) 2025-06-13 12:03:49 +00:00
mamba2.md Simplify and update trl examples (#38772) 2025-06-13 12:03:49 +00:00
marian.md 🔴🔴🔴 [Attention] Refactor Attention Interface for Bart-based Models (#38108) 2025-05-22 17:12:58 +02:00
markuplm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mask2former.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
maskformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
matcha.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mbart.md Remove head mask in generative models (#35786) 2025-05-15 10:44:19 +02:00
mctct.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mega.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
megatron_gpt2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
megatron-bert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mgp-str.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mimi.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
minimax.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mistral.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
mistral3.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
mixtral.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mlcd.md Add MLCD model (#36182) 2025-04-15 11:33:09 +01:00
mllama.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
mluke.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mms.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mobilebert.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
mobilenet_v1.md Model card for mobilenet v1 and v2 (#37948) 2025-05-28 09:20:19 -07:00
mobilenet_v2.md Model card for mobilenet v1 and v2 (#37948) 2025-05-28 09:20:19 -07:00
mobilevit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mobilevitv2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
modernbert.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
moonshine.md Updated moonshine modelcard (#38711) 2025-06-12 10:27:17 -07:00
moshi.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mpnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mra.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mt5.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
musicgen_melody.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
musicgen.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mvp.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
myt5.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nat.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nemotron.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nezha.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nllb-moe.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nllb.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nougat.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nystromformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
olmo.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
olmo2.md Updated model card for OLMo2 (#38394) 2025-05-27 16:24:36 -07:00
olmoe.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
omdet-turbo.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
oneformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
open-llama.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
openai-gpt.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
opt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
owlv2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
owlvit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
paligemma.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
patchtsmixer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
patchtst.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pegasus_x.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pegasus.md Add missing div in Pegasus model card (#38773) 2025-06-12 10:27:07 -07:00
perceiver.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
persimmon.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
phi.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
phi3.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
phi4_multimodal.md Update Phi4 converter (#37594) 2025-04-17 23:08:24 +02:00
phimoe.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
phobert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pix2struct.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pixtral.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
plbart.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
poolformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pop2piano.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
prompt_depth_anything.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
prophetnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pvt_v2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pvt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
qdqbert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
qwen2_5_omni.md [doc] fix the code examples in qwen doc (#37803) 2025-04-28 11:56:32 +01:00
qwen2_5_vl.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
qwen2_audio.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
qwen2_moe.md Add Qwen2 MoE model card (#38649) 2025-06-11 15:14:01 -07:00
qwen2_vl.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
qwen2.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
qwen3_moe.md Adding Qwen3 and Qwen3MoE (#36878) 2025-03-31 09:50:49 +02:00
qwen3.md Adding Qwen3 and Qwen3MoE (#36878) 2025-03-31 09:50:49 +02:00
rag.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
realm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
recurrent_gemma.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
reformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
regnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
rembert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
resnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
retribert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
roberta-prelayernorm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
roberta.md [docs] updated roberta model card (#38777) 2025-06-13 12:02:44 -07:00
roc_bert.md Update roc bert docs (#38835) 2025-06-17 11:02:18 -07:00
roformer.md [docs]: update roformer.md model card (#37946) 2025-05-23 16:27:56 -07:00
rt_detr_v2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
rt_detr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
rwkv.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
sam_hq.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
sam.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
seamless_m4t_v2.md Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
seamless_m4t.md Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
segformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
seggpt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
sew-d.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
sew.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
shieldgemma2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
siglip.md chore: update model card for SigLIP (#37585) 2025-04-18 13:30:41 -07:00
siglip2.md chore: update SigLIP2 model card (#37624) 2025-04-25 12:46:17 -07:00
smollm3.md Add SmolLM3 (#38755) 2025-06-25 15:12:15 +00:00
smolvlm.md Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
speech_to_text_2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
speech_to_text.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
speech-encoder-decoder.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
speecht5.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
splinter.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
squeezebert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
stablelm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
starcoder2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
superglue.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
superpoint.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
swiftformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
swin.md docs(swin): Update Swin model card to standard format (#37628) 2025-05-21 16:16:43 -07:00
swin2sr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
swinv2.md docs(swinv2): Update SwinV2 model card to new standard format (#37942) 2025-05-23 13:04:13 -07:00
switch_transformers.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
t5.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
t5gemma.md Encoder-Decoder Gemma (#38332) 2025-06-25 09:05:10 +00:00
t5v1.1.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
table-transformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
tapas.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tapex.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
textnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
time_series_transformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
timesfm.md Add TimesFM Time Series Forecasting Model (#34082) 2025-04-16 15:00:53 +02:00
timesformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
timm_wrapper.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
trajectory_transformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
transfo-xl.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
trocr.md Docs: Add custom fine-tuning tutorial to TrOCR model page (#38847) 2025-06-18 09:38:58 -07:00
tvlt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
tvp.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
udop.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
ul2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
umt5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
unispeech-sat.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
unispeech.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
univnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
upernet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
van.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
video_llava.md No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
videomae.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vilt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vipllava.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vision-encoder-decoder.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vision-text-dual-encoder.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
visual_bert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vit_hybrid.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vit_mae.md Updated the model card for ViTMAE (#38302) 2025-05-28 09:19:43 -07:00
vit_msn.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vit.md [docs] Model docs (#36469) 2025-03-21 15:35:22 -07:00
vitdet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vitmatte.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vitpose.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vits.md Update VITS model card (#37335) 2025-04-15 13:16:05 -07:00
vivit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vjepa2.md Add V-JEPA for video classification model (#38788) 2025-06-13 17:56:15 +01:00
wav2vec2_phoneme.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
wav2vec2-bert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
wav2vec2-conformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
wav2vec2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
wavlm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
whisper.md [Whisper] handle deprecation of forced_decoder_ids (#38232) 2025-05-22 09:16:38 +00:00
xclip.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xglm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xlm-prophetnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xlm-roberta-xl.md Created model card for xlm-roberta-xl (#38597) 2025-06-09 13:00:38 -07:00
xlm-roberta.md Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout (#38596) 2025-06-09 12:26:31 -07:00
xlm-v.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xlm.md Created model card for XLM model (#38595) 2025-06-09 12:26:23 -07:00
xlnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xls_r.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xlsr_wav2vec2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xmod.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
yolos.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
yoso.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
zamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
zamba2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
zoedepth.md added fast image processor for ZoeDepth and expanded tests accordingly (#38515) 2025-06-04 22:59:17 +00:00