transformers/docs/source/en/model_doc
Benjamin Warner 667ed5635e
Add ModernBERT to Transformers (#35158)
* initial cut of modernbert for transformers

* small bug fixes

* fixes

* Update import

* Use compiled mlp->mlp_norm to match research implementation

* Propagate changes in modular to modeling

* Replace duplicate attn_out_dropout in favor of attention_dropout

cc @warner-benjamin let me know if the two should remain separate!

* Update BOS to CLS and EOS to SEP

Please confirm @warner-benjamin

* Set default classifier bias to False, matching research repo

* Update tie_word_embeddings description

* Fix _init_weights for ForMaskedLM

* Match base_model_prefix

* Add compiled_head to match research repo outputs

* Fix imports for ModernBertForMaskedLM

* Just use "gelu" default outright for classifier

* Fix config name typo: initalizer -> initializer

* Remove some unused parameters in docstring. Still lots to edit there!

* Compile the embeddings forward

Not having this resulted in very slight differences - so small it wasn't even noticed for the base model, only for the large model.

But the tiny difference for large propagated at the embedding layer through the rest of the model, leading to notable differences of ~0.0084 average per value, up to 0.2343 for the worst case.

* Add drafts for ForSequenceClassification/ForTokenClassification

* Add initial SDPA support (not exactly equivalent to FA2 yet!)

During testing, FA2 and SDPA still differ by about 0.0098 per value in the token embeddings. It still predicts the correct mask fills, but I'd like to get it fully 1-1 if possible.

* Only use attention dropout if training

* Add initial eager attention support (also not equivalent to FA2 yet!)

Frustratingly, I also can't get eager to be equivalent to FA2 (or sdpa), but it does get really close, i.e. avg ~0.010 difference per value.

Especially if I use fp32 for both FA2 & eager: avg ~0.0029 difference per value

The fill-mask results are good with eager.

* Add initial tests, output_attentions, output_hidden_states, prune_heads

Tests are based on BERT, not all tests pass yet: 23 failed, 79 passed, 100 skipped

* Remove kwargs from ModernBertForMaskedLM

Disable sparse_prediction by default to match the normal HF, can be enabled via config

* Remove/adjust/skip improper tests; warn if padding but no attn mask

* Run formatting etc.

* Run python utils/custom_init_isort.py

* FlexAttention with unpadded sequences (matches FA2 within bf16 numerics)

* Reformat init_weights based on review

* self -> module in attention forwards

* Remove if config.tie_word_embeddings

* Reformat output projection on a different line

* Remove pruning

* Remove assert

* Call contiguous() to simplify paths

* Remove prune_qkv_linear_layer

* Format code

* Keep as kwargs, only use if needed

* Remove unused codepaths & related config options

* Remove 3d attn_mask test; fix token classification tuple output

* Reorder: attention_mask above position_ids, fixes gradient checkpointing

* Fix usage if no FA2 or torch v2.5+

* Make torch.compile/triton optional

Should we rename 'compile'? It's a bit vague

* Separate pooling options into separate functions (cls, mean) - cls as default

* Simplify _pad_modernbert_output, remove unused labels path

* Update tied weights to remove decoder.weight, simplify decoder loading

* Adaptively set config.compile based on hf_device_map/device/resize, etc.

* Update ModernBertConfig docstring

* Satisfy some consistency checks, add unfinished docs

* Only set compile to False if there's more than 1 device

* Add docstrings for public ModernBert classes

* Don't replace docstring returns - ends up being duplicate

* Fix mistake in toctree

* Reformat toctree

* Patched FlexAttention, SDPA, Eager with Local Attention

* Implement FA2 -> SDPA -> Eager attn_impl defaulting

This is crucial both to match the original performance and to get the highest inference speed without requiring users to manually pick FA2

* Patch test edge case with Idefics3 not working with 'attn_implementation="sdpa"'

* Repad all_hidden_states as well

* rename config.compile to reference_compile

* disable flex_attention since it crashes

* Update modernbert.md

* Using dtype min to mask in eager

* Fully remove flex attention for now

It's only compatible with the nightly torch 2.6, so we'll leave it be for now. It's also slower than eager/sdpa.

Also, update compile -> reference_compile in one more case

* Call contiguous to allow for .view()

* Copyright 2020 -> 2024

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update/simplify __init__ structure

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove "... if dropout_prob > 0 else identity"

Dropout with p=0.0 should be as efficient as an identity

* re-use existing pad/unpad functions instead of creating new ones

* remove flexattention method

* Compute attention_mask and local_attention_mask once in modeling

* Simplify sequence classification prediction heads, only CLS now

Users can make custom heads if they feel like it

Also removes the unnecessary pool parameter

* Simplify module.training in eager attn

* Also export ModernBertPreTrainedModel

* Update the documentation with links to finetuning scripts

* Explain local_attention_mask parameter in docstring

* Simplify _autoset_attn_implementation, rely on super()

* Keep "in" to initialize Prediction head

Double-checked with Benjamin that it's correct/what we used for pretraining

* add back mean pooling

* Use the pooling head in TokenClassification

* update copyright

* Reset config._attn_implementation_internal on failure

* Allow optional attention_mask in ForMaskedLM head

* fix failing run_slow tests

* Add links to the paper

* Remove unpad_no_grad, always pad/unpad without gradients

* local_attention_mask -> sliding_window_mask

* Revert "Use the pooling head in TokenClassification"

This reverts commit 99c38badd1.

There was no real motivation, and no evidence that this bigger head does anything useful.

* Simplify pooling, 2 options via if-else

---------

Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
Co-authored-by: Said Taghadouini <taghadouinisaid@gmail.com>
Co-authored-by: Benjamin Clavié <ben@clavie.eu>
Co-authored-by: Antoine Chaffin <ant54600@hotmail.fr>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-19 14:03:35 +01:00
albert.md Add sdpa support for Albert (#32092) 2024-09-03 14:01:00 +01:00
align.md Uniformize kwargs for image-text-to-text processors (#32544) 2024-09-24 21:28:19 -04:00
altclip.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
aria.md Add Aria (#34157) 2024-12-06 12:17:34 +01:00
audio-spectrogram-transformer.md add sdpa to ViT [follow up of #29325] (#30555) 2024-05-16 10:56:11 +01:00
auto.md Add auto model for image-text-to-text (#32472) 2024-10-08 14:26:43 +02:00
autoformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
bamba.md Add the Bamba Model (#34982) 2024-12-18 20:18:17 +01:00
bark.md F.scaled_dot_product_attention support (#26572) 2023-12-09 05:38:14 +09:00
bart.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
barthez.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
bartpho.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
beit.md Add sdpa for Beit (#34941) 2024-12-17 14:44:47 +01:00
bert-generation.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
bert-japanese.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
bert.md [BERT] Add support for sdpa (#28802) 2024-04-26 16:23:44 +01:00
bertweet.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
big_bird.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
bigbird_pegasus.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
biogpt.md Add sdpa for BioGpt (#33592) 2024-09-20 14:27:32 +01:00
bit.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
blenderbot-small.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
blenderbot.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
blip-2.md VLMs: patch_size -> num_image_tokens in processing (#33424) 2024-11-18 13:21:07 +01:00
blip.md Blip: Deprecate BlipModel (#31235) 2024-06-04 18:29:45 +02:00
bloom.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
bort.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
bridgetower.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
bros.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
byt5.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
camembert.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
canine.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
chameleon.md Uniformize kwargs for chameleon processor (#32181) 2024-09-26 10:18:07 -04:00
chinese_clip.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
clap.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
clip.md Add sdpa and FA2 for CLIP (#31940) 2024-07-18 10:30:37 +05:30
clipseg.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
clvp.md Add CLVP (#24745) 2023-11-10 13:49:10 +00:00
code_llama.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
codegen.md Add token type ids to CodeGenTokenizer (#29265) 2024-04-17 12:19:18 +02:00
cohere.md Cohere Model Release (#29622) 2024-03-15 14:29:11 +01:00
cohere2.md Add Cohere2 docs details (#35294) 2024-12-17 09:36:31 -08:00
colpali.md Fix documentation for ColPali (#35321) 2024-12-19 09:08:28 +01:00
conditional_detr.md Add examples for detection models finetuning (#30422) 2024-05-08 11:42:07 +01:00
convbert.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
convnext.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
convnextv2.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
cpm.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
cpmant.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
ctrl.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
cvt.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
dac.md Add Descript-Audio-Codec model (#31494) 2024-08-19 10:21:51 +01:00
data2vec.md Add sdpa for Beit (#34941) 2024-12-17 14:44:47 +01:00
dbrx.md Follow up: Fix link in dbrx.md (#30514) 2024-05-27 14:57:43 +02:00
deberta-v2.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
deberta.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
decision_transformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
deformable_detr.md Add Image Processor Fast Deformable DETR (#34353) 2024-11-19 11:18:58 -05:00
deit.md add sdpa to ViT [follow up of #29325] (#30555) 2024-05-16 10:56:11 +01:00
deplot.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
depth_anything_v2.md Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies (#32550) 2024-10-22 15:50:54 +02:00
depth_anything.md Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies (#32550) 2024-10-22 15:50:54 +02:00
deta.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
detr.md Add DetrImageProcessorFast (#34063) 2024-10-21 09:05:05 -04:00
dialogpt.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
dinat.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
dinov2.md Add Flax Dinov2 (#31960) 2024-08-19 09:28:13 +01:00
distilbert.md Add sdpa for DistilBert (#33724) 2024-10-02 13:55:19 +01:00
dit.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
donut.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
dpr.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
dpt.md [DPT, Dinov2] Add resources (#27655) 2023-11-23 17:44:08 +00:00
efficientformer.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
efficientnet.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
electra.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
encodec.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
encoder-decoder.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
ernie_m.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
ernie.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
esm.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
falcon_mamba.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
falcon.md Add proper Falcon docs and conversion script (#25954) 2023-09-04 17:18:34 +01:00
falcon3.md Add Falcon3 documentation (#35307) 2024-12-17 14:23:13 +01:00
fastspeech2_conformer.md Super tiny fix 12 typos about "with with" (#29926) 2024-03-29 14:31:31 +00:00
flan-t5.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
flan-ul2.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
flaubert.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
flava.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
fnet.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
focalnet.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
fsmt.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
funnel.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
fuyu.md Uniformize kwargs for image-text-to-text processors (#32544) 2024-09-24 21:28:19 -04:00
gemma.md Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878) 2024-05-20 10:06:57 +02:00
gemma2.md Gemma2: add cache warning (#32279) 2024-08-07 10:03:05 +05:00
git.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
glm.md add Glm (#33823) 2024-10-18 17:41:12 +02:00
glpn.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
gpt_bigcode.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
gpt_neo.md F.scaled_dot_product_attention support (#26572) 2023-12-09 05:38:14 +09:00
gpt_neox_japanese.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
gpt_neox.md [GPT-NeoX] Add SDPA support (#31031) 2024-06-26 13:56:36 +01:00
gpt-sw3.md Fix paths to AI Sweden Models reference and model loading (#28423) 2024-01-15 09:09:22 +01:00
gpt2.md [GPT2] Add SDPA support (#31172) 2024-06-19 09:40:57 +02:00
gptj.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
gptsan-japanese.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
granite.md Granite language models (#31502) 2024-08-27 21:27:21 +02:00
granitemoe.md Granitemoe (#33207) 2024-09-21 01:43:50 +02:00
graphormer.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
grounding-dino.md Fix code snippet for Grounding DINO (#32229) 2024-07-25 19:20:47 +01:00
groupvit.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
herbert.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
hiera.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
hubert.md Add sdpa and fa2 the Wav2vec2 family. (#30121) 2024-04-22 18:30:38 +01:00
ibert.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
idefics.md Port IDEFICS to tensorflow (#26870) 2024-05-13 15:59:46 +01:00
idefics2.md [docs] Fix FlashAttention link (#35171) 2024-12-10 11:36:25 -08:00
idefics3.md Add Aria (#34157) 2024-12-06 12:17:34 +01:00
ijepa.md [I-JEPA] Update docs (#35148) 2024-12-09 10:01:31 +01:00
imagegpt.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
informer.md [Docs] Fix spelling and grammar mistakes (#28825) 2024-02-02 08:45:00 +01:00
instructblip.md VLMs: patch_size -> num_image_tokens in processing (#33424) 2024-11-18 13:21:07 +01:00
instructblipvideo.md VLMs: patch_size -> num_image_tokens in processing (#33424) 2024-11-18 13:21:07 +01:00
jamba.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
jetmoe.md Add JetMoE model (#30005) 2024-05-14 16:32:01 +02:00
jukebox.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
kosmos-2.md [KOSMOS-2] Update docs (#27157) 2023-10-30 21:42:19 +01:00
layoutlm.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
layoutlmv2.md [Docs] Add language identifiers to fenced code blocks (#28955) 2024-02-12 10:48:31 -08:00
layoutlmv3.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
layoutxlm.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
led.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
levit.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
lilt.md [Docs] Add language identifiers to fenced code blocks (#28955) 2024-02-12 10:48:31 -08:00
llama.md Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878) 2024-05-20 10:06:57 +02:00
llama2.md Fix FA2 integration (#28142) 2023-12-20 14:25:07 +05:30
llama3.md Docs - update formatting of llama3 model card (#33438) 2024-09-12 11:24:56 +02:00
llava_next_video.md [docs] Fix FlashAttention link (#35171) 2024-12-10 11:36:25 -08:00
llava_next.md VLMs: patch_size -> num_image_tokens in processing (#33424) 2024-11-18 13:21:07 +01:00
llava_onevision.md [Docs] Improve VLM docs (#33393) 2024-10-07 09:54:07 +02:00
llava.md Fix remove unused parameter in docs (#35306) 2024-12-17 09:34:41 -08:00
longformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
longt5.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
luke.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
lxmert.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
m2m_100.md Add SDPA support for M2M100 (#33309) 2024-09-25 18:04:42 +01:00
madlad-400.md Add madlad-400 MT models (#27471) 2023-11-28 13:19:50 +00:00
mamba.md Trainer - deprecate tokenizer for processing_class (#32385) 2024-10-02 14:08:46 +01:00
mamba2.md quickfix documentation (#32566) 2024-08-26 17:49:44 +02:00
marian.md Mention model_info.id instead of model_info.modelId (#32106) 2024-07-22 14:14:47 +01:00
markuplm.md [Docs] Fix spelling and grammar mistakes (#28825) 2024-02-02 08:45:00 +01:00
mask2former.md Instance segmentation examples (#31084) 2024-05-31 16:56:17 +01:00
maskformer.md Instance segmentation examples (#31084) 2024-05-31 16:56:17 +01:00
matcha.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
mbart.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
mctct.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
mega.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
megatron_gpt2.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
megatron-bert.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
mgp-str.md [Docs] Fix broken links and syntax issues (#28918) 2024-02-08 14:13:35 -08:00
mimi.md Moshi integration (#33624) 2024-10-16 11:21:49 +02:00
mistral.md [docs] Fix FlashAttention link (#35171) 2024-12-10 11:36:25 -08:00
mixtral.md [docs] Fix FlashAttention link (#35171) 2024-12-10 11:36:25 -08:00
mllama.md Mllama: update docs (#34334) 2024-10-30 10:11:50 +01:00
mluke.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
mms.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
mobilebert.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
mobilenet_v1.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
mobilenet_v2.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
mobilevit.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
mobilevitv2.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
modernbert.md Add ModernBERT to Transformers (#35158) 2024-12-19 14:03:35 +01:00
moshi.md Moshi integration (#33624) 2024-10-16 11:21:49 +02:00
mpnet.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
mpt.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
mra.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
mt5.md Adding [T5/MT5/UMT5]ForTokenClassification (#28443) 2024-02-01 03:53:49 +01:00
musicgen_melody.md Add MusicGen Melody (#28819) 2024-03-18 13:06:12 +00:00
musicgen.md [Docs] Add language identifiers to fenced code blocks (#28955) 2024-02-12 10:48:31 -08:00
mvp.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
myt5.md [WIP] Add Tokenizer for MyT5 Model (#31286) 2024-10-06 10:33:16 +02:00
nat.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
nemotron.md Add Nemotron HF Support (#31699) 2024-08-06 15:42:05 +02:00
nezha.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
nllb-moe.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
nllb.md Add SDPA support for M2M100 (#33309) 2024-09-25 18:04:42 +01:00
nougat.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
nystromformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
olmo.md Add OLMo model family (#29890) 2024-04-17 17:59:07 +02:00
olmo2.md Rename OLMo November to OLMo2 (#34864) 2024-11-25 16:31:22 +01:00
olmoe.md Add paper link (#33305) 2024-09-05 15:49:28 +02:00
omdet-turbo.md Fix docs and docstrings Omdet-Turbo (#33726) 2024-09-26 12:18:23 -04:00
oneformer.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
open-llama.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
openai-gpt.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
opt.md add sdpa to OPT (#33298) 2024-10-10 11:49:34 +02:00
owlv2.md Fix OWLv2 Doc (#30794) 2024-05-14 08:36:11 +02:00
owlvit.md Update bounding box format everywhere (#27944) 2023-12-11 18:03:42 +00:00
paligemma.md Paligemma support for multi-image (#33447) 2024-09-27 11:23:14 +02:00
patchtsmixer.md [Docs] Add resources (#28705) 2024-02-19 15:22:29 +01:00
patchtst.md [Docs] Add resources (#28705) 2024-02-19 15:22:29 +01:00
pegasus_x.md [Docs] Fix broken links and syntax issues (#28918) 2024-02-08 14:13:35 -08:00
pegasus.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
perceiver.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
persimmon.md Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878) 2024-05-20 10:06:57 +02:00
phi.md Fix doctest more (for docs/source/en) (#30247) 2024-04-15 14:10:59 +02:00
phi3.md phi3 chat_template does not support system role (#30606) 2024-05-02 15:30:21 +02:00
phimoe.md PhiMoE (#33363) 2024-10-04 21:39:45 +02:00
phobert.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
pix2struct.md 🌐 [i18n-ZH] Translate chat_templating.md into Chinese (#28790) 2024-02-26 08:42:24 -08:00
pixtral.md Add optimized PixtralImageProcessorFast (#34836) 2024-11-28 16:04:05 +01:00
plbart.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
poolformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
pop2piano.md [Docs] Add language identifiers to fenced code blocks (#28955) 2024-02-12 10:48:31 -08:00
prophetnet.md chore: remove duplicate words (#31853) 2024-07-09 10:38:29 +01:00
pvt_v2.md Add PvT-v2 Model (#26812) 2024-03-13 19:05:20 +00:00
pvt.md [Docs] Fix broken links and syntax issues (#28918) 2024-02-08 14:13:35 -08:00
qdqbert.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
qwen2_audio.md Add Qwen2-Audio (#32137) 2024-08-08 15:47:24 +02:00
qwen2_moe.md Mistral-related models for QnA (#34045) 2024-10-14 08:53:32 +02:00
qwen2_vl.md [Docs] Improve VLM docs (#33393) 2024-10-07 09:54:07 +02:00
qwen2.md Mistral-related models for QnA (#34045) 2024-10-14 08:53:32 +02:00
rag.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
realm.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
recurrent_gemma.md [Docs] Update recurrent_gemma.md for some minor nits (#30238) 2024-04-15 18:30:59 +02:00
reformer.md [Docs] Fix spelling and grammar mistakes (#28825) 2024-02-02 08:45:00 +01:00
regnet.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
rembert.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
resnet.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
retribert.md Deprecate models (#24787) 2023-07-13 11:46:54 -04:00
roberta-prelayernorm.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
roberta.md [RoBERTa] Minor clarifications to model doc (#31949) 2024-07-22 10:08:27 -07:00
roc_bert.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
roformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
rt_detr.md Fix post process function called in the instance segmentation example of mask2former (#34588) 2024-11-19 16:49:25 +01:00
rwkv.md [Docs] Fix spelling and grammar mistakes (#28825) 2024-02-02 08:45:00 +01:00
sam.md [Docs] Add Developer Guide: How to Hack Any Transformers Model (#33979) 2024-10-07 10:08:20 +02:00
seamless_m4t_v2.md [Seamless] Fix links in docs (#27905) 2023-12-14 15:14:13 +00:00
seamless_m4t.md [Seamless] Fix links in docs (#27905) 2023-12-14 15:14:13 +00:00
segformer.md Decorators for deprecation and named arguments validation (#30799) 2024-06-10 12:35:10 +01:00
seggpt.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
sew-d.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
sew.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
siglip.md Make siglip examples clearer and error free (#33667) 2024-09-27 10:33:55 +02:00
speech_to_text_2.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
speech_to_text.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
speech-encoder-decoder.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
speecht5.md add generate method to SpeechT5ForTextToSpeech (#25233) 2023-08-03 14:12:07 +01:00
splinter.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
squeezebert.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
stablelm.md Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878) 2024-05-20 10:06:57 +02:00
starcoder2.md Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878) 2024-05-20 10:06:57 +02:00
superpoint.md 🚨🚨🚨 [SuperPoint] Fix keypoint coordinate output and add post processing (#33200) 2024-10-29 09:36:03 +00:00
swiftformer.md Add TF swiftformer (#23342) 2024-04-19 18:31:43 +01:00
swin.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
swin2sr.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
swinv2.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
switch_transformers.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
t5.md Fix doctest more (for docs/source/en) (#30247) 2024-04-15 14:10:59 +02:00
t5v1.1.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
table-transformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
tapas.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
tapex.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
time_series_transformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
timesformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
timm_wrapper.md Add TimmWrapper (#34564) 2024-12-11 12:40:30 +00:00
trajectory_transformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
transfo-xl.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
trocr.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
tvlt.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
tvp.md Update TVP arxiv link (#27672) 2023-11-23 17:02:16 +00:00
udop.md [UDOP] Improve docs, add resources (#29571) 2024-04-10 16:02:50 +02:00
ul2.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
umt5.md [Docs] Fix spelling and grammar mistakes (#28825) 2024-02-02 08:45:00 +01:00
unispeech-sat.md [Docs] Fix spelling and grammar mistakes (#28825) 2024-02-02 08:45:00 +01:00
unispeech.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
univnet.md Add UnivNet Vocoder Model for Tortoise TTS Diffusers Integration (#24799) 2023-11-22 17:21:36 +01:00
upernet.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
van.md [Docs] Fix spelling and grammar mistakes (#28825) 2024-02-02 08:45:00 +01:00
video_llava.md [docs] Fix FlashAttention link (#35171) 2024-12-10 11:36:25 -08:00
videomae.md add sdpa to ViT [follow up of #29325] (#30555) 2024-05-16 10:56:11 +01:00
vilt.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
vipllava.md Fix typo in code block in vipllava.md (#34957) 2024-11-27 08:19:34 -08:00
vision-encoder-decoder.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
vision-text-dual-encoder.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
visual_bert.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
vit_hybrid.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
vit_mae.md add sdpa to ViT [follow up of #29325] (#30555) 2024-05-16 10:56:11 +01:00
vit_msn.md add sdpa to ViT [follow up of #29325] (#30555) 2024-05-16 10:56:11 +01:00
vit.md Fast image processor (#28847) 2024-06-11 15:47:38 +01:00
vitdet.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
vitmatte.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
vits.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
vivit.md Add sdpa for Vivit (#33757) 2024-10-15 11:27:54 +02:00
wav2vec2_phoneme.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
wav2vec2-bert.md Add new meta w2v2-conformer BERT-like model (#28165) 2024-01-18 13:37:34 +00:00
wav2vec2-conformer.md doc: add info about wav2vec2 bert in older wav2vec2 models. (#31120) 2024-06-05 11:56:11 +01:00
wav2vec2.md doc: add info about wav2vec2 bert in older wav2vec2 models. (#31120) 2024-06-05 11:56:11 +01:00
wavlm.md [Docs] Fix spelling and grammar mistakes (#28825) 2024-02-02 08:45:00 +01:00
whisper.md [docs] add quick usage snippet to Whisper. (#31289) 2024-08-27 14:11:52 +02:00
xclip.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
xglm.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
xlm-prophetnet.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
xlm-roberta-xl.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
xlm-roberta.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
xlm-v.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
xlm.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
xlnet.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
xls_r.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
xlsr_wav2vec2.md doc: add info about wav2vec2 bert in older wav2vec2 models. (#31120) 2024-06-05 11:56:11 +01:00
xmod.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
yolos.md add sdpa to ViT [follow up of #29325] (#30555) 2024-05-16 10:56:11 +01:00
yoso.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
zamba.md Add Zamba (#30950) 2024-10-04 22:28:05 +02:00
zoedepth.md Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies (#32550) 2024-10-22 15:50:54 +02:00