transformers/docs/source/en/model_doc
Minho Ryu eca74d1367
[WIP] add deepseek-v3 (#35926)
* init commit

* style

* take comments into account

* add deepseekv3 modeling

* remove redundant code

* apply make style

* apply fix-copies

* make format

* add init files

* rename deepseekv3 into deepseek_v3 based on its model_type

* rename deepseekv3 into deepseek_v3 based on its model_type

* deepseek-v3 not deepseek_v3

* set model_type as deepseek_v3

* use default docs

* apply make

* fill type and docstring

* add rope_config_validation

* use custom DeepseekV3MLP

* hold code only for checkpoints configuration; remove redundant

* revise rope yarn for DeepSeek variation

* rename DeepSeek-V3

* some refactoring

* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral

* fix attention forward

* use -1 for unchanged dims when using expand

* refactor DeepseekV3TopkRouter

* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim

* register pre_hook and hook both

* make style

* use n_shared_experts

* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add test file

* update modeling_file according to modular file

* make style

* add mapping for DeepseekV3ForSequenceClassification

* remove aux_loss_alpha

* add deepseek_v3 for perf

* add deepseek_v3

* rename test as deepseekv3

* use tiny-deepseek-v3

* remove DeepseekV3ForSequenceClassification

* cache before padding

* remote output_router_logits

* Revert "remote output_router_logits"

This reverts commit f264f800d0.

* remove output_router_logits

* make e_score_correction_bias as buffer

* skip tests not compatible

* make style

* make e_score_correction_bias as buffer

* use rope_interleave instead of load_hook

* skip tests not compatible with MLA

* add doc for rope_interleave

* fix typo

* remove torch.no_grad for selecting topk

* fix post merge issue

* merge with main and simplify

* nits

* final

* small fixes

* fix

* support TP better

* stash

* changes currently requires

* remove synch

* more fixes for TP

* temp fix for TP: some attention layers' FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used

* updates to have generation work!

* push most of the changes

* reorder functions + call for contributions!

* update readme

* nits

* update

* ruff was updated on main

* merge with main and fix copies

* revert unrelated changes

* route all tokens to all experts when testing to avoid no gradient issues

* finish fixing all tests

* fixup

* nit

* clean config

* last readme changes

* nit

* doc nit

* typo

* last nit

* one more one more

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
2025-03-28 15:56:59 +01:00
albert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
align.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
altclip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
aria.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
audio-spectrogram-transformer.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
auto.md Add auto model for image-text-to-text (#32472) 2024-10-08 14:26:43 +02:00
autoformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
aya_vision.md Add aya (#36521) 2025-03-04 12:24:33 +01:00
bamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bark.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bart.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
barthez.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bartpho.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
beit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bert-generation.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bert-japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bert.md [docs] Model docs (#36469) 2025-03-21 15:35:22 -07:00
bertweet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
big_bird.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bigbird_pegasus.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
biogpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
blenderbot-small.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
blenderbot.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
blip-2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
blip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bloom.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bort.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bridgetower.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bros.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
byt5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
camembert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
canine.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
chameleon.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
chinese_clip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
clap.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
clip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
clipseg.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
clvp.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
code_llama.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
codegen.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
cohere.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
cohere2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
colpali.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
conditional_detr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
convbert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
convnext.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
convnextv2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
cpm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
cpmant.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
ctrl.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
cvt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dab-detr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dac.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
data2vec.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dbrx.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deberta-v2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deberta.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
decision_transformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deepseek_v3.md [WIP] add deepseek-v3 (#35926) 2025-03-28 15:56:59 +01:00
deformable_detr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deit.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
deplot.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
depth_anything_v2.md Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies (#32550) 2024-10-22 15:50:54 +02:00
depth_anything.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
depth_pro.md fix typos in the docs directory (#36639) 2025-03-11 09:41:41 -07:00
deta.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
detr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dialogpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
diffllama.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dinat.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dinov2_with_registers.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
dinov2.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
distilbert.md Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
dit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
donut.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
dpr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
dpt.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
efficientformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
efficientnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
electra.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
emu3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
encodec.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
encoder-decoder.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
ernie_m.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
ernie.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
esm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
falcon_mamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
falcon.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
falcon3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fastspeech2_conformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
flan-t5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
flan-ul2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
flaubert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
flava.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
focalnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fsmt.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
funnel.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fuyu.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gemma.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gemma2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gemma3.md [docs] Attention mask image (#36970) 2025-03-26 10:11:34 -07:00
git.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
glm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
glpn.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
got_ocr2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt_bigcode.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt_neo.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt_neox_japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt_neox.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt-sw3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt2.md [GPT2] Add SDPA support (#31172) 2024-06-19 09:40:57 +02:00
gptj.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gptsan-japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
granite.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
granitemoe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
granitemoeshared.md add shared experts for upcoming Granite 4.0 language models (#35894) 2025-02-14 16:55:28 +01:00
granitevision.md Update Granite Vision Model Path / Tests (#35998) 2025-02-03 20:06:03 +01:00
graphormer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
grounding-dino.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
groupvit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
helium.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
herbert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
hiera.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
hubert.md [MINOR:TYPO] Update hubert.md (#36733) 2025-03-17 09:07:51 -07:00
ibert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
idefics.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
idefics2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
idefics3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
ijepa.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
imagegpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
informer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
instructblip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
instructblipvideo.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
jamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
jetmoe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
jukebox.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
kosmos-2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
layoutlm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
layoutlmv2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
layoutlmv3.md Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
layoutxlm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
led.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
levit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
lilt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
llama.md [docs] Attention mask image (#36970) 2025-03-26 10:11:34 -07:00
llama2.md [docs] Attention mask image (#36970) 2025-03-26 10:11:34 -07:00
llama3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
llava_next_video.md fix typos in the docs directory (#36639) 2025-03-11 09:41:41 -07:00
llava_next.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
llava_onevision.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
llava.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
longformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
longt5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
luke.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
lxmert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
m2m_100.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
madlad-400.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mamba2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
marian.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
markuplm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mask2former.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
maskformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
matcha.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mbart.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mctct.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mega.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
megatron_gpt2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
megatron-bert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mgp-str.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mimi.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mistral.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mistral3.md Fix Mistral3 tests (#36797) 2025-03-18 13:08:12 -04:00
mixtral.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mllama.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mluke.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mms.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mobilebert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mobilenet_v1.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mobilenet_v2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mobilevit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mobilevitv2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
modernbert.md Support QuestionAnswering Module for ModernBert based models. (#35566) 2025-03-26 21:24:18 +01:00
moonshine.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
moshi.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mpnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mra.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mt5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
musicgen_melody.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
musicgen.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mvp.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
myt5.md [WIP] Add Tokenizer for MyT5 Model (#31286) 2024-10-06 10:33:16 +02:00
nat.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nemotron.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nezha.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nllb-moe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nllb.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nougat.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
nystromformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
olmo.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
olmo2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
olmoe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
omdet-turbo.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
oneformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
open-llama.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
openai-gpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
opt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
owlv2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
owlvit.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
paligemma.md [docs] Attention mask image (#36970) 2025-03-26 10:11:34 -07:00
patchtsmixer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
patchtst.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pegasus_x.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pegasus.md Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
perceiver.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
persimmon.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
phi.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
phi3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
phi4_multimodal.md Add Phi4 multimodal (#36939) 2025-03-25 09:55:21 +01:00
phimoe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
phobert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pix2struct.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pixtral.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
plbart.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
poolformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pop2piano.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
prompt_depth_anything.md Add Prompt Depth Anything Model (#35401) 2025-03-20 16:12:44 +00:00
prophetnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pvt_v2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pvt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
qdqbert.md Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
qwen2_5_vl.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
qwen2_audio.md [qwen2 audio] remove redundant code and update docs (#36282) 2025-03-20 10:54:51 +00:00
qwen2_moe.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
qwen2_vl.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
qwen2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
rag.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
realm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
recurrent_gemma.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
reformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
regnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
rembert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
resnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
retribert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
roberta-prelayernorm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
roberta.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
roc_bert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
roformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
rt_detr_v2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
rt_detr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
rwkv.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
sam.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
seamless_m4t_v2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
seamless_m4t.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
segformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
seggpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
sew-d.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
sew.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
shieldgemma2.md Shieldgemma2 (#36678) 2025-03-20 15:14:38 +01:00
siglip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
siglip2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
smolvlm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
speech_to_text_2.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
speech_to_text.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
speech-encoder-decoder.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
speecht5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
splinter.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
squeezebert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
stablelm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
starcoder2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
superglue.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
superpoint.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
swiftformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
swin.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
swin2sr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
swinv2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
switch_transformers.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
t5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
t5v1.1.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
table-transformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tapas.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tapex.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
textnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
time_series_transformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
timesformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
timm_wrapper.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
trajectory_transformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
transfo-xl.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
trocr.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tvlt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tvp.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
udop.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
ul2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
umt5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
unispeech-sat.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
unispeech.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
univnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
upernet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
van.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
video_llava.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
videomae.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
vilt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vipllava.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vision-encoder-decoder.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vision-text-dual-encoder.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
visual_bert.md Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
vit_hybrid.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vit_mae.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
vit_msn.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
vit.md [docs] Model docs (#36469) 2025-03-21 15:35:22 -07:00
vitdet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vitmatte.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vitpose.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vits.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
vivit.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
wav2vec2_phoneme.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
wav2vec2-bert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
wav2vec2-conformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
wav2vec2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
wavlm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
whisper.md [docs] Model docs (#36469) 2025-03-21 15:35:22 -07:00
xclip.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xglm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlm-prophetnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlm-roberta-xl.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlm-roberta.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlm-v.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlnet.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xls_r.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xlsr_wav2vec2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
xmod.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
yolos.md Refactor Attention implementation for ViT-based models (#36545) 2025-03-20 15:15:01 +00:00
yoso.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
zamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
zamba2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
zoedepth.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00