transformers/docs/source/en
Arthur 25b7f27234
Add llama4 (#37307)
* remove one of the last deps

* update fast image processor after refactor

* styling

* more quality of life improvements

* nit

* update

* cleanups

* some cleanups

* vllm updates

* update fake image token

* [convert] Fix typo

* [convert] Strip extraneous bytes from shards

* [convert] Minor fixes

* [convert] Use num_experts

* multi-image fixes in modeling + processor

* fixup size

* 128 experts

* Use default rope

* Unfuse mlp

* greatly simplify inputs_embeds merging

* remove .item() 👀

* fix from review

* Address feedback

* Use None "default" for rope_scaling. Add eot.

* set seed

* return aspect ratios and bug fixes

* Moe 128 rebased (#8)

* 128 experts

* Use default rope

* Unfuse mlp

* Address feedback

* Use None "default" for rope_scaling. Add eot.

* Meta/llama quant compat (#7)

* add quant compatible model & conversion code for llama4

* fix a few issues

* fix a few issues

* minor type mapping fix

---------

Co-authored-by: Lu Fang <fanglu@fb.com>

* use a new config parameter to determine which model definition to use for MoE

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Lu Fang <fanglu@fb.com>

* un-comment write_tokenizer from converting script

* remove un-used imports

* [llama4] Pop aspect_ratios from image processor output in Llama4Processor

Signed-off-by: Jon Swenson <jmswen@gmail.com>

* Fix parameter_count name

* Update src/transformers/models/llama4/configuration_llama4.py

* nit

* Add changes for no_rope, moe_layers, chunked attention. Just need to test all

* Update src/transformers/models/llama4/image_processing_llama4_fast.py

* nit

* fix post merge with main

* support flex attention

* fixes

* fix

* add layer

* small updates

* rebase and delete llm_compressor

* nit

* [llama4/mm] Add back <|image|> token that delimits global tile

* [llama4/mm] Fix Llama 4 image processing unit tests

* add explicit dtype

Signed-off-by: Jon Swenson <jmswen@gmail.com>

* sdpa works

* comment todo small

* fix model loading

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* revert

* nits

* small fix for TP on 1 node

* Read new params from config

* Add <|eom|>

* lol don't know how this got here

* adding fp8

* Save processor, fix chat template

* style

* Add boi/eoi tokens

We don't use them.

* fixes; for now flex seems to work :)

* updates

* nits

* updates

* missing keys

* add context parallel

* update

* update

* fix

* nits

* add world size and make eager attn work for vision

* Ignore new key present in base models

* add tp_plan

* fix nope

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* minor fix

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* Clean up Llama4 vision model

* current updates

* add support for `attn_temperature_tuning`

* add floor scale

* add missing attn scales
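
  The `attn_temperature_tuning` and floor scale above refer to a position-dependent multiplier applied to query states on long contexts. A minimal pure-Python sketch of that scaling rule, assuming the log-of-floored-position form; the default `floor_scale` and `attn_scale` values here are illustrative, not the released config:

  ```python
  import math

  def attn_temperature_scale(position: int,
                             floor_scale: float = 8192.0,
                             attn_scale: float = 0.1) -> float:
      """Position-dependent multiplier for query states.

      Stays at 1.0 for early positions, then grows logarithmically with
      position, which counteracts attention score flattening at long
      context lengths.
      """
      return math.log(math.floor((position + 1) / floor_scale) + 1.0) * attn_scale + 1.0
  ```

  For positions below `floor_scale` the multiplier is exactly 1.0, so short-context behavior is unchanged.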

* push what works, dirty trick for the device synch

* oops

* Fix pad_token_id

See
https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files
Confirmed in the original codebase.

* fix CausalLM loading

* rm

* fix tied-weights

* fix sdpa

* push current version

* should work with both short and long

* add compressed_tensors & fix fbgemm tp

* Fix flex impl

* style

* chunking

* try to revert the potentially breaking change

* fix auto factory

* fix shapes in general

* rm processing

* commit cache utils cleanup

* Fix context length

* fix

* allocate

* update tp_plan

* fix SDPA!

* Add support for sparse `Llama4TextMoe` layer from the kernel hub
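
  For context on the MoE layer: my reading of the 128-expert design is top-1 routing with a sigmoid gate on the selected expert's logit. A hedged sketch of that routing decision for a single token (treat the top-1/sigmoid choice as an assumption, not the definitive `Llama4TextMoe` implementation):

  ```python
  import math

  def top1_route(router_logits: list[float]) -> tuple[int, float]:
      """Pick the highest-scoring expert for one token and return its
      index plus the sigmoid gate that scales the expert's output."""
      expert = max(range(len(router_logits)), key=lambda i: router_logits[i])
      gate = 1.0 / (1.0 + math.exp(-router_logits[expert]))
      return expert, gate
  ```

  With top-1 routing each token is dispatched to exactly one routed expert, and the gate rescales that expert's contribution before it is combined with the shared expert.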

* cleanup

* better merge

* update

* still broken fixing now

* nits

* revert print

* Write max_position_embeddings and max_model_length

* Update modeling_llama4.py

* Save attention_chunk_size

* Sync eos terminators

* Read initializer_range

* style

* remove `dict`

* fix

* eager should use `chunked_attention_mask`
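
  The chunked attention mask mentioned above can be sketched as a boolean rule: a query attends to a key only causally and only within its own chunk. A minimal illustration, assuming this within-chunk-causal reading of the pattern (the real mask is a 4D tensor; chunk size here is a toy value, not the model's `attention_chunk_size`):

  ```python
  def chunked_causal_mask(seq_len: int, chunk_size: int) -> list[list[bool]]:
      """True where query i may attend to key j: causal (j <= i) and
      both positions fall in the same attention chunk."""
      return [
          [j <= i and i // chunk_size == j // chunk_size for j in range(seq_len)]
          for i in range(seq_len)
      ]
  ```

  Eager attention needs this mask materialized explicitly, whereas flex attention can express the same rule as a mask function.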

* revert

* fixup

* fix config

* Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"

This reverts commit ccda19f050, reversing
changes made to a515579aed.

* Fix typo and remove warning with compiled flex and chunked prefill

* Fix MoE vs FF (#41)

* fix

* Use correct no_rope_layers if provided one is empty list
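
  For context, the model interleaves RoPE and NoPE layers, and an empty `no_rope_layers` falls back to a pattern derived from an interval. A sketch of that derivation, assuming 1 marks a RoPE layer, 0 a NoPE layer, and every `interval`-th layer (1-indexed) skips rotary embeddings; the default interval of 4 is an assumption, not a documented constant:

  ```python
  def default_no_rope_layers(num_hidden_layers: int, interval: int = 4) -> list[int]:
      """1 = layer uses RoPE, 0 = NoPE layer (no rotary embeddings).
      Every `interval`-th layer, counting from 1, is a NoPE layer."""
      return [int((layer_idx + 1) % interval != 0)
              for layer_idx in range(num_hidden_layers)]
  ```

  The explicit list in the config, when non-empty, overrides this derived pattern.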

* update tests

* fix

* skipping some tests

* fix fp8 loading

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* fix text generation pipeline

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* eager needs 4D mask

* fix

* Some cleanup

* fix

* update

* fix

* replace correctly module

* patch

* modulelist

* update

* update

* clean up

* Don't move to `cuda:0` in distributed mode

* restrict to compressed tensors for now

* rm print

* Docs!

* Fixes

* Update docs/source/en/model_doc/llama4.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fixes

* cuda graph fix

* revert some stuff

* fixup

* styling

* Update src/transformers/models/llama4/modeling_llama4.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* commit license, clean up here and there, and style

* more styling changes

* fix dummies

* fix and clean docstrings

* remove comment

* remove warning

* Only fast image processor is supported

* nit

* trigger CI

* fix issue with flex encoder

* fix dynamic cache

* Code quality

* Code quality

* fix more tests for now

* Code quality

* Code quality

* Nuke bunch of failing stuff

* Code quality

* Code quality

* cleanup removal of slow image processor

* ruff fix fast image processor

* fix

* fix styling

* Docs

* Repo consistency

* Repo consistency

* fix sliding window issue

* separate llama cache

* styling

* Repo consistency

* Repo consistency

* push what works

* L4 Repo consistency

* Docs

* fix last remaining issues

---------

Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Keyun Tong <tongkeyun@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: Jon Swenson <jmswen@gmail.com>
Co-authored-by: jmswen <jmswen@users.noreply.github.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
Co-authored-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: drisspg <drisspguessous@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-05 22:02:22 +02:00
internal [RoPE] abstract dynamic RoPE update under a decorator (#37249) 2025-04-04 14:27:28 +01:00
main_classes Support loading Quark quantized models in Transformers (#36372) 2025-03-20 15:40:51 +01:00
model_doc Add llama4 (#37307) 2025-04-05 22:02:22 +02:00
quantization [doc] Fix link for Quark quantization page (#37179) 2025-04-01 20:57:38 +02:00
tasks [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
_config.py Add optimized PixtralImageProcessorFast (#34836) 2024-11-28 16:04:05 +01:00
_redirects.yml Docs / Quantization: Redirect deleted page (#31063) 2024-05-28 18:29:22 +02:00
_toctree.yml Add llama4 (#37307) 2025-04-05 22:02:22 +02:00
accelerate.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
add_new_model.md Add support for fast image processors in add-new-model-like CLI (#36313) 2025-03-13 14:16:37 -04:00
add_new_pipeline.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
agents.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
attention_interface.md Fix AttentionInterface following feedback (#37010) 2025-03-28 18:00:35 +01:00
attention.md [Docs] Fix broken links and syntax issues (#28918) 2024-02-08 14:13:35 -08:00
backbones.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
cache_explanation.md Fix typos (#36910) 2025-03-24 14:08:29 +00:00
chat_extras.md Update chat_extras.md with content correction (#36599) 2025-03-07 13:09:02 +00:00
chat_templating_multimodal.md Fix typos (#36910) 2025-03-24 14:08:29 +00:00
chat_templating_writing.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
chat_templating.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
community.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
contributing.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
conversations.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
custom_models.md Fix typos (#36910) 2025-03-24 14:08:29 +00:00
debugging.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deepspeed.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
executorch.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fast_tokenizers.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
feature_extractors.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fsdp.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
generation_features.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
generation_strategies.md fix typos in the docs directory (#36639) 2025-03-11 09:41:41 -07:00
gguf.md Fix gguf docs (#36601) 2025-03-11 15:29:14 +01:00
glossary.md Fix typos (#31819) 2024-07-08 11:52:47 +01:00
gpu_selection.md Fix typos (#36910) 2025-03-24 14:08:29 +00:00
how_to_hack_models.md fix typos in the docs directory (#36639) 2025-03-11 09:41:41 -07:00
hpo_train.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
image_processors.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
index.md Adding Qwen3 and Qwen3MoE (#36878) 2025-03-31 09:50:49 +02:00
installation.md Update installation.md (#36826) 2025-03-21 16:32:02 -07:00
kv_cache.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
llm_optims.md [CI] green llama tests (#37244) 2025-04-03 14:15:53 +01:00
llm_tutorial_optimization.md fix typos in the docs directory (#36639) 2025-03-11 09:41:41 -07:00
llm_tutorial.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
model_memory_anatomy.md Enable BNB multi-backend support (#31098) 2024-09-24 03:40:56 -06:00
model_sharing.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
model_summary.md model_summary.md - Restore link to Harvard's Annotated Transformer. (#29702) 2024-03-23 18:29:39 -07:00
models.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
modular_transformers.md Support custom dosctrings in modular (#36726) 2025-03-18 14:00:54 -04:00
notebooks.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
optimizers.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pad_truncation.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
peft.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perf_hardware.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
perf_infer_cpu.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perf_infer_gpu_multi.md enable tp on CPU (#36299) 2025-03-31 10:55:47 +02:00
perf_infer_gpu_one.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perf_torch_compile.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perf_train_cpu_many.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perf_train_cpu.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perf_train_gpu_many.md Mention UltraScale Playbook 🌌 in docs (#36589) 2025-03-06 14:48:11 -08:00
perf_train_gpu_one.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perf_train_special.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perf_train_tpu_tf.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perplexity.md [docs] use device-agnostic API instead of cuda (#34913) 2024-11-26 09:23:34 -08:00
philosophy.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
pipeline_gradio.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pipeline_tutorial.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
pipeline_webserver.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pr_checks.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
processors.md [docs] Fix image link (#36869) 2025-03-25 11:34:21 -07:00
quicktour.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
run_scripts.md Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
serialization.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
serving.md [docs] Serving LLMs (#36522) 2025-03-10 13:14:19 -07:00
task_summary.md [doctest] Fixes (#35863) 2025-01-26 15:26:38 -08:00
tasks_explained.md fix: Wrong task mentioned in docs (#34757) 2024-11-18 18:42:28 +00:00
testing.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
tf_xla.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tflite.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tokenizer_summary.md [docs] Spanish translation of tokenizer_summary.md (#31154) 2024-06-03 16:52:23 -07:00
tools.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
torchscript.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
trainer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
training.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
troubleshooting.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00