commit 25b7f27234
Author: Arthur
Date:   2025-04-05 22:02:22 +02:00

Add llama4 (#37307)
* remove one of the last deps
* update fast image processor after refactor
* styling
* more quality of life improvements
* nit
* update
* cleanups
* some cleanups
* vllm updates
* update fake image token
* [convert] Fix typo
* [convert] Strip extraneous bytes from shards
* [convert] Minor fixes
* [convert] Use num_experts
* multi-image fixes in modeling + processor
* fixup size
* 128 experts
* Use default rope
* Unfuse mlp
* greatly simplify inputs embeds merging
* remove .item() 👀
* fix from review
* Address feedback
* Use None "default" for rope_scaling. Add eot.
* set seed
* return aspect ratios and bug fixes
* Moe 128 rebased (#8)
* 128 experts
* Use default rope
* Unfuse mlp
* Address feedback
* Use None "default" for rope_scaling. Add eot.
* Meta/llama quant compat (#7)
* add quant compatible model & conversion code for llama4
* fix a few issues
* fix a few issues
* minor type mapping fix
---------
Co-authored-by: Lu Fang <fanglu@fb.com>
* use a new config parameter to determine which model definition to use for MoE
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Lu Fang <fanglu@fb.com>
* un-comment write_tokenizer in the conversion script
* remove unused imports
* [llama4] Pop aspect_ratios from image processor output in Llama4Processor
Signed-off-by: Jon Swenson <jmswen@gmail.com>
* Fix parameter_count name
* Update src/transformers/models/llama4/configuration_llama4.py
* nit
* Add changes for no_rope, moe_layers, chunked attention; still needs testing across the board
* Update src/transformers/models/llama4/image_processing_llama4_fast.py
* nit
* fix post merge with main
* support flex attention (see the loading sketch after this change list)
* fixes
* fix
* add layer
* small updates
* rebase and delete llm_compressor
* nit
* [llama4/mm] Add back <|image|> token that delimits global tile
* [llama4/mm] Fix Llama 4 image processing unit tests
* add explicit dtype
Signed-off-by: Jon Swenson <jmswen@gmail.com>
* sdpa works
* comment todo small
* fix model loading
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
* revert
* nits
* small fix for TP on 1 node
* Read new params from config
* Add <|eom|>
* lol don't know how this got here
* adding fp8
* Save processor, fix chat template
* style
* Add boi/eoi tokens
We don't use them.
* fixes; flex seems to work for now :)
* updates
* nits
* updates
* missing keys
* add context parallel
* update
* update
* fix
* nits
* add worldsize and make eager attn work for vision
* Ignore new key present in base models
* add tp_plan (see the tensor-parallel sketch after this change list)
* fix nope
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
* minor fix
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
* Clean up Llama4 vision model
* current updates
* add support for `attn_temperature_tuning` (see the scaling sketch after this change list)
* add floor scale
* add missing attn scales
* push what works, dirty trick for the device sync
* oops
* Fix pad_token_id
See
https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files
Confirmed in the original codebase.
* fix CausalLM loading
* rm
* fix tied-weights
* fix sdpa
* push current version
* should work with both short and long
* add compressed_tensors & fix fbgemm tp
* Fix flex impl
* style
* chunking
* try to revert the potentially breaking change
* fix auto factory
* fix shapes in general
* rm processing
* commit cache utils cleanup
* Fix context length
* fix
* allocate
* update tp_plan
* fix SDPA!
* Add support for sparse `Llama4TextMoe` layer from the kernel hub
* cleanup
* better merge
* update
* still broken, fixing now
* nits
* revert print
* Write max_position_embeddings and max_model_length
* Update modeling_llama4.py
* Save attention_chunk_size
* Sync eos terminators
* Read initializer_range
* style
* remove `dict`
* fix
* eager should use `chunked_attention_mask` (see the mask sketch after this change list)
* revert
* fixup
* fix config
* Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"
This reverts commit ccda19f050, reversing changes made to a515579aed.
* Fix typo and remove warning with compiled flex and chunked prefill
* Fix MoE vs FF (#41)
* fix
* Use correct no_rope_layers if the provided one is an empty list
* update tests
* fix
* skipping some tests
* fix fp8 loading
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
* fix text generation pipeline
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
* eager needs 4D mask
* fix
* Some cleanup
* fix
* update
* fix
* correctly replace module
* patch
* modulelist
* update
* update
* clean up
* Don't move to `cuda:0` in distributed mode
* restrict to compressed tensors for now
* rm print
* Docs!
* Fixes
* Update docs/source/en/model_doc/llama4.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Fixes
* cuda graph fix
* revert some stuff
* fixup
* styling
* Update src/transformers/models/llama4/modeling_llama4.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* commit licence, cleanup here and there and style
* more styling changes
* fix dummies
* fix and clean docstrings
* remove comment
* remove warning
* Only fast image processor is supported (see the processor sketch after this change list)
* nit
* trigger CI
* fix issue with flex encoder
* fix dynamic cache
* Code quality
* Code quality
* fix more tests for now
* Code quality
* Code quality
* Nuke a bunch of failing stuff
* Code quality
* Code quality
* cleanup removal of slow image processor
* ruff fix fast image processor
* fix
* fix styling
* Docs
* Repo consistency
* Repo consistency
* fix sliding window issue
* separate llama cache
* styling
* Repo consistency
* Repo consistency
* push what works
* L4 Repo consistency
* Docs
* fix last remaining issues
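
A few of the mechanisms named above can be sketched for readers landing here from the changelog. The flex-attention support is exposed through transformers' standard `attn_implementation` switch; a minimal loading sketch, assuming the released Scout checkpoint name and a PyTorch version with FlexAttention (both assumptions, not taken from this PR):

```python
import torch
from transformers import Llama4ForConditionalGeneration

# Checkpoint name is an assumption; any Llama 4 checkpoint loads the same way.
model = Llama4ForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    attn_implementation="flex_attention",  # FlexAttention backend, needs a recent torch
    torch_dtype=torch.bfloat16,
)
```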
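The `tp_plan` entry hooks into transformers' built-in tensor parallelism. A sketch of how a user would exercise it, assuming a single node with several GPUs; the script name and checkpoint are placeholders:

```python
# Launch with: torchrun --nproc-per-node 4 run_llama4_tp.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt,
    tp_plan="auto",            # shard weights across ranks following the model's tp_plan
    torch_dtype=torch.bfloat16,
)
inputs = tokenizer("The key to life is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```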
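The `attn_temperature_tuning` and floor-scale entries add a position-dependent scaling of the query states on the no-rope layers. A minimal sketch of the rule, with `attn_scale=0.1` and `floor_scale=8192` as illustrative defaults rather than values read out of this diff:

```python
import torch

def attn_temperature_scales(positions: torch.Tensor,
                            attn_scale: float = 0.1,
                            floor_scale: float = 8192.0) -> torch.Tensor:
    """Per-position multiplier for the query states: grows logarithmically
    with position, stepped every `floor_scale` tokens, to keep attention
    sharp at long context lengths."""
    floored = torch.floor((positions.float() + 1.0) / floor_scale)
    return torch.log(floored + 1.0) * attn_scale + 1.0

# 1.0 inside the first window, then slowly increasing.
print(attn_temperature_scales(torch.tensor([0, 8191, 8192, 65536])))
```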
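The `chunked_attention_mask` mentioned for the eager path corresponds to chunked (block-local) causal attention. Below is a self-contained sketch of the mask pattern, not the exact construction code in modeling_llama4.py; in the model this boolean pattern is expanded into the 4D (batch, heads, query, key) mask that the "eager needs 4D mask" entry refers to:

```python
import torch

def chunked_causal_mask(seq_len: int, chunk_size: int) -> torch.Tensor:
    """True where attention is allowed: causal within the sequence,
    and restricted to the query's own `chunk_size`-wide block."""
    idx = torch.arange(seq_len)
    causal = idx[:, None] >= idx[None, :]
    same_chunk = (idx[:, None] // chunk_size) == (idx[None, :] // chunk_size)
    return causal & same_chunk

# 8 positions, chunks of 4: a block-diagonal causal pattern.
print(chunked_causal_mask(8, 4).int())
```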
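Finally, the multimodal entries (the <|image|> global-tile delimiter, the fast-only image processor, the chat-template fixes) all surface through `Llama4Processor`. A usage sketch, with the checkpoint name and image URL as placeholders:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("meta-llama/Llama-4-Scout-17B-16E-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},  # placeholder
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
# The chat template inserts the image special tokens (including the
# <|image|> delimiter for the global tile) alongside the processed tiles.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
print(inputs["input_ids"].shape)
```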
---------
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Keyun Tong <tongkeyun@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: Jon Swenson <jmswen@gmail.com>
Co-authored-by: jmswen <jmswen@users.noreply.github.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
Co-authored-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: drisspg <drisspguessous@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>