jiqing-feng
99f9f1042f
Fix torchao usage ( #37034 )
...
* fix load path
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix path
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Fix torchao usage
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert useless change
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert fp8 test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix fp8 test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix fp8 test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix torch dtype
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-07 14:50:48 +02:00
Arthur
25b7f27234
Add llama4 ( #37307 )
...
* remove one of the last deps
* update fast image processor after refactor
* styling
* more quality of life improvements
* nit
* update
* cleanups
* some cleanups
* vllm updates
* update fake image token
* [convert] Fix typo
* [convert] Strip extraneous bytes from shards
* [convert] Minor fixes
* [convert] Use num_experts
* multi-image fixes in modeling + processor
* fixup size
* 128 experts
* Use default rope
* Unfuse mlp
* simplify a lot inputs embeds merging
* remove .item() 👀
* fix from review
* Address feedback
* Use None "default" for rope_scaling. Add eot.
* set seed
* return aspect ratios and bug fixes
* Moe 128 rebased (#8 )
* 128 experts
* Use default rope
* Unfuse mlp
* Address feedback
* Use None "default" for rope_scaling. Add eot.
* Meta/llama quant compat (#7 )
* add quant compatible model & conversion code for llama4
* fix a few issues
* fix a few issues
* minor type mapping fix
---------
Co-authored-by: Lu Fang <fanglu@fb.com>
* use a new config parameter to determine which model definition to use for MoE
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Lu Fang <fanglu@fb.com>
* un-comment write_tokenizer from converting script
* remove un-used imports
* [llama4] Pop aspect_ratios from image processor output in Llama4Processor
Signed-off-by: Jon Swenson <jmswen@gmail.com>
* Fix parameter_count name
* Update src/transformers/models/llama4/configuration_llama4.py
* nit
* Add changes for no_rope, moe_layers, chunked attention. Just need to test all
* Update src/transformers/models/llama4/image_processing_llama4_fast.py
* nit
* fix post merge with main
* support flex attention
* fixes
* fix
* add layer
* small updates
* rebase and delete llm_compressor
* nit
* [llama4/mm] Add back <|image|> token that delimits global tile
* [llama4/mm] Fix Llama 4 image processing unit tests
* add explicit dtype
Signed-off-by: Jon Swenson <jmswen@gmail.com>
* sdpa works
* comment todo small
* fix model loading
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
* revert
* nits
* small fix for TP on 1 node
* Read new params from config
* Add <|eom|>
* lol don't know how this got here
* adding fp8
* Save processor, fix chat template
* style
* Add boi/eoi tokens
We don't use them.
* fixes for now flex seems to work :)
* updates
* nits
* updates
* missking keys
* add context parallel
* update
* update
* fix
* nits
* add worldsize and make eager attn work for vision
* Ignore new key present in base models
* add tp_plan
* fix nope
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
* minor fix
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
* Clean up Llama4 vision model
* current updates
* add support for `attn_temperature_tuning`
* add floor scale
* add missing attn scales
* push what works, dirty trick for the device synch
* oups
* Fix pad_token_id
See
https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files
Confirmed in the original codebase.
* fix causallml loading
* rm
* fix tied-weights
* fix sdpa
* push current version
* should work with both short and long
* add compressed_tensos & fix fbgemm tp
* Fix flex impl
* style
* chunking
* try to revert the potentially breaking change
* fix auto factory
* fix shapes in general
* rm processing
* commit cache utils cleanup
* Fix context length
* fix
* allocate
* update tp_plan
* fix SDPA!
* Add support for sparse `Llama4TextMoe` layer from the kernel hub
* cleanup
* better merge
* update
* still broken fixing now
* nits
* revert print
* Write max_position_embeddings and max_model_length
* Update modeling_llama4.py
* Save attention_chunk_size
* Sync eos terminators
* Read initializer_range
* style
* remove `dict`
* fix
* eager should use `chunked_attention_mask`
* revert
* fixup
* fix config
* Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"
This reverts commit ccda19f050
, reversing
changes made to a515579aed
.
* Fix typo and remove warning with compiled flex and chunked prefill
* Fix MoE vs FF (#41 )
* fix
* Use correct no_rope_layers if provided one is empty list
* update tests
* fix
* skipping some tests
* fix fp8 loading
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
* fix text geneartion pipeline
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
* eager needs 4D mask
* fix
* Some cleanup
* fix
* update
* fix
* replace correctly module
* patch
* modulelist
* update
* update
* clean up
* Don't move to `cuda:0` in distributed mode
* restrict to compressed tensors for now
* rm print
* Docs!
* Fixes
* Update docs/source/en/model_doc/llama4.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Fixes
* cuda graph fix
* revert some stuff
* fixup
* styling
* Update src/transformers/models/llama4/modeling_llama4.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* commit licence, cleanup here and there and style
* more styling changes
* fix dummies
* fix and clean docstrings
* remove comment
* remove warning
* Only fast image processor is supported
* nit
* trigger CI
* fix issue with flex encoder
* fix dynamic cache
* Code quality
* Code quality
* fix more tests for now
* Code quality
* Code quality
* Nuke bunch of failing stuff
* Code quality
* Code quality
* cleanup removal of slow image processor
* ruff fix fast image processor
* fix
* fix styling
* Docs
* Repo consistency
* Repo consistency
* fix sliding window issue
* separate llama cache
* styling
* Repo consistency
* Repo consistency
* push waht works
* L4 Repo consistency
* Docs
* fix last last alst alst alst alstsaltlsltlaslt
---------
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Keyun Tong <tongkeyun@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: Jon Swenson <jmswen@gmail.com>
Co-authored-by: jmswen <jmswen@users.noreply.github.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
Co-authored-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: drisspg <drisspguessous@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-05 22:02:22 +02:00
Rahul Tuli
ebe47ce3e9
Fix: Unexpected Keys, Improve run_compressed
, Rename Test Folder ( #37077 )
2025-04-04 21:30:11 +02:00
byi8220
a4e55fcff8
Disable delay_optimizer_creation in Trainer
to support fsdp2 ( #37147 )
...
* github why you do this
* fix
* make fixup
* disable cpu offload test
* fixup
* tmp reworks
* git branch movement
* make fixup
* add require_fsdp_v2_version
* dep issues
* update ruff and fixup
2025-04-04 20:11:37 +02:00
Matt
8ebc435267
Fix llava_onevision tests ( #37280 )
...
* Fix llava_onevision tests
* Trigger tests
2025-04-04 15:03:38 +01:00
Joao Gante
acbcb5d07d
[Tests] flaky test_constrained_beam_search_generate_dict_output
( #37276 )
2025-04-04 13:38:42 +01:00
cyyever
edd345b52e
Fix deprecated PT functions ( #37237 )
...
* Fix deprecated PT functions
Signed-off-by: cyy <cyyever@outlook.com>
* Revert some changes
Signed-off-by: cyy <cyyever@outlook.com>
---------
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-04 12:31:11 +01:00
Nikos Antoniou
f74d7da836
Introduce modular files for speech models ( #35902 )
...
* WAV_2_VEC_2 to WAV2VEC2
* added modular files for hubert, wavlm, wav2vec2_bert, data2vec_audio
* remove unnessary definitions in modulars
* added modular files for UniSpeech, UniSpeechSat, Wav2Vec2Conformer
* docstring fix for UniSpeechForCTC
* removed unneccessary re-definition of modular classes
* reverted lazy imports change on modular_model_converter, type-alias for Wav2Vec2BaseModelOutput
* top-level import of deepspeed in seamless_m4t, speecht5
* avoid tracking imports inside classes, relocate lazy deepspeed, peft imports in their original locations
* convert modular
* tiny modular typing fixes
* some more modular fixes
* make style
---------
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-04-04 11:46:27 +02:00
Raushan Turganbay
41b9b92b52
[qwen-vl] fix image processor ( #37258 )
...
* fix
* add test
2025-04-03 19:48:56 +02:00
Matt
2d46a08b63
Purge unused ModelTester code ( #37085 )
...
* Purge correctly this time
* Remove more methods from recent PRs
* make fixup
2025-04-03 17:48:35 +01:00
Joao Gante
9a1c1fe7ed
[CI] green llama tests ( #37244 )
...
* green llama tests
* use cleanup instead
* better test comment; cleanup upgrade
* better test comment; cleanup upgrade
2025-04-03 14:15:53 +01:00
Matt
782d7d945d
Allow flexible generation params arg when checking pipeline specs ( #37211 )
...
* Allow flexible generation params arg
* Trigger tests
* Add docstring and rename js_generate to hub_generate
2025-04-03 13:29:36 +01:00
Yao Matrix
f697b3f824
enable 2 types of case on XPU ( #37198 )
...
enable 2 types of case on XPU 1. test_resize_tokens_embeddings_with_deepspeed_multi_gpu 2. test_resize_embeddings_untied_with_deepspeed_multi_gpu
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-03 11:37:55 +02:00
Joao Gante
2099287a59
[CI] lazy loading external datasets ( #37218 )
2025-04-03 09:57:45 +01:00
Fanli Lin
a0803a9555
[tests] fix mamba integration simple inference precision issue ( #37193 )
...
* fix precision issue
* use float32
2025-04-03 10:38:03 +02:00
Cyril Vallez
6ce238fe7a
Fix test ( #37213 )
...
* Update test_modeling_common.py
* style
2025-04-03 10:24:34 +02:00
Matt
3d133cc557
Stop DOSing the Hub in the CI ( #37209 )
...
* As the title suggests, stop hammering the same files
* make fixup
* Use shutil instead of pathlib
2025-04-02 17:19:33 +01:00
Joao Gante
e90d55ebcc
[Tests] add min_new_tokens
to prevent flaky length checks ( #37175 )
2025-04-02 15:24:00 +01:00
Matt
cbfa14823b
No more dtype_byte_size() ( #37144 )
...
* No more dtype_byte_size()
* Remove function once again
* Fix rebase cruft
* Trigger tests
2025-04-02 14:58:38 +01:00
Yih-Dar
adfc91cd46
Try to avoid/reduce some remaining CI job failures ( #37202 )
...
* try
* try
* Update tests/pipelines/test_pipelines_video_classification.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-02 14:39:57 +02:00
Xavier Dupré
6f5dc9c82e
Fixes DynamicCache export issues due to control flow and inplace modifications ( #36652 )
...
* Remove unnecessary masked_fill in deberta models
* Enable some code when exporting but not compiling
* add missing import
* style
* replace if by torch.cond
* style
* use numel
* style
* add unit tests
* style
* change empty value for dynamic cache
* replace != [] by numel()
* fix import issue
* style
2025-04-02 12:04:40 +01:00
Jerry Zhang
a165458901
Add device workaround for int4 weight only quantization after API update ( #36980 )
...
* merge
* fix import
* format
* reformat
* reformat
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-02 12:42:22 +02:00
Raushan Turganbay
211e4dc9a4
[chat-template] fix video loading ( #37146 )
...
* fix
* add video
* trigger
* push new iamges
* fix tests
* revert
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-02 11:27:50 +02:00
Yih-Dar
35253076f4
Avoid pipeline test failing related to Hub call ( #37170 )
...
* cls
* cls
* cls
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-01 18:22:45 +02:00
Pavel Iakubovskii
3249c5dc15
Refactor attention for SigLIP based models ( #36981 )
...
* Update Siglip attention implementation
* Update tests for Siglip
* Remove one level of indentation
* Update test to be more specific
* Fixup
* Idefics2
* Idefics3
* Emu3
* SmolVLM
* Phi4 (just init small update)
* Idefics2 (test fix)
* Update siglip2 tests
* Update eager
* trigger
* Clean up
* Transfer inputs to device in test
* Fixing test
* Fixing test
* Revert contiguous
* Remove unused is_flash_attn_2_available
* Move flaky to specific models
2025-04-01 15:37:25 +02:00
Yao Matrix
24e311f42b
fix XPU UT error case brough by RNG difference btw XPU and CUDA ( #37121 )
...
* fix XPU UT error case brough by RNG difference btw XPU and CUDA
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* enable tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits and tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits_bf16 on xpu
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* Revert "enable tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits and tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits_bf16 on xpu"
This reverts commit 3ef83a4f02
.
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-01 13:52:55 +01:00
Tom Aarsen
897ff9af0e
[ModernBERT
] Never save 'reference_compile' config; should be set based on end user ( #36305 )
...
* Never save 'reference_compile' config; should be set based on end user
* Reformat (I ran 'make style' from the wrong env)
* Use pop instead of del
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Use pop instead of del
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-04-01 14:14:39 +02:00
Qizhi Chen
fac70ff3c0
Convert _VALID_DICT_FIELDS
to class attribute for shared dict parsing in subclasses ( #36736 )
...
* make _VALID_DICT_FIELDS as a class attribute
* fix test case about TrainingArguments
2025-04-01 12:29:12 +02:00
Yao Matrix
8f6b27eb5c
enable test_assisted_decoding_in_different_gpu
test on XPU ( #37120 )
...
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-01 11:22:59 +02:00
jiqing-feng
737cbd2109
Fix llava xpu tests. ( #37130 )
...
* fix llava 4bit xpu test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix llava 4bit xpu test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-01 11:10:13 +02:00
jiqing-feng
3a6ab46a0b
add gpt2 test on XPU ( #37028 )
...
* add gpt2 test on XPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* auto dtype has been fixed
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* convert model to train mode
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-01 11:09:29 +02:00
cyyever
786d9c5ed9
Fix more inefficient PT operations ( #37060 )
...
* Fix inefficient operations
* Remove cpu() call
* Reorder detach()
* Reorder detach()
* tolist without detach
* item without detach
* Update src/transformers/models/rag/modeling_rag.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/models/encodec/test_modeling_encodec.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Use detach().cpu().numpy
* Revert some numpy operations
* More fixes
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-31 16:31:24 +01:00
Pavel Iakubovskii
a1e389e637
Refactor return_dict
logic to remove complicated if/else paths ( #36794 )
...
* SAM
* CLIP
* SigLIP
* GOT-OCR2 (depends on SAM)
* SigLIP2 (depends on SigLIP)
* trigger tests
* Fix SAM
* Fix missed indexing, use named attributes
* Llama
* Aria
* Bamba
* Update llama: missed outputs return type
* (fixup) Aria
* DiffLlama
* Emu3
* Gemma
* Gemma2
* Paligemma
* Fix paligemma
* Gemma3
* GLM
* Helium
* JetMoe
* Jamba
* Mistral
* Mistral
* Mixtral
* Nemotron
* Olmo
* Olmo2
* Persimmon
* Phi
* Phi3
* PhiMoe
* Qwen2
* Qwen2_moe
* StableLM
* Starcoder2
* Add return_dict decorator
* SAM
* Update decorator: compile, export, trace - friendly
* Llama (decorator)
* SAM (decorator)
* Add decorator `can_return_tuple`
* Llama
* Update to decorator
* Update CLIP
* Update decorator to store `_is_top_level_module` in self
* Update decorator to correctly handle compile/export
* Remove is_torchdynamo_compiling constraint, all work fine with self attribute assignment
* Typing
* GPT NeoX
* Fixup
* Fix attribute Granite
* Fix return type mixtral
* Update Gemma3
* Fix Cohere amd Cohere2
* Fixup
* Fix corner case for Phi4, when activation is shared
* (fix-copies) deepseekv3, phi4
* Fixup
* Apply to qwen3/qwen3_moe
* Fix
2025-03-31 16:23:37 +01:00
Cyril Vallez
f304318f5f
Remove low_cpu_mem_usage and _fast_init ( #36963 )
...
* Remove low_cpu_mem_usage and _fast_init
* Update deepspeed.py
* Update modeling_utils.py
* remove the first 2 tests everywhere
* Update test_modeling_common.py
* remove what was remaining about fast_init
* fix logic and simplify
* mismatched keys logic update
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* fix 2 models init_weights
* extend to others
* remove grad
* Update modeling_fsmt.py
* init weights in tests
* style
* Update test_modeling_fsmt.py
* more old models
* fix more init_weights
* copies
* fix
* style
* Update modeling_lxmert.py
* fix inits
* more and more
* more
* should finalize
* style
* Update modeling_dinov2_with_registers.py
* fix
* Update modeling_encoder_decoder.py
* fix
* style
* Update modeling_lxmert.py
* post rebase cleanup
* Update modeling_informer.py
* back to start for device
* fix
* add test to detect all failing cases correctly
* Update test_modeling_common.py
* fix
* fix
* sam
* style
* Update modeling_maskformer_swin.py
* CIs
* CIs
* remove test - will add it on separate PR
* fix
* fix
* Update modeling_sam.py
* CIs
* CIs
* CIs
* convnext
* suggestions
* CIs
* fix copies after merge
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-31 17:18:43 +02:00
Raushan Turganbay
8805600406
[qwen3] fix generation tests ( #37142 )
...
* do not skip tests
* fix qwen3-moe as well
* fixup
* fixup
2025-03-31 16:33:41 +02:00
Zhen
e686fed635
[Feature] Support using FlashAttention2 on Ascend NPU ( #36696 )
...
* [Feature] Support using flash-attention on Ascend NPU
* Fix qwen3 and qwen3_moe moduler conversion mismatch
2025-03-31 16:12:58 +02:00
Yih-Dar
a03cee7a1d
skip ( #37141 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-31 15:38:40 +02:00
Guang Yang
3b07ca78bb
Export T5 (encoder-decoder) to ExecuTorch ( #36486 )
...
Co-authored-by: Guang Yang <guangyang@fb.com>
2025-03-31 12:10:26 +02:00
Fanli Lin
475664e2c6
[tests] remove cuda-only test marker in AwqConfigTest
( #37032 )
...
* enable on xpu
* add xpu support
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-31 11:53:02 +02:00
Armaghan Shakir
0710e9b1e8
Create and Expose SamVisionModel as public for better accessibility ( #36493 )
...
* move encoder below
* auto modeling
* write SamVisionTester
* fix vision attention shape
* fix SamVisionTest
* minor changes to SamVisionTest
* Revert "fix vision attention shape"
This reverts commit d2a4083ae5
.
* fix attention output shape in new tests
* remove encoder examples
* run modular on got_ocr2
* code formatting
* fix got_ocr2
* ruff fixes
* code quality
* add sam_vision in auto modeling and auto configuration
* remove composite test
* updated index.md
* add TFSamVisionEncoder to __init__
* fix public TFSamVisionEncoder
* remove outdated todo comment
* set test_torch_exportable
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* rename: VisionEncoder -> VisionModel
* bring back original SamVisionEncoder
* rename back: VisionEncoderOutput -> VisionModelOutput
* undo changes in SamModelTester
* reuse SamVisionEncoder in SamVisionModel
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-03-31 11:45:07 +02:00
cyyever
f99c279d20
Remove deprecated code ( #37059 )
...
* Remove deprecated code
* fix get_loading_attributes
* fix error
* skip test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-31 11:15:35 +02:00
huismiling
d0b65bb479
[MLU] Fix FA2 check error, remove deepspeed-mlu deps. ( #36159 )
...
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
* Cambricon support SDPA and flash_attn
* MLU devices : Checks if `mlu` is available via an `cndev-based` check which won't trigger the drivers and leave mlu
* Fix mlu FA2 check. Remove deepspeed-mlu check. add mlu tests support.
* fix testing errors.
* Merge branch 'hf/main' into main
* fix get_device_count error.
* fix mlu testing utils.
* fix code quality and style.
* switch to @require_torch_multi_accelerator
2025-03-31 11:02:49 +02:00
jiqing-feng
286393fbb1
enable tp on CPU ( #36299 )
...
* enable tp on CPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* get rank from cpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable TP tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix comment
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* em print
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix model id
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix conflict
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix index and add doc
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-03-31 10:55:47 +02:00
Bo Zheng
6acd5aecb3
Adding Qwen3 and Qwen3MoE ( #36878 )
...
* Initial commit for Qwen3
* fix and add tests for qwen3 & qwen3_moe
* rename models for tests.
* fix
* fix
* fix and add docs.
* fix model name in docs.
* simplify modular and fix configuration issues
* Fix the red CI: ruff was updated
* revert ruff, version was wrong
* fix qwen3moe.
* fix
* make sure MOE can load
* fix copies
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-03-31 09:50:49 +02:00
Joao Gante
9fd9476005
[generate] beam search -- fix output cropping ( #37080 )
...
* handle jagged beams
* better comment
* bart -- beam search tests print special tokens
* more bart test updates
* more tests!
* better comment
2025-03-28 18:57:51 +01:00
Minho Ryu
eca74d1367
[WIP] add deepseek-v3 ( #35926 )
...
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoints congifuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for not-changing dim when to use exapnd
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits"
This reverts commit f264f800d0
.
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* mrege with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently requires
* remove synch
* more fixes for TP
* temp fix for TP : some attention layers's FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no gradient iddues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
2025-03-28 15:56:59 +01:00
Yih-Dar
1fcaad6df9
Use lru_cache
for tokenization tests ( #36818 )
...
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 15:09:35 +01:00
cyyever
6cc9c8d7d1
Remove deprecated batch_size parameter ( #37007 )
2025-03-27 15:01:56 +00:00
cyyever
41a0e58e5b
Set weights_only in torch.load ( #36991 )
2025-03-27 14:55:50 +00:00
Joao Gante
29f322d04d
[generate, cache] handle more complex device maps ( #37014 )
2025-03-27 14:33:20 +00:00