Commit Graph

18999 Commits

Author SHA1 Message Date
Manuel de Prada Corral
d34e21e7dd
New cache tests and refactored Hybrid Cache (#37972) 2025-05-20 12:46:13 +02:00
Matthew Hoffman
183fb3637c
Add Llama4TextModel to AutoModel mapping (#38162)
Add Llama4TextModel to AutoModel mapping

Using Llama4TextConfig with AutoModel.from_config raises a ValueError when it is expected to instantiate a Llama4TextModel.
2025-05-20 10:01:00 +00:00
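A minimal sketch of the failure this commit fixes; the config values below are tiny, hypothetical ones chosen only for illustration:

```python
from transformers import AutoModel, Llama4TextConfig

# Before this commit, Llama4TextConfig was missing from the AutoModel
# mapping, so from_config raised a ValueError; it now returns a
# Llama4TextModel. Config values are illustrative, not realistic.
config = Llama4TextConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
)
model = AutoModel.from_config(config)
print(type(model).__name__)  # Llama4TextModel
```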
Titus
f022bf9322
Remove trust_remote_code=True tests from bnb quantization tests (MPT now integrated) (#38206)
bnb quant tests: remove obsolete trust_remote_code test

The MPT model is now natively integrated in Transformers and no longer requires trust_remote_code=True. This removes the failing test_get_keys_to_not_convert_trust_remote_code and related usage, which depended on remote code and caused CI issues due to missing dependencies (e.g., triton_pre_mlir).
2025-05-20 11:43:11 +02:00
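For context, a sketch of the loading pattern the removed test covered, which now works without remote code (model id shown as an example):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# MPT is natively supported in Transformers, so trust_remote_code=True is
# no longer needed, including when loading with bitsandbytes quantization.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```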
Raushan Turganbay
0a52bd2403
[fix] sliding window attention mask (#38045)
* fix sliding attn

* make style

* Update tests/test_modeling_common.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* on second thought, should default to `True` for BC

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-05-20 09:32:19 +00:00
Yong Hoon Shin
555715f418
Fix broken example generation script for Llama3 (#38062)
Fix broken example generation script for llama3
2025-05-20 10:53:43 +02:00
Matej Sirovatka
7a611f0afd
Fix: make docs work better with doc builder (#38213) 2025-05-20 08:23:03 +00:00
Yao Matrix
3bd1c20149
enable misc cases on XPU & use device agnostic APIs for cases in tests (#38192)
* use device agnostic APIs in tests

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* more

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* add reset_peak_memory_stats API

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-20 10:09:01 +02:00
shawn
dbc4b91db4
Qwen2.5-Omni: Update modeling_qwen2_5_omni.py to fix error when loading quantized weights with AutoAWQ. (#38013)
* Update modular_qwen2_5_omni.py

fix the error when loading a quantized model with AutoAWQ.

* Update modeling_qwen2_5_omni.py

sync code to modular_qwen2_5_omni.py
2025-05-20 09:53:51 +02:00
Matej Sirovatka
46a4b7c909
Feat: save_pretrained for tensor parallel (and other parallelisms) models (#37919)
* tmp: initial save pretrained with dtensors

* Feat: add correctness tests

* Refactor: version checks

* Temp: 1:1 checkpoint llama4

* refactor

* Tests

* Feat: works

* Style

* Feat: version checks + minor fixes

* Style

* Fix: version checks in tests

* Feat: move more stuff into tensor_parallel.py
2025-05-19 18:16:21 +00:00
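A sketch of the workflow this enables, run under torchrun (model id is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# tp_plan="auto" shards the weights as DTensors across the participating
# ranks; this commit makes save_pretrained gather the shards back into
# plain tensors before writing the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B", torch_dtype=torch.bfloat16, tp_plan="auto"
)
model.save_pretrained("./llama-tp-checkpoint")
```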
Fanli Lin
9ecee14378
[doc] fix bugs in how_to_hack_models.md (#38198)
fix several bugs
2025-05-19 10:37:54 -07:00
Nanji Huaji
f524439cc5
Translating model_doc/bert.md to Chinese (#37806)
* Translated model_doc/bert.md

* Revise grammatical errors

* Changed _toctree.yml

* Revise some errors
2025-05-19 10:14:57 -07:00
Matej Sirovatka
6e738411e1
Tensor parallel docs (#38178)
* Feat: initial docs

* Feat: update doc

* Final typos/changes

* Refactor: reorder top to bottom.
2025-05-19 17:05:01 +00:00
Joao Gante
9c500015c5
🚨🚨🚨 [pipelines] update defaults in pipelines that can generate (#38129)
* pipeline generation defaults

* add max_new_tokens=20 in test pipelines

* pop all kwargs that are used to parameterize generation config

* add class attr that tell us whether a pipeline calls generate

* tmp commit

* pt text gen pipeline tests passing

* remove failing tf tests

* fix text gen pipeline mixin test corner case

* update text_to_audio pipeline tests

* trigger tests

* a few more tests

* skips

* some more audio tests

* not slow

* broken

* lower severity of generation mode errors

* fix all asr pipeline tests

* nit

* skip

* image to text pipeline tests

* text2text pipeline

* last pipelines

* fix flaky

* PR comments

* handle generate attrs more carefully in models that can't generate

* same as above
2025-05-19 18:02:06 +01:00
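Since defaults changed (hence the 🚨 markers), callers that relied on implicit generation settings may want to pin them explicitly; a sketch:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
# Generation kwargs are forwarded into the pipeline's generation config;
# pinning max_new_tokens avoids depending on the updated defaults.
out = generator("Hello, world", max_new_tokens=20, do_sample=False)
print(out[0]["generated_text"])
```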
Joao Gante
6f9da7649f
[image-text-to-text pipeline] Accept a chat as a positional arg (#38204)
accept chat as a positional arg
2025-05-19 17:26:09 +01:00
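A sketch of the new calling convention, with an illustrative model id and a standard chat message schema:

```python
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llava-hf/llava-interleave-qwen-0.5b-hf")
chat = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
# This commit lets the chat be passed positionally as the first argument,
# instead of only through a keyword.
out = pipe(chat, max_new_tokens=20)
```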
NielsRogge
7c9b0ca08c
[SAM-HQ] Update names in the docs (#38058)
Update names
2025-05-19 09:21:14 -07:00
Daize Dong
04282a9ef5
Remove Deprecated verbose arg in LayerWiseDummyScheduler (#38197)
Remove Deprecated args in LayerWiseDummyScheduler
2025-05-19 13:49:11 +00:00
Shane A
aef12349b6
Make HF implementation match original OLMo 2 models for lower precisions (#38131)
* Make HF implementation match OLMo models for lower precisions

* Add test of 1B logits in bfloat16

* Run make fixup
2025-05-19 15:35:23 +02:00
Fanli Lin
9644acb7cb
[docs] add Audio import (#38195)
add Audio import
2025-05-19 13:16:35 +00:00
Fanli Lin
7d93f93f83
[docs] minor fixes in models.md (#38193)
minor fix
2025-05-19 13:14:21 +00:00
Sergio Paniego Blanco
47f8578d96
Pass eps to Mistral3RMSNorm (#38026)
Pass eps to Mistral3RMSNorm

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-19 15:09:25 +02:00
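For reference, a Llama-style RMSNorm sketch showing where eps enters; the fix threads the config's rms_norm_eps into this term instead of leaving the default:

```python
import torch
from torch import nn

class RMSNorm(nn.Module):
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps  # stabilizes rsqrt when the variance is near zero

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.eps)
        return self.weight * hidden_states.to(input_dtype)
```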
Emmanuel Ferdman
6c6302817d
Resolve Python logger warnings (#38183)
* Resolve Python logger warnings

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* Apply style fixes

---------

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-19 12:53:07 +00:00
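The usual shape of this kind of fix (assumed here, since the diff isn't shown) is replacing the deprecated logger.warn alias:

```python
import logging

logger = logging.getLogger(__name__)

# logging.Logger.warn() has been deprecated since Python 3.3 and emits a
# DeprecationWarning; warning() is the supported spelling.
logger.warning("something worth flagging")
```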
Lysandre Debut
003deb16f1
Support for transformers explicit filename (#38152)
* Support for transformers explicit filename

* Tests

* Rerun tests
2025-05-19 14:33:47 +02:00
Joao Gante
dbb9813dff
[generation] Less verbose warnings by default (#38179)
* tmp commit (imports broken)

* working version; update tests

* remove line break

* shorter msg

* dola checks need num_beams=1; other minor PR comments

* update early trainer failing on bad gen config

* make fixup

* test msg
2025-05-19 10:03:37 +00:00
Daize Dong
656e2eab3f
Add adam_kwargs for Apollo Optimizer (#38168)
Add adam_kwargs for Apollo
2025-05-19 08:59:49 +00:00
Yaswanth Gali
6bb6821d93
Refactor get_XXX_dataloader from Trainer (#38090)
* Remove test_dataloader

* refactor
2025-05-19 10:43:27 +02:00
Joao Gante
40a493c7ed
[tests] remove test_sdpa_equivalence (redundant) (#37911)
* rm test_sdpa_equivalence

* make fixup

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-16 18:37:27 +01:00
kang sheng
ea29f61ed9
fix bug in distributed loss test (#38166)
* fix bug in distributed loss test and change some configs so it passes on both 2 and 8 GPUs

* fix doc
2025-05-16 16:21:35 +00:00
Chachura Baptiste
a4389494c7
Fix import torchao.prototype.low_bit_optim since torchao v0.11 (#38174)
* Fix ModuleNotFoundError for torchao.prototype.low_bit_optim since torchao v0.11.0

* Fix space on blank line

* update torchao's AdamW4bit and AdamW8bit import for v0.11.0

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-16 18:02:33 +02:00
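A version-tolerant import sketch, assuming the optimizers moved to torchao.optim in v0.11 (which is what this fix accounts for):

```python
try:
    # torchao >= 0.11: low-bit optimizers left the prototype namespace
    from torchao.optim import AdamW4bit, AdamW8bit
except ImportError:
    # torchao < 0.11
    from torchao.prototype.low_bit_optim import AdamW4bit, AdamW8bit
```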
Yoni Gozlan
0ba95564b7
Add args support for fast image processors (#37018)
* add args support to fast image processors

* add comment for clarity

* fix-copies

* Handle child class args passed as both args or kwargs in call and preprocess functions

* revert support args passed as kwargs in overwritten preprocess

* fix image processor errors
2025-05-16 12:01:46 -04:00
Peter St. John
d69945e5fc
[ESM] Add flash-attention-2 backend for ESM-2 (#38023)
* Add flash-attention-2 backend for ESM-2

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

* update extended_attention_mask for fa2

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

* add test_flash_attn_2_equivalence test

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
2025-05-16 14:11:56 +01:00
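A sketch of opting into the new backend; FA2 requires a CUDA device and half-precision weights:

```python
import torch
from transformers import AutoModel

# Select the flash-attention-2 backend added by this commit for ESM-2.
model = AutoModel.from_pretrained(
    "facebook/esm2_t33_650M_UR50D",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```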
Matej Sirovatka
7b5e327c6e
Feat: add warnings for unused keys and rules in tensor parallel (#37893)
Feat: tensor parallel plan verification
2025-05-16 14:52:47 +02:00
Yih-Dar
120935234f
remove some commands from fetch_tests CircleCI job (#38176)
delete

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-16 14:42:50 +02:00
Yih-Dar
91f6fa00f4
Disable convert to draft workflow (#38177)
delete

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-16 14:42:14 +02:00
Yih-Dar
5036ec8872
Disable Trigger CircleCI by ready for review (#38171)
delete

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-16 14:02:48 +02:00
Yao Matrix
7f28da2850
clean autoawq cases on xpu (#38163)
* clean autoawq cases on xpu

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-16 13:56:43 +02:00
Raushan Turganbay
01ad9f4b49
Bart: new cache format (#35314)
* bart compile

* add mbart

* some more models touched by fix-copies

* more

* more models

* even more models

* fix copies

* fix tests

* fix copies

* fix

* biogpt accepts position ids now (breaking?)

* fix failing non-slow tests

* fix some tests

* should not be removed

* small update

* Update src/transformers/models/bart/modeling_bart.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* update for last `main`

* fix copies

* clone `update_causal_mask` from llama

* tmp

* fixup

* why? how?

* fix bart tests

* dont skip test

* address comments

* fix tests

* fix

* fixup and delete the file

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-05-16 13:26:54 +02:00
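A sketch of the new cache format for BART-family models, assuming the standard Cache API:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers import DynamicCache, EncoderDecoderCache

tok = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

# Decoder self-attention and cross-attention caches are now Cache objects
# wrapped in an EncoderDecoderCache, instead of tuples of tensors.
cache = EncoderDecoderCache(DynamicCache(), DynamicCache())
inputs = tok("Hello world", return_tensors="pt")
out = model.generate(**inputs, past_key_values=cache, max_new_tokens=10)
```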
Raushan Turganbay
3ab47b6ce3
[VLMs] add helpers to get multimodal encodings (#37743)
* add helpers in VLMs

* fix tests and copies

* fix blip tests

* make fix-copies

* fix copies

* fixup
2025-05-16 13:20:10 +02:00
Codys12
1e921a3a9c
Add optional RMSNorm support to BitNet quantization (config + layers) (#38087)
* enable optional RMS in BitLinear

* Fix naming

* Import RMS from Llama using config.*

* make fix-copies

* ran CI loop

* remove default BitNetQuantConfig values

* Fix BitNetQuantConfig to be Optional

* Fix config docstrings to match Optional

* Edit docstrings to match standards

---------

Co-authored-by: steinmetzc <codysteinmetz7@gmail.com>
Co-authored-by: codys12 <steinmetzc@dh-mgmt4.hpc.msoe.edu>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-05-16 12:38:06 +02:00
BakerBunker
57a79f51b2
Fix Qwen2.5 Omni SinusoidsPositionEmbedding precision (#38151)
* Fix Qwen2.5 Omni `SinusoidsPositionEmbedding` precision

fixes https://github.com/QwenLM/Qwen2.5-Omni/issues/271

* Update modular_qwen2_5_omni.py
2025-05-16 12:24:50 +02:00
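The precision issue, sketched; this mirrors the standard Whisper-style sinusoid construction (names are illustrative). Building the table directly in fp16/bf16 degrades large positions, so compute in float32 and cast at the end:

```python
import math
import torch

def sinusoids(length: int, channels: int, out_dtype=torch.bfloat16) -> torch.Tensor:
    # Compute in float32 for precision, cast only at the end.
    log_timescale_increment = math.log(10000.0) / (channels // 2 - 1)
    inv_timescales = torch.exp(-log_timescale_increment * torch.arange(channels // 2))
    scaled_time = torch.arange(length).float()[:, None] * inv_timescales[None, :]
    return torch.cat([scaled_time.sin(), scaled_time.cos()], dim=1).to(out_dtype)
```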
Jerry Zhang
44fa04ae8d
Include output embedding as well with include_embedding flag (#37935)
* Include output embedding as well with `include_embedding` flag

Summary:
att

Test Plan:
python tests/quantization/torchao_integration/test_torchao.py -k test_include_embedding

Reviewers:

Subscribers:

Tasks:

Tags:

* format

* rename include_embedding to include_input_output_embeddings

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-05-16 12:06:11 +02:00
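A usage sketch, assuming the renamed flag on TorchAoConfig (model id illustrative):

```python
from transformers import AutoModelForCausalLM, TorchAoConfig

# With the flag enabled, the input and output embeddings are quantized
# along with the rest of the weights instead of being skipped.
quant_config = TorchAoConfig("int8_weight_only", include_input_output_embeddings=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B", quantization_config=quant_config
)
```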
Yao Matrix
34c1e29cdd
enable autoround cases on XPU (#38167)
* enable autoround cases on XPU

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-16 09:08:35 +00:00
Pavel Gein
0f77ca72ca
[FIX] Save speed metrics to logs (#38136)
Previously, we calculated the speed metrics but never saved the result to the logs.
2025-05-15 16:58:50 +02:00
Simon Levine
27ef46e846
Omit creation of positional IDs within ESM if applicable (#38089)
* omit pos emb creation

* rft

---------

Co-authored-by: sgottreich <sgottreich@absci.com>
2025-05-15 14:09:21 +00:00
Wing Lian
fe9426f12d
disable deepspeed when setting up fake trainer (#38101)
* disable deepspeed when setting up fake trainer

* Apply style fixes

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-15 15:34:04 +02:00
Yao Matrix
7caa57e85e
enable trainer test cases on xpu (#38138)
* enable trainer test cases on xpu

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-15 12:17:44 +00:00
Aurélien Lac
b11b28cc4e
Hotfix: Flash Attention 2 support in Pixtral (#38146)
setting attention_mask to None when flash_attention_2 is selected

Co-authored-by: aurelien.lac <aurelien.lac@lighton.ai>
2025-05-15 11:45:35 +02:00
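A sketch of the path this unblocks (Pixtral loads through the Llava classes; model id illustrative):

```python
import torch
from transformers import LlavaForConditionalGeneration

# With flash_attention_2 selected, the fix passes attention_mask=None to
# the attention layers, since FA2 handles unpadded sequences itself.
model = LlavaForConditionalGeneration.from_pretrained(
    "mistral-community/pixtral-12b",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
```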
Joao Gante
0e0e5c1044
[generate] Run custom generation code from the Hub (#36405)
* mvp

* remove trust_remote_code

* generate_from_hub

* handle requirements; docs

* english

* doc PR suggestions

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* changed remote code path to generate/generate.py

* model repo has custom generate -> override base generate

* check for proper inheritance

* some doc updates (missing: tag-related docs)

* update docs to model repo

* nit

* nit

* nits

* Update src/transformers/dynamic_module_utils.py

* Apply suggestions from code review

* Update docs/source/en/generation_strategies.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* trust remote code is required

* use new import utils for requirements version parsing

* use org examples

* add tests

* Apply suggestions from code review

Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>

* ascii file structure; tag instructions on readme.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
2025-05-15 10:35:54 +01:00
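A sketch of the new entry point (repo ids illustrative; per the commit, trust_remote_code is required and the custom loop lives in generate/generate.py in the Hub repo):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The quick brown fox", return_tensors="pt")

# custom_generate points at a Hub repo whose generate/generate.py defines
# the decoding loop, overriding the base generate.
out = model.generate(
    **inputs,
    custom_generate="transformers-community/custom_generate_example",
    trust_remote_code=True,
)
print(tok.decode(out[0]))
```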
Raushan Turganbay
955e61b0da
Remove head mask in generative models (#35786)
* just squash into one commit

* delete print
2025-05-15 10:44:19 +02:00
Yao Matrix
0173a99e73
enable csm integration cases on xpu, all passed (#38140)
* enable csm test cases on XPU, all passed

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-15 09:46:29 +02:00
Huang, Guangtai
e5a48785d9
[Qwen3] Qwen3 MoE add tp plan for expert mlps (#38135)
fix tp plan
2025-05-15 09:12:39 +02:00