Commit Graph

19598 Commits

Author SHA1 Message Date
Pavel Iakubovskii
fe1a5b73e6
[modular] speedup check_modular_conversion with multiprocessing (#37456)
* Change topological sort to return level-based output (lists of lists)

* Update main for modular converter

* Update test

* update check_modular_conversion

* Update gitignore

* Fix missing conversion for glm4

* Update

* Fix error msg

* Fixup

* fix docstring

* update docs

* Add comment

* delete qwen3_moe
2025-07-10 19:07:59 +01:00
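The first bullet of the commit above mentions a topological sort that returns level-based output (lists of lists). A minimal sketch of that idea follows, assuming a plain dependency dictionary; the function name `topological_levels` and the file labels are hypothetical and are not taken from the repository. Every node in a level depends only on nodes in earlier levels, so the members of one level could presumably be handed to separate worker processes.

```python
def topological_levels(dependencies):
    """Group nodes into levels so each node depends only on earlier levels.

    `dependencies` maps a node to the nodes it depends on (hypothetical input shape).
    """
    # Collect every node, including ones that only appear as a dependency.
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    # Unresolved dependencies per node.
    remaining = {node: set(dependencies.get(node, ())) for node in nodes}

    levels = []
    while remaining:
        # Nodes with no unresolved dependencies form the next level.
        ready = sorted(node for node, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle detected")
        levels.append(ready)
        for node in ready:
            del remaining[node]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels


# Hypothetical example: "d" depends on "b" and "c", which both depend on "a".
example = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(topological_levels(example))  # [['a'], ['b', 'c'], ['d']]
```

A flat topological order would also be valid here, but it hides which items are independent of each other; the nested lists make that grouping explicit.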
Cyril Vallez
571a8c2131
Add a default value for position_ids in masking_utils (#39310)
* set default

* Update masking_utils.py

* add small test
2025-07-10 18:53:40 +02:00
Kyle Sayers
bdc8028cb3
[Core] [Offloading] Enable saving offloaded models with multiple shared tensor groups (#39263)
* fix counting meta tensors, fix onloading meta tensors

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add test

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-10 18:33:30 +02:00
Joao Gante
df49b399dc
[tests] tag serve tests as slow (#39343)
* maybe they need more cpu resources?

* add todo
2025-07-10 15:40:08 +00:00
Paul Pak
36e80a18da
[modeling][lfm2] LFM2: Remove deprecated seen_tokens (#39342)
* [modeling][lfm2] remove deprecated seen_tokens

* [modular][lfm2] remove deprecated seen_tokens from modular file
2025-07-10 17:27:55 +02:00
Paul Pak
9682d07f92
LFM2 (#39340)
* [modeling][lfm2] LFM2 model on 4.53.0 interface

* [configuration] hook in LFM2 keys

* [modeling][lfm2] update modeling interface for 4.53.1

* [modeling][lfm2] apply mask to hidden conv states

* [misc] ruff format/lint

* [modeling][lfm2] minor: NotImplemented legacy cache conversion

* Create lfm2.md

* create nice modular

* style

* Update modeling_auto.py

* clean and start adding tests

* style

* Update test_modeling_lfm2.py

* Update __init__.py

* small test model size

* config

* small fix

* fix

* remove useless config attrs -> block_dim and conv_dim are hidden_size

* fix prepare inputs

* fix config

* test

* typo

* skip tests accordingly

* config docstrings

* add doc to .md

* skip config docstring check

---------

Co-authored-by: Maxime Labonne <81252890+mlabonne@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-10 16:07:33 +02:00
Joao Gante
38c3931362
[server] add tests and fix passing a custom generation_config (#39230)
* add tests; fix passing a custom generation_config

* tool integration test

* add install step

* add accelerate as dep to serving

* add todo
2025-07-10 13:41:38 +00:00
edwko
6b09c8eab0
Handle DAC conversion when using weight_norm with newer PyTorch versions (#36393)
* Update convert_dac_checkpoint.py

* Update convert_dac_checkpoint.py

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-07-10 10:36:58 +00:00
Yih-Dar
92043bde29
fix phi3 tests (#39312)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-10 11:51:55 +02:00
Kingsley
520b9dcb42
fix Glm4v batch videos forward (#39172)
* changes for video

* update modular

* change get_video_features

* update video token replacement

* update modular

* add test and fix typo

* lint

* fix order

* lint

* fix

* remove dependency

* lint

* lint

* remove todo

* resize video for test

* lint..

* fix test

* create a new processor for video_test

* fix test
2025-07-10 10:44:28 +02:00
Raushan Turganbay
bc161d5d06
Delete deprecated stuff (#38838)
* delete deprecated stuff

* fix copies

* remove unused tests

* fix modernbert and fuyu

* Update src/transformers/cache_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* bye bye `seen_tokens`

* address comments

* update typings

* encoder decoder models follow same pattern as whisper

* fix copies

* why is it set to False?

* fix switch transformers

* fix encoder decoder models shared weight

* fix copies and RAG

* remove `next_cache`

* fix gptj/git

* fix copies

* fix copies

* style...

* another forgotten docstring

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-10 05:18:44 +00:00
Yoni Gozlan
c6ee0b1da8
Fix broken SAM after #39120 (#39289)
fix
2025-07-09 17:46:22 -04:00
jiqing-feng
aff7df8436
enable static cache on TP model (#39164)
* enable static cache on TP model

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check tp size before init kv cache

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix docstring

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add tp tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix other cache head size

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-09 21:14:45 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
2ef59646b8
Fix max_length_q and max_length_k types to flash_attn_varlen_func (#37206)
Also add notes asking users to set `TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1`
or call `torch._dynamo.config.capture_scalar_outputs = True`, as currently
this will cause a graph break.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-09 23:12:39 +02:00
Avihu Dekel
2d600a4363
Granite speech speedups (#39197)
* ensure the query is updated during training

avoid unused parameters that DDP does not like

* avoid a crash when `kwargs` contain `padding=True`

trainers often pass this argument automatically

* minor

* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)

* minor - most feature extractors have a `sampling_rate` property

* speedup relative position embeddings

* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model - a finetuned version should point to the model save dir.
- fixing model weights names, that are changed by adding an adapter.

* minor

* minor

* minor

* fixing a crash without peft active

* add todo to replace einsum

* granite speech speedups:
1. register attention_dist to avoid cpu-to-gpu transfer every layer.
2. pad_sequence is much faster than per-sample-padding + concat.
3. avoid returning audio back to cpu when using a compute device.

* support audio.shape=(1,L)
2025-07-09 23:09:50 +02:00
Tom Aarsen
5111c8ea2f
Fix typo: langauge -> language (#39317)
2025-07-09 12:06:46 -07:00
Priya aka Priyamvadha Balakrishnan
2781ad092d
docs: update LLaVA-NeXT model card (#38894)
* docs: update LLaVA-NeXT model card

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [docs] Updated llava_next model card

* Update docs/source/en/model_doc/llava_next.md remove image sources

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [fix] Change Flash Attention to SDPA badge

* [doc] fixed quantization example

* docs: updated contribution details and badges

* Update llava_next.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-09 11:32:40 -07:00
Yih-Dar
16dd7f48d0
skip files in src/ for doctest (for now) (#39316)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 19:36:48 +02:00
Eman Risha
d61c0d087c
Updated the Model docs - for the MARIAN model (#39138)
* Update marian.md

This update improves the Marian model card to follow the Hugging Face standardized model card format. The changes include:

- Added a clear description of MarianMT, its architecture, and how it differs from other models.
- Provided usage examples for Pipeline and AutoModel.
- Added a quantization example for optimizing model inference.
- Included instructions and examples for multilingual translation with language codes.
- Added an Attention Mask Visualizer example.
- Added a Resources section with relevant links to papers, the Marian framework, language codes, tokenizer guides, and quantization documentation.
- Fixed formatting issues in the code blocks for correct rendering.

This update improves the readability, usability, and consistency of the Marian model documentation for users.

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update marian.md

* Update marian.md

* Update marian.md

* Update marian.md

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update marian.md

* Update marian.md

* Update marian.md

* Update marian.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-09 10:23:03 -07:00
Yih-Dar
161cf3415e
add stevhliu to the list in self-comment-ci.yml (#39315)
add

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 19:07:44 +02:00
Cyril Vallez
3be10c6d19
Fix consistency and a few docstrings warnings (#39314)
* Update modeling_deepseek_v2.py

* fix docstrings

* fix

* fix
2025-07-09 18:40:37 +02:00
MaCAT
4652677c89
🌐 [i18n-KO] Translated quark.md to Korean (#39268)
* initial translation

* removed english parts

* maintain consistency

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* add toctree

* fixed indentation

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2025-07-09 09:29:51 -07:00
Vladislav Bronzov
c980904204
Add DeepSeek V2 Model into Transformers (#36400)
* add initial structure

* doc fixes, add model base logic

* update init files

* some fixes to config and modular

* some improvements for attention

* format

* remove unused attn

* some fixes for moe layer and for decoder

* adapt _compute_yarn_parameters for deepseek

* format

* small fix

* fix for decoder forward

* add tests, small refactoring

* fix dummies

* fix init

* fix doc

* fix config docs

* add sequce doc, fix init for gate

* fix issues in tests

* fix config doc

* remove unused args

* some fixes and refactoring after review

* fix doc for config

* small fixes for config args

* revert config refactoring

* small refactoring

* minor fixes after rebase

* small fix after merge

* fix modular

* remove rotaryembd from public init

* small test fix

* some rotary pos calculation improvement

* fix format

* some improvements and fixes

* fix config

* some refactoring

* adjust some unit tests

* skip test

* small fixes and tests adjustment

* reapply modular

* fix all tests except Integration

* fix integration tests

* cleanup BC stuff

* rope

* fix integrations tests based on a10

* style

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-09 17:04:28 +02:00
Raushan Turganbay
accbd8e0fe
[sliding window] revert and deprecate (#39301)
* bring back and deprecate

* oops

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-09 16:10:38 +02:00
Cyril Vallez
1cefb5d788
[modular] Allow method with the same name in case of @property decorator (#39308)
* fix

* add example

* fix

* Update modular_model_converter.py
2025-07-09 15:46:53 +02:00
Yih-Dar
4798c05c64
skip test_torchscript_* for now until the majority of the community ask for it (#39307)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 15:35:48 +02:00
Yih-Dar
fe5f3c85d2
fix aria tests (#39277)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 13:49:33 +02:00
Raushan Turganbay
0687d481e2
[flash attn 3] bring back flags (#39294)
* flash attn 3 flag

* fix copies
2025-07-09 09:45:01 +02:00
JJJYmmm
25343aafee
Fix SDPA attention precision issue in Qwen2.5-VL (#37363)
* solve conflicts and remove  redundant attention_mask in qwenvit

* update decoded text check

* remove trailing whitespace
2025-07-09 07:03:44 +02:00
Yaswanth Gali
0e1c281745
[Tests] Update model_id in AIMv2 Tests (#39281)
* Update model_id in tests

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 21:46:32 +02:00
Biao Zhang
7ef592c96c
Update T5gemma (#39210)
* bug fix: add vocab_size to t5gemmaconfig for pipeline.

* Update checkpoint placeholder

* minor change

* minor change

* minor change: update example.

* fix: add vocab_size as an explicit arg.

* bug fix:

remove vocab_size verification; instead, re-set encoder/decoder vocab size.

Note, in t5gemma, vocab size of encoder/decoder should always be the same.

* add `add_generation_prompt` for message preprocessing.
2025-07-08 19:08:48 +02:00
Quentin Lhoest
1ecd52e50a
Add torchcodec in docstrings/tests for datasets 4.0 (#39156)
* fix dataset run_object_detection

* bump version

* keep same dataset actually

* torchcodec in docstrings and testing utils

* torchcodec in dockerfiles and requirements

* remove duplicate

* add torchcodec to all the remaining docker files

* fix tests

* support torchcodec in audio classification and ASR

* [commit to revert] build ci-dev images

* [commit to revert] trigger circleci

* [commit to revert] build ci-dev images

* fix

* fix modeling_hubert

* backward compatible run_object_detection

* revert ci trigger commits

* fix mono conversion and support torch tensor as input

* revert map_to_array docs + fix it

* revert mono

* nit in docstring

* style

* fix modular

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 17:06:12 +02:00
StevenBucaille
1255480fd2
[lightglue] add support for remote code DISK keypoint detector (#39253)
* feat: add trust_remote_code in LightGlueConfig

* fix: made sure trust_remote_code is provided only when necessary

* fix: make style

* docs: added missing trust_remote_code docstring

* refactor: refactored LightGlue config init

* fix: removed unnecessary argument
2025-07-08 15:03:04 +00:00
Yih-Dar
838a0268b8
fix flaky test_generate_compile_model_forward (#39276)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 15:36:05 +02:00
Pavel Iakubovskii
29d0030e23
Refactor PretrainedConfig.__init__ method to make it more explicit (#39158)
* cleanup

* fix no `__init__` test

* fix missing inits
2025-07-08 14:24:39 +01:00
Joao Gante
1580f64653
[smollm3] add tokenizer mapping for smollm3 (#39271)
add tok mapping to smollm3
2025-07-08 10:44:01 +00:00
Kashif Rasul
db05e4ff33
[paged-attention] fix off-by-1 error in paged attention generation (#39258)
* fix off-by-1 error in paged attention generation

* formatting

* use update_with_token
2025-07-08 12:34:22 +02:00
Joao Gante
6f1a43896c
[CI] fix docs (#39273)
* fix docs

* add ko glossary file to toctree
2025-07-08 11:31:03 +01:00
Yaswanth Gali
fbdaa7b099
Add Aimv2 model (#36625)
* Model skeleton

* changes

* temp push

* changes

* Added support for aimv2-native

* More changes

* More changes

* Stupid mistake correction

* Added config and refactor

* Added vision model

* update

* Refactor for lit variant

* Added Text Model

* Minor fixes

* nits

* update

* Preliminary tests

* More fixes

* Updated tests 🤗

* Refactor

* Updated testcase

* Updated config

* make fixup

* more fixes

* Bug fix and updates

* deadcode

* Fixes

* nit

* up

* Happy CI 

* Reduce LOC

* nit

* nit

* make style

* return_dict refactor

* bug fix

* fix

* doc update

* nit

* make fixup

* Minor update

* _init_weights modification

* update tests

* Minor fixes post review

* Update w.r.t GradientCheckpointingLayer

* docs update

* update

* nit

* Use more Modular 😉

* Change name from AIMv2 to Aimv2

* Nit

* make style

* Add model doc pointer

* make style

* Update model doc section

* updates

* Modify attn mask and interface

* update test

* Final change

* Utilize flash and flex attn

* keep attn mask

* camelcase model name in test file

* Fix docstring

* Fix config warning finally and create_causal_mask

* disable torchscript

* remove unused arg

* remove from tests

* balance model size for tests

* fix device

* tests

* tests

* flaky test

* fix import

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-08 11:53:21 +02:00
Jingze Shi
d8590b4b0c
Add Doge model (#35891)
* Add Doge Model

* Fix code quality

* Rollback an error commit

* Fix config for open-source weights

* Revert "Fix config for open-source weights"

This reverts commit 229cdcac10.

* Add modular_doge

* Update Doge inherits from Llama

* Fix import bug

* [docs] Add usage of doge model

* Fix Doge import pretrainedconfig from modeling_utils to configuration_utils

* [docs] remove trust remote code from doge

* Fix dynamo bug in doge model

* Update docstrings

* Import apply_rotary_pos_emb and repeat_kv from Llama

* Fix all nits

* Fix code quality

* Fix some bugs

* Fix code quality

* Remove inherited `_update_causal_mask` from Llama
This leads to incorrect weight initialization.

* Fix the wrong tensor orderings in DogeCDMoE

* Fix attention mask bug
We have to provide attention_mask for dynamic mask computation

* Modify most implementations to inherit from Llama
But there are two problems:
1. `flex_attention_forward` is not updated properly
2. `Example` error in the forward method of DogeForCausalLM

* Modify CDMoE for batch efficient implementation

* Uniform MoE configuration names, just like QwenMoE

* Fix code quality

* Fix code quality

* Fix code quality

* Add tp plan of CDMoE Module

* Hybrid DMA with sliding window

* Update valid tokens greater than window size

* Fix code quality

* Add `convert_doge_weights_to_hf`

* Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py

* Fix nits in modular_doge

* Fix code quality

* Fix all nits

* Fix all nits

* Make sure the attention function is updated inside the class

* Fix code quality issues in the Doge model and add a test for it

* Fix `test_generate`

* Fix code quality

* Fix nits following suggestions

* Fix code quality

* Fix code quality issues

* Fix nits

* Fix code quality nits

* Fix the missing parameters in the configuration.

* Fix the missing parameters in the configuration.

* Fix nits

* Add initialization of attention

* Fix last nits

* Simplify dynamic mask generation logic

* Rename router_logits to gate_logits for matching latest changes of MixtralModel

* Rename typings for matching latest changes of MixtralModel

* Fixes typo in comment

* Update src/transformers/models/doge/modular_doge.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fix code quality issues to match other modular

* Fix code quality issues to match other modular

* Fix the static compilation errors

* Update model weights link

* Fix code quality issues to match other modular

* reapply modular and support for new outputs

* style

* simplify a lot

* fix import location

* reapply modular

* fix

* fix integration test

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-08 11:44:29 +02:00
Joonchen Liau
d370bc64c6
Fix errors when using verl to train GLM4.1v model (#39199)
* Fix errors when using verl to train GLM4.1v model

* Support glm4v load from AutoModelForVision2Seq
* Set glm4v model _checkpoint_conversion_mapping attr from None to {}

* Update modeling_auto.py
2025-07-08 09:39:31 +00:00
Arthur
5fb8bb3e1a
fix recompiles due to instance key, and deepcopy issues (#39270)
* fix recompiles due to instance key, and deepcopy issues

* dict
2025-07-08 11:38:11 +02:00
Guang Yang
356fd68109
fix(generation): stop beam search per-instance when heuristic satisfied (#38778)
* fix(decoding): stop beam search per-instance when heuristic satisfied

Previously, when `early_stopping` was set to `False`, the early-stopping heuristic halted generation only when **all** batch instances met the criterion. Instances that the heuristic judged unable to improve therefore kept generating, leading to inconsistent and overlong outputs across the batch.

Now we apply the heuristic **per-instance**: once all beams of a given batch instance can no longer improve, we mark that instance as finished while letting the others continue. This restores the expected behavior and ensures consistency in batched generation.

* Add test case GenerationIntegrationTests.test_beam_search_early_stop_heuristic

* Update naming improvement_possibility -> is_early_stop_heuristic_unsatisfied

* Add comments for early stop heuristic

* Update src/transformers/generation/utils.py

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-08 08:59:37 +00:00
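The commit message above contrasts a batch-wide stopping check with a per-instance one. The following is a simplified sketch of that contrast in plain Python, not the code in `src/transformers/generation/utils.py`. It assumes each instance already has a best achievable score for its running beams and a worst score among its finished beams; the function names, the score lists, and the simple `best > worst` comparison are all illustrative assumptions.

```python
def per_instance_finished(best_running, worst_finished, already_finished):
    """Return one finished flag per batch instance.

    best_running[i]:   best score a still-running beam of instance i could reach (assumed given)
    worst_finished[i]: worst score among the finished beams kept for instance i
    """
    flags = []
    for best, worst, done in zip(best_running, worst_finished, already_finished):
        # Simplified heuristic: the instance can still improve only if a
        # running beam could beat its worst finished beam.
        can_improve = best > worst
        flags.append(done or not can_improve)
    return flags


def batch_wide_stop(best_running, worst_finished):
    """Earlier behavior as described above: stop only when no instance can improve."""
    return all(best <= worst for best, worst in zip(best_running, worst_finished))


best_running = [-1.0, -3.0]
worst_finished = [-2.0, -2.5]
print(per_instance_finished(best_running, worst_finished, [False, False]))  # [False, True]
print(batch_wide_stop(best_running, worst_finished))  # False
```

In this example the second instance is marked finished on its own, whereas the batch-wide check would keep both instances generating until the first one also ran out of room to improve.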
Pablo Montalvo
0b0ede8b2b
remove broken block (#39255)
* remove broken block

* fixup
2025-07-08 10:41:44 +02:00
Yih-Dar
a21557fa3e
Skip test_eager_matches sdpa generate and update an integration test for blip-like models (#39248)
* skip

* skip

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 10:38:25 +02:00
gudwls215
ea3c2c0277
Fix license text, duplicate assignment, and typo in constant names (#39250)
- Complete Apache License text in Italian documentation
- Remove duplicate variable assignment in Perceiver converter
- Fix typo in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES constant
2025-07-08 10:20:52 +02:00
Yao Matrix
b2816da802
fix xpu failures on PT 2.7 and 2.8 w/o IPEX and enable hqq cases on XPU (#39187)
* chameleon xpu bnb groundtruth update on bnb triton backend since we are
deprecating ipex backend

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enable hqq uts on XPU, all passed

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix comment

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-07-08 10:18:26 +02:00
Yuxuan Zhang
17b3c96c00
Glm 4 doc (#39247)
* update the glm4 model readme

* update test

* update GLM-4.1V model

* update as format

* update

* fix some tests

* fix the rest

* fix on a10, not t4

* nit: dummy import

---------

Co-authored-by: raushan <raushan@huggingface.co>
2025-07-08 08:22:04 +02:00
Drew Ross
bbca9782ca
Update LED model card (#39233)
* Update LED model card

* Remove extra arguments

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-07 15:56:57 -07:00
Yih-Dar
41e865bb8d
fix some flaky tests in tests/generation/test_utils.py (#39254)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 19:49:41 +02:00