Commit Graph

19165 Commits

Author SHA1 Message Date
Yao Matrix
7f28da2850
clean autoawq cases on xpu (#38163)
* clean autoawq cases on xpu

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-16 13:56:43 +02:00
Raushan Turganbay
01ad9f4b49
Bart: new cache format (#35314)
* bart compile

* add mbart

* some more models touched by fix-copies

* more

* more models

* even more models

* fix copies

* fix tests

* fix copies

* fix

* biogpt accepts position ids now (breaking?)

* fix failing non-slow tests

* fix some tests

* should not be removed

* small update

* Update src/transformers/models/bart/modeling_bart.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* update for last `main`

* fix copies

* clone `update_causal_mask` from llama

* tmp

* fixup

* why? how?

* fix bart tests

* dont skip test

* address comments

* fix tests

* fix

* fixup and delete the file

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-05-16 13:26:54 +02:00
Raushan Turganbay
3ab47b6ce3
[VLMs] add helpers to get multimodal encodings (#37743)
* add helpers in VLMs

* fix tests and copies

* fix blip tests

* make fix-copies

* fix copies

* fixup
2025-05-16 13:20:10 +02:00
Codys12
1e921a3a9c
Add optional RMSNorm support to BitNet quantization (config + layers) (#38087)
* enable optional RMS in BitLinear

* Fix naming

* Import RMS from Llama using config.*

* make fix-copies

* ran CI loop

* remove default BitNetQuantConfig values

* Fix BitNetQuantConfig to be Optional

* Fix config docstrings to match Optional

* Edit docstrings to match standards

---------

Co-authored-by: steinmetzc <codysteinmetz7@gmail.com>
Co-authored-by: codys12 <steinmetzc@dh-mgmt4.hpc.msoe.edu>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-05-16 12:38:06 +02:00
BakerBunker
57a79f51b2
Fix Qwen2.5 Omni SinusoidsPositionEmbedding precision (#38151)
* Fix Qwen2.5 Omni `SinusoidsPositionEmbedding` precision

fixes https://github.com/QwenLM/Qwen2.5-Omni/issues/271

* Update modular_qwen2_5_omni.py
2025-05-16 12:24:50 +02:00
Jerry Zhang
44fa04ae8d
Include output embedding as well with include_embedding flag (#37935)
* Include output embedding as well with `include_embedding` flag

Summary:
att

Test Plan:
python tests/quantization/torchao_integration/test_torchao.py -k test_include_embedding

Reviewers:

Subscribers:

Tasks:

Tags:

* format

* rename include_embedding to include_input_output_embeddings

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-05-16 12:06:11 +02:00
Yao Matrix
34c1e29cdd
enable autoround cases on XPU (#38167)
* enable autoround cases on XPU

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-16 09:08:35 +00:00
Pavel Gein
0f77ca72ca
[FIX] Save speed metrics to logs (#38136)
Previously, we calculated speed metrics and did not do anything with the result.
2025-05-15 16:58:50 +02:00
Simon Levine
27ef46e846
Omit creation of positional IDs within ESM if applicable (#38089)
* omit pos emb creation

* rft

---------

Co-authored-by: sgottreich <sgottreich@absci.com>
2025-05-15 14:09:21 +00:00
Wing Lian
fe9426f12d
disable deepspeed when setting up fake trainer (#38101)
* disable deepspeed when setting up fake trainer

* Apply style fixes

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-15 15:34:04 +02:00
Yao Matrix
7caa57e85e
enable trainer test cases on xpu (#38138)
* enable trainer test cases on xpu

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-15 12:17:44 +00:00
Aurélien Lac
b11b28cc4e
Hotfix: Flash Attention 2 support in Pixtral (#38146)
setting attention_mask to None when flash_attention_2 is selected

Co-authored-by: aurelien.lac <aurelien.lac@lighton.ai>
2025-05-15 11:45:35 +02:00
Joao Gante
0e0e5c1044
[generate] Run custom generation code from the Hub (#36405)
* mvp

* remove trust_remote_code

* generate_from_hub

* handle requirements; docs

* english

* doc PR suggestions

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* changed remote code path to generate/generate.py

* model repo has custom generate -> override base generate

* check for proper inheritance

* some doc updates (missing: tag-related docs)

* update docs to model repo

* nit

* nit

* nits

* Update src/transformers/dynamic_module_utils.py

* Apply suggestions from code review

* Update docs/source/en/generation_strategies.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* trust remote code is required

* use new import utils for requirements version parsing

* use org examples

* add tests

* Apply suggestions from code review

Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>

* ascii file structure; tag instructions on readme.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
2025-05-15 10:35:54 +01:00
Raushan Turganbay
955e61b0da
Remove head mask in generative models (#35786)
* just squash into one commit

* delete print
2025-05-15 10:44:19 +02:00
Yao Matrix
0173a99e73
enable csm integration cases on xpu, all passed (#38140)
* enable csm test cases on XPU, all passed

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-15 09:46:29 +02:00
Huang, Guangtai
e5a48785d9
[Qwen3] Qwen3 MoE add tp plan for expert mlps (#38135)
fix tp plan
2025-05-15 09:12:39 +02:00
Olivier Schipper
4005e30c80
Fix incorrect attention mask truncate in WhisperFlashAttention2 (#36477)
* Fix incorrect attention mask truncate in whisper flash attention

* also fix incorrect attention mask truncate in qwen2 audio

* Nit attention mask truncate modeling_qwen2_audio.py

* Nit attention mask truncate modeling_whisper.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-05-14 20:08:31 +00:00
Sangbum Daniel Choi
aa27fa75cd
enable d_fine finetuning properly (#37962)
add pre_output in the front

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-05-14 16:53:04 +01:00
Manuel de Prada Corral
e021bf6bf8
Add manueldeprada to run_slow whitelist (#38126)
Add manueldeprada to run_slow allowed users
2025-05-14 15:16:58 +02:00
Arjuna Sky Kok
ef27b2bc22
[docs] add uv installation instructions for source builds (#37968)
2025-05-14 13:09:41 +00:00
guspuffygit
4a2decd192
Update trainer.md (#38113)
Fix typo in torch.compile method parameters
2025-05-14 12:40:00 +00:00
Kirire
935bbbc711
Add config validation and style tweaks (#37589)
* Add config validation and style tweaks

* Fix style issues

* Fix style issues

* style

* Small fixes for copy/paste errors

---------

Co-authored-by: Cyrile <cyrile.delestre@arkea.com>
2025-05-14 12:22:10 +00:00
ivarflakstad
1b00966395
Fix auto batch size finder test (#38125)
Ensure --auto_find_batch_size is the last test arg so indexing is correct
2025-05-14 12:12:04 +00:00
Ritwick Chaudhry
fe918d13b9
Fix temporal padding in Qwen2VLImageProcessor when the number of frames is not divisible by temporal_patch_size (#38076)
Qwen2VL: Fix temporal padding in Qwen2VLImageProcessor when frames are not divisible by temporal_patch_size
2025-05-14 12:28:21 +02:00
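
A minimal sketch of the case fe918d13b9 addresses, where the frame count is not divisible by temporal_patch_size. It assumes the padding repeats the last frame, which the commit message does not state; the helper name pad_frames_to_multiple is hypothetical and is not part of Qwen2VLImageProcessor.

```python
from typing import List, TypeVar

Frame = TypeVar("Frame")


def pad_frames_to_multiple(frames: List[Frame], temporal_patch_size: int) -> List[Frame]:
    """Hypothetical helper: repeat the last frame until the number of
    frames is a multiple of temporal_patch_size (assumed strategy)."""
    if temporal_patch_size <= 0:
        raise ValueError("temporal_patch_size must be positive")
    if not frames:
        return []
    remainder = len(frames) % temporal_patch_size
    if remainder == 0:
        # Already divisible: return an unmodified copy.
        return list(frames)
    missing = temporal_patch_size - remainder
    return list(frames) + [frames[-1]] * missing
```

With three frames and temporal_patch_size=2, one copy of the last frame is appended so the frames group into two full temporal patches.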
Raushan Turganbay
aaf224d570
[video processor] fix tests (#38104)
* fix tests

* delete

* fix one more test

* fix qwen + some tests are failing irrespective of `VideoProcessor`

* delete file
2025-05-14 10:24:07 +00:00
Yao Matrix
9b5ce556aa
enable finegrained_fp8 and granite_speech cases on XPU (#38036)
* enable finegrained_fp8 cases on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* change back to auto

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* rename per comments

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-05-14 08:58:40 +00:00
bilibili12433014
b311a3f506
Fix description and formatting errors in code docs (#38074)
* Update stopping_criteria.py

Fix description and formatting errors.

* Update stopping_criteria.py

Align formatting with existing files for consistency.
2025-05-13 17:17:15 +00:00
Marc Sun
b499a14b17
Add style bot (#38102)
add style bot
2025-05-13 19:07:17 +02:00
eustlb
e0f225cb10
[CSM] update test for t4 runners (#38110)
update test for t4 runners
2025-05-13 11:59:26 -04:00
Jinyong Lee
342961f669
Add Fast Image Processor for vilt (#37304)
* init vilt image processor fast

* Refactor image processor tests to use loop for all processors

* Add ViltImageProcessorFast with PyTorch-based optimized image processing

* Change made automatically by make fixup command

* Change made automatically by make fix-copies command

* Fix type hints in ViltImageProcessorFast for Python compatibility

* Define constants for image resizing based on COCO dataset aspect ratio

* Add missing property initializations to ViltImageProcessorFast

* Extract resize logic into dedicated method in ViltImageProcessorFast

* Extract padding logic into dedicated method

* Implement shape-based image grouping for optimized processing in Vilt

* Update test suite to verify ViltImageProcessorFast attributes

* Move variable declarations to _preprocess method parameters

* Remove unused parameters

* Rename _resize method to resize to override existing function

* Remove whitespace

* Remove unnecessary type check and conversion for stacked_images

* Remove redundant loop and apply padding directly to stacked images

* Refactor pad function to return images and mask as tuple instead of dict

* Add tests comparing padding masks in slow and fast implementations

* Update ViltImageProcessor tests to ensure compatibility between slow and fast implementations

* Replace add_start_docstrings with auto_docstring in ViltImageProcessorFast

* Move docstrings of custom args to ViltFastImageProcessorKwargs

* Use reorder_images function for both masks and images

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-05-13 15:40:53 +00:00
Yoni Gozlan
8771766a70
Fix InternVL interpolate_pos_encoding and add to video_processing_auto (#38092)
* fix InternVL interpolate_pos_encoding

* fix modular and auto_video_processor for internvl
2025-05-13 11:18:40 -04:00
Yih-Dar
582d5e0e11
fix check_bad_commit.py gives wrong results (#38107)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-13 16:58:22 +02:00
youngrok cha
a5cc7a67d7
[bug] fix llava processor to calculate unpadding size correctly (#37988)
* fix llava processor to calculate unpad size correctly

* repo consistency

* Revert "repo consistency" & "setUp in llava family"

This reverts commit 26a50af8db.

* add edge case test for padding & unpadding

* compute unpadding size from original size

* make test config explicit

* Revert "compute unpadding size from original size"

This reverts commit 752cd27ad9.

* Revert "add edge case test for padding & unpadding"

This reverts commit ccbd094d69.

* revert unpad logic

* remove irrelevant tests

* model test

* remove processor from model test

---------

Co-authored-by: jaycha <jaycha@ncsoft.com>
2025-05-13 13:49:09 +00:00
Chris
67b3d45eb6
Fix past_key_values type hint in model output types (#37953)
* F: Fix type hint.

* F: Use Cache type.

* F: Sort import.

* U: Format.

* U: Address reviews.
2025-05-13 13:36:49 +00:00
Eva Koroleva
07feaad8fb
Fix bug in prefill_chunk_size that ignores disable_compile flag (#38067)
Fix bug in prefill_chunk_size implementation that ignores disable_compile flag
2025-05-13 13:23:23 +00:00
Raushan Turganbay
e40f301f1f
[smolvlm] skip the test (#38099)
skip the test
2025-05-13 12:50:43 +00:00
ivarflakstad
e27d230ddd
Disable report callbacks for certain training tests (#38088)
* Disable report callbacks for certain training tests

* Disable report callbacks for test_auto_batch_size_finder
2025-05-13 14:49:55 +02:00
Bongseok Lee
ab65ba47ad
fix: Propagate lr_scheduler_kwargs options to create LR Scheduler when LayerWiseDummyOptimizer is used (#34559)
fix: fix get_scheduler
2025-05-13 13:56:45 +02:00
Fanli Lin
8fb60bf6be
add timeout for downloading the librispeech_asr dataset (#38073)
* add timeout

* change 10 to 60
2025-05-13 11:50:12 +01:00
Yih-Dar
3ad35d0bca
update require_read_token (#38093)
* update require_read_token

* new repo

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-13 12:07:07 +02:00
Yoni Gozlan
e3b70b0d1c
Refactor image processor phi4 (#36976)
* refactor image processor phi4

* nits fast image proc

* add image tests phi4

* Fix image processing tests

* update integration tests

* remove revision and add comment in integration tests
2025-05-12 15:13:40 -04:00
Yih-Dar
4143f94d51
uninstall kernels from docker images (#38083)
uninstall kernels

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-12 18:03:47 +02:00
Shiyu
a63cb7578e
update seed_worker to set seed based on worker_id and rank (#37980)
* update seed_worker to set seed based on worker_id and rank

* test case

* set output_dir as remove tmp dir
2025-05-12 15:59:16 +00:00
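
A rough sketch of the idea behind a63cb7578e: each data-loader worker seed depends on both worker_id and the process rank. The formula and the name derive_worker_seed are assumptions for illustration; the commit message does not give the exact computation used by seed_worker.

```python
def derive_worker_seed(base_seed: int, worker_id: int, num_workers: int, rank: int) -> int:
    """Hypothetical formula: give every (rank, worker_id) pair its own seed
    by laying the workers of each rank out in consecutive blocks."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    if rank < 0:
        raise ValueError("rank must be non-negative")
    # Keep the result inside the 32-bit range many RNG seeders expect.
    return (base_seed + rank * num_workers + worker_id) % (2**32)
```

Seeding from worker_id alone would give worker 0 on every rank the same seed, so random augmentations could repeat across processes; mixing in the rank avoids that.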
efsotr
e387821a96
Fix total updates in trainer (#37923)
* fix total updates in epoch

* add test; fix max_steps

* replace with multi-gpu decorator
2025-05-12 17:45:24 +02:00
Weipeng Jiang
f0e975c6cf
fix the inconsistent docstring in apply_chat_template (#38069)
The commit (5cf11e5ab9) fixed the type hints for the parameter `tools` in apply_chat_template, but the docstring was not changed.
2025-05-12 16:32:01 +01:00
Junlin Zhou
31791b16a1
chore(qwen2): display warning log only when sliding window attention … (#36316)
* chore(qwen2): display warning log only when sliding window attention is enabled

* Align modeling_qwen2.py and modular_qwen2.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-05-12 16:31:44 +01:00
ivarflakstad
8ea72d12a2
Fix mt5 test on AMD devices (#38081)
2025-05-12 16:59:00 +02:00
谭九鼎
5c85018072
docs: fix md style (#38057)
2025-05-12 15:56:31 +01:00
ivarflakstad
7eaa90b87b
Add AMD expectation to test_gpt2_sample (#38079)
2025-05-12 16:51:21 +02:00
Pavel Iakubovskii
4220039b29
Fix OneFormer integration test (#38016)
* Fix integration tests

* format
2025-05-12 16:02:41 +02:00