Commit Graph

19383 Commits

Author SHA1 Message Date
Marc Sun
79d6f9fd70
Log the correct learning rate (#36973)
* fix learning rate log

* fix lr log

* add lr
2025-03-26 16:52:00 +01:00
Mohamed Mekkouri
13d36e89fe
Fix device_map check for ggml files (#37003)
fix
2025-03-26 16:24:57 +01:00
Josh Marshall
021006e1b0
Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support. (#36975)
* Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support.

Related to https://github.com/bitsandbytes-foundation/bitsandbytes/issues/1573 and https://github.com/huggingface/transformers/issues/36949 , this resolves a bug in allowing ROCm/HIP support in bitsandbytes.

* Related to bitsandbytes-foundation/bitsandbytes#1573 and huggingface#36949 , this resolves a bug in the biteandbytes integration, allowing ROCm/HIP support in bitsandbytes.

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-26 16:18:08 +01:00
Cyril Vallez
788e1092e9
Allow easy registration of custom attention functions (#36889)
* Update modeling_utils.py

* style

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* add to init

* Update modeling_utils.py

* style

* update

* Update modeling_utils.py

* Update modeling_utils.py

* style

* Add some doc

* Update _toctree.yml

* readd it for tgi/vllm compat

* CIs

* CIs
2025-03-26 16:15:06 +01:00
ivarflakstad
ad5d40de9c
Fix get_device_properties (#36997)
Fix remove remnant self from get_device_properties

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-26 15:46:34 +01:00
cyyever
8084b26294
Fix Optional type annotation (#36841)
* Fix annotation

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-26 13:53:44 +00:00
Yih-Dar
b56d8f07e4
Install networkx==3.2.1 manually in some CircleCI jobs after #36957 (#37000)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-26 14:49:09 +01:00
cyyever
78afa1c537
Use torch.expm1 (#36995) 2025-03-26 13:06:33 +00:00
Yih-Dar
181d453069
byebye CircleCI TF jobs (#36998)
* byebye tf jobs

* byebye tf jobs

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-26 12:49:50 +01:00
cyyever
e7139d06f5
Fix tensor dtype mismatch (#36985)
* Fix tensor dtype mismatch

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-26 10:37:46 +01:00
Yoni Gozlan
be37d34f44
🚨Deprecate legacy argument for image-text-to-text models and adopt new behavior by default (#36307)
* deprecate legacy argument and adopt new behavior by default

* revert back modification git
2025-03-25 17:32:17 -04:00
Yih-Dar
ab4656f6b7
update bot comment again (#36974)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 19:42:09 +01:00
cyyever
ba531278ca
Add ruff target-version (#36971) 2025-03-25 19:41:25 +01:00
Steven Liu
a844297088
[docs] Fix image link (#36869)
* fix image link

* fix

* update

* fix
2025-03-25 11:34:21 -07:00
cyyever
d68a91aebf
Remove extra tensor clone in PyTorch code (#36748)
* Use detach().clone()

* Eliminate continuous()

* Merge clone and other calls with to

* Merge clone and other calls with to
2025-03-25 17:42:15 +00:00
Yih-Dar
121830ab47
update examples after ruff being updated (#36972)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 18:15:47 +01:00
Sai-Suraj-27
a41677a68b
Updated docker files to use uv for installing packages (#36957)
* Updated docker files to use uv pip install as uv is blazingly fast.

* Removed -y flag for uv pip uninstall.

* Passed --no-build-isolation flag

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 18:12:51 +01:00
NargiT
3dce98a437
typo fixed in README_fr.md (#36951) 2025-03-25 09:29:36 -07:00
湛露先生
ebd2029483
Change GPUS to GPUs (#36945)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 17:25:39 +01:00
Yih-Dar
69632aadb7
Update after #36962 (#36965)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:16:06 +01:00
Yih-Dar
c6814b4ee8
Update ruff to 0.11.2 (#36962)
* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:00:11 +01:00
Joao Gante
bc1c90a755
[Utils] torch version checks optionally accept dev versions (#36847) 2025-03-25 10:58:58 +00:00
Marc Sun
80b4c5dcc9
Fix cuda index issue in cache allocator (#36937)
fix
2025-03-25 11:51:41 +01:00
Raushan Turganbay
0f733110a6
Support return_tensors in audio chat templates (#34601)
* add audio chat templates

* update

* update

* nit

* green ci

* we dont care about the order anymore

* clean up after rebase

* overriden tests rename

* rename shieldgemma also

* one more rename

* require_read_token

* removde images/videos

* retrigger CI flaky
2025-03-25 11:08:47 +01:00
Afanti
19085c28da
fix typos in the tests directory (#36932)
* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: format codes
2025-03-25 10:49:24 +01:00
Guang Yang
69bcb86c58
Export for Phi4-mini (#36780)
* Export for Phi4-mini

* Update tests/models/phi3/test_modeling_phi3.py

---------

Co-authored-by: Guang Yang <guangyang@fb.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 10:46:38 +01:00
Mohamed Mekkouri
be2c0e7bff
Fixing _pre_quantization_dtype when torch_dtype is None (#36930)
fix
2025-03-25 10:43:27 +01:00
Cyril Vallez
4303d88c09
Add Phi4 multimodal (#36939)
* raw start

* update

* update

* add to imports

* update

* up

* simplify configs

* clean configs

* style

* typos

* Update convert_phi4_multimodal_weights_to_hf.py

* Update convert_phi4_multimodal_weights_to_hf.py

* fix

* up

* up

* up

* Update convert_phi4_multimodal_weights_to_hf.py

* Update convert_phi4_multimodal_weights_to_hf.py

* up

* up

* up

* Update feature_extraction_phi4_multimodal.py

* up

* up

* up

* up

* up

* simplify configs

* typo

* cut code

* typo

* typo

* typo

* re

* typo

* up

* up

* up

* add tests

* fix

* fix

* Update test_modeling_phi4_multimodal.py

* up

* Update test_modeling_phi4_multimodal.py

* doc

* fix

* up

* up

* up

* up

* up

* up

* simplify

* up

* simplify

* config docstrings

* cleanup

* clean

* typo

* typo

* fix

* Update phi4_multimodal.md

* fix

* fix

* Update test_modeling_phi4_multimodal.py

* update

* simplify reshapes and permutes

* up

* simplify special tokens

* simplify processor a lot

* Update processing_phi4_multimodal.py

* Update processing_phi4_multimodal.py

* switch to fast processor

* image processor

* Update image_processing_phi4_multimodal_fast.py

* add lora extraction to converter

* Update convert_phi4_multimodal_weights_to_hf.py

* Update __init__.py

* add AudioInput type in audio_utils

* rewrite feature_extraction: support torch batched FFT

* input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values

* test update

* not mono channel warning update

* remove auto maps from processor

* kargs dispatch in processor

* simplify kwargs dispatch

* simplify merging

* remove default sampling rate

* style

* Update test_modeling_phi4_multimodal.py

* update doc

* doc

* torch only feature extractor

* make fake tokens adjustable

* Update feature_extraction_phi4_multimodal.py

* fix

* Update processing_phi4_multimodal.py

* simplify mask

* last touch

* fix copies

* style

* Update audio_utils.py

* style

* Update feature_extraction_phi4_multimodal.py

* Update __init__.py

* docstrings

* copies

* fix all checks

* back to fix-copies

* trigger CIs

* Update feature_extraction_phi4_multimodal.py

* improve tests with multimodal inputs

* trigger CIs

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-03-25 09:55:21 +01:00
Raushan Turganbay
47e5432805
Deprecate #36741 and map Causal to Conditional (#36917)
* deprecate the prev fix

* reword warning and update docs

* reword warning

* tests

* dont bloat `get_text_config()`
2025-03-25 09:13:56 +01:00
Mohamed Mekkouri
2b8a15cc3f
Disallow Offload to disk for gguf files (#36933)
update

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-24 19:30:01 +01:00
Yoni Gozlan
91455c1825
Fix processor kwargs qwen2 vl (#36890)
* Fix qwen2_vl and qwen2_5_vl processors cutom images kwargs

* change version warning
2025-03-24 13:19:26 -04:00
gautham
48385aa4f4
Added support for seed in DataCollatorForWholeWordMask (#36903)
* Added support for seed in `DataCollatorForWholeWordMask`, and also wrote tests.

Also fixed bugs where the code hardcoded values for mask replacement probability and random replacement probability, instead of using the values passed by the user.

* formatting issues

* Used better way to generate seed in TF. Made tests more consistent.
2025-03-24 16:57:17 +00:00
Yih-Dar
5932606d8e
More precise comment (#36935)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-24 17:03:09 +01:00
Pavel Iakubovskii
2be2984462
Fix pytorch defomr attn path (#36923)
* Fix pytorch path for DeformableAttention

* Apply for GroundingDino
2025-03-24 15:58:51 +00:00
cyyever
00d077267a
[2/N] Use pyupgrade --py39-plus to improve code (#36857)
Use pyupgrade --py39-plus to improve code
2025-03-24 15:42:25 +00:00
Ethan Knights
a6ecb54159
Update trainer_pt_utils.py docstrings for consistency (#36912)
* Update trainer_pt_utils.py

* update docstrings trainer_pt_utils.py for consistency

* Update src/transformers/trainer_pt_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-03-24 14:46:41 +00:00
omahs
cbf924b76c
Fix typos (#36910)
* fix typos

* fix typos

* fix typos

* fix typos
2025-03-24 14:08:29 +00:00
Yih-Dar
340500b1a9
Use another repo. for Mistral3 processor testing (#36925)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-24 14:36:05 +01:00
Mohamed Mekkouri
9e125d9a2e
Fix Compressed tensors to_dict_diff (#36922)
fix
2025-03-24 13:06:33 +01:00
Raushan Turganbay
57f551c78d
[chameleon] fix num image token check (#36918)
* [chameleon] fix num image token check

* embed after merging image token

* skip this also

* mistral require_read_token
2025-03-24 12:36:08 +01:00
Dmitry Rogozhkin
a41e08aa19
tests: fix asyncio.wait() usage for python>=3.11 (#36898)
tests: fix asyncio.wait() usage for python>=3.7

Passing coroutings directly to `asyncio.wait()` is deprecated since
python 3.8 and removed starting from python 3.11. Instead, it's required
to explicitly wrap coroutine in the task with `asyncio.create_task()` which
first appeared in python 3.7.

We step into this issue running the following Transformers tests on a
system with python 3.11 or later (for example, Ubuntu 24.04 has python 3.12):

* `tests/trainer/test_trainer_distributed.py`
* `tests/extended/test_trainer_ext.py`

The error will be:
```
src/transformers/testing_utils.py:2380: in execute_subprocess_async
    result = loop.run_until_complete(
/usr/lib/python3.12/asyncio/base_events.py:687: in run_until_complete
    return future.result()
src/transformers/testing_utils.py:2368: in _stream_subprocess
    await asyncio.wait(
...
E           TypeError: Passing coroutines is forbidden, use tasks explicitly.

```

See: https://docs.python.org/3.10/library/asyncio-task.html#asyncio.wait
See: https://docs.python.org/3.10/library/asyncio-task.html#asyncio.wait
See: https://docs.python.org/3.7/library/asyncio-task.html#asyncio.create_task

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-24 11:53:59 +01:00
XinyuanTong
e28be7a692
[Fix] Add original_max_position_embeddings to YARN rope_scaling optional keys (#36877)
[fix] Update optional keys in _validate_yarn_parameters to include original_max_position_embeddings
2025-03-24 11:05:19 +01:00
Raushan Turganbay
48da44be24
Fix torch version guard at import (#36907)
fix
2025-03-24 10:33:33 +01:00
AbdelKarim ELJANDOUBI
fe4ca2f4a7
fix Gemma3 Config (#36893)
* fix Gemma3 Config

* fix config in modular gemm3
2025-03-24 10:05:44 +01:00
Aritra Roy Gosthipaty
c9d1e5238a
Update installation.md (#36826)
* Update installation.md

* Update README.md
2025-03-21 16:32:02 -07:00
Steven Liu
d253de6d58
[docs] Model docs (#36469)
* initial

* fix

* fix

* update

* fix

* fixes

* quantization

* attention mask visualizer

* multimodal

* small changes

* fix code samples
2025-03-21 15:35:22 -07:00
Yoni Gozlan
beb9b5b022
Fix Pan and Scan on batched images Gemma3 (#36864)
* process flattened images in fast image proc

* process flattened images in low proc and add tests

* remove print

* add unbalanced batch test pas image proc

* fix integration tests
2025-03-21 13:56:00 -04:00
Cyril Vallez
dd3933dd65
Simplify keep_in_fp32_modules logic (#36722)
* better regex everywhere

* fix

* Update test_modeling_instructblip.py

* BC with explanations this time otherwise it makes no sense at all

* Update test_modeling_instructblip.py

* style

* CIs

* update _keep_in_fp32_modules in blip2

* Update modeling_utils.py

* Update modeling_utils.py

* style

* CIs

* add check

* trigger CIs

* Update modeling_utils.py

* trigger CIs
2025-03-21 16:12:59 +01:00
Sukriti Sharma
90e2df5d55
fix: loss computation after embeddings resize - mllama (#36840)
* move loss to generation class

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* code cleanup

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* test for resize and loss computation

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix tests

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix:test for resize and loss

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix resize embedding mllama test

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* review changes

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

---------

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
2025-03-21 14:47:59 +01:00
Arthur Zucker
4542b8fb27 push v4.51.0.dev0 2025-03-21 13:45:25 +01:00