Marc Sun
79d6f9fd70
Log the correct learning rate ( #36973 )
* fix learning rate log
* fix lr log
* add lr
2025-03-26 16:52:00 +01:00
Mohamed Mekkouri
13d36e89fe
Fix device_map check for ggml files ( #37003 )
fix
2025-03-26 16:24:57 +01:00
Josh Marshall
021006e1b0
Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support. ( #36975 )
* Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support.
Related to https://github.com/bitsandbytes-foundation/bitsandbytes/issues/1573 and https://github.com/huggingface/transformers/issues/36949 , this resolves a bug in allowing ROCm/HIP support in bitsandbytes.
* Related to bitsandbytes-foundation/bitsandbytes#1573 and huggingface#36949 , this resolves a bug in the bitsandbytes integration, allowing ROCm/HIP support in bitsandbytes.
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-26 16:18:08 +01:00
Cyril Vallez
788e1092e9
Allow easy registration of custom attention functions ( #36889 )
* Update modeling_utils.py
* style
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* add to init
* Update modeling_utils.py
* style
* update
* Update modeling_utils.py
* Update modeling_utils.py
* style
* Add some doc
* Update _toctree.yml
* re-add it for TGI/vLLM compat
* CIs
* CIs
2025-03-26 16:15:06 +01:00
ivarflakstad
ad5d40de9c
Fix get_device_properties ( #36997 )
Fix: remove leftover `self` from get_device_properties
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-26 15:46:34 +01:00
cyyever
8084b26294
Fix Optional type annotation ( #36841 )
* Fix annotation
* Update src/transformers/generation/candidate_generator.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-26 13:53:44 +00:00
Yih-Dar
b56d8f07e4
Install `networkx==3.2.1` manually in some CircleCI jobs after #36957 ( #37000 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-26 14:49:09 +01:00
cyyever
78afa1c537
Use torch.expm1 ( #36995 )
2025-03-26 13:06:33 +00:00
Yih-Dar
181d453069
byebye CircleCI TF jobs ( #36998 )
* byebye tf jobs
* byebye tf jobs
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-26 12:49:50 +01:00
cyyever
e7139d06f5
Fix tensor dtype mismatch ( #36985 )
* Fix tensor dtype mismatch
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-26 10:37:46 +01:00
Yoni Gozlan
be37d34f44
🚨 Deprecate legacy argument for image-text-to-text models and adopt new behavior by default ( #36307 )
* deprecate legacy argument and adopt new behavior by default
* revert git modification
2025-03-25 17:32:17 -04:00
Yih-Dar
ab4656f6b7
update bot comment again ( #36974 )
update
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 19:42:09 +01:00
cyyever
ba531278ca
Add ruff target-version ( #36971 )
2025-03-25 19:41:25 +01:00
Steven Liu
a844297088
[docs] Fix image link ( #36869 )
* fix image link
* fix
* update
* fix
2025-03-25 11:34:21 -07:00
cyyever
d68a91aebf
Remove extra tensor clone in PyTorch code ( #36748 )
* Use detach().clone()
* Eliminate contiguous()
* Merge clone and other calls with to
* Merge clone and other calls with to
2025-03-25 17:42:15 +00:00
Yih-Dar
121830ab47
update examples after ruff was updated ( #36972 )
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 18:15:47 +01:00
Sai-Suraj-27
a41677a68b
Updated docker files to use `uv` for installing packages ( #36957 )
* Updated docker files to use uv pip install as uv is blazingly fast.
* Removed -y flag for uv pip uninstall.
* Passed --no-build-isolation flag
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 18:12:51 +01:00
NargiT
3dce98a437
typo fixed in README_fr.md ( #36951 )
2025-03-25 09:29:36 -07:00
湛露先生
ebd2029483
Change GPUS to GPUs ( #36945 )
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 17:25:39 +01:00
Yih-Dar
69632aadb7
Update after #36962 ( #36965 )
update
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:16:06 +01:00
Yih-Dar
c6814b4ee8
Update ruff to 0.11.2 ( #36962 )
* update
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:00:11 +01:00
Joao Gante
bc1c90a755
[Utils] torch version checks optionally accept dev versions ( #36847 )
2025-03-25 10:58:58 +00:00
Marc Sun
80b4c5dcc9
Fix cuda index issue in cache allocator ( #36937 )
fix
2025-03-25 11:51:41 +01:00
Raushan Turganbay
0f733110a6
Support `return_tensors` in audio chat templates ( #34601 )
* add audio chat templates
* update
* update
* nit
* green ci
* we don't care about the order anymore
* clean up after rebase
* overridden tests rename
* rename shieldgemma also
* one more rename
* require_read_token
* remove images/videos
* retrigger CI flaky
2025-03-25 11:08:47 +01:00
Afanti
19085c28da
fix typos in the tests directory ( #36932 )
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: format codes
2025-03-25 10:49:24 +01:00
Guang Yang
69bcb86c58
Export for Phi4-mini ( #36780 )
* Export for Phi4-mini
* Update tests/models/phi3/test_modeling_phi3.py
---------
Co-authored-by: Guang Yang <guangyang@fb.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 10:46:38 +01:00
Mohamed Mekkouri
be2c0e7bff
Fixing _pre_quantization_dtype when torch_dtype is None ( #36930 )
fix
2025-03-25 10:43:27 +01:00
Cyril Vallez
4303d88c09
Add Phi4 multimodal ( #36939 )
* raw start
* update
* update
* add to imports
* update
* up
* simplify configs
* clean configs
* style
* typos
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* fix
* up
* up
* up
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* up
* up
* up
* Update feature_extraction_phi4_multimodal.py
* up
* up
* up
* up
* up
* simplify configs
* typo
* cut code
* typo
* typo
* typo
* re
* typo
* up
* up
* up
* add tests
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* up
* Update test_modeling_phi4_multimodal.py
* doc
* fix
* up
* up
* up
* up
* up
* up
* simplify
* up
* simplify
* config docstrings
* cleanup
* clean
* typo
* typo
* fix
* Update phi4_multimodal.md
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* update
* simplify reshapes and permutes
* up
* simplify special tokens
* simplify processor a lot
* Update processing_phi4_multimodal.py
* Update processing_phi4_multimodal.py
* switch to fast processor
* image processor
* Update image_processing_phi4_multimodal_fast.py
* add lora extraction to converter
* Update convert_phi4_multimodal_weights_to_hf.py
* Update __init__.py
* add AudioInput type in audio_utils
* rewrite feature_extraction: support torch batched FFT
* input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values
* test update
* non-mono-channel warning update
* remove auto maps from processor
* kwargs dispatch in processor
* simplify kwargs dispatch
* simplify merging
* remove default sampling rate
* style
* Update test_modeling_phi4_multimodal.py
* update doc
* doc
* torch only feature extractor
* make fake tokens adjustable
* Update feature_extraction_phi4_multimodal.py
* fix
* Update processing_phi4_multimodal.py
* simplify mask
* last touch
* fix copies
* style
* Update audio_utils.py
* style
* Update feature_extraction_phi4_multimodal.py
* Update __init__.py
* docstrings
* copies
* fix all checks
* back to fix-copies
* trigger CIs
* Update feature_extraction_phi4_multimodal.py
* improve tests with multimodal inputs
* trigger CIs
---------
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-03-25 09:55:21 +01:00
Raushan Turganbay
47e5432805
Deprecate #36741 and map Causal to Conditional ( #36917 )
* deprecate the prev fix
* reword warning and update docs
* reword warning
* tests
* don't bloat `get_text_config()`
2025-03-25 09:13:56 +01:00
Mohamed Mekkouri
2b8a15cc3f
Disallow Offload to disk for gguf files ( #36933 )
update
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-24 19:30:01 +01:00
Yoni Gozlan
91455c1825
Fix processor kwargs qwen2 vl ( #36890 )
* Fix qwen2_vl and qwen2_5_vl processors' custom images kwargs
* change version warning
2025-03-24 13:19:26 -04:00
gautham
48385aa4f4
Added support for seed in `DataCollatorForWholeWordMask` ( #36903 )
* Added support for seed in `DataCollatorForWholeWordMask`, and also wrote tests.
Also fixed bugs where the code hardcoded values for mask replacement probability and random replacement probability, instead of using the values passed by the user.
* formatting issues
* Used a better way to generate the seed in TF. Made tests more consistent.
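A framework-free sketch of the behavior described above: whole-word masking driven by a caller-supplied seed, with the mask and random-replacement probabilities taken from arguments instead of being hardcoded. This is not the `DataCollatorForWholeWordMask` implementation; the function `whole_word_mask` and its parameter names are hypothetical, and the 0.8 / 0.1 defaults are only the conventional BERT-style split.

```python
import random
from typing import Optional, Sequence


def whole_word_mask(
    tokens: Sequence[str],
    word_ids: Sequence[int],
    vocab: Sequence[str],
    mask_token: str = "[MASK]",
    mlm_probability: float = 0.15,
    mask_replace_prob: float = 0.8,
    random_replace_prob: float = 0.1,
    seed: Optional[int] = None,
):
    """Hypothetical helper: return (masked_tokens, labels); labels is None where unmasked."""
    if mask_replace_prob + random_replace_prob > 1.0:
        raise ValueError("mask_replace_prob + random_replace_prob must be <= 1")
    # A dedicated RNG seeded by the caller makes the masking reproducible.
    rng = random.Random(seed)
    # Select whole words (sorted for a deterministic draw order); all sub-tokens
    # of a selected word become prediction targets together.
    selected = {w for w in sorted(set(word_ids)) if rng.random() < mlm_probability}
    out, labels = list(tokens), [None] * len(tokens)
    for i, (tok, w) in enumerate(zip(tokens, word_ids)):
        if w not in selected:
            continue
        labels[i] = tok
        r = rng.random()
        # Honor the caller's probabilities rather than fixed constants.
        if r < mask_replace_prob:
            out[i] = mask_token
        elif r < mask_replace_prob + random_replace_prob:
            out[i] = rng.choice(vocab)
        # Otherwise the original token is kept unchanged.
    return out, labels
```

Calling it twice with the same `seed` yields identical output; setting `mask_replace_prob=1.0` turns every selected sub-token into the mask token.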
2025-03-24 16:57:17 +00:00
Yih-Dar
5932606d8e
More precise comment ( #36935 )
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-24 17:03:09 +01:00
Pavel Iakubovskii
2be2984462
Fix PyTorch deform attn path ( #36923 )
* Fix pytorch path for DeformableAttention
* Apply for GroundingDino
2025-03-24 15:58:51 +00:00
cyyever
00d077267a
[2/N] Use pyupgrade --py39-plus to improve code ( #36857 )
Use pyupgrade --py39-plus to improve code
2025-03-24 15:42:25 +00:00
Ethan Knights
a6ecb54159
Update `trainer_pt_utils.py` docstrings for consistency ( #36912 )
* Update trainer_pt_utils.py
* update docstrings in trainer_pt_utils.py for consistency
* Update src/transformers/trainer_pt_utils.py
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-03-24 14:46:41 +00:00
omahs
cbf924b76c
Fix typos ( #36910 )
* fix typos
* fix typos
* fix typos
* fix typos
2025-03-24 14:08:29 +00:00
Yih-Dar
340500b1a9
Use another repo for Mistral3 processor testing ( #36925 )
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-24 14:36:05 +01:00
Mohamed Mekkouri
9e125d9a2e
Fix Compressed tensors to_dict_diff ( #36922 )
fix
2025-03-24 13:06:33 +01:00
Raushan Turganbay
57f551c78d
[chameleon] fix num image token check ( #36918 )
* [chameleon] fix num image token check
* embed after merging image token
* skip this also
* mistral require_read_token
2025-03-24 12:36:08 +01:00
Dmitry Rogozhkin
a41e08aa19
tests: fix asyncio.wait() usage for python>=3.11 ( #36898 )
tests: fix asyncio.wait() usage for python>=3.7
Passing coroutines directly to `asyncio.wait()` has been deprecated since
python 3.8 and was removed in python 3.11. Instead, each coroutine must be
explicitly wrapped in a task with `asyncio.create_task()`, which
first appeared in python 3.7.
We ran into this issue when running the following Transformers tests on a
system with python 3.11 or later (for example, Ubuntu 24.04 ships python 3.12):
* `tests/trainer/test_trainer_distributed.py`
* `tests/extended/test_trainer_ext.py`
The error is:
```
src/transformers/testing_utils.py:2380: in execute_subprocess_async
    result = loop.run_until_complete(
/usr/lib/python3.12/asyncio/base_events.py:687: in run_until_complete
    return future.result()
src/transformers/testing_utils.py:2368: in _stream_subprocess
    await asyncio.wait(
...
E   TypeError: Passing coroutines is forbidden, use tasks explicitly.
```
See: https://docs.python.org/3.10/library/asyncio-task.html#asyncio.wait
See: https://docs.python.org/3.7/library/asyncio-task.html#asyncio.create_task
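A minimal stdlib sketch of the pattern this fix describes: draining a subprocess's stdout and stderr concurrently while handing `asyncio.wait()` tasks rather than bare coroutines. The helper names `_drain` and `run` are hypothetical and are not the code in `testing_utils.py`.

```python
import asyncio
import sys


async def _drain(stream, sink):
    # Read lines until EOF and collect them without trailing newlines.
    while True:
        line = await stream.readline()
        if not line:
            break
        sink.append(line.decode().rstrip())


async def run(cmd):
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
    )
    out, err = [], []
    # On python >= 3.11 asyncio.wait() rejects bare coroutines, so each one
    # is wrapped in a Task with asyncio.create_task() (available since 3.7).
    await asyncio.wait(
        [
            asyncio.create_task(_drain(proc.stdout, out)),
            asyncio.create_task(_drain(proc.stderr, err)),
        ]
    )
    returncode = await proc.wait()
    return returncode, out, err


if __name__ == "__main__":
    print(asyncio.run(run([sys.executable, "-c", "print('hello')"])))
```

Passing the two `_drain(...)` coroutines to `asyncio.wait()` without `create_task()` reproduces the `TypeError` above on python 3.11+.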
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-24 11:53:59 +01:00
XinyuanTong
e28be7a692
[Fix] Add `original_max_position_embeddings` to YARN rope_scaling optional keys ( #36877 )
[fix] Update optional keys in _validate_yarn_parameters to include original_max_position_embeddings
2025-03-24 11:05:19 +01:00
Raushan Turganbay
48da44be24
Fix torch version guard at import ( #36907 )
fix
2025-03-24 10:33:33 +01:00
AbdelKarim ELJANDOUBI
fe4ca2f4a7
fix Gemma3 Config ( #36893 )
* fix Gemma3 Config
* fix config in modular gemma3
2025-03-24 10:05:44 +01:00
Aritra Roy Gosthipaty
c9d1e5238a
Update installation.md ( #36826 )
* Update installation.md
* Update README.md
2025-03-21 16:32:02 -07:00
Steven Liu
d253de6d58
[docs] Model docs ( #36469 )
* initial
* fix
* fix
* update
* fix
* fixes
* quantization
* attention mask visualizer
* multimodal
* small changes
* fix code samples
2025-03-21 15:35:22 -07:00
Yoni Gozlan
beb9b5b022
Fix Pan and Scan on batched images Gemma3 ( #36864 )
* process flattened images in fast image proc
* process flattened images in slow proc and add tests
* remove print
* add unbalanced batch pan-and-scan test for image proc
* fix integration tests
2025-03-21 13:56:00 -04:00
Cyril Vallez
dd3933dd65
Simplify keep_in_fp32_modules logic ( #36722 )
* better regex everywhere
* fix
* Update test_modeling_instructblip.py
* BC with explanations this time, otherwise it makes no sense at all
* Update test_modeling_instructblip.py
* style
* CIs
* update _keep_in_fp32_modules in blip2
* Update modeling_utils.py
* Update modeling_utils.py
* style
* CIs
* add check
* trigger CIs
* Update modeling_utils.py
* trigger CIs
2025-03-21 16:12:59 +01:00
Sukriti Sharma
90e2df5d55
fix: loss computation after embeddings resize - mllama ( #36840 )
* move loss to generation class
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* code cleanup
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* test for resize and loss computation
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix tests
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix:test for resize and loss
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix resize embedding mllama test
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* review changes
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
---------
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
2025-03-21 14:47:59 +01:00
Arthur Zucker
4542b8fb27
push v4.51.0.dev0
2025-03-21 13:45:25 +01:00