Joao Gante
ece8c42488
Test: generate with torch.compile(model.forward) as a fast test ( #34544 )
2025-01-28 14:10:38 +00:00
Cyril Vallez
f48ecd7608
Fix TP initialization ( #35860 )
* fix tp
* Update modeling_utils.py
* style
* style
* Update test_tp.py
* Update test_tp.py
* style
* Update test_tp.py
* Update test_tp.py
* Update test_tp.py
* Update test_tp.py
2025-01-28 15:07:37 +01:00
Raushan Turganbay
f85ba20449
Qwen-2-5-VL: fix CI ( #35935 )
fix
2025-01-28 14:51:57 +01:00
Cyril Vallez
3f860dba55
Fix mask slicing for models with HybridCache ( #35681 )
* correctly slice
* check mask
* Update modular_gemma2.py
* fix
* add tests
* fix typo
* finally fix mask slicing
* Finally correctly slice in all cases!!
* add test for all attention functions
* small fix in tests
* trick around dynamo tracing issue
* last update
* more robust
* kwargs propagation
* make it explicit for checkpointing
* apply modular
2025-01-28 14:35:00 +01:00
Raushan Turganbay
b764c20b09
Fix: loading DBRX back from saved path ( #35728 )
* fix dtype as dict for some models + add test
* add comment in tests
2025-01-28 11:38:45 +01:00
Cyril Vallez
3613f568cd
Add default TP plan for all models with backend support ( #35870 )
* Add some tp plans!
* More tp plans!
* Add it in the comment
* style
* Update configuration_mixtral.py
* Update configuration_phi.py
* update the layout according to special archs
* fix mixtral
* style
* trigger CIs
* trigger CIs
* CIs
* olmo2
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-28 11:20:58 +01:00
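For context, a tensor-parallel (TP) plan in transformers maps parameter-name patterns to sharding styles such as column- or row-wise. A minimal illustrative sketch follows; the patterns and the lookup helper are assumptions for illustration, not the exact plans added in #35870:

```python
import fnmatch
from typing import Optional

# Illustrative TP plan: module-name glob patterns -> shard style.
# These entries mirror the common attention/MLP layout but are hypothetical.
base_model_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
    "layers.*.mlp.gate_proj": "colwise",
    "layers.*.mlp.down_proj": "rowwise",
}

def shard_style(param_name: str, plan: dict) -> Optional[str]:
    """Return the shard style for a parameter, or None if it stays replicated."""
    for pattern, style in plan.items():
        if fnmatch.fnmatch(param_name, pattern):
            return style
    return None
```

Column-wise shards split the output dimension across ranks and row-wise shards split the input dimension, so a colwise projection followed by a rowwise one needs only a single all-reduce.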
ivarflakstad
96625d85fd
Use rocm6.2 for AMD images ( #35930 )
* Use rocm6.2 as rocm6.3 only has nightly pytorch wheels atm
* Use stable wheel index for torch libs
2025-01-28 11:10:28 +01:00
Yih-Dar
bf16a182ba
Remove _supports_static_cache = True for some model classes ( #34975 )
* use mask_fill
* remove comment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-28 10:42:10 +01:00
Steven Liu
86d7564611
[docs] Fix Zamba2 ( #35916 )
fix code block
2025-01-27 11:44:10 -08:00
Matt
414658f94f
Close Zamba2Config code block ( #35914 )
* close zamba2 code block
* Add Zamba2 to toctree
2025-01-27 19:09:42 +00:00
Matt
63e9c941eb
Fix the config class comparison for remote code models ( #35592 )
* Fix the config class comparison when repeatedly saving and loading remote code models
* once again you have committed your debug breakpoint
2025-01-27 18:37:30 +00:00
Steven Liu
c550a1c640
[docs] uv install ( #35821 )
uv install
2025-01-27 08:49:28 -08:00
CalOmnie
cd6591bfb2
Fix typing in audio_utils.chroma_filter_bank ( #35888 )
* Fix typing in audio_utils.chroma_filter_bank
* Apply make style
---------
Co-authored-by: Louis Groux <louis.cal.groux@gmail.com>
2025-01-27 16:06:03 +00:00
Isotr0py
e57b459997
Split and clean up GGUF quantization tests ( #35502 )
* clean up ggml test
Signed-off-by: Isotr0py <2037008807@qq.com>
* port remaining tests
Signed-off-by: Isotr0py <2037008807@qq.com>
* further cleanup
Signed-off-by: Isotr0py <2037008807@qq.com>
* format
Signed-off-by: Isotr0py <2037008807@qq.com>
* fix broken tests
Signed-off-by: Isotr0py <2037008807@qq.com>
* update comment
Signed-off-by: Isotr0py <2037008807@qq.com>
* fix
Signed-off-by: Isotr0py <2037008807@qq.com>
* reorganize tests
Signed-off-by: Isotr0py <2037008807@qq.com>
* k-quants use qwen2.5-0.5B
Signed-off-by: Isotr0py <2037008807@qq.com>
* move ggml tokenization test
Signed-off-by: Isotr0py <2037008807@qq.com>
* remove dead code
Signed-off-by: Isotr0py <2037008807@qq.com>
* add assert for serialization test
Signed-off-by: Isotr0py <2037008807@qq.com>
* use str for parameterize
Signed-off-by: Isotr0py <2037008807@qq.com>
---------
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-27 15:46:57 +01:00
Ross Wightman
5c576f5a66
🚨 🚨 🚨 image-classification pipeline single-label and multi-label prob type squashing fns (sigmoid vs softmax) are backwards ( #35848 )
single-label and multi-label prob type squashing fns (sigmoid vs softmax) were backwards for image-classification pipeline
2025-01-27 15:34:57 +01:00
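For context on what was backwards: single-label classification should squash logits with softmax (one probability distribution over mutually exclusive classes), while multi-label classification should apply an independent sigmoid per class. A minimal pure-Python sketch of the two squashing functions:

```python
import math

def softmax(logits):
    """Single-label: mutually exclusive classes, probabilities sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Multi-label: each class scored independently in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]
```

Swapping the two (the bug fixed here) yields sigmoid "probabilities" that do not sum to 1 for single-label tasks, and softmax scores that wrongly force multi-label classes to compete.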
Mikhail Moskovchenko
5450e7c84a
🔴 🔴 🔴 Added segmentation maps support for DPT image processor ( #34345 )
* Added `segmentation_maps` support for DPT image processor
* Added tests for dpt image processor
* Moved preprocessing into separate functions
* Added # Copied from statements
* Fixed # Copied from statements
2025-01-27 15:14:00 +01:00
ivarflakstad
a50befa9b9
Update deepspeed amd image ( #35906 )
2025-01-27 14:32:36 +01:00
pglorio
33cb1f7b61
Add Zamba2 ( #34517 )
* First commit
* Finish model implementation
* First commit
* Finish model implementation
* Register zamba2
* generated modeling and configuration
* generated modeling and configuration
* added hybrid cache
* fix attention_mask in mamba
* dropped unused loras
* fix flash2
* config docstrings
* fix config and fwd pass
* make fixup fixes
* text_modeling_zamba2
* small fixes
* make fixup fixes
* Fix modular model converter
* added inheritances in modular, renamed zamba cache
* modular rebase
* new modular conversion
* fix generated modeling file
* fixed import for Zamba2RMSNormGated
* modular file cleanup
* make fixup and model tests
* dropped inheritance for Zamba2PreTrainedModel
* make fixup and unit tests
* Add inheritance of rope from GemmaRotaryEmbedding
* moved rope to model init
* drop del self.self_attn and del self.feed_forward
* fix tests
* renamed lora -> adapter
* rewrote adapter implementation
* fixed tests
* Fix torch_forward in mamba2 layer
* Fix torch_forward in mamba2 layer
* Fix torch_forward in mamba2 layer
* Dropped adapter in-place sum
* removed rope from attention init
* updated rope
* created get_layers method
* make fixup fix
* make fixup fixes
* make fixup fixes
* update to new attention standard
* update to new attention standard
* make fixup fixes
* minor fixes
* cache_position
* removed cache_position postion_ids use_cache
* remove config from modular
* removed config from modular (2)
* import apply_rotary_pos_emb from llama
* fixed rope_kwargs
* Instantiate cache in Zamba2Model
* fix cache
* fix @slow decorator
* small fix in modular file
* Update docs/source/en/model_doc/zamba2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* several minor fixes
* inherit mamba2decoder fwd and drop position_ids in mamba
* removed docstrings from modular
* reinstate zamba2 attention decoder fwd
* use regex for tied keys
* Revert "use regex for tied keys"
This reverts commit 9007a522b1.
* use regex for tied keys
* add cpu to slow forward tests
* dropped config.use_shared_mlp_adapter
* Update docs/source/en/model_doc/zamba2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* re-convert from modular
---------
Co-authored-by: root <root@node-2.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-27 10:51:23 +01:00
Sugendran Ganess
14a9bb520e
Fix fast image processor warnings in object detection examples ( #35892 )
Have the DETR examples default to using the fast image processor
2025-01-27 08:32:44 +00:00
Steven Liu
f11f57c925
[doctest] Fixes ( #35863 )
doctest fixes
2025-01-26 15:26:38 -08:00
Yih-Dar
fc269f77da
Add Rocketknight1 to self-comment-ci.yml ( #35881 )
my bad
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-24 19:07:07 +00:00
Fanli Lin
bcb841f007
add xpu device check in device_placement ( #35865 )
add xpu device
2025-01-24 19:13:07 +01:00
Arthur
b912f5ee43
Use torch.testing.assert_close instead to get more details about errors in CIs ( #35659 )
* use torch.testing.assert_close instead to get more details about errors in CIs
* fix
* style
* test_all
* revert for I bert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strict
* ok I won't be strict
* skip and be done
* up
2025-01-24 16:55:28 +01:00
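For context, `torch.testing.assert_close` compares values under a combined relative/absolute tolerance and reports which elements mismatch on failure. A pure-Python sketch of the elementwise rule (the rtol/atol defaults shown match torch's float32 defaults; torch actually selects them per dtype):

```python
def close(actual: float, expected: float,
          rtol: float = 1.3e-6, atol: float = 1e-5) -> bool:
    """Elementwise tolerance rule used by assert_close-style comparisons:
    the difference must be within atol plus rtol scaled by the expected value."""
    return abs(actual - expected) <= atol + rtol * abs(expected)
```

The advantage over a bare `assert a == b` in CI is that a failing `assert_close` prints the greatest absolute and relative differences, which is the "more details" this PR is after.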
Suyuchen Wang
72d1a4cd53
Fix Llava-NeXT / Llava-NeXT Video / Llava-OneVision's token unpadding mismatch ( #35779 )
* Fix Llava OneVision's token padding
* Fix Llava next and Llava next video's token unpadding for consistency
2025-01-24 09:10:27 +01:00
CalOmnie
b5aaf87509
Fix test_pipelines_video_classification that was always failing ( #35842 )
* Fix test_pipelines_video_classification that was always failing
* Update video pipeline docstring to reflect actual return type
---------
Co-authored-by: Louis Groux <louis.cal.groux@gmail.com>
2025-01-23 19:22:32 +01:00
baoyf4244
328e2ae4c0
fix apply_chat_template() padding choice ( #35828 )
fix apply_chat_template() padding to accept bool, str, or PaddingStrategy, and update the docstring of pad()
2025-01-23 17:32:32 +00:00
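The widened `padding` argument (bool, str, or `PaddingStrategy`) is typically normalized to an enum internally. A hedged illustrative sketch follows; the enum values mirror the library's `PaddingStrategy`, but the helper itself is hypothetical, not the library's code:

```python
from enum import Enum

class PaddingStrategy(str, Enum):
    """Mirror of the transformers PaddingStrategy values, for illustration."""
    LONGEST = "longest"
    MAX_LENGTH = "max_length"
    DO_NOT_PAD = "do_not_pad"

def normalize_padding(padding) -> PaddingStrategy:
    """Accept bool, str, or PaddingStrategy, as pad()'s docstring describes."""
    if isinstance(padding, bool):
        # True means "pad to the longest sequence in the batch".
        return PaddingStrategy.LONGEST if padding else PaddingStrategy.DO_NOT_PAD
    if isinstance(padding, str):
        return PaddingStrategy(padding)  # raises ValueError on unknown strings
    return padding
```

Checking `bool` before `str` matters: `PaddingStrategy` subclasses `str`, so an enum member also passes the string branch (harmlessly, since `PaddingStrategy(member)` returns the member itself).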
SilverSoldier
d2a424b550
Fix typo ( #35854 )
2025-01-23 17:32:18 +00:00
Yosshi999
045c02f209
[DOC] Fix contamination and missing paragraph in translation ( #35851 )
Fix contamination and missing paragraph in translation
2025-01-23 08:33:44 -08:00
Alex Brooks
71cc8161b2
Granite Vision Support ( #35579 )
* Add multimodal granite support
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Support multiple image feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Remove failing validation for visual encoders with no cls
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update llava based models / configs to support list of feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add tests for multiple feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use conditional instead of except for misaligned feature shapes
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
* crop cls from each hidden state
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
* Fix formatting
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Support single vision feature int in vipllava
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix typo in vision feature selection strategy validation
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
* Add tentative integration test for granite vision models
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
* Add granite vision docs
Replace multimodal granite refs with granite vision
Add granite vision / llava next alias
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
* Use image url in granitevision example
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
---------
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-01-23 17:15:52 +01:00
Arthur
8f1509a96c
Fix more CI tests ( #35661 )
add tooslow for the fat ones
2025-01-23 14:45:42 +01:00
Jack Roberts
0a950e0bbe
Fix uploading processors/tokenizers to WandB on train end ( #35701 )
* rename tokenizer to processing_class in WandbCallback.on_train_end
* rename tokenizer to processing_class in ClearMLCallback and DVCLiveCallback
2025-01-23 13:32:15 +01:00
張庭瑜
4ec425ffad
Fix GA loss for Deepspeed ( #35808 )
* Fix GA loss for Deepspeed
* Turn off loss scaling in DeepSpeed engine by scale_wrt_gas
* Add comment linking to PR
2025-01-23 11:45:02 +01:00
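For context, with gradient accumulation (GA) the per-micro-batch loss must be scaled by 1/steps exactly once; the bug class fixed here is scaling it in both the Trainer and the DeepSpeed engine. A toy sketch of the invariant, not the Trainer's or DeepSpeed's actual code:

```python
def accumulated_loss(micro_losses, trainer_scales=True, engine_scales=False):
    """Sum micro-batch losses; each 'scales' flag divides by the number of
    accumulation steps. Scaling in exactly one place yields the correct mean;
    scaling in both shrinks the effective loss by an extra factor of 1/N."""
    n = len(micro_losses)
    total = 0.0
    for loss in micro_losses:
        if trainer_scales:
            loss /= n
        if engine_scales:
            loss /= n
        total += loss
    return total
```

The fix in this PR takes the second route in spirit: it disables the engine-side scaling (via `scale_wrt_gas`) so the division happens only once.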
ShuaiBai623
f3f6c86582
add qwen2.5vl ( #35569 )
* add qwen2.5vl
* fix
* pass check table
* add modular file
* fix style
* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py
Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>
* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py
Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>
* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py
Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>
* padd copy check
* use modular
* fix
* fix
* fix
* update flashatt2&sdpa support_list
* Update docs/source/en/_toctree.yml
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_5_vl.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_5_vl.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_5_vl.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/qwen2_5_vl.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update config
* update
* fix hf path
* rename Qwen2_5_VLVideosKwargs
* fix
* fix
* update
* executed modular
* rollback init
* fix
* formatted
* simpler init
* fix
* fix
* fix
* fix
* fix
* update docs
* fix
* fix
* update Qwen2VLRotaryEmbedding for yarn
* fix
---------
Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: gewenbin0992 <gewenbin292@163.com>
Co-authored-by: gewenbin0992 <67409248+gewenbin0992@users.noreply.github.com>
2025-01-23 11:23:00 +01:00
Cyril Vallez
d3af76df58
[Backend support] Allow num_logits_to_keep as Tensor + add flag ( #35757 )
* support
* Update modeling_utils.py
* style
* most models
* Other models
* fix-copies
* tests + generation utils
2025-01-23 09:47:54 +01:00
Arthur
8736e91ad6
[tests] remove some flash attention class tests ( #35817 )
remove class from tests
2025-01-23 09:44:21 +01:00
Marc Sun
2c3a44f9a7
Fix NoneType type as it requires py>=3.10 ( #35843 )
fix type
2025-01-22 15:56:53 +00:00
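Background for this fix: `types.NoneType` is only importable on Python ≥ 3.10 (and `X | None` annotations evaluated at runtime likewise need 3.10, per PEP 604), so code that must run on 3.9 typically spells it `type(None)`. A minimal illustrative sketch, not the library's code:

```python
# types.NoneType exists only on Python >= 3.10; type(None) is the portable
# spelling on older interpreters.
NoneType = type(None)

def is_optional_int(x) -> bool:
    """Hypothetical helper: True if x is an int or None, i.e. an
    Optional[int]-shaped value, using the portable NoneType."""
    return isinstance(x, (int, NoneType))
```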
Mohit Sharma
fdcc62c855
Add PyTorch version check for FA backend on AMD GPUs ( #35813 )
Disable FA backend for SDPA on AMD GPUs (PyTorch < 2.4.1)
2025-01-22 16:09:23 +01:00
LRL-ModelCloud
3b9770581e
Fix compatibility issues when using auto_gptq with these older versions ( #35830 )
The convert_model method of optimum accepts only a single nn.Module model parameter in versions earlier than 1.23.99.
2025-01-22 15:46:47 +01:00
Joao Gante
62bd83947a
[chat] docs fix ( #35840 )
docs fix
2025-01-22 14:32:27 +00:00
Isotr0py
487e2f63bd
Fix head_dim in config extracted from Gemma2 GGUF model ( #35818 )
fix gemma2 head dim
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-01-22 15:22:04 +01:00
Joao Gante
b3d6722469
[Chat] Add Chat from TRL 🐈 ( #35714 )
* tmp commit
* add working chat
* add docs
* docs 2
* use auto dtype by default
2025-01-22 13:30:12 +00:00
Mohamed Mekkouri
a7738f5a89
Fix : Nemotron tokenizer for GGUF format ( #35836 )
fix nemotron gguf
2025-01-22 12:28:40 +01:00
Joao Gante
ec28957f94
[pipeline] missing import regarding assisted generation ( #35752 )
missing import
2025-01-22 10:34:28 +00:00
Joao Gante
36c9181f5c
[gpt2] fix generation tests ( #35822 )
fix gpt2 generation tests
2025-01-22 09:41:04 +00:00
Yih-Dar
f439e28d32
Hotfix: missing working-directory in self-comment-ci.yml ( #35833 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-22 10:25:50 +01:00
Raushan Turganbay
373e50e970
Init cache on meta device ( #35164 )
* init cache on meta device
* offloaded static + enable tests
* tests weren't running before :(
* update
* fix mamba
* fix copies
* update
* address comments and fix tests
* fix copies
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update
* mamba fix
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-22 09:49:17 +01:00
Yih-Dar
870e2c8ea0
Another security patch for self-comment-ci.yml ( #35816 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-22 09:29:54 +01:00
CalOmnie
f4f33a20a2
Remove pyav pin to allow python 3.11 to be used ( #35823 )
* Remove pyav pin to allow python 3.11 to be used
* Run make fixup
---------
Co-authored-by: Louis Groux <louis.cal.groux@gmail.com>
2025-01-21 20:16:18 +00:00
Joao Gante
90b46e983f
Remove old benchmark code ( #35730 )
* remove traces of the old deprecated benchmarks
* also remove old tf benchmark example, which uses deleted code
* run doc builder
2025-01-21 17:56:43 +00:00
eustlb
870eb7b41b
[Mimi] update test expected values for t4 runners ( #35696 )
update values for t4
2025-01-21 18:23:36 +01:00