Yaswanth Gali
fa814b0250
Merge branch 'main' into add-aimv2-model
2025-03-29 08:55:53 +05:30
yaswant19
da7bb61274
Updated testcase
2025-03-29 07:43:11 +05:30
yaswant19
b893bc8762
Refactor
2025-03-29 07:43:04 +05:30
Yih-Dar
b7fc2daf8b
Kenlm ( #37091 )
* kenlm
* kenlm
* kenlm
* kenlm
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 21:42:54 +01:00
Joao Gante
bab605dd04
[Cache] rename dtype attribute 🚨 🚨 ( #37044 )
* yoink
* same pattern in all cache
2025-03-28 19:08:02 +01:00
Joao Gante
9fd9476005
[generate] beam search -- fix output cropping ( #37080 )
* handle jagged beams
* better comment
* bart -- beam search tests print special tokens
* more bart test updates
* more tests!
* better comment
2025-03-28 18:57:51 +01:00
湛露先生
257bc670fb
fixed typo. ( #37057 )
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-28 17:12:14 +00:00
Cyril Vallez
2bea6bf24e
Fix AttentionInterface following feedback ( #37010 )
* up
* typo
* update doc
* Update attention_interface.md
2025-03-28 18:00:35 +01:00
Cyril Vallez
a86dad56bc
Fix state_dict map location when quantized ( #37086 )
* Update modeling_utils.py
* Update modeling_utils.py
2025-03-28 17:57:16 +01:00
Zach Mueller
d6064754ea
Update w/ new account ( #37084 )
* Update w/ new account
* DS
2025-03-28 12:43:00 -04:00
Yih-Dar
581cf96e0c
fix tied weights issue ( #37031 )
* fix
* comment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 16:36:44 +01:00
Minho Ryu
eca74d1367
[WIP] add deepseek-v3 ( #35926 )
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoint configuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for the unchanged dim when using expand
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits"
This reverts commit f264f800d0.
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* merge with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently required
* remove synch
* more fixes for TP
* temp fix for TP: some attention layers' FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no-gradient issues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
2025-03-28 15:56:59 +01:00
Raushan Turganbay
52cc204dd7
[blip-2] Fix dtype mismatch when keep in fp32 ( #37068 )
* fix fp32 BLIP2
* no need to reorder that
* check for `Noneness` as well before casting dtype
2025-03-28 15:52:11 +01:00
cyyever
aa3778afc2
Change deprecated PT functions ( #37041 )
Change deprecated functions
2025-03-28 14:26:22 +00:00
湛露先生
c90e6e9625
Fix some typos about benchmark scripts. ( #37027 )
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-28 14:10:20 +00:00
Yih-Dar
1fcaad6df9
Use lru_cache for tokenization tests ( #36818 )
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 15:09:35 +01:00
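For context, the `lru_cache` approach named in the commit above can be sketched as follows. This is an illustrative, stdlib-only sketch, not the actual transformers test-suite change: `get_cached_tokenizer` and the fake loader are hypothetical names, standing in for whatever expensive fixture construction the tests share.

```python
from functools import lru_cache

calls = {"n": 0}  # track how many times the expensive loader really runs

def load_tokenizer_from_disk(name):
    # Hypothetical stand-in for expensive tokenizer construction
    # (in real tests this would read vocab/merges files from disk).
    calls["n"] += 1
    return {"name": name, "vocab_size": 100}

@lru_cache(maxsize=None)
def get_cached_tokenizer(name):
    # Test cases requesting the same checkpoint name share one instance,
    # so the slow loader runs at most once per distinct name.
    return load_tokenizer_from_disk(name)

t1 = get_cached_tokenizer("bert-base-uncased")
t2 = get_cached_tokenizer("bert-base-uncased")
assert t1 is t2 and calls["n"] == 1  # loaded once, reused thereafter
```

The design trade-off is the usual one for cached test fixtures: repeated instantiation cost disappears, at the price that tests must not mutate the shared object.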
jp
3af425d4c6
fix: AttributeError: 'LlavaProcessor' object has no attribute 'image_token_id' ( #37026 )
* Add image_token_id and video_token_id handling in Llava processors
* fix: image to video
* fix: correct image and video token ID handling in Llava processors
* fix: improve image and video token ID handling in Llava processors
2025-03-28 10:46:24 +01:00
Manuel Faysse
064cd7cdac
Fix SDPA implementation in Qwen2-VL (issues with torch==2.6.0) ( #36891 )
* fix sdpa implementation
* ruff
* also modify 2_5 for consistency
2025-03-28 09:54:21 +01:00
Perry Gibson
348f3285c5
fix: Fully remove legacy cache from Llama ( #36958 )
* bug: fully remove legacy cache from Llama
* bug: fix CI issues
* bug: update jetmoe model
* bug: apply `check_modular_conversion.py` fix
* bug: apply make fix-copies
* bug: fix ruff
* PR suggestions
* Remove trailing commas in auto-gen files
* Trivial new line removal
2025-03-27 17:22:44 +00:00
Finn-Ole Höner
d6b3c7486b
fixed typo ( #37036 )
2025-03-27 15:37:53 +00:00
cyyever
6cc9c8d7d1
Remove deprecated batch_size parameter ( #37007 )
2025-03-27 15:01:56 +00:00
Prem Kumar M
4cc65e990f
Replace default split function with jnp.split() in flax models ( #37001 )
Replace split with jnp's split function for flax models (#36854 )
2025-03-27 14:59:57 +00:00
cyyever
41a0e58e5b
Set weights_only in torch.load ( #36991 )
2025-03-27 14:55:50 +00:00
cyyever
de77f5b1ec
Fix typing for None valued variables ( #37004 )
Fix typing for None-able variables
2025-03-27 14:46:32 +00:00
cyyever
8c5e29bad5
Avoid unnecessary device operations in loss computing ( #36950 )
* Avoid unnecessary tensor copy in loss computing
* Add type
2025-03-27 14:45:14 +00:00
湛露先生
471cf1de63
clean pipeline question_answering. ( #36986 )
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-27 14:35:33 +00:00
Joao Gante
29f322d04d
[generate, cache] handle more complex device maps ( #37014 )
2025-03-27 14:33:20 +00:00
eustlb
fb8e6c50e4
[audio utils] fix fft_bin_width computation ( #36603 )
* fix fft_bin_width computation
* update docstring + enforce correct params
* update test with correct value
* update test
* update feature extractors for concerned models
* update
* make
* update docstring
* update docstring
2025-03-27 15:20:02 +01:00
Raushan Turganbay
e97c760006
[chat templates] support loading audio from video ( #36955 )
* add audio from video
* typos
* delete print
* comments
2025-03-27 14:46:11 +01:00
Pavel Iakubovskii
c7bc79bd2a
Fixup for distill_any_depth conversion script ( #37043 )
* Fixup
* trigger
2025-03-27 13:29:25 +00:00
Sungyoon Jeong
d1eafe8d4e
Optimize to_py_obj for python-native numeric lists and scalars ( #36885 )
* Optimize to_py_obj for python-native numeric lists and scalars
* Fix bug that tuple is not converted to list
* Try np.array for more robust type checking
* Apply review and add tests for to_py_obj
2025-03-27 14:16:46 +01:00
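The fast path described in the commit above can be sketched like this. This is a simplified illustration of the idea (short-circuit python-native scalars and nested lists instead of round-tripping through numpy or a framework tensor), not the actual transformers `to_py_obj` implementation; `to_py_obj_sketch` is a hypothetical name.

```python
def to_py_obj_sketch(obj):
    """Simplified sketch of a to_py_obj-style converter."""
    # Fast path: python-native scalars need no conversion at all.
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    # Fast path: recurse into python-native containers directly.
    # Note tuples come back as lists (the tuple-handling bug fixed in the PR).
    if isinstance(obj, (list, tuple)):
        return [to_py_obj_sketch(o) for o in obj]
    # Anything tensor-like (numpy array, framework tensor) falls back to
    # its own .tolist() conversion.
    tolist = getattr(obj, "tolist", None)
    if callable(tolist):
        return tolist()
    return obj

assert to_py_obj_sketch((1, (2, 3))) == [1, [2, 3]]
```

The speedup comes from skipping the `np.array(...)` round trip entirely for inputs that are already plain Python objects.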
jiqing-feng
0e56fb69a2
fix pegasus init weights and other copied models ( #36844 )
* fix pegasus init weights
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix the rest of models
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix informer init
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* init weight before checking
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix roformer tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix roformer tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-03-27 14:14:30 +01:00
Parteek
7e813f9cf0
Add Distill Any Depth ( #36614 )
* Added conversion Script
* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Updated Conversion Script
* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-03-27 13:10:03 +00:00
Mohamed Mekkouri
92429057d9
Skip FP8 linear tests for device capability < 9.0 ( #37008 )
* skip fp8 linear
* add capability check
* format
2025-03-27 12:38:37 +01:00
hoshi-hiyouga
279c2e302a
remove redundant code in trainer ( #36994 )
* Update optimization.py
* Update optimization.py
2025-03-27 11:35:15 +01:00
Yih-Dar
d13c390d01
Mark 2 tests as flaky for now ( #37038 )
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-27 10:59:47 +01:00
Kyle Sayers
d6d930a64b
[Modeling] Load FP8 safetensors such as DeepSeek ( #36828 )
support loading fp8
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-03-27 10:47:10 +01:00
Michael Goin
927ce1d39f
Fix PixtralProcessor patch_size when spatial_merge_size is used ( #37019 )
2025-03-27 10:46:23 +01:00
Abu Bakr Soliman
49b5ab6a27
Support QuestionAnswering Module for ModernBert based models. ( #35566 )
* push ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* update __init__ loading
* set imports for ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* remove debugging logs
* update init_weights method
* remove custom initialization for ModernBertForQuestionAnswering
* apply make fix-copies
* apply make style
* apply make fix-copies
* append ModernBertForQuestionAnswering to the pipeline supported models
* remove unused file
* remove invalid autoload value
* update en/model_doc/modernbert.md
* apply make fixup command
* make fixup
* Update dummies
* update usage tips for ModernBertForQuestionAnswering
* update usage tips for ModernBertForQuestionAnswering
* add init
* add lint
* add consistency
* update init test
* change text to trigger stuck text
* use self.loss_function instead of custom loss
By @Cyrilvallez
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Update modeling_modernbert.py
make comparable commit to even it out
* Match whitespace
* whitespace
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Orion Weller <wellerorion@gmail.com>
Co-authored-by: Orion Weller <31665361+orionw@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-03-26 21:24:18 +01:00
Yao Matrix
5b08db8844
fix transformers_cli import relative path issue ( #36989 )
* fix transformers_cli relative import path issue
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-26 18:45:56 +00:00
Steven Liu
3a8ec8c467
[docs] Attention mask image ( #36970 )
add image
2025-03-26 10:11:34 -07:00
cyyever
2b550c47b2
Remove deprecated training arguments ( #36946 )
* Remove deprecated training arguments
* More fixes
* More fixes
* More fixes
2025-03-26 16:44:48 +00:00
yaswant19
be7490af52
Updated tests 🤗
2025-03-26 21:58:02 +05:30
yaswant19
cf4a128c6d
More fixes
2025-03-26 21:57:46 +05:30
Afanti
44715225e3
fix typos in the code comments and error messages ( #36993 )
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
2025-03-26 16:09:48 +00:00
Marc Sun
79d6f9fd70
Log the correct learning rate ( #36973 )
* fix learning rate log
* fix lr log
* add lr
2025-03-26 16:52:00 +01:00
Mohamed Mekkouri
13d36e89fe
Fix device_map check for ggml files ( #37003 )
fix
2025-03-26 16:24:57 +01:00
Josh Marshall
021006e1b0
Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support. ( #36975 )
* Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support.
Related to https://github.com/bitsandbytes-foundation/bitsandbytes/issues/1573 and https://github.com/huggingface/transformers/issues/36949 , this resolves a bug in allowing ROCm/HIP support in bitsandbytes.
* Related to bitsandbytes-foundation/bitsandbytes#1573 and huggingface#36949 , this resolves a bug in the bitsandbytes integration, allowing ROCm/HIP support in bitsandbytes.
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-26 16:18:08 +01:00
Cyril Vallez
788e1092e9
Allow easy registration of custom attention functions ( #36889 )
* Update modeling_utils.py
* style
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* add to init
* Update modeling_utils.py
* style
* update
* Update modeling_utils.py
* Update modeling_utils.py
* style
* Add some doc
* Update _toctree.yml
* readd it for tgi/vllm compat
* CIs
* CIs
2025-03-26 16:15:06 +01:00
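The registration mechanism named in the commit above follows the standard named-registry pattern. A generic, stdlib-only sketch of that pattern is below; it is illustrative only and does not reproduce transformers' actual `AttentionInterface` API, so `register_attention` and `get_attention` are hypothetical names.

```python
from typing import Callable, Dict

# Global mapping from implementation name to attention callable.
ATTENTION_FUNCTIONS: Dict[str, Callable] = {}

def register_attention(name: str, fn: Callable) -> None:
    # Refuse silent overwrites so two extensions cannot clobber each other.
    if name in ATTENTION_FUNCTIONS:
        raise ValueError(f"attention function {name!r} already registered")
    ATTENTION_FUNCTIONS[name] = fn

def get_attention(name: str) -> Callable:
    try:
        return ATTENTION_FUNCTIONS[name]
    except KeyError:
        raise KeyError(f"unknown attention implementation: {name!r}") from None

# A model would look up its configured implementation by name:
register_attention("eager", lambda q, k, v: ("eager", q, k, v))
fn = get_attention("eager")
assert fn(1, 2, 3) == ("eager", 1, 2, 3)
```

The value of the pattern is that model code only ever dispatches on a string, so users can plug in a custom kernel without touching the modeling files.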
ivarflakstad
ad5d40de9c
Fix get_device_properties ( #36997 )
Remove remnant self from get_device_properties
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-26 15:46:34 +01:00