Yaswanth Gali
fa814b0250
Merge branch 'main' into add-aimv2-model
2025-03-29 08:55:53 +05:30
yaswant19
da7bb61274
Updated testcase
2025-03-29 07:43:11 +05:30
yaswant19
b893bc8762
Refactor
2025-03-29 07:43:04 +05:30
Yih-Dar
b7fc2daf8b
Kenlm ( #37091 )
* kenlm
* kenlm
* kenlm
* kenlm
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 21:42:54 +01:00
Joao Gante
bab605dd04
[Cache] rename dtype attribute 🚨 🚨 ( #37044 )
* yoink
* same pattern in all cache
2025-03-28 19:08:02 +01:00
Joao Gante
9fd9476005
[generate] beam search -- fix output cropping ( #37080 )
* handle jagged beams
* better comment
* bart -- beam search tests print special tokens
* more bart test updates
* more tests!
* better comment
2025-03-28 18:57:51 +01:00
湛露先生
257bc670fb
fixed typo. ( #37057 )
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-28 17:12:14 +00:00
Cyril Vallez
2bea6bf24e
Fix AttentionInterface following feedback ( #37010 )
* up
* typo
* update doc
* Update attention_interface.md
2025-03-28 18:00:35 +01:00
Cyril Vallez
a86dad56bc
Fix state_dict map location when quantized ( #37086 )
* Update modeling_utils.py
* Update modeling_utils.py
2025-03-28 17:57:16 +01:00
Zach Mueller
d6064754ea
Update w/ new account ( #37084 )
* Update w/ new account
* DS
2025-03-28 12:43:00 -04:00
Yih-Dar
581cf96e0c
fix tied weights issue ( #37031 )
* fix
* comment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 16:36:44 +01:00
Minho Ryu
eca74d1367
[WIP] add deepseek-v3 ( #35926 )
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoint configuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for the unchanged dim when using expand
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits"
This reverts commit f264f800d0.
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* merge with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently required
* remove synch
* more fixes for TP
* temp fix for TP: some attention layers' FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no-gradient issues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
2025-03-28 15:56:59 +01:00
Raushan Turganbay
52cc204dd7
[blip-2] Fix dtype mismatch when keep in fp32 ( #37068 )
* fix fp32 BLIP2
* no need to reorder that
* check for `Noneness` as well before casting dtype
2025-03-28 15:52:11 +01:00
cyyever
aa3778afc2
Change deprecated PT functions ( #37041 )
Change deprecated functions
2025-03-28 14:26:22 +00:00
湛露先生
c90e6e9625
Fix some typos about benchmark scripts. ( #37027 )
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-28 14:10:20 +00:00
Yih-Dar
1fcaad6df9
Use lru_cache for tokenization tests ( #36818 )
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 15:09:35 +01:00
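For context, the `lru_cache` approach named in the commit above can be sketched as follows. This is an illustrative, stdlib-only sketch, not the actual transformers test-suite change: `get_cached_tokenizer` and the fake loader are hypothetical names, standing in for whatever expensive fixture construction the tests share.

```python
from functools import lru_cache

calls = {"n": 0}  # track how many times the expensive loader really runs

def load_tokenizer_from_disk(name):
    # Hypothetical stand-in for expensive tokenizer construction
    # (in real tests this would read vocab/merges files from disk).
    calls["n"] += 1
    return {"name": name, "vocab_size": 100}

@lru_cache(maxsize=None)
def get_cached_tokenizer(name):
    # Test cases requesting the same checkpoint name share one instance,
    # so the slow loader runs at most once per distinct name.
    return load_tokenizer_from_disk(name)

t1 = get_cached_tokenizer("bert-base-uncased")
t2 = get_cached_tokenizer("bert-base-uncased")
assert t1 is t2 and calls["n"] == 1  # loaded once, reused thereafter
```

The design trade-off is the usual one for cached test fixtures: repeated instantiation cost disappears, at the price that tests must not mutate the shared object.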
jp
3af425d4c6
fix: AttributeError: 'LlavaProcessor' object has no attribute 'image_token_id' ( #37026 )
* Add image_token_id and video_token_id handling in Llava processors
* fix: image to video
* fix: correct image and video token ID handling in Llava processors
* fix: improve image and video token ID handling in Llava processors
2025-03-28 10:46:24 +01:00
Manuel Faysse
064cd7cdac
Fix SDPA implementation in Qwen2-VL (issues with torch==2.6.0) ( #36891 )
* fix sdpa implementation
* ruff
* also modify 2_5 for consistency
2025-03-28 09:54:21 +01:00
Perry Gibson
348f3285c5
fix: Fully remove legacy cache from Llama ( #36958 )
* bug: fully remove legacy cache from Llama
* bug: fix CI issues
* bug: update jetmoe model
* bug: apply `check_modular_conversion.py` fix
* bug: apply make fix-copies
* bug: fix ruff
* PR suggestions
* Remove trailing commas in auto-gen files
* Trivial new line removal
2025-03-27 17:22:44 +00:00
Finn-Ole Höner
d6b3c7486b
fixed typo ( #37036 )
2025-03-27 15:37:53 +00:00
cyyever
6cc9c8d7d1
Remove deprecated batch_size parameter ( #37007 )
2025-03-27 15:01:56 +00:00
Prem Kumar M
4cc65e990f
Replace default split function with jnp.split() in flax models ( #37001 )
Replace split with jnp's split function for flax models (#36854 )
2025-03-27 14:59:57 +00:00
cyyever
41a0e58e5b
Set weights_only in torch.load ( #36991 )
2025-03-27 14:55:50 +00:00
cyyever
de77f5b1ec
Fix typing for None valued variables ( #37004 )
Fix typing for None-able variables
2025-03-27 14:46:32 +00:00
cyyever
8c5e29bad5
Avoid unnecessary device operations in loss computing ( #36950 )
* Avoid unnecessary tensor copy in loss computing
* Add type
2025-03-27 14:45:14 +00:00
湛露先生
471cf1de63
clean pipeline question_answering. ( #36986 )
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-27 14:35:33 +00:00
Joao Gante
29f322d04d
[generate, cache] handle more complex device maps ( #37014 )
2025-03-27 14:33:20 +00:00
eustlb
fb8e6c50e4
[audio utils] fix fft_bin_width computation ( #36603 )
* fix fft_bin_width computation
* update docstring + enforce correct params
* update test with correct value
* update test
* update feature extractors for concerned models
* update
* make
* update docstring
* update docstring
2025-03-27 15:20:02 +01:00
Raushan Turganbay
e97c760006
[chat templates] support loading audio from video ( #36955 )
* add audio from video
* typos
* delete print
* comments
2025-03-27 14:46:11 +01:00
Pavel Iakubovskii
c7bc79bd2a
Fixup for distill_any_depth conversion script ( #37043 )
* Fixup
* trigger
2025-03-27 13:29:25 +00:00
Sungyoon Jeong
d1eafe8d4e
Optimize to_py_obj for python-native numeric lists and scalars ( #36885 )
* Optimize to_py_obj for python-native numeric lists and scalars
* Fix bug that tuple is not converted to list
* Try np.array for more robust type checking
* Apply review and add tests for to_py_obj
2025-03-27 14:16:46 +01:00
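The fast path described in the commit above can be sketched like this. This is a simplified illustration of the idea (short-circuit python-native scalars and nested lists instead of round-tripping through numpy or a framework tensor), not the actual transformers `to_py_obj` implementation; `to_py_obj_sketch` is a hypothetical name.

```python
def to_py_obj_sketch(obj):
    """Simplified sketch of a to_py_obj-style converter."""
    # Fast path: python-native scalars need no conversion at all.
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    # Fast path: recurse into python-native containers directly.
    # Note tuples come back as lists (the tuple-handling bug fixed in the PR).
    if isinstance(obj, (list, tuple)):
        return [to_py_obj_sketch(o) for o in obj]
    # Anything tensor-like (numpy array, framework tensor) falls back to
    # its own .tolist() conversion.
    tolist = getattr(obj, "tolist", None)
    if callable(tolist):
        return tolist()
    return obj

assert to_py_obj_sketch((1, (2, 3))) == [1, [2, 3]]
```

The speedup comes from skipping the `np.array(...)` round trip entirely for inputs that are already plain Python objects.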
jiqing-feng
0e56fb69a2
fix pegasus init weights and other copied models ( #36844 )
* fix pegasus init weights
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix the rest of models
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix informer init
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* init weight before checking
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix roformer tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix roformer tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-03-27 14:14:30 +01:00
Parteek
7e813f9cf0
Add Distill Any Depth ( #36614 )
* Added conversion Script
* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Updated Conversion Script
* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-03-27 13:10:03 +00:00
Mohamed Mekkouri
92429057d9
Skip FP8 linear tests for device capability < 9.0 ( #37008 )
* skip fp8 linear
* add capability check
* format
2025-03-27 12:38:37 +01:00
hoshi-hiyouga
279c2e302a
remove redundant code in trainer ( #36994 )
* Update optimization.py
* Update optimization.py
2025-03-27 11:35:15 +01:00
Yih-Dar
d13c390d01
Mark 2 tests as flaky for now ( #37038 )
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-27 10:59:47 +01:00
Kyle Sayers
d6d930a64b
[Modeling] Load FP8 safetensors such as DeepSeek ( #36828 )
support loading fp8
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-03-27 10:47:10 +01:00
Michael Goin
927ce1d39f
Fix PixtralProcessor patch_size when spatial_merge_size is used ( #37019 )
2025-03-27 10:46:23 +01:00
Abu Bakr Soliman
49b5ab6a27
Support QuestionAnswering Module for ModernBert based models. ( #35566 )
* push ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* update __init__ loading
* set imports for ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* remove debugging logs
* update init_weights method
* remove custom initialization for ModernBertForQuestionAnswering
* apply make fix-copies
* apply make style
* apply make fix-copies
* append ModernBertForQuestionAnswering to the pipeline supported models
* remove unused file
* remove invalid autoload value
* update en/model_doc/modernbert.md
* apply make fixup command
* make fixup
* Update dummies
* update usage tips for ModernBertForQuestionAnswering
* update usage tips for ModernBertForQuestionAnswering
* add init
* add lint
* add consistency
* update init test
* change text to trigger stuck text
* use self.loss_function instead of custom loss
By @Cyrilvallez
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Update modeling_modernbert.py
make comparable commit to even it out
* Match whitespace
* whitespace
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Orion Weller <wellerorion@gmail.com>
Co-authored-by: Orion Weller <31665361+orionw@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-03-26 21:24:18 +01:00
Yao Matrix
5b08db8844
fix transformers_cli import relative path issue ( #36989 )
* fix transformers_cli relative import path issue
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-26 18:45:56 +00:00
Steven Liu
3a8ec8c467
[docs] Attention mask image ( #36970 )
add image
2025-03-26 10:11:34 -07:00
cyyever
2b550c47b2
Remove deprecated training arguments ( #36946 )
* Remove deprecated training arguments
* More fixes
* More fixes
* More fixes
2025-03-26 16:44:48 +00:00
yaswant19
be7490af52
Updated tests 🤗
2025-03-26 21:58:02 +05:30
yaswant19
cf4a128c6d
More fixes
2025-03-26 21:57:46 +05:30
Afanti
44715225e3
fix typos in the code comments and error messages ( #36993 )
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
2025-03-26 16:09:48 +00:00
Marc Sun
79d6f9fd70
Log the correct learning rate ( #36973 )
* fix learning rate log
* fix lr log
* add lr
2025-03-26 16:52:00 +01:00
Mohamed Mekkouri
13d36e89fe
Fix device_map check for ggml files ( #37003 )
fix
2025-03-26 16:24:57 +01:00
Josh Marshall
021006e1b0
Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support. ( #36975 )
* Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support.
Related to https://github.com/bitsandbytes-foundation/bitsandbytes/issues/1573 and https://github.com/huggingface/transformers/issues/36949 , this resolves a bug in allowing ROCm/HIP support in bitsandbytes.
* Related to bitsandbytes-foundation/bitsandbytes#1573 and huggingface#36949 , this resolves a bug in the bitsandbytes integration, allowing ROCm/HIP support in bitsandbytes.
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-26 16:18:08 +01:00
Cyril Vallez
788e1092e9
Allow easy registration of custom attention functions ( #36889 )
* Update modeling_utils.py
* style
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* add to init
* Update modeling_utils.py
* style
* update
* Update modeling_utils.py
* Update modeling_utils.py
* style
* Add some doc
* Update _toctree.yml
* readd it for tgi/vllm compat
* CIs
* CIs
2025-03-26 16:15:06 +01:00
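The registration mechanism named in the commit above follows the standard named-registry pattern. A generic, stdlib-only sketch of that pattern is below; it is illustrative only and does not reproduce transformers' actual `AttentionInterface` API, so `register_attention` and `get_attention` are hypothetical names.

```python
from typing import Callable, Dict

# Global mapping from implementation name to attention callable.
ATTENTION_FUNCTIONS: Dict[str, Callable] = {}

def register_attention(name: str, fn: Callable) -> None:
    # Refuse silent overwrites so two extensions cannot clobber each other.
    if name in ATTENTION_FUNCTIONS:
        raise ValueError(f"attention function {name!r} already registered")
    ATTENTION_FUNCTIONS[name] = fn

def get_attention(name: str) -> Callable:
    try:
        return ATTENTION_FUNCTIONS[name]
    except KeyError:
        raise KeyError(f"unknown attention implementation: {name!r}") from None

# A model would look up its configured implementation by name:
register_attention("eager", lambda q, k, v: ("eager", q, k, v))
fn = get_attention("eager")
assert fn(1, 2, 3) == ("eager", 1, 2, 3)
```

The value of the pattern is that model code only ever dispatches on a string, so users can plug in a custom kernel without touching the modeling files.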
ivarflakstad
ad5d40de9c
Fix get_device_properties ( #36997 )
Remove remnant self from get_device_properties
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-26 15:46:34 +01:00