Commit Graph

19383 Commits

Author SHA1 Message Date
Robin Kahlow
d1efaf0318
RWKV: fix mask warning typo (#37114)
rwkv: fix mask warning typo
2025-03-31 11:07:51 +02:00
Thien Tran
19919689b2
Fix Gemma3 embedding scaling (#37109)
fix gemma3 embedding
2025-03-31 11:04:02 +02:00
huismiling
d0b65bb479
[MLU] Fix FA2 check error, remove deepspeed-mlu deps. (#36159)
* add Cambricon MLUs support

* fix mlu device rng state

* up for quality check

* up mlu to support fp16

* fix mlu device dependency error

* fix mlu device dependency error

* enable mlu device for bf16

* fix mlu device memory tracker

* Cambricon support SDPA and flash_attn

* MLU devices: check if `mlu` is available via a `cndev`-based check which won't trigger the drivers and leave mlu uninitialized (see the sketch after this entry)

* Fix mlu FA2 check. Remove deepspeed-mlu check. add mlu tests support.

* fix testing errors.

* Merge branch 'hf/main' into main

* fix get_device_count error.

* fix mlu testing utils.

* fix code quality and style.

* switch to @require_torch_multi_accelerator
2025-03-31 11:02:49 +02:00
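A minimal sketch of the driver-free availability check this entry describes, assuming the usual `importlib` probe pattern; the function body is illustrative, not the exact implementation (the real check queries `cndev` so that even the final probe stays driver-free):

```python
import importlib.util


def is_torch_mlu_available() -> bool:
    # probe for the packages first, so merely asking "is an MLU there?"
    # never initializes the device driver
    if importlib.util.find_spec("torch") is None:
        return False
    if importlib.util.find_spec("torch_mlu") is None:
        return False
    import torch
    import torch_mlu  # noqa: F401  (registers the "mlu" device on import)

    return hasattr(torch, "mlu") and torch.mlu.is_available()
```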
jiqing-feng
ad63d20dff
fix whisper re-compile (#36712)
* fix whisper re-compile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix copy

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix copies

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert useless changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-31 11:01:51 +02:00
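The title gives no detail on the mechanism, but a common way to avoid repeated `torch.compile` retracing during generation is to pin cache shapes; a hedged sketch, with checkpoint and settings illustrative:

```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# a static KV cache keeps decoder shapes fixed, so the compiled graph is
# traced once and reused across generate() calls instead of recompiling
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead")
```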
jiqing-feng
286393fbb1
enable tp on CPU (#36299)
* enable tp on CPU

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* get rank from cpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable TP tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm print

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix model id

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix conflict

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix index and add doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-03-31 10:55:47 +02:00
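A hedged usage sketch of what the PR enables, assuming the `tp_plan="auto"` loading path and a CPU-friendly `gloo` process group (checkpoint id is illustrative); launched with e.g. `torchrun --nproc-per-node 4 tp_cpu.py`:

```python
import torch.distributed as dist
from transformers import AutoModelForCausalLM

dist.init_process_group(backend="gloo")  # CPU backend, no GPUs required
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # illustrative checkpoint
    tp_plan="auto",             # shard supported linear layers across ranks
)
```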
Qubitium-ModelCloud
4705b04c74
Fix 4090/ada not detected as having FP8 support (#37067)
fix 4090/ada not detected as having FP8 support

Signed-off-by: Qubitium <qubitium@modelcloud.ai>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-31 10:53:48 +02:00
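The underlying check: FP8 (e4m3/e5m2) kernels are available from Ada onward, so the gate should be compute capability 8.9, not 9.0. Roughly:

```python
import torch

# RTX 4090 / Ada reports capability (8, 9); Hopper reports (9, 0).
# Gating on >= (9, 0) wrongly excludes Ada, hence the >= (8, 9) check.
major, minor = torch.cuda.get_device_capability()
supports_fp8 = (major, minor) >= (8, 9)
```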
efsotr
2b4734bd49
Support passing flash_attn_kwargs when gradient_checkpointing is enabled (#37037)
* support passing flash_attn_kwargs when gradient_checkpointing is enabled

* make modeling_deepseek_v3.py consistent with modular_deepseek_v3.py
2025-03-31 10:53:02 +02:00
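The core trick, in a self-contained toy (names illustrative): `torch.utils.checkpoint` only replays positional arguments, so keyword arguments are bound with `functools.partial` before the layer is checkpointed.

```python
from functools import partial

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

layer = nn.Linear(8, 8)
x = torch.randn(2, 8, requires_grad=True)
flash_attn_kwargs = {"scale": 0.5}  # stand-in for the real kwargs


def forward_with_kwargs(hidden, *, scale):
    return layer(hidden) * scale


# bind the kwargs up-front; checkpoint() then only sees positional args
out = checkpoint(partial(forward_with_kwargs, **flash_attn_kwargs), x, use_reentrant=False)
out.sum().backward()
```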
Yuan Wu
bd41b9c1ac
Gaudi: Fix pipeline failures with the hpu device (#36990)
* Gaudi: fix the issue of is_torch_hpu_available() returning false

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Fix make fixup

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add comments for the implicit behavior of import

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Update src/transformers/utils/import_utils.py

* Update src/transformers/utils/import_utils.py

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-03-31 10:23:47 +02:00
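A sketch of the availability check being fixed, assuming the standard pattern: Gaudi support ships as `habana_frameworks`, whose torch plugin registers the `hpu` device as an import side effect (hence the "comments for the implicit behavior of import" commit above). Details are illustrative:

```python
import importlib.util


def is_torch_hpu_available() -> bool:
    if importlib.util.find_spec("habana_frameworks") is None:
        return False
    import torch
    import habana_frameworks.torch  # noqa: F401  (import side effect registers "hpu")

    return hasattr(torch, "hpu") and torch.hpu.is_available()
```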
Bo Zheng
6acd5aecb3
Adding Qwen3 and Qwen3MoE (#36878)
* Initial commit for Qwen3

* fix and add tests for qwen3 & qwen3_moe

* rename models for tests.

* fix

* fix

* fix and add docs.

* fix model name in docs.

* simplify modular and fix configuration issues

* Fix the red CI: ruff was updated

* revert ruff, version was wrong

* fix qwen3moe.

* fix

* make sure MOE can load

* fix copies

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-03-31 09:50:49 +02:00
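Once merged, the new architectures load through the auto classes as usual; a usage sketch, with the checkpoint id illustrative since the PR adds code, not weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # illustrative checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Briefly explain mixture-of-experts.", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```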
MinJu-Ha
0d6a60fe55
🌐 [i18n-KO] Translated qwen2_vl.md to Korean (#36750)
* fix: manual edits

* fix: resolve suggestions

* Update toctree.yml
2025-03-30 15:00:27 -07:00
Yih-Dar
b7fc2daf8b
Kenlm (#37091)
* kenlm

* kenlm

* kenlm

* kenlm

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 21:42:54 +01:00
Joao Gante
bab605dd04
[Cache] rename dtype attribute 🚨 🚨 (#37044)
* yoink

* same pattern in all cache
2025-03-28 19:08:02 +01:00
Joao Gante
9fd9476005
[generate] beam search -- fix output cropping (#37080)
* handle jagged beams

* better comment

* bart -- beam search tests print special tokens

* more bart test updates

* more tests!

* better comment
2025-03-28 18:57:51 +01:00
湛露先生
257bc670fb
fixed typo. (#37057)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-28 17:12:14 +00:00
Cyril Vallez
2bea6bf24e
Fix AttentionInterface following feedback (#37010)
* up

* typo

* update doc

* Update attention_interface.md
2025-03-28 18:00:35 +01:00
Cyril Vallez
a86dad56bc
Fix state_dict map location when quantized (#37086)
* Update modeling_utils.py

* Update modeling_utils.py
2025-03-28 17:57:16 +01:00
Zach Mueller
d6064754ea
Update w/ new account (#37084)
* Update w/ new account

* DS
2025-03-28 12:43:00 -04:00
Yih-Dar
581cf96e0c
fix tied weights issue (#37031)
* fix

* comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 16:36:44 +01:00
Minho Ryu
eca74d1367
[WIP] add deepseek-v3 (#35926)
* init commit

* style

* take comments into account

* add deepseekv3 modeling

* remove redundant code

* apply make style

* apply fix-copies

* make format

* add init files

* rename deepseekv3 into deepseek_v3 based on its model_type

* rename deepseekv3 into deepseek_v3 based on its model_type

* deepseek-v3 not deepseek_v3

* set model_type as deepseek_v3

* use default docs

* apply make

* fill type and docstring

* add rope_config_validation

* use custom DeepseekV3MLP

* hold code only for checkpoint configuration; remove redundant code

* revise rope yarn for DeepSeek variation

* rename DeepSeek-V3

* some refactoring

* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral

* fix attention forward

* use -1 for the unchanged dim when using expand

* refactor DeepseekV3TopkRouter

* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim

* register both pre_hook and hook

* make style

* use n_shared_experts

* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add test file

* update modeling_file according to modular file

* make style

* add mapping for DeepseekV3ForSequenceClassification

* remove aux_loss_alpha

* add deepseek_v3 for perf

* add deepseek_v3

* rename test as deepseekv3

* use tiny-deepseek-v3

* remove DeepseekV3ForSequenceClassification

* cache before padding

* remote output_router_logits

* Revert "remote output_router_logits"

This reverts commit f264f800d0.

* remove output_router_logits

* make e_score_correction_bias as buffer

* skip tests not compatible

* make style

* make e_score_correction_bias as buffer

* use rope_interleave instead of load_hook

* skip tests not compatible with MLA

* add doc for rope_interleave

* fix typo

* remove torch.no_grad for selecting topk

* fix post merge issue

* merge with main and simplify

* nits

* final

* small fixes

* fix

* support TP better

* stash

* changes currently required

* remove synch

* more fixes for TP

* temp fix for TP: some attention layers' FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used

* updates to have generation work!

* push most of the changes

* reorder functions + call for contributions!

* update readme

* nits

* update

* ruff was updated on main

* merge with main and fix copies

* revert unrelated changes

* route all tokens to all experts when testing to avoid no-gradient issues

* finish fixing all tests

* fixup

* nit

* clean config

* last readme changes

* nit

* doc nit

* typo

* last nit

* one more one more

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
2025-03-28 15:56:59 +01:00
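Several of the bullets above (the `e_score_correction_bias` buffer, dropping `torch.no_grad` around top-k selection) describe bias-corrected top-k routing. A minimal self-contained sketch of that pattern; names and shapes are illustrative, not the exact DeepSeek-V3 code:

```python
import torch
from torch import nn


class TopkRouter(nn.Module):
    def __init__(self, hidden_size: int, n_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, n_experts, bias=False)
        # kept as a buffer: saved/loaded with the state dict, never trained
        self.register_buffer("e_score_correction_bias", torch.zeros(n_experts))

    def forward(self, hidden_states: torch.Tensor):
        scores = self.gate(hidden_states).sigmoid()
        # the bias only influences *which* experts win; no torch.no_grad()
        # is needed because topk indices carry no gradient anyway
        _, topk_idx = torch.topk(scores + self.e_score_correction_bias, self.top_k, dim=-1)
        topk_weights = scores.gather(-1, topk_idx)  # gate values from raw scores
        return topk_idx, topk_weights


router = TopkRouter(hidden_size=16, n_experts=8, top_k=2)
idx, w = router(torch.randn(4, 16))
```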
Raushan Turganbay
52cc204dd7
[blip-2] Fix dtype mismatch when kept in fp32 (#37068)
* fix fp32 BLIP2

* no need to reorder that

* check for `Noneness` as well before casting dtype
2025-03-28 15:52:11 +01:00
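The third bullet in toy form: check for `None` before comparing or casting dtypes, since keep-in-fp32 paths can legitimately leave the target dtype unset.

```python
import torch

hidden_states = torch.randn(2, 4, dtype=torch.float32)
target_dtype = None  # e.g. unset because this module is kept in fp32

# guard on None first; only cast when a target dtype was actually resolved
if target_dtype is not None and hidden_states.dtype != target_dtype:
    hidden_states = hidden_states.to(target_dtype)
```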
cyyever
aa3778afc2
Change deprecated PT functions (#37041)
Change deprecated functions
2025-03-28 14:26:22 +00:00
湛露先生
c90e6e9625
Fix some typos about benchmark scripts. (#37027)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-28 14:10:20 +00:00
Yih-Dar
1fcaad6df9
Use lru_cache for tokenization tests (#36818)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 15:09:35 +01:00
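The idea, sketched with an illustrative helper name: tokenizer instantiation dominates test time, so cache instances per checkpoint id with `functools.lru_cache`.

```python
from functools import lru_cache

from transformers import AutoTokenizer


@lru_cache(maxsize=None)
def get_tokenizer(pretrained_id: str):
    # each checkpoint id is loaded once and reused across test methods
    return AutoTokenizer.from_pretrained(pretrained_id)
```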
jp
3af425d4c6
fix: AttributeError: 'LlavaProcessor' object has no attribute 'image_token_id' (#37026)
* Add image_token_id and video_token_id handling in Llava processors

* fix: image to video

* fix: correct image and video token ID handling in Llava processors

* fix: improve image and video token ID handling in Llava processors
2025-03-28 10:46:24 +01:00
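The missing attribute, roughly: the processor now resolves the image/video placeholder tokens to ids at init time. A hedged sketch of the equivalent user-side resolution:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
# after the fix the processor exposes image_token_id itself; this is the
# same lookup done manually
image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
```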
Manuel Faysse
064cd7cdac
Fix SDPA implementation in Qwen2-VL (issues with torch==2.6.0) (#36891)
* fix sdpa implementation

* ruff

* also modify 2_5 for consistency
2025-03-28 09:54:21 +01:00
Perry Gibson
348f3285c5
fix: Fully remove legacy cache from Llama (#36958)
* bug: fully remove legacy cache from Llama

* bug: fix CI issues

* bug: update jetmoe model

* bug: apply =check_modular_conversion.py= fix

* bug: apply make fix-copies

* bug: fix ruff

* PR suggestions

* Remove trailing commas in auto-gen files

* Trivial new line removal
2025-03-27 17:22:44 +00:00
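With the legacy tuple format removed, callers pass a `Cache` object explicitly; `DynamicCache.from_legacy_cache()` converts any old per-layer tuples still in hand. A migration sketch (checkpoint id illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_id = "meta-llama/Llama-3.2-1B"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello", return_tensors="pt")
cache = DynamicCache()  # or DynamicCache.from_legacy_cache(old_tuples)
out = model(**inputs, past_key_values=cache, use_cache=True)
```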
Finn-Ole Höner
d6b3c7486b
fixed typo (#37036)
2025-03-27 15:37:53 +00:00
cyyever
6cc9c8d7d1
Remove deprecated batch_size parameter (#37007)
2025-03-27 15:01:56 +00:00
Prem Kumar M
4cc65e990f
Replace default split function with jnp.split() in flax models (#37001)
Replace split with jnp's split function for flax models (#36854)
2025-03-27 14:59:57 +00:00
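What the replacement looks like in practice, e.g. splitting a fused QKV projection:

```python
import jax.numpy as jnp

qkv = jnp.ones((2, 16, 3 * 64))
# jnp.split replaces the hand-rolled slicing helper: three equal chunks
# along the last axis
query, key, value = jnp.split(qkv, 3, axis=-1)
print(query.shape)  # (2, 16, 64)
```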
cyyever
41a0e58e5b
Set weights_only in torch.load (#36991)
2025-03-27 14:55:50 +00:00
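The hardening in one line: `weights_only=True` restricts unpickling to tensors and a small allowlist of types rather than arbitrary code execution (path illustrative).

```python
import torch

state_dict = torch.load("checkpoint.bin", map_location="cpu", weights_only=True)
```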
cyyever
de77f5b1ec
Fix typing for None-valued variables (#37004)
Fix typing for None-able variables
2025-03-27 14:46:32 +00:00
cyyever
8c5e29bad5
Avoid unnecessary device operations in loss computing (#36950)
* Avoid unnecessary tensor copy in loss computing

* Add type
2025-03-27 14:45:14 +00:00
湛露先生
471cf1de63
clean pipeline question_answering. (#36986)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-27 14:35:33 +00:00
Joao Gante
29f322d04d
[generate, cache] handle more complex device maps (#37014)
2025-03-27 14:33:20 +00:00
eustlb
fb8e6c50e4
[audio utils] fix fft_bin_width computation (#36603)
* fix fft_bin_width computation

* update docstring + enforce correct params

* update test with correct value

* update test

* update feature extractors for concerned models

* update

* make

* update docstring

* update docstring
2025-03-27 15:20:02 +01:00
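For reference, the standard DSP relation behind the fix: adjacent FFT bins are spaced `sampling_rate / fft_length` Hz apart, independent of how many one-sided frequency bins are kept afterwards.

```python
sampling_rate = 16_000  # Hz
fft_length = 400        # FFT window length in samples

fft_bin_width = sampling_rate / fft_length  # 40.0 Hz between adjacent bins
```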
Raushan Turganbay
e97c760006
[chat templates] support loading audio from video (#36955)
* add audio from video

* typos

* delete print

* comments
2025-03-27 14:46:11 +01:00
Pavel Iakubovskii
c7bc79bd2a
Fixup for distill_any_depth conversion script (#37043)
* Fixup

* trigger
2025-03-27 13:29:25 +00:00
Sungyoon Jeong
d1eafe8d4e
Optimize to_py_obj for python-native numeric lists and scalars (#36885)
* Optimize to_py_obj for python-native numeric lists and scalars

* Fix bug that tuple is not converted to list

* Try np.array for more robust type checking

* Apply review and add tests for to_py_obj
2025-03-27 14:16:46 +01:00
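The shape of the optimization, sketched (simplified relative to the real helper): short-circuit python-native scalars and numeric containers before any framework-specific handling; per the second bullet, tuples deliberately come back as lists.

```python
import numpy as np
import torch


def to_py_obj(obj):
    # fast path: already-native scalars need no conversion at all
    if obj is None or isinstance(obj, (int, float, str, bool)):
        return obj
    # recurse into containers; tuples intentionally become lists
    if isinstance(obj, (list, tuple)):
        return [to_py_obj(o) for o in obj]
    if isinstance(obj, (torch.Tensor, np.ndarray)):
        return obj.tolist()
    return obj
```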
jiqing-feng
0e56fb69a2
fix pegasus init weights and other copied models (#36844)
* fix pegasus init weights

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix the rest of models

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix informer init

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* init weight before checking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix roformer tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix roformer tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-03-27 14:14:30 +01:00
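The pattern being repaired, in skeleton form (rules and std value illustrative): every leaf module type gets an explicit branch in `_init_weights`, and weights are initialized before any post-init checks run.

```python
import torch
from torch import nn


def _init_weights(module: nn.Module, init_std: float = 0.02):
    if isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=init_std)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=init_std)
        if module.padding_idx is not None:
            module.weight.data[module.padding_idx].zero_()
```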
Parteek
7e813f9cf0
Add Distill Any Depth (#36614)
* Added conversion Script

* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Updated Conversion Script

* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-03-27 13:10:03 +00:00
Mohamed Mekkouri
92429057d9
Skip FP8 linear tests for device capability < 9.0 (#37008)
* skip fp8 linear

* add capability check

* format
2025-03-27 12:38:37 +01:00
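A sketch of such a capability gate as a test decorator (decorator name illustrative):

```python
import unittest

import torch


def require_capability_90(test_case):
    ok = torch.cuda.is_available() and torch.cuda.get_device_capability() >= (9, 0)
    return unittest.skipUnless(ok, "FP8 linear tests need compute capability >= 9.0")(test_case)
```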
hoshi-hiyouga
279c2e302a
remove redundant code in trainer (#36994)
* Update optimization.py

* Update optimization.py
2025-03-27 11:35:15 +01:00
Yih-Dar
d13c390d01
Mark 2 tests as flaky for now (#37038)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-27 10:59:47 +01:00
Kyle Sayers
d6d930a64b
[Modeling] Load FP8 safetensors such as DeepSeek (#36828)
support loading fp8

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-03-27 10:47:10 +01:00
Michael Goin
927ce1d39f
Fix PixtralProcessor patch_size when spatial_merge_size is used (#37019)
2025-03-27 10:46:23 +01:00
Abu Bakr Soliman
49b5ab6a27
Support QuestionAnswering module for ModernBert-based models. (#35566)
* push ModernBertForQuestionAnswering

* update ModernBertForQuestionAnswering

* update __init__ loading

* set imports for ModernBertForQuestionAnswering

* update ModernBertForQuestionAnswering

* remove debugging logs

* update init_weights method

* remove custom initialization for ModernBertForQuestionAnswering

* apply make fix-copies

* apply make style

* apply make fix-copies

* append ModernBertForQuestionAnswering to the pipeline supported models

* remove unused file

* remove invalid autoload value

* update en/model_doc/modernbert.md

* apply make fixup command

* make fixup

* Update dummies

* update usage tips for ModernBertForQuestionAnswering

* update usage tips for ModernBertForQuestionAnswering

* add init

* add lint

* add consistency

* update init test

* change text to trigger stuck text

* use self.loss_function instead of custom loss

By @Cyrilvallez

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update modeling_modernbert.py

make comparable commit to even it out

* Match whitespace

* whitespace

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Orion Weller <wellerorion@gmail.com>
Co-authored-by: Orion Weller <31665361+orionw@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-03-26 21:24:18 +01:00
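Usage sketch for the new head (base checkpoint shown; its QA head is freshly initialized, so expect a warning and fine-tune before use):

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)

question, context = "Who wrote the report?", "The report was written by a colleague."
inputs = tokenizer(question, context, return_tensors="pt")
outputs = model(**inputs)  # start_logits / end_logits over the context
```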
Yao Matrix
5b08db8844
fix transformers_cli import relative path issue (#36989)
* fix transformers_cli relative import path issue

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-26 18:45:56 +00:00
Steven Liu
3a8ec8c467
[docs] Attention mask image (#36970)
add image
2025-03-26 10:11:34 -07:00
cyyever
2b550c47b2
Remove deprecated training arguments (#36946)
* Remove deprecated training arguments

* More fixes

* More fixes

* More fixes
2025-03-26 16:44:48 +00:00
Afanti
44715225e3
fix typos in the code comments and error messages (#36993)
* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments
2025-03-26 16:09:48 +00:00