transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Poedator	7c62e69326	`GPT2Model` StaticCache support (#35761 ) * initial GPT2 changes * causal_mask support * return_legacy_cache * cleanup * fix1 * outputs shape fixes * gpt2 return fix * pkv, attn fixes * fix dual_head * is_causal arg fix * decision transformer updated * style fix * batch_size from inputs_embeds * DecisionTransformerModel fixes * cross-attn support + cache warning * x-attn @decision * EDCache proper init * simplified logic in `if use_cache:` for GPT2Model * @deprecate_kwarg for DecisionTr attn fwd * @deprecate_kwarg in gpt2 * deprecation version updated to 4.51 * kwargs in gradient_checkpointing_fn * rename next_cache to past_key_values * attention_mask prep * +cache_position in GPT2DoubleHeadsModel * undo kwargs in gradient checkpointing * moved up `if self.gradient_checkpointing` * consistency in decision_transformer * pastkv, cache_pos in grad_checkpt args * rm _reorder_cache * output_attentions streamlined * decision_transformer consistency * return_legacy_cache improved * ClvpForCausalLM used for legacy cache test now * is_causal fixed * attn_output cleanup * consistency @ decision_transformer * Updated deprecation notice version to 4.52 * upd deprecation * consistent legacy cache code in decision transformers\ * next_cache -> past_kv in decision_tr * cache support flags in decision_transf * rm legacy cache warning * consistency in cache init for decision transf * no Static Cache for Decision Transformer --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-04-24 14:46:35 +02:00
Joao Gante	9f927c8250	[cache] fix `HybridCache` init when `device` is passed (#37718 ) fix device init	2025-04-24 13:36:52 +01:00
amd-xiaoyu12	4fee320926	Expand quantized data type support for tensor parallelism (#37719 ) Update tensor_parallel.py Co-authored-by: Xiao YU <Xiao.YU@xilinx.com>	2025-04-24 14:34:32 +02:00
Yih-Dar	0f7940bb3f	Update `MllamaForConditionalGenerationIntegrationTest` (#37750 ) * fix 1 * fix 2 * fix 3 * fix 4 * fix 5 * fix 6 * trigger CI --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-24 14:29:46 +02:00
Yih-Dar	7e6f36cd38	Skip all `AriaForConditionalGenerationIntegrationTest` on `T4` (#37746 ) * skip * ruff * trigger CI --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-24 14:11:56 +02:00
Zhen	0327d0f7f2	[performance_optim] define flash attention mask on NPU device directly (#37698 ) Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-04-24 14:06:47 +02:00
Cyril Vallez	14e28bd721	Correctly raise errors when downloading tokenizer files (#37740 ) * first try * Update tokenization_utils_base.py * Update tokenization_utils_base.py * standardize	2025-04-24 12:53:07 +02:00
BakerBunker	0ec0495967	Fix `embeds_to_talker` device in Qwen2.5-Omni (#37739 ) Fix `embeds_to_talker` device Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>	2025-04-24 12:49:57 +02:00
NanoCode012	72e4844059	fix: learning_rate logged as tensor causing save issue with deepspeed (#37704 ) * fix: learning_rate logged as tensor causing save issue with deepspeed * chore: lint --------- Co-authored-by: NanoCode012 <chanvichet@Chanvichets-MacBook-Pro.local> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-24 12:20:47 +02:00
Raushan Turganbay	1cfcbfcab8	[VLMs] fix flash-attention tests (#37603 ) * fix one test * fa2 ln test * remove keys from config recursively * fix * fixup	2025-04-24 11:48:11 +02:00
Mohamed Mekkouri	02baa61fab	Make sure torch_is_available before using torch.distributed (#37693 ) fix	2025-04-24 11:31:35 +02:00
Fanli Lin	864e9636ff	[tests] fix `test_nemotron_8b_generation_sdpa` (#37665 ) add max_new_tokens	2025-04-24 11:28:35 +02:00
Mohamed Mekkouri	9b3bf4a206	Fix torchao doc examples (#37697 ) fix Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-24 11:10:27 +02:00
BakerBunker	3ed56bea0f	Fix inference bugs in Qwen2.5 Omni (#37701 ) * Init `SinusoidsPositionEmbedding` with float to avoid precision problem * fix hidden_state for talker * Update modular_qwen2_5_omni.py * Move hidden processing out from thinker * fixup --------- Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>	2025-04-24 10:51:44 +02:00
jiqing-feng	b7f7aa78a0	Fix Aria tests (#37444 ) * update aria tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * add cuda tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * check outputs for cpu and cuda and xpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * check outputs for cpu and cuda and xpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * check outputs for cpu and cuda and xpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * check output for each device Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix style Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix style Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix xpu output Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * add comments and use assert list equal Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * rm pad token assign Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-04-24 10:51:29 +02:00
Daksh Maheshwari	b6d65e40b2	Add Fast Image Processor for MobileNetV1 (#37111 ) * fast image processor template for MobileNetV1 via transformers-cli * Add fast image processors and unify tests for slow/fast image processor classes * added loop over image_processor_list for all tests and removed boilerplate comments. --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-23 15:55:41 -04:00
Vinh H. Pham	dea1919be4	Add Fast Image Processor for PoolFormer (#37182 ) * support poolformer fast image processor * support test for crop_pct=None * run make style * Apply suggestions from code review * rename test --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-23 15:55:33 -04:00
Parteek	b491f128d6	Add Fast PVT Processor (#37204 ) * Add Fast PVT Processor * Update image_processing_pvt_fast.py * Update image_processing_pvt_fast.py * remove kwargs --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-04-23 15:55:20 -04:00
Yao Matrix	19e9079dc1	enable 4 test_trainer cases on XPU (#37645 ) Signed-off-by: YAO Matrix <matrix.yao@intel.com>	2025-04-23 21:29:42 +02:00
Yoni Gozlan	5cd6b64059	Process inputs directly in apply_chat_template in image-text-to-text pipeline (#35616 ) * tokenize inputs directly in apply_chat_template * refactor processing * revert changes processing llava * Update docs * fix issue with str being iterable * add test chat text only * change function name	2025-04-23 13:31:33 -04:00
Joao Gante	80ea2c05c2	[tests, `qwen2_5_omni`] fix flaky tests (#37721 )	2025-04-23 17:54:12 +01:00
Pedro Cuenca	63c6331387	Qwen 2.5 Omni: apply video defaults (#37660 ) * Apply video defaults for min_pixels and max_pixels * fps kwarg should not be a list * Update test to account for new resizing	2025-04-23 17:08:11 +02:00
Raushan Turganbay	1e9087368c	[internvl] fix chat template (#37656 ) * fix chat template * update * update conversion * rename `fake_image_token` in tests	2025-04-23 16:56:36 +02:00
Matt	9ec8be56dd	TransfoXL is deprecated, don't keep it in tested examples! (#37707 ) * TransfoXL is deprecated, so we should remove it from examples that get tested * Remove the tokenizer too * Trigger tests	2025-04-23 14:59:38 +01:00
Joao Gante	be9b0e8521	[CI] add back `sacrebleu` (and document why) (#37700 ) * example test * add back dep * dev-ci * dev-ci	2025-04-23 14:45:00 +01:00
Matt	1d7d7a942e	Add maintainers for ROCm/Intel XPU/Ascend NPU (#37678 ) * Add maintainers for ROCm/Intel XPU/Ascend NPU * Correct capitalization for usernames * Update .github/ISSUE_TEMPLATE/bug-report.yml Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com> * Update .github/ISSUE_TEMPLATE/bug-report.yml Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com> * Trigger tests --------- Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>	2025-04-23 14:28:32 +01:00
Joao Gante	cc9a245e6d	[cleanup] remove `/model_cards` 🧹 🧹 (#37685 ) rm model_cards	2025-04-23 12:45:27 +01:00
Yih-Dar	ca790303f7	Pin torch == 2.6 on PR CI docker images for now (#37695 ) pin 2.6 on CircleCi images Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-23 11:47:23 +02:00
Yao Matrix	12f65ee752	enable cpu offloading for Bark on xpu (#37599 ) * enable cpu offloading of bark modeling on XPU Signed-off-by: YAO Matrix <matrix.yao@intel.com> * remove debug print Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix review comments Signed-off-by: YAO Matrix <matrix.yao@intel.com> * enhance test Signed-off-by: YAO Matrix <matrix.yao@intel.com> * update * add deprecate message Signed-off-by: YAO Matrix <matrix.yao@intel.com> * update * update * trigger CI --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-23 11:37:15 +02:00
Shahruk Hossain	4f9893cbbc	fix: remove classmethod from `Qwen2_5OmniConfig.get_text_config` (#37690 ) - Since the `get_text_config` references an instance variable within the class (`self.thinker_config`), the `get_text_config` method should not be a classmethod. - Before this fix, users were getting the following error: ''' AttributeError: type object 'Qwen2_5OmniConfig' has no attribute 'thinker_config' '''	2025-04-23 09:30:57 +02:00
Vishesh-Mistry	1d9743edc2	Updated model card for mbart and mbart50 (#37619 ) * new card for mbart and mbart50 * removed comment BADGES * Update mBart overview Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix typo (MBart to mBart) Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * maybe fix typo * update typo and combine notes * changed notes * changed the example sentence * fixed grammatical error and removed some lines from notes example * missed one word * removed documentation resources and added some lines of example code back in notes. --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-22 12:26:47 -07:00
Jinyong Lee	fbfa1dd4db	🌐 [i18n-KO] Translated `siglip.md` to Korean (#37145 ) * docs: ko: siglip.md * feat: nmt draft * fix: manual edits * chore: Correct document title to kebab-case format Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Convert unnatural language to natural Korean Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>	2025-04-22 12:23:19 -07:00
Yao Matrix	ece79b0688	enable blip2 and emu3 cases on XPU (#37662 ) * enable blip2 and emu3 modeling cases on XPU Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> * remove extra new line Signed-off-by: YAO Matrix <matrix.yao@intel.com> * update --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-22 18:37:09 +02:00
Ken J	ca4c114dc4	Add counters for dataset classes (#37636 ) * add counters for dataset classes * fix failed code style	2025-04-22 17:30:43 +01:00
NielsRogge	d47cdae27e	[Docs] Move models to appropriate section (#37338 ) * Move models * update --------- Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-22 18:23:14 +02:00
Deepak Sahu	dbfccd3c92	typo update in the parameter name (#37655 ) See L118 and L143 for the class attribute `hidden_dim`	2025-04-22 18:14:20 +02:00
Joao Gante	de8916dde6	[docs] only build `en` docs in push CI (#37677 )	2025-04-22 17:05:11 +01:00
Joao Gante	0f8c34b0a0	[cleanup] remove old scripts in `/scripts` 🧹 🧹 (#37676 ) * rm old files * not this one	2025-04-22 16:59:03 +01:00
Yao Matrix	6673081b21	enable 6 granite cases on xpu (#37569 ) * enable 6 granite cases on XPU Signed-off-by: YAO Matrix <matrix.yao@intel.com> * make them all pass on A100 Signed-off-by: N <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> * update --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com> Signed-off-by: N <matrix.yao@intel.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-22 17:55:02 +02:00
Yao Matrix	9167461a7d	enable mllama cases on xpu (#37644 ) * enable mllama testing on xpu Signed-off-by: YAO Matrix <matrix.yao@intel.com> * more mllama cases enabling Signed-off-by: YAO Matrix <matrix.yao@intel.com> * make cases pass on A100 Signed-off-by: N <matrix.yao@intel.com> --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com> Signed-off-by: N <matrix.yao@intel.com>	2025-04-22 17:39:10 +02:00
Mohamed Mekkouri	de182ba269	Refactor bitsandbytes doc (#37668 ) * doc * torch ops * fix * nits * Update docs/source/en/quantization/bitsandbytes.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-22 16:13:25 +02:00
Antonin Stefanutti	dde9b03e3b	Fix no_split_modules for Llama4 pretrained models (#37673 )	2025-04-22 16:05:12 +02:00
Marc Sun	9481e9e9f1	Fix autoround docs (#37675 ) * fix * empty	2025-04-22 15:33:13 +02:00
Mohamed Mekkouri	38c406844e	Fixing quantization tests (#37650 ) * fix * style * add capability check	2025-04-22 13:59:57 +02:00
Wenhua Cheng	b3492ff9f7	Add AutoRound quantization support (#37393 ) * add auto-round support * Update src/transformers/quantizers/auto.py Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com> * fix style issue Signed-off-by: wenhuach <wenhuach87@gmail.com> * tiny change * tiny change * refine ut and doc * revert unnecessary change * tiny change * try to fix style issue * try to fix style issue * try to fix style issue * try to fix style issue * try to fix style issue * try to fix style issue * try to fix style issue * fix doc issue * Update tests/quantization/autoround/test_auto_round.py * fix comments * Update tests/quantization/autoround/test_auto_round.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/autoround/test_auto_round.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * update doc * Update src/transformers/quantizers/quantizer_auto_round.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * update * update * fix * try to fix style issue * Update src/transformers/quantizers/auto.py Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Update docs/source/en/quantization/auto_round.md Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Update docs/source/en/quantization/auto_round.md Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Update docs/source/en/quantization/auto_round.md Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * update * fix style issue * update doc * update doc * Refine the doc * refine doc * revert one change * set sym to True by default * Enhance the unit test's robustness. * update * add torch dtype * tiny change * add awq convert test * fix typo * update * fix packing format issue * use one gpu --------- Signed-off-by: wenhuach <wenhuach87@gmail.com> Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Shen, Haihao <haihao.shen@intel.com>	2025-04-22 13:56:54 +02:00
Cyril Vallez	9608908639	Correct warm-up with fp8 (#37670 ) * start clean warmup for quantizers * style --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-22 13:12:49 +02:00
Cyril Vallez	6614209b96	Fix duplicated weights in fp8 quantization (#37667 ) * fix fp8 * Update quantizer_finegrained_fp8.py * fix circular import * Update quantizer_finegrained_fp8.py	2025-04-22 13:12:27 +02:00
Raushan Turganbay	dcf6df5b0d	[qwen-omni] fix training (#37517 ) * fix * add text config * fixup * fix docs	2025-04-22 12:36:07 +02:00
Pavel Iakubovskii	9167fadab9	Introduce GradientCheckpointingLayer (#37223 ) * GradientCheckpointingLayer * trigger * Move GC layer to a separate file * Update import * Expose and document GC layer * Fix dummy * Apply to llama-based models * Update modulars * Update a few more models for consistency * Update glm4 * Update Janus	2025-04-22 11:33:31 +01:00
Manuel de Prada Corral	413f9bbf80	Fixes #37219 : RecurrentGemma crashes for inputs longer than sliding window length (#37613 ) * fix: RecurrentGemma crashes during inference for inputs longer than sliding window width * fix recurrentgemma tests; add long test bigger than context window	2025-04-22 12:21:16 +02:00
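
Note on commit 4f9893cbbc (#37690): its message explains that `get_text_config` read an instance attribute (`self.thinker_config`) while being declared as a classmethod, which raised an `AttributeError`. The sketch below reproduces that failure mode in isolation. `ToyConfig`, `broken_get_text_config`, and the dictionary contents are hypothetical stand-ins for illustration only; this is not the library's code.

```python
class ToyConfig:
    """Hypothetical stand-in; not the real Qwen2_5OmniConfig."""

    def __init__(self):
        # Instance attribute: it exists only on constructed objects,
        # never on the class itself.
        self.thinker_config = {"text_config": "toy-text-config"}

    @classmethod
    def broken_get_text_config(cls):
        # `cls` is the class, so looking up `thinker_config` on it fails.
        return cls.thinker_config["text_config"]

    def get_text_config(self):
        # A plain instance method sees the attribute set in __init__.
        return self.thinker_config["text_config"]


config = ToyConfig()

# Calling the classmethod, even through an instance, raises AttributeError.
try:
    config.broken_get_text_config()
except AttributeError as err:
    error_message = str(err)
else:
    error_message = None

# The instance-method version works as intended.
text_config = config.get_text_config()
```

Dropping the `@classmethod` decorator is the smallest change that lets the method reach per-instance state, which matches the fix the commit message describes.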

18769 Commits