transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Yuanyuan Chen	da4ff2a5f5	Add Optional to remaining types (#37808 ) More Optional typing Signed-off-by: cyy <cyyever@outlook.com>	2025-04-28 14:20:45 +01:00
Benjamin Bossan	1a9188a54e	FIX: Faulty PEFT tests (#37757 ) Two PEFT tests are actually failing: tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_delete_adapter tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_peft_pipeline_no_warning This must have been going on for some time but was apparently never noticed. The cause is that the tests themselves are faulty, the PEFT integration is correct in these cases. test_delete_adapter The first faulty test was introduced by #34650. AFAICT, it should never have passed in the first place, the PEFT integration logic was not changed in the meantime. At this point, the logs for the PR CI are gone, so I'm not sure if the test passed back then or not. test_peft_pipeline_no_warning This test was introduced in #36783 and should also never have passed, as the self.assertNoLogs context manager only returns None, thus the assert should never have worked (mea culpa for suggesting this code snippet). Here too, the CI logs are deleted by now, so I can't check if the test already failed back then.	2025-04-28 15:10:46 +02:00
Mohamed Mekkouri	b262680af4	Add Bitnet model (#37742 ) * Adding BitNet b1.58 Model * Add testing code for BitNet * Fix format issues * Fix docstring format issues * Fix docstring * Fix docstring * Fix: weight back to uint8 * Fix * Fix format issues * Remove copy comments * Add model link to the docstring * Fix: set tie_word_embeddings default to false * Update * Generate modeling file * Change config name for automatically generating modeling file. * Generate modeling file * Fix class name * Change testing branch * Remove unused param * Fix config docstring * Add docstring for BitNetQuantConfig. * Fix docstring * Update docs/source/en/model_doc/bitnet.md Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Update docs/source/en/model_doc/bitnet.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update bitnet config * Update explanation between online and offline mode * Remove space * revert changes * more revert * spaces * update * fix-copies * doc fix * fix minor nits * empty * small nit * empty --------- Co-authored-by: Shuming Ma <shumingma@pku.edu.cn> Co-authored-by: shumingma <shmingm@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-28 15:08:46 +02:00
NielsRogge	82862ce443	[RT-DETR] Improve docs (#37814 ) Fix docs	2025-04-28 13:19:24 +02:00
Li Haoru	97e57b2545	Fix: Correct tensor shape comment in Mamba modeling (#37801 ) * Fix: Correct tensor shape comment in Mamba modeling * Update src/transformers/models/mamba/modeling_mamba.py * Update src/transformers/models/mamba/modeling_mamba.py --------- Co-authored-by: ShadyPi <11342288+shadypi@user.noreply.gitee.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-04-28 11:56:42 +01:00
Ken J	33493542aa	[doc] fix the code examples in qwen doc (#37803 )	2025-04-28 11:56:32 +01:00
co63oc	d5fa7d2d19	Fix typos in strings and comments (#37799 )	2025-04-28 11:39:11 +01:00
Mohamed Mekkouri	f466603963	Define warmup allocator for torchao quantization (#37764 ) * torchao allocator * add comment --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-28 10:45:55 +02:00
Yuan Wu	a41b6d9b5c	Fix the fsdp config cannot work issue. (#37549 ) * Fix the fsdp config cannot work issue. Signed-off-by: yuanwu <yuan.wu@intel.com> * Check the fsdp_config type Signed-off-by: yuanwu <yuan.wu@intel.com> * Add the accelerate_fsdp_config test Signed-off-by: yuanwu <yuan.wu@intel.com> * fix error of make style Signed-off-by: yuanwu <yuan.wu@intel.com> * Add key check Signed-off-by: yuanwu <yuan.wu@intel.com> --------- Signed-off-by: yuanwu <yuan.wu@intel.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-28 10:44:51 +02:00
Guang Yang	816b37010c	Gemma3 is Torch Exportable (#37728 ) * Gemma3 is Torch Exportable * Expand the support to other models using HybridCache --------- Co-authored-by: Guang Yang <guangyang@fb.com>	2025-04-28 09:36:46 +02:00
SR	397a5ede33	Fix error message in `hub.py` (#37796 ) Fix error message	2025-04-25 14:03:06 -07:00
martin-harmonic	6ce675ee81	fix performance issue in convert_ids_to_tokens (#37773 )	2025-04-25 22:00:50 +02:00
saswatmeher	57c620bf8a	chore: update SigLIP2 model card (#37624 ) * update siglip2 model card * Update docs/source/en/model_doc/siglip2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/siglip2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/siglip2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/siglip2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/siglip2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/siglip2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * address comments * separate naflex and fixres variant * Update docs/source/en/model_doc/siglip2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/siglip2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/siglip2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-25 12:46:17 -07:00
Minki Kim	eb4afdd1fb	[i18n-KO] Translated `keypoint_detection.md` to Korean (#36649 ) * fix: manual edits * fix: manual edits * fix: manual edits * Update docs/source/ko/tasks/keypoint_detection.md Anchor lower modify Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/keypoint_detection.md connect letter Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/keypoint_detection.md modify to usual words Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/keypoint_detection.md modify extension word Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ko/tasks/keypoint_detection.md modify to usual words Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/keypoint_detection.md modify to usual words Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/tasks/keypoint_detection.md modify to usual representation Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> --------- Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-25 12:24:12 -07:00
jiqing-feng	555693fbfa	fix mpt test of different outputs from cuda (#37691 ) * fix mpt test Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix mpt tests with Expectations Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix typo Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix output Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-04-25 18:04:56 +02:00
Cyril Vallez	0cfbf9c95b	Force torch>=2.6 with torch.load to avoid vulnerability issue (#37785 ) * fix all main files * fix test files * oups forgot modular * add link * update message	2025-04-25 16:57:09 +02:00
Cyril Vallez	eefc86aa31	Fix tensor parallel with non-floating dtypes (#37790 ) fix	2025-04-25 15:48:16 +02:00
co63oc	214062201e	Fix typos in strings and comments (#37784 ) * Fix typos in strings and comments * Fix	2025-04-25 13:47:25 +01:00
Cyril Vallez	ba3bd37253	Align gpt2 mask preparation to #37612 (#37787 ) Update modeling_gpt2.py	2025-04-25 12:50:30 +02:00
Yih-Dar	50d231a806	unpin pytest<8 (#37768 ) * pytest 8 * pytest 8 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-25 12:34:33 +02:00
Raushan Turganbay	79d4bc761d	[causal mask] fix preparation with multi-gpu (#37612 ) * fix multi-gpu * forgot non-copied models * fixup	2025-04-25 09:34:18 +02:00
김가영	7bb619d710	🌐 [i18n-KO] Translated `roberta.md` to Korean (#37069 ) * docs: ko: roberta.md * fix: manual edits * Apply suggestions from code review Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> --------- Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>	2025-04-24 10:00:24 -07:00
AfafEL	cfe666919e	Update model card for Gemma (#37674 ) * Update Gemma model card * Updated after review * Update following review	2025-04-24 09:58:46 -07:00
Mohamed Mekkouri	b2d70e9c49	Fix auto-round hfoption (#37759 ) fix	2025-04-24 18:19:38 +02:00
lewtun	acdbe627e3	Guard DeepSpeed imports (#37755 ) * Guard DeepSpeed imports * Fix import * Import deepspeed consistently --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-24 18:16:34 +02:00
Joao Gante	af6d2756d9	[deps] pin max `torch` version (#37760 ) pin max pt version :(	2025-04-24 16:18:25 +01:00
co63oc	0302aa1c6e	Fix typos in comments (#37694 ) Signed-off-by: co63oc <co63oc@users.noreply.github.com>	2025-04-24 15:59:56 +01:00
Wing Lian	af000ceb92	Fix load of rng state for resuming training from checkpoint (#37162 ) Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-24 16:55:34 +02:00
Cyril Vallez	0af0a5f969	Fix tied weight loading with TP and loading sub state_dicts (#37758 ) Update modeling_utils.py	2025-04-24 16:47:40 +02:00
flashJd	3af24f7e27	Refine parameter type annotations (#37666 )	2025-04-24 15:37:13 +01:00
Kaiwen	22e3da92b7	Fix wrong input shapes in doc-string of models (#37729 ) * Fix wrong position_ids shape in doc Supported by ClvpDecoder.forward, line 1212--1215: src/transformers/models/clvp/modeling_clvp.py: 1212 if inputs_embeds is None: 1213 inputs_embeds = self.input_embeds_layer(input_ids) 1214 position_embeds = self.position_embeds_layer(position_ids) 1215 inputs_embeds = inputs_embeds + position_embeds * Fix possibly wrong input_ids shape in doc Since 'input_ids_length' was mentioned immediately after the shape `(batch_size, sequence_length)`, it doesn't make sense to me for `input_ids` to have such shape---IMO it ought to have shape `(batch_size, input_ids_length)` instead. * Fix possibly wrong inputs_embeds shape in doc Supported by CTRLModel.forward, line 448--449: src/transformers/models/ctrl/modeling_ctrl.py: 448 if inputs_embeds is None: 449 inputs_embeds = self.w(input_ids) This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a. * Fix possibly wrong token_type_ids shape in doc Supported by CTRLModel.forward, line 441--460: src/transformers/models/ctrl/modeling_ctrl.py: 441 if token_type_ids is not None: 442 token_type_ids = token_type_ids.view(-1, input_shape[-1]) 443 token_type_embeds = self.w(token_type_ids) 444 token_type_embeds = np.sqrt(self.d_model_size) 445 else: 446 token_type_embeds = 0 447 448 if inputs_embeds is None: 449 inputs_embeds = self.w(input_ids) 450 # inputs_embeds = embedded.unsqueeze(0) if len(input_ids.shape)<2 else embedded 451 seq_len = input_shape[-1] 452 mask = torch.triu(torch.ones(seq_len + past_length, seq_len + past_length), 1).to(device) 453 454 inputs_embeds = np.sqrt(self.d_model_size) 455 456 # `self.pos_encoding` won't be sent to the correct device along the model, so we do it manually. 
457 self.pos_encoding = self.pos_encoding.to(device) 458 pos_embeds = self.pos_encoding[position_ids, :] 459 460 hidden_states = inputs_embeds + pos_embeds + token_type_embeds This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a. * Fix possibly wrong position_ids shape in doc Supported by CTRLModel.forward, line 448--460: src/transformers/models/ctrl/modeling_ctrl.py: 448 if inputs_embeds is None: 449 inputs_embeds = self.w(input_ids) 450 # inputs_embeds = embedded.unsqueeze(0) if len(input_ids.shape)<2 else embedded 451 seq_len = input_shape[-1] 452 mask = torch.triu(torch.ones(seq_len + past_length, seq_len + past_length), 1).to(device) 453 454 inputs_embeds = np.sqrt(self.d_model_size) 455 456 # `self.pos_encoding` won't be sent to the correct device along the model, so we do it manually. 457 self.pos_encoding = self.pos_encoding.to(device) 458 pos_embeds = self.pos_encoding[position_ids, :] 459 460 hidden_states = inputs_embeds + pos_embeds + token_type_embeds This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a. 
Fix wrong token_type_ids shape in doc Supported by TFCTRLMainLayer.call, line 376--394: src/transformers/models/ctrl/modeling_tf_ctrl.py: 376 if token_type_ids is not None: 377 token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]]) 378 token_type_embeds = self.w(token_type_ids) 379 token_type_embeds = tf.math.sqrt(tf.cast(self.d_model_size, dtype=token_type_embeds.dtype)) 380 else: 381 token_type_embeds = tf.constant(0.0) 382 position_ids = tf.reshape(position_ids, [-1, shape_list(position_ids)[-1]]) 383 384 if inputs_embeds is None: 385 check_embeddings_within_bounds(input_ids, self.w.input_dim) 386 inputs_embeds = self.w(input_ids) 387 seq_len = input_shape[-1] 388 mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0) 389 390 inputs_embeds = tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype)) 391 392 pos_embeds = tf.gather(self.pos_encoding, position_ids) 393 pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype) 394 hidden_states = inputs_embeds + pos_embeds + token_type_embeds * Fix wrong position_ids shape in doc Supported by TFCTRLMainLayer.call, line 384--394: src/transformers/models/ctrl/modeling_tf_ctrl.py: 384 if inputs_embeds is None: 385 check_embeddings_within_bounds(input_ids, self.w.input_dim) 386 inputs_embeds = self.w(input_ids) 387 seq_len = input_shape[-1] 388 mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0) 389 390 inputs_embeds = tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype)) 391 392 pos_embeds = tf.gather(self.pos_encoding, position_ids) 393 pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype) 394 hidden_states = inputs_embeds + pos_embeds + token_type_embeds Fix wrong inputs_embeds shape in doc Supported by TFCTRLMainLayer.call, line 384--394: src/transformers/models/ctrl/modeling_tf_ctrl.py: 384 if inputs_embeds is None: 385 check_embeddings_within_bounds(input_ids, self.w.input_dim) 386 inputs_embeds = self.w(input_ids) 387 seq_len = 
input_shape[-1] 388 mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0) 389 390 inputs_embeds = tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype)) 391 392 pos_embeds = tf.gather(self.pos_encoding, position_ids) 393 pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype) 394 hidden_states = inputs_embeds + pos_embeds + token_type_embeds Fix wrong inputs_embeds shape in doc Supported by ClvpDecoder.forward, line 1212--1213: src/transformers/models/clvp/modeling_clvp.py: 1212 if inputs_embeds is None: 1213 inputs_embeds = self.input_embeds_layer(input_ids) * Fix wrong position_ids shape in doc Supported by FlaxGemmaPreTrainedModel.__call__, line 502--508: src/transformers/models/gemma/modeling_flax_gemma.py: 502 batch_size, sequence_length = input_ids.shape 503 504 if position_ids is None: 505 if past_key_values is not None: 506 raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.") 507 508 position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length)) * Fix wrong position_ids shape in doc Supported by FlaxGPT2PreTrainedModel.__call__, line 482--488: src/transformers/models/gpt2/modeling_flax_gpt2.py: 482 batch_size, sequence_length = input_ids.shape 483 484 if position_ids is None: 485 if past_key_values is not None: 486 raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.") 487 488 position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length)) * Fix wrong position_ids shape in doc Supported by GPT2Model.forward, line 918--921: src/transformers/models/gpt2/modeling_gpt2.py: 918 if inputs_embeds is None: 919 inputs_embeds = self.wte(input_ids) 920 position_embeds = self.wpe(position_ids) 921 hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device) * Fix wrong inputs_embeds shape in doc Supported by GPT2Model.forward, line 918--919: src/transformers/models/gpt2/modeling_gpt2.py: 
918 if inputs_embeds is None: 919 inputs_embeds = self.wte(input_ids) * Fix wrong labels shape in doc Supported by GPT2LMHeadModel.forward, line 1156--1157: src/transformers/models/gpt2/modeling_gpt2.py: 1156 Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set 1157 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100` * Fix wrong labels shape in doc Supported by GPT2DoubleHeadsModel.forward, line 1314--1315: src/transformers/models/gpt2/modeling_gpt2.py: 1314 Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set 1315 `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size - 1]`. All labels set to * Fix wrong token_type_ids shape in doc Supported by TFGPT2MainLayer.call, line 486--500: src/transformers/models/gpt2/modeling_tf_gpt2.py: 486 if inputs_embeds is None: 487 check_embeddings_within_bounds(input_ids, self.config.vocab_size) 488 inputs_embeds = self.wte(input_ids) 489 490 position_embeds = self.wpe(position_ids) 491 492 if token_type_ids is not None: 493 token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]]) 494 token_type_embeds = self.wte(token_type_ids) 495 else: 496 token_type_embeds = tf.constant(0.0) 497 498 position_embeds = tf.cast(position_embeds, dtype=inputs_embeds.dtype) 499 token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype) 500 hidden_states = inputs_embeds + position_embeds + token_type_embeds * Fix wrong position_ids shape in doc Supported by TFGPT2MainLayer.call, line 486--500: src/transformers/models/gpt2/modeling_tf_gpt2.py: 486 if inputs_embeds is None: 487 check_embeddings_within_bounds(input_ids, self.config.vocab_size) 488 inputs_embeds = self.wte(input_ids) 489 490 position_embeds = self.wpe(position_ids) 491 492 if token_type_ids is not None: 493 token_type_ids = tf.reshape(token_type_ids, [-1, 
shape_list(token_type_ids)[-1]]) 494 token_type_embeds = self.wte(token_type_ids) 495 else: 496 token_type_embeds = tf.constant(0.0) 497 498 position_embeds = tf.cast(position_embeds, dtype=inputs_embeds.dtype) 499 token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype) 500 hidden_states = inputs_embeds + position_embeds + token_type_embeds * Fix wrong inputs_embeds shape in doc Supported by TFGPT2MainLayer.call, line 486--488: src/transformers/models/gpt2/modeling_tf_gpt2.py: 486 if inputs_embeds is None: 487 check_embeddings_within_bounds(input_ids, self.config.vocab_size) 488 inputs_embeds = self.wte(input_ids) * Fix wrong position_ids shape in doc Supported by GPTBigCodeModel.forward, line 962--965: src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py: 962 if inputs_embeds is None: 963 inputs_embeds = self.wte(input_ids) 964 position_embeds = self.wpe(position_ids) 965 hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device) * Fix wrong inputs_embeds shape in doc Supported by GPTBigCodeModel.forward, line 962--963: src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py: 962 if inputs_embeds is None: 963 inputs_embeds = self.wte(input_ids) * Fix wrong labels shape in doc Supported by GPTBigCodeForCausalLM.forward, line 1158--1159: src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py: 1158 Labels for language modeling. Note that the labels are shifted inside the model, i.e. 
you can set 1159 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100` * Fix wrong position_ids shape in doc Supported by FlaxGPTNeoModule.__call__, line 549--552: src/transformers/models/gpt_neo/modeling_flax_gpt_neo.py: 549 input_embeds = self.wte(input_ids.astype("i4")) 550 position_embeds = self.wpe(position_ids.astype("i4")) 551 552 hidden_states = input_embeds + position_embeds * Fix wrong position_ids shape in doc Supported by GPTNeoModel.forward, line 685--720: src/transformers/models/gpt_neo/modeling_gpt_neo.py: 685 if inputs_embeds is None: 686 inputs_embeds = self.wte(input_ids) 687 688 # kept for BC (non `Cache` `past_key_values` inputs) 689 return_legacy_cache = False 690 if use_cache and not isinstance(past_key_values, Cache): 691 return_legacy_cache = True 692 if past_key_values is None: 693 past_key_values = DynamicCache() 694 else: 695 past_key_values = DynamicCache.from_legacy_cache(past_key_values) 696 logger.warning_once( 697 "We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and " 698 "will be removed in v4.47. 
Please convert your cache or use an appropriate `Cache` class " 699 "(https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)" 700 ) 701 702 seq_length = inputs_embeds.shape[1] 703 if cache_position is None: 704 past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0 705 cache_position = torch.arange(past_seen_tokens, past_seen_tokens + seq_length, device=inputs_embeds.device) 706 707 if position_ids is None: 708 position_ids = cache_position.unsqueeze(0) 709 710 causal_mask = self._update_causal_mask( 711 attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions 712 ) 713 714 # Prepare head mask if needed 715 # 1.0 in head_mask indicate we keep the head 716 # attention_probs has shape bsz x num_heads x N x N 717 # head_mask has shape n_layer x batch x num_heads x N x N 718 head_mask = self.get_head_mask(head_mask, self.config.num_layers) 719 position_embeds = self.wpe(position_ids) 720 hidden_states = inputs_embeds + position_embeds * Fix wrong inputs_embeds shape in doc Supported by GPTNeoModel.forward, line 685--686: src/transformers/models/gpt_neo/modeling_gpt_neo.py: 685 if inputs_embeds is None: 686 inputs_embeds = self.wte(input_ids) * Fix wrong labels shape in doc Supported by GPTNeoForCausalLM.forward, line 968--969: src/transformers/models/gpt_neo/modeling_gpt_neo.py: 968 Labels for language modeling. Note that the labels are shifted inside the model, i.e. 
you can set 969 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100` * Fix wrong position_ids shape in doc Supported by FlaxGPTJPreTrainedModel.__call__, line 455--461: src/transformers/models/gptj/modeling_flax_gptj.py: 455 batch_size, sequence_length = input_ids.shape 456 457 if position_ids is None: 458 if past_key_values is not None: 459 raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.") 460 461 position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length)) * Fix wrong token_type_ids shape in doc Supported by TFGPTJMainLayer.call, line 482--493: src/transformers/models/gptj/modeling_tf_gptj.py: 482 if inputs_embeds is None: 483 check_embeddings_within_bounds(input_ids, self.wte.vocab_size) 484 inputs_embeds = self.wte(input_ids, mode="embedding") 485 486 if token_type_ids is not None: 487 token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]]) 488 token_type_embeds = self.wte(token_type_ids, mode="embedding") 489 else: 490 token_type_embeds = tf.constant(0.0) 491 492 token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype) 493 hidden_states = inputs_embeds + token_type_embeds * Fix wrong position_ids shape in doc Supported by TFGPTJMainLayer.call, line 434--449: src/transformers/models/gptj/modeling_tf_gptj.py: 434 elif input_ids is not None: 435 input_shape = shape_list(input_ids) 436 input_ids = tf.reshape(input_ids, [-1, input_shape[-1]]) 437 elif inputs_embeds is not None: 438 input_shape = shape_list(inputs_embeds)[:-1] 439 else: 440 raise ValueError("You have to specify either input_ids or inputs_embeds") 441 442 if past_key_values is None: 443 past_length = 0 444 past_key_values = [None] * len(self.h) 445 else: 446 past_length = shape_list(past_key_values[0][0])[-2] 447 448 if position_ids is None: 449 position_ids = tf.expand_dims(tf.range(past_length, input_shape[-1] + past_length), 
axis=0) * Fix wrong inputs_embeds shape in doc Supported by TFGPTJMainLayer.call, line 482--484: src/transformers/models/gptj/modeling_tf_gptj.py: 482 if inputs_embeds is None: 483 check_embeddings_within_bounds(input_ids, self.wte.vocab_size) 484 inputs_embeds = self.wte(input_ids, mode="embedding") * Fix wrong labels shape in doc Supported by TFGPTJForCausalLM.call, line 812--813: src/transformers/models/gptj/modeling_tf_gptj.py: 812 Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set 813 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100` * Fix possibly wrong input_ids shape in doc Since 'input_ids_length' was mentioned immediately after the shape `(batch_size, sequence_length)`, it doesn't make sense to me for `input_ids` to have such shape---IMO it ought to have shape `(batch_size, input_ids_length)` instead. * Fix possibly wrong token_type_ids shape in doc Supported by ImageGPTModel.forward, line 773--780: src/transformers/models/imagegpt/modeling_imagegpt.py: 773 if inputs_embeds is None: 774 inputs_embeds = self.wte(input_ids) 775 position_embeds = self.wpe(position_ids) 776 hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device) 777 778 if token_type_ids is not None: 779 token_type_embeds = self.wte(token_type_ids) 780 hidden_states = hidden_states + token_type_embeds This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3. * Fix possibly wrong position_ids shape in doc Supported by ImageGPTModel.forward, line 773--776: src/transformers/models/imagegpt/modeling_imagegpt.py: 773 if inputs_embeds is None: 774 inputs_embeds = self.wte(input_ids) 775 position_embeds = self.wpe(position_ids) 776 hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device) This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3. 
* Fix possibly wrong inputs_embeds shape in doc Supported by ImageGPTModel.forward, line 773--774: src/transformers/models/imagegpt/modeling_imagegpt.py: 773 if inputs_embeds is None: 774 inputs_embeds = self.wte(input_ids) This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3. * Fix possibly wrong labels shape in doc Supported by ImageGPTForCausalImageModeling.forward, line 923--924: src/transformers/models/imagegpt/modeling_imagegpt.py: 923 Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set 924 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100` This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3. * Fix possibly wrong labels shape in doc Supported by ImageGPTModel.forward, line 665--666: src/transformers/models/imagegpt/modeling_imagegpt.py: 665 Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set 666 `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100` This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3. 
* Fix wrong position_ids shape in doc Supported by FlaxLlamaPreTrainedModel.__call__, line 484--490: src/transformers/models/llama/modeling_flax_llama.py: 484 batch_size, sequence_length = input_ids.shape 485 486 if position_ids is None: 487 if past_key_values is not None: 488 raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.") 489 490 position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length)) * Fix wrong position_ids shape in doc Supported by FlaxMistralPreTrainedModel.__call__, line 478--484: src/transformers/models/mistral/modeling_flax_mistral.py: 478 batch_size, sequence_length = input_ids.shape 479 480 if position_ids is None: 481 if past_key_values is not None: 482 raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.") 483 484 position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))	2025-04-24 15:36:03 +01:00
Joao Gante	4d64c38593	[generate] fix default autocompile case on gpu (#37756 )	2025-04-24 15:08:38 +01:00
robert	43bb4c0456	Fix qwen2_5 get_rope_index tensor device locations (#37597 ) * Fix qwen2_5 get_rope_index tensor device locations * simpler fix * edit right file for modular model * add a test * try normalizing type to fix non-video * fix some imports * add a video forward test with dummy input	2025-04-24 16:04:38 +02:00
Prem Kumar M	dd2649fa98	updated hidden_features for FlaxDinov2SwiGLUFFN in Dinov2 (#37747 ) Flax Dinov2: updated hidden_features in FlaxDinov2SwiGLUFFN Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-04-24 14:30:31 +01:00
Joao Gante	8bdd4f2acd	[generate] skip compilation on cpu offload (#37709 ) * skip compilation on cpu offload * add test * better logic * docstring * boolean logic * add disk offload check * warn users if compilation options are set but compilation doesn't happen * fix test --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-24 14:08:17 +01:00
Poedator	7c62e69326	`GPT2Model` StaticCache support (#35761 ) * initial GPT2 changes * causal_mask support * return_legacy_cache * cleanup * fix1 * outputs shape fixes * gpt2 return fix * pkv, attn fixes * fix dual_head * is_causal arg fix * decision transformer updated * style fix * batch_size from inputs_embeds * DecisionTransformerModel fixes * cross-attn support + cache warning * x-attn @decision * EDCache proper init * simplified logic in `if use_cache:` for GPT2Model * @deprecate_kwarg for DecisionTr attn fwd * @deprecate_kwarg in gpt2 * deprecation version updated to 4.51 * kwargs in gradient_checkpointing_fn * rename next_cache to past_key_values * attention_mask prep * +cache_position in GPT2DoubleHeadsModel * undo kwargs in gradient checkpointing * moved up `if self.gradient_checkpointing` * consistency in decision_transformer * pastkv, cache_pos in grad_checkpt args * rm _reorder_cache * output_attentions streamlined * decision_transformer consistency * return_legacy_cache improved * ClvpForCausalLM used for legacy cache test now * is_causal fixed * attn_output cleanup * consistency @ decision_transformer * Updated deprecation notice version to 4.52 * upd deprecation * consistent legacy cache code in decision transformers * next_cache -> past_kv in decision_tr * cache support flags in decision_transf * rm legacy cache warning * consistency in cache init for decision transf * no Static Cache for Decision Transformer --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-04-24 14:46:35 +02:00
Joao Gante	9f927c8250	[cache] fix `HybridCache` init when `device` is passed (#37718 ) fix device init	2025-04-24 13:36:52 +01:00
amd-xiaoyu12	4fee320926	Expand quantized data type support for tensor parallelism (#37719 ) Update tensor_parallel.py Co-authored-by: Xiao YU <Xiao.YU@xilinx.com>	2025-04-24 14:34:32 +02:00
Yih-Dar	0f7940bb3f	Update `MllamaForConditionalGenerationIntegrationTest` (#37750 ) * fix 1 * fix 2 * fix 3 * fix 4 * fix 5 * fix 6 * trigger CI --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-24 14:29:46 +02:00
Yih-Dar	7e6f36cd38	Skip all `AriaForConditionalGenerationIntegrationTest` on `T4` (#37746 ) * skip * ruff * trigger CI --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-24 14:11:56 +02:00
Zhen	0327d0f7f2	[performance_optim] define flash attention mask on NPU device directly (#37698 ) Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-04-24 14:06:47 +02:00
Cyril Vallez	14e28bd721	Correctly raise errors when downloading tokenizer files (#37740 ) * first try * Update tokenization_utils_base.py * Update tokenization_utils_base.py * standardize	2025-04-24 12:53:07 +02:00
BakerBunker	0ec0495967	Fix `embeds_to_talker` device in Qwen2.5-Omni (#37739 ) Fix `embeds_to_talker` device Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>	2025-04-24 12:49:57 +02:00
NanoCode012	72e4844059	fix: learning_rate logged as tensor causing save issue with deepspeed (#37704 ) * fix: learning_rate logged as tensor causing save issue with deepspeed * chore: lint --------- Co-authored-by: NanoCode012 <chanvichet@Chanvichets-MacBook-Pro.local> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-24 12:20:47 +02:00
Raushan Turganbay	1cfcbfcab8	[VLMs] fix flash-attention tests (#37603 ) * fix one test * fa2 ln test * remove keys from config recursively * fix * fixup	2025-04-24 11:48:11 +02:00
Mohamed Mekkouri	02baa61fab	Make sure torch_is_available before using torch.distributed (#37693 ) fix	2025-04-24 11:31:35 +02:00
Fanli Lin	864e9636ff	[tests] fix `test_nemotron_8b_generation_sdpa` (#37665 ) add max_new_tokens	2025-04-24 11:28:35 +02:00
Mohamed Mekkouri	9b3bf4a206	Fix torchao doc examples (#37697 ) fix Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-24 11:10:27 +02:00
BakerBunker	3ed56bea0f	Fix inference bugs in Qwen2.5 Omni (#37701 ) * Init `SinusoidsPositionEmbedding` with float to avoid precision problem * fix hidden_state for talker * Update modular_qwen2_5_omni.py * Move hidden processing out from thinker * fixup --------- Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>	2025-04-24 10:51:44 +02:00
jiqing-feng	b7f7aa78a0	Fix Aria tests (#37444 ) * update aria tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * add cuda tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * check outputs for cpu and cuda and xpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * check outputs for cpu and cuda and xpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * check outputs for cpu and cuda and xpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * check output for each device Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix style Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix style Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix xpu output Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * add comments and use assert list equal Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * rm pad token assign Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-04-24 10:51:29 +02:00


18854 Commits