Commit Graph

19383 Commits

Author SHA1 Message Date
김가영
7bb619d710
🌐 [i18n-KO] Translated roberta.md to Korean (#37069)
* docs: ko: roberta.md

* fix: manual edits

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2025-04-24 10:00:24 -07:00
AfafEL
cfe666919e
Update model card for Gemma (#37674)
* Update Gemma model card

* Updated after review

* Update following review
2025-04-24 09:58:46 -07:00
Mohamed Mekkouri
b2d70e9c49
Fix auto-round hfoption (#37759)
fix
2025-04-24 18:19:38 +02:00
lewtun
acdbe627e3
Guard DeepSpeed imports (#37755)
* Guard DeepSpeed imports

* Fix import

* Import deepspeed consistently

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 18:16:34 +02:00
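
For context, a minimal sketch of the import-guarding pattern this commit title describes; the helper name and structure are illustrative assumptions, not the PR's exact diff:

    import importlib.util

    def is_deepspeed_available() -> bool:
        # Detect the package without importing it, so the check is cheap and safe.
        return importlib.util.find_spec("deepspeed") is not None

    # Only touch deepspeed symbols behind the guard; machines without the optional
    # dependency can still import this module.
    if is_deepspeed_available():
        import deepspeed  # noqa: F401
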
Joao Gante
af6d2756d9
[deps] pin max torch version (#37760)
pin max pt version :(
2025-04-24 16:18:25 +01:00
co63oc
0302aa1c6e
Fix typos in comments (#37694)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-04-24 15:59:56 +01:00
Wing Lian
af000ceb92
Fix load of rng state for resuming training from checkpoint (#37162)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 16:55:34 +02:00
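
As background, the general PyTorch pattern for checkpointing RNG state looks roughly like the sketch below; this is an assumed illustration of the mechanism the fix touches, not the Trainer's actual code:

    import torch

    # Save RNG state alongside a checkpoint so a resumed run reproduces the same
    # shuffling and dropout masks.
    rng_state = {"torch": torch.get_rng_state()}
    if torch.cuda.is_available():
        rng_state["cuda"] = torch.cuda.get_rng_state_all()
    torch.save(rng_state, "rng_state.pth")

    # When resuming from the checkpoint, restore the saved state.
    loaded = torch.load("rng_state.pth")
    torch.set_rng_state(loaded["torch"])
    if torch.cuda.is_available() and "cuda" in loaded:
        torch.cuda.set_rng_state_all(loaded["cuda"])
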
Cyril Vallez
0af0a5f969
Fix tied weight loading with TP and loading sub state_dicts (#37758)
Update modeling_utils.py
2025-04-24 16:47:40 +02:00
flashJd
3af24f7e27
Refine parameter type annotations (#37666) 2025-04-24 15:37:13 +01:00
Kaiwen
22e3da92b7
Fix wrong input shapes in doc-string of models (#37729)
* Fix wrong position_ids shape in doc

Supported by ClvpDecoder.forward, line 1212--1215:

src/transformers/models/clvp/modeling_clvp.py:
  1212	        if inputs_embeds is None:
  1213	            inputs_embeds = self.input_embeds_layer(input_ids)
  1214	        position_embeds = self.position_embeds_layer(position_ids)
  1215	        inputs_embeds = inputs_embeds + position_embeds

* Fix possibly wrong input_ids shape in doc

Since 'input_ids_length' is mentioned immediately after the shape `(batch_size, sequence_length)`, it does not make sense for `input_ids` to have that shape; it ought to have shape `(batch_size, input_ids_length)` instead.

* Fix possibly wrong inputs_embeds shape in doc

Supported by CTRLModel.forward, line 448--449:

src/transformers/models/ctrl/modeling_ctrl.py:
   448	        if inputs_embeds is None:
   449	            inputs_embeds = self.w(input_ids)

This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a.

* Fix possibly wrong token_type_ids shape in doc

Supported by CTRLModel.forward, line 441--460:

src/transformers/models/ctrl/modeling_ctrl.py:
   441	        if token_type_ids is not None:
   442	            token_type_ids = token_type_ids.view(-1, input_shape[-1])
   443	            token_type_embeds = self.w(token_type_ids)
   444	            token_type_embeds *= np.sqrt(self.d_model_size)
   445	        else:
   446	            token_type_embeds = 0
   447
   448	        if inputs_embeds is None:
   449	            inputs_embeds = self.w(input_ids)
   450	        # inputs_embeds = embedded.unsqueeze(0) if len(input_ids.shape)<2 else embedded
   451	        seq_len = input_shape[-1]
   452	        mask = torch.triu(torch.ones(seq_len + past_length, seq_len + past_length), 1).to(device)
   453
   454	        inputs_embeds *= np.sqrt(self.d_model_size)
   455
   456	        # `self.pos_encoding` won't be sent to the correct device along the model, so we do it manually.
   457	        self.pos_encoding = self.pos_encoding.to(device)
   458	        pos_embeds = self.pos_encoding[position_ids, :]
   459
   460	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a.

* Fix possibly wrong position_ids shape in doc

Supported by CTRLModel.forward, line 448--460:

src/transformers/models/ctrl/modeling_ctrl.py:
   448	        if inputs_embeds is None:
   449	            inputs_embeds = self.w(input_ids)
   450	        # inputs_embeds = embedded.unsqueeze(0) if len(input_ids.shape)<2 else embedded
   451	        seq_len = input_shape[-1]
   452	        mask = torch.triu(torch.ones(seq_len + past_length, seq_len + past_length), 1).to(device)
   453
   454	        inputs_embeds *= np.sqrt(self.d_model_size)
   455
   456	        # `self.pos_encoding` won't be sent to the correct device along the model, so we do it manually.
   457	        self.pos_encoding = self.pos_encoding.to(device)
   458	        pos_embeds = self.pos_encoding[position_ids, :]
   459
   460	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a.

* Fix wrong token_type_ids shape in doc

Supported by TFCTRLMainLayer.call, line 376--394:

src/transformers/models/ctrl/modeling_tf_ctrl.py:
   376	        if token_type_ids is not None:
   377	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   378	            token_type_embeds = self.w(token_type_ids)
   379	            token_type_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, dtype=token_type_embeds.dtype))
   380	        else:
   381	            token_type_embeds = tf.constant(0.0)
   382	        position_ids = tf.reshape(position_ids, [-1, shape_list(position_ids)[-1]])
   383
   384	        if inputs_embeds is None:
   385	            check_embeddings_within_bounds(input_ids, self.w.input_dim)
   386	            inputs_embeds = self.w(input_ids)
   387	        seq_len = input_shape[-1]
   388	        mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
   389
   390	        inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
   391
   392	        pos_embeds = tf.gather(self.pos_encoding, position_ids)
   393	        pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
   394	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

* Fix wrong position_ids shape in doc

Supported by TFCTRLMainLayer.call, line 384--394:

src/transformers/models/ctrl/modeling_tf_ctrl.py:
   384	        if inputs_embeds is None:
   385	            check_embeddings_within_bounds(input_ids, self.w.input_dim)
   386	            inputs_embeds = self.w(input_ids)
   387	        seq_len = input_shape[-1]
   388	        mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
   389
   390	        inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
   391
   392	        pos_embeds = tf.gather(self.pos_encoding, position_ids)
   393	        pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
   394	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

* Fix wrong inputs_embeds shape in doc

Supported by TFCTRLMainLayer.call, line 384--394:

src/transformers/models/ctrl/modeling_tf_ctrl.py:
   384	        if inputs_embeds is None:
   385	            check_embeddings_within_bounds(input_ids, self.w.input_dim)
   386	            inputs_embeds = self.w(input_ids)
   387	        seq_len = input_shape[-1]
   388	        mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
   389
   390	        inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
   391
   392	        pos_embeds = tf.gather(self.pos_encoding, position_ids)
   393	        pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
   394	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

* Fix wrong inputs_embeds shape in doc

Supported by ClvpDecoder.forward, line 1212--1213:

src/transformers/models/clvp/modeling_clvp.py:
  1212	        if inputs_embeds is None:
  1213	            inputs_embeds = self.input_embeds_layer(input_ids)

* Fix wrong position_ids shape in doc

Supported by FlaxGemmaPreTrainedModel.__call__, line 502--508:

src/transformers/models/gemma/modeling_flax_gemma.py:
   502	        batch_size, sequence_length = input_ids.shape
   503
   504	        if position_ids is None:
   505	            if past_key_values is not None:
   506	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   507
   508	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong position_ids shape in doc

Supported by FlaxGPT2PreTrainedModel.__call__, line 482--488:

src/transformers/models/gpt2/modeling_flax_gpt2.py:
   482	        batch_size, sequence_length = input_ids.shape
   483
   484	        if position_ids is None:
   485	            if past_key_values is not None:
   486	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   487
   488	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong position_ids shape in doc

Supported by GPT2Model.forward, line 918--921:

src/transformers/models/gpt2/modeling_gpt2.py:
   918	        if inputs_embeds is None:
   919	            inputs_embeds = self.wte(input_ids)
   920	        position_embeds = self.wpe(position_ids)
   921	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)

* Fix wrong inputs_embeds shape in doc

Supported by GPT2Model.forward, line 918--919:

src/transformers/models/gpt2/modeling_gpt2.py:
   918	        if inputs_embeds is None:
   919	            inputs_embeds = self.wte(input_ids)

* Fix wrong labels shape in doc

Supported by GPT2LMHeadModel.forward, line 1156--1157:

src/transformers/models/gpt2/modeling_gpt2.py:
  1156	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
  1157	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix wrong labels shape in doc

Supported by GPT2DoubleHeadsModel.forward, line 1314--1315:

src/transformers/models/gpt2/modeling_gpt2.py:
  1314	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
  1315	            `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size - 1]`. All labels set to

* Fix wrong token_type_ids shape in doc

Supported by TFGPT2MainLayer.call, line 486--500:

src/transformers/models/gpt2/modeling_tf_gpt2.py:
   486	        if inputs_embeds is None:
   487	            check_embeddings_within_bounds(input_ids, self.config.vocab_size)
   488	            inputs_embeds = self.wte(input_ids)
   489
   490	        position_embeds = self.wpe(position_ids)
   491
   492	        if token_type_ids is not None:
   493	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   494	            token_type_embeds = self.wte(token_type_ids)
   495	        else:
   496	            token_type_embeds = tf.constant(0.0)
   497
   498	        position_embeds = tf.cast(position_embeds, dtype=inputs_embeds.dtype)
   499	        token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
   500	        hidden_states = inputs_embeds + position_embeds + token_type_embeds

* Fix wrong position_ids shape in doc

Supported by TFGPT2MainLayer.call, line 486--500:

src/transformers/models/gpt2/modeling_tf_gpt2.py:
   486	        if inputs_embeds is None:
   487	            check_embeddings_within_bounds(input_ids, self.config.vocab_size)
   488	            inputs_embeds = self.wte(input_ids)
   489
   490	        position_embeds = self.wpe(position_ids)
   491
   492	        if token_type_ids is not None:
   493	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   494	            token_type_embeds = self.wte(token_type_ids)
   495	        else:
   496	            token_type_embeds = tf.constant(0.0)
   497
   498	        position_embeds = tf.cast(position_embeds, dtype=inputs_embeds.dtype)
   499	        token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
   500	        hidden_states = inputs_embeds + position_embeds + token_type_embeds

* Fix wrong inputs_embeds shape in doc

Supported by TFGPT2MainLayer.call, line 486--488:

src/transformers/models/gpt2/modeling_tf_gpt2.py:
   486	        if inputs_embeds is None:
   487	            check_embeddings_within_bounds(input_ids, self.config.vocab_size)
   488	            inputs_embeds = self.wte(input_ids)

* Fix wrong position_ids shape in doc

Supported by GPTBigCodeModel.forward, line 962--965:

src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
   962	        if inputs_embeds is None:
   963	            inputs_embeds = self.wte(input_ids)
   964	        position_embeds = self.wpe(position_ids)
   965	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)

* Fix wrong inputs_embeds shape in doc

Supported by GPTBigCodeModel.forward, line 962--963:

src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
   962	        if inputs_embeds is None:
   963	            inputs_embeds = self.wte(input_ids)

* Fix wrong labels shape in doc

Supported by GPTBigCodeForCausalLM.forward, line 1158--1159:

src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
  1158	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
  1159	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix wrong position_ids shape in doc

Supported by FlaxGPTNeoModule.__call__, line 549--552:

src/transformers/models/gpt_neo/modeling_flax_gpt_neo.py:
   549	        input_embeds = self.wte(input_ids.astype("i4"))
   550	        position_embeds = self.wpe(position_ids.astype("i4"))
   551
   552	        hidden_states = input_embeds + position_embeds

* Fix wrong position_ids shape in doc

Supported by GPTNeoModel.forward, line 685--720:

src/transformers/models/gpt_neo/modeling_gpt_neo.py:
   685	        if inputs_embeds is None:
   686	            inputs_embeds = self.wte(input_ids)
   687
   688	        # kept for BC (non `Cache` `past_key_values` inputs)
   689	        return_legacy_cache = False
   690	        if use_cache and not isinstance(past_key_values, Cache):
   691	            return_legacy_cache = True
   692	            if past_key_values is None:
   693	                past_key_values = DynamicCache()
   694	            else:
   695	                past_key_values = DynamicCache.from_legacy_cache(past_key_values)
   696	                logger.warning_once(
   697	                    "We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and "
   698	                    "will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class "
   699	                    "(https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)"
   700	                )
   701
   702	        seq_length = inputs_embeds.shape[1]
   703	        if cache_position is None:
   704	            past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
   705	            cache_position = torch.arange(past_seen_tokens, past_seen_tokens + seq_length, device=inputs_embeds.device)
   706
   707	        if position_ids is None:
   708	            position_ids = cache_position.unsqueeze(0)
   709
   710	        causal_mask = self._update_causal_mask(
   711	            attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
   712	        )
   713
   714	        # Prepare head mask if needed
   715	        # 1.0 in head_mask indicate we keep the head
   716	        # attention_probs has shape bsz x num_heads x N x N
   717	        # head_mask has shape n_layer x batch x num_heads x N x N
   718	        head_mask = self.get_head_mask(head_mask, self.config.num_layers)
   719	        position_embeds = self.wpe(position_ids)
   720	        hidden_states = inputs_embeds + position_embeds

* Fix wrong inputs_embeds shape in doc

Supported by GPTNeoModel.forward, line 685--686:

src/transformers/models/gpt_neo/modeling_gpt_neo.py:
   685	        if inputs_embeds is None:
   686	            inputs_embeds = self.wte(input_ids)

* Fix wrong labels shape in doc

Supported by GPTNeoForCausalLM.forward, line 968--969:

src/transformers/models/gpt_neo/modeling_gpt_neo.py:
   968	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   969	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix wrong position_ids shape in doc

Supported by FlaxGPTJPreTrainedModel.__call__, line 455--461:

src/transformers/models/gptj/modeling_flax_gptj.py:
   455	        batch_size, sequence_length = input_ids.shape
   456
   457	        if position_ids is None:
   458	            if past_key_values is not None:
   459	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   460
   461	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong token_type_ids shape in doc

Supported by TFGPTJMainLayer.call, line 482--493:

src/transformers/models/gptj/modeling_tf_gptj.py:
   482	        if inputs_embeds is None:
   483	            check_embeddings_within_bounds(input_ids, self.wte.vocab_size)
   484	            inputs_embeds = self.wte(input_ids, mode="embedding")
   485
   486	        if token_type_ids is not None:
   487	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   488	            token_type_embeds = self.wte(token_type_ids, mode="embedding")
   489	        else:
   490	            token_type_embeds = tf.constant(0.0)
   491
   492	        token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
   493	        hidden_states = inputs_embeds + token_type_embeds

* Fix wrong position_ids shape in doc

Supported by TFGPTJMainLayer.call, line 434--449:

src/transformers/models/gptj/modeling_tf_gptj.py:
   434	        elif input_ids is not None:
   435	            input_shape = shape_list(input_ids)
   436	            input_ids = tf.reshape(input_ids, [-1, input_shape[-1]])
   437	        elif inputs_embeds is not None:
   438	            input_shape = shape_list(inputs_embeds)[:-1]
   439	        else:
   440	            raise ValueError("You have to specify either input_ids or inputs_embeds")
   441
   442	        if past_key_values is None:
   443	            past_length = 0
   444	            past_key_values = [None] * len(self.h)
   445	        else:
   446	            past_length = shape_list(past_key_values[0][0])[-2]
   447
   448	        if position_ids is None:
   449	            position_ids = tf.expand_dims(tf.range(past_length, input_shape[-1] + past_length), axis=0)

* Fix wrong inputs_embeds shape in doc

Supported by TFGPTJMainLayer.call, line 482--484:

src/transformers/models/gptj/modeling_tf_gptj.py:
   482	        if inputs_embeds is None:
   483	            check_embeddings_within_bounds(input_ids, self.wte.vocab_size)
   484	            inputs_embeds = self.wte(input_ids, mode="embedding")

* Fix wrong labels shape in doc

Supported by TFGPTJForCausalLM.call, line 812--813:

src/transformers/models/gptj/modeling_tf_gptj.py:
   812	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   813	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix possibly wrong input_ids shape in doc

Since 'input_ids_length' is mentioned immediately after the shape `(batch_size, sequence_length)`, it does not make sense for `input_ids` to have that shape; it ought to have shape `(batch_size, input_ids_length)` instead.

* Fix possibly wrong token_type_ids shape in doc

Supported by ImageGPTModel.forward, line 773--780:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   773	        if inputs_embeds is None:
   774	            inputs_embeds = self.wte(input_ids)
   775	        position_embeds = self.wpe(position_ids)
   776	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)
   777
   778	        if token_type_ids is not None:
   779	            token_type_embeds = self.wte(token_type_ids)
   780	            hidden_states = hidden_states + token_type_embeds

This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong position_ids shape in doc

Supported by ImageGPTModel.forward, line 773--776:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   773	        if inputs_embeds is None:
   774	            inputs_embeds = self.wte(input_ids)
   775	        position_embeds = self.wpe(position_ids)
   776	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)

This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong inputs_embeds shape in doc

Supported by ImageGPTModel.forward, line 773--774:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   773	        if inputs_embeds is None:
   774	            inputs_embeds = self.wte(input_ids)

This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong labels shape in doc

Supported by ImageGPTForCausalImageModeling.forward, line 923--924:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   923	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   924	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong labels shape in doc

Supported by ImageGPTModel.forward, line 665--666:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   665	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   666	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

This commit is introduced due to commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix wrong position_ids shape in doc

Supported by FlaxLlamaPreTrainedModel.__call__, line 484--490:

src/transformers/models/llama/modeling_flax_llama.py:
   484	        batch_size, sequence_length = input_ids.shape
   485
   486	        if position_ids is None:
   487	            if past_key_values is not None:
   488	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   489
   490	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong position_ids shape in doc

Supported by FlaxMistralPreTrainedModel.__call__, line 478--484:

src/transformers/models/mistral/modeling_flax_mistral.py:
   478	        batch_size, sequence_length = input_ids.shape
   479
   480	        if position_ids is None:
   481	            if past_key_values is not None:
   482	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   483
   484	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))
2025-04-24 15:36:03 +01:00
Joao Gante
4d64c38593
[generate] fix default autocompile case on gpu (#37756) 2025-04-24 15:08:38 +01:00
robert
43bb4c0456
Fix qwen2_5 get_rope_index tensor device locations (#37597)
* Fix qwen2_5 get_rope_index tensor device locations

* simpler fix

* edit right file for modular model

* add a test

* try normalizing type to fix non-video

* fix some imports

* add a video forward test with dummy input
2025-04-24 16:04:38 +02:00
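
The class of bug fixed here is an index tensor created on the default device while the inputs live elsewhere; a generic, assumed illustration (not the actual `get_rope_index` code):

    import torch

    def make_position_ids(input_ids: torch.Tensor) -> torch.Tensor:
        # Create the index tensor on the same device as the inputs rather than the
        # default device, so later arithmetic does not mix devices.
        return torch.arange(input_ids.shape[1], device=input_ids.device).unsqueeze(0)
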
Prem Kumar M
dd2649fa98
updated hidden_features for FlaxDinov2SwiGLUFFN in Dinov2 (#37747)
Flax Dinov2: updated hidden_features in FlaxDinov2SwiGLUFFN

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-24 14:30:31 +01:00
Joao Gante
8bdd4f2acd
[generate] skip compilation on cpu offload (#37709)
* skip compilation on cpu offload

* add test

* better logic

* docstring

* boolean logic

* add disk offload check

* warn users if compilation options are set but compilation doesn't happen

* fix test

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 14:08:17 +01:00
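
A rough sketch of the guard described in these bullets, with assumed names (`hf_device_map` is the device map attached by accelerate when weights are dispatched):

    import warnings

    def should_compile(model, compile_requested: bool) -> bool:
        # Skip compilation when any module is offloaded to CPU or disk: the offload
        # hooks move weights between devices at runtime and break torch.compile.
        device_map = getattr(model, "hf_device_map", None) or {}
        offloaded = any(str(d) in ("cpu", "disk") for d in device_map.values())
        if compile_requested and offloaded:
            warnings.warn("Compilation options were set, but weights are offloaded; skipping compile.")
        return compile_requested and not offloaded
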
Poedator
7c62e69326
GPT2Model StaticCache support (#35761)
* initial GPT2 changes

* causal_mask support

* return_legacy_cache

* cleanup

* fix1

* outputs shape fixes

* gpt2 return fix

* pkv, attn fixes

* fix dual_head

* is_causal arg fix

* decision transformer updated

* style fix

* batch_size from inputs_embeds

* DecisionTransformerModel fixes

* cross-attn support + cache warning

* x-attn @decision

* EDCache proper init

* simplified logic in `if use_cache:` for GPT2Model

* @deprecate_kwarg for DecisionTr attn fwd

* @deprecate_kwarg in gpt2

* deprecation version updated to 4.51

* kwargs in gradient_checkpointing_fn

* rename next_cache to past_key_values

* attention_mask prep

* +cache_position in GPT2DoubleHeadsModel

* undo kwargs in gradient checkpointing

* moved up `if self.gradient_checkpointing`

* consistency in decision_transformer

* pastkv, cache_pos in grad_checkpt args

* rm _reorder_cache

* output_attentions streamlined

* decision_transformer consistency

* return_legacy_cache improved

* ClvpForCausalLM used for legacy cache test now

* is_causal fixed

* attn_output cleanup

* consistency @ decision_transformer

* Updated deprecation notice version to 4.52

* upd deprecation

* consistent legacy cache code in decision transformers

* next_cache -> past_kv in decision_tr

* cache support flags in decision_transf

* rm legacy cache warning

* consistency in cache init for decision transf

* no Static Cache for Decision Transformer

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-04-24 14:46:35 +02:00
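
A hedged usage sketch of what this enables: requesting a static KV cache through `generate()` via the generic `cache_implementation` argument (the exact GPT2 behavior after this PR is assumed):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tok("Hello, my name is", return_tensors="pt")

    # Ask generate() for a StaticCache-backed KV cache (useful with torch.compile).
    out = model.generate(**inputs, max_new_tokens=20, cache_implementation="static")
    print(tok.decode(out[0], skip_special_tokens=True))
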
Joao Gante
9f927c8250
[cache] fix HybridCache init when device is passed (#37718)
fix device init
2025-04-24 13:36:52 +01:00
amd-xiaoyu12
4fee320926
Expand quantized data type support for tensor parallelism (#37719)
Update tensor_parallel.py

Co-authored-by: Xiao YU <Xiao.YU@xilinx.com>
2025-04-24 14:34:32 +02:00
Yih-Dar
0f7940bb3f
Update MllamaForConditionalGenerationIntegrationTest (#37750)
* fix 1

* fix 2

* fix 3

* fix 4

* fix 5

* fix 6

* trigger CI

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-24 14:29:46 +02:00
Yih-Dar
7e6f36cd38
Skip all AriaForConditionalGenerationIntegrationTest on T4 (#37746)
* skip

* ruff

* trigger CI

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-24 14:11:56 +02:00
Zhen
0327d0f7f2
[performance_optim] define flash attention mask on NPU device directly (#37698)
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-24 14:06:47 +02:00
Cyril Vallez
14e28bd721
Correctly raise errors when downloading tokenizer files (#37740)
* first try

* Update tokenization_utils_base.py

* Update tokenization_utils_base.py

* standardize
2025-04-24 12:53:07 +02:00
BakerBunker
0ec0495967
Fix embeds_to_talker device in Qwen2.5-Omni (#37739)
Fix `embeds_to_talker` device

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-04-24 12:49:57 +02:00
NanoCode012
72e4844059
fix: learning_rate logged as tensor causing save issue with deepspeed (#37704)
* fix: learning_rate logged as tensor causing save issue with deepspeed

* chore: lint

---------

Co-authored-by: NanoCode012 <chanvichet@Chanvichets-MacBook-Pro.local>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 12:20:47 +02:00
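
The failure mode, roughly (an assumed illustration, not the PR's diff): the last learning rate can come back as a 0-dim tensor, which does not serialize when the trainer state is written to disk, so it is cast to a plain float before logging:

    import torch

    lr = torch.tensor(5e-5)  # stand-in for scheduler.get_last_lr()[0]
    # Cast tensors to Python floats so the logged value serializes cleanly.
    logs = {"learning_rate": lr.item() if isinstance(lr, torch.Tensor) else float(lr)}
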
Raushan Turganbay
1cfcbfcab8
[VLMs] fix flash-attention tests (#37603)
* fix one test

* fa2 ln test

* remove keys from config recursively

* fix

* fixup
2025-04-24 11:48:11 +02:00
Mohamed Mekkouri
02baa61fab
Make sure torch_is_available before using torch.distributed (#37693)
fix
2025-04-24 11:31:35 +02:00
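
The guard pattern in question, sketched with transformers' availability helper (the surrounding function is illustrative):

    from transformers.utils import is_torch_available

    def current_rank() -> int:
        # Only reach for torch.distributed when torch is installed and the process
        # group has actually been initialized.
        if is_torch_available():
            import torch.distributed as dist
            if dist.is_available() and dist.is_initialized():
                return dist.get_rank()
        return 0
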
Fanli Lin
864e9636ff
[tests] fix test_nemotron_8b_generation_sdpa (#37665)
add max_new_tokens
2025-04-24 11:28:35 +02:00
Mohamed Mekkouri
9b3bf4a206
Fix torchao doc examples (#37697)
fix

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 11:10:27 +02:00
BakerBunker
3ed56bea0f
Fix inference bugs in Qwen2.5 Omni (#37701)
* Init `SinusoidsPositionEmbedding` with float to avoid precision problem

* fix hidden_state for talker

* Update modular_qwen2_5_omni.py

* Move hidden processing out from thinker

* fixup

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-04-24 10:51:44 +02:00
jiqing-feng
b7f7aa78a0
Fix Aria tests (#37444)
* update aria tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add cuda tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check outputs for cpu and cuda and xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check outputs for cpu and cuda and xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check outputs for cpu and cuda and xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check output for each device

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix style

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix style

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu output

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add comments and use assert list equal

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm pad token assign

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-24 10:51:29 +02:00
Daksh Maheshwari
b6d65e40b2
Add Fast Image Processor for MobileNetV1 (#37111)
* fast image processor template for MobileNetV1 via transformers-cli

* Add fast image processors and unify tests for slow/fast image processor classes

* added loop over image_processor_list for all tests and removed boilerplate comments.

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-23 15:55:41 -04:00
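
Hedged usage sketch for the new fast processor (checkpoint name assumed; `use_fast=True` selects the fast class when one is registered):

    from PIL import Image
    from transformers import AutoImageProcessor

    processor = AutoImageProcessor.from_pretrained("google/mobilenet_v1_1.0_224", use_fast=True)
    image = Image.new("RGB", (224, 224))  # placeholder image for the sketch
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
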
Vinh H. Pham
dea1919be4
Add Fast Image Processor for PoolFormer (#37182)
* support poolformer fast image processor

* support test for crop_pct=None

* run make style

* Apply suggestions from code review

* rename test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-23 15:55:33 -04:00
Parteek
b491f128d6
Add Fast PVT Processor (#37204)
* Add Fast PVT Processor

* Update image_processing_pvt_fast.py

* Update image_processing_pvt_fast.py

* remove kwargs

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-23 15:55:20 -04:00
Yao Matrix
19e9079dc1
enable 4 test_trainer cases on XPU (#37645)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-23 21:29:42 +02:00
Yoni Gozlan
5cd6b64059
Process inputs directly in apply_chat_template in image-text-to-text pipeline (#35616)
* tokenize inputs directly in apply_chat_template

* refactor processing

* revert changes processing llava

* Update docs

* fix issue with str being iterable

* add test chat text only

* change function name
2025-04-23 13:31:33 -04:00
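
A usage sketch of chat-style input to the pipeline, which this PR now routes through `apply_chat_template`; the model name and message layout are assumptions based on the pipeline's documented chat format:

    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model="llava-hf/llava-1.5-7b-hf")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
                {"type": "text", "text": "What do you see in this image?"},
            ],
        }
    ]
    out = pipe(text=messages, max_new_tokens=20)
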
Joao Gante
80ea2c05c2
[tests, qwen2_5_omni] fix flaky tests (#37721) 2025-04-23 17:54:12 +01:00
Pedro Cuenca
63c6331387
Qwen 2.5 Omni: apply video defaults (#37660)
* Apply video defaults for min_pixels and max_pixels

* fps kwarg should not be a list

* Update test to account for new resizing
2025-04-23 17:08:11 +02:00
Raushan Turganbay
1e9087368c
[internvl] fix chat template (#37656)
* fix chat template

* update

* update conversion

* rename `fake_image_token` in tests
2025-04-23 16:56:36 +02:00
Matt
9ec8be56dd
TransfoXL is deprecated, don't keep it in tested examples! (#37707)
* TransfoXL is deprecated, so we should remove it from examples that get tested

* Remove the tokenizer too

* Trigger tests
2025-04-23 14:59:38 +01:00
Joao Gante
be9b0e8521
[CI] add back sacrebleu (and document why) (#37700)
* example test

* add back dep

* dev-ci

* dev-ci
2025-04-23 14:45:00 +01:00
Matt
1d7d7a942e
Add maintainers for ROCm/Intel XPU/Ascend NPU (#37678)
* Add maintainers for ROCm/Intel XPU/Ascend NPU

* Correct capitalization for usernames

* Update .github/ISSUE_TEMPLATE/bug-report.yml

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

* Update .github/ISSUE_TEMPLATE/bug-report.yml

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

* Trigger tests

---------

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-04-23 14:28:32 +01:00
Joao Gante
cc9a245e6d
[cleanup] remove /model_cards 🧹 🧹 (#37685)
rm model_cards
2025-04-23 12:45:27 +01:00
Yih-Dar
ca790303f7
Pin torch == 2.6 on PR CI docker images for now (#37695)
pin 2.6 on CircleCi images

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-23 11:47:23 +02:00
Yao Matrix
12f65ee752
enable cpu offloading for Bark on xpu (#37599)
* enable cpu offloading of bark modeling on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* remove debug print

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix review comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enhance test

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update

* add deprecate message

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update

* update

* trigger CI

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-23 11:37:15 +02:00
Shahruk Hossain
4f9893cbbc
fix: remove classmethod from Qwen2_5OmniConfig.get_text_config (#37690)
- Since `get_text_config` reads an instance attribute
    (`self.thinker_config`), the `get_text_config` method
    should not be a classmethod.

  - Before this fix, users were getting the following error:

    '''
    AttributeError: type object 'Qwen2_5OmniConfig' has no attribute 'thinker_config'
    '''
2025-04-23 09:30:57 +02:00
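
A tiny illustration (with hypothetical class names) of why the decorator was wrong: a classmethod receives the class, not an instance, so it cannot read `self.thinker_config`:

    class BrokenConfig:
        def __init__(self):
            self.thinker_config = {"hidden_size": 2048}

        @classmethod
        def get_text_config(cls):
            # AttributeError: type object 'BrokenConfig' has no attribute 'thinker_config'
            return cls.thinker_config

    class FixedConfig:
        def __init__(self):
            self.thinker_config = {"hidden_size": 2048}

        def get_text_config(self):
            return self.thinker_config  # instance method: reads the instance attribute
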
Vishesh-Mistry
1d9743edc2
Updated model card for mbart and mbart50 (#37619)
* new card for mbart and mbart50

* removed comment BADGES

* Update mBart overview

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix typo (MBart to mBart)

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* maybe fix typo

* update typo and combine notes

* changed notes

* changed the example sentence

* fixed grammatical error and removed some lines from notes example

* missed one word

* removed documentation resources and added some lines of example code back in notes.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-22 12:26:47 -07:00
Jinyong Lee
fbfa1dd4db
🌐 [i18n-KO] Translated siglip.md to Korean (#37145)
* docs: ko: siglip.md

* feat: nmt draft

* fix: manual edits

* chore: Correct document title to kebab-case format

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Convert unnatural language to natural Korean

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2025-04-22 12:23:19 -07:00
Yao Matrix
ece79b0688
enable blip2 and emu3 cases on XPU (#37662)
* enable blip2 and emu3 modeling cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* remove extra new line

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-22 18:37:09 +02:00
Ken J
ca4c114dc4
Add counters for dataset classes (#37636)
* add counters for dataset classes

* fix failed code style
2025-04-22 17:30:43 +01:00
NielsRogge
d47cdae27e
[Docs] Move models to appropriate section (#37338)
* Move models

* update

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-22 18:23:14 +02:00
Deepak Sahu
dbfccd3c92
typo update in the parameter name (#37655)
See L118 and L143 for the class attribute `hidden_dim`
2025-04-22 18:14:20 +02:00