transformers

Shawn Tan 5ab0f447ab GraniteMoeHybrid: Allow for only shared expert case. (#38801) * Allow for only shared expert case. * Style	2025-06-16 16:15:42 +01:00
commands	[add-new-model-like] Robust search & proper outer '),' in tokenizer mapping (#38703)	2025-06-10 12:25:12 +00:00
data	Allow `mlm_probability` to be set to `None` when `mlm=False` in DataCollatorForLanguageModeling (#38522) (#38537)	2025-06-05 13:54:12 +01:00
generation	Use HF papers (#38184)	2025-06-13 11:07:09 +00:00
integrations	Fix peft integration (#38841)	2025-06-16 10:39:25 +02:00
kernels	Use `deformable_detr` kernel from the Hub (#36853)	2025-03-21 13:08:47 +01:00
loss	Add V-JEPA for video classification model (#38788)	2025-06-13 17:56:15 +01:00
models	GraniteMoeHybrid: Allow for only shared expert case. (#38801)	2025-06-16 16:15:42 +01:00
onnx	Use OSError (#38712)	2025-06-10 12:13:49 +00:00
pipelines	[BugFix] QA pipeline edge case: `align_to_words=True` in `QuestionAnsweringPipeline` can lead to duplicate answers (#38761)	2025-06-16 15:01:22 +00:00
quantizers	Use HF papers (#38184)	2025-06-13 11:07:09 +00:00
sagemaker	[Refactor] Relative imports wherever we can (#21880)	2023-03-02 09:45:42 +01:00
utils	Add V-JEPA for video classification model (#38788)	2025-06-13 17:56:15 +01:00
__init__.py	🚨🚨[core] Completely rewrite the masking logic for all attentions (#37866)	2025-05-22 11:38:26 +02:00
activations_tf.py	Use HF papers (#38184)	2025-06-13 11:07:09 +00:00
activations.py	Use HF papers (#38184)	2025-06-13 11:07:09 +00:00
audio_utils.py	Add CSM model (#36719)	2025-05-07 10:20:13 -04:00
cache_utils.py	Use HF papers (#38184)	2025-06-13 11:07:09 +00:00
configuration_utils.py	Add support for MiniMax's MiniMax-Text-01 (#35831)	2025-06-04 09:38:40 +02:00
convert_graph_to_onnx.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
convert_pytorch_checkpoint_to_tf2.py	Set weights_only in torch.load (#36991)	2025-03-27 14:55:50 +00:00
convert_slow_tokenizer.py	Add optional RMSNorm support to BitNet quantization (config + layers) (#38087)	2025-05-16 12:38:06 +02:00
convert_slow_tokenizers_checkpoints_to_fast.py	Use pyupgrade --py39-plus to improve code (#36843)	2025-03-20 14:39:44 +00:00
convert_tf_hub_seq_to_seq_bert_to_pytorch.py	Use pyupgrade --py39-plus to improve code (#36843)	2025-03-20 14:39:44 +00:00
debug_utils.py	Fix typos in comments (#37694)	2025-04-24 15:59:56 +01:00
dependency_versions_check.py	⚠️ Time to say goodbye to py37 (#24091)	2023-06-28 07:22:39 +02:00
dependency_versions_table.py	build: 📌 Remove upper bound on PyTorch (#38789)	2025-06-12 16:34:13 +02:00
dynamic_module_utils.py	🚨 🚨 Fix custom code saving (#37716)	2025-05-26 17:37:30 +01:00
feature_extraction_sequence_utils.py	Use pyupgrade --py39-plus to improve code (#36843)	2025-03-20 14:39:44 +00:00
feature_extraction_utils.py	Deprecate TF + JAX (#38758)	2025-06-11 17:28:06 +01:00
file_utils.py	[core] Large/full refactor of `from_pretrained` (#36033)	2025-03-12 13:39:25 +01:00
hf_argparser.py	Fix Optional type annotation (#36841)	2025-03-26 13:53:44 +00:00
hyperparameter_search.py	Fix Optional type annotation (#36841)	2025-03-26 13:53:44 +00:00
image_processing_base.py	🚨 🚨 Fix custom code saving (#37716)	2025-05-26 17:37:30 +01:00
image_processing_utils_fast.py	Add args support for fast image processors (#37018)	2025-05-16 12:01:46 -04:00
image_processing_utils.py	Add Optional to remaining types (#37808)	2025-04-28 14:20:45 +01:00
image_transforms.py	🔴 Video processors as a separate class (#35206)	2025-05-12 11:55:51 +02:00
image_utils.py	🔴 Video processors as a separate class (#35206)	2025-05-12 11:55:51 +02:00
keras_callbacks.py	Use pyupgrade --py39-plus to improve code (#36843)	2025-03-20 14:39:44 +00:00
masking_utils.py	Fix masking utils (#38783)	2025-06-12 11:00:46 +02:00
model_debugging_utils.py	Remove all traces of `low_cpu_mem_usage` (#38792)	2025-06-12 16:39:33 +02:00
modelcard.py	Use HF papers (#38184)	2025-06-13 11:07:09 +00:00
modeling_attn_mask_utils.py	Fix attention mask expansion when converting to executorch (#38637)	2025-06-09 15:00:55 +00:00
modeling_flash_attention_utils.py	Initialize flash attn flag (#38768)	2025-06-12 14:06:13 +00:00
modeling_flax_outputs.py	Add Optional to types (#37163)	2025-04-03 16:38:01 +01:00
modeling_flax_pytorch_utils.py	Use OSError (#38712)	2025-06-10 12:13:49 +00:00
modeling_flax_utils.py	Deprecate TF + JAX (#38758)	2025-06-11 17:28:06 +01:00
modeling_gguf_pytorch_utils.py	Support loading Gemma3 QAT GGUF models (#37649)	2025-04-22 11:23:17 +02:00
modeling_layers.py	Introduce GradientCheckpointingLayer (#37223)	2025-04-22 11:33:31 +01:00
modeling_outputs.py	Fix `past_key_values` type hint in model output types (#37953)	2025-05-13 13:36:49 +00:00
modeling_rope_utils.py	Use HF papers (#38184)	2025-06-13 11:07:09 +00:00
modeling_tf_outputs.py	Add Optional to types (#37163)	2025-04-03 16:38:01 +01:00
modeling_tf_pytorch_utils.py	Stop TF weight rename reDOS (#38325)	2025-05-26 16:58:51 +01:00
modeling_tf_utils.py	Deprecate TF + JAX (#38758)	2025-06-11 17:28:06 +01:00
modeling_utils.py	add default mapping to peft integration	2025-06-16 10:23:51 +02:00
optimization_tf.py	Use HF papers (#38184)	2025-06-13 11:07:09 +00:00
optimization.py	Use HF papers (#38184)	2025-06-13 11:07:09 +00:00
processing_utils.py	[video processors] support frame sampling within processors (#38105)	2025-06-12 09:34:30 +00:00
py.typed	Add py.typed (#37022)	2025-04-02 14:17:27 +01:00
pytorch_utils.py	protect dtensor import (#38496)	2025-05-30 17:36:00 +02:00
safetensors_conversion.py	Change back to `Thread` for SF conversion (#35236)	2024-12-12 16:05:04 +01:00
testing_utils.py	Expectation fixes and added AMD expectations (#38729)	2025-06-13 16:14:58 +02:00
tf_utils.py	Add Optional to types (#37163)	2025-04-03 16:38:01 +01:00
time_series_utils.py	Use pyupgrade --py39-plus to improve code (#36843)	2025-03-20 14:39:44 +00:00
tokenization_utils_base.py	refactor create_token_type_ids_from_sequences (#37681)	2025-06-12 23:24:43 +02:00
tokenization_utils_fast.py	refactor can_save_slow_tokenizer (#37722)	2025-05-23 17:29:38 +02:00
tokenization_utils.py	Add Optional to types (#37163)	2025-04-03 16:38:01 +01:00
trainer_callback.py	Fix Optional type annotation (#36841)	2025-03-26 13:53:44 +00:00
trainer_pt_utils.py	Remove Deprecated `verbose` arg in LayerWiseDummyScheduler (#38197)	2025-05-19 13:49:11 +00:00
trainer_seq2seq.py	[generation] Less verbose warnings by default (#38179)	2025-05-19 10:03:37 +00:00
trainer_utils.py	update seed_worker to set seed based on worker_id and rank (#37980)	2025-05-12 15:59:16 +00:00
trainer.py	Fix trainer.py not showing signature columns (#38465)	2025-06-13 15:39:29 +00:00
training_args_seq2seq.py	[docs] Remove sortish_sampler (#35539)	2025-01-07 12:06:19 -08:00
training_args_tf.py	Use pyupgrade --py39-plus to improve code (#36843)	2025-03-20 14:39:44 +00:00
training_args.py	Use HF papers (#38184)	2025-06-13 11:07:09 +00:00
video_processing_utils.py	[video processors] support frame sampling within processors (#38105)	2025-06-12 09:34:30 +00:00
video_utils.py	[video processors] support frame sampling within processors (#38105)	2025-06-12 09:34:30 +00:00
