NielsRogge
11d0feacce
[AutoModelForMaskGeneration] Remove duplicate code ( #38622 )
...
Remove duplicate code
2025-06-25 10:00:13 +02:00
efsotr
3ee72af6b6
Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 ( #37332 )
...
* Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1
* fix code format
* add test; replace position_ids with query_states because position_ids.shape[0] is always 1
* add assert loss is not nan
2025-06-25 07:58:34 +00:00
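A hedged sketch of the pattern the last bullet describes (illustrative names, not the actual `_flash_attention_forward` code): under `torch.compile`, batch and sequence dimensions are read from `query_states`, because `position_ids` always has a leading dimension of 1 and branching on its values forces a graph break.
```python
import torch

def get_batch_dims(query_states: torch.Tensor, position_ids: torch.Tensor):
    # query_states: (batch, seq_len, num_heads, head_dim). Its shape carries
    # the true batch size; position_ids is broadcast with shape (1, seq_len),
    # so inferring the batch from it is wrong for batch_size > 1, and a
    # data-dependent check on its values breaks the compiled graph.
    return query_states.shape[0], query_states.shape[1]

q = torch.randn(4, 16, 8, 64)        # batch of 4
pos = torch.arange(16).unsqueeze(0)  # shape (1, 16) regardless of batch size
print(get_batch_dims(q, pos))        # (4, 16)
```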
ranzhejiang
ae32f1ad11
Add zero dim tensor check when using flash_attention ( #38280 )
...
* Add zero dim tensor check when using flash_attention
---------
Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>
2025-06-25 09:48:50 +02:00
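A minimal sketch of the guard this commit describes (function name is illustrative, not the library's internals): flash-attention kernels cannot handle tensors with a zero-sized dimension, so the caller should fall back to the eager path.
```python
import torch

def can_use_flash_attention(*tensors: torch.Tensor) -> bool:
    # A tensor with any zero-sized dimension (e.g. an empty batch or empty
    # sequence) would crash the flash-attention kernel; bail out so the
    # caller can take the eager attention path instead.
    return all(0 not in t.shape for t in tensors)

print(can_use_flash_attention(torch.empty(0, 8, 16, 64)))  # False -> fall back
print(can_use_flash_attention(torch.randn(2, 8, 16, 64)))  # True  -> FA is safe
```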
StevenBucaille
ca402e2116
[LightGlue] Fixed attribute usage from descriptor_dim to keypoint_detector_descriptor_dim ( #39021 )
...
fix: fix descriptor dimension handling in LightGlue model
2025-06-24 23:32:07 +01:00
Marcel Ambo Ndowah
48b6ef0238
Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code, etc.) ( #38954 )
...
* Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code, etc.)
* Update quicktour.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-24 11:48:15 -07:00
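The docs change covers the IDE-specific steps; for reference, the programmatic equivalent uses the standard `huggingface_hub` client (not part of the PR itself):
```python
# Authenticate from an IDE's integrated terminal or from a script.
from huggingface_hub import login

login()  # prompts for a token interactively; or pass login(token="hf_...")
```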
Dmitry
ea9a30923e
[HPU][Critical Issue Fix] ThreadPool instead of Pool for parallel pre-processing ( #39002 )
...
* ThreadPool instead of Pool for parallel pre-processing
* ThreadPool only if hpu available
2025-06-24 20:24:50 +02:00
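The gist of the fix, as a sketch: `multiprocessing.pool.ThreadPool` exposes the same `map` API as `Pool` but uses threads instead of forked processes, avoiding the fork-after-device-init problems that process pools cause once the HPU is initialized.
```python
from multiprocessing.pool import ThreadPool  # same API as Pool, but threads

def preprocess(sample: str) -> str:
    return sample.strip().lower()

with ThreadPool(processes=4) as pool:
    results = pool.map(preprocess, ["  Foo", "Bar ", " Baz", "Qux "])
print(results)  # ['foo', 'bar', 'baz', 'qux']
```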
ivarflakstad
995666edb5
Skip sdpa dispatch on flash test due to unsupported head dims ( #39010 )
2025-06-24 20:16:56 +02:00
ivarflakstad
f367c6337d
Update self-comment-ci.yml user list ( #39014 )
...
add ivarflakstad to self-comment-ci.yml
2025-06-24 20:13:36 +02:00
Tugsbayasgalan Manlaibaatar
67d36dc1d7
Fix bugs in DynamicCache ( #37880 )
...
* Fix bugs in DynamicCache
* Update
* Update
* Lint
* lint
* Rename test
* update
* update
2025-06-24 19:43:40 +02:00
eustlb
6bdd4ec952
Add kyutai stt ( #38909 )
...
* first draft
* cleaner version
* update tests + modeling
* add tests
* init
* update test_modeling_common
* fix tests
* csm Processor draft
* conversion update
* mimi cache padding convolutions draft
* mimi streaming updates
* update mimi padding cache test
* update cache padding mimi test
* make style mimi
* updates generate moshi asr
* moshi asr integration tests (single + batched)
* update tests
* update conversion script
* good default sliding window value
* update generate
* update test checkpoint
* nit
* fix mimi
* fix codec prefix
* revert
* revert
* update config
* update config
* unnecessary mimi input restriction
* remove delay in tokens
* remove _prepare_4d_causal_attention_mask_with_cache_position and _update_causal_mask
* test update
* modular update
* make style
* nit
* rename
* create codec model generation config at init
* remove delay
* max_new_tokens/length warning
* correct conv1 padding cache import for modular
* nit
* fix on encoder_past_key_values
* convert modular
* move frame_size to config
* move frame_size to config
* update test name
* handle first token is bos
* better handling of max_new_tokens
* fix
* fix batch size in test input prep
* update docstring
* convert modular
* make style
* make style
* add feature extractor
* correct modular convention name for feature_extraction file
* update conversion script
* doc processor
* update doc
* update init
* update model type
* fixes
* update tests
* fix
* make
* add doc
* nit
* fix
* doc
* auto mappings
* doc
* nit
* convert modular
* doc
* nit
* extend _keep_in_fp32_modules to enforce fp32
* renaming to stt
* doc update + test update
* doc fixes
* doc fix
* doc fix
* fix musicgen tests
* fix musicgen tests
* make style
* fix musicgen tests
* correct frame_rate config param for mimi
* update mimi test
* revert update mimi test
* enforce cpu test
* move cache init in cache class
* convert modular
* docstring update
* update model id
* feature_extractor -> feature_extraction (SEW)
* convert modular
* update model id
2025-06-24 18:01:15 +02:00
Mohamed Mekkouri
08bf7f1afe
Add kernelize to transformers ( #38205 )
...
* fix
* fix
* fix flow
* remove non compiling path
* change
* style
* fix
* update
* update pin
* revert
2025-06-24 17:38:54 +02:00
Avihu Dekel
be10d4df60
Granite speech - minor fixes to support training with the HF trainer ( #38833 )
...
* ensure the query is updated during training
avoid unused parameters that DDP does not like
* avoid a crash when `kwargs` contain `padding=True`
trainers often pass this argument automatically
* minor
* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)
* minor - most feature extractors have a `sampling_rate` property
2025-06-24 17:06:52 +02:00
Cyril Vallez
e1e11b0299
Fix non-deterministic order in modular dependencies ( #39005 )
...
* sort correctly
* Update modeling_minimax.py
* Update modular_model_converter.py
2025-06-24 17:04:33 +02:00
7mile
bdf5fb70aa
Skip non-selected experts for qwen3_moe ( #38133 )
...
* fix(qwen3moe): skip experts with no workload
* avoid tolist and also update other moe models
* fix: should squeeze 0-dim only
2025-06-24 16:33:48 +02:00
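A minimal sketch of the optimization (not the actual qwen3_moe code): when the router assigns an expert no tokens in the current batch, that expert's forward is skipped entirely; the last bullet's fix is reflected in squeezing only the trailing dimension of `nonzero`'s output.
```python
import torch
from torch import nn

def moe_forward(hidden, routing_mask, experts):
    # hidden: (num_tokens, dim); routing_mask: (num_experts, num_tokens) bool
    out = torch.zeros_like(hidden)
    for idx, expert in enumerate(experts):
        token_idx = torch.nonzero(routing_mask[idx]).squeeze(-1)  # squeeze last dim only
        if token_idx.numel() == 0:
            continue  # expert has no workload -> skip its forward entirely
        out[token_idx] += expert(hidden[token_idx])
    return out

experts = nn.ModuleList(nn.Linear(8, 8) for _ in range(4))
hidden = torch.randn(10, 8)
mask = torch.zeros(4, 10, dtype=torch.bool)
mask[0, :5] = True  # only expert 0 receives tokens; experts 1-3 are skipped
print(moe_forward(hidden, mask, experts).shape)  # torch.Size([10, 8])
```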
Tanuj Rai
719058c625
Update attention_visualizer.py ( #37860 )
2025-06-24 16:21:36 +02:00
Mylon Jones
9f42c1f192
Added scikit-learn to the example image-classification requirements.txt ( #37506 )
...
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-06-24 15:24:02 +02:00
Cyril Vallez
1636a7bcb9
Fixes for Arcee model ( #39001 )
...
* fix modular
* Update modular_arcee.py
* fix
2025-06-24 15:23:52 +02:00
Crystalcareai
71de20b818
Add Arcee model support ( #38621 )
...
* Add Arcee model support to transformers
- Add ArceeConfig and model mappings for all task types (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification)
- Add auto-loading support through AutoModel, AutoConfig, and AutoTokenizer
- Use LlamaTokenizer for tokenization
- Add FX graph support for Arcee models
- Create lazy loading module structure for Arcee
* feat: update YARN scaling and RoPE validation for Arcee model
* feat: add auto_docstring checkpoint config to Arcee model classes
* docs: add pre-trained model weights reference to Arcee configuration files
* refactor: move RoPE utilities to dedicated modeling_rope_utils module
* Add comprehensive test suite for Arcee model
- Add test_modeling_arcee.py following standard transformers test patterns
- Include tests for all model variants (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification)
- Add specific test for ReLU² activation in ArceeMLP
- Add RoPE scaling tests including YARN support
- Follow CausalLMModelTest pattern used by similar models
* Add documentation for Arcee model
- Add comprehensive model documentation with usage examples
- Include all model variants in autodoc
- Add to table of contents in proper alphabetical order
- Fixes documentation coverage for Arcee model classes
* Make style/fixup
* fix copyright year
* Sync modular conversion
* revert in legacy supported models in src/transformers/utils/fx
* cleaned redundant code in modular_arcee.py
* cleaned testing
* removed pretraining tp
* fix styles
* integration testing
---------
Co-authored-by: Pranav <veldurthipranav@gmail.com>
Co-authored-by: Pranav <56645758+pranav4501@users.noreply.github.com>
2025-06-24 15:05:29 +02:00
Anton Vlasjuk
23c89a6732
[Attention] Small fix on output attentions ( #38948 )
...
small fix
2025-06-24 14:42:10 +02:00
Dianana
4f650040a6
Removing extra space in large command for speech-pretraining example ( #38705 )
...
Removing extra space in Large command
2025-06-24 12:24:56 +00:00
Raushan Turganbay
d3d835d4fc
[qwen] refactor attentions for vision/audio ( #38930 )
...
* refactor attentions in vision/audio
* remove fa2 import
* make config the only args
* pass along kwargs from modality encoders
* style
2025-06-24 10:53:52 +02:00
vb
2e4c045540
🔴 Update default dtype for pipelines to auto ( #38882 )
...
* check typing
* Fallback to fp32 if auto not supported.
* up.
* feedback from review.
* make style.
2025-06-24 10:39:18 +02:00
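After this change, pipelines default to reading the checkpoint's dtype (`torch_dtype="auto"`) and fall back to fp32 where "auto" cannot be resolved; passing the argument explicitly still behaves as before (the model is just an example):
```python
from transformers import pipeline

# "auto" picks up the dtype stored with the checkpoint; fp32 is the fallback.
pipe = pipeline("text-generation", model="gpt2", torch_dtype="auto")
print(pipe.model.dtype)
```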
casinca
21cb353b7b
[docs] Typos - Single GPU efficient training features ( #38964 )
...
* Typos
- corrected bf16 training argument
- corrected header for SDPA
* improved readability for SDPA suggested by @stevhliu
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-23 12:33:10 -07:00
Yih-Dar
f9be71b34d
Fix rag ( #38585 )
...
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-23 17:42:46 +02:00
Yusuf Shihata
9eac19eb59
[Feature] Support is_split_into_words in the TokenClassificationPipeline ( #38818 )
...
* some fixes
* some fixes
* now the pipeline can take a list of tokens as input via the is_split_into_words argument
* the pipeline can also handle batches of tokenized input
* solving test problems
* some fixes
* some fixes
* modify tests
* aligning start and end correctly
* adding tests
* some formatting
* some formatting
* some fixes
* some fixes
* some fixes
* resolve conflicts
* removing unimportant lines
* removing unimportant lines
* generalize to other languages
2025-06-23 15:31:32 +00:00
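Illustrative usage of the new argument (the checkpoint is just an example): pre-tokenized words go in directly, and batches of such word lists are supported too.
```python
from transformers import pipeline

ner = pipeline("token-classification", model="dslim/bert-base-NER")
words = ["My", "name", "is", "Sarah", "and", "I", "live", "in", "London"]

# Single pre-tokenized input ...
print(ner(words, is_split_into_words=True))
# ... or a batch of pre-tokenized inputs.
print(ner([words, words], is_split_into_words=True))
```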
Yih-Dar
2ce02b98bf
fix mistral and mistral3 tests ( #38978 )
...
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-23 17:07:18 +02:00
Yoni Gozlan
b6b4d43d6d
Add support for auto_docstring with model outputs ( #38242 )
...
* experiment auto_docstring model outputs
* Fix PatchTSMixer
* Add check model output docstring to check_auto_docstring and fix all model outputs docstring
* add reordering of docstring in check_docstrings
* add check for redundant docstring in check_docstrings, remove redundant docstrings
* refactor check_auto_docstring
* make style
* fix copies
* remove commented code
* change List-> list Tuple-> tuple in docstrings
* fix modular
* make style
* Fix modular vipllava
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-06-23 10:39:41 -04:00
kallewoof
0c98f24889
fix: add __bool__ operator to tokenizer to avoid bloated asserts ( #38899 )
...
* fix: add __bool__ operator to tokenizer to avoid bloated asserts
When a user does 'assert tokenizer' to ensure that the tokenizer is not None, they inadvertently set off a rather expensive process in the '__len__()' operator. This fix adds a trivial '__bool__()' that returns True, so a None tokenizer still fails the assert and an actual tokenizer passes it, without invoking the length op.
* typo
2025-06-23 14:32:16 +00:00
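The gist of the fix, reduced to a toy class: without `__bool__`, Python falls back to `__len__` for truthiness, so `assert tokenizer` paid the full vocab-size computation.
```python
class Tokenizer:
    def __len__(self):
        print("expensive vocab-size computation ...")
        return 50257

    def __bool__(self):
        # An instantiated tokenizer is always truthy; a None tokenizer
        # still fails the assert. __len__ is no longer consulted.
        return True

tok = Tokenizer()
assert tok  # prints nothing: __bool__ short-circuits the length op
```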
Yoni Gozlan
d29482cc91
Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors ( #38157 )
...
* add working idefics2 fast image processor and improvements for fast nested image processing
* add fast image processors idefics 3 and smolvlm
* cleanup tests
* fix doc idefics2
* PR review and fix issues after merge
* Force providing disable_grouping to group_images_by_shape
* simplify group_images_by_shape
* fix modular
* Fix nits after review
2025-06-23 14:17:25 +00:00
Rémi Ouazan
1a96127e46
Break tie in Expectations and gemma3 fixes ( #38943 )
...
* Added major / minor version to Expectations ordering
* Added fixes to gemma3
* Style
2025-06-23 15:13:27 +02:00
Pavel Iakubovskii
84d19be41e
Apply GradientCheckpointingLayer to the whole repo ( #38913 )
...
* first batch (4)
* align
* altclip
* beit
* bert
* yolos
* dino, pvt_v2
* bark, bart, bert_generation
* big_bird, biogpt
* blenderbot, bloom
* bridgetower
* camembert, canine, chameleon
* chinese clip, clap, clip
* codegen, conditional detr, convbert
* dab_detr, data2vec
* dbrx, deberta
* deberta, decision_transformer, deformable_detr
* deit, deta, mctct
* detr, dinov2, distilbert
* donut, dpt, electra
* ernie, esm, falcon
* flava, fnet, falcon_mamba
* focalnet, git, gpt2
* gpt - bigcode, neo, neox
* gptj, groupvit
* idefics2, idefics3
* ijepa, imagegpt, internvl
* jetmoe, kosmos2, layoutlm
* layoutlm2-3, led
* lilt, longformer, longt5, luke
* m2m, mamba1-2
* marian, markuplm, mask2former
* maskformer
* mbart, megatron_bert, mimi
* mixtral, mlcd
* mobilevit1-2, modernbert
* moshi, mpt, mra
* mt5, musicgen
* mvp, nemotron
* nllb_moe
* nystromformer, omdet_turbo
* opt, owlvit, owlv2
* pegasus, pegasus_x, persimmon
* phimoe, pix2struct, pixtral
* plbart, pop2piano, prophetnet
* qwen2*
* qwen2, qwen3 moe, rec gemma
* rembert
* roberta
* roberta prelayernorm
* roc_bert, roformer, rwkv
* sam, sam_hq
* seggpt, smolvlm, speech_to_text
* splinter, stablelm, swin
* swin2sr, switch_transformer, t5, table_transformer
* tapas, time_series_transformer, timesformer
* trocr, tvp, umt5
* videomae, vilt, visual_bert
* vit, vit_mae, vit_msn
* vitpose_backbone, vits, vivit
* whisper. x_clip, xglm
* xlm_roberta, xmod
* yoso
* zamba
* vitdet, wav2vec2, wav2vec2_bert
* unispeech, wav2vec_conformer
* wavlm
* speecht5
* swinv2
* sew / _d
* seamless_m4t / _v2
* deprecated models update
* bros
* gemma2, gemma3
* got, hiera, hubert, llama4, mllama, oneformer, phi, olmoe, informer
* fixup
* Add use_cache=False and past_key_value=None to GradientCheckpointingLayer
* fixup
* fix prophetnet
* fix bigbird_pegasus
* fix blenderbot
* fix mbart
* fix mvp
* fix zamba2
* fix bart
* fix blenderbot_small
* fix codegen
* Update gradient checkpointing layer to support more past_key_values arg names
* fix data2vec vision
* fix deformable_detr
* fix gptj
* fix led
* fix m2m_100
* add comment
* fix nllb_moe
* Fix pegasus_x
* fix plbart
* udop
* fix-copies: beit, wav2vec2
* fix gpt_bigcode
* fixup
* fix t5
* fix switch_transformers
* fix longt5
* fix mt5
* update tapas
* fix blip2
* update blip
* fix musicgen
* fix gpt2, trocr
* fix copies
* !!! Revert zamba, mllama
* update autoformer
* update bros
* update args / kwargs for BERT and copies
* 2nd round of updates
* update conditional detr
* Pass encoder_hidden_states as positional arg
* Update to pass encoder_decoder_position_bias as positional arg
* fixup
* biogpt modular
* modular gemma2
* modular gemma3
* modular gpt_neox
* modular informer
* modular internvl
* modular mixtral
* modular mlcd
* modular modernbert
* modular phi
* modular qwen2_5_omni
* modular qwen2_5_vl
* modular sam_hq
* modular sew
* wav2vec2_bert
* modular wav2vec2_conformer
* modular wavlm
* fixup
* Update by modular instructblipvideo
* modular data2vec_audio
* nit modular mistral
* apply modular minimax
* fix modular moonshine
* revert zamba2
* fix mask2former
* refactor idefics
2025-06-23 14:24:48 +02:00
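A rough sketch of the pattern being rolled out (simplified relative to the library's actual class): a base layer that reroutes its forward through `torch.utils.checkpoint` during training, so individual models no longer hand-roll the checkpointing branch.
```python
import torch
from torch import nn

class GradientCheckpointingLayer(nn.Module):
    gradient_checkpointing = False

    def __call__(self, *args, **kwargs):
        if self.gradient_checkpointing and self.training:
            # Recompute activations in backward instead of storing them.
            return torch.utils.checkpoint.checkpoint(
                super().__call__, *args, use_reentrant=False, **kwargs
            )
        return super().__call__(*args, **kwargs)

class Block(GradientCheckpointingLayer):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        return self.fc(x).relu()

block = Block()
block.gradient_checkpointing = True
block.train()
block(torch.randn(2, 4, requires_grad=True)).sum().backward()
```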
Cyril Vallez
07aab1af1e
Remove dead protected imports ( #38980 )
...
* remove them
* more
2025-06-23 13:44:50 +02:00
Cyril Vallez
74f5e4a1fa
[modular] CLI allows positional arguments, and more defaults names for the optional arg ( #38979 )
...
* More defaults
* Update modular_model_converter.py
2025-06-23 12:40:01 +02:00
Vensen
334bf913dc
Fix(informer): Correct tensor shape for input_size=1 ( #38856 )
...
* Fix(time_series): Correct scaler tensor shape in base model
The create_network_inputs function in TimeSeriesTransformerModel
handled the scaler's loc and scale tensors inconsistently.
When input_size=1, the tensors were not squeezed, leading to
downstream dimension errors for models like Informer.
This commit refactors the logic to unconditionally apply .squeeze(1),
which correctly handles all input_size cases and fixes the bug at its source.
Fixes #38745
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-06-23 11:50:51 +02:00
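The shape issue in two lines (illustrative): the scaler's `loc`/`scale` come out as `(batch, 1, input_size)`, and only the `input_size > 1` path used to squeeze dim 1; squeezing unconditionally gives a consistent `(batch, input_size)`.
```python
import torch

for input_size in (1, 3):
    loc = torch.zeros(2, 1, input_size)             # (batch, 1, input_size)
    print(input_size, tuple(loc.squeeze(1).shape))  # always (2, input_size)
```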
Benoqtr
c184550daf
Fix DTensor import compatibility for PyTorch < 2.5 ( #38836 )
2025-06-23 11:25:56 +02:00
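A sketch of the usual compatibility pattern for this kind of fix (the exact import paths in the PR may differ): guard the `DTensor` import behind a version check, since the public `torch.distributed.tensor` module only exists on newer PyTorch.
```python
from packaging import version
import torch

if version.parse(torch.__version__) >= version.parse("2.5"):
    from torch.distributed.tensor import DTensor
else:
    DTensor = None  # feature unavailable; callers must check before using it
```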
Ilyas Moutawwakil
984ff89e73
Gaudi3 CI ( #38790 )
2025-06-23 10:56:51 +02:00
DongKyu Kang
2166b6b4ff
Update blip model card ( #38513 )
...
* Update docs/source/en/model_doc/blip.md
* fix(docs/source/en/model_doc/blip.md): fix redundant typo error
* fix (docs/source/en/model_doc/blip.md): modify of review contents
* fix(docs/source/en/model_doc/blip.md): modify code block
* Update blip.md
---------
Co-authored-by: devkade <mouseku@moana-master>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-20 13:46:19 -07:00
Manuel de Prada Corral
166e823f77
Fix custom generate from local directory ( #38916 )
...
Fix custom generate from local directory:
1. Create parent dirs before copying files (custom_generate dir)
2. Correctly copy relative imports to the submodule file.
3. Update docs.
2025-06-20 17:36:57 +01:00
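A sketch of fix (1), with illustrative helper names: create the destination's parent directories before copying the `custom_generate` files, so a cold cache cannot make the copy fail.
```python
import os
import shutil

def copy_into_module_cache(src_file: str, dst_file: str) -> None:
    # Without this, shutil.copy raises FileNotFoundError when the
    # custom_generate/ subdirectory does not yet exist in the cache.
    os.makedirs(os.path.dirname(dst_file), exist_ok=True)
    shutil.copy(src_file, dst_file)
```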
Yih-Dar
3d34b92116
Switch to use A10 progressively ( #38936 )
...
* try
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 16:10:35 +00:00
Yih-Dar
b8059e1f8f
Fix more flaky test_initialization ( #38932 )
...
* try
* try
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 17:28:32 +02:00
Cyril Vallez
5ee60f970a
Correctly raise error for awq quantization ( #38945 )
...
fix warning
2025-06-20 17:18:06 +02:00
Ákos Hadnagy
8ac2d75353
Pin PyTorch extras for AMD containers ( #38941 )
...
* Pin additional Torch packages
* Remove unused def
---------
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-06-20 12:17:21 +00:00
Pavel Iakubovskii
9120567b02
Add kwargs for timm.create_model in TimmWrapper ( #38860 )
...
* Add init kwargs for timm wrapper
* model_init_kwargs -> model_args
* add save-load test
* fixup
2025-06-20 12:00:09 +00:00
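What the new passthrough enables, shown on the timm side (per the commit the wrapper's parameter is called `model_args`; its exact plumbing is not shown here): arbitrary keyword arguments reach `timm.create_model`.
```python
import timm

# Any create_model kwarg can now be supplied through the wrapper's
# model_args instead of being fixed to the defaults.
model = timm.create_model("resnet18", pretrained=False, drop_rate=0.2)
print(type(model).__name__)  # ResNet
```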
Raushan Turganbay
ff95974bc6
[static cache] fix device map per layer in VLMs ( #38488 )
...
return lm as decoder
2025-06-20 13:49:29 +02:00
Cyril Vallez
aa42987c1e
Remove ALL_LAYERNORM_LAYERS ( #38922 )
...
* remove it everywhere
* Update trainer_pt_utils.py
* Update trainer_pt_utils.py
* style
* sort list in test
* CIs
* use recursion same way as before (for intermediate layer names)
2025-06-20 12:06:48 +02:00
Yao Matrix
38a9b70786
add pytorch-xpu Dockerfile ( #38875 )
...
* first commit
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* use rls pytorch
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-20 11:42:44 +02:00
Rémi Ouazan
9bcdd5cde9
Modernbert fixes ( #38912 )
...
* Removed deprecated argument in modernbert RotaryEmbedding
* Skip test_sdpa_can_dispatch_on_flash for modernbert
---------
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-06-20 11:22:32 +02:00
Yih-Dar
31d30b7224
Skip some tests for now ( #38931 )
...
* try
* [test all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 11:05:49 +02:00
Cyril Vallez
0725cd6953
Remove deprecated classes in modeling_utils.py ( #38919 )
...
* remove deprecated classes
* style
2025-06-19 19:25:20 +02:00
Hamza Benchekroun
797860c68c
feat: add flexible Liger Kernel configuration to TrainingArguments ( #38911 )
...
* feat: add flexible Liger Kernel configuration to TrainingArguments
Add support for granular Liger Kernel configuration through a new
`liger_kernel_config` parameter in TrainingArguments. This allows users
to selectively enable/disable specific kernels (rope, swiglu, cross_entropy,
etc.) instead of the current approach, which relies on the default configuration.
Features:
- Add `liger_kernel_config` dict parameter to TrainingArguments
- Support selective kernel application for all supported models
- Maintain full backward compatibility with existing `use_liger_kernel` flag
Example usage:
```python
TrainingArguments(
    use_liger_kernel=True,
    liger_kernel_config={
        "rope": True,
        "swiglu": True,
        "cross_entropy": False,
        "fused_linear_cross_entropy": True,
    },
)
```
Closes #38905
* Address comments and update Liger section in Trainer docs
2025-06-19 15:54:08 +00:00