transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-30 09:42:22 +06:00

Author	SHA1	Message	Date
Marc Sun	3c322c9cdf	fix gemma3 grad acc (#37208 ) * fix gemma3 grad acc * fix * fix * fix * fix * rmv print * rm * Update setup.py * Apply style fixes * propagate the changes --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Arthur <arthur.zucker@gmail.com>	2025-06-25 16:28:44 +02:00
Umar Butler	860b898d03	fix: astronomical loss with ModernBERT when using gradient checkpointing (#38982 ) (#38983 ) * fix: astronomical loss with ModernBERT when using gradient checkpointing * update the modling fix --------- Co-authored-by: Arthur <arthur.zucker@gmail.com>	2025-06-25 16:11:18 +02:00
EduardDurech	a2eb75c891	Support for Flash Attention 3 (#38972 ) * Support `flash_attn_3` Implements fwd and tests for Flash Attention 3 https://github.com/Dao-AILab/flash-attention/commits/main/hopper - Includes checks for dropout>0 and ALiBi in `modeling_utils.PreTrainedModel._check_and_enable_flash_attn_3` (Dropout will likely be supported soon, so this will need to be updated and `modeling_flash_attention_utils._flash_attention_forward` at the `if _IS_FLASH_ATTN_3_AVAILABLE: ...` An example Llama implementation is included in `modeling_llama.py` but other models would still need to be updated Based on https://github.com/huggingface/transformers/pull/36190 which has model implementations and examples which could be merged * Add tests for Flash Attention 2 and 3 parity * ci fix * FA2 compatibiity - `_prepare_flash_attention_from_position_ids` ->`prepare_fa2_from_position_ids` - Remove bettertransformer check in Flash Attention 3 - Merge tests - Add licensing * ci fix * Test naming consistency * ci fix * Deprecation warning for `prepare_fa2_from_position_ids` * ci fix	2025-06-25 14:39:27 +02:00
Yuan Wu	de98fb25a3	Fix the seamless_m4t cannot work on Gaudi (#38363 ) * Fix the seamless_m4t cannot work on Gaudi Signed-off-by: yuanwu <yuan.wu@intel.com> * Refine the patch Signed-off-by: yuanwu <yuan.wu@intel.com> * Fix seamless_m4t_v2 crash Signed-off-by: yuanwu <yuan.wu@intel.com> * Use the patched_gather Signed-off-by: yuanwu <yuan.wu@intel.com> * Remove debug logs Signed-off-by: yuanwu <yuan.wu@intel.com> * Remove useless modifications Signed-off-by: yuanwu <yuan.wu@intel.com> * Add hpu check Signed-off-by: yuanwu <yuan.wu@intel.com> * Add comments Signed-off-by: yuanwu <yuan.wu@intel.com> --------- Signed-off-by: yuanwu <yuan.wu@intel.com> Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>	2025-06-25 12:40:01 +02:00
redmoe-moutain	7503cb9113	[Model] add dots1 (#38143 ) * add dots1 * address comments * fix * add link to dots1 doc * format --------- Co-authored-by: taishan <rgtjf1@163.com>	2025-06-25 11:38:25 +02:00
Biao Zhang	3ef8896906	Encoder-Decoder Gemma (#38332 ) * Initial submit * Fix bugs: 1. add __init__ file 2. tied word embedding 3. support flash/flex attention 4. model saving and loading * Code refactor: * Rename encdecgemma to t5gemma. * Split attention into self- and cross-attention * Split stack into encoder and decoder * Add test cases * Add auto configuration * Update configurations. * Fix bugs related to copy and attribute checks * Fix type union * Fix merge errors * run ruff format * Run make style and update tests. * Add t5gemma model doc. * ruff and style formatting. * Add missed module config. * Add dummy checkpoint link to pass tests (need updated when real checkpoints are uplioaded.). * Update model doc. * Minor updates following Arthur's comments: * replace docstrings with auto_docstrings * remove checkpoint layers * remove deprecate_kwargs * fix rebase errors * Fix docstring issues. * fix t5gemma doc issue. * run ruff format * Updates: * split encoder-only model out * make t5gemmamodel encoder-decoder only * update token and sequence classification * update tests	2025-06-25 09:05:10 +00:00
Yuxuan Zhang	af9870265e	GLM-4.1V Model support (#38431 ) * 20250508 Model Architecture * Update modeling_glm4v.py * Update modeling_glm4v.py * Update modeling_glm4v.py * update 1447 * 0526 * update * format * problem * update * update with only image embed diff * Final * upload * update * 1 * upload with ruff * update * update * work * 1 * 1 * update with new note * 2 * Update convert_glm4v_mgt_weights_to_hf.py * Update tokenization_auto.py * update with new format * remove rmsnrom * draft with videos * draft * update * update * fix for review problem * try to remove min_pixel * update * for test * remove timestamps * remove item * update with remove * change * update 2200 * update * Delete app.py * format * update * Update test_video_processing_glm4v.py * 1 * 2 * use new name * Update test_video_processing_glm4v.py * remove docs * change * update for image processors update * 2108 * 2128 * Update modular_glm4v.py * 1 * update some * update * rename * 1 * remove tests output * 2 * add configuration * update * Update test_video_processing_glm4v.py * fix simple forward tests * update with modular * 1 * fix more tests * fix generation test * fix beam search and init * modular changed * fix beam search in case of single-image/video. 
Fails if multiple visuals per text * update processor * update test * pass * fix beam search * update * param correct * Update convert_glm4v_mgt_weights_to_hf.py * 1 * Update test_modeling_glm4v.py * 4 * 2 * 2123 video process * 2 * revert * 1 * 2 * revert processing * update preprocesor * changed * 1 * update * update * 6 * update * update * update * Delete tmp.txt * config * Update video_processing_glm4v.py * apply modular correctly * move functions * fix order * update the longest_edge * style * simplify a lot * fix random order of classes * skip integration tests * correctly fix the tests * fix TP plan --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-06-25 10:43:05 +02:00
null-pointer-access	7b3807387b	Drop unnecessary tokens in GPT2Model generation (#39016 ) Drop unnecessary tokens in GPT2Model generation. Co-authored-by: Yi Pan <conlesspan@outlook.com>	2025-06-25 08:29:00 +00:00
Raushan Turganbay	e212ff9e6a	[video processor] support torchcodec and decrease cuda memory usage (#38880 ) * don't move the whole video to GPU * add torchcodec * add tests * make style * instrucblip as well * consistency * Update src/transformers/utils/import_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/utils/import_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/video_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-06-25 08:23:37 +00:00
NielsRogge	11d0feacce	[AutoModelForMaskGeneration] Remove duplicate code (#38622 ) Remove duplicate code	2025-06-25 10:00:13 +02:00
efsotr	3ee72af6b6	Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 (#37332 ) * Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 * fix code format * add test; replace position_ids with query_states becasue position_ids.shape[0] is always 1 * add assert loss is not nan	2025-06-25 07:58:34 +00:00
ranzhejiang	ae32f1ad11	Add zero dim tensor check when using flash_attention (#38280 ) * Add zero dim tensor check when using flash_attention Signed-off-by: ranzhejiang <zhejiang.ran@intel.com> * Add zero dim tensor check when using flash_attention Signed-off-by: ranzhejiang <zhejiang.ran@intel.com> --------- Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>	2025-06-25 09:48:50 +02:00
StevenBucaille	ca402e2116	[LightGlue] Fixed attribute usage from descriptor_dim to keypoint_detector_descriptor_dim (#39021 ) fix: fix descriptor dimension handling in LightGlue model	2025-06-24 23:32:07 +01:00
Marcel Ambo Ndowah	48b6ef0238	Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code,… (#38954 ) * Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code, etc.) * Update quicktour.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-06-24 11:48:15 -07:00
Dmitry	ea9a30923e	[HPU][Critical Issue Fix] ThreadPool instead of Pool for parallel pre-processing (#39002 ) * ThreadPool instead of Pool for parallel pre-processing * ThreadPool only if hpu available	2025-06-24 20:24:50 +02:00
ivarflakstad	995666edb5	Skip sdpa dispatch on flash test due to unsupported head dims (#39010 )	2025-06-24 20:16:56 +02:00
ivarflakstad	f367c6337d	Update self-comment-ci.yml user list (#39014 ) add ivarflakstad to self-comment-ci.yml	2025-06-24 20:13:36 +02:00
Tugsbayasgalan Manlaibaatar	67d36dc1d7	Fix bugs in DynamicCache (#37880 ) * Fix bugs in DynamicCache * Updarte * Update * Lint * lint * Rename test * update * update	2025-06-24 19:43:40 +02:00
eustlb	6bdd4ec952	Add kyutai stt (#38909 ) * first draft * cleaner version * udpate tests + modeling * add tests * init * udpate test_modeling_common * fix tests * csm Processor draft * convertion update * mimi cache padding convolutions draft * mimi streaming udpates * update mimi padding cache test * udpate cache padding mimi test * make style mimi * updates generate moshi asr * moshi asr integration tests (single + batched) * update tests * update conversion script * good default sliding window value * udpdate generate * update test checkpoint * nit * fix mimi * fix codec prefix * revert * revert * update config * update config * unnecessary mimi input restriction * remove delay in tokens * remove _prepare_4d_causal_attention_mask_with_cache_position and _update_causal_mask * test update * modular update * make style * nit * rename * create codec model generation config at init * remove delay * max_new_tokens/length warning * correct conv1 padding cache import for modular * nit * fix on encoder_past_key_values * convert modular * move frame_size to config * move frame_size to config * update test name * handle first token is bos * better handling of max_new_tokens * fix * fix batch size in test input prep * update docstring * convert modular * make style * make style * add feature extractor * correct modular convention name for feature_extraction file * update convertion script * doc processor * update doc * udpate init * update model type * fixes * update tests * fix * make * add doc * nit * fix * doc * auto mappings * doc * nit * convert modular * doc * nit * extend _keep_in_fp32_modules to enforce fp32 * renaming to stt * doc update + test update * doc fixes * doc fix * doc fix * fix musicgen tests * fix musicgen tests * make style * fix musicgen tests * correct frame_rate config param for mimi * update mimi test * revert update mimi test * enforce cpu test * move cache init in cache class * convert modular * docstring update * update model id * 
feature_extractor -> feature_extraction (SEW) * convert modular * update model id	2025-06-24 18:01:15 +02:00
Mohamed Mekkouri	08bf7f1afe	Add kernelize to transformers (#38205 ) * fix * fix * fix flow * remove non compiling path * change * style * fix * update * update pin * revert	2025-06-24 17:38:54 +02:00
Avihu Dekel	be10d4df60	Granite speech - minor fixes to support training with the HF trainer (#38833 ) * ensure the query is updated during training avoid unused parameters that DDP does not like * avoid a crash when `kwargs` contain `padding=True` trainers often pass this argument automatically * minor * Remove mel_spec lazy init, and rename to mel_filters. this ensures save_pretrained will not crash when saving the processor during training `d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)` * minor - most feature extractors has a `sampling_rate` property	2025-06-24 17:06:52 +02:00
Cyril Vallez	e1e11b0299	Fix undeterministic order in modular dependencies (#39005 ) * sort correctly * Update modeling_minimax.py * Update modular_model_converter.py	2025-06-24 17:04:33 +02:00
7mile	bdf5fb70aa	Skip non-selected experts for qwen3_moe (#38133 ) * fix(qwen3moe): skip experts with no workload * avoid tolist and also update other moe models * fix: should squeeze 0-dim only	2025-06-24 16:33:48 +02:00
Tanuj Rai	719058c625	Update attention_visualizer.py (#37860 )	2025-06-24 16:21:36 +02:00
Mylon Jones	9f42c1f192	Added scikit-learn to the example image-classification requirements.txt (#37506 ) Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-06-24 15:24:02 +02:00
Cyril Vallez	1636a7bcb9	Fixes for Arcee model (#39001 ) * fix modular * Update modular_arcee.py * fix	2025-06-24 15:23:52 +02:00
Crystalcareai	71de20b818	Add Arcee model support (#38621 ) * Add Arcee model support to transformers - Add ArceeConfig and model mappings for all task types (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification) - Add auto-loading support through AutoModel, AutoConfig, and AutoTokenizer - Use LlamaTokenizer for tokenization - Add FX graph support for Arcee models - Create lazy loading module structure for Arcee * feat: update YARN scaling and RoPE validation for Arcee model * feat: add auto_docstring checkpoint config to Arcee model classes * docs: add pre-trained model weights reference to Arcee configuration files * refactor: move RoPE utilities to dedicated modeling_rope_utils module * Add comprehensive test suite for Arcee model - Add test_modeling_arcee.py following standard transformers test patterns - Include tests for all model variants (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification) - Add specific test for ReLU² activation in ArceeMLP - Add RoPE scaling tests including YARN support - Follow CausalLMModelTest pattern used by similar models * Add documentation for Arcee model - Add comprehensive model documentation with usage examples - Include all model variants in autodoc - Add to table of contents in proper alphabetical order - Fixes documentation coverage for Arcee model classes * Make style/fixup * fix copyright year * Sync modular conversion * revert in legacy supported models in src/transformers/utils/fx * cleaned redundant code in modular_arcee.py * cleaned testing * removed pretraining tp * fix styles * integration testing --------- Co-authored-by: Pranav <veldurthipranav@gmail.com> Co-authored-by: Pranav <56645758+pranav4501@users.noreply.github.com>	2025-06-24 15:05:29 +02:00
Anton Vlasjuk	23c89a6732	[`Attention`] Small fix on output attentions (#38948 ) small fix	2025-06-24 14:42:10 +02:00
Dianana	4f650040a6	Removing extra space in large command for speech-pretraining example (#38705 ) Removing extra space in Large command	2025-06-24 12:24:56 +00:00
Raushan Turganbay	d3d835d4fc	[qwen] refactor attentions for vision/audio (#38930 ) * refactor attentions in vision/audio * remove fa2 import * make config the only args * pass along kwargs from modality encoders * style	2025-06-24 10:53:52 +02:00
vb	2e4c045540	🔴 Update default `dtype` for pipelines to `auto` (#38882 ) * check typing * Fallback to fp32 if auto not supported. * up. * feedback from review. * make style.	2025-06-24 10:39:18 +02:00
casinca	21cb353b7b	[docs] Typos - Single GPU efficient training features (#38964 ) * Typos - corrected bf16 training argument - corrected header for SDPA * improved readability for SDPA suggested by @stevhliu Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-06-23 12:33:10 -07:00
Yih-Dar	f9be71b34d	Fix `rag` (#38585 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-23 17:42:46 +02:00
Yusuf Shihata	9eac19eb59	[Feature] Support `is_split_into_words` in the `TokenClassificationPipeline`. (#38818 ) * some fixes * some fixes * now the pipeline can take list of tokens as input and is_split_into_words argument * now the pipeline can take list of tokens as input and is_split_into_words argument * now the pipeline can take list of tokens as input and is_split_into_words argument and we can handle batches of tokenized input * now the pipeline can take list of tokens as input and is_split_into_words argument and we can handle batches of tokenized input * solving test problems * some fixes * some fixes * modify tests * aligning start and end correctly * adding tests * some formatting * some formatting * some fixes * some fixes * some fixes * resolve conflicts * removing unimportant lines * removing unimportant lines * generalize to other languages * generalize to other languages * generalize to other languages * generalize to other languages	2025-06-23 15:31:32 +00:00
Yih-Dar	2ce02b98bf	fix `mistral` and `mistral3` tests (#38978 ) * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-23 17:07:18 +02:00
Yoni Gozlan	b6b4d43d6d	Add support for auto_docstring with model outputs (#38242 ) * experiment auto_docstring model outputs * Fix PatchTSMixer * Add check model output docstring to check_auto_docstring and fix all model outputs docstring * add reordering of docstring in check_docstrings * add check for redundant docstring in check_docstrings, remove redundant docstrings * refactor check_auto_docstring * make style * fix copies * remove commented code * change List-> list Tuple-> tuple in docstrings * fix modular * make style * Fix modular vipllava --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-06-23 10:39:41 -04:00
kallewoof	0c98f24889	fix: add __bool__ operator to tokenizer to avoid bloated asserts (#38899 ) * fix: add __bool__ operator to tokenizer to avoid bloated asserts When a user does 'assert tokenizer' to ensure that the tokenizer is not None, they inadvertently set off a rather expensive process in the '__len__()' operator. This fix adds a trivial '__bool__()' that returns True, so that a None tokenizer asserts and an actual tokenizer returns True when asserted, without calling length op. * typo	2025-06-23 14:32:16 +00:00
Yoni Gozlan	d29482cc91	Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157 ) * add working idefics2 fast and improvements for fast nested images processing * add fast image processors idefics 3 and smolvlm * cleanup tests * fic doc idefics2 * PR review and fix issues after merge * Force providing disable_grouping to group_images_by_shape * simplify group_images_by_shape * fix modular * Fix nits after review	2025-06-23 14:17:25 +00:00
Rémi Ouazan	1a96127e46	Break tie in Expectations and gemma3 fixes (#38943 ) * Added major / minor version to Expectations ordering * Added fixes to gemma3 * Style	2025-06-23 15:13:27 +02:00
Pavel Iakubovskii	84d19be41e	Apply GradientCheckpointingLayer to the whole repo (#38913 ) * first batch (4) * align * altclip * beit * bert * yolos * dino, pvt_v2 * bark, bart, bert_generation * big_bird, biogpt * blnderbot, bloom * bridgetower * camambert, canine, chameleon * chinese clip, clap, clip * codegen, conditional detr, convbert * dab_detr, data2vec * dbrx, deberta * deberta, decicion_tranformer, deformable_detr * deit, deta, mctct * detr, dinov2, distilbert * donut, dpt, electra * ernie, esm, falcon * flava, fnet, falcon_mamba * focalnet, git, gpt2 * gpt - bigcode, neo, neox * gptj, groupvit * idefics2, idefics3 * ijepa, imagegpt, internvl * jetmoe, kosmos2, layoutlm * layoutlm2-3, led * lilt, longformer, longt5, luke * m2m, mamba1-2 * marian, markuplm, mask2former * maskformer * mbart, megatron_bert, mimi * mixtral, mlcd * mobilevit1-2, modernbert * moshi, mpt, mra * mt5, musicgen * mvp, nemotron * nllb_moe * nystromformer, omdet_turbo * opt, owlvit, owlv2 * pegasus, pegasus_x, presimmon * phimoe, pix2struct, pixtral * plbart, pop2piano, prophetnet * qwen2* * qwen2, qwen3 moe, rec gemma * rembert * roberta * roberta prelayernorm * roc_bert, roformer, rwkv * sam, sam_hq * seggpt, smolvlm, speech_to_text * splinter, stablelm, swin * swin2sr, switch_transformer, t5, table_transformer * tapas, time_series_tranformer, timesformer * trocr, tvp, umt5 * videomae, vilt, visual_bert * vit, vit_mae, vit_msn * vitpose_backbone, vits, vivit * whisper. 
x_clip, xglm * xlm_roberta, xmod * yoso * zamba * vitdet, wav2vec2, wav2vec2_bert * unispeech, wav2vec_conformer * wavlm * speecht5 * swinv2 * sew / _d * seamless_mt4 / _v2 * deprecated models update * bros * gemma2, gemma3 * got, hiera, hubert, llama4, mllama, oneformer, phi, olmoe, informer * fixup * Add use_cache=False and past_key_value=None to GradientCheckpointingLayer * fixup * fix prophetnet * fix bigbird_pegasus * fix blenderbot * fix mbart * fix mvp * fix zamba2 * fix bart * fix blenderbot_small * fix codegen * Update gradient checkpointing layer to support more past_key_values arg names * fix data2vec vision * fix deformable_detr * fix gptj * fix led * fix m2m_100 * add comment * fix nnlb_moe * Fix pegasus_x * fix plbart * udop * fix-copies: beit, wav2vec2 * fix gpt_bigcode * fixup * fix t5 * fix switch_transformers * fix longt5 * fix mt5 * update tapas * fix blip2 * update blip * fix musicgen * fix gpt2, trocr * fix copies * !!! Revert zamba, mllama * update autoformer * update bros * update args / kwargs for BERT and copies * 2nd round of updates * update conditional detr * Pass encoder_hidden_states as positional arg * Update to pass encoder_decoder_position_bias as positional arg * fixup * biogpt modular * modular gemma2 * modular gemma3 * modular gpt_neox * modular informer * modular internvl * modular mixtral * modular mlcd * modular modernbert * modular phi * modular qwen2_5_omni * modular qwen2_5_vl * modular sam_hq * modular sew * wav2vec2_bert * modular wav2vec2_conformer * modular wavlm * fixup * Update by modular instructblipvideo * modular data2vec_audio * nit modular mistral * apply modular minimax * fix modular moonshine * revert zamba2 * fix mask2former * refactor idefics	2025-06-23 14:24:48 +02:00
Cyril Vallez	07aab1af1e	Remove dead protected imports (#38980 ) * remove them * more	2025-06-23 13:44:50 +02:00
Cyril Vallez	74f5e4a1fa	[modular] CLI allows positional arguments, and more defaults names for the optional arg (#38979 ) * More defaults * Update modular_model_converter.py	2025-06-23 12:40:01 +02:00
Vensen	334bf913dc	Fix(informer): Correct tensor shape for input_size=1 (#38856 ) * Fix(time_series): Correct scaler tensor shape in base model The create_network_inputs function in TimeSeriesTransformerModel handled the scaler's loc and scale tensors inconsistently. When input_size=1, the tensors were not squeezed, leading to downstream dimension errors for models like Informer. This commit refactors the logic to unconditionally apply .squeeze(1), which correctly handles all input_size cases and fixes the bug at its source. Fixes #38745 * Fix(time_series): Correct scaler tensor shape in base model The create_network_inputs function in TimeSeriesTransformerModel handled the scaler's loc and scale tensors inconsistently. When input_size=1, the tensors were not squeezed, leading to downstream dimension errors for models like Informer. This commit refactors the logic to unconditionally apply .squeeze(1), which correctly handles all input_size cases and fixes the bug at its source. Fixes #38745 --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-06-23 11:50:51 +02:00
Benoqtr	c184550daf	Fix DTensor import compatibility for PyTorch < 2.5 (#38836 )	2025-06-23 11:25:56 +02:00
Ilyas Moutawwakil	984ff89e73	Gaudi3 CI (#38790 )	2025-06-23 10:56:51 +02:00
DongKyu Kang	2166b6b4ff	Update blip model card (#38513 ) * Update docs/source/en/model_doc/blip.md * fix(docs/source/en/model_doc/blip.md): fix redundent typo error * fix (docs/source/en/model_doc/blip.md): modify of review contents * fix(docs/source/en/model_doc/blip.md): modify code block * Update blip.md --------- Co-authored-by: devkade <mouseku@moana-master> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-06-20 13:46:19 -07:00
Manuel de Prada Corral	166e823f77	Fix custom generate from local directory (#38916 ) Fix custom generate from local directory: 1. Create parent dirs before copying files (custom_generate dir) 2. Correctly copy relative imports to the submodule file. 3. Update docs.	2025-06-20 17:36:57 +01:00
Yih-Dar	3d34b92116	Switch to use A10 progressively (#38936 ) * try * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-20 16:10:35 +00:00
Yih-Dar	b8059e1f8f	Fix more flaky `test_initialization` (#38932 ) * try * try * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-20 17:28:32 +02:00
Cyril Vallez	5ee60f970a	Correctly raise error for awq quantization (#38945 ) fix warning	2025-06-20 17:18:06 +02:00
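
Commit 0c98f24889 above notes that `assert tokenizer` ends up calling `__len__()` when the object defines no `__bool__()`. The sketch below illustrates that general Python behavior only; `SlowLenTokenizer` and `FastBoolTokenizer` are hypothetical stand-ins, not the library's tokenizer classes, and the call counter exists purely to make the effect visible.

```python
class SlowLenTokenizer:
    """Hypothetical stand-in: truth testing falls back to __len__."""

    def __init__(self):
        self.len_calls = 0

    def __len__(self):
        # Stands in for an expensive vocabulary-size computation.
        self.len_calls += 1
        return 50_000


class FastBoolTokenizer(SlowLenTokenizer):
    """Same class plus a trivial __bool__, as the commit message describes."""

    def __bool__(self):
        # Python consults __bool__ first, so __len__ is never reached.
        return True


slow = SlowLenTokenizer()
fast = FastBoolTokenizer()

assert slow  # truth test calls __len__
assert fast  # truth test calls __bool__ only

print(slow.len_calls, fast.len_calls)  # prints: 1 0
```

A `None` tokenizer still fails `assert tokenizer` in both cases, since `None` is always falsy.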

19564 Commits