transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 10:12:23 +06:00

Author	SHA1	Message	Date
Joao Gante	1d45d90e5d	[tests] remove TF tests (uses of `require_tf`) (#38944 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * remove uses of require_tf * remove redundant import guards * this class has no tests * nits * del tf rng comment	2025-06-25 17:29:10 +00:00
eustlb	551e48f182	[Kyutai-STT] correct model type + model id (#39035 ) * correct model type + model id * udpate doc * init fix * style !!!	2025-06-25 16:09:00 +00:00
Anton Lozhkov	dad0e87c79	Add SmolLM3 (#38755 ) * init smollm3 * integration tests * config quirks * docs stub * rests round 2 * tests round 3 * tests round 4 * bring SWA back * config checker pls * final checkpoint * style and copies * Update src/transformers/models/smollm3/modular_smollm3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/smollm3/modular_smollm3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-06-25 15:12:15 +00:00
Quentin Lhoest	858f9b71a8	Remove script datasets in tests (#38940 ) * remove trust_remote_code * again * Revert "Skip some tests for now (#38931)" This reverts commit `31d30b7224`. * again * style * again * again * style * fix integration test * fix tests * style * fix * fix * fix the last ones * style * last one * fix last * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-25 14:31:20 +00:00
EduardDurech	a2eb75c891	Support for Flash Attention 3 (#38972 ) * Support `flash_attn_3` Implements fwd and tests for Flash Attention 3 https://github.com/Dao-AILab/flash-attention/commits/main/hopper - Includes checks for dropout>0 and ALiBi in `modeling_utils.PreTrainedModel._check_and_enable_flash_attn_3` (Dropout will likely be supported soon, so this will need to be updated and `modeling_flash_attention_utils._flash_attention_forward` at the `if _IS_FLASH_ATTN_3_AVAILABLE: ...` An example Llama implementation is included in `modeling_llama.py` but other models would still need to be updated Based on https://github.com/huggingface/transformers/pull/36190 which has model implementations and examples which could be merged * Add tests for Flash Attention 2 and 3 parity * ci fix * FA2 compatibiity - `_prepare_flash_attention_from_position_ids` ->`prepare_fa2_from_position_ids` - Remove bettertransformer check in Flash Attention 3 - Merge tests - Add licensing * ci fix * Test naming consistency * ci fix * Deprecation warning for `prepare_fa2_from_position_ids` * ci fix	2025-06-25 14:39:27 +02:00
redmoe-moutain	7503cb9113	[Model] add dots1 (#38143 ) * add dots1 * address comments * fix * add link to dots1 doc * format --------- Co-authored-by: taishan <rgtjf1@163.com>	2025-06-25 11:38:25 +02:00
Biao Zhang	3ef8896906	Encoder-Decoder Gemma (#38332 ) * Initial submit * Fix bugs: 1. add __init__ file 2. tied word embedding 3. support flash/flex attention 4. model saving and loading * Code refactor: * Rename encdecgemma to t5gemma. * Split attention into self- and cross-attention * Split stack into encoder and decoder * Add test cases * Add auto configuration * Update configurations. * Fix bugs related to copy and attribute checks * Fix type union * Fix merge errors * run ruff format * Run make style and update tests. * Add t5gemma model doc. * ruff and style formatting. * Add missed module config. * Add dummy checkpoint link to pass tests (need updated when real checkpoints are uplioaded.). * Update model doc. * Minor updates following Arthur's comments: * replace docstrings with auto_docstrings * remove checkpoint layers * remove deprecate_kwargs * fix rebase errors * Fix docstring issues. * fix t5gemma doc issue. * run ruff format * Updates: * split encoder-only model out * make t5gemmamodel encoder-decoder only * update token and sequence classification * update tests	2025-06-25 09:05:10 +00:00
Yuxuan Zhang	af9870265e	GLM-4.1V Model support (#38431 ) * 20250508 Model Architecture * Update modeling_glm4v.py * Update modeling_glm4v.py * Update modeling_glm4v.py * update 1447 * 0526 * update * format * problem * update * update with only image embed diff * Final * upload * update * 1 * upload with ruff * update * update * work * 1 * 1 * update with new note * 2 * Update convert_glm4v_mgt_weights_to_hf.py * Update tokenization_auto.py * update with new format * remove rmsnrom * draft with videos * draft * update * update * fix for review problem * try to remove min_pixel * update * for test * remove timestamps * remove item * update with remove * change * update 2200 * update * Delete app.py * format * update * Update test_video_processing_glm4v.py * 1 * 2 * use new name * Update test_video_processing_glm4v.py * remove docs * change * update for image processors update * 2108 * 2128 * Update modular_glm4v.py * 1 * update some * update * rename * 1 * remove tests output * 2 * add configuration * update * Update test_video_processing_glm4v.py * fix simple forward tests * update with modular * 1 * fix more tests * fix generation test * fix beam search and init * modular changed * fix beam search in case of single-image/video. Fails if multiple visuals per text * update processor * update test * pass * fix beam search * update * param correct * Update convert_glm4v_mgt_weights_to_hf.py * 1 * Update test_modeling_glm4v.py * 4 * 2 * 2123 video process * 2 * revert * 1 * 2 * revert processing * update preprocesor * changed * 1 * update * update * 6 * update * update * update * Delete tmp.txt * config * Update video_processing_glm4v.py * apply modular correctly * move functions * fix order * update the longest_edge * style * simplify a lot * fix random order of classes * skip integration tests * correctly fix the tests * fix TP plan --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-06-25 10:43:05 +02:00
Raushan Turganbay	e212ff9e6a	[video processor] support torchcodec and decrease cuda memory usage (#38880 ) * don't move the whole video to GPU * add torchcodec * add tests * make style * instrucblip as well * consistency * Update src/transformers/utils/import_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/utils/import_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/video_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-06-25 08:23:37 +00:00
efsotr	3ee72af6b6	Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 (#37332 ) * Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 * fix code format * add test; replace position_ids with query_states becasue position_ids.shape[0] is always 1 * add assert loss is not nan	2025-06-25 07:58:34 +00:00
ivarflakstad	995666edb5	Skip sdpa dispatch on flash test due to unsupported head dims (#39010 )	2025-06-24 20:16:56 +02:00
Tugsbayasgalan Manlaibaatar	67d36dc1d7	Fix bugs in DynamicCache (#37880 ) * Fix bugs in DynamicCache * Updarte * Update * Lint * lint * Rename test * update * update	2025-06-24 19:43:40 +02:00
eustlb	6bdd4ec952	Add kyutai stt (#38909 ) * first draft * cleaner version * udpate tests + modeling * add tests * init * udpate test_modeling_common * fix tests * csm Processor draft * convertion update * mimi cache padding convolutions draft * mimi streaming udpates * update mimi padding cache test * udpate cache padding mimi test * make style mimi * updates generate moshi asr * moshi asr integration tests (single + batched) * update tests * update conversion script * good default sliding window value * udpdate generate * update test checkpoint * nit * fix mimi * fix codec prefix * revert * revert * update config * update config * unnecessary mimi input restriction * remove delay in tokens * remove _prepare_4d_causal_attention_mask_with_cache_position and _update_causal_mask * test update * modular update * make style * nit * rename * create codec model generation config at init * remove delay * max_new_tokens/length warning * correct conv1 padding cache import for modular * nit * fix on encoder_past_key_values * convert modular * move frame_size to config * move frame_size to config * update test name * handle first token is bos * better handling of max_new_tokens * fix * fix batch size in test input prep * update docstring * convert modular * make style * make style * add feature extractor * correct modular convention name for feature_extraction file * update convertion script * doc processor * update doc * udpate init * update model type * fixes * update tests * fix * make * add doc * nit * fix * doc * auto mappings * doc * nit * convert modular * doc * nit * extend _keep_in_fp32_modules to enforce fp32 * renaming to stt * doc update + test update * doc fixes * doc fix * doc fix * fix musicgen tests * fix musicgen tests * make style * fix musicgen tests * correct frame_rate config param for mimi * update mimi test * revert update mimi test * enforce cpu test * move cache init in cache class * convert modular * docstring update * update model id * feature_extractor -> feature_extraction (SEW) * convert modular * update model id	2025-06-24 18:01:15 +02:00
Crystalcareai	71de20b818	Add Arcee model support (#38621 ) * Add Arcee model support to transformers - Add ArceeConfig and model mappings for all task types (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification) - Add auto-loading support through AutoModel, AutoConfig, and AutoTokenizer - Use LlamaTokenizer for tokenization - Add FX graph support for Arcee models - Create lazy loading module structure for Arcee * feat: update YARN scaling and RoPE validation for Arcee model * feat: add auto_docstring checkpoint config to Arcee model classes * docs: add pre-trained model weights reference to Arcee configuration files * refactor: move RoPE utilities to dedicated modeling_rope_utils module * Add comprehensive test suite for Arcee model - Add test_modeling_arcee.py following standard transformers test patterns - Include tests for all model variants (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification) - Add specific test for ReLU² activation in ArceeMLP - Add RoPE scaling tests including YARN support - Follow CausalLMModelTest pattern used by similar models * Add documentation for Arcee model - Add comprehensive model documentation with usage examples - Include all model variants in autodoc - Add to table of contents in proper alphabetical order - Fixes documentation coverage for Arcee model classes * Make style/fixup * fix copyright year * Sync modular conversion * revert in legacy supported models in src/transformers/utils/fx * cleaned redundant code in modular_arcee.py * cleaned testing * removed pretraining tp * fix styles * integration testing --------- Co-authored-by: Pranav <veldurthipranav@gmail.com> Co-authored-by: Pranav <56645758+pranav4501@users.noreply.github.com>	2025-06-24 15:05:29 +02:00
Yih-Dar	f9be71b34d	Fix `rag` (#38585 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-23 17:42:46 +02:00
Yusuf Shihata	9eac19eb59	[Feature] Support `is_split_into_words` in the `TokenClassificationPipeline`. (#38818 ) * some fixes * some fixes * now the pipeline can take list of tokens as input and is_split_into_words argument * now the pipeline can take list of tokens as input and is_split_into_words argument * now the pipeline can take list of tokens as input and is_split_into_words argument and we can handle batches of tokenized input * now the pipeline can take list of tokens as input and is_split_into_words argument and we can handle batches of tokenized input * solving test problems * some fixes * some fixes * modify tests * aligning start and end correctly * adding tests * some formatting * some formatting * some fixes * some fixes * some fixes * resolve conflicts * removing unimportant lines * removing unimportant lines * generalize to other languages * generalize to other languages * generalize to other languages * generalize to other languages	2025-06-23 15:31:32 +00:00
Yih-Dar	2ce02b98bf	fix `mistral` and `mistral3` tests (#38978 ) * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-23 17:07:18 +02:00
Yoni Gozlan	d29482cc91	Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157 ) * add working idefics2 fast and improvements for fast nested images processing * add fast image processors idefics 3 and smolvlm * cleanup tests * fic doc idefics2 * PR review and fix issues after merge * Force providing disable_grouping to group_images_by_shape * simplify group_images_by_shape * fix modular * Fix nits after review	2025-06-23 14:17:25 +00:00
Rémi Ouazan	1a96127e46	Break tie in Expectations and gemma3 fixes (#38943 ) * Added major / minor version to Expectations ordering * Added fixes to gemma3 * Style	2025-06-23 15:13:27 +02:00
Cyril Vallez	07aab1af1e	Remove dead protected imports (#38980 ) * remove them * more	2025-06-23 13:44:50 +02:00
Ilyas Moutawwakil	984ff89e73	Gaudi3 CI (#38790 )	2025-06-23 10:56:51 +02:00
Yih-Dar	b8059e1f8f	Fix more flaky `test_initialization` (#38932 ) * try * try * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-20 17:28:32 +02:00
Pavel Iakubovskii	9120567b02	Add kwargs for timm.create_model in TimmWrapper (#38860 ) * Add init kwargs for timm wrapper * model_init_kwargs -> model_args * add save-load test * fixup	2025-06-20 12:00:09 +00:00
Rémi Ouazan	9bcdd5cde9	Modernbert fixes (#38912 ) * Removed deprecated argument in modernbert RotaryEmbedding * Skip test_sdpa_can_dispatch_on_flash for modernbert --------- Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-06-20 11:22:32 +02:00
Yih-Dar	31d30b7224	Skip some tests for now (#38931 ) * try * [test all] --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-20 11:05:49 +02:00
Hamza Benchekroun	797860c68c	feat: add flexible Liger Kernel configuration to TrainingArguments (#38911 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * feat: add flexible Liger Kernel configuration to TrainingArguments Add support for granular Liger Kernel configuration through a new `liger_kernel_config` parameter in TrainingArguments. This allows users to selectively enable/disable specific kernels (rope, swiglu, cross_entropy, etc.) instead of the current approach that rely on default configuration. Features: - Add `liger_kernel_config` dict parameter to TrainingArguments - Support selective kernel application for all supported models - Maintain full backward compatibility with existing `use_liger_kernel` flag Example usage: ```python TrainingArguments( use_liger_kernel=True, liger_kernel_config={ "rope": True, "swiglu": True, "cross_entropy": False, "fused_linear_cross_entropy": True } ) Closes #38905 * Address comments and update Liger section in Trainer docs	2025-06-19 15:54:08 +00:00
ivarflakstad	af6120b3eb	Skip sdpa tests if submodule does not support sdpa (#38907 )	2025-06-19 13:11:01 +00:00
Yih-Dar	5d26a38735	Fix `FalconMambaIntegrationTests` (#38566 ) * update * update * update * update * update * update * update * update * update * update * update * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-19 13:50:33 +02:00
Yao Matrix	a9ce8c69c9	align xpu's autocast behavior w/ cuda by using device agnostic torch APIs (#38284 ) * siwtch to device agnostic autocast in nemotron to align xpu behavior w/ cuda Signed-off-by: Matrix Yao <matrix.yao@intel.com> * fix issue Signed-off-by: Matrix Yao <matrix.yao@intel.com> * fix style Signed-off-by: Matrix Yao <matrix.yao@intel.com> * use torch.cast as other modeling code for decision_transformer&gpt2&imagegpt Signed-off-by: Matrix Yao <matrix.yao@intel.com> * refine Signed-off-by: Matrix Yao <matrix.yao@intel.com> * update get_autocast_gpu_dtype to device agnostic one Signed-off-by: Matrix YAO <matrix.yao@intel.com> * fix style Signed-off-by: Matrix YAO <matrix.yao@intel.com> * fix comments Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: Matrix Yao <matrix.yao@intel.com> Signed-off-by: Matrix YAO <matrix.yao@intel.com> Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-06-19 11:48:23 +00:00
Yih-Dar	b949747b54	Fix `fsmt` tests (#38904 ) * fix 1 * fix 2 * fix 3 * fix 4 * fix 5 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-19 10:56:34 +02:00
Sam Rae	b922b22ec2	36978 \| Fast image processor for DPT model (#37481 ) * chore: ran codegen script * test: test_image_processor_properties * test: test_image_processor_from_dict_with_kwargs * test: wip - test_padding * test: test_padding * test: test_keep_aspect_ratio * wip * test * test: wip * test: wip * test: test_call_segmentation_maps, wip * chore: tidy up * test: test_call_segmentation_maps * fix: test_save_load_fast_slow * test: reduce labels * chore: make fixup * chore: rm comment * chore: tidy * chore remove comment * refactor: no need to infer channel dimesnion * refactor: encapsulate logic for preparing segmentation maps * refactor: improve readability of segmentation_map preparation * improvement: batched version of pad_image * chore: fixup * docs * chore: make quality * chore: remove unecessary comment * fix: add SemanticSegmentationMixin * feat: add post_process_depth_estimation to fast dpt image processor * chore: fix formatting * remove max_height, max_width * fix: better way of processin segmentation maps - copied from Beit Fast processor * chore: formatting + remove TODO * chore: fixup styles * chore: remove unecessary line break * chore: core review suggestion to remove autodocstring * fix: add do_reduce_labels logic + refactor - refactor preprocess logic to make it consistent with other processors - add missing reduce labels logic * refactor: remove deprecated mixin * chore: fixup * use modular for dpt + final nit changes * fix style --------- Co-authored-by: Samuel Rae <samuelrae@Samuels-Air.fritz.box> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-06-18 17:33:29 +00:00
Yuanyuan Chen	1fc67a25c6	More PYUP fixes (#38883 ) More pyup fixes Signed-off-by: cyy <cyyever@outlook.com>	2025-06-18 14:38:08 +01:00
艾梦	cb0f604192	Fix HQQ model param device transfer issue (#38466 ) * Fix HQQ model param device transfer issue * modify a comment * clear the code and add test for hqq device/dtype * fix test hqq code quality of imports --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-06-18 15:09:00 +02:00
Yih-Dar	c77bcd889f	Fix `qwen3_moe` tests (#38865 ) * try 1 * try 2 * try 3 * try 4 * try 5 * try 6 * try 7 * try 8 * try 9 * try 10 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-18 14:36:03 +02:00
Cyril Vallez	5a95ed5ca0	🚨🚨 Fix initialization of Mask2Former (#38864 ) * Correctly fix init Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com> * add back the block, breaking BC but this is correct author's code * override the test for params needing it --------- Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com>	2025-06-18 09:46:22 +02:00
Yih-Dar	309e8c96f2	Fix `phi4_multimodal` tests (#38816 ) * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-18 09:39:17 +02:00
Yao Matrix	3526e25d3d	enable misc test cases on XPU (#38852 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * enable misc test cases on XPU Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> * tweak bamba ground truth on XPU Signed-off-by: YAO Matrix <matrix.yao@intel.com> * remove print Signed-off-by: YAO Matrix <matrix.yao@intel.com> * one more Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com>	2025-06-18 09:20:49 +02:00
Matt	508a704055	No more Tuple, List, Dict (#38797 ) * No more Tuple, List, Dict * make fixup * More style fixes * Docstring fixes with regex replacement * Trigger tests * Redo fixes after rebase * Fix copies * [test all] * update * [test all] * update * [test all] * make style after rebase * Patch the hf_argparser test * Patch the hf_argparser test * style fixes * style fixes * style fixes * Fix docstrings in Cohere test * [test all] --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-17 19:37:18 +01:00
StevenBucaille	e5a9ce48f7	Add LightGlue model (#31718 ) * init * chore: various changes to LightGlue * chore: various changes to LightGlue * chore: various changes to LightGlue * chore: various changes to LightGlue * Fixed dynamo bug and image padding tests * refactor: applied refactoring changes from SuperGlue's concat, batch and stack functions to LightGlue file * tests: removed sdpa support and changed expected values * chore: added some docs and refactoring * chore: fixed copy to superpoint.image_processing_superpoint.convert_to_grayscale * feat: adding batch implementation * feat: added validation for preprocess and post process method to LightGlueImageProcessor * chore: changed convert_lightglue_to_hf script to comply with new standard * chore: changed lightglue test values to match new lightglue config pushed to hub * chore: simplified convert_lightglue_to_hf conversion map * feat: adding batching implementation * chore: make style * feat: added threshold to post_process_keypoint_matching method * fix: added missing instructions that turns keypoints back to absolute coordinate before matching forward * fix: added typehint and docs * chore: make style * [run-slow] lightglue * fix: add matches different from -1 to compute valid matches in post_process_keypoint_matching * tests: added CUDA proof tests similar to SuperGlue * chore: various changes to modeling_lightglue.py - Added "Copies from" statements for copied functions from modeling_superglue.py - Added missing docstrings - Removed unused functions or classes - Removed unnecessary statements - Added missing typehints - Added comments to the main forward method * chore: various changes to convert_lightglue_to_hf.py - Added model saving - Added model reloading * chore: fixed imports in lightglue files * [run-slow] lightglue * chore: make style * [run-slow] lightglue * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * [run-slow] lightglue * chore: Applied some suggestions from review - Added missing typehints - Refactor "cuda" to device variable - Variable renaming - LightGlue output order changed - Make style * fix: added missing grayscale argument in image processor in case use of SuperPoint keypoint detector * fix: changed lightglue HF repo to lightglue_superpoint with grayscale default to True * refactor: make keypoints `(batch_size, num_keypoints, keypoint_dim)` through forward and unsqueeze only before attention layer * refactor: refactor do_layer_keypoint_pruning * tests: added tests with no early stop and keypoint pruning * refactor: various refactoring to modeling_lightglue.py - Removed unused functions - Renamed variables for consistency - Added comments for clarity - Set methods to private in LightGlueForKeypointMatching - Replaced tensor initialization to list then concatenation - Used more pythonic list comprehension for repetitive instructions * refactor: added comments and renamed filter_matches to get_matches_from_scores * tests: added copied from statement with superglue tests * docs: added comment to prepare_keypoint_matching_output function in tests * [run-slow] lightglue * refactor: reordered _concat_early_stopped_outputs in LightGlue class * [run-slow] lightglue * docs: added lightglue.md model doc * docs: added Optional typehint to LightGlueKeypointMatchingOutput * chore: removed pad_images function * chore: set do_grayscale default value to True in LightGlueImageProcessor * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * docs: added missing LightGlueConfig typehint in nn.Module __init__ methods * docs: removed unnecessary code in docs * docs: import SuperPointConfig only from a TYPE_CHECKING context * chore: use PretrainedConfig arguments `num_hidden_layers` and `num_attention_heads` instead of `num_layers` and `num_heads` * chore: added organization as arg in convert_lightglue_to_hf.py script * refactor: set device variable * chore: added "gelu" in LightGlueConfig as hidden_act parameter * docs: added comments to reshape.flip.reshape instruction to perform cross attention * refactor: used batched inference for keypoint detector forward pass * fix: added fix for SDPA tests * docs: fixed docstring for LightGlueImageProcessor * [run-slow] lightglue * refactor: removed unused line * refactor: added missing arguments in LightGlueConfig init method * docs: added missing LightGlueConfig typehint in init methods * refactor: added checkpoint url as default variable to verify models output only if it is the default url * fix: moved print message inside if statement * fix: added log assignment r removal in convert script * fix: got rid of confidence_thresholds as registered buffers * refactor: applied suggestions from SuperGlue PR * docs: changed copyright to 2025 * refactor: modular LightGlue * fix: removed unnecessary import * feat: added plot_keypoint_matching method to LightGlueImageProcessor with matplotlib soft dependency * fix: added missing import error for matplotlib * Updated convert script to push on ETH org * fix: added missing licence * fix: make fix-copies * refactor: use cohere apply_rotary_pos_emb function * fix: update model references to use ETH-CVG/lightglue_superpoint * refactor: add and use intermediate_size attribute in config to inherit CLIPMLP for LightGlueMLP * refactor: explicit variables instead of slicing * refactor: use can_return_tuple decorator in LightGlue model * fix: make fix-copies * docs: Update model references in `lightglue.md` to use the correct pretrained model from ETH-CVG * Refactor LightGlue configuration and processing classes - Updated type hints for `keypoint_detector_config` in `LightGlueConfig` to use `SuperPointConfig` directly. - Changed `size` parameter in `LightGlueImageProcessor` to be optional. - Modified `position_embeddings` in `LightGlueAttention` and `LightGlueAttentionBlock` to be optional tuples. - Cleaned up import statements across multiple files for better readability and consistency. * refactor: Update LightGlue configuration to enforce eager attention implementation - Added `attn_implementation="eager"` to `keypoint_detector_config` in `LightGlueConfig` and `LightGlueAttention` classes. - Removed unnecessary logging related to attention implementation fallback. - Cleaned up import statements for better readability. * refactor: renamed message into attention_output * fix: ensure device compatibility in LightGlueMatchAssignmentLayer descriptor normalization - Updated the normalization of `m_descriptors` to use the correct device for the tensor, ensuring compatibility across different hardware setups. * refactor: removed Conv layers from init_weights since LightGlue doesn't have any * refactor: replace add_start_docstrings with auto_docstring in LightGlue models - Updated LightGlue model classes to utilize the new auto_docstring utility for automatic documentation generation. - Removed legacy docstring handling to streamline the code and improve maintainability. * refactor: simplify LightGlue image processing tests by inheriting from SuperGlue - Refactored `LightGlueImageProcessingTester` and `LightGlueImageProcessingTest` to inherit from their SuperGlue counterparts, reducing code duplication. - Removed redundant methods and properties, streamlining the test setup and improving maintainability. * test: forced eager attention implementation to LightGlue model tests - Updated `LightGlueModelTester` to include `attn_implementation="eager"` in the model configuration. - This change aligns the test setup with the recent updates in LightGlue configuration for eager attention. * refactor: update LightGlue model references * fix: import error * test: enhance LightGlue image processing tests with setup method - Added a setup method in `LightGlueImageProcessingTest` to initialize `LightGlueImageProcessingTester`. - Included a docstring for `LightGlueImageProcessingTester` to clarify its purpose. * refactor: added LightGlue image processing implementation to modular file * refactor: moved attention blocks into the transformer layer * fix: added missing import * fix: added missing import in __all__ variable * doc: added comment about enforcing eager attention because of SuperPoint * refactor: added SuperPoint eager attention comment and moved functions to the closest they are used --------- Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-06-17 18:10:23 +02:00
Yih-Dar	2507169bf6	Fix `qwen3` tests (#38862 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * fix * update * update * update * update * update * update * format --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-17 15:21:36 +02:00
Yih-Dar	c61ca64aaa	Fix `qwen2_5_vl` tests (#38845 ) * fix * breakpoint() * breakpoint() * update * update * update * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-17 10:55:24 +02:00
Yusuf Shihata	a7593a1d1f	[BugFix] QA pipeline edge case: `align_to_words=True` in `QuestionAnsweringPipeline` can lead to duplicate answers (#38761 ) * fixing the problem align_to_words=True leading to duplicate solutions * adding tests * some fixes * some fixes * changing the handle_duplicate_answers=False by default * some fixese * some fixes * make the duplicate handling the default behaviour and merge duplicates * make the duplicate handling the default behaviour	2025-06-16 15:01:22 +00:00
Pavel Iakubovskii	9bec2654ed	Add V-JEPA for video classification model (#38788 ) * adding model and conversion scripts * add imports to test vjepa conversion * fix imports and make conversion work * fix computation for short side * replace attention with library attention function * cleanup more attention classes * remove config overrides * add test cases, fix some of the failing ones * fix the model outputs * fix outputs of the model per review * fix too big model test case * fix styling __init__.py * fix initialization test * remove all asserts per review * update sorting unsorting logic as per feedback * remove is_video per review * remove another is_video segment * remove unwanted stuff * small fixes * add docstrings for the model * revert adding vjepa2 config here * update styling * add config docstrings (wip) * fix dpr issue * removed test failing issues * update styles * merge predictor configs into main config * remove processing code, add video processor * remove permute which is not necessary now * fix styles * updated vjepa2 to be in video_processing_auto * update comment for preprocessing * test integration test and fix the outputs * update test values, change test to look at repeated frames for a given image * add a simple video processing test * refactoring pixel_values_videos and upload ckpts to original * fix torch_fx test cases * remove unused config * add all config docstrings * add more integration tests * add basic doc * revert unwanted styling changes * working make fixup * Fix model_type in config * Add ForVideoClassification model * update attention implementation to fit new hf standards * fix the preprocessing logic, ensure it matches the original model * remove use_rope logic, cleanup * fix docstrings * Further cleanup, update doc * Fix model prefix * fix get_vision_features * VJEPA2Embeddings style refactor * nit, style comment * change modules default values * Only `str` activation in config * GradientCheckpointingLayer * fixup * fix conversion script * Remove return_dict * remove None return typehint * Refactor VJEPA2Layer, remove use_SiLU * Fix fx tests * dpr -> drop_path_rates * move ModelOutput on top format docs bit * update docs * update docs * update doc example * remove prune_heads from model * remove unused config params * refactor embed signature * Add vjepa to docs * Fix config docstring * attention head * update defaults * Update docs/source/en/model_doc/vjepa2.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/vjepa2.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fix import * Min refactoring * Update HUB_SOURCE and HUB_REPO in conversion script * Add missing headers * VJEPA -> V-JEPA in docs * Add image to doc * fix style * fix init weights * change checkpoint name in modeling tests * Initial cls head setup * remove rop attention from head (not needed) * remove swigluffn - not needed * Add siglip layer * Replace with siglip layer * Rename Siglip - VJEPA2 * remove unused modules * remove siglip mlp * nit * remove MLP * Refactor head cross attention * refactor VJEPA2HeadCrossAttentionLayer * nit renaming * fixup * remove commented code * Add cls head params to config * depth from config * move pooler + classifier to the model * Update for cls model signature * move layers, rename a bit * fix docs * update weights init * remove typehint for init * add to auto-mapping * enable tests * Add conversion script * fixup * add to docs * fix docs * nit * refactor for mapping * clean * Add integration test * Fixing multi gpu test * update not-split-modules * update video cls test tolerance * Increase test_inference_image tolerance * Update no-split modules for multi gpu * Apply suggestions from code review * fixing multi-gpu * fix docstring * Add cls snippet to docs * Update checkpoint	2025-06-13 17:56:15 +01:00
Matt	b82a45b3b4	Refactor DBRX tests to use CausalLMModelTest base classes (#38475 ) * Refactor DBRX tests to use CausalLMModelTest base classes - Changed DbrxModelTester to inherit from CausalLMModelTester - Changed DbrxModelTest to inherit from CausalLMModelTest - Removed duplicate methods that are already in base classes - Added required class attributes for model classes - Updated pipeline_model_mapping to include feature-extraction - Kept DBRX-specific configuration and test methods - Disabled RoPE tests as DBRX's rotary embedding doesn't accept config parameter This refactoring reduces code duplication and follows the pattern established in other causal LM model tests like Gemma. * Apply style fixes * Trigger tests * Refactor DBRX test * Make sure the DBRX-specific settings are handled * Use the attribute_map * Fix attribute map --------- Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-06-13 16:22:12 +01:00
Rémi Ouazan	9ff246db00	Expectation fixes and added AMD expectations (#38729 )	2025-06-13 16:14:58 +02:00
Yih-Dar	e39172ecab	Fix `llava_next` tests (#38813 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-13 15:19:41 +02:00
Quentin Gallouédec	de24fb63ed	Use HF papers (#38184 ) * Use hf papers * Hugging Face papers * doi to hf papers * style	2025-06-13 11:07:09 +00:00
Guang Yang	7f00b325f8	Unbreak optimum-executorch (#38646 ) * Unbreak optimum-executorch * use static cache if has layer_types but no sliding_window * revert view on kv_arange --------- Co-authored-by: Guang Yang <guangyang@fb.com>	2025-06-13 11:13:32 +02:00
Cyril Vallez	4b8ec667e9	Remove all traces of `low_cpu_mem_usage` (#38792 ) * remove it from all py files * remove it from the doc * remove it from examples * style * remove traces of _fast_init * Update test_peft_integration.py * CIs	2025-06-12 16:39:33 +02:00
Yih-Dar	eea35a15b0	Fix `mllama` (#38704 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-12 16:15:35 +02:00
Yih-Dar	c87058beb8	Fix `llava_onevision` tests (#38791 ) * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-12 15:06:49 +02:00
Yih-Dar	d4e7aa5526	Fix `qwen_2_5 omni` (#38658 ) * fix * fix * break style * break style * Apply style fixes * break style * Apply style fixes * fix modular --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-06-12 14:43:54 +02:00
Yih-Dar	89c46b648d	Skip some export tests on torch 2.7 (#38677 ) * skip * fix * better check * Update import_utils.py --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-06-12 12:47:15 +02:00
Raushan Turganbay	27459025b8	[video processors] support frame sampling within processors (#38105 ) * apply updates smolVLM (still needs workaround for chat template) * add other models * dump qwen omni for now, come back later * port qwen omni from their impl * wait, all qwens sample videos in same way! * clean up * make smolvlm backwards compatible and fix padding * dix some tests * fox smolvlm tests * more clean up and test fixing * delete unused arg * fix * address comments * style * fix test	2025-06-12 09:34:30 +00:00
Matt	9f563ada70	Deprecate TF + JAX (#38758 ) * Scatter deprecation warnings around * Delete the tests * Make logging work properly!	2025-06-11 17:28:06 +01:00
Marc Sun	11ad9be153	Better typing for num_items_in_batch (#38728 ) * fix * style * type checking ? * maybe this ? * fix * can't be an int anymore * fix	2025-06-11 16:26:41 +02:00
Pavel Iakubovskii	84710a4291	Add V-JEPA 2 (#38746 ) * adding model and conversion scripts * add imports to test vjepa conversion * fix imports and make conversion work * fix computation for short side * replace attention with library attention function * cleanup more attention classes * remove config overrides * add test cases, fix some of the failing ones * fix the model outputs * fix outputs of the model per review * fix too big model test case * fix styling __init__.py * fix initialization test * remove all asserts per review * update sorting unsorting logic as per feedback * remove is_video per review * remove another is_video segment * remove unwanted stuff * small fixes * add docstrings for the model * revert adding vjepa2 config here * update styling * add config docstrings (wip) * fix dpr issue * removed test failing issues * update styles * merge predictor configs into main config * remove processing code, add video processor * remove permute which is not necessary now * fix styles * updated vjepa2 to be in video_processing_auto * update comment for preprocessing * test integration test and fix the outputs * update test values, change test to look at repeated frames for a given image * add a simple video processing test * refactoring pixel_values_videos and upload ckpts to original * fix torch_fx test cases * remove unused config * add all config docstrings * add more integration tests * add basic doc * revert unwanted styling changes * working make fixup * Fix model_type in config * update attention implementation to fit new hf standards * fix the preprocessing logic, ensure it matches the original model * remove use_rope logic, cleanup * fix docstrings * Further cleanup, update doc * Fix model prefix * fix get_vision_features * VJEPA2Embeddings style refactor * nit, style comment * change modules default values * Only `str` activation in config * GradientCheckpointingLayer * fixup * fix conversion script * Remove return_dict * remove None return typehint * Refactor VJEPA2Layer, remove use_SiLU * Fix fx tests * dpr -> drop_path_rates * move ModelOutput on top format docs bit * update docs * update docs * update doc example * remove prune_heads from model * remove unused config params * refactor embed signature * Add vjepa to docs * Fix config docstring * update defaults * Update docs/source/en/model_doc/vjepa2.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/vjepa2.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fix import * Min refactoring * Update HUB_SOURCE and HUB_REPO in conversion script * Add missing headers * VJEPA -> V-JEPA in docs * Add image to doc * fix style * fix init weights * change checkpoint name in modeling tests --------- Co-authored-by: Koustuv Sinha <koustuv.sinha@mail.mcgill.ca> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: Koustuv Sinha <koustuvsinha@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2025-06-11 15:00:08 +01:00
Yao Matrix	c8c1e525ed	from 1.11.0, torchao.prototype.low_bit_optim is promoted to torchao.optim (#38689 ) * since 1.11.0, torchao.prototype.low_bit_optim is promoted to torchao.optim Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix review comments Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-06-11 12:16:25 +00:00
Raushan Turganbay	380e6ea406	[llava] fix integration tests with Siglip (#38732 ) fix llava siglip test	2025-06-11 08:09:16 +00:00
Yih-Dar	8257734b5f	Fix `llava` tests (#38722 ) * update * fix 1 * fix 2 * fix 3 * fix 4 * fix 5 * fix 6 * fix 7 * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-10 13:53:17 +02:00
Yih-Dar	04cdf83244	Update some tests for torch 2.7.1 (#38701 ) * fix 1 * fix 2 * fix 3 * fix 4 * fp16 * break * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-10 11:46:52 +02:00
rdonggroq	afdb821318	Fix smart resize (#38706 ) * Fix smart_resize bug * Add smart_resize test * Remove unnecessary error checking * Fix smart_resize tests --------- Co-authored-by: Richard Dong <rdong@rdong.c.groq-143208.internal>	2025-06-10 08:59:22 +00:00
Yih-Dar	e55983e2b9	Fix `aya_vision` test (#38674 ) * fix 1: load_in_4bit=True, * fix 2: decorateor * fixfix 2: breakpoint * fixfix 3: update * fixfix 4: fast * fixfix 5: cond * fixfix 5: cond * fixfix 6: cuda 8 * ruff * breakpoint * dtype * a10 * a10 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-09 22:18:52 +02:00
Yih-Dar	ebeec13609	Fix `InternVL` integration test (#38612 ) Some checks failed Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled Details Build documentation / build (push) Has been cancelled Details Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled Details Self-hosted runner (push-caller) / Check if setup was changed (push) Has been cancelled Details Secret Leaks / trufflehog (push) Has been cancelled Details Update Transformers metadata / build_and_package (push) Has been cancelled Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Has been cancelled Details Self-hosted runner (push-caller) / build-docker-containers (push) Has been cancelled Details Self-hosted runner (push-caller) / Trigger Push CI (push) Has been cancelled Details * fix * fix * fix OOM --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-07 08:30:47 +02:00
Yih-Dar	3fb7e7bc01	Skip torchscript tests for 2 models (#38643 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-06 20:17:37 +02:00
Yao Matrix	dc76eff12b	remove ipex_optimize_model usage (#38632 ) * remove ipex_optimize_model usage Signed-off-by: YAO Matrix <matrix.yao@intel.com> * update Dockerfile Signed-off-by: root <root@a4bf01945cfe.jf.intel.com> --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com> Signed-off-by: root <root@a4bf01945cfe.jf.intel.com> Co-authored-by: root <root@a4bf01945cfe.jf.intel.com>	2025-06-06 20:04:44 +02:00
Yih-Dar	02f946a038	Don't run `AriaForConditionalGenerationModelTest` on CircleCI (#38615 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details git rid of this model Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-06 11:30:31 +02:00
Yih-Dar	fca6748246	Improve `test_initialization` for `SwiftFormer` (#38636 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-06 10:47:10 +02:00
Yih-Dar	92a87134ea	update `ColQwen2ModelIntegrationTest` (#38583 ) * update * update * update * update * 4 bit * 8 bit * final --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-06 10:41:17 +02:00
Raushan Turganbay	dbfc79c17c	[generation] bring back tests on vision models (#38603 ) * bring back geenration tests on VLMs * remove head mask tests overwritten	2025-06-06 08:23:15 +00:00
Yih-Dar	3e35ea1782	Improve `test_initialization` (#38607 ) * fix flaky init tests * fix flaky init tests --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-06 10:08:05 +02:00
Yao Matrix	89542fb81c	enable more test cases on xpu (#38572 ) * enable glm4 integration cases on XPU, set xpu expectation for blip2 Signed-off-by: Matrix YAO <matrix.yao@intel.com> * more Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> * refine wording Signed-off-by: YAO Matrix <matrix.yao@intel.com> * refine test case names Signed-off-by: YAO Matrix <matrix.yao@intel.com> * run Signed-off-by: YAO Matrix <matrix.yao@intel.com> * add gemma2 and chameleon Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix review comments Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: Matrix YAO <matrix.yao@intel.com> Signed-off-by: YAO Matrix <matrix.yao@intel.com>	2025-06-06 09:29:51 +02:00
Armaghan Shakir	31023b6909	Fix `MiniMax` (docs and integration tests checkpoint) (#38575 ) * update checkpoints for integration tests * minor fixes in docs	2025-06-06 08:43:11 +02:00
Sai-Suraj-27	88912b8e95	Remove `isort` from dependencies (#38616 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details Removed isort as a dependency	2025-06-05 16:42:49 +00:00
David Klank	fa921ad854	fix spelling errors (#38608 ) * fix errors test_modeling_mllama.py * fix error test_modeling_video_llava.py * fix errors test_processing_common.py	2025-06-05 13:57:23 +01:00
Henrik Matthiesen	1fed6166c0	added fast image processor for ZoeDepth and expanded tests accordingly (#38515 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * added fast image processor for ZoeDepth and expanded tests accordingly * added fast image processor for ZoeDepth and expanded tests accordingly, hopefully fixed repo consistency issue too now * final edits for zoedept fast image processor * final minor edit for zoedepth fast imate procesor	2025-06-04 22:59:17 +00:00
Dmitry Rogozhkin	8046aff520	tests/roformer: fix couple roformer tests on gpus (#38570 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details Fix "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu" error running the following roformer tests on GPUs (CUDA or XPU): ``` tests/models/roformer/test_modeling_roformer.py::RoFormerSinusoidalPositionalEmbeddingTest::test_basic tests/models/roformer/test_modeling_roformer.py::RoFormerSelfAttentionRotaryPositionEmbeddingTest::test_apply_rotary_position_embeddings ``` Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2025-06-04 18:45:56 +02:00
Anton Vlasjuk	1dc619e59f	[`FlexAttn`] Fix models with unique characteristics (#38433 ) * fix * style * check * check 2 * add deepseek workaround	2025-06-04 13:37:28 +02:00
Yih-Dar	ff3fad61e3	Fix `deepseekv3` (#38562 ) * fix 1 * fix 2 * fix 3 * fix 4 * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-04 11:40:14 +02:00
Yih-Dar	3c995c1fdc	Fix `chameleon` tests (#38565 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-04 10:13:35 +02:00
Armaghan Shakir	55736eea99	Add support for MiniMax's MiniMax-Text-01 (#35831 ) * end-to-end architecture * lightning-attn: refactor, clean, optimize * put minimax_text_01 in other files * use latest __init__ standards and auto-generate modular * support attention_mask for lightning-attn * Revert "use latest __init__ standards and auto-generate modular" This reverts commit `d8d3c409d8`. * fix modular conversion * pass both attention masks instead of tuple * formatting * Updated Dynamic Cache * created MiniMaxText01Cache * fix hardcoded slope_rate * update attn_type_list in config * fix lightning when use_cache=False * copy tests from mixtral * (checkpoint) all tests pass for normal attention * fix all unittests * fix import sorting * fix consistency and formatting tests * fix config * update tests, since changes in main * fix seq_len error * create dummy docs * fix checkpoint * add checkpoint in config docstring * run modular_conversion * update docs * fix checkpoint path and update tests * fix ruff * remove repeated expected_slice * update docs * rename "minimax-text-01" to "minimax" * inherit config from mixtral * remove from docs in other languages * undo files that should be untouched * move minimax to end in conversation docs * use MiniMaxForCausalLM as it is * ruff fixes * run modular * fix docstring example in causallm * refactor attention loop and decay factors * refactor config in modular * run modular * refactor cache * rename static_cache to linear_cache * make positional embeddings necessary * remove unnecessary layernorms declarations * fix import in tests * refactor attention in next tokens * remove outdated code * formatting and modular * update tests * rename layernorm alpha/beta factors * register decay factors as buffers * remove unused declarations of decay factors * update config for alpha/beta factors * run modular * remove head_dim in tests * remove minimax from fx.py * remove stuff that is not really needed * update __init__ * update qkv torch.split Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * fix qkv torch.split * quality fixes * remove mistakenly added dummy * purge unused ModelTester code * fix-copies * run fix-copies * fix head_dim * write cache formatting tests * remove postnorm * avoid contiguous in attention current states * update expected_slice * add generation test for integration * fix dtype in generation test * update authors * update with changes in main * update graident checkpointing and minor fixes * fix mutable attn_type_list * rename: attn_type -> layer_type * update for layer_types * update integration tests * update checkpoint * clean overview in docs --------- Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-06-04 09:38:40 +02:00
Rémi Ouazan	037acf1d10	[janus] Fix failing tests on mi3XX (#38426 ) * Fix multiple devices error on Janus * Fix AttributeError on Janus BOI token * Initialize lm first in Janus to get correct device map * Added expectations for Janus test_model_generate_images * Fixed JanusVisionEncoderLayer being split across devices * Code formatting * Adding modeling file * Reverted changes out of scope for this PR	2025-06-04 09:38:10 +02:00
Driss Guessous	279000bb70	Name change AOPermod -> ModuleFqn (#38456 ) Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-06-03 15:43:31 +00:00
Matej Sirovatka	caf708da1b	[TP] Change command in tests to `python3` (#38555 ) * Fix: change to `python3` * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-03 11:03:33 +00:00
jiqing-feng	814432423c	update emu3 test (#38543 ) Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-06-03 11:02:01 +02:00
Raushan Turganbay	55ec319de6	Don't use default attn if pre-set in sub-config (#38526 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * don't use default attn if pre-set in sib-config * style * add a test maybe	2025-06-03 07:53:07 +00:00
Raushan Turganbay	bf68dd9e6e	[tests] expand flex-attn test for vision models (#38434 ) * expand the test for VLMs * typo * mark models `supports_flex` + expand test for additional kwargs * flex attn for refactored vision models * fix copies * fix * unskip * style * address comments	2025-06-03 07:40:44 +00:00
Yih-Dar	de4cf5a38e	Fix blip2 tests (#38510 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * fix 1: not sure * fix 2: _supports_flex_attn = False * fix 3: embedding_output = self.layernorm(query_embeds.to(self.layernorm.weight.dtype)) * fix 4: query_embeds = query_embeds.to(self.layernorm.weight.dtype) * fix 5: text_embeds = text_embeds.to(dtype=torch.float16) * fix 5: question_embeds.to(dtype=torch.float16) * fix 6: text_embeds = text_embeds.to(dtype=self.itm_head.weight.dtype) * fix 7: image_embeds and question_embeds * fix 8: fix other 2 fp16 tests * fix 9: fix T5 OOM * fix 10: fix T5 OOM * fix 11: fix T5 * fix 11: fix T5 beam * fix 12: _supports_sdpa=False * fix 12: style and expect * revert * revert --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-02 22:46:35 +02:00
Yih-Dar	ccc859620a	Fix `Gemma2IntegrationTest` (#38492 ) * fix * fix * skip-ci * skip-ci * skip-ci * skip-ci * skip-ci * skip-ci * skip-ci * skip-ci * skip-ci * skip-ci * skip-ci * update * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-02 22:45:09 +02:00
Yaswanth Gali	1094dd34f7	Remove type annotation in Siglip Attention Module (#38503 ) * Remove type annotation * remove print statement	2025-06-02 17:51:07 +02:00
Ita Zaporozhets	05ad826002	remove unhandled parameter (#38145 )	2025-06-02 15:57:32 +02:00
Tony Wu	c72ba69441	Add ColQwen2 to 🤗 transformers (#35778 ) * feat: add colqwen2 (wip) * tests: fix test_attention_outputs * tests: reduce hidden size to accelerate tests * tests: fix `test_attention_outputs` 🥳 * fix: fix wrong parent class for `ColQwen2ForRetrievalOutput` * fix: minor typing and style changes * chore: run `make style` * feat: remove redundant `max_num_visual_tokens` attribute in `ColQwen2Processor` * tests: tweak comments * style: apply ruff formatter * feat: move default values for `visual_prompt_prefix` and `query_prefix` * docs: update ColQwen2 model card * docs: tweak model cards * docs: add required example config checkpoint * tests: update expected scores in integration test * docs: tweak quickstart snippets * fix: address PR comments * tests: fix colqwen2 tests + tweak comment in colpali test * tests: unskip useful tests * fix: fix bug when `visual_prompt_prefix` or `query_prefix` is an empty string * fix: fix ColPali outputs when `return_dict == False` * fix: fix issue with PaliGemma output not being a dict * docs: set default dtype to bfloat16 in quickstart snippets * fix: fix error when `return_dict=False` in ColPali and ColQwen2 * tests: fix special tokens not being replaced in input_ids * style: fix lint * fix: `ColQwen2Processor`'s `padding_side` is now set from `processor_config.json` * fix: remove unused `padding_side` in ColQwen2 model * docs: update ColQwen2's model doc * fix: fix harcoded vlm backbone class in ColQwen2Config * fix: remove `padding_side` from ColQwen2Processor as should fed from kwargs * docs: fix typo in model docstring * docs: add illuin mention in model docs * fix: let `padding_size` be handled by `tokenizer_config.json` * docs: add colpali reference url in colqwen2's model doc * docs: add Hf mention in model docs * docs: add late interaction mention in model docs * docs: tweak colqwen2 model doc * docs: update reference checkpoint for ColPali to v1.3 * docs: simplify quickstart snippets * docs: remove redundant `.eval()` * refactor: use `can_return_tuple` decorator for ColPali and ColQwen2 * docs: fix copyright date * docs: add missing copyright in tests * fix: raise error when `initializer_range` is not in config * docs: remove redundant `.eval()` in colpali doc * fix: fix `get_text_config` now that Qwen2VL has a proper `text_config` attribute See https://github.com/huggingface/transformers/pull/37268 for details about changes in Qwen2VL's config. * fix: add missing `initializer_range` attribute in `ColQwen2Config` * fix: use `get_text_config` in `resize_token_embeddings` * update colwen2 with auto_docstring * docs: fix wrong copyright year * chore: remove `raise` as `initializer_range` has a default value in `ColQwen2Config` * refactor: merge `inner_forward` into `forward` * Refactor colqwen2 after refactoring of qwen2VL, use modular for modeling code * protect torch import in modular to protect in processing * protect torch import in modular to protect in processing * tests: fix hf model path in ColQwen2 integration test * docs: clarify `attn_implementation` and add comments * docs: add fallback snippet for using offline PIL dummy images * docs: temporarily revert attn_implementation to `None` while sdpa is not fixed * docs: tweaks in colpali/colqwen2 quick start snippets * fix: add missing flags to enable SDPA/Flex Attention in ColQwen2 model * fix: add missing changes in modular file * fix modeling tests --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-06-02 12:58:01 +00:00
Rémi Ouazan	493cf1554b	[seamless_m4t] Skip some tests when speech is not available (#38430 ) * Added the require_speech decorator * Added require_speecj to some seamless_m4t tests * Changed skip message	2025-06-02 09:17:28 +00:00
Yuanyuan Chen	fde1120b6c	Remove deprecated use_flash_attention_2 parameter (#37131 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-06-02 11:06:25 +02:00
M Saqlain	e0545ef0b8	[Tests] Reduced model size for albert-test model (#38480 ) * Reduced model size for albert-test model * Run checks * Removed test_save_load * Removed test skipping functions	2025-05-30 14:22:32 +00:00
Yih-Dar	81cff7ad34	Fix `Gemma3IntegrationTest` (#38471 ) * check * check * check * check * check * check * check * test style bot * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-05-29 16:51:12 +02:00
Raushan Turganbay	ad9dd3d17b	🔴 [VLM] modeling updates (#38317 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * updates * fixup * fix tests * fix test * fix * let it be here for now, till monday * two more fixes * persimmon * fixup * fix * fixup * make sure fuyu runs now that LM has new attn API * fixup + tests * qwen vl uses new mask interface as well * qwen image features format * update * remove image_sizes * address comments * i am dumb...	2025-05-29 11:08:23 +00:00
Yaswanth Gali	a6f7acb603	[Tests] Clean up test cases for few models (#38315 ) * Update tests * revert aria change * too slow hence revert	2025-05-29 08:21:28 +00:00
Yih-Dar	66da700145	Fix GLM4 checkpoints (#38412 ) * fix * fix * fix * fix * fix * fix * test style bot * Apply style fixes --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-05-28 16:40:08 +00:00
Matt	f844733568	Fix MoE gradient test (#38438 )	2025-05-28 16:44:20 +01:00

1 2 3 4 5 ...

5098 Commits