transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-08-02 19:21:31 +06:00

Author	SHA1	Message	Date
eustlb	02ecdcfc0f	add _keep_in_fp32_modules_strict (#39058 ) * add _keep_in_fp32_modules_strict * complete test	2025-06-26 13:55:28 +00:00
vb	d973e62fdd	fix condition where torch_dtype auto collides with model_kwargs. (#39054 ) * fix condition where torch_dtype auto collides with model_kwargs. * update tests * update comment * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-26 14:52:57 +02:00
Manuel de Prada Corral	3abeaba7e5	Create test for #38916 (custom generate from local dir with imports) (#39015 ) * create test for #38916 (custom generate from local dir with imports)	2025-06-26 13:54:36 +02:00
Rémi Ouazan	25c44d4b68	Internvl fix (#38946 ) * Image processor compile fix (#38540) * Added a compile-friendly versiom of resize to BaseImgProcessorFast * Changed qwen2 processor to use its parent class .resize * Style * underlined issue only happens on AMD w/ comment and bool check * Fixed some utils functions * Fixed the same issue for bridgetower * Fixed the same issue for llava_next * Repo consistency for llava onevision * Update src/transformers/image_processing_utils_fast.py Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com> --------- Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com> * Added an Expectation to an internvl test * Made qwen2_vl use the resize method of its parent clas * Changed to torch.where --------- Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>	2025-06-26 13:44:59 +02:00
Jaeyong Sung	583db52bc6	Add Dia model (#38405 ) * add dia model * add tokenizer files * cleanup some stuff * brut copy paste code * rough cleanup of the modeling code * nuke some stuff * more nuking * more cleanups * updates * add mulitLayerEmbedding vectorization * nits * more modeling simplifications * updates * update rope * update rope * just fixup * update configuration files * more cleanup! * default config values * update * forgotten comma * another comma! * update, more cleanups * just more nits * more config cleanups * time for the encoder * fix * sa=mall nit * nits * n * refacto a bit * cleanup * update cv scipt * fix last issues * fix last nits * styling * small fixes * just run 1 generation * fixes * nits * fix conversion * fix * more fixes * full generate * ouf! * fixes! * updates * fix * fix cvrt * fixup * nits * delete wrong test * update * update * test tokenization * let's start changing things bit by bit - fix encoder step * removing custom generation, moving to GenerationMixin * add encoder decoder attention masks for generation * mask changes, correctness checked against ad29837 in dia repo * refactor a bit already --> next cache * too important not to push :) * minimal cleanup + more todos * make main overwrite modeling utils * add cfg filter & eos filter * add eos countdown & delay pattern * update eos countdown * add max step eos countdown * fix tests * fix some things * fix generation with testing * move cfg & eos stuff to logits processor * make RepetitionPenaltyLogitsProcessor flexible - can accept 3D scores like (batch_size, channel, vocab) * fix input_ids concatenation dimension in GenerationMixin for flexibility * Add DiaHangoverLogitsProcessor and DiaExponentialDecayLengthPenalty classes; refactor logits processing in DiaForConditionalGeneration to utilize new configurations and improve flexibility. * Add stopping criteria * refactor * move delay pattern from processor to modeling like musicgen. - add docs - change eos countdown to eos delay pattern * fix processor & fix tests * refactor types * refactor imports * format code * fix docstring to pass ci * add docstring to DiaConfig & add DiaModel to test * fix docstring * add docstring * fix some bugs * check * porting / merging results from other branch - IMPORTANT: it very likely breaks generation, the goal is to have a proper forward path first * experimental testing of left padding for first channel * whoops * Fix merge to make generation work * fix cfg filter * add position ids * add todos, break things * revert changes to generation --> we will force 2d but go 3d on custom stuff * refactor a lot, change prepare decoder ids to work with left padding (needs testing), add todos * some first fixes to get to 10. in generation * some more generation fixes / adjustment * style + rope fixes * move cfg out, simplify a few things, more todos * nit * start working on custom logit processors * nit * quick fixes * cfg top k * more refactor of logits processing, needs a decision if gen config gets the new attributes or if we move it to config or similar * lets keep changes to core code minimal, only eos scaling is questionable atm * simpler eos delay logits processor * that was for debugging :D * proof of concept rope * small fix on device mismatch * cfg fixes + delay logits max len * transformers rope * modular dia * more cleanup * keep modeling consistently 3D, generate handles 2D internally * decoder starts with bos if nothing * post processing prototype * style * lol * force sample / greedy + fixes on padding * style * fixup tokenization * nits * revert * start working on dia tests * fix a lot of tests * more test fixes * nit * more test fixes + some features to simplify code more * more cleanup * forgot that one * autodocs * small consistency fixes * fix regression * small fixes * dia feature extraction * docs * wip processor * fix processor order * processing goes brrr * transpose before * small fix * fix major bug but needs now a closer look into the custom processors esp cfg * small thing on logits * nits * simplify indices and shifts * add simpler version of padding tests back (temporarily) * add logit processor tests * starting tests on processor * fix mask application during generation * some fixes on the weights conversion * style + fixup logits order * simplify conversion * nit * remove padding tests * nits on modeling * hmm * fix tests * trigger * probably gonna be reverted, just a quick design around audio tokenizer * fixup typing * post merge + more typing * initial design for audio tokenizer * more design changes * nit * more processor tests and style related things * add to init * protect import * not sure why tbh * add another protect * more fixes * wow * it aint stopping :D * another missed type issue * ... * change design around audio tokenizer to prioritize init and go for auto - in regards to the review * change to new causal mask function + docstrings * change ternary * docs * remove todo, i dont think its essential tbh * remove pipeline as current pipelines do not fit in the current scheme, same as csm * closer to wrapping up the processor * text to audio, just for demo purposes (will likely be reverted) * check if it's this * save audio function * ensure no grad * fixes on prefixed audio, hop length is used via preprocess dac, device fixes * integration tests (tested locally on a100) + some processor utils / fixes * style * nits * another round of smaller things * docs + some fixes (generate one might be big) * msytery solved * small fix on conversion * add abstract audio tokenizer, change init check to abstract class * nits * update docs + fix some processing :D * change inheritance scheme for audio tokenizer * delete dead / unnecessary code in copied generate loop * last nits on new pipeline behavior (+ todo on tests) + style * trigger --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Vasqu <antonprogamer@gmail.com>	2025-06-26 11:04:23 +00:00
Joao Gante	1d45d90e5d	[tests] remove TF tests (uses of `require_tf`) (#38944 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * remove uses of require_tf * remove redundant import guards * this class has no tests * nits * del tf rng comment	2025-06-25 17:29:10 +00:00
eustlb	551e48f182	[Kyutai-STT] correct model type + model id (#39035 ) * correct model type + model id * udpate doc * init fix * style !!!	2025-06-25 16:09:00 +00:00
Anton Lozhkov	dad0e87c79	Add SmolLM3 (#38755 ) * init smollm3 * integration tests * config quirks * docs stub * rests round 2 * tests round 3 * tests round 4 * bring SWA back * config checker pls * final checkpoint * style and copies * Update src/transformers/models/smollm3/modular_smollm3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/smollm3/modular_smollm3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-06-25 15:12:15 +00:00
Quentin Lhoest	858f9b71a8	Remove script datasets in tests (#38940 ) * remove trust_remote_code * again * Revert "Skip some tests for now (#38931)" This reverts commit `31d30b7224`. * again * style * again * again * style * fix integration test * fix tests * style * fix * fix * fix the last ones * style * last one * fix last * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-25 14:31:20 +00:00
EduardDurech	a2eb75c891	Support for Flash Attention 3 (#38972 ) * Support `flash_attn_3` Implements fwd and tests for Flash Attention 3 https://github.com/Dao-AILab/flash-attention/commits/main/hopper - Includes checks for dropout>0 and ALiBi in `modeling_utils.PreTrainedModel._check_and_enable_flash_attn_3` (Dropout will likely be supported soon, so this will need to be updated and `modeling_flash_attention_utils._flash_attention_forward` at the `if _IS_FLASH_ATTN_3_AVAILABLE: ...` An example Llama implementation is included in `modeling_llama.py` but other models would still need to be updated Based on https://github.com/huggingface/transformers/pull/36190 which has model implementations and examples which could be merged * Add tests for Flash Attention 2 and 3 parity * ci fix * FA2 compatibiity - `_prepare_flash_attention_from_position_ids` ->`prepare_fa2_from_position_ids` - Remove bettertransformer check in Flash Attention 3 - Merge tests - Add licensing * ci fix * Test naming consistency * ci fix * Deprecation warning for `prepare_fa2_from_position_ids` * ci fix	2025-06-25 14:39:27 +02:00
redmoe-moutain	7503cb9113	[Model] add dots1 (#38143 ) * add dots1 * address comments * fix * add link to dots1 doc * format --------- Co-authored-by: taishan <rgtjf1@163.com>	2025-06-25 11:38:25 +02:00
Biao Zhang	3ef8896906	Encoder-Decoder Gemma (#38332 ) * Initial submit * Fix bugs: 1. add __init__ file 2. tied word embedding 3. support flash/flex attention 4. model saving and loading * Code refactor: * Rename encdecgemma to t5gemma. * Split attention into self- and cross-attention * Split stack into encoder and decoder * Add test cases * Add auto configuration * Update configurations. * Fix bugs related to copy and attribute checks * Fix type union * Fix merge errors * run ruff format * Run make style and update tests. * Add t5gemma model doc. * ruff and style formatting. * Add missed module config. * Add dummy checkpoint link to pass tests (need updated when real checkpoints are uplioaded.). * Update model doc. * Minor updates following Arthur's comments: * replace docstrings with auto_docstrings * remove checkpoint layers * remove deprecate_kwargs * fix rebase errors * Fix docstring issues. * fix t5gemma doc issue. * run ruff format * Updates: * split encoder-only model out * make t5gemmamodel encoder-decoder only * update token and sequence classification * update tests	2025-06-25 09:05:10 +00:00
Yuxuan Zhang	af9870265e	GLM-4.1V Model support (#38431 ) * 20250508 Model Architecture * Update modeling_glm4v.py * Update modeling_glm4v.py * Update modeling_glm4v.py * update 1447 * 0526 * update * format * problem * update * update with only image embed diff * Final * upload * update * 1 * upload with ruff * update * update * work * 1 * 1 * update with new note * 2 * Update convert_glm4v_mgt_weights_to_hf.py * Update tokenization_auto.py * update with new format * remove rmsnrom * draft with videos * draft * update * update * fix for review problem * try to remove min_pixel * update * for test * remove timestamps * remove item * update with remove * change * update 2200 * update * Delete app.py * format * update * Update test_video_processing_glm4v.py * 1 * 2 * use new name * Update test_video_processing_glm4v.py * remove docs * change * update for image processors update * 2108 * 2128 * Update modular_glm4v.py * 1 * update some * update * rename * 1 * remove tests output * 2 * add configuration * update * Update test_video_processing_glm4v.py * fix simple forward tests * update with modular * 1 * fix more tests * fix generation test * fix beam search and init * modular changed * fix beam search in case of single-image/video. Fails if multiple visuals per text * update processor * update test * pass * fix beam search * update * param correct * Update convert_glm4v_mgt_weights_to_hf.py * 1 * Update test_modeling_glm4v.py * 4 * 2 * 2123 video process * 2 * revert * 1 * 2 * revert processing * update preprocesor * changed * 1 * update * update * 6 * update * update * update * Delete tmp.txt * config * Update video_processing_glm4v.py * apply modular correctly * move functions * fix order * update the longest_edge * style * simplify a lot * fix random order of classes * skip integration tests * correctly fix the tests * fix TP plan --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-06-25 10:43:05 +02:00
Raushan Turganbay	e212ff9e6a	[video processor] support torchcodec and decrease cuda memory usage (#38880 ) * don't move the whole video to GPU * add torchcodec * add tests * make style * instrucblip as well * consistency * Update src/transformers/utils/import_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/utils/import_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/video_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-06-25 08:23:37 +00:00
efsotr	3ee72af6b6	Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 (#37332 ) * Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 * fix code format * add test; replace position_ids with query_states becasue position_ids.shape[0] is always 1 * add assert loss is not nan	2025-06-25 07:58:34 +00:00
ivarflakstad	995666edb5	Skip sdpa dispatch on flash test due to unsupported head dims (#39010 )	2025-06-24 20:16:56 +02:00
Tugsbayasgalan Manlaibaatar	67d36dc1d7	Fix bugs in DynamicCache (#37880 ) * Fix bugs in DynamicCache * Updarte * Update * Lint * lint * Rename test * update * update	2025-06-24 19:43:40 +02:00
eustlb	6bdd4ec952	Add kyutai stt (#38909 ) * first draft * cleaner version * udpate tests + modeling * add tests * init * udpate test_modeling_common * fix tests * csm Processor draft * convertion update * mimi cache padding convolutions draft * mimi streaming udpates * update mimi padding cache test * udpate cache padding mimi test * make style mimi * updates generate moshi asr * moshi asr integration tests (single + batched) * update tests * update conversion script * good default sliding window value * udpdate generate * update test checkpoint * nit * fix mimi * fix codec prefix * revert * revert * update config * update config * unnecessary mimi input restriction * remove delay in tokens * remove _prepare_4d_causal_attention_mask_with_cache_position and _update_causal_mask * test update * modular update * make style * nit * rename * create codec model generation config at init * remove delay * max_new_tokens/length warning * correct conv1 padding cache import for modular * nit * fix on encoder_past_key_values * convert modular * move frame_size to config * move frame_size to config * update test name * handle first token is bos * better handling of max_new_tokens * fix * fix batch size in test input prep * update docstring * convert modular * make style * make style * add feature extractor * correct modular convention name for feature_extraction file * update convertion script * doc processor * update doc * udpate init * update model type * fixes * update tests * fix * make * add doc * nit * fix * doc * auto mappings * doc * nit * convert modular * doc * nit * extend _keep_in_fp32_modules to enforce fp32 * renaming to stt * doc update + test update * doc fixes * doc fix * doc fix * fix musicgen tests * fix musicgen tests * make style * fix musicgen tests * correct frame_rate config param for mimi * update mimi test * revert update mimi test * enforce cpu test * move cache init in cache class * convert modular * docstring update * update model id * feature_extractor -> feature_extraction (SEW) * convert modular * update model id	2025-06-24 18:01:15 +02:00
Crystalcareai	71de20b818	Add Arcee model support (#38621 ) * Add Arcee model support to transformers - Add ArceeConfig and model mappings for all task types (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification) - Add auto-loading support through AutoModel, AutoConfig, and AutoTokenizer - Use LlamaTokenizer for tokenization - Add FX graph support for Arcee models - Create lazy loading module structure for Arcee * feat: update YARN scaling and RoPE validation for Arcee model * feat: add auto_docstring checkpoint config to Arcee model classes * docs: add pre-trained model weights reference to Arcee configuration files * refactor: move RoPE utilities to dedicated modeling_rope_utils module * Add comprehensive test suite for Arcee model - Add test_modeling_arcee.py following standard transformers test patterns - Include tests for all model variants (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification) - Add specific test for ReLU² activation in ArceeMLP - Add RoPE scaling tests including YARN support - Follow CausalLMModelTest pattern used by similar models * Add documentation for Arcee model - Add comprehensive model documentation with usage examples - Include all model variants in autodoc - Add to table of contents in proper alphabetical order - Fixes documentation coverage for Arcee model classes * Make style/fixup * fix copyright year * Sync modular conversion * revert in legacy supported models in src/transformers/utils/fx * cleaned redundant code in modular_arcee.py * cleaned testing * removed pretraining tp * fix styles * integration testing --------- Co-authored-by: Pranav <veldurthipranav@gmail.com> Co-authored-by: Pranav <56645758+pranav4501@users.noreply.github.com>	2025-06-24 15:05:29 +02:00
Yih-Dar	f9be71b34d	Fix `rag` (#38585 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-23 17:42:46 +02:00
Yusuf Shihata	9eac19eb59	[Feature] Support `is_split_into_words` in the `TokenClassificationPipeline`. (#38818 ) * some fixes * some fixes * now the pipeline can take list of tokens as input and is_split_into_words argument * now the pipeline can take list of tokens as input and is_split_into_words argument * now the pipeline can take list of tokens as input and is_split_into_words argument and we can handle batches of tokenized input * now the pipeline can take list of tokens as input and is_split_into_words argument and we can handle batches of tokenized input * solving test problems * some fixes * some fixes * modify tests * aligning start and end correctly * adding tests * some formatting * some formatting * some fixes * some fixes * some fixes * resolve conflicts * removing unimportant lines * removing unimportant lines * generalize to other languages * generalize to other languages * generalize to other languages * generalize to other languages	2025-06-23 15:31:32 +00:00
Yih-Dar	2ce02b98bf	fix `mistral` and `mistral3` tests (#38978 ) * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-23 17:07:18 +02:00
Yoni Gozlan	d29482cc91	Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157 ) * add working idefics2 fast and improvements for fast nested images processing * add fast image processors idefics 3 and smolvlm * cleanup tests * fic doc idefics2 * PR review and fix issues after merge * Force providing disable_grouping to group_images_by_shape * simplify group_images_by_shape * fix modular * Fix nits after review	2025-06-23 14:17:25 +00:00
Rémi Ouazan	1a96127e46	Break tie in Expectations and gemma3 fixes (#38943 ) * Added major / minor version to Expectations ordering * Added fixes to gemma3 * Style	2025-06-23 15:13:27 +02:00
Cyril Vallez	07aab1af1e	Remove dead protected imports (#38980 ) * remove them * more	2025-06-23 13:44:50 +02:00
Ilyas Moutawwakil	984ff89e73	Gaudi3 CI (#38790 )	2025-06-23 10:56:51 +02:00
Yih-Dar	b8059e1f8f	Fix more flaky `test_initialization` (#38932 ) * try * try * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-20 17:28:32 +02:00
Pavel Iakubovskii	9120567b02	Add kwargs for timm.create_model in TimmWrapper (#38860 ) * Add init kwargs for timm wrapper * model_init_kwargs -> model_args * add save-load test * fixup	2025-06-20 12:00:09 +00:00
Rémi Ouazan	9bcdd5cde9	Modernbert fixes (#38912 ) * Removed deprecated argument in modernbert RotaryEmbedding * Skip test_sdpa_can_dispatch_on_flash for modernbert --------- Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-06-20 11:22:32 +02:00
Yih-Dar	31d30b7224	Skip some tests for now (#38931 ) * try * [test all] --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-20 11:05:49 +02:00
Hamza Benchekroun	797860c68c	feat: add flexible Liger Kernel configuration to TrainingArguments (#38911 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * feat: add flexible Liger Kernel configuration to TrainingArguments Add support for granular Liger Kernel configuration through a new `liger_kernel_config` parameter in TrainingArguments. This allows users to selectively enable/disable specific kernels (rope, swiglu, cross_entropy, etc.) instead of the current approach that rely on default configuration. Features: - Add `liger_kernel_config` dict parameter to TrainingArguments - Support selective kernel application for all supported models - Maintain full backward compatibility with existing `use_liger_kernel` flag Example usage: ```python TrainingArguments( use_liger_kernel=True, liger_kernel_config={ "rope": True, "swiglu": True, "cross_entropy": False, "fused_linear_cross_entropy": True } ) Closes #38905 * Address comments and update Liger section in Trainer docs	2025-06-19 15:54:08 +00:00
ivarflakstad	af6120b3eb	Skip sdpa tests if submodule does not support sdpa (#38907 )	2025-06-19 13:11:01 +00:00
Yih-Dar	5d26a38735	Fix `FalconMambaIntegrationTests` (#38566 ) * update * update * update * update * update * update * update * update * update * update * update * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-19 13:50:33 +02:00
Yao Matrix	a9ce8c69c9	align xpu's autocast behavior w/ cuda by using device agnostic torch APIs (#38284 ) * siwtch to device agnostic autocast in nemotron to align xpu behavior w/ cuda Signed-off-by: Matrix Yao <matrix.yao@intel.com> * fix issue Signed-off-by: Matrix Yao <matrix.yao@intel.com> * fix style Signed-off-by: Matrix Yao <matrix.yao@intel.com> * use torch.cast as other modeling code for decision_transformer&gpt2&imagegpt Signed-off-by: Matrix Yao <matrix.yao@intel.com> * refine Signed-off-by: Matrix Yao <matrix.yao@intel.com> * update get_autocast_gpu_dtype to device agnostic one Signed-off-by: Matrix YAO <matrix.yao@intel.com> * fix style Signed-off-by: Matrix YAO <matrix.yao@intel.com> * fix comments Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: Matrix Yao <matrix.yao@intel.com> Signed-off-by: Matrix YAO <matrix.yao@intel.com> Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-06-19 11:48:23 +00:00
Yih-Dar	b949747b54	Fix `fsmt` tests (#38904 ) * fix 1 * fix 2 * fix 3 * fix 4 * fix 5 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-19 10:56:34 +02:00
Sam Rae	b922b22ec2	36978 \| Fast image processor for DPT model (#37481 ) * chore: ran codegen script * test: test_image_processor_properties * test: test_image_processor_from_dict_with_kwargs * test: wip - test_padding * test: test_padding * test: test_keep_aspect_ratio * wip * test * test: wip * test: wip * test: test_call_segmentation_maps, wip * chore: tidy up * test: test_call_segmentation_maps * fix: test_save_load_fast_slow * test: reduce labels * chore: make fixup * chore: rm comment * chore: tidy * chore remove comment * refactor: no need to infer channel dimesnion * refactor: encapsulate logic for preparing segmentation maps * refactor: improve readability of segmentation_map preparation * improvement: batched version of pad_image * chore: fixup * docs * chore: make quality * chore: remove unecessary comment * fix: add SemanticSegmentationMixin * feat: add post_process_depth_estimation to fast dpt image processor * chore: fix formatting * remove max_height, max_width * fix: better way of processin segmentation maps - copied from Beit Fast processor * chore: formatting + remove TODO * chore: fixup styles * chore: remove unecessary line break * chore: core review suggestion to remove autodocstring * fix: add do_reduce_labels logic + refactor - refactor preprocess logic to make it consistent with other processors - add missing reduce labels logic * refactor: remove deprecated mixin * chore: fixup * use modular for dpt + final nit changes * fix style --------- Co-authored-by: Samuel Rae <samuelrae@Samuels-Air.fritz.box> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-06-18 17:33:29 +00:00
Yuanyuan Chen	1fc67a25c6	More PYUP fixes (#38883 ) More pyup fixes Signed-off-by: cyy <cyyever@outlook.com>	2025-06-18 14:38:08 +01:00
艾梦	cb0f604192	Fix HQQ model param device transfer issue (#38466 ) * Fix HQQ model param device transfer issue * modify a comment * clear the code and add test for hqq device/dtype * fix test hqq code quality of imports --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-06-18 15:09:00 +02:00
Yih-Dar	c77bcd889f	Fix `qwen3_moe` tests (#38865 ) * try 1 * try 2 * try 3 * try 4 * try 5 * try 6 * try 7 * try 8 * try 9 * try 10 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-18 14:36:03 +02:00
Cyril Vallez	5a95ed5ca0	🚨🚨 Fix initialization of Mask2Former (#38864 ) * Correctly fix init Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com> * add back the block, breaking BC but this is correct author's code * override the test for params needing it --------- Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com>	2025-06-18 09:46:22 +02:00
Yih-Dar	309e8c96f2	Fix `phi4_multimodal` tests (#38816 ) * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-18 09:39:17 +02:00
Yao Matrix	3526e25d3d	enable misc test cases on XPU (#38852 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * enable misc test cases on XPU Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> * tweak bamba ground truth on XPU Signed-off-by: YAO Matrix <matrix.yao@intel.com> * remove print Signed-off-by: YAO Matrix <matrix.yao@intel.com> * one more Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com>	2025-06-18 09:20:49 +02:00
Matt	508a704055	No more Tuple, List, Dict (#38797 ) * No more Tuple, List, Dict * make fixup * More style fixes * Docstring fixes with regex replacement * Trigger tests * Redo fixes after rebase * Fix copies * [test all] * update * [test all] * update * [test all] * make style after rebase * Patch the hf_argparser test * Patch the hf_argparser test * style fixes * style fixes * style fixes * Fix docstrings in Cohere test * [test all] --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-17 19:37:18 +01:00
StevenBucaille	e5a9ce48f7	Add LightGlue model (#31718 ) * init * chore: various changes to LightGlue * chore: various changes to LightGlue * chore: various changes to LightGlue * chore: various changes to LightGlue * Fixed dynamo bug and image padding tests * refactor: applied refactoring changes from SuperGlue's concat, batch and stack functions to LightGlue file * tests: removed sdpa support and changed expected values * chore: added some docs and refactoring * chore: fixed copy to superpoint.image_processing_superpoint.convert_to_grayscale * feat: adding batch implementation * feat: added validation for preprocess and post process method to LightGlueImageProcessor * chore: changed convert_lightglue_to_hf script to comply with new standard * chore: changed lightglue test values to match new lightglue config pushed to hub * chore: simplified convert_lightglue_to_hf conversion map * feat: adding batching implementation * chore: make style * feat: added threshold to post_process_keypoint_matching method * fix: added missing instructions that turns keypoints back to absolute coordinate before matching forward * fix: added typehint and docs * chore: make style * [run-slow] lightglue * fix: add matches different from -1 to compute valid matches in post_process_keypoint_matching * tests: added CUDA proof tests similar to SuperGlue * chore: various changes to modeling_lightglue.py - Added "Copies from" statements for copied functions from modeling_superglue.py - Added missing docstrings - Removed unused functions or classes - Removed unnecessary statements - Added missing typehints - Added comments to the main forward method * chore: various changes to convert_lightglue_to_hf.py - Added model saving - Added model reloading * chore: fixed imports in lightglue files * [run-slow] lightglue * chore: make style * [run-slow] lightglue * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * [run-slow] lightglue * chore: Applied some suggestions from review - Added missing typehints - Refactor "cuda" to device variable - Variable renaming - LightGlue output order changed - Make style * fix: added missing grayscale argument in image processor in case use of SuperPoint keypoint detector * fix: changed lightglue HF repo to lightglue_superpoint with grayscale default to True * refactor: make keypoints `(batch_size, num_keypoints, keypoint_dim)` through forward and unsqueeze only before attention layer * refactor: refactor do_layer_keypoint_pruning * tests: added tests with no early stop and keypoint pruning * refactor: various refactoring to modeling_lightglue.py - Removed unused functions - Renamed variables for consistency - Added comments for clarity - Set methods to private in LightGlueForKeypointMatching - Replaced tensor initialization to list then concatenation - Used more pythonic list comprehension for repetitive instructions * refactor: added comments and renamed filter_matches to get_matches_from_scores * tests: added copied from statement with superglue tests * docs: added comment to prepare_keypoint_matching_output function in tests * [run-slow] lightglue * refactor: reordered _concat_early_stopped_outputs in LightGlue class * [run-slow] lightglue * docs: added lightglue.md model doc * docs: added Optional typehint to LightGlueKeypointMatchingOutput * chore: removed pad_images function * chore: set do_grayscale default value to True in LightGlueImageProcessor * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * docs: added missing LightGlueConfig typehint in nn.Module __init__ methods * docs: removed unnecessary code in docs * docs: import SuperPointConfig only from a TYPE_CHECKING context * chore: use PretrainedConfig arguments `num_hidden_layers` and `num_attention_heads` instead of `num_layers` and `num_heads` * chore: added organization as arg in convert_lightglue_to_hf.py script * refactor: set device variable * chore: added "gelu" in LightGlueConfig as hidden_act parameter * docs: added comments to reshape.flip.reshape instruction to perform cross attention * refactor: used batched inference for keypoint detector forward pass * fix: added fix for SDPA tests * docs: fixed docstring for LightGlueImageProcessor * [run-slow] lightglue * refactor: removed unused line * refactor: added missing arguments in LightGlueConfig init method * docs: added missing LightGlueConfig typehint in init methods * refactor: added checkpoint url as default variable to verify models output only if it is the default url * fix: moved print message inside if statement * fix: added log assignment r removal in convert script * fix: got rid of confidence_thresholds as registered buffers * refactor: applied suggestions from SuperGlue PR * docs: changed copyright to 2025 * refactor: modular LightGlue * fix: removed unnecessary import * feat: added plot_keypoint_matching method to LightGlueImageProcessor with matplotlib soft dependency * fix: added missing import error for matplotlib * Updated convert script to push on ETH org * fix: added missing licence * fix: make fix-copies * refactor: use cohere apply_rotary_pos_emb function * fix: update model references to use ETH-CVG/lightglue_superpoint * refactor: add and use intermediate_size attribute in config to inherit CLIPMLP for LightGlueMLP * refactor: explicit variables instead of slicing * refactor: use can_return_tuple decorator in LightGlue model * fix: make fix-copies * docs: Update model references in `lightglue.md` to use the correct pretrained model from ETH-CVG * Refactor LightGlue configuration and processing classes - Updated type hints for `keypoint_detector_config` in `LightGlueConfig` to use `SuperPointConfig` directly. - Changed `size` parameter in `LightGlueImageProcessor` to be optional. - Modified `position_embeddings` in `LightGlueAttention` and `LightGlueAttentionBlock` to be optional tuples. - Cleaned up import statements across multiple files for better readability and consistency. * refactor: Update LightGlue configuration to enforce eager attention implementation - Added `attn_implementation="eager"` to `keypoint_detector_config` in `LightGlueConfig` and `LightGlueAttention` classes. - Removed unnecessary logging related to attention implementation fallback. - Cleaned up import statements for better readability. * refactor: renamed message into attention_output * fix: ensure device compatibility in LightGlueMatchAssignmentLayer descriptor normalization - Updated the normalization of `m_descriptors` to use the correct device for the tensor, ensuring compatibility across different hardware setups. * refactor: removed Conv layers from init_weights since LightGlue doesn't have any * refactor: replace add_start_docstrings with auto_docstring in LightGlue models - Updated LightGlue model classes to utilize the new auto_docstring utility for automatic documentation generation. - Removed legacy docstring handling to streamline the code and improve maintainability. * refactor: simplify LightGlue image processing tests by inheriting from SuperGlue - Refactored `LightGlueImageProcessingTester` and `LightGlueImageProcessingTest` to inherit from their SuperGlue counterparts, reducing code duplication. - Removed redundant methods and properties, streamlining the test setup and improving maintainability. * test: forced eager attention implementation to LightGlue model tests - Updated `LightGlueModelTester` to include `attn_implementation="eager"` in the model configuration. - This change aligns the test setup with the recent updates in LightGlue configuration for eager attention. * refactor: update LightGlue model references * fix: import error * test: enhance LightGlue image processing tests with setup method - Added a setup method in `LightGlueImageProcessingTest` to initialize `LightGlueImageProcessingTester`. - Included a docstring for `LightGlueImageProcessingTester` to clarify its purpose. * refactor: added LightGlue image processing implementation to modular file * refactor: moved attention blocks into the transformer layer * fix: added missing import * fix: added missing import in __all__ variable * doc: added comment about enforcing eager attention because of SuperPoint * refactor: added SuperPoint eager attention comment and moved functions to the closest they are used --------- Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-06-17 18:10:23 +02:00
Yih-Dar	2507169bf6	Fix `qwen3` tests (#38862 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * fix * update * update * update * update * update * update * format --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-17 15:21:36 +02:00
Yih-Dar	c61ca64aaa	Fix `qwen2_5_vl` tests (#38845 ) * fix * breakpoint() * breakpoint() * update * update * update * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-06-17 10:55:24 +02:00
Yusuf Shihata	a7593a1d1f	[BugFix] QA pipeline edge case: `align_to_words=True` in `QuestionAnsweringPipeline` can lead to duplicate answers (#38761 ) * fixing the problem align_to_words=True leading to duplicate solutions * adding tests * some fixes * some fixes * changing the handle_duplicate_answers=False by default * some fixese * some fixes * make the duplicate handling the default behaviour and merge duplicates * make the duplicate handling the default behaviour	2025-06-16 15:01:22 +00:00
Pavel Iakubovskii	9bec2654ed	Add V-JEPA for video classification model (#38788 ) * adding model and conversion scripts * add imports to test vjepa conversion * fix imports and make conversion work * fix computation for short side * replace attention with library attention function * cleanup more attention classes * remove config overrides * add test cases, fix some of the failing ones * fix the model outputs * fix outputs of the model per review * fix too big model test case * fix styling __init__.py * fix initialization test * remove all asserts per review * update sorting unsorting logic as per feedback * remove is_video per review * remove another is_video segment * remove unwanted stuff * small fixes * add docstrings for the model * revert adding vjepa2 config here * update styling * add config docstrings (wip) * fix dpr issue * removed test failing issues * update styles * merge predictor configs into main config * remove processing code, add video processor * remove permute which is not necessary now * fix styles * updated vjepa2 to be in video_processing_auto * update comment for preprocessing * test integration test and fix the outputs * update test values, change test to look at repeated frames for a given image * add a simple video processing test * refactoring pixel_values_videos and upload ckpts to original * fix torch_fx test cases * remove unused config * add all config docstrings * add more integration tests * add basic doc * revert unwanted styling changes * working make fixup * Fix model_type in config * Add ForVideoClassification model * update attention implementation to fit new hf standards * fix the preprocessing logic, ensure it matches the original model * remove use_rope logic, cleanup * fix docstrings * Further cleanup, update doc * Fix model prefix * fix get_vision_features * VJEPA2Embeddings style refactor * nit, style comment * change modules default values * Only `str` activation in config * GradientCheckpointingLayer * fixup * fix conversion script * Remove return_dict * remove None return typehint * Refactor VJEPA2Layer, remove use_SiLU * Fix fx tests * dpr -> drop_path_rates * move ModelOutput on top format docs bit * update docs * update docs * update doc example * remove prune_heads from model * remove unused config params * refactor embed signature * Add vjepa to docs * Fix config docstring * attention head * update defaults * Update docs/source/en/model_doc/vjepa2.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/vjepa2.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fix import * Min refactoring * Update HUB_SOURCE and HUB_REPO in conversion script * Add missing headers * VJEPA -> V-JEPA in docs * Add image to doc * fix style * fix init weights * change checkpoint name in modeling tests * Initial cls head setup * remove rop attention from head (not needed) * remove swigluffn - not needed * Add siglip layer * Replace with siglip layer * Rename Siglip - VJEPA2 * remove unused modules * remove siglip mlp * nit * remove MLP * Refactor head cross attention * refactor VJEPA2HeadCrossAttentionLayer * nit renaming * fixup * remove commented code * Add cls head params to config * depth from config * move pooler + classifier to the model * Update for cls model signature * move layers, rename a bit * fix docs * update weights init * remove typehint for init * add to auto-mapping * enable tests * Add conversion script * fixup * add to docs * fix docs * nit * refactor for mapping * clean * Add integration test * Fixing multi gpu test * update not-split-modules * update video cls test tolerance * Increase test_inference_image tolerance * Update no-split modules for multi gpu * Apply suggestions from code review * fixing multi-gpu * fix docstring * Add cls snippet to docs * Update checkpoint	2025-06-13 17:56:15 +01:00
Matt	b82a45b3b4	Refactor DBRX tests to use CausalLMModelTest base classes (#38475 ) * Refactor DBRX tests to use CausalLMModelTest base classes - Changed DbrxModelTester to inherit from CausalLMModelTester - Changed DbrxModelTest to inherit from CausalLMModelTest - Removed duplicate methods that are already in base classes - Added required class attributes for model classes - Updated pipeline_model_mapping to include feature-extraction - Kept DBRX-specific configuration and test methods - Disabled RoPE tests as DBRX's rotary embedding doesn't accept config parameter This refactoring reduces code duplication and follows the pattern established in other causal LM model tests like Gemma. * Apply style fixes * Trigger tests * Refactor DBRX test * Make sure the DBRX-specific settings are handled * Use the attribute_map * Fix attribute map --------- Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-06-13 16:22:12 +01:00
Rémi Ouazan	9ff246db00	Expectation fixes and added AMD expectations (#38729 )	2025-06-13 16:14:58 +02:00

1 2 3 4 5 ...

5053 Commits