* PLM template
* A working PLM with fixed image features
* hacked processor
* First version that reproduced PLM output using PE from timm.
* Simplify and fix tie_word_embeddings
* Use PIL resize. Simplify conversion.
* First version that works with video input.
* Simplified image preprocessing (not batched)
* Minor fixes after rebasing on main.
* Video processor based on new API.
* Revert to use _preprocess for image processor.
* refactor with modular
* fix tie_word_embeddings
* Testing with timm PE
* check in missed conversion from modular to model.py
* First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences.
* address review comments
* Fixed batching when video and image examples are mixed.
* Simplify PE configuration.
* Enable AutoModel for PerceptionEncoder.
* Update PE config style.
* update all headers
* Minor fixes.
* Move lm_head to PerceptionLMForConditionalGeneration.
* Fix vit_G model specification.
* Fix for test_modeling_perception_lm.py
* Image processing refactoring to use more common parts.
* Fix processor test.
* update tests to use model from hub
* More test fixes.
* Integration test ground-truth update after rebasing; probably due to video preprocessing changes.
* update test media path to hub
* Stop tracking local scripts
* address some review comments
* refactor image processing.
* small fixes
* update documentation and minor fixes
* remove scripts
* Minor fix for CI
* Fix image processing
* CI and doc fix
* CI formatting fix
* ruff fix
* ruff formatting
* ran utils/sort_auto_mappings.py
* update docstring
* more docstring updates
* add vision_input_type default fallback for image processing
* more verbose variable naming
* test update
* Remove PE and PEConfig; use AutoModel (TimmWrapper) instead (see the sketch after this list)
* Minor cleanup.
* Minor Fix: remove any ref to PE. Ruff format and check.
* fix docstring
* Fix modular/model consistency. Improve docstring.
* Fix PerceptionLMForConditionalGenerationModelTest
* ruff fix
* fix for check_repo
* minor formatting
* Add dummy size arg to fix processor test.
* Update docstring for PerceptionLMConfig
* Minor fixes from review feedback.
* Revert some minor changes per reviewer feedback.
* update base_model_prefix
* address reviewer feedback
* fix comment in modeling file
* address reviewer feedback
* ruff format
* Pre-merge test update.
* reapply modular and fix checkpoint name
* processor test path
* use modular a bit more
* remove dead code
* add token decorator
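A minimal sketch of the AutoModel (TimmWrapper) loading path mentioned above; the checkpoint name is illustrative, not PLM's actual vision tower:

```python
from transformers import AutoModel

# timm hub checkpoints load through transformers' TimmWrapper, which
# removes the need for a dedicated PE/PEConfig implementation.
vision_tower = AutoModel.from_pretrained("timm/vit_base_patch16_224.augreg_in21k")
```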
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Updated Switch Transformers model card with standardized format (Issue #36979)
* Apply reviewer suggestions to the new standardised Switch Transformer's model card
* Update switch_transformers.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* changes for video
* update modular
* change get_video_features
* update video token replacement
* update modular
* add test and fix typo
* lint
* fix order
* lint
* fix
* remove dependency
* lint
* lint
* remove todo
* resize video for test
* lint
* fix test
* Add a new processor for video_test
* fix test
Also add notes asking users to set `TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1`
or call `torch._dynamo.config.capture_scalar_outputs = True`, as currently
this will cause a graph break.
Signed-off-by: Hollow Man <hollowman@opensuse.org>
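A minimal sketch of the runtime workaround (standard PyTorch 2.x API):

```python
import torch

# Equivalent to launching with TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1:
# allow torch.compile to trace through tensor.item() calls instead of
# inserting a graph break at each scalar read.
torch._dynamo.config.capture_scalar_outputs = True
```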
* ensure the query is updated during training
avoid unused parameters that DDP does not like
* avoid a crash when `kwargs` contain `padding=True`
trainers often pass this argument automatically
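A toy illustration of both pitfalls (the module and helper are hypothetical sketches, not the model's actual code):

```python
import torch
from torch import nn

class ToyProjector(nn.Module):
    """Hypothetical module: if self.query were skipped during a training
    forward pass, DDP would flag an unused parameter (or require
    find_unused_parameters=True, which is slower)."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Always touch self.query so its gradient participates in DDP's
        # bucketed all-reduce.
        return x + self.query

def robust_call(fn, *args, **kwargs):
    # Trainers often inject padding=True automatically; drop it before
    # forwarding to a callee that does not accept it (hypothetical helper).
    kwargs.pop("padding", None)
    return fn(*args, **kwargs)

out = robust_call(ToyProjector(), torch.randn(2, 8), padding=True)
```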
* minor
* Remove mel_spec lazy init and rename it to mel_filters.
This ensures save_pretrained will not crash when saving the processor during training:
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)
* minor - most feature extractors have a `sampling_rate` property
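A hedged sketch of the eager-initialization pattern (a toy extractor, not the actual class; `mel_filter_bank` is transformers' audio utility):

```python
from transformers.audio_utils import mel_filter_bank

class ToyFeatureExtractor:
    """Hypothetical extractor: build mel_filters eagerly in __init__ so
    nothing is left half-initialized when save_pretrained serializes
    the object's attributes."""

    def __init__(self, sampling_rate: int = 16000, n_fft: int = 400,
                 num_mel_bins: int = 80):
        # Plain attribute, matching the `sampling_rate` property most
        # feature extractors expose.
        self.sampling_rate = sampling_rate
        self.mel_filters = mel_filter_bank(
            num_frequency_bins=1 + n_fft // 2,
            num_mel_filters=num_mel_bins,
            min_frequency=0.0,
            max_frequency=sampling_rate / 2,
            sampling_rate=sampling_rate,
        )
```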
* speedup relative position embeddings
* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model; a finetuned version should point to the model save dir.
- fix model weight names, which are changed by adding an adapter.
* minor
* minor
* minor
* fixing a crash without peft active
* add todo to replace einsum
* granite speech speedups:
1. register attention_dist to avoid a CPU-to-GPU transfer in every layer.
2. pad_sequence is much faster than per-sample padding + concat.
3. avoid moving audio back to the CPU when using a compute device.
* support audio.shape=(1,L)
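A rough sketch of these speedups and the new shape handling (the module is hypothetical; `attention_dist` and the sizes are illustrative):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# 2. One batched pad is much faster than per-sample padding + concat.
samples = [torch.randn(n) for n in (300, 512, 128)]
batch = pad_sequence(samples, batch_first=True)  # shape (3, 512)

# Accept audio of shape (1, L) as well as (L,).
audio = torch.randn(1, 16000)
if audio.dim() == 2 and audio.size(0) == 1:
    audio = audio.squeeze(0)

class ToyLayer(torch.nn.Module):
    """1. A registered (non-persistent) buffer moves with the module to the
    compute device, avoiding a CPU-to-GPU transfer on every forward."""

    def __init__(self):
        super().__init__()
        self.register_buffer("attention_dist", torch.arange(512), persistent=False)
```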
* docs: update LLaVA-NeXT model card
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* [docs] Updated llava_next model card
* Update docs/source/en/model_doc/llava_next.md remove image sources
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* [fix] Change Flash Attention to SDPA badge
* [doc] fixed quantization example
* docs: updated contribution details and badges
* Update llava_next.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update marian.md
This update improves the Marian model card to follow the Hugging Face standardized model card format. The changes include:
- Added a clear description of MarianMT, its architecture, and how it differs from other models.
- Provided usage examples for Pipeline and AutoModel.
- Added a quantization example for optimizing model inference.
- Included instructions and examples for multilingual translation with language codes.
- Added an Attention Mask Visualizer example.
- Added a Resources section with relevant links to papers, the Marian framework, language codes, tokenizer guides, and quantization documentation.
- Fixed formatting issues in the code blocks for correct rendering.
This update improves the readability, usability, and consistency of the Marian model documentation for users.
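The Pipeline usage added to the card looks roughly like this (a sketch; the card's exact snippet may differ):

```python
from transformers import pipeline

# English -> German translation with a MarianMT checkpoint.
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")
print(translator("Hello, how are you?"))
```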
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update marian.md
* Update marian.md
* Update marian.md
* Update marian.md
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update marian.md
* Update marian.md
* Update marian.md
* Update marian.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add initial structure
* doc fixes, add model base logic
* update init files
* some fixes to config and modular
* some improvements for attention
* format
* remove unused attn
* some fixes for moe layer and for decoder
* adapt _compute_yarn_parameters for deepseek (see the sketch after this list)
* format
* small fix
* fix for decoder forward
* add tests, small refactoring
* fix dummies
* fix init
* fix doc
* fix config docs
* add sequence doc, fix init for gate
* fix issues in tests
* fix config doc
* remove unused args
* some fixes and refactoring after review
* fix doc for config
* small fixes for config args
* revert config refactoring
* small refactoring
* minor fixes after rebase
* small fix after merge
* fix modular
* remove rotaryembd from public init
* small test fix
* some rotary pos calculation improvement
* fix format
* some improvements and fixes
* fix config
* some refactoring
* adjust some unit tests
* skip test
* small fixes and tests adjustment
* reapply modular
* fix all tests except Integration
* fix integration tests
* cleanup BC stuff
* rope
* fix integration tests based on A10
* style
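The YaRN adaptation mentioned above, in simplified form (a sketch loosely following transformers' `_compute_yarn_parameters`, not the exact implementation; the defaults are illustrative):

```python
import math

import torch

def yarn_inv_freq(dim: int = 64, base: float = 10000.0, factor: float = 4.0,
                  beta_fast: float = 32.0, beta_slow: float = 1.0,
                  original_max_position: int = 4096) -> torch.Tensor:
    """Blend unscaled (extrapolated) and linearly scaled (interpolated)
    RoPE frequencies with a per-dimension linear ramp."""
    pos_freqs = base ** (torch.arange(0, dim, 2).float() / dim)
    extrapolation = 1.0 / pos_freqs             # original RoPE frequencies
    interpolation = 1.0 / (factor * pos_freqs)  # linearly scaled frequencies

    def correction_dim(num_rotations: float) -> float:
        # Dimension whose wavelength completes num_rotations over the
        # original context length.
        return (dim * math.log(original_max_position / (num_rotations * 2 * math.pi))
                / (2 * math.log(base)))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim - 1)
    # Ramp is 0 for high-frequency dims (keep extrapolation) and 1 for
    # low-frequency dims (full interpolation).
    ramp = ((torch.arange(dim // 2).float() - low) / max(high - low, 1)).clamp(0, 1)
    extrapolation_factor = 1 - ramp
    return interpolation * (1 - extrapolation_factor) + extrapolation * extrapolation_factor
```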
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Add Doge Model
* Fix code quality
* Rollback an error commit
* Fix config for open-source weights
* Revert "Fix config for open-source weights"
This reverts commit 229cdcac10.
* Add modular_doge
* Update Doge to inherit from Llama
* Fix import bug
* [docs] Add usage of doge model
* Fix Doge to import PretrainedConfig from configuration_utils instead of modeling_utils
* [docs] remove trust remote code from doge
* Fix dynamo bug in doge model
* Update docstrings
* Import apply_rotary_pos_emb and repeat_kv from Llama
* Fix all nits
* Fix code quality
* Fix some bugs
* Fix code quality
* Remove inherited `_update_causal_mask` from Llama
This leads to incorrect weight initialization.
* Fix the wrong tensor orderings in DogeCDMoE
* Fix attention mask bug
We have to provide attention_mask for dynamic mask computation
* Modify most implementations to inherit from Llama
But there are two problems:
1. `flex_attention_forward` is not updated properly
2. `Example` error in the forward method of DogeForCausalLM
* Modify CDMoE for batch efficient implementation
* Unify MoE configuration names, just like QwenMoE
* Fix code quality
* Fix code quality
* Fix code quality
* Add tp plan of CDMoE Module
* Hybrid DMA with sliding window (see the mask sketch after this list)
* Update valid tokens greater than window size
* Fix code quality
* Add `convert_doge_weights_to_hf`
* Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py
* Fix nits in modular_doge
* Fix code quality
* Fix all nits
* Fix all nits
* Make sure the attention function is updated inside the class
* Fix code quality issues in the Doge model and add a test for it
* Fix `test_generate`
* Fix code quality
* Fix nits following suggestions
* Fix code quality
* Fix code quality issues
* Fix nits
* Fix code quality nits
* Fix the missing parameters in the configuration.
* Fix the missing parameters in the configuration.
* Fix nits
* Add initialization of attention
* Fix last nits
* Simplify dynamic mask generation logic
* Rename router_logits to gate_logits for matching latest changes of MixtralModel
* Rename typings for matching latest changes of MixtralModel
* Fixes typo in comment
* Update src/transformers/models/doge/modular_doge.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix code quality issues to match other modular
* Fix code quality issues to match other modular
* Fix the static compilation errors
* Update model weights link
* Fix code quality issues to match other modular
* reapply modular and support for new outputs
* style
* simplify a lot
* fix import location
* reapply modular
* fix
* fix integration test
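A toy version of the causal plus sliding-window masking referenced above (a sketch, not Doge's actual dynamic mask implementation):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: position i may attend to j iff
    j <= i and i - j < window."""
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]
    recent = (idx[:, None] - idx[None, :]) < window
    return causal & recent

mask = sliding_window_causal_mask(seq_len=6, window=3)
```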
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>