Commit Graph

844 Commits

Author SHA1 Message Date
Quentin Gallouédec
c989ddd294
Simplify and update trl examples (#38772)
* Simplify and update trl examples

* Remove optim_args from SFTConfig in Trainer documentation

* Update docs/source/en/trainer.md

* Apply suggestions from code review

* Update docs/source/en/trainer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <qgallouedec@Quentins-MacBook-Pro.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-13 12:03:49 +00:00
Quentin Gallouédec
de24fb63ed
Use HF papers (#38184)
* Use hf papers

* Hugging Face papers

* doi to hf papers

* style
2025-06-13 11:07:09 +00:00
SohamPrabhu
85f060e9b0
Updated moonshine modelcard (#38711)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
New model PR merged notification / Notify new model (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* Moved the sources to the right

* small Changes

* Some Changes to moonshine

* Added the install to pipline

* updated the monshine model card

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated Documentation According to changes

* Fixed the model with the commits

* Update moonshine.md

* Update moshi.md

---------

Co-authored-by: Your Name <sohamprabhu@Mac.fios-router.home>
Co-authored-by: Your Name <sohamprabhu@Sohams-MacBook-Air.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-12 10:27:17 -07:00
Drew Ross
645cf297cc
Add missing div in Pegasus model card (#38773)
Add missing div
2025-06-12 10:27:07 -07:00
Yusuf Shihata
346f341630
[Docs] New DiT model card (#38721)
* documenation finished

* Update dit.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-12 10:26:50 -07:00
Cyril Vallez
4b8ec667e9
Remove all traces of low_cpu_mem_usage (#38792)
* remove it from all py files

* remove it from the doc

* remove it from examples

* style

* remove traces of _fast_init

* Update test_peft_integration.py

* CIs
2025-06-12 16:39:33 +02:00
rileyafox
9487765f07
Add Qwen2 MoE model card (#38649)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* Add Qwen2 MoE model card

* Revisions to qwen2 moe model card

* Add Qwen2 MoE model card
2025-06-11 15:14:01 -07:00
Emile Aydar
32dbf4bddb
Update altCLIP model card (#38306)
* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Rename altclip.md to altclip.mdx

* Rename altclip.mdx to altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-11 14:48:34 -07:00
Drew Ross
bb44d2a0f6
Update pegasus model card (#38675)
* Update Pegasus model card

* Fix transformers-cli command

* Update code examples to use bfloat16

* Reverted code examples to use float16

* Fix typo, update checkpoints link

* Update str formatting in code examples

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix typo

* Remove inaccurate badges

* Revert badge removal

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Include cache_implementation argument in quantization example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-11 10:56:25 -07:00
Pavel Iakubovskii
84710a4291
Add V-JEPA 2 (#38746)
* adding model and conversion scripts

* add imports to test vjepa conversion

* fix imports and make conversion work

* fix computation for short side

* replace attention with library attention function

* cleanup more attention classes

* remove config overrides

* add test cases, fix some of the failing ones

* fix the model outputs

* fix outputs of the model per review

* fix too big model test case

* fix styling __init__.py

* fix initialization test

* remove all asserts per review

* update sorting unsorting logic as per feedback

* remove is_video per review

* remove another is_video segment

* remove unwanted stuff

* small fixes

* add docstrings for the model

* revert adding vjepa2 config here

* update styling

* add config docstrings (wip)

* fix dpr issue

* removed test failing issues

* update styles

* merge predictor configs into main config

* remove processing code, add video processor

* remove permute which is not necessary now

* fix styles

* updated vjepa2 to be in video_processing_auto

* update comment for preprocessing

* test integration test and fix the outputs

* update test values, change test to look at repeated frames for a given image

* add a simple video processing test

* refactoring pixel_values_videos and upload ckpts to original

* fix torch_fx test cases

* remove unused config

* add all config docstrings

* add more integration tests

* add basic doc

* revert unwanted styling changes

* working make fixup

* Fix model_type in config

* update attention implementation to fit new hf standards

* fix the preprocessing logic, ensure it matches the original model

* remove use_rope logic, cleanup

* fix docstrings

* Further cleanup, update doc

* Fix model prefix

* fix get_vision_features

* VJEPA2Embeddings style refactor

* nit, style comment

* change modules default values

* Only `str` activation in config

* GradientCheckpointingLayer

* fixup

* fix conversion script

* Remove return_dict

* remove None return typehint

* Refactor VJEPA2Layer, remove use_SiLU

* Fix fx tests

* dpr -> drop_path_rates

* move *ModelOutput on top

* format docs bit

* update docs

* update docs

* update doc example

* remove prune_heads from model

* remove unused config params

* refactor embed signature

* Add vjepa to docs

* Fix config docstring

* update defaults

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fix import

* Min refactoring

* Update HUB_SOURCE and HUB_REPO in conversion script

* Add missing headers

* VJEPA -> V-JEPA in docs

* Add image to doc

* fix style

* fix init weights

* change checkpoint name in modeling tests

---------

Co-authored-by: Koustuv Sinha <koustuv.sinha@mail.mcgill.ca>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Koustuv Sinha <koustuvsinha@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-06-11 15:00:08 +01:00
RogerSinghChugh
aa798b7ac9
New canine model card (#38631)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
New model PR merged notification / Notify new model (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Commit for new_gpt_model_card.

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* commit for new canine model card.

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* implemented suggestion by @stevhliu.

* Update canine.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-10 09:30:05 -07:00
Yana Mishula
81799d8b55
Standardize ByT5 model card format (#38699)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* Standardize ByT5 model card format

* Apply review feedback from @stevhliu

* Fix Notes formatting and wording

* Fix `aya_vision` test (#38674)

* fix 1: load_in_4bit=True,

* fix 2: decorateor

* fixfix 2: breakpoint

* fixfix 3: update

* fixfix 4: fast

* fixfix 5: cond

* fixfix 5: cond

* fixfix 6: cuda 8

* ruff

* breakpoint

* dtype

* a10

* a10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix autodoc formatting for ByT5Tokenizer

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-09 15:02:50 -07:00
Aashish Anand
b61c47f5a5
Created model card for xlm-roberta-xl (#38597)
* Created model card for xlm-roberta-xl

* Update XLM-RoBERTa-XL model card with improved descriptions and usage examples

* Minor option labeling fix

* Added MaskedLM version of XLM RoBERTa XL to model card

* Added quantization example for XLM RoBERTa XL model card

* minor fixes to xlm roberta xl model card

* Minor fixes to mask format in xlm roberta xl model card
2025-06-09 13:00:38 -07:00
Aashish Anand
e594e75f1b
Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout (#38596)
* Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout

* Added CLI command example and quantization example for XLM RoBERTa model card.

* Minor change to transformers CLI and quantization example for XLM roberta model card
2025-06-09 12:26:31 -07:00
Aashish Anand
29ca043856
Created model card for XLM model (#38595)
* Created model card for XLM model

* Revised model card structure and content of XLM model

* Update XLM model documentation with improved examples and code snippets for predicting <mask> tokens using Pipeline and AutoModel.
2025-06-09 12:26:23 -07:00
Armaghan Shakir
31023b6909
Fix MiniMax (docs and integration tests checkpoint) (#38575)
* update checkpoints for integration tests

* minor fixes in docs
2025-06-06 08:43:11 +02:00
Vanshu
593e29c5e2
Updated Aria model card (#38472)
Some checks failed
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
New model PR merged notification / Notify new model (push) Has been cancelled
Stale Bot / Close Stale Issues (push) Has been cancelled
* Update aria.md

* Update aria.md

* Suggested Updates - aria.md
2025-06-05 14:36:54 -07:00
Monish Singhal
c75bf2c36e
Fix typo in LLaVa documentation (#38618)
* Fix typo in LLaVa documentation

In exactly one section, LlavaImageProcessor was spelt wrongly as LLavaImageProcessor, which throws off copy-pasting the section.

* Fix LlavaImageProcessor url to make it valid (and copypaste-able)

Earlier, the URL contained the entire HF prefix. This commit removes that to ensure that the code block can be copied and run as is.
2025-06-05 13:25:07 -07:00
Henrik Matthiesen
1fed6166c0
added fast image processor for ZoeDepth and expanded tests accordingly (#38515)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
New model PR merged notification / Notify new model (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* added fast image processor for ZoeDepth and expanded tests accordingly

* added fast image processor for ZoeDepth and expanded tests accordingly, hopefully fixed repo consistency issue too now

* final edits for zoedept fast image processor

* final minor edit for zoedepth fast imate procesor
2025-06-04 22:59:17 +00:00
RogerSinghChugh
8e1266de2b
New gpt neo model card (#38505)
* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Commit for new_gpt_model_card.

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-04 09:56:47 -07:00
Armaghan Shakir
55736eea99
Add support for MiniMax's MiniMax-Text-01 (#35831)
* end-to-end architecture

* lightning-attn: refactor, clean, optimize

* put minimax_text_01 in other files

* use latest __init__ standards and auto-generate modular

* support attention_mask for lightning-attn

* Revert "use latest __init__ standards and auto-generate modular"

This reverts commit d8d3c409d8.

* fix modular conversion

* pass both attention masks instead of tuple

* formatting

* Updated Dynamic Cache

* created MiniMaxText01Cache

* fix hardcoded slope_rate

* update attn_type_list in config

* fix lightning when use_cache=False

* copy tests from mixtral

* (checkpoint) all tests pass for normal attention

* fix all unittests

* fix import sorting

* fix consistency and formatting tests

* fix config

* update tests, since changes in main

* fix seq_len error

* create dummy docs

* fix checkpoint

* add checkpoint in config docstring

* run modular_conversion

* update docs

* fix checkpoint path and update tests

* fix ruff

* remove repeated expected_slice

* update docs

* rename "minimax-text-01" to "minimax"

* inherit config from mixtral

* remove from docs in other languages

* undo files that should be untouched

* move minimax to end in conversation docs

* use MiniMaxForCausalLM as it is

* ruff fixes

* run modular

* fix docstring example in causallm

* refactor attention loop and decay factors

* refactor config in modular

* run modular

* refactor cache

* rename static_cache to linear_cache

* make positional embeddings necessary

* remove unnecessary layernorms declarations

* fix import in tests

* refactor attention in next tokens

* remove outdated code

* formatting and modular

* update tests

* rename layernorm alpha/beta factors

* register decay factors as buffers

* remove unused declarations of decay factors

* update config for alpha/beta factors

* run modular

* remove head_dim in tests

* remove minimax from fx.py

* remove stuff that is not really needed

* update __init__

* update qkv torch.split

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix qkv torch.split

* quality fixes

* remove mistakenly added dummy

* purge unused ModelTester code

* fix-copies

* run fix-copies

* fix head_dim

* write cache formatting tests

* remove postnorm

* avoid contiguous in attention current states

* update expected_slice

* add generation test for integration

* fix dtype in generation test

* update authors

* update with changes in main

* update graident checkpointing and minor fixes

* fix mutable attn_type_list

* rename: attn_type -> layer_type

* update for layer_types

* update integration tests

* update checkpoint

* clean overview in docs

---------

Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-04 09:38:40 +02:00
Tony Wu
c72ba69441
Add ColQwen2 to 🤗 transformers (#35778)
* feat: add colqwen2 (wip)

* tests: fix test_attention_outputs

* tests: reduce hidden size to accelerate tests

* tests: fix `test_attention_outputs` 🥳

* fix: fix wrong parent class for `ColQwen2ForRetrievalOutput`

* fix: minor typing and style changes

* chore: run `make style`

* feat: remove redundant `max_num_visual_tokens` attribute in `ColQwen2Processor`

* tests: tweak comments

* style: apply ruff formatter

* feat: move default values for `visual_prompt_prefix` and `query_prefix`

* docs: update ColQwen2 model card

* docs: tweak model cards

* docs: add required example config checkpoint

* tests: update expected scores in integration test

* docs: tweak quickstart snippets

* fix: address PR comments

* tests: fix colqwen2 tests + tweak comment in colpali test

* tests: unskip useful tests

* fix: fix bug when `visual_prompt_prefix` or `query_prefix` is an empty string

* fix: fix ColPali outputs when `return_dict == False`

* fix: fix issue with PaliGemma output not being a dict

* docs: set default dtype to bfloat16 in quickstart snippets

* fix: fix error when `return_dict=False` in ColPali and ColQwen2

* tests: fix special tokens not being replaced in input_ids

* style: fix lint

* fix: `ColQwen2Processor`'s `padding_side` is now set from `processor_config.json`

* fix: remove unused `padding_side` in ColQwen2 model

* docs: update ColQwen2's model doc

* fix: fix harcoded vlm backbone class in ColQwen2Config

* fix: remove `padding_side` from ColQwen2Processor as should fed from kwargs

* docs: fix typo in model docstring

* docs: add illuin mention in model docs

* fix: let `padding_size` be handled by `tokenizer_config.json`

* docs: add colpali reference url in colqwen2's model doc

* docs: add Hf mention in model docs

* docs: add late interaction mention in model docs

* docs: tweak colqwen2 model doc

* docs: update reference checkpoint for ColPali to v1.3

* docs: simplify quickstart snippets

* docs: remove redundant `.eval()`

* refactor:  use `can_return_tuple` decorator for ColPali and ColQwen2

* docs: fix copyright date

* docs: add missing copyright in tests

* fix: raise error when `initializer_range` is not in config

* docs: remove redundant `.eval()` in colpali doc

* fix: fix `get_text_config` now that Qwen2VL has a proper `text_config` attribute

See https://github.com/huggingface/transformers/pull/37268 for details about changes in Qwen2VL's config.

* fix: add missing `initializer_range` attribute in `ColQwen2Config`

* fix: use `get_text_config` in `resize_token_embeddings`

* update colwen2 with auto_docstring

* docs: fix wrong copyright year

* chore: remove `raise` as `initializer_range` has a default value in `ColQwen2Config`

* refactor: merge `inner_forward` into `forward`

* Refactor colqwen2 after refactoring of qwen2VL, use modular for modeling code

* protect torch import in modular to protect in processing

* protect torch import in modular to protect in processing

* tests: fix hf model path in ColQwen2 integration test

* docs: clarify `attn_implementation` and add comments

* docs: add fallback snippet for using offline PIL dummy images

* docs: temporarily revert attn_implementation to `None` while sdpa is not fixed

* docs: tweaks in colpali/colqwen2 quick start snippets

* fix: add missing flags to enable SDPA/Flex Attention in ColQwen2 model

* fix: add missing changes in modular file

* fix modeling tests

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-02 12:58:01 +00:00
Yuanzhou Cai
942c60956f
Model card for mobilenet v1 and v2 (#37948)
* doc: #36979

* doc: update hfoptions

* add model checkpoints links

* add model checkpoints links

* update example output

* update style #36979

* add pipeline tags

* improve comments

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggested changes

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-28 09:20:19 -07:00
Jiwook Han
9a8510572b
Updated the model card for ViTMAE (#38302)
* Update vit_mae.md

* badge float:right

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update model_doc/vit_mae.md

* fix

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-28 09:19:43 -07:00
Vanshu
c9fcbd5bf9
Updated the Model docs - for the ALIGN model (#38072)
* Updated the Model docs - for the ALIGN model

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated align.md

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update align.md

* fix

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-28 09:19:09 -07:00
Andy Vu
3b3ebcec40
Updated model card for OLMo2 (#38394)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* Updated OLMo2 model card

* added command line

* Add suggestions

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Added suggestions

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Indented code block as per suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 16:24:36 -07:00
Tanuj Rai
a092f6babf
Update granite.md (#37791)
* Update granite.md

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update granite.md

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* minor fixes

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 12:55:15 -07:00
RogerSinghChugh
be7aa3210b
New bart model card (#37858)
* Modified BART documentation wrt to issue #36979.

* Modified BART documentation wrt to issue #36979.

* fixed a typo.

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* blank commit.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 11:51:41 -07:00
RogerSinghChugh
587c1b0ed1
Updated BERTweet model card. (#37981)
* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 11:51:22 -07:00
RogerSinghChugh
b73faef52f
Updated BigBird Model card as per #36979. (#37959)
* Updated BigBird Model card as per #36979.

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 11:24:28 -07:00
Madhav Kumar
538e847c06
Updated Zoedepth model card (#37898)
* Edited zoedepth model card according to specifications.

* Edited Zoedepth model file

* made suggested changes.
2025-05-27 10:06:53 -07:00
Parag Ekbote
4f7b0ff8d1
Update Model Card for Mamba-2 (#37951)
* update model page.

* update model page.

* Update docs/source/en/model_doc/mamba2.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* update the model page.

* update.

* Apply suggestions from code review

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Apply the suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add an quantization example and update the toctree.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* remove the additional comma

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 10:06:39 -07:00
eustlb
b9f8f863d9
[CSM] update model id (#38211)
* update model id

* codec_model eval

* add processor img

* use ungated repo for processor tests
2025-05-27 17:03:55 +02:00
eustlb
3142bd8592
[CSM] infer codec model with no_grad + audio eos label (#38215)
* infer codec model with no_grad

* codec_model eval

* training labels: add audio eos token
2025-05-27 14:10:17 +00:00
Ragnar
63964b7c67
fix typos (#38336)
* Update video_processor.md

* Update deepseek_v3.md
2025-05-26 14:42:37 +00:00
Kseniya Parkhamchuk
31f8a0fe8a
[docs]: update roformer.md model card (#37946)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* Update roformer model card

* fix example purpose description

* fix model description according to the comments

* revert changes for autodoc

* remove unneeded tags

* fix review issues

* fix hfoption

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-23 16:27:56 -07:00
Bryan C.
36f97ae15b
docs(swinv2): Update SwinV2 model card to new standard format (#37942)
* docs(swinv2): Update SwinV2 model card to new standard format

* docs(swinv2): Apply review suggestions

Incorporates feedback from @stevhliu to:
- Enhance the introductory paragraph with more details about scaling and SimMIM.
- Generalize the tip from "image classification tasks" to "vision tasks".

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-23 13:04:13 -07:00
Aguedo
33d23c39ed
Update BioGPT model card (#38214)
* Update BioGPT model card

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* correction for CPU fallback

* added quantization code and method

* fixed transformers-cli call

---------

Co-authored-by: Aguedo <aguedo@fakeemail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-23 13:03:47 -07:00
Arthur
f5d45d89c4
🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288)
* Protect ParallelInterface

* early error out on output attention setting for no wraning in modeling

* modular update

* fixup

* update model tests

* update

* oups

* set model's config

* more cases

* ??

* properly fix

* fixup

* update

* last onces

* update

* fix?

* fix wrong merge commit

* fix hub test

* nits

* wow I am tired

* updates

* fix pipeline!

---------

Co-authored-by: Lysandre <hi@lysand.re>
2025-05-23 17:17:38 +02:00
Jinan Zhou
135163e9c5
Expose AutoModelForTimeSeriesPrediction for import (#38307)
* expose AutoModelForTimeSeriesPrediction for import

* add in docs
2025-05-23 13:09:29 +00:00
Anton Vlasjuk
d95c864a25
🔴🔴🔴 [Attention] Refactor Attention Interface for Bart-based Models (#38108)
* starting attn refactor for encoder decoder models via bart (eager + sdpa)

* flash attention works, remove unnecessary code

* flex attention support for bart!, gotta check if the renaming is not too aggressive

* some comments

* skip flex grad test for standalone as done with the other test

* revert flex attn rename (for now), sdpa simplify, and todos

* more todos

* refactor mask creation for reuse

* modular attempt at biogpt

* first batch of other models

* fix attn dropout

* fix autoformer copies

* hubert

* another batch of models

* copies/style + last round of bart models --> whisper next?

* remove unnecessary _reshape function and remove copy to whisper

* add skip for decoder-only models out of enc-dec (same as in bart)

* bring back licences

* remove comment, added to pr read instead

* mostly docs

* disable sew flex attn as it's unclear attn mask for now

* oops

* test fixes for enc-dec

* torch fx fixes + try at flex attn

* skip on mbart

* some more fixes

* musicgen skip / delete old attn class logic + sdpa compose compile skip

* disable flex attn for musicgen, not worth the effort

* more fixes and style

* flex attention test for dropout and encoder decoder that dont have main input names

* informer fixes

* the weirdest thing I've encountered yet...

* style

* remove empty tensor attempt, found core root in previous commits

* disable time series due to tests being very text centric on inputs

* add speech to text to be ignoring the other attns, also due to tests

* update docs

* remaining issues resolved ?

* update docs for current state --> nllb moe and pegasus x sdpa is questionable :D

* some models have not set the is_causal flag...

* change dtype in softmax tol old behaviour + some modular fixes

* I hate it but it is what it is

* fixes from main for bart

* forgot this one

* some model fixes

* style

* current status

* marian works now

* fixing some copies

* some copy fixes + time series x informer

* last models possibly and fixes on style/copies

* some post merge fixes

* more fixes

* make attention interface callable and move warnings there

* style lol

* add comment to "unsupported"

* remove callable interface and change interface warnings + some copies

* fix

* ternary is ugly af, make it simpler

* how did that happen

* fix flex attn test

* failing the test

* no more fallback! fixing copies next

* style + attn fixed

* fixing copies and mask creation

* wrong copy

* fixup tests and disable flex attn for now

* fixup last tests?
2025-05-22 17:12:58 +02:00
Joao Gante
f8630c778c
[Whisper] handle deprecation of forced_decoder_ids (#38232)
* fix

* working saved forced_decoder_ids

* docstring

* add deprecation message

* exception message ordering

* circular import comment
2025-05-22 09:16:38 +00:00
Bryan C.
b369a65480
docs(swin): Update Swin model card to standard format (#37628)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* docs(swin): Update Swin model card to standard format

* docs(swin): Refine link to Microsoft organization for Swin models

Apply suggestion from @stevhliu in PR #37628.

This change updates the link pointing to the official Microsoft Swin Transformer checkpoints on the Hugging Face Hub.

The link now directs users specifically to the Microsoft organization page, filtered for Swin models, providing a clearer and more canonical reference compared to the previous general search link.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(swin): Clarify padding description and link to backbone docs

Apply suggestion from @stevhliu in PR #37628.

This change introduces two improvements to the Swin model card:

1.  Refines the wording describing how Swin handles input padding for better clarity.
2.  Adds an internal documentation link to the general "backbones" page when discussing Swin's capability as a backbone model.

These updates enhance readability and improve navigation within the Transformers documentation.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(swin): Change Swin paper link to huggingface.co/papers as suggested

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-21 16:16:43 -07:00
Parag Ekbote
28d3148b07
Update Model Card for Mamba (#37863)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
New model PR merged notification / Notify new model (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* update model card.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update quantization example.

* update example.

* update

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-21 10:58:23 -07:00
youngrok cha
101b3fa4ea
fix multi-image case for llava-onevision (#38084)
* _get_padding_size module

* do not patchify images when processing multi image

* modify llava onevision image processor fast

* tensor to list of tensors

* backward compat

* reuse pad_to_square in llave & some clarification

* add to doc

* fix: consider no image cases (text only or video)

* add integration test

* style & repo_consistency
2025-05-21 11:50:46 +02:00
Younes Belkada
6829936ee0
[MODEL] Add Falcon H1 (#38249)
* Create push-important-models.yml

* feat: add falcon-h1

* fixup

* address comment

* fix

* fix copies

* fix copies

* fix

* fix

* fix

* fix

* fix copies

* fix

* fix copies

* fix test import to at least trigget the cis

* yups

* update

* fix make fix copies

* fix inits?

* fix style

* skip annoying test

* add integration test for Falcon H1

* fix copies

* fix

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
2025-05-21 10:43:11 +02:00
Garrett Goon
390f153469
Add padding-free to bamba (#35861)
* add seq_idx and fa kwargs

* update tests

* docs and grad ckpt support

* fmt

* better names

* test_raise_missing_padding_free_kwarg_errs

* + seq_idx in doc strings

* padding free training docs

* add link to pr plots

* raise err on attn_mask with padding free

* rm raising missing padding free err test

* BambaFlashAttentionKwargs

* run modular util for modular_granitemoehybrid.py
2025-05-20 17:13:59 +02:00
NielsRogge
7c9b0ca08c
[SAM-HQ] Update names in the docs (#38058)
Update names
2025-05-19 09:21:14 -07:00
Raushan Turganbay
955e61b0da
Remove head mask in generative models (#35786)
* just squash into one commit

* delete print
2025-05-15 10:44:19 +02:00
Jinyong Lee
342961f669
Add Fast Image Processor for vilt (#37304)
* init vilt image processor fast

* Refactor image processor tests to use loop for all processors

* Add ViltImageProcessorFast with PyTorch-based optimized image processing

* Change made automatically by make fixup command

* Change made automatically by make fix-copies command

* Fix type hints in ViltImageProcessorFast for Python compatibility

* Define constants for image resizing based on COCO dataset aspect ratio

* Add missing property initializations to ViltImageProcessorFast

* Extract resize logic into dedicated method in ViltImageProcessorFast

* Extract padding logic into dedicated method

* Implement shape-based image grouping for optimized processing in Vilt

* Update test suite to verify ViltImageProcessorFast attributes

* Move variable declarations to _preprocess method parameters

* Remove unused parameters

* Rename _resize method to resize to override existing function

* Remove whitespace

* Remove unnecessary type check and conversion for stacked_images

* Remove redundant loop and apply padding directly to stacked images

* Refactor pad function to return images and mask as tuple instead of dict

* Add tests comparing padding masks in slow and fast implementations

* Update ViltImageProcessor tests to ensure compatibility between slow and fast implementations

* Replace add_start_docstrings with auto_docstring in ViltImageProcessorFast

* Move docstrings of custom args to ViltFastImageProcessorKwargs

* Use reorder_images function for both masks and images

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-05-13 15:40:53 +00:00