Commit Graph

19276 Commits

Author SHA1 Message Date
Yih-Dar
d4e7aa5526
Fix qwen_2_5 omni (#38658)
* fix

* fix

* break style

* break style

* Apply style fixes

* break style

* Apply style fixes

* fix modular

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-12 14:43:54 +02:00
Jesse Cai
e1812864ab
[docs] Add int4wo + 2:4 sparsity example to TorchAO README (#38592)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* update quantization readme

* update

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-06-12 12:17:07 +00:00
Quentin Gallouédec
bc68defcac
Update PULL_REQUEST_TEMPLATE.md (#38770) 2025-06-12 14:03:33 +02:00
Quentin Gallouédec
960fda25d1
Reduce verbosity for average_tokens_across_devices=True and world size = 1 (#38785)
* Warning to info for average_tokens_across_devices and world size = 1

* Update src/transformers/training_args.py
2025-06-12 14:02:53 +02:00
Yih-Dar
89c46b648d
Skip some export tests on torch 2.7 (#38677)
* skip

* fix

* better check

* Update import_utils.py

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-12 12:47:15 +02:00
Raushan Turganbay
27459025b8
[video processors] support frame sampling within processors (#38105)
* apply updates smolVLM (still needs workaround for chat template)

* add other models

* dump qwen omni for now, come back later

* port qwen omni from their impl

* wait, all qwens sample videos in same way!

* clean up

* make smolvlm backwards compatible and fix padding

* dix some tests

* fox smolvlm tests

* more clean up and test fixing

* delete unused arg

* fix

* address comments

* style

* fix test
2025-06-12 09:34:30 +00:00
Cyril Vallez
887054c714
Fix masking utils (#38783)
* fix

* Update masking_utils.py

* Update masking_utils.py
2025-06-12 11:00:46 +02:00
Yih-Dar
7c58336949
[Hotfix] Fix style bot (#38779)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-12 10:20:36 +02:00
Raushan Turganbay
7c6b1707c3
[masking utils] check None instead of try/except (#38561)
* fix vllm's compile backend

* fix the test

* apply the same changes in other masking strategies
2025-06-12 06:50:28 +00:00
rileyafox
9487765f07
Add Qwen2 MoE model card (#38649)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* Add Qwen2 MoE model card

* Revisions to qwen2 moe model card

* Add Qwen2 MoE model card
2025-06-11 15:14:01 -07:00
Emile Aydar
32dbf4bddb
Update altCLIP model card (#38306)
* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Rename altclip.md to altclip.mdx

* Rename altclip.mdx to altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-11 14:48:34 -07:00
Dongruixuan Li
1dcb022e8f
chore(pixtral): emit block attention mask when using flash attention (#38741)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
New model PR merged notification / Notify new model (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* chore(pixtral): emit block attention mask when using flash attention

Since flash_attention_2 relies solely on position_ids, emitting the block attention mask avoids unnecessary memory usage and prevents OOM on large inputs.

* remove unnecessary attention_mask assignment
2025-06-11 18:55:23 +00:00
Yih-Dar
60d4b35b20
Make style bot trigger CI after push (#38754)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-11 20:40:04 +02:00
Drew Ross
bb44d2a0f6
Update pegasus model card (#38675)
* Update Pegasus model card

* Fix transformers-cli command

* Update code examples to use bfloat16

* Reverted code examples to use float16

* Fix typo, update checkpoints link

* Update str formatting in code examples

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix typo

* Remove inaccurate badges

* Revert badge removal

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Include cache_implementation argument in quantization example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-11 10:56:25 -07:00
L
b84ebb7f3c
fix(qwen3_moe): pass kwargs to self_attn (#38691)
This is needed to avoid `.item()` calls in `_flash_attention_forward`.
2025-06-11 19:26:08 +02:00
Matt
9f563ada70
Deprecate TF + JAX (#38758)
* Scatter deprecation warnings around

* Delete the tests

* Make logging work properly!
2025-06-11 17:28:06 +01:00
Matt
337757cbd5
Update repo consistency check (#38763) 2025-06-11 17:02:03 +01:00
Matthew Douglas
e2bdc13375
Remove IPEX requirement for bitsandbytes on CPU (#38594)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-11 17:46:34 +02:00
Matt
063bef0865
Prepare for TF+Jax deprecation (#38760)
* Prepare for TF+Jax deprecation

* Remove .circleci jobs
2025-06-11 16:03:31 +01:00
Marc Sun
11ad9be153
Better typing for num_items_in_batch (#38728)
* fix

* style

* type checking ?

* maybe this ?

* fix

* can't be an int anymore

* fix
2025-06-11 16:26:41 +02:00
Pavel Iakubovskii
84710a4291
Add V-JEPA 2 (#38746)
* adding model and conversion scripts

* add imports to test vjepa conversion

* fix imports and make conversion work

* fix computation for short side

* replace attention with library attention function

* cleanup more attention classes

* remove config overrides

* add test cases, fix some of the failing ones

* fix the model outputs

* fix outputs of the model per review

* fix too big model test case

* fix styling __init__.py

* fix initialization test

* remove all asserts per review

* update sorting unsorting logic as per feedback

* remove is_video per review

* remove another is_video segment

* remove unwanted stuff

* small fixes

* add docstrings for the model

* revert adding vjepa2 config here

* update styling

* add config docstrings (wip)

* fix dpr issue

* removed test failing issues

* update styles

* merge predictor configs into main config

* remove processing code, add video processor

* remove permute which is not necessary now

* fix styles

* updated vjepa2 to be in video_processing_auto

* update comment for preprocessing

* test integration test and fix the outputs

* update test values, change test to look at repeated frames for a given image

* add a simple video processing test

* refactoring pixel_values_videos and upload ckpts to original

* fix torch_fx test cases

* remove unused config

* add all config docstrings

* add more integration tests

* add basic doc

* revert unwanted styling changes

* working make fixup

* Fix model_type in config

* update attention implementation to fit new hf standards

* fix the preprocessing logic, ensure it matches the original model

* remove use_rope logic, cleanup

* fix docstrings

* Further cleanup, update doc

* Fix model prefix

* fix get_vision_features

* VJEPA2Embeddings style refactor

* nit, style comment

* change modules default values

* Only `str` activation in config

* GradientCheckpointingLayer

* fixup

* fix conversion script

* Remove return_dict

* remove None return typehint

* Refactor VJEPA2Layer, remove use_SiLU

* Fix fx tests

* dpr -> drop_path_rates

* move *ModelOutput on top

* format docs bit

* update docs

* update docs

* update doc example

* remove prune_heads from model

* remove unused config params

* refactor embed signature

* Add vjepa to docs

* Fix config docstring

* update defaults

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fix import

* Min refactoring

* Update HUB_SOURCE and HUB_REPO in conversion script

* Add missing headers

* VJEPA -> V-JEPA in docs

* Add image to doc

* fix style

* fix init weights

* change checkpoint name in modeling tests

---------

Co-authored-by: Koustuv Sinha <koustuv.sinha@mail.mcgill.ca>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Koustuv Sinha <koustuvsinha@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-06-11 15:00:08 +01:00
Davis Wertheimer
a6f0e2b64a
Add z-loss to Bamba for v2 (#37842)
* Remove const

* Fix arg ref

* Sharded save

* Add z_loss flag

* Add modeling zloss

* Demodularize clm forward for zloss

* Also demodularize init for z_loss flag

* PR comments (mostly modularizing right)

* Demodularize forward

* Better name zloss and explain typematch

* Fully propagate coeff name

* style fixes

* zloss default float

* Remove conflicting annotations

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-06-11 15:29:17 +02:00
Yih-Dar
6b610d89f1
Revert "Trigger doc-builder job after style bot" (#38735)
Revert "Trigger doc-builder job after style bot (#38398)"

This reverts commit 51e0fac29f.
2025-06-11 14:56:39 +02:00
Minho Ryu
0bf53e69e2
[DeepSeek-V3] implement when q_lora_rank is None (#38743)
* implement when q_lora_rank is None

* make style and quality
2025-06-11 13:35:10 +01:00
ye
b426c2b313
fix: bf16 with TPU is allowed in configuration (#38670)
* fix: tpu bf16

* fix: style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-11 12:35:01 +00:00
Yao Matrix
c8c1e525ed
from 1.11.0, torchao.prototype.low_bit_optim is promoted to torchao.optim (#38689)
* since 1.11.0, torchao.prototype.low_bit_optim is promoted to
torchao.optim

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix review comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-11 12:16:25 +00:00
Yushun Xiang
56a7cf5546
fix: Add method to get image features in PaliGemmaForConditionalGeneration (#38730)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
New model PR merged notification / Notify new model (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* fix: Add method to retrieve image features in PaliGemmaForConditionalGeneration

* feat: Add get_image_features method to multiple models for image feature extraction

* fix: reformat the files with ruff.

* feat: Add methods for packing and retrieving image and video features across multiple models

modified:
- modeling_chameleon.py
- modeling_llava_next.py
- modular_llava_next_video.py
- modeling_qwen2_vl.py

and generate the:
- modeling_llava_next_video.py
- modeling_llava_onevision.py
- modeling_qwen2_5_vl.py

* feat: Implement get_image_features method in Aria, Mistral3, and VipLlava models with updated parameters

* fix: reformatted the code with fix-style
2025-06-11 10:26:31 +00:00
Raushan Turganbay
380e6ea406
[llava] fix integration tests with Siglip (#38732)
fix llava siglip test
2025-06-11 08:09:16 +00:00
Rémi Ouazan
f1849eab22
Fixed a multiple-devices issue in SmolVLM model (#38736)
Fixed a multiple-devices issue in SmolVLMModel (#38557)

* Fixed a multiple-devices issue in SmolVLMModel

* Changed the modular to reflect changes
2025-06-11 10:08:01 +02:00
RogerSinghChugh
aa798b7ac9
New canine model card (#38631)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
New model PR merged notification / Notify new model (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Commit for new_gpt_model_card.

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* commit for new canine model card.

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* implemented suggestion by @stevhliu.

* Update canine.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-10 09:30:05 -07:00
Matt
e28fb26e7d
Add AGENTS.md (#38734)
* More name sync

* repeatedly underlining "WRITE LESS, ROBOT"

* fewer, commas, please

* Clarify "copied from"

* Clarify "copied from"

* Mention test dependencies

* Added a line on preferring `modular` style
2025-06-10 16:27:37 +00:00
Francisco R Castro Garcia
cb4c56ce0d
Fix typo in Language Modeling example scripts and update TPU type (#38652)
* Fix typo that prevents the examples to be run correctly

* return .TPU in accelerator.distributedtype comparison
2025-06-10 13:43:35 +00:00
alexzms
8ff22e9d3b
[add-new-model-like] Robust search & proper outer '),' in tokenizer mapping (#38703)
* [add-new-model-like] Robust search & proper outer '),' in tokenizer mapping

* code-style: arrange the importation in add_new_model_like.py

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-10 12:25:12 +00:00
Yuanyuan Chen
8340e8746e
Use OSError (#38712)
Signed-off-by: cyy <cyyever@outlook.com>
2025-06-10 12:13:49 +00:00
Yih-Dar
8257734b5f
Fix llava tests (#38722)
* update

* fix 1

* fix 2

* fix 3

* fix 4

* fix 5

* fix 6

* fix 7

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-10 13:53:17 +02:00
वेदांत
71f7385942
Logging message for `` is_bitsandbytes_available() `` (#38528)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* bnb import log

* bnb import log

* log mesage change

* moved error issue in qunatizer_bnb_4_bit.py

* ruff

* arg added for bnb check

* required changes

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-10 10:15:01 +00:00
Yih-Dar
04cdf83244
Update some tests for torch 2.7.1 (#38701)
* fix 1

* fix 2

* fix 3

* fix 4

* fp16

* break

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-10 11:46:52 +02:00
rdonggroq
afdb821318
Fix smart resize (#38706)
* Fix smart_resize bug

* Add smart_resize test

* Remove unnecessary error checking

* Fix smart_resize tests

---------

Co-authored-by: Richard Dong <rdong@rdong.c.groq-143208.internal>
2025-06-10 08:59:22 +00:00
Yana Mishula
81799d8b55
Standardize ByT5 model card format (#38699)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
* Standardize ByT5 model card format

* Apply review feedback from @stevhliu

* Fix Notes formatting and wording

* Fix `aya_vision` test (#38674)

* fix 1: load_in_4bit=True,

* fix 2: decorateor

* fixfix 2: breakpoint

* fixfix 3: update

* fixfix 4: fast

* fixfix 5: cond

* fixfix 5: cond

* fixfix 6: cuda 8

* ruff

* breakpoint

* dtype

* a10

* a10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix autodoc formatting for ByT5Tokenizer

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-09 15:02:50 -07:00
Yih-Dar
e55983e2b9
Fix aya_vision test (#38674)
* fix 1: load_in_4bit=True,

* fix 2: decorateor

* fixfix 2: breakpoint

* fixfix 3: update

* fixfix 4: fast

* fixfix 5: cond

* fixfix 5: cond

* fixfix 6: cuda 8

* ruff

* breakpoint

* dtype

* a10

* a10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-09 22:18:52 +02:00
Aashish Anand
b61c47f5a5
Created model card for xlm-roberta-xl (#38597)
* Created model card for xlm-roberta-xl

* Update XLM-RoBERTa-XL model card with improved descriptions and usage examples

* Minor option labeling fix

* Added MaskedLM version of XLM RoBERTa XL to model card

* Added quantization example for XLM RoBERTa XL model card

* minor fixes to xlm roberta xl model card

* Minor fixes to mask format in xlm roberta xl model card
2025-06-09 13:00:38 -07:00
Aashish Anand
e594e75f1b
Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout (#38596)
* Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout

* Added CLI command example and quantization example for XLM RoBERTa model card.

* Minor change to transformers CLI and quantization example for XLM roberta model card
2025-06-09 12:26:31 -07:00
Aashish Anand
29ca043856
Created model card for XLM model (#38595)
* Created model card for XLM model

* Revised model card structure and content of XLM model

* Update XLM model documentation with improved examples and code snippets for predicting <mask> tokens using Pipeline and AutoModel.
2025-06-09 12:26:23 -07:00
Marcel Ambo Ndowah
25f711aa89
Drop as_target_processor from the _call_ and pad methods (#38642)
Drop as_target_processor from _call_ and pad methods; reformat docstrings for readability
2025-06-09 12:26:09 -07:00
Matthew Douglas
837ddac1ec
Docs: update bitsandbytes torch.compile compatibility (#38651)
Some checks are pending
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
New model PR merged notification / Notify new model (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
2025-06-09 14:51:57 -04:00
dbleyl
b9faf2f930
Fix TypeError: 'NoneType' object is not iterable for esm (#38667) (#38668)
Add post_init() calls to EsmForMaskedLM, EsmForTokenClassification and EsmForSequenceClassification.
2025-06-09 15:23:20 +00:00
Fiona Waters
11dca07a10
Fix retrieve function signature and remove faiss requirement (#38624)
Signed-off-by: Fiona Waters <fiwaters6@gmail.com>
2025-06-09 15:17:33 +00:00
xiao
b31d462c61
Fix some models import (#38694)
Fix models import
2025-06-09 16:09:24 +01:00
pweglik
282d6684dc
Fix attention mask expansion when converting to executorch (#38637) 2025-06-09 15:00:55 +00:00
Anthony
19224c3642
fix: "check out" as verb (#38678)
"check out" as verb
2025-06-09 14:07:31 +00:00