* squash commits
* rename gpu
* rename accelerator
* change _toctree.yml
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: sdp <sdp@a4bf01943ff7.jf.intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update vit_mae.md
* badge float:right
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update model_doc/vit_mae.md
* fix
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated the Model docs - for the ALIGN model
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated align.md
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update align.md
* fix
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated OLMo2 model card
* added command line
* Add suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Added suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Indented code block as per suggestions
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update granite.md
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update granite.md
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* minor fixes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Modified BART documentation wrt to issue #36979.
* Modified BART documentation wrt to issue #36979.
* fixed a typo.
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* blank commit.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated BigBird Model card as per #36979.
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update model page.
* update model page.
* Update docs/source/en/model_doc/mamba2.md
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* update the model page.
* update.
* Apply suggestions from code review
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Apply the suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add an quantization example and update the toctree.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* remove the additional comma
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update roformer model card
* fix example purpose description
* fix model description according to the comments
* revert changes for autodoc
* remove unneeded tags
* fix review issues
* fix hfoption
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(swinv2): Update SwinV2 model card to new standard format
* docs(swinv2): Apply review suggestions
Incorporates feedback from @stevhliu to:
- Enhance the introductory paragraph with more details about scaling and SimMIM.
- Generalize the tip from "image classification tasks" to "vision tasks".
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update BioGPT model card
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* correction for CPU fallback
* added quantization code and method
* fixed transformers-cli call
---------
Co-authored-by: Aguedo <aguedo@fakeemail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Get parallel loader working. Include tests.
* Update the tests for parallel loading
* Rename env variables.
* Add docs for parallel model weight loading.
* Touch up parallel model loading docs.
* Touch up parallel model loading docs again.
* Edit comment in test_modeling_utils_parallel_loading.py
* Make sure HF_PARALLEL_LOADING_WORKERS is spelled correctly in modeling_utils.py
* Correct times for parallelized loading, previous times were for a "hot" filesystem
* Update parallel model loading so the spawn method is encapsulated. DRY up the code by leveraging get_submodule.
* Update docs on model loading parallelism so that details on setting the multiprocessing start method are removed, now that the package handles this step internally.
* Fix style on model loading parallelism changes.
* Merge latest version of master's modeling_utils.
* Removed unused variable.
* Fix argument packing for the parallel loader.
* Fix state dict being undefined in the parallel model loader.
* Rename variables used in parallel model loading for clarity. Use get_module_from_name().
* Switch to the use of threads for parallel model loading.
* Update docs for parallel loading.
* Remove the use of json.loads when evaluating HF_ENABLE_PARALLEL_LOADING. Prefer simple casting.
* Move parallelized shard loading into its own function.
* Remove use of is_true(). Favor checking env var true values for HF_ENABLE_PARALLEL_LOADING.
* Update copyright to 2025 in readme for paralell model loading.
* Remove garbage collection line in load_shard_file, implicit garbage collection already occurs.
* Run formatter on modeling_utils.py
* Apply style fixes
* Delete tests/utils/test_modeling_utils_parallel_loading.py
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* starting attn refactor for encoder decoder models via bart (eager + sdpa)
* flash attention works, remove unnecessary code
* flex attention support for bart!, gotta check if the renaming is not too aggressive
* some comments
* skip flex grad test for standalone as done with the other test
* revert flex attn rename (for now), sdpa simplify, and todos
* more todos
* refactor mask creation for reuse
* modular attempt at biogpt
* first batch of other models
* fix attn dropout
* fix autoformer copies
* hubert
* another batch of models
* copies/style + last round of bart models --> whisper next?
* remove unnecessary _reshape function and remove copy to whisper
* add skip for decoder-only models out of enc-dec (same as in bart)
* bring back licences
* remove comment, added to pr read instead
* mostly docs
* disable sew flex attn as it's unclear attn mask for now
* oops
* test fixes for enc-dec
* torch fx fixes + try at flex attn
* skip on mbart
* some more fixes
* musicgen skip / delete old attn class logic + sdpa compose compile skip
* disable flex attn for musicgen, not worth the effort
* more fixes and style
* flex attention test for dropout and encoder decoder that dont have main input names
* informer fixes
* the weirdest thing I've encountered yet...
* style
* remove empty tensor attempt, found core root in previous commits
* disable time series due to tests being very text centric on inputs
* add speech to text to be ignoring the other attns, also due to tests
* update docs
* remaining issues resolved ?
* update docs for current state --> nllb moe and pegasus x sdpa is questionable :D
* some models have not set the is_causal flag...
* change dtype in softmax tol old behaviour + some modular fixes
* I hate it but it is what it is
* fixes from main for bart
* forgot this one
* some model fixes
* style
* current status
* marian works now
* fixing some copies
* some copy fixes + time series x informer
* last models possibly and fixes on style/copies
* some post merge fixes
* more fixes
* make attention interface callable and move warnings there
* style lol
* add comment to "unsupported"
* remove callable interface and change interface warnings + some copies
* fix
* ternary is ugly af, make it simpler
* how did that happen
* fix flex attn test
* failing the test
* no more fallback! fixing copies next
* style + attn fixed
* fixing copies and mask creation
* wrong copy
* fixup tests and disable flex attn for now
* fixup last tests?
* docs(swin): Update Swin model card to standard format
* docs(swin): Refine link to Microsoft organization for Swin models
Apply suggestion from @stevhliu in PR #37628.
This change updates the link pointing to the official Microsoft Swin Transformer checkpoints on the Hugging Face Hub.
The link now directs users specifically to the Microsoft organization page, filtered for Swin models, providing a clearer and more canonical reference compared to the previous general search link.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(swin): Clarify padding description and link to backbone docs
Apply suggestion from @stevhliu in PR #37628.
This change introduces two improvements to the Swin model card:
1. Refines the wording describing how Swin handles input padding for better clarity.
2. Adds an internal documentation link to the general "backbones" page when discussing Swin's capability as a backbone model.
These updates enhance readability and improve navigation within the Transformers documentation.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(swin): Change Swin paper link to huggingface.co/papers as suggested
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update model card.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update quantization example.
* update example.
* update
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* _get_padding_size module
* do not patchify images when processing multi image
* modify llava onevision image processor fast
* tensor to list of tensors
* backward compat
* reuse pad_to_square in llave & some clarification
* add to doc
* fix: consider no image cases (text only or video)
* add integration test
* style & repo_consistency
* add seq_idx and fa kwargs
* update tests
* docs and grad ckpt support
* fmt
* better names
* test_raise_missing_padding_free_kwarg_errs
* + seq_idx in doc strings
* padding free training docs
* add link to pr plots
* raise err on attn_mask with padding free
* rm raising missing padding free err test
* BambaFlashAttentionKwargs
* run modular util for modular_granitemoehybrid.py
* mvp
* remove trust_remote_code
* generate_from_hub
* handle requirements; docs
* english
* doc PR suggestions
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* changed remote code path to generate/generate.py
* model repo has custom generate -> override base generate
* check for proper inheritance
* some doc updates (missing: tag-related docs)
* update docs to model repo
* nit
* nit
* nits
* Update src/transformers/dynamic_module_utils.py
* Apply suggestions from code review
* Update docs/source/en/generation_strategies.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* trust remote code is required
* use new import utils for requirements version parsing
* use org examples
* add tests
* Apply suggestions from code review
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
* ascii file structure; tag instructions on readme.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
* init vilt image processor fast
* Refactor image processor tests to use loop for all processors
* Add ViltImageProcessorFast with PyTorch-based optimized image processing
* Change made automatically by make fixup command
* Change made automatically by make fix-copies command
* Fix type hints in ViltImageProcessorFast for Python compatibility
* Define constants for image resizing based on COCO dataset aspect ratio
* Add missing property initializations to ViltImageProcessorFast
* Extract resize logic into dedicated method in ViltImageProcessorFast
* Extract padding logic into dedicated method
* Implement shape-based image grouping for optimized processing in Vilt
* Update test suite to verify ViltImageProcessorFast attributes
* Move variable declarations to _preprocess method parameters
* Remove unused parameters
* Rename _resize method to resize to override existing function
* Remove whitespace
* Remove unnecessary type check and conversion for stacked_images
* Remove redundant loop and apply padding directly to stacked images
* Refactor pad function to return images and mask as tuple instead of dict
* Add tests comparing padding masks in slow and fast implementations
* Update ViltImageProcessor tests to ensure compatibility between slow and fast implementations
* Replace add_start_docstrings with auto_docstring in ViltImageProcessorFast
* Move docstrings of custom args to ViltFastImageProcessorKwargs
* Use reorder_images function for both masks and images
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* accept arbitrary kwargs
* move user commands to a separate fn
* work with generation config files
* rm cmmt
* docs
* base generate flag doc section
* nits
* nits
* nits
* no <br>
* better basic args description
* initial design
* update all video processors
* add tests
* need to add qwen2-vl (not tested yet)
* add qwen2-vl in auto map
* fix copies
* isort
* resolve confilicts kinda
* nit:
* qwen2-vl is happy now
* qwen2-5 happy
* other models are happy
* fix copies
* fix tests
* add docs
* CI green now?
* add more tests
* even more changes + tests
* doc builder fail
* nit
* Update src/transformers/models/auto/processing_auto.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* small update
* imports correctly
* dump, otherwise this is getting unmanagebale T-T
* dump
* update
* another update
* update
* tests
* move
* modular
* docs
* test
* another update
* init
* remove flakiness in tests
* fixup
* clean up and remove commented lines
* docs
* skip this one!
* last fix after rebasing
* run fixup
* delete slow files
* remove unnecessary tests + clean up a bit
* small fixes
* fix tests
* more updates
* docs
* fix tests
* update
* style
* fix qwen2-5-vl
* fixup
* fixup
* unflatten batch when preparing
* dump, come back soon
* add docs and fix some tests
* how to guard this with new dummies?
* chat templates in qwen
* address some comments
* remove `Fast` suffix
* fixup
* oops should be imported from transforms
* typo in requires dummies
* new model added with video support
* fixup once more
* last fixup I hope
* revert image processor name + comments
* oh, this is why fetch test is failing
* fix tests
* fix more tests
* fixup
* add new models: internvl, smolvlm
* update docs
* imprt once
* fix failing tests
* do we need to guard it here again, why?
* new model was added, update it
* remove testcase from tester
* fix tests
* make style
* not related CI fail, lets' just fix here
* mark flaky for now, filas 15 out of 100
* style
* maybe we can do this way?
* don't download images in setup class
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>