* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Commit for new_gpt_model_card.
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* commit for new canine model card.
* Update docs/source/en/model_doc/canine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/canine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/canine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/canine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/canine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/canine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* implemented suggestion by @stevhliu.
* Update canine.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Created model card for xlm-roberta-xl
* Update XLM-RoBERTa-XL model card with improved descriptions and usage examples
* Minor option labeling fix
* Added MaskedLM version of XLM RoBERTa XL to model card
* Added quantization example for XLM RoBERTa XL model card
* minor fixes to xlm roberta xl model card
* Minor fixes to mask format in xlm roberta xl model card
* Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout
* Added CLI command example and quantization example for XLM RoBERTa model card.
* Minor change to transformers CLI and quantization example for XLM roberta model card
* Created model card for XLM model
* Revised model card structure and content of XLM model
* Update XLM model documentation with improved examples and code snippets for predicting <mask> tokens using Pipeline and AutoModel.
* add note on sigopt
* update
* Update docs/source/en/hpo_train.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix typo in LLaVa documentation
In exactly one section, LlavaImageProcessor was spelt wrongly as LLavaImageProcessor, which throws off copy-pasting the section.
* Fix LlavaImageProcessor url to make it valid (and copypaste-able)
Earlier, the URL contained the entire HF prefix. This commit removes that to ensure that the code block can be copied and run as is.
* added fast image processor for ZoeDepth and expanded tests accordingly
* added fast image processor for ZoeDepth and expanded tests accordingly, hopefully fixed repo consistency issue too now
* final edits for zoedept fast image processor
* final minor edit for zoedepth fast imate procesor
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Commit for new_gpt_model_card.
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* feat: add colqwen2 (wip)
* tests: fix test_attention_outputs
* tests: reduce hidden size to accelerate tests
* tests: fix `test_attention_outputs` 🥳
* fix: fix wrong parent class for `ColQwen2ForRetrievalOutput`
* fix: minor typing and style changes
* chore: run `make style`
* feat: remove redundant `max_num_visual_tokens` attribute in `ColQwen2Processor`
* tests: tweak comments
* style: apply ruff formatter
* feat: move default values for `visual_prompt_prefix` and `query_prefix`
* docs: update ColQwen2 model card
* docs: tweak model cards
* docs: add required example config checkpoint
* tests: update expected scores in integration test
* docs: tweak quickstart snippets
* fix: address PR comments
* tests: fix colqwen2 tests + tweak comment in colpali test
* tests: unskip useful tests
* fix: fix bug when `visual_prompt_prefix` or `query_prefix` is an empty string
* fix: fix ColPali outputs when `return_dict == False`
* fix: fix issue with PaliGemma output not being a dict
* docs: set default dtype to bfloat16 in quickstart snippets
* fix: fix error when `return_dict=False` in ColPali and ColQwen2
* tests: fix special tokens not being replaced in input_ids
* style: fix lint
* fix: `ColQwen2Processor`'s `padding_side` is now set from `processor_config.json`
* fix: remove unused `padding_side` in ColQwen2 model
* docs: update ColQwen2's model doc
* fix: fix harcoded vlm backbone class in ColQwen2Config
* fix: remove `padding_side` from ColQwen2Processor as should fed from kwargs
* docs: fix typo in model docstring
* docs: add illuin mention in model docs
* fix: let `padding_size` be handled by `tokenizer_config.json`
* docs: add colpali reference url in colqwen2's model doc
* docs: add Hf mention in model docs
* docs: add late interaction mention in model docs
* docs: tweak colqwen2 model doc
* docs: update reference checkpoint for ColPali to v1.3
* docs: simplify quickstart snippets
* docs: remove redundant `.eval()`
* refactor: use `can_return_tuple` decorator for ColPali and ColQwen2
* docs: fix copyright date
* docs: add missing copyright in tests
* fix: raise error when `initializer_range` is not in config
* docs: remove redundant `.eval()` in colpali doc
* fix: fix `get_text_config` now that Qwen2VL has a proper `text_config` attribute
See https://github.com/huggingface/transformers/pull/37268 for details about changes in Qwen2VL's config.
* fix: add missing `initializer_range` attribute in `ColQwen2Config`
* fix: use `get_text_config` in `resize_token_embeddings`
* update colwen2 with auto_docstring
* docs: fix wrong copyright year
* chore: remove `raise` as `initializer_range` has a default value in `ColQwen2Config`
* refactor: merge `inner_forward` into `forward`
* Refactor colqwen2 after refactoring of qwen2VL, use modular for modeling code
* protect torch import in modular to protect in processing
* protect torch import in modular to protect in processing
* tests: fix hf model path in ColQwen2 integration test
* docs: clarify `attn_implementation` and add comments
* docs: add fallback snippet for using offline PIL dummy images
* docs: temporarily revert attn_implementation to `None` while sdpa is not fixed
* docs: tweaks in colpali/colqwen2 quick start snippets
* fix: add missing flags to enable SDPA/Flex Attention in ColQwen2 model
* fix: add missing changes in modular file
* fix modeling tests
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* squash commits
* rename gpu
* rename accelerator
* change _toctree.yml
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: sdp <sdp@a4bf01943ff7.jf.intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update vit_mae.md
* badge float:right
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update model_doc/vit_mae.md
* fix
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated the Model docs - for the ALIGN model
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated align.md
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update align.md
* fix
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated OLMo2 model card
* added command line
* Add suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Added suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Indented code block as per suggestions
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update granite.md
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update granite.md
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* minor fixes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Modified BART documentation wrt to issue #36979.
* Modified BART documentation wrt to issue #36979.
* fixed a typo.
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* blank commit.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated BigBird Model card as per #36979.
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update model page.
* update model page.
* Update docs/source/en/model_doc/mamba2.md
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* update the model page.
* update.
* Apply suggestions from code review
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Apply the suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add an quantization example and update the toctree.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* remove the additional comma
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update roformer model card
* fix example purpose description
* fix model description according to the comments
* revert changes for autodoc
* remove unneeded tags
* fix review issues
* fix hfoption
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(swinv2): Update SwinV2 model card to new standard format
* docs(swinv2): Apply review suggestions
Incorporates feedback from @stevhliu to:
- Enhance the introductory paragraph with more details about scaling and SimMIM.
- Generalize the tip from "image classification tasks" to "vision tasks".
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update BioGPT model card
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* correction for CPU fallback
* added quantization code and method
* fixed transformers-cli call
---------
Co-authored-by: Aguedo <aguedo@fakeemail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Get parallel loader working. Include tests.
* Update the tests for parallel loading
* Rename env variables.
* Add docs for parallel model weight loading.
* Touch up parallel model loading docs.
* Touch up parallel model loading docs again.
* Edit comment in test_modeling_utils_parallel_loading.py
* Make sure HF_PARALLEL_LOADING_WORKERS is spelled correctly in modeling_utils.py
* Correct times for parallelized loading, previous times were for a "hot" filesystem
* Update parallel model loading so the spawn method is encapsulated. DRY up the code by leveraging get_submodule.
* Update docs on model loading parallelism so that details on setting the multiprocessing start method are removed, now that the package handles this step internally.
* Fix style on model loading parallelism changes.
* Merge latest version of master's modeling_utils.
* Removed unused variable.
* Fix argument packing for the parallel loader.
* Fix state dict being undefined in the parallel model loader.
* Rename variables used in parallel model loading for clarity. Use get_module_from_name().
* Switch to the use of threads for parallel model loading.
* Update docs for parallel loading.
* Remove the use of json.loads when evaluating HF_ENABLE_PARALLEL_LOADING. Prefer simple casting.
* Move parallelized shard loading into its own function.
* Remove use of is_true(). Favor checking env var true values for HF_ENABLE_PARALLEL_LOADING.
* Update copyright to 2025 in readme for paralell model loading.
* Remove garbage collection line in load_shard_file, implicit garbage collection already occurs.
* Run formatter on modeling_utils.py
* Apply style fixes
* Delete tests/utils/test_modeling_utils_parallel_loading.py
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* starting attn refactor for encoder decoder models via bart (eager + sdpa)
* flash attention works, remove unnecessary code
* flex attention support for bart!, gotta check if the renaming is not too aggressive
* some comments
* skip flex grad test for standalone as done with the other test
* revert flex attn rename (for now), sdpa simplify, and todos
* more todos
* refactor mask creation for reuse
* modular attempt at biogpt
* first batch of other models
* fix attn dropout
* fix autoformer copies
* hubert
* another batch of models
* copies/style + last round of bart models --> whisper next?
* remove unnecessary _reshape function and remove copy to whisper
* add skip for decoder-only models out of enc-dec (same as in bart)
* bring back licences
* remove comment, added to pr read instead
* mostly docs
* disable sew flex attn as it's unclear attn mask for now
* oops
* test fixes for enc-dec
* torch fx fixes + try at flex attn
* skip on mbart
* some more fixes
* musicgen skip / delete old attn class logic + sdpa compose compile skip
* disable flex attn for musicgen, not worth the effort
* more fixes and style
* flex attention test for dropout and encoder decoder that dont have main input names
* informer fixes
* the weirdest thing I've encountered yet...
* style
* remove empty tensor attempt, found core root in previous commits
* disable time series due to tests being very text centric on inputs
* add speech to text to be ignoring the other attns, also due to tests
* update docs
* remaining issues resolved ?
* update docs for current state --> nllb moe and pegasus x sdpa is questionable :D
* some models have not set the is_causal flag...
* change dtype in softmax tol old behaviour + some modular fixes
* I hate it but it is what it is
* fixes from main for bart
* forgot this one
* some model fixes
* style
* current status
* marian works now
* fixing some copies
* some copy fixes + time series x informer
* last models possibly and fixes on style/copies
* some post merge fixes
* more fixes
* make attention interface callable and move warnings there
* style lol
* add comment to "unsupported"
* remove callable interface and change interface warnings + some copies
* fix
* ternary is ugly af, make it simpler
* how did that happen
* fix flex attn test
* failing the test
* no more fallback! fixing copies next
* style + attn fixed
* fixing copies and mask creation
* wrong copy
* fixup tests and disable flex attn for now
* fixup last tests?
* docs(swin): Update Swin model card to standard format
* docs(swin): Refine link to Microsoft organization for Swin models
Apply suggestion from @stevhliu in PR #37628.
This change updates the link pointing to the official Microsoft Swin Transformer checkpoints on the Hugging Face Hub.
The link now directs users specifically to the Microsoft organization page, filtered for Swin models, providing a clearer and more canonical reference compared to the previous general search link.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(swin): Clarify padding description and link to backbone docs
Apply suggestion from @stevhliu in PR #37628.
This change introduces two improvements to the Swin model card:
1. Refines the wording describing how Swin handles input padding for better clarity.
2. Adds an internal documentation link to the general "backbones" page when discussing Swin's capability as a backbone model.
These updates enhance readability and improve navigation within the Transformers documentation.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(swin): Change Swin paper link to huggingface.co/papers as suggested
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update model card.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update quantization example.
* update example.
* update
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* _get_padding_size module
* do not patchify images when processing multi image
* modify llava onevision image processor fast
* tensor to list of tensors
* backward compat
* reuse pad_to_square in llave & some clarification
* add to doc
* fix: consider no image cases (text only or video)
* add integration test
* style & repo_consistency
* add seq_idx and fa kwargs
* update tests
* docs and grad ckpt support
* fmt
* better names
* test_raise_missing_padding_free_kwarg_errs
* + seq_idx in doc strings
* padding free training docs
* add link to pr plots
* raise err on attn_mask with padding free
* rm raising missing padding free err test
* BambaFlashAttentionKwargs
* run modular util for modular_granitemoehybrid.py
* mvp
* remove trust_remote_code
* generate_from_hub
* handle requirements; docs
* english
* doc PR suggestions
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* changed remote code path to generate/generate.py
* model repo has custom generate -> override base generate
* check for proper inheritance
* some doc updates (missing: tag-related docs)
* update docs to model repo
* nit
* nit
* nits
* Update src/transformers/dynamic_module_utils.py
* Apply suggestions from code review
* Update docs/source/en/generation_strategies.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* trust remote code is required
* use new import utils for requirements version parsing
* use org examples
* add tests
* Apply suggestions from code review
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
* ascii file structure; tag instructions on readme.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
* init vilt image processor fast
* Refactor image processor tests to use loop for all processors
* Add ViltImageProcessorFast with PyTorch-based optimized image processing
* Change made automatically by make fixup command
* Change made automatically by make fix-copies command
* Fix type hints in ViltImageProcessorFast for Python compatibility
* Define constants for image resizing based on COCO dataset aspect ratio
* Add missing property initializations to ViltImageProcessorFast
* Extract resize logic into dedicated method in ViltImageProcessorFast
* Extract padding logic into dedicated method
* Implement shape-based image grouping for optimized processing in Vilt
* Update test suite to verify ViltImageProcessorFast attributes
* Move variable declarations to _preprocess method parameters
* Remove unused parameters
* Rename _resize method to resize to override existing function
* Remove whitespace
* Remove unnecessary type check and conversion for stacked_images
* Remove redundant loop and apply padding directly to stacked images
* Refactor pad function to return images and mask as tuple instead of dict
* Add tests comparing padding masks in slow and fast implementations
* Update ViltImageProcessor tests to ensure compatibility between slow and fast implementations
* Replace add_start_docstrings with auto_docstring in ViltImageProcessorFast
* Move docstrings of custom args to ViltFastImageProcessorKwargs
* Use reorder_images function for both masks and images
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* accept arbitrary kwargs
* move user commands to a separate fn
* work with generation config files
* rm cmmt
* docs
* base generate flag doc section
* nits
* nits
* nits
* no <br>
* better basic args description
* initial design
* update all video processors
* add tests
* need to add qwen2-vl (not tested yet)
* add qwen2-vl in auto map
* fix copies
* isort
* resolve confilicts kinda
* nit:
* qwen2-vl is happy now
* qwen2-5 happy
* other models are happy
* fix copies
* fix tests
* add docs
* CI green now?
* add more tests
* even more changes + tests
* doc builder fail
* nit
* Update src/transformers/models/auto/processing_auto.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* small update
* imports correctly
* dump, otherwise this is getting unmanagebale T-T
* dump
* update
* another update
* update
* tests
* move
* modular
* docs
* test
* another update
* init
* remove flakiness in tests
* fixup
* clean up and remove commented lines
* docs
* skip this one!
* last fix after rebasing
* run fixup
* delete slow files
* remove unnecessary tests + clean up a bit
* small fixes
* fix tests
* more updates
* docs
* fix tests
* update
* style
* fix qwen2-5-vl
* fixup
* fixup
* unflatten batch when preparing
* dump, come back soon
* add docs and fix some tests
* how to guard this with new dummies?
* chat templates in qwen
* address some comments
* remove `Fast` suffix
* fixup
* oops should be imported from transforms
* typo in requires dummies
* new model added with video support
* fixup once more
* last fixup I hope
* revert image processor name + comments
* oh, this is why fetch test is failing
* fix tests
* fix more tests
* fixup
* add new models: internvl, smolvlm
* update docs
* imprt once
* fix failing tests
* do we need to guard it here again, why?
* new model was added, update it
* remove testcase from tester
* fix tests
* make style
* not related CI fail, lets' just fix here
* mark flaky for now, filas 15 out of 100
* style
* maybe we can do this way?
* don't download images in setup class
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add fast image processor support for Swin2SR
* Add Swin2SR tests of fast image processing
* Update docs and remove unnecessary test func
* Fix docstring formatting
* Skip fast vs slow processing test
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* i guessreverted all CdGen classes
* style
* llava onevision
* fix copies
* fix some tests
* some more tests
* dump
* skip these
* nevermind, i am dumb
* revert fix not needed
* fixup
* fixup
* another fixup
* more fixup to make ci finally happy
* fixup after rebasing
* fix qwen tests
* add internVL + typos here and there
* image token index -> id
* style
* fix init weights
* revert blip-2 not supported
* address comments
* fix copies
* revert blip2 test file as well
* as discussed internally, revert back CdGen models
* fix some tests
* fix more tests for compile
* CI red
* fix copies
* enumerate explicitly allowed models
* address comments
* fix tests
* fixup
* style again
* add tests for new model class
* another fixup ( x _ x )
* [fixup] unused attributes can be removed post-deprecation
* aligning for vllm
* using input shape rather than attn outputs
* remove demo
* revert Conv1D
* style
* style
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix copies
* Apply suggestions from code review
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* adding docs about vllm
* chore: style
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Support `AOPerModuleConfig` and include_embedding
Summary:
This PR adds support per module configuration for torchao
Also added per module quantization examples:
1. Quantizing different layers with different quantization configs
2. Skip quantization for certain layers
Test Plan:
python tests/quantization/torchao_integration/test_torchao.py -k test_include_embedding
python tests/quantization/torchao_integration/test_torchao.py -k test_per_module_config_skip
Reviewers:
Subscribers:
Tasks:
Tags:
* format
* format
* inlcude embedding remove input embedding from module not to convert
* more docs
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_torchao.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_torchao.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Enhance documentation to explain chat-based few-shot prompting
Updates the documentation on few-shot prompting to illustrate how to structure examples using the chat-based format for instruction-tuned models.
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix typos
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* copy the last changes from broken PR
* small format
* some fixes and refactoring after review
* format
* add config attr for loss
* some fixes and refactoring
* fix copies
* fix style
* add test for d-fine resnet
* fix decoder layer prop
* fix dummies
* format init
* remove extra print
* refactor modeling, move resnet into separate folder
* fix resnet config
* change resnet on hgnet_v2, add clamp into decoder
* fix init
* fix config doc
* fix init
* fix dummies
* fix config docs
* fix hgnet_v2 config typo
* format modular
* add image classification for hgnet, some refactoring
* format tests
* fix dummies
* fix init
* fix style
* fix init for hgnet v2
* fix index.md, add init rnage for hgnet
* fix conversion
* add missing attr to encoder
* add loss for d-fine, add additional output for rt-detr decoder
* tests and docs fixes
* fix rt_detr v2 conversion
* some fixes for loos and decoder output
* some fixes for loss
* small fix for converted modeling
* add n model config, some todo comments for modular
* convert script adjustments and fixes, small refact
* remove extra output for rt_detr
* make some outputs optionsl, fix conversion
* some posr merge fixes
* small fix
* last field fix
* fix not split for hgnet_v2
* disable parallelism test for hgnet_v2 image classification
* skip multi gpu for d-fine
* adjust after merge init
* remove extra comment
* fix repo name references
* small fixes for tests
* Fix checkpoint path
* Fix consistency
* Fixing docs
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* added fast image processor for VitMatte including updated and new tests, fixed a bug in the slow image processor that processed images incorrectly for input format ChannelDimension.FIRST in which case the trimaps were not added in the correct dimension, this bug was also reflected in the tests through incorretly shaped trimaps being passed
* final edits for fast vitmatte image processor and tests
* final edits for fast vitmatte image processor and tests
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* added the configuartion for sam_hq
* added the modeelling for sam_hq
* added the sam hq mask decoder with hq features
* added the code for the samhq
* added the code for the samhq
* added the code for the samhq
* Delete src/transformers/models/sam_hq/modelling_sam_hq.py
* added the code for the samhq
* added the code for the samhq
* added the chnages for the modeelling
* added the code for sam hq for image processing
* added code for the sam hq model
* added the required changes
* added the changes
* added the key mappings for the sam hq
* adding the working code of samhq
* added the required files
* adding the pt object
* added the push to hub account
* added the args for the sam maks decoder
* added the args for the sam hq vision config
* aded the some more documentation
* removed the unecessary spaces
* all required chnages
* removed the image processor
* added the required file
* added the changes for the checkcopies
* added the code for modular file
* added the changes for the __init file
* added the code for the interm embeds
* added the code for sam hq
* added the changes for modular file
* added the test file
* added the changes required
* added the changes required
* added the code for the
* added the cl errors
* added the changes
* added the required changes
* added the some code
* added the code for the removing image processor
* added the test dimensins
* added the code for the removing extra used variables
* added the code for modeluar file hf_mlp for a better name
* removed abbrevaation in core functionality
* removed abbrevaation in core functionality
* .contiguous() method is often used to ensure that the tensor is stored in a contiguous block of memory
* added the code which is after make fixup
* added some test for the intermediate embeddings test
* added the code for the torch support in sam hq
* added the code for the updated modular file
* added the changes for documentations as mentioned
* removed the heading
* add the changes for the code
* first mentioned issue resolved
* added the changes code to processor
* added the easy loading to init file
* added the changes to code
* added the code to changes
* added the code to work
* added the code for sam hq
* added the code for sam hq
* added the code for the point pad value
* added the small test for the image embeddings and intermediate embedding
* added the code
* added the code
* added the code for the tests
* added the code
* added ythe code for the processor file
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code for tests and some checks
* added some code
* added the code
* added the code
* added some code
* added some code
* added the changes for required
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code
* added some changes
* added some changes
* removed spaces and quality checks
* added some code
* added some code
* added some code
* added code quality checks
* added the checks for quality checks
* addded some code which fixes test_inference_mask_generation_no_point
* added code for the test_inference_mask_generation_one_point_one_bb
* added code for the test_inference_mask_generation_one_point_one_bb_zero
* added code for the test_inference_mask_generation_one_box
* added some code in modelling for testing
* added some code which sort maks with high score
* added some code
* added some code
* added some code for the move KEYS_TO_MODIFY_MAPPING
* added some code for the unsqueeze removal
* added some code for the unsqueeze removal
* added some code
* added some code
* add some code
* added some code
* added some code
* added some testign values changed
* added changes to code in sam hq for readbility purpose
* added pre commit checks
* added the fix samvisionmodel for compatibilty
* added the changes made on sam by cyyever
* fixed the tests for samhq
* added some the code
* added some code related to init file issue during merge conflicts
* remobved the merge conflicts
* added changes mentioned by aruther and mobap
* added changes mentioned by aruther and mobap
* solving quality checks
* added the changes for input clearly
* added the changes
* added changes in mask generation file rgearding model inputs and sam hq quargs in processor file
* added changes in processor file
* added the Setup -> setupclass conversion
* added the code mentioned for processor
* added changes for the code
* added some code
* added some code
* added some code
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* update siglip2 model card
* Update docs/source/en/model_doc/siglip2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* address comments
* separate naflex and fixres variant
* Update docs/source/en/model_doc/siglip2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>