* add note on sigopt
* update
* Update docs/source/en/hpo_train.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix typo in LLaVa documentation
In exactly one section, LlavaImageProcessor was spelt wrongly as LLavaImageProcessor, which throws off copy-pasting the section.
* Fix LlavaImageProcessor url to make it valid (and copypaste-able)
Earlier, the URL contained the entire HF prefix. This commit removes that to ensure that the code block can be copied and run as is.
* mlm_probability in DataCollatorForLanguageModeling should be validated only when mlm is True (#38522)
* Change mlm_probability to Optional in DataCollatorForLanguageModeling (#38537)
---------
Co-authored-by: eak <eak@ivalua.com>
* added fast image processor for ZoeDepth and expanded tests accordingly
* added fast image processor for ZoeDepth and expanded tests accordingly, hopefully fixed repo consistency issue too now
* final edits for zoedept fast image processor
* final minor edit for zoedepth fast imate procesor
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Commit for new_gpt_model_card.
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Fix "RuntimeError: Expected all tensors to be on the same device,
but found at least two devices, cuda:0 and cpu" error running the
following roformer tests on GPUs (CUDA or XPU):
```
tests/models/roformer/test_modeling_roformer.py::RoFormerSinusoidalPositionalEmbeddingTest::test_basic
tests/models/roformer/test_modeling_roformer.py::RoFormerSelfAttentionRotaryPositionEmbeddingTest::test_apply_rotary_position_embeddings
```
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* Fix: resolve import order and duplicate import (ruff I001, F811)
* Format: clean up Dinov2 test file with ruff formatter
* Add _no_split_modules = ['Dinov2Layer'] to enable device_map='auto'
* Revert dinov2_with_registers _no_split_modules to original state
* Remove redundant device_map test as suggested
* Remove unused import after deleting test
* removed import torch and the redundant test function
* Update tests/models/dinov2/test_modeling_dinov2.py
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Fix multiple devices error on Janus
* Fix AttributeError on Janus BOI token
* Initialize lm first in Janus to get correct device map
* Added expectations for Janus test_model_generate_images
* Fixed JanusVisionEncoderLayer being split across devices
* Code formatting
* Adding modeling file
* Reverted changes out of scope for this PR
* feat: add colqwen2 (wip)
* tests: fix test_attention_outputs
* tests: reduce hidden size to accelerate tests
* tests: fix `test_attention_outputs` 🥳
* fix: fix wrong parent class for `ColQwen2ForRetrievalOutput`
* fix: minor typing and style changes
* chore: run `make style`
* feat: remove redundant `max_num_visual_tokens` attribute in `ColQwen2Processor`
* tests: tweak comments
* style: apply ruff formatter
* feat: move default values for `visual_prompt_prefix` and `query_prefix`
* docs: update ColQwen2 model card
* docs: tweak model cards
* docs: add required example config checkpoint
* tests: update expected scores in integration test
* docs: tweak quickstart snippets
* fix: address PR comments
* tests: fix colqwen2 tests + tweak comment in colpali test
* tests: unskip useful tests
* fix: fix bug when `visual_prompt_prefix` or `query_prefix` is an empty string
* fix: fix ColPali outputs when `return_dict == False`
* fix: fix issue with PaliGemma output not being a dict
* docs: set default dtype to bfloat16 in quickstart snippets
* fix: fix error when `return_dict=False` in ColPali and ColQwen2
* tests: fix special tokens not being replaced in input_ids
* style: fix lint
* fix: `ColQwen2Processor`'s `padding_side` is now set from `processor_config.json`
* fix: remove unused `padding_side` in ColQwen2 model
* docs: update ColQwen2's model doc
* fix: fix harcoded vlm backbone class in ColQwen2Config
* fix: remove `padding_side` from ColQwen2Processor as should fed from kwargs
* docs: fix typo in model docstring
* docs: add illuin mention in model docs
* fix: let `padding_size` be handled by `tokenizer_config.json`
* docs: add colpali reference url in colqwen2's model doc
* docs: add Hf mention in model docs
* docs: add late interaction mention in model docs
* docs: tweak colqwen2 model doc
* docs: update reference checkpoint for ColPali to v1.3
* docs: simplify quickstart snippets
* docs: remove redundant `.eval()`
* refactor: use `can_return_tuple` decorator for ColPali and ColQwen2
* docs: fix copyright date
* docs: add missing copyright in tests
* fix: raise error when `initializer_range` is not in config
* docs: remove redundant `.eval()` in colpali doc
* fix: fix `get_text_config` now that Qwen2VL has a proper `text_config` attribute
See https://github.com/huggingface/transformers/pull/37268 for details about changes in Qwen2VL's config.
* fix: add missing `initializer_range` attribute in `ColQwen2Config`
* fix: use `get_text_config` in `resize_token_embeddings`
* update colwen2 with auto_docstring
* docs: fix wrong copyright year
* chore: remove `raise` as `initializer_range` has a default value in `ColQwen2Config`
* refactor: merge `inner_forward` into `forward`
* Refactor colqwen2 after refactoring of qwen2VL, use modular for modeling code
* protect torch import in modular to protect in processing
* protect torch import in modular to protect in processing
* tests: fix hf model path in ColQwen2 integration test
* docs: clarify `attn_implementation` and add comments
* docs: add fallback snippet for using offline PIL dummy images
* docs: temporarily revert attn_implementation to `None` while sdpa is not fixed
* docs: tweaks in colpali/colqwen2 quick start snippets
* fix: add missing flags to enable SDPA/Flex Attention in ColQwen2 model
* fix: add missing changes in modular file
* fix modeling tests
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* Update Loss Functions to Accept Tensor num_items_in_batch
* Fix device mismatch by moving num_items_in_batch to loss device in fixed_cross_entropy
* fix the ruff check
* delete the unused if stat
* fix the type problem