* added fast image processor for ZoeDepth and expanded tests accordingly
* added fast image processor for ZoeDepth and expanded tests accordingly, hopefully fixed repo consistency issue too now
* final edits for zoedepth fast image processor
* final minor edit for zoedepth fast image processor
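For context, a minimal usage sketch of the fast-processor path these commits add; the `use_fast` flag is the standard opt-in, and the checkpoint id is the commonly used ZoeDepth one:
```python
# Hedged sketch: opting in to the fast (torchvision-backed) image processor.
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti", use_fast=True)
```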
Fix "RuntimeError: Expected all tensors to be on the same device,
but found at least two devices, cuda:0 and cpu" error running the
following roformer tests on GPUs (CUDA or XPU):
```
tests/models/roformer/test_modeling_roformer.py::RoFormerSinusoidalPositionalEmbeddingTest::test_basic
tests/models/roformer/test_modeling_roformer.py::RoFormerSelfAttentionRotaryPositionEmbeddingTest::test_apply_rotary_position_embeddings
```
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
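A minimal sketch of the usual pattern behind this failure, assuming a sinusoidal position table built on CPU while the hidden states already live on an accelerator; the helper is illustrative, not the exact RoFormer code:
```python
import math

import torch

def sinusoidal_table(num_positions, dim, device=None):
    # Build the sin/cos position table directly on the target device.
    position = torch.arange(num_positions, dtype=torch.float, device=device).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float, device=device) * (-math.log(10000.0) / dim)
    )
    table = torch.zeros(num_positions, dim, device=device)
    table[:, 0::2] = torch.sin(position * div_term)
    table[:, 1::2] = torch.cos(position * div_term)
    return table

device = "cuda" if torch.cuda.is_available() else "cpu"
hidden = torch.randn(2, 8, 16, device=device)
# Matching the input's device is what the fix amounts to: no cuda:0/cpu mix.
hidden = hidden + sinusoidal_table(hidden.shape[1], hidden.shape[2], device=hidden.device)
```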
* Fix multiple devices error on Janus
* Fix AttributeError on Janus BOI token
* Initialize lm first in Janus to get correct device map
* Added expectations for Janus test_model_generate_images
* Fixed JanusVisionEncoderLayer being split across devices
* Code formatting
* Adding modeling file
* Reverted changes out of scope for this PR
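The device-splitting fix above follows the standard pattern: listing a layer class in `_no_split_modules` tells the device-map planner to keep each instance whole on one device. A hedged sketch (class body illustrative, not the literal Janus source):
```python
from transformers import PreTrainedModel

class JanusPreTrainedModelSketch(PreTrainedModel):
    # Each JanusVisionEncoderLayer must stay whole on a single device.
    _no_split_modules = ["JanusVisionEncoderLayer"]
```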
* feat: add colqwen2 (wip)
* tests: fix test_attention_outputs
* tests: reduce hidden size to accelerate tests
* tests: fix `test_attention_outputs` 🥳
* fix: fix wrong parent class for `ColQwen2ForRetrievalOutput`
* fix: minor typing and style changes
* chore: run `make style`
* feat: remove redundant `max_num_visual_tokens` attribute in `ColQwen2Processor`
* tests: tweak comments
* style: apply ruff formatter
* feat: move default values for `visual_prompt_prefix` and `query_prefix`
* docs: update ColQwen2 model card
* docs: tweak model cards
* docs: add required example config checkpoint
* tests: update expected scores in integration test
* docs: tweak quickstart snippets
* fix: address PR comments
* tests: fix colqwen2 tests + tweak comment in colpali test
* tests: unskip useful tests
* fix: fix bug when `visual_prompt_prefix` or `query_prefix` is an empty string
* fix: fix ColPali outputs when `return_dict == False`
* fix: fix issue with PaliGemma output not being a dict
* docs: set default dtype to bfloat16 in quickstart snippets
* fix: fix error when `return_dict=False` in ColPali and ColQwen2
* tests: fix special tokens not being replaced in input_ids
* style: fix lint
* fix: `ColQwen2Processor`'s `padding_side` is now set from `processor_config.json`
* fix: remove unused `padding_side` in ColQwen2 model
* docs: update ColQwen2's model doc
* fix: fix hardcoded vlm backbone class in ColQwen2Config
* fix: remove `padding_side` from ColQwen2Processor as it should be fed from kwargs
* docs: fix typo in model docstring
* docs: add illuin mention in model docs
* fix: let `padding_side` be handled by `tokenizer_config.json`
* docs: add colpali reference url in colqwen2's model doc
* docs: add Hf mention in model docs
* docs: add late interaction mention in model docs
* docs: tweak colqwen2 model doc
* docs: update reference checkpoint for ColPali to v1.3
* docs: simplify quickstart snippets
* docs: remove redundant `.eval()`
* refactor: use `can_return_tuple` decorator for ColPali and ColQwen2
* docs: fix copyright date
* docs: add missing copyright in tests
* fix: raise error when `initializer_range` is not in config
* docs: remove redundant `.eval()` in colpali doc
* fix: fix `get_text_config` now that Qwen2VL has a proper `text_config` attribute
See https://github.com/huggingface/transformers/pull/37268 for details about changes in Qwen2VL's config.
* fix: add missing `initializer_range` attribute in `ColQwen2Config`
* fix: use `get_text_config` in `resize_token_embeddings`
* update colqwen2 with auto_docstring
* docs: fix wrong copyright year
* chore: remove `raise` as `initializer_range` has a default value in `ColQwen2Config`
* refactor: merge `inner_forward` into `forward`
* Refactor colqwen2 after refactoring of qwen2VL, use modular for modeling code
* protect torch import in modular to protect in processing
* tests: fix hf model path in ColQwen2 integration test
* docs: clarify `attn_implementation` and add comments
* docs: add fallback snippet for using offline PIL dummy images
* docs: temporarily revert attn_implementation to `None` while sdpa is not fixed
* docs: tweaks in colpali/colqwen2 quick start snippets
* fix: add missing flags to enable SDPA/Flex Attention in ColQwen2 model
* fix: add missing changes in modular file
* fix modeling tests
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
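Putting the series together, a hedged quickstart sketch for the new retrieval model; the checkpoint id is an assumption (check the model card for the exact one), and `score_retrieval` follows the ColPali-style late-interaction scoring mentioned above:
```python
import torch
from PIL import Image

from transformers import ColQwen2ForRetrieval, ColQwen2Processor

model_name = "vidore/colqwen2-v1.0-hf"  # assumed checkpoint id; check the model card
processor = ColQwen2Processor.from_pretrained(model_name)
model = ColQwen2ForRetrieval.from_pretrained(model_name)

images = [Image.new("RGB", (128, 128), "white")]  # offline PIL dummy image, as in the docs commits
queries = ["What is shown in the document?"]

with torch.no_grad():
    image_embeddings = model(**processor(images=images, return_tensors="pt")).embeddings
    query_embeddings = model(**processor(text=queries, return_tensors="pt")).embeddings

# Late-interaction (MaxSim) scoring between query tokens and image patches.
print(processor.score_retrieval(query_embeddings, image_embeddings))
```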
* updates
* fixup
* fix tests
* fix test
* fix
* let it be here for now, till monday
* two more fixes
* persimmon
* fixup
* fix
* fixup
* make sure fuyu runs now that LM has new attn API
* fixup + tests
* qwen vl uses new mask interface as well
* qwen image features format
* update
* remove image_sizes
* address comments
* i am dumb...
Support tensor-valued _extra_state values
TransformerEngine uses the PyTorch get/set_extra_state API to store FP8
layer config information as a bytes Tensor in the _extra_state entry of
the state dict. Recent changes to from_pretrained broke this
functionality, and loading a model that uses this API no longer works.
This PR fixes the save/load pretrained functions for extra-state entries
that use a PyTorch tensor, and adds a (currently x-failing) test for a
dictionary extra state.
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
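A self-contained sketch of the API in question; the hook names are the real `nn.Module` extra-state API, while the module and bytes payload are illustrative:
```python
import torch
import torch.nn as nn

class Fp8AwareLinear(nn.Linear):
    def get_extra_state(self):
        # Serialize layer config as a uint8 (bytes) tensor, as TransformerEngine does.
        return torch.frombuffer(bytearray(b"fp8-config"), dtype=torch.uint8)

    def set_extra_state(self, state):
        # Restore the config from the tensor stored under the `_extra_state` key.
        assert bytes(state.tolist()) == b"fp8-config"

layer = Fp8AwareLinear(4, 4)
state_dict = layer.state_dict()
assert "_extra_state" in state_dict  # the tensor payload rides along with the weights
layer.load_state_dict(state_dict)
```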
* start refactoring whisper
* revert for now
* first step
* carry over attn fixes
* check if this works
* whisper has an off-by-one somewhere - cutting mask in any interface
* make it based on interface
* remove some tests that were skipped but now work
* some fixes for whisper tests
* interface changes
* change the order of fix
* some attention adjustments for eager + TP
* fix scaling
* mask changes
* why does whisper contain those extra seq lens?
* fix from config for fa2 as input_ids is invalid
* fix another test
* another fix
* disable flex attn due to compile issues
* copies and refactor for qwen audio since it somewhat relies on whisper
* fix scaling and smaller things
* retrigger
* new new interface version + more fixups
* adjust qwen
* add comment
* forgot this one
* change copies as whisper cuts on the mask
* add guard
* add flex attention
* switch to new mask function + add skips for torchscript
* remove old api with cache position
* last changes?
* trigger ci
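For reference, the attention implementation is selected the usual way after the refactor; a hedged snippet (note the commits above disable flex attention for Whisper due to compile issues):
```python
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-tiny",
    attn_implementation="sdpa",  # "eager" and "flash_attention_2" are the other common options
)
```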
* Firstly: Better detection of when we're a custom class
* Trigger tests
* Let's break everything
* make fixup
* fix mistaken line doubling
* Let's try to get rid of it from config classes at least
* Fixup image processor
* no more circular import
* Let's go back to setting `_auto_class` again
* stash commit
* Revert the irrelevant changes until we figure out AutoConfig
* Change tests since we're breaking expectations
* make fixup
* do the same for all custom classes
* Cleanup for feature extractor tests
* Cleanup tokenization tests too
* typo
* Fix tokenizer tests
* make fixup
* fix image processor test
* make fixup
* Remove warning from register_for_auto_class
* Stop adding model info to auto map entirely
* Remove todo
* Remove the other todo
* Let's start slapping _auto_class on models why not
* Make sure the tests know what's up
* Completely remove add_model_info_to_*
* Start adding _auto_class to models
* Add a flaky decorator
* Add a flaky decorator and import
* stash commit
* More message cleanup
* make fixup
* fix indent
* Fix trust_remote_code prompts
* make fixup
* correct indentation
* Reincorporate changes into dynamic_module_utils
* Update call to trust_remote_code
* make fixup
* Fix video processors too
* Remove is_flaky additions
* make fixup
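The registration surface these commits touch is the public `register_for_auto_class` API; a minimal sketch with illustrative class names:
```python
from transformers import PretrainedConfig, PreTrainedModel

class MyConfig(PretrainedConfig):
    model_type = "my-model"

class MyModel(PreTrainedModel):
    config_class = MyConfig

# After this series, registration sets `_auto_class` on the class itself
# rather than writing repo-specific model info into an `auto_map` entry.
MyConfig.register_for_auto_class("AutoConfig")
MyModel.register_for_auto_class("AutoModel")
```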
* use device agnostic APIs in test cases
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add one more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* xpu now supports integer device id, aligning to CUDA behaviors
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update to use device_properties
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update comment
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix comments
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
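In the same spirit, a hedged sketch of the device-agnostic pattern (real tests resolve `torch_device` via `transformers.testing_utils`; this is simplified):
```python
import torch

if torch.cuda.is_available():
    torch_device = "cuda"
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    torch_device = "xpu"
else:
    torch_device = "cpu"

x = torch.ones(2, 2, device=torch_device)
if torch_device != "cpu":
    backend = getattr(torch, torch_device)  # torch.cuda and torch.xpu expose the same surface
    backend.synchronize()
    print(backend.get_device_properties(0))  # integer device ids now work on XPU too
```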
* stash commit
* Experiment 1: Try just Gemma
* Experiment 1: Just try Gemma
* make fixup
* Trigger tests
* stash commit
* Try adding Gemma3 as well
* make fixup
* Correct attrib names
* Correct pipeline model mapping
* Add in all_model_classes for Gemma1 again
* Move the pipeline model mapping around again
* make fixup
* Revert Gemma3 changes since it's a VLM
* Let's try Falcon
* Correct attributes
* Let's try just overriding get_config() for now
* Do Nemotron too
* And Llama!
* Do llama/persimmon
* Correctly skip tests
* Fix Persimmon
* Include Phimoe
* Fix Gemma2
* Set model_tester_class correctly
* Add GLM
* More models!
* models models models
* make fixup
* Add Qwen3 + Qwen3MoE
* Correct import
* make fixup
* Add the QuestionAnswering classes
* Move pipeline mapping to the right place
* Jetmoe too
* Stop RoPE testing models with no RoPE
* Fix up JetMOE a bit
* Can we just force pad_token_id all the time?
* make fixup
* fix starcoder2
* Move pipeline mapping
* Fix RoPE skipping
* Fix RecurrentGemma tests
* Fix Falcon tests
* Add MoE attributes
* Fix values for RoPE testing
* Make sure we set bos_token_id and eos_token_id in an appropriate range
* make fixup
* Fix GLM4
* Add mamba attributes
* Revert bits of JetMOE
* Re-add the JetMOE skips
* Update tests/causal_lm_tester.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add licence
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
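The converged pattern, sketched with a stand-in base (attribute names echo `tests/causal_lm_tester.py` in spirit; they are not the exact upstream definitions):
```python
from transformers import GemmaConfig

class CausalLMModelTester:  # stand-in for the shared base in tests/causal_lm_tester.py
    config_class = None

    def get_config(self):
        return self.config_class(
            vocab_size=99, hidden_size=32, num_hidden_layers=2,
            num_attention_heads=4, num_key_value_heads=4, intermediate_size=37,
        )

class GemmaModelTester(CausalLMModelTester):
    config_class = GemmaConfig

    def get_config(self):
        config = super().get_config()
        # Keep special token ids inside the tiny test vocab, per the commits above.
        config.pad_token_id = 0
        config.bos_token_id = 1
        config.eos_token_id = 2
        return config

print(GemmaModelTester().get_config())
```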
* Get parallel loader working. Include tests.
* Update the tests for parallel loading
* Rename env variables.
* Add docs for parallel model weight loading.
* Touch up parallel model loading docs.
* Touch up parallel model loading docs again.
* Edit comment in test_modeling_utils_parallel_loading.py
* Make sure HF_PARALLEL_LOADING_WORKERS is spelled correctly in modeling_utils.py
* Correct times for parallelized loading; previous times were for a "hot" filesystem
* Update parallel model loading so the spawn method is encapsulated. DRY up the code by leveraging get_submodule.
* Update docs on model loading parallelism so that details on setting the multiprocessing start method are removed, now that the package handles this step internally.
* Fix style on model loading parallelism changes.
* Merge latest version of master's modeling_utils.
* Removed unused variable.
* Fix argument packing for the parallel loader.
* Fix state dict being undefined in the parallel model loader.
* Rename variables used in parallel model loading for clarity. Use get_module_from_name().
* Switch to the use of threads for parallel model loading.
* Update docs for parallel loading.
* Remove the use of json.loads when evaluating HF_ENABLE_PARALLEL_LOADING. Prefer simple casting.
* Move parallelized shard loading into its own function.
* Remove use of is_true(). Favor checking env var true values for HF_ENABLE_PARALLEL_LOADING.
* Update copyright to 2025 in readme for parallel model loading.
* Remove garbage collection line in load_shard_file, implicit garbage collection already occurs.
* Run formatter on modeling_utils.py
* Apply style fixes
* Delete tests/utils/test_modeling_utils_parallel_loading.py
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
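A usage sketch under the env-var names from the commits above (exact semantics may vary by transformers version):
```python
import os

os.environ["HF_ENABLE_PARALLEL_LOADING"] = "true"  # opt in to thread-parallel shard loading
os.environ["HF_PARALLEL_LOADING_WORKERS"] = "8"    # loader thread count

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
```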