* enable misc test cases on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* tweak bamba ground truth on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* remove print
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* one more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* No more Tuple, List, Dict
* make fixup
* More style fixes
* Docstring fixes with regex replacement
* Trigger tests
* Redo fixes after rebase
* Fix copies
* [test all]
* update
* [test all]
* update
* [test all]
* make style after rebase
* Patch the hf_argparser test
* Patch the hf_argparser test
* style fixes
* style fixes
* style fixes
* Fix docstrings in Cohere test
* [test all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Unbreak optimum-executorch
* use static cache if has layer_types but no sliding_window
* revert view on kv_arange
---------
Co-authored-by: Guang Yang <guangyang@fb.com>
* remove it from all py files
* remove it from the doc
* remove it from examples
* style
* remove traces of _fast_init
* Update test_peft_integration.py
* CIs
Support tensor-valued _extra_state values
TransformerEngine uses the pytorch get/set_extra_state API to store FP8
layer config information as bytes Tensor in the _extra_state entry in
the state dict. With recent changes to from_pretrained, this
functionality has broken and loading a model that uses this API doesn't
appear to work. This PR fixes the save/load pretrained functions for
extra state entries that use a pytorch tensor, and adds a (currently
x-failing) test for a dictionary extra state.
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
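For context on the extra-state mechanism described above, here is a minimal sketch of a module that stores its configuration as a bytes tensor through PyTorch's `get_extra_state` / `set_extra_state` hooks. The class name `LayerWithExtraState` and the `fp8_config` contents are hypothetical and are not taken from TransformerEngine; only the two hook names and the `_extra_state` key come from PyTorch itself.

```python
import pickle

import torch
from torch import nn


class LayerWithExtraState(nn.Module):
    """Toy module (hypothetical) that keeps non-tensor config in the state dict."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # Hypothetical config; real FP8 metadata would look different.
        self.fp8_config = {"amax_history_len": 16}

    def get_extra_state(self):
        # Serialize the config to bytes and expose it as a uint8 tensor.
        raw = pickle.dumps(self.fp8_config)
        return torch.frombuffer(bytearray(raw), dtype=torch.uint8)

    def set_extra_state(self, state):
        # Reverse the serialization when the state dict is loaded.
        self.fp8_config = pickle.loads(bytes(state.tolist()))


source = LayerWithExtraState()
source.fp8_config = {"amax_history_len": 32}
state_dict = source.state_dict()  # contains the "_extra_state" entry

target = LayerWithExtraState()
target.load_state_dict(state_dict)  # calls set_extra_state with the tensor
```

Any save/load path that drops or mishandles the `_extra_state` key would leave `target.fp8_config` at its default, which is the kind of breakage the message above describes.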
* use device agnostic APIs in test cases
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add one more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* xpu now supports integer device id, aligning with CUDA behavior
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update to use device_properties
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update comment
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix comments
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Get parallel loader working. Include tests.
* Update the tests for parallel loading
* Rename env variables.
* Add docs for parallel model weight loading.
* Touch up parallel model loading docs.
* Touch up parallel model loading docs again.
* Edit comment in test_modeling_utils_parallel_loading.py
* Make sure HF_PARALLEL_LOADING_WORKERS is spelled correctly in modeling_utils.py
* Correct times for parallelized loading, previous times were for a "hot" filesystem
* Update parallel model loading so the spawn method is encapsulated. DRY up the code by leveraging get_submodule.
* Update docs on model loading parallelism so that details on setting the multiprocessing start method are removed, now that the package handles this step internally.
* Fix style on model loading parallelism changes.
* Merge latest version of master's modeling_utils.
* Removed unused variable.
* Fix argument packing for the parallel loader.
* Fix state dict being undefined in the parallel model loader.
* Rename variables used in parallel model loading for clarity. Use get_module_from_name().
* Switch to the use of threads for parallel model loading.
* Update docs for parallel loading.
* Remove the use of json.loads when evaluating HF_ENABLE_PARALLEL_LOADING. Prefer simple casting.
* Move parallelized shard loading into its own function.
* Remove use of is_true(). Favor checking env var true values for HF_ENABLE_PARALLEL_LOADING.
* Update copyright to 2025 in readme for parallel model loading.
* Remove garbage collection line in load_shard_file, implicit garbage collection already occurs.
* Run formatter on modeling_utils.py
* Apply style fixes
* Delete tests/utils/test_modeling_utils_parallel_loading.py
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
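The thread-based shard loading described in the commits above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in modeling_utils.py: only the environment variable names `HF_ENABLE_PARALLEL_LOADING` and `HF_PARALLEL_LOADING_WORKERS` come from the log, while the helper names, the set of accepted true values, and the default worker count are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumed set of accepted "true" strings; the real list may differ.
TRUE_VALUES = {"1", "true", "yes", "on"}


def parallel_loading_enabled(env=None):
    """Check the env var against known true values (no json.loads)."""
    env = os.environ if env is None else env
    return env.get("HF_ENABLE_PARALLEL_LOADING", "").strip().lower() in TRUE_VALUES


def worker_count(num_shards, env=None, default=8):
    """Never use more workers than shards; the default of 8 is an assumption."""
    env = os.environ if env is None else env
    requested = int(env.get("HF_PARALLEL_LOADING_WORKERS", default))
    return max(1, min(requested, num_shards))


def load_shards(shard_paths, load_one, env=None):
    """Load every shard with `load_one`, using threads when enabled."""
    if not shard_paths:
        return []
    if not parallel_loading_enabled(env):
        return [load_one(path) for path in shard_paths]
    workers = worker_count(len(shard_paths), env)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of shard_paths in its results.
        return list(pool.map(load_one, shard_paths))
```

Threads avoid the multiprocessing start-method setup that the earlier commits had to handle, which may be why the log switches to them.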
* use device agnostic APIs in tests
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* more
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add reset_peak_memory_stats API
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* update
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* initial design
* update all video processors
* add tests
* need to add qwen2-vl (not tested yet)
* add qwen2-vl in auto map
* fix copies
* isort
* resolve conflicts kinda
* nit:
* qwen2-vl is happy now
* qwen2-5 happy
* other models are happy
* fix copies
* fix tests
* add docs
* CI green now?
* add more tests
* even more changes + tests
* doc builder fail
* nit
* Update src/transformers/models/auto/processing_auto.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* small update
* imports correctly
* dump, otherwise this is getting unmanageable T-T
* dump
* update
* another update
* update
* tests
* move
* modular
* docs
* test
* another update
* init
* remove flakiness in tests
* fixup
* clean up and remove commented lines
* docs
* skip this one!
* last fix after rebasing
* run fixup
* delete slow files
* remove unnecessary tests + clean up a bit
* small fixes
* fix tests
* more updates
* docs
* fix tests
* update
* style
* fix qwen2-5-vl
* fixup
* fixup
* unflatten batch when preparing
* dump, come back soon
* add docs and fix some tests
* how to guard this with new dummies?
* chat templates in qwen
* address some comments
* remove `Fast` suffix
* fixup
* oops should be imported from transforms
* typo in requires dummies
* new model added with video support
* fixup once more
* last fixup I hope
* revert image processor name + comments
* oh, this is why fetch test is failing
* fix tests
* fix more tests
* fixup
* add new models: internvl, smolvlm
* update docs
* import once
* fix failing tests
* do we need to guard it here again, why?
* new model was added, update it
* remove testcase from tester
* fix tests
* make style
* not related CI fail, let's just fix here
* mark flaky for now, fails 15 out of 100
* style
* maybe we can do this way?
* don't download images in setup class
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* enable tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits and tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits_bf16 on xpu
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* switch to use Expectations
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* extract gen bits from architecture and use it
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* add cross reference
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Optimize to_py_obj for python-native numeric lists and scalars
* Fix bug that tuple is not converted to list
* Try np.array for more robust type checking
* Apply review and add tests for to_py_obj
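As a rough illustration of the `to_py_obj` changes listed above, the sketch below shows one way a fast path for Python-native numeric lists and a tuple-to-list conversion could look. The function name `to_py_obj_sketch` is hypothetical and the logic is an assumption about the general idea, not the library's implementation.

```python
def to_py_obj_sketch(obj):
    """Convert nested containers to plain Python lists and dicts (hypothetical)."""
    # Native scalars need no conversion.
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, dict):
        return {key: to_py_obj_sketch(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Fast path: a flat sequence of native numbers only needs list().
        if all(isinstance(item, (int, float)) for item in obj):
            return list(obj)
        return [to_py_obj_sketch(item) for item in obj]
    # Array-like objects (for example NumPy arrays) expose tolist().
    if hasattr(obj, "tolist"):
        return obj.tolist()
    return obj
```

Calling `list(obj)` on the fast path means a tuple of numbers comes back as a list, which is the behavior the tuple fix above refers to.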