Driss Guessous
279000bb70
Name change AOPermod -> ModuleFqn ( #38456 )
...
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-06-03 15:43:31 +00:00
Yao Matrix
fb82a98717
enable large_gpu and torchao cases on XPU ( #38355 )
...
* cohere2 done
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* enable torchao cases on XPU
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* rename
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix comments
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-28 10:30:16 +02:00
Yao Matrix
a5a0c7b888
switch to device agnostic device calling for test cases ( #38247 )
...
* use device agnostic APIs in test cases
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add one more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* xpu now supports integer device id, aligning to CUDA behaviors
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update to use device_properties
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update comment
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix comments
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-26 10:18:53 +02:00
Mohamed Mekkouri
9a962dd9ed
Add tearDown method to Quark to solve OOM issues ( #38234 )
...
fix
2025-05-21 14:26:44 +02:00
Titus
f022bf9322
Remove trust_remote_code=True tests from bnb quantization tests (MPT now integrated) ( #38206 )
...
bnb quant tests: remove obsolete trust_remote_code test
The MPT model is now natively integrated in Transformers and no longer requires trust_remote_code=True. This removes the failing test_get_keys_to_not_convert_trust_remote_code and related usage, which depended on remote code and caused CI issues due to missing dependencies (e.g., triton_pre_mlir).
2025-05-20 11:43:11 +02:00
Yao Matrix
7f28da2850
clean autoawq cases on xpu ( #38163 )
...
* clean autoawq cases on xpu
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-16 13:56:43 +02:00
Jerry Zhang
44fa04ae8d
Include output embedding as well with include_embedding flag ( #37935 )
...
* Include output embedding as well with `include_embedding` flag
Summary:
att
Test Plan:
python tests/quantization/torchao_integration/test_torchao.py -k test_include_embedding
Reviewers:
Subscribers:
Tasks:
Tags:
* format
* rename include_embedding to include_input_output_embeddings
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-05-16 12:06:11 +02:00
Yao Matrix
34c1e29cdd
enable autoround cases on XPU ( #38167 )
...
* enable autoround cases on XPU
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-16 09:08:35 +00:00
Yao Matrix
9b5ce556aa
enable finegrained_fp8 and granite_speech cases on XPU ( #38036 )
...
* enable finegrained_fp8 cases on XPU
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
* change back to auto
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
* rename per comments
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-05-14 08:58:40 +00:00
jiqing-feng
d231f5a7d4
update bnb tests ( #38011 )
...
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-05-08 20:35:24 +00:00
Jerry Zhang
86777b5e2f
Support AOPerModuleConfig and include_embedding ( #37802 )
...
* Support `AOPerModuleConfig` and include_embedding
Summary:
This PR adds support for per-module configuration for torchao
Also added per module quantization examples:
1. Quantizing different layers with different quantization configs
2. Skip quantization for certain layers
Test Plan:
python tests/quantization/torchao_integration/test_torchao.py -k test_include_embedding
python tests/quantization/torchao_integration/test_torchao.py -k test_per_module_config_skip
Reviewers:
Subscribers:
Tasks:
Tags:
* format
* format
* include embedding remove input embedding from module not to convert
* more docs
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_torchao.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_torchao.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-30 20:16:29 +02:00
Mohamed Mekkouri
b262680af4
Add Bitnet model ( #37742 )
...
* Adding BitNet b1.58 Model
* Add testing code for BitNet
* Fix format issues
* Fix docstring format issues
* Fix docstring
* Fix docstring
* Fix: weight back to uint8
* Fix
* Fix format issues
* Remove copy comments
* Add model link to the docstring
* Fix: set tie_word_embeddings default to false
* Update
* Generate modeling file
* Change config name for automatically generating modeling file.
* Generate modeling file
* Fix class name
* Change testing branch
* Remove unused param
* Fix config docstring
* Add docstring for BitNetQuantConfig.
* Fix docstring
* Update docs/source/en/model_doc/bitnet.md
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update docs/source/en/model_doc/bitnet.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update bitnet config
* Update explanation between online and offline mode
* Remove space
* revert changes
* more revert
* spaces
* update
* fix-copies
* doc fix
* fix minor nits
* empty
* small nit
* empty
---------
Co-authored-by: Shuming Ma <shumingma@pku.edu.cn>
Co-authored-by: shumingma <shmingm@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-28 15:08:46 +02:00
co63oc
d5fa7d2d19
Fix typos in strings and comments ( #37799 )
2025-04-28 11:39:11 +01:00
Mohamed Mekkouri
38c406844e
Fixing quantization tests ( #37650 )
...
* fix
* style
* add capability check
2025-04-22 13:59:57 +02:00
Wenhua Cheng
b3492ff9f7
Add AutoRound quantization support ( #37393 )
...
* add auto-round support
* Update src/transformers/quantizers/auto.py
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
* fix style issue
Signed-off-by: wenhuach <wenhuach87@gmail.com>
* tiny change
* tiny change
* refine ut and doc
* revert unnecessary change
* tiny change
* try to fix style issue
* try to fix style issue
* try to fix style issue
* try to fix style issue
* try to fix style issue
* try to fix style issue
* try to fix style issue
* fix doc issue
* Update tests/quantization/autoround/test_auto_round.py
* fix comments
* Update tests/quantization/autoround/test_auto_round.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/autoround/test_auto_round.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* update doc
* Update src/transformers/quantizers/quantizer_auto_round.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* update
* update
* fix
* try to fix style issue
* Update src/transformers/quantizers/auto.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update docs/source/en/quantization/auto_round.md
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update docs/source/en/quantization/auto_round.md
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update docs/source/en/quantization/auto_round.md
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* update
* fix style issue
* update doc
* update doc
* Refine the doc
* refine doc
* revert one change
* set sym to True by default
* Enhance the unit test's robustness.
* update
* add torch dtype
* tiny change
* add awq convert test
* fix typo
* update
* fix packing format issue
* use one gpu
---------
Signed-off-by: wenhuach <wenhuach87@gmail.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Shen, Haihao <haihao.shen@intel.com>
2025-04-22 13:56:54 +02:00
Isotr0py
c69e23455d
Support loading Gemma3 QAT GGUF models ( #37649 )
...
* fix gemma3 qat gguf support
Signed-off-by: isotr0py <2037008807@qq.com>
* update test
Signed-off-by: isotr0py <2037008807@qq.com>
* make ruff happy
Signed-off-by: isotr0py <2037008807@qq.com>
---------
Signed-off-by: isotr0py <2037008807@qq.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-22 11:23:17 +02:00
Mohamed Mekkouri
bb2a44ad4b
Fix Quark quantization config ( #37578 )
...
fix
2025-04-18 07:23:39 +02:00
Mohamed Mekkouri
7752e7487c
Fixes hqq by following a new path for bias parameter in pre_quantized models ( #37530 )
...
* fix
* add test
2025-04-16 13:58:14 +02:00
Yao Matrix
33f6c5a5c8
enable several cases on XPU ( #37516 )
...
* enable several cases on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* Update tests/test_modeling_common.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-16 11:01:04 +02:00
Mohamed Mekkouri
d228f50acc
Fixing gated repo issues ( #37463 )
...
using unsloth model
2025-04-14 17:19:10 +02:00
Bowen Bao
6cef03ba66
[Regression] Fix Quark quantized model loading after refactorization ( #37407 )
2025-04-11 13:43:36 +02:00
Isotr0py
6daec12d0b
Add GGUF support to Gemma3 Text backbone ( #37424 )
...
* add gemma3 gguf support
Signed-off-by: Isotr0py <2037008807@qq.com>
* fix typo and add gguf limit
Signed-off-by: Isotr0py <2037008807@qq.com>
* fix a typo
Signed-off-by: Isotr0py <2037008807@qq.com>
* add vision conversion test
Signed-off-by: Isotr0py <2037008807@qq.com>
* fix typos
Signed-off-by: Isotr0py <2037008807@qq.com>
---------
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-10 17:15:43 +02:00
Mohamed Mekkouri
9c0c323e12
Fix require_read_token ( #37422 )
...
* nit
* fix
* fix
2025-04-10 17:01:40 +02:00
Mohamed Mekkouri
5ae9b2cac0
Quark Quantization gated repo ( #37412 )
...
* fix
* empty commit
* empty
* nit
* fix maybe ?
2025-04-10 14:57:15 +02:00
cyyever
1e6b546ea6
Use Python 3.9 syntax in tests ( #37343 )
...
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-08 14:12:08 +02:00
jiqing-feng
99f9f1042f
Fix torchao usage ( #37034 )
...
* fix load path
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix path
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Fix torchao usage
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert useless change
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert fp8 test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix fp8 test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix fp8 test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix torch dtype
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-07 14:50:48 +02:00
Rahul Tuli
ebe47ce3e9
Fix: Unexpected Keys, Improve run_compressed, Rename Test Folder ( #37077 )
2025-04-04 21:30:11 +02:00
Joao Gante
9a1c1fe7ed
[CI] green llama tests ( #37244 )
...
* green llama tests
* use cleanup instead
* better test comment; cleanup upgrade
* better test comment; cleanup upgrade
2025-04-03 14:15:53 +01:00
Jerry Zhang
a165458901
Add device workaround for int4 weight only quantization after API update ( #36980 )
...
* merge
* fix import
* format
* reformat
* reformat
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-02 12:42:22 +02:00
jiqing-feng
3a6ab46a0b
add gpt2 test on XPU ( #37028 )
...
* add gpt2 test on XPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* auto dtype has been fixed
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* convert model to train mode
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-01 11:09:29 +02:00
Fanli Lin
475664e2c6
[tests] remove cuda-only test marker in AwqConfigTest ( #37032 )
...
* enable on xpu
* add xpu support
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-31 11:53:02 +02:00
Mohamed Mekkouri
92429057d9
Skip FP8 linear tests For device capability < 9.0 ( #37008 )
...
* skip fp8 linear
* add capability check
* format
2025-03-27 12:38:37 +01:00
湛露先生
ebd2029483
Change GPUS to GPUs ( #36945 )
...
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 17:25:39 +01:00
omahs
cbf924b76c
Fix typos ( #36910 )
...
* fix typos
* fix typos
* fix typos
* fix typos
2025-03-24 14:08:29 +00:00
fxmarty-amd
1a374799ce
Support loading Quark quantized models in Transformers ( #36372 )
...
* add quark quantizer
* add quark doc
* clean up doc
* fix tests
* make style
* more style fixes
* cleanup imports
* cleaning
* precise install
* Update docs/source/en/quantization/quark.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/quark_integration/test_quark.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* remove import guard as suggested
* update copyright headers
* add quark to transformers-quantization-latest-gpu Dockerfile
* make tests pass on transformers main + quark==0.7
* add missing F8_E4M3 and F8_E5M2 keys from str_to_torch_dtype
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-20 15:40:51 +01:00
mobicham
3e8f0fbf44
Fix hqq skipped modules and dynamic quant ( #36821 )
...
* Fix hqq skip_modules and dynamic_quant
* fix skipped modules loading
* add dynamic/skip HqqConfig test
2025-03-20 15:31:49 +01:00
Driss Guessous
e8d960329e
Add option for ao base configs ( #36526 )
2025-03-19 14:59:47 +01:00
Mohamed Mekkouri
a861db01e5
Fix Device map for bitsandbytes tests ( #36800 )
...
fix
2025-03-19 11:57:13 +01:00
Marc Sun
3017536ebf
fix hqq due to recent modeling changes ( #36771 )
...
* fix-hqq
* style
* test
2025-03-18 12:20:27 +01:00
Afanti
19b9d8ae13
chore: fix typos in tests directory ( #36785 )
...
* chore: fix typos in tests directory
* chore: fix typos in tests directory
* chore: fix typos in tests directory
* chore: fix typos in tests directory
* chore: fix typos in tests directory
* chore: fix typos in tests directory
* chore: fix typos in tests directory
2025-03-18 10:31:13 +01:00
jiqing-feng
27361bd218
fix xpu tests ( #36656 )
...
* fix awq xpu tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix llava next video bnb tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-17 15:57:49 +01:00
Marc Sun
9e94801146
enable/disable compile for quants methods ( #36519 )
...
* disable compile for most quants methods
* fix
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Update tests/quantization/bnb/test_mixed_int8.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* changes from joao suggestions
---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-17 11:38:21 +01:00
Mohamed Mekkouri
47cc4da351
Changing the test model in Quanto kv cache ( #36670 )
...
changing model
2025-03-13 12:23:34 +01:00
Mohamed Mekkouri
0013ba61e5
Fix Failing GPTQ tests ( #36666 )
...
fix tests
2025-03-12 20:03:02 +01:00
Mohamed Mekkouri
a7fbab33ae
Fix Expected output for compressed-tensors tests ( #36425 )
...
fix
2025-02-26 21:17:24 +01:00
Fanli Lin
c3700b0eee
[tests] enable autoawq tests on XPU ( #36327 )
...
add autoawq
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-25 13:38:09 +01:00
Dmitry Rogozhkin
b4b9da6d9b
tests: revert change of torch_require_multi_gpu to be device agnostic ( #35721 )
...
* tests: revert change of torch_require_multi_gpu to be device agnostic
The 11c27dd33 modified `torch_require_multi_gpu()` to be device agnostic
instead of being CUDA specific. This broke some tests which are rightfully
CUDA specific, such as:
* `tests/trainer/test_trainer_distributed.py::TestTrainerDistributed`
In the current Transformers tests architecture `require_torch_multi_accelerator()`
should be used to mark multi-GPU tests agnostic to device.
This change addresses the issue introduced by 11c27dd33 and reverts
modification of `torch_require_multi_gpu()`.
Fixes: 11c27dd33 ("Enable BNB multi-backend support (#31098)")
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* fix bug: modification of frozen set
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-25 13:36:10 +01:00
jiqing-feng
9d6abf9778
enable torchao quantization on CPU ( #36146 )
...
* enable torchao quantization on CPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix int4
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable CPU torchao tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix cuda tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix cpu tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix style
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix cuda tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix torchao available
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix torchao available
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix torchao config cannot convert to json
* fix docs
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* rm to_dict to rebase
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* limited torchao version for CPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix skip
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Update src/transformers/testing_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix cpu test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-25 11:06:52 +01:00
Jerry Zhang
2af272c101
Add autoquant support for torchao quantizer ( #35503 )
...
* Add autoquant support for torchao quantizer
Summary:
att, also verified that autoquantized model can be saved and loaded:
save: https://gist.github.com/jerryzh168/01d367aaf44dbbbfd4068a4a10a00061
load: https://gist.github.com/jerryzh168/d5c6c401b2abdf18e0b6771341f1525c
Test Plan:
tested locally with above script
model uploaded to https://huggingface.co/jerryzh168/llama3-8b-autoquant
Reviewers:
Subscribers:
Tasks:
Tags:
* add test
* ruff fix
* ruff reformat
* add docs and min_sqnr support
* format
* format
* fix test
* update doc
* format
* remove disable_compile
* format
2025-02-24 15:54:16 +01:00
Rahul Tuli
884a8ea1f0
Improve model loading for compressed tensor models ( #36152 )
...
* Disable warnings for stacked compressors
* Introduce two new hooks in HfQuantizer lifecycle
to allow updates to missing and unexpected keys
* Update missing and unexpected keys
for stacked compressors
* Add tests
* Fix: run_compressed cases
* Fix: uncompressed cases
* Rename compressed_tensor folder to compressed_tensors
Move RunCompressedTest to the same file
Update tests to unittest
2025-02-24 13:47:21 +01:00