Yao Matrix
33f6c5a5c8
enable several cases on XPU ( #37516 )
...
* enable several cases on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* Update tests/test_modeling_common.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-16 11:01:04 +02:00
cyyever
1e6b546ea6
Use Python 3.9 syntax in tests ( #37343 )
...
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-08 14:12:08 +02:00
Fanli Lin
475664e2c6
[tests] remove cuda-only test marker in AwqConfigTest ( #37032 )
...
* enable on xpu
* add xpu support
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-31 11:53:02 +02:00
jiqing-feng
27361bd218
fix xpu tests ( #36656 )
...
* fix awq xpu tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix llava next video bnb tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-17 15:57:49 +01:00
Fanli Lin
c3700b0eee
[tests] enable autoawq tests on XPU ( #36327 )
...
add autoawq
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-25 13:38:09 +01:00
jiqing-feng
b916efcb3c
Enables CPU AWQ model with IPEX version. ( #33460 )
...
* enable cpu awq ipex linear
* add doc for cpu awq with ipex kernel
* add tests for cpu awq
* fix code style
* fix doc and tests
* Update docs/source/en/quantization/awq.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/autoawq/test_awq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix comments
* fix log
* fix log
* fix style
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-10-04 16:25:10 +02:00
amyeroberts
1de7dc7403
Skip tests properly ( #31308 )
...
* Skip tests properly
* [test_all]
* Add 'reason' as kwarg for skipTest
* [test_all] Fix up
* [test_all]
2024-06-26 21:59:08 +01:00
Marc Sun
ae87f9797b
FIX / TST: Fix expected results on Mistral AWQ test ( #30971 )
...
fix awq mistral test
2024-05-24 14:06:31 +02:00
Marc Sun
de6e0db184
[awq] replace scale when we have GELU ( #30074 )
...
* fix awq test
* style
* add log
* new fix
* style
* only modifying impacted model in the end
* rename function
2024-05-13 11:41:03 +02:00
Younes Belkada
080b700805
FIX / AWQ: Fix failing exllama test ( #30288 )
...
fix failing exllama test
2024-04-17 11:26:35 +02:00
Marc Sun
58a939c6b7
Fix quantization tests ( #29914 )
...
* revert back to torch 2.1.1
* run test
* switch to torch 2.2.1
* update dockerfile
* fix awq tests
* fix test
* run quanto tests
* update tests
* split quantization tests
* fix
* fix again
* final fix
* fix report artifact
* build docker again
* Revert "build docker again"
This reverts commit 399a5f9d93.
* debug
* revert
* style
* new notification system
* testing notification
* rebuild docker
* fix_prev_ci_results
* typo
* remove warning
* fix typo
* fix artifact name
* debug
* issue fixed
* debug again
* fix
* fix time
* test notif with failing test
* typo
* issues again
* final fix ?
* run all quantization tests again
* remove name to clear space
* revert modification done on workflow
* fix
* build docker
* build only quant docker
* fix quantization ci
* fix
* fix report
* better quantization_matrix
* add print
* revert to the basic one
2024-04-09 17:10:29 +02:00
Ilyas Moutawwakil
4fc708f98c
Exllama kernels support for AWQ models ( #28634 )
...
* added exllama kernels support for awq models
* doc
* style
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* refactor
* moved exllama post init to after device dispatching
* bump autoawq version
* added exllama test
* style
* configurable exllama kernels
* copy exllama_config from gptq
* moved exllama version check to post init
* moved to quantization dockerfile
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-03-05 03:22:48 +01:00
Poedator
d78e78a0e4
HfQuantizer class for quantization-related stuff in modeling_utils.py ( #26610 )
...
* squashed earlier commits for easier rebase
* rm rebase leftovers
* 4bit save enabled @quantizers
* TMP gptq test use exllama
* fix AwqConfigTest::test_wrong_backend for A100
* quantizers AWQ fixes
* _load_pretrained_model low_cpu_mem_usage branch
* quantizers style
* remove require_low_cpu_mem_usage attr
* rm dtype arg from process_model_before_weight_loading
* rm config_origin from Q-config
* rm inspect from q_config
* fixed docstrings in QuantizationConfigParser
* logger.warning fix
* mv is_loaded_in_4(8)bit to BnbHFQuantizer
* is_accelerate_available error msg fix in quantizer
* split is_model_trainable in bnb quantizer class
* rm llm_int8_skip_modules as separate var in Q
* Q rm todo
* fwd ref to HFQuantizer in type hint
* rm note re optimum.gptq.GPTQQuantizer
* quantization_config in __init__ simplified
* replaced NonImplemented with create_quantized_param
* rm load_in_4/8_bit deprecation warning
* QuantizationConfigParser refactoring
* awq-related minor changes
* awq-related changes
* awq config.modules_to_not_convert
* raise error if no q-method in q-config in args
* minor cleanup
* awq quantizer docstring
* combine common parts in bnb process_model_before_weight_loading
* revert test_gptq
* .process_model_ cleanup
* restore dict config warning
* removed typevars in quantizers.py
* cleanup post-rebase 16 jan
* QuantizationConfigParser classmethod refactor
* rework of handling of unexpected aux elements of bnb weights
* moved q-related stuff from save_pretrained to quantizers
* refactor v1
* more changes
* fix some tests
* remove it from main init
* ooops
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix awq issues
* fix
* fix
* fix
* fix
* fix
* fix
* add docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/hf_quantizer.md
* address comments
* fix
* fixup
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address final comment
* update
* Update src/transformers/quantizers/base.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/quantizers/auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* add kwargs update
* fixup
* add `optimum_quantizer` attribute
* oops
* rm unneeded file
* fix doctests
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-30 02:48:25 +01:00
Younes Belkada
266c67b06a
[Mixtral / Awq] Add mixtral fused modules for Awq ( #28240 )
...
* add mixtral fused modules
* add changes from modeling utils
* add test
* fix test + rope theta issue
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add tests
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-12 14:29:35 +01:00
Younes Belkada
07bdbebb48
[Awq] Add llava fused modules support ( #28239 )
...
* add llava + fused modules
* Update src/transformers/models/llava/modeling_llava.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-12 06:55:54 +01:00
Younes Belkada
fa21ead73d
[Awq] Enable the possibility to skip quantization for some target modules ( #27950 )
...
* v1
* add docstring
* add tests
* add awq 0.1.8
* oops
* fix test
2023-12-25 11:06:56 +01:00
Younes Belkada
fdb85be40f
Faster generation using AWQ + Fused modules ( #27411 )
...
* v1 fusing modules
* add fused mlp support
* up
* fix CI
* block save_pretrained
* fixup
* small fix
* add new condition
* add v1 docs
* add some comments
* style
* fix nit
* adapt from suggestion
* add check
* change arg names
* change variables name
* Update src/transformers/integrations/awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* style
* split up into 3 different private methods
* more conditions
* more checks
* add fused tests for custom models
* fix
* fix tests
* final update docs
* final fixes
* fix importlib metadata
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change it to `do_fuse`
* nit
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* few fixes
* revert
* fix test
* fix copies
* raise error if model is not quantized
* add test
* use quantization_config.config when fusing
* Update src/transformers/modeling_utils.py
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2023-12-05 12:14:45 +01:00
Younes Belkada
7b139023c3
[AWQ] Addresses TODO for awq tests ( #27467 )
...
addresses todo for awq tests
2023-11-13 18:18:41 +01:00
Younes Belkada
fd685cfd59
[Quantization] Add str to enum conversion for AWQ ( #27320 )
...
* add str to enum conversion
* fixup
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-10 13:45:00 +01:00
Younes Belkada
ae093eef01
[core / Quantization] AWQ integration ( #27045 )
...
* working v1
* oops
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fixup
* oops
* push
* more changes
* add docs
* some fixes
* fix copies
* add v1 doc
* added installation guide
* relax constraints
* revert
* attempt llm-awq
* oops
* oops
* fixup
* raise error when incorrect cuda compute capability
* nit
* add instructions for llm-awq
* fixup
* fix copies
* fixup and docs
* change
* few changes + add demo
* add v1 tests
* add autoawq in dockerfile
* finalize
* Update tests/quantization/autoawq/test_awq.py
* fix test
* fix
* fix issue
* Update src/transformers/integrations/awq.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add link to example script
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add more content
* add more details
* add link to quantization docs
* camel case + change backend class name
* change to string
* fixup
* raise errors if libs not installed
* change to `bits` and `group_size`
* nit
* nit
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* disable training
* address some comments and fix nits
* fix
* final nits and fix tests
* adapt to our new runners
* make fix-copies
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* move to top
* add conversion test
* final nit
* add more elaborated test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 09:06:31 +01:00