Poedator
d78e78a0e4
HfQuantizer class for quantization-related stuff in modeling_utils.py (#26610)
* squashed earlier commits for easier rebase
* rm rebase leftovers
* 4bit save enabled @quantizers
* TMP gptq test use exllama
* fix AwqConfigTest::test_wrong_backend for A100
* quantizers AWQ fixes
* _load_pretrained_model low_cpu_mem_usage branch
* quantizers style
* remove require_low_cpu_mem_usage attr
* rm dtype arg from process_model_before_weight_loading
* rm config_origin from Q-config
* rm inspect from q_config
* fixed docstrings in QuantizationConfigParser
* logger.warning fix
* mv is_loaded_in_4(8)bit to BnbHFQuantizer
* is_accelerate_available error msg fix in quantizer
* split is_model_trainable in bnb quantizer class
* rm llm_int8_skip_modules as separate var in Q
* Q rm todo
* fwd ref to HFQuantizer in type hint
* rm note re optimum.gptq.GPTQQuantizer
* quantization_config in __init__ simplified
* replaced NonImplemented with create_quantized_param
* rm load_in_4/8_bit deprecation warning
* QuantizationConfigParser refactoring
* awq-related minor changes
* awq-related changes
* awq config.modules_to_not_convert
* raise error if no q-method in q-config in args
* minor cleanup
* awq quantizer docstring
* combine common parts in bnb process_model_before_weight_loading
* revert test_gptq
* .process_model_ cleanup
* restore dict config warning
* removed typevars in quantizers.py
* cleanup post-rebase 16 jan
* QuantizationConfigParser classmethod refactor
* rework of handling of unexpected aux elements of bnb weights
* moved q-related stuff from save_pretrained to quantizers
* refactor v1
* more changes
* fix some tests
* remove it from main init
* ooops
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix awq issues
* fix
* fix
* fix
* fix
* fix
* fix
* add docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/hf_quantizer.md
* address comments
* fix
* fixup
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address final comment
* update
* Update src/transformers/quantizers/base.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/quantizers/auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* add kwargs update
* fixup
* add `optimum_quantizer` attribute
* oops
* rm unneeded file
* fix doctests
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-30 02:48:25 +01:00