transformers/docs/source/en/quantization
Benjamin Fineran 574a9e12bb
HFQuantizer implementation for compressed-tensors library (#31704)
* Add compressed-tensors HFQuantizer implementation

* flag serializable as False

* run

* revive lines deleted by ruff

* fixes to load+save from sparseml, edit config to quantization_config, and load back

* address satrat comment

* compressed_tensors to compressed-tensors and revert back is_serializable

* rename quant_method from sparseml to compressed-tensors

* tests

* edit tests

* clean up tests

* make style

* cleanup

* cleanup

* add test skip for when compressed tensors is not installed

* remove pydantic import + style

* delay torch import in test

* initial docs

* update main init for compressed tensors config

* make fix-copies

* docstring

* remove fill_docstring

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* review comments

* review comments

* comments - suppress warnings on state dict load, tests, fixes

* bug-fix - remove unnecessary call to apply quant lifecycle

* run_compressed compatability

* revert changes not needed for compression

* no longer need unexpected keys fn

* unexpected keys not needed either

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* add to_diff_dict

* update docs and expand testing

* Update _toctree.yml with compressed-tensors

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update doc

* add note about saving a loaded model

---------

Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Sara Adkins <sara.adkins65@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Dipika <dipikasikka1@gmail.com>
2024-09-25 14:31:38 +02:00
..
aqlm.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
awq.md docs: fix broken link (#31370) 2024-06-12 11:33:00 +01:00
bitsandbytes.md Enable BNB multi-backend support (#31098) 2024-09-24 03:40:56 -06:00
compressed_tensors.md HFQuantizer implementation for compressed-tensors library (#31704) 2024-09-25 14:31:38 +02:00
contribute.md Docs / Quantization: refactor quantization documentation (#30942) 2024-05-23 14:31:52 +02:00
eetq.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
fbgemm_fp8.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
gptq.md Docs / Quantization: refactor quantization documentation (#30942) 2024-05-23 14:31:52 +02:00
hqq.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
optimum.md Docs / Quantization: refactor quantization documentation (#30942) 2024-05-23 14:31:52 +02:00
overview.md HFQuantizer implementation for compressed-tensors library (#31704) 2024-09-25 14:31:38 +02:00
quanto.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
torchao.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00