transformers/docs/source

Latest commit 28de2f4de3 by Marc Sun:
[Quantization] Quanto quantizer (#29023)
* start integration
* fix
* add and debug tests
* update tests
* make pytorch serialization work
* compatible with device_map and offload
* fix tests
* make style
* add ref
* guard against safetensors
* add float8 and style
* fix is_serializable
* Fix shard_checkpoint compatibility with quanto
* more tests
* docs
* adjust memory
* better
* style
* pass tests
* Update src/transformers/modeling_utils.py
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add is_safe_serialization instead
* Update src/transformers/quantizers/quantizer_quanto.py
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add QbitsTensor tests
* fix tests
* simplify activation list
* Update docs/source/en/quantization.md
  Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
* better comment
* Update tests/quantization/quanto_integration/test_quanto.py
  Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
* Update tests/quantization/quanto_integration/test_quanto.py
  Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
* find and fix edge case
* Update docs/source/en/quantization.md
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* pass weights_only_kwarg instead
* fix shard_checkpoint loading
* simplify update_missing_keys
* Update tests/quantization/quanto_integration/test_quanto.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* recursion to get all tensors
* block serialization
* skip serialization tests
* fix
* change to cuda:0 for now
* fix regression
* update device_map
* fix doc
* add notebook
* update torch_dtype
* update doc
* typo
* typo
* remove comm

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
2024-03-15 11:51:29 -04:00
Name        Latest commit                                                             Last updated
de          Make torch xla available on GPU (#29334)                                  2024-03-11 14:07:16 +00:00
en          [Quantization] Quanto quantizer (#29023)                                  2024-03-15 11:51:29 -04:00
es          [docs] Spanish translate chat_templating.md & yml addition (#29559)       2024-03-13 09:28:11 -07:00
fr          Update all references to canonical models (#29001)                        2024-02-16 08:16:58 +01:00
hi          Update all references to canonical models (#29001)                        2024-02-16 08:16:58 +01:00
it          Update all references to canonical models (#29001)                        2024-02-16 08:16:58 +01:00
ja          [docs] Remove broken ChatML format link from chat_templating.md (#29643)  2024-03-13 13:04:51 -07:00
ko          Make torch xla available on GPU (#29334)                                  2024-03-11 14:07:16 +00:00
ms          [Docs] Add missing language options and fix broken links (#28852)         2024-02-06 12:01:01 -08:00
pt          Update all references to canonical models (#29001)                        2024-02-16 08:16:58 +01:00
te          Update all references to canonical models (#29001)                        2024-02-16 08:16:58 +01:00
tr          Translate index.md to Turkish (#27093)                                    2023-11-08 08:35:20 -05:00
zh          [docs] Remove broken ChatML format link from chat_templating.md (#29643)  2024-03-13 13:04:51 -07:00
_config.py  [Styling] stylify using ruff (#27144)                                     2023-11-16 17:43:19 +01:00