Marc Sun
28de2f4de3
[Quantization] Quanto quantizer (#29023)
...
* start integration
* fix
* add and debug tests
* update tests
* make pytorch serialization work
* compatible with device_map and offload
* fix tests
* make style
* add ref
* guard against safetensors
* add float8 and style
* fix is_serializable
* Fix shard_checkpoint compatibility with quanto
* more tests
* docs
* adjust memory
* better
* style
* pass tests
* Update src/transformers/modeling_utils.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add is_safe_serialization instead
* Update src/transformers/quantizers/quantizer_quanto.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add QbitsTensor tests
* fix tests
* simplify activation list
* Update docs/source/en/quantization.md
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
* better comment
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
* find and fix edge case
* Update docs/source/en/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* pass weights_only_kwarg instead
* fix shard_checkpoint loading
* simplify update_missing_keys
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* recursion to get all tensors
* block serialization
* skip serialization tests
* fix
* change to cuda:0 for now
* fix regression
* update device_map
* fix doc
* add notebook
* update torch_dtype
* update doc
* typo
* typo
* remove comm
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
2024-03-15 11:51:29 -04:00
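A minimal sketch of the API this PR introduces, assuming a transformers version with Quanto support plus the optional `quanto` and `accelerate` packages; the model id is only an illustration.

```python
from transformers import AutoModelForCausalLM, QuantoConfig

# QuantoConfig also accepts "int4", "int2", and "float8" weights (see the
# "add float8" commit above); activations stay in full precision by default.
quantization_config = QuantoConfig(weights="int8")

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # example model, any causal LM should work
    device_map="cuda:0",
    quantization_config=quantization_config,
)
```

Per the "block serialization" and "skip serialization tests" commits, saving quanto-quantized models with `save_pretrained` was disabled in this PR.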
Furkan Akkurt
11163fff58
Fix typo; Update quantization.md (#29615)
...
Update quantization.md
2024-03-12 16:32:50 +00:00
Andrei Panferov
1ecf5f7c98
AQLM quantizer support (#28928)
...
* aqlm init
* calibration and dtypes
* docs
* Readme update
* is_aqlm_available
* Simpler link in docs
* Test TODO real reference
* init _import_structure fix
* AqlmConfig autodoc
* integration aqlm
* integrations in tests
* docstring fix
* legacy typing
* Less typings
* More kernels information
* Performance -> Accuracy
* correct tests
* removed multi-gpu test
* Update docs/source/en/quantization.md
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Brought back multi-gpu tests
* Update src/transformers/integrations/aqlm.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/aqlm_integration/test_aqlm.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Andrei Panferov <blacksamorez@yandex-team.ru>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-02-14 09:25:41 +01:00
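A hedged sketch of what the integration enables, assuming the `aqlm` and `accelerate` packages are installed; pre-quantized checkpoints carry their own `AqlmConfig` (see the "AqlmConfig autodoc" commit), so loading needs no extra arguments. The checkpoint name is just an example.

```python
from transformers import AutoModelForCausalLM

# The quantization config stored in the checkpoint triggers the AQLM path.
model = AutoModelForCausalLM.from_pretrained(
    "ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf",  # example AQLM checkpoint
    torch_dtype="auto",
    device_map="auto",
)
```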
Steven Liu
2418c64a1c
[docs] HfQuantizer (#28820)
...
* tidy
* fix path
2024-02-02 08:22:18 +01:00
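For orientation, a rough sketch of the `HfQuantizer` extension point these docs cover; the hook names below reflect the public base class at the time, but treat them as an assumption, since the interface has evolved across versions.

```python
from transformers.quantizers import HfQuantizer


class MyQuantizer(HfQuantizer):
    """Hypothetical quantizer skeleton, for illustration only."""

    requires_calibration = False  # True if the method needs a calibration pass

    def _process_model_before_weight_loading(self, model, **kwargs):
        # Typically: swap nn.Linear modules for quantized equivalents.
        return model

    def _process_model_after_weight_loading(self, model, **kwargs):
        # Post-load fixups (e.g. kernel selection) go here.
        return model

    @property
    def is_trainable(self):
        return False

    @property
    def is_serializable(self):
        return False
```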
Steven Liu
bd50402b56
[docs] Quantization (#27641)
...
* first draft
* benchmarks
* feedback
2023-11-28 08:41:47 -08:00
Maria Khalusova
9beb2737d7
[docs] fixed links with 404 (#27327)
...
* fixed links with 404
* make style
2023-11-06 19:45:03 +00:00
Marc Sun
c9e72f55b2
Add exllamav2 better (#27111)
...
* add exllamav2 arg
* add test
* style
* add check
* add doc
* replace by use_exllama_v2
* fix tests
* fix doc
* style
* better condition
* fix logic
* add deprecate msg
* deprecate exllama
* remove disable_exllama from the linter
* remove
* fix warning
* Revert the commits deprecating exllama
* deprecate disable_exllama for use_exllama
* fix
* fix loading attribute
* better handling of args
* remove disable_exllama from init and linter
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* better arg
* fix warning
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* switch to dict
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* style
* nits
* style
* better tests
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 13:09:21 -04:00
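A sketch of the API this PR converged on (see the "switch to dict" commit): ExLlamaV2 kernels are selected through an `exllama_config` dict on `GPTQConfig`, replacing the earlier `use_exllama_v2` flag. The checkpoint name is illustrative.

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# version 2 requests the exllamav2 kernels; version 1 keeps the original exllama.
gptq_config = GPTQConfig(bits=4, exllama_config={"version": 2})

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # example pre-quantized GPTQ checkpoint
    device_map="auto",
    quantization_config=gptq_config,
)
```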
Younes Belkada
ae093eef01
[core / Quantization] AWQ integration (#27045)
...
* working v1
* oops
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fixup
* oops
* push
* more changes
* add docs
* some fixes
* fix copies
* add v1 doc
* added installation guide
* relax constraints
* revert
* attempt llm-awq
* oops
* oops
* fixup
* raise error when incorrect cuda compute capability
* nit
* add instructions for llm-awq
* fixup
* fix copies
* fixup and docs
* change
* few changes + add demo
* add v1 tests
* add autoawq in dockerfile
* finalize
* Update tests/quantization/autoawq/test_awq.py
* fix test
* fix
* fix issue
* Update src/transformers/integrations/awq.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add link to example script
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add more content
* add more details
* add link to quantization docs
* camel case + change backend class name
* change to string
* fixup
* raise errors if libs not installed
* change to `bits` and `group_size`
* nit
* nit
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* disable training
* address some comments and fix nits
* fix
* final nits and fix tests
* adapt to our new runners
* make fix-copies
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/integrations/awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* move to top
* add conversion test
* final nit
* add more elaborated test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 09:06:31 +01:00
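A hedged sketch of the loading path this integration adds, assuming `autoawq` is installed and a CUDA GPU with a supported compute capability is available (the PR raises an error otherwise); pre-quantized AWQ checkpoints embed their quantization config, so nothing extra is passed. The model id is an example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/zephyr-7B-alpha-AWQ"  # example AWQ checkpoint from the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The quantization config stored in the checkpoint triggers the AWQ path.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda:0")
```

Per the "disable training" commit, models loaded this way were inference-only at the time.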
Vivek Khandelwal
2963e196ee
Add support for loading GPTQ models on CPU (#26719)
...
* Add support for loading GPTQ models on CPU
Until now, a GPTQ-quantized model could only be loaded on a CUDA
device. The attribute `gptq_supports_cpu` checks whether the installed
auto_gptq version is one that supports running the model on CPU.
The larger variants of a model are hard to load/run/trace on
a GPU, which is the rationale for adding this attribute.
Signed-off-by: Vivek Khandelwal <vivek@nod-labs.com>
* Update quantization.md
* Update quantization.md
* Update quantization.md
2023-10-31 13:45:23 +00:00
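A minimal sketch of what this change permits, assuming an auto_gptq build with CPU support (which is exactly what `gptq_supports_cpu` checks); the checkpoint name is an example.

```python
from transformers import AutoModelForCausalLM

# Previously GPTQ checkpoints required a CUDA device; with a CPU-capable
# auto_gptq install, the quantized weights can now be placed on the CPU.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # example pre-quantized GPTQ checkpoint
    device_map="cpu",
)
```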
Rockerz
84724efd10
Translating en/main_classes folder docs to Japanese 🇯🇵 (#26894)
...
* add
* add
* add
* Add deepspeed.md
* Add
* add
* Update docs/source/ja/main_classes/callback.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/output.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/pipelines.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/processors.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/processors.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/text_generation.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/main_classes/processors.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update logging.md
* Update toctree.yml
* Update docs/source/ja/main_classes/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add suggestions
* m
* Update docs/source/ja/main_classes/trainer.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update toctree.yml
* Update Quantization.md
* Update docs/source/ja/_toctree.yml
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update toctree.yml
* Update docs/source/en/main_classes/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/main_classes/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-10-30 09:39:14 -07:00
Arthur
90ee9cea19
Revert "add exllamav2 arg" ( #27102 )
...
Revert "add exllamav2 arg (#26437 )"
This reverts commit 8214d6e7b1
.
2023-10-27 11:23:06 +02:00
Marc Sun
8214d6e7b1
add exllamav2 arg (#26437)
...
* add exllamav2 arg
* add test
* style
* add check
* add doc
* replace by use_exllama_v2
* fix tests
* fix doc
* style
* better condition
* fix logic
* add deprecate msg
2023-10-26 10:15:05 -04:00
Heinz-Alexander Fuetterer
883ed4b344
chore: fix typos (#26756)
2023-10-12 18:00:27 +02:00
Marc Sun
06a1d75bd5
fix gptq nits (#25500)
...
* fix nits
* fix docstring
* fix doc
* fix damp_percent
* fix doc
2023-08-14 11:43:38 -04:00
Marc Sun
55db70c63d
GPTQ integration (#25062)
...
* GPTQ integration
* Add tests for gptq
* support for more quantization model
* fix style
* typo
* fix method
* Update src/transformers/modeling_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add dataclass and fix quantization_method
* fix doc
* Update tests/quantization/gptq/test_gptq.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* modify dataclass
* add gptqconfig import
* fix typo
* fix tests
* remove dataset as req arg
* remove tokenizer import
* add offload cpu quantization test
* fix check dataset
* modify dockerfile
* protect trainer
* style
* test for config
* add more log
* overwrite torch_dtype
* draft doc
* modify quantization_config docstring
* fix class name in docstring
* Apply suggestions from code review
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* more warning
* fix 8bit kwargs tests
* peft compatibility
* remove var
* fix is_gptq_quantized
* remove is_gptq_quantized
* fix wrap
* Update src/transformers/modeling_utils.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add exllama
* skip test
* overwrite float16
* style
* fix skip test
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix docstring formatting
* add doc
* better test
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-08-10 16:06:29 -04:00
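A hedged sketch of the end-to-end flow this PR introduces, assuming `optimum` and `auto-gptq` are installed: passing a `GPTQConfig` with a calibration dataset quantizes the model during `from_pretrained`, after which it can be saved and reloaded as a pre-quantized checkpoint. Model id and output path are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # small model, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)

# "c4" is a built-in calibration dataset; a list of strings also works.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,  # quantization happens at load time
)
model.save_pretrained("opt-125m-gptq")
```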
Younes Belkada
972fdcc778
[Docs / quantization] Clearer explanation of how things work under the hood + remove outdated info (#25216)
...
* clearer explanation of how things work under the hood.
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add `load_in_4bit` in `from_pretrained`
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-08-01 10:56:52 +02:00
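The "add `load_in_4bit` in `from_pretrained`" commit refers to the bitsandbytes shortcut flag; a minimal sketch, assuming `bitsandbytes` and `accelerate` are installed, with an example model id.

```python
from transformers import AutoModelForCausalLM

# Shorthand for passing BitsAndBytesConfig(load_in_4bit=True): linear layer
# weights are stored in 4 bits and dequantized on the fly in the forward pass.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    load_in_4bit=True,
    device_map="auto",
)
```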
Stas Bekman
5220606607
[quantization.md] fix (#25190)
...
Update quantization.md
2023-07-31 09:37:29 -07:00
Younes Belkada
ca974aff0f
[Docs] Clarify 4bit docs (#24878)
...
* clarify 4bit docs
* Apply suggestions from code review
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2023-07-18 13:39:08 +02:00
Marc Sun
35eac0df75
add link to accelerate doc (#24601)
2023-07-10 17:49:30 -04:00
Sylvain Gugger
eb849f6604
Migrate doc files to Markdown. (#24376)
...
* Rename index.mdx to index.md
* With saved modifs
* Address review comment
* Treat all files
* .mdx -> .md
* Remove special char
* Update utils/tests_fetcher.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
---------
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-06-20 18:07:47 -04:00