17 Commits

Author SHA1 Message Date
Steven Liu
2418c64a1c
[docs] HfQuantizer (#28820)
* tidy

* fix path
2024-02-02 08:22:18 +01:00
Steven Liu
bd50402b56
[docs] Quantization (#27641)
* first draft

* benchmarks

* feedback
2023-11-28 08:41:47 -08:00
Maria Khalusova
9beb2737d7
[docs] fixed links with 404 (#27327)
* fixed links with 404

* make style
2023-11-06 19:45:03 +00:00
Marc Sun
c9e72f55b2
Add exllamav2 better (#27111)
* add exllamav2 arg

* add test

* style

* add check

* add doc

* replace by use_exllama_v2

* fix tests

* fix doc

* style

* better condition

* fix logic

* add deprecate msg

* deprecate exllama

* remove disable_exllama from the linter

* remove

* fix warning

* Revert the commits deprecating exllama

* deprecate disable_exllama for use_exllama

* fix

* fix loading attribute

* better handling of args

* remove disable_exllama from init and linter

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* better arg

* fix warning

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* switch to dict

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* style

* nits

* style

* better tests

* style

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 13:09:21 -04:00
Younes Belkada
ae093eef01
[core / Quantization ] AWQ integration (#27045)
* working v1

* oops

* Update src/transformers/modeling_utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fixup

* oops

* push

* more changes

* add docs

* some fixes

* fix copies

* add v1 doc

* added installation guide

* relax constraints

* revert

* attempt llm-awq

* oops

* oops

* fixup

* raise error when incorrect cuda compute capability

* nit

* add instructions for llm-awq

* fixup

* fix copies

* fixup and docs

* change

* few changes + add demo

* add v1 tests

* add autoawq in dockerfile

* finalize

* Update tests/quantization/autoawq/test_awq.py

* fix test

* fix

* fix issue

* Update src/transformers/integrations/awq.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/main_classes/quantization.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/main_classes/quantization.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/integrations/awq.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/integrations/awq.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add link to example script

* Update docs/source/en/main_classes/quantization.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add more content

* add more details

* add link to quantization docs

* camel case + change backend class name

* change to string

* fixup

* raise errors if libs not installed

* change to `bits` and `group_size`

* nit

* nit

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* disable training

* address some comments and fix nits

* fix

* final nits and fix tests

* adapt to our new runners

* make fix-copies

* Update src/transformers/utils/quantization_config.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/integrations/awq.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/integrations/awq.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* move to top

* add conversion test

* final nit

* add more elaborated test

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-01 09:06:31 +01:00
Vivek Khandelwal
2963e196ee
Add support for loading GPTQ models on CPU (#26719)
* Add support for loading GPTQ models on CPU

Right now, we can only load the GPTQ Quantized model on the CUDA
device. The attribute `gptq_supports_cpu` checks if the current
auto_gptq version is the one which has the cpu support for the
model or not.
The larger variants of the model are hard to load/run/trace on
the GPU and that's the rationale behind adding this attribute.

Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>

* Update quantization.md

* Update quantization.md

* Update quantization.md
2023-10-31 13:45:23 +00:00
Rockerz
84724efd10
Translating en/main_classes folder docs to Japanese 🇯🇵 (#26894)
* add

* add

* add

* Add deepspeed.md

* Add

* add

* Update docs/source/ja/main_classes/callback.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/main_classes/output.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/main_classes/pipelines.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/main_classes/processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/main_classes/processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/main_classes/text_generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ja/main_classes/processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update logging.md

* Update toctree.yml

* Update docs/source/ja/main_classes/deepspeed.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Add suggestions

* m

* Update docs/source/ja/main_classes/trainer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update toctree.yml

* Update Quantization.md

* Update docs/source/ja/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update toctree.yml

* Update docs/source/en/main_classes/deepspeed.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/main_classes/deepspeed.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-10-30 09:39:14 -07:00
Arthur
90ee9cea19
Revert "add exllamav2 arg" (#27102)
Revert "add exllamav2 arg (#26437)"

This reverts commit 8214d6e7b1.
2023-10-27 11:23:06 +02:00
Marc Sun
8214d6e7b1
add exllamav2 arg (#26437)
* add exllamav2 arg

* add test

* style

* add check

* add doc

* replace by use_exllama_v2

* fix tests

* fix doc

* style

* better condition

* fix logic

* add deprecate msg
2023-10-26 10:15:05 -04:00
Heinz-Alexander Fuetterer
883ed4b344
chore: fix typos (#26756)
2023-10-12 18:00:27 +02:00
Marc Sun
06a1d75bd5
fix gptq nits (#25500)
* fix nits

* fix docstring

* fix doc

* fix damp_percent

* fix doc
2023-08-14 11:43:38 -04:00
Marc Sun
55db70c63d
GPTQ integration (#25062)
* GPTQ integration

* Add tests for gptq

* support for more quantization model

* fix style

* typo

* fix method

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add dataclass and fix quantization_method

* fix doc

* Update tests/quantization/gptq/test_gptq.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* modify dataclass

* add gptqconfig import

* fix typo

* fix tests

* remove dataset as req arg

* remove tokenizer import

* add offload cpu quantization test

* fix check dataset

* modify dockerfile

* protect trainer

* style

* test for config

* add more log

* overwrite torch_dtype

* draft doc

* modify quantization_config docstring

* fix class name in docstring

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* more warning

* fix 8bit kwargs tests

* peft compatibility

* remove var

* fix is_gptq_quantized

* remove is_gptq_quantized

* fix wrap

* Update src/transformers/modeling_utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add exllama

* skip test

* overwrite float16

* style

* fix skip test

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix docstring formatting

* add doc

* better test

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-08-10 16:06:29 -04:00
Younes Belkada
972fdcc778
[Docs/quantization] Clearer explanation on how things works under the hood. + remove outdated info (#25216)
* clearer explanation on how things works under the hood.

* Update docs/source/en/main_classes/quantization.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/main_classes/quantization.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add `load_in_4bit` in `from_pretrained`

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-08-01 10:56:52 +02:00
Stas Bekman
5220606607
[quantization.md] fix (#25190)
Update quantization.md
2023-07-31 09:37:29 -07:00
Younes Belkada
ca974aff0f
[Docs] Clarify 4bit docs (#24878)
* clarify 4bit docs

* Apply suggestions from code review

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2023-07-18 13:39:08 +02:00
Marc Sun
35eac0df75
add link to accelerate doc (#24601)
2023-07-10 17:49:30 -04:00
Sylvain Gugger
eb849f6604
Migrate doc files to Markdown. (#24376)
* Rename index.mdx to index.md

* With saved modifs

* Address review comment

* Treat all files

* .mdx -> .md

* Remove special char

* Update utils/tests_fetcher.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-06-20 18:07:47 -04:00