Commit Graph

18 Commits

Author SHA1 Message Date
Andrei Panferov
64c05eecd6
HIGGS Quantization Support (#34997)
* higgs init

* working with crunches

* per-model workspaces

* style

* style 2

* tests and style

* higgs tests passing

* protecting torch import

* removed torch.Tensor type annotations

* torch.nn.Module inheritance fix maybe

* hide inputs inside quantizer calls

* style structure something

* Update src/transformers/quantizers/quantizer_higgs.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* reworked num_sms

* Update src/transformers/integrations/higgs.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* revamped device checks

* docstring upd

* Update src/transformers/quantizers/quantizer_higgs.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* edited tests and device map assertions

* minor edits

* updated flute cuda version in docker

* Added p=1 and 2,3bit HIGGS

* flute version check update

* incorporated `modules_to_not_convert`

* less hardcoding

* Fixed comment

* Added docs

* Fixed gemma support

* example in docs

* fixed torch_dtype for HIGGS

* Update docs/source/en/quantization/higgs.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Collection link

* dequantize interface

* newer flute version, torch.compile support

* unittest message fix

* docs update compile

* isort

* ValueError instead of assert

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2024-12-23 16:54:49 +01:00
wejoncy
4e27a4009d
FEAT : Adding VPTQ quantization method to HFQuantizer (#34770)
* init vptq

* add integration

* add vptq support

fix readme

* add tests && format

* format

* address comments

* format

* format

* address comments

* format

* address comments

* remove debug code

* Revert "remove debug code"

This reverts commit ed3b3eaaba.

* fix test

---------

Co-authored-by: Yang Wang <wyatuestc@gmail.com>
2024-12-20 09:45:53 +01:00
Mohamed Mekkouri
f491096f7d
Fix docker CI : install autogptq from source (#35000)
* Fixed Docker

* Test ci

* Finally

* add comment
2024-11-28 16:31:36 +01:00
Mohamed Mekkouri
8f48ccf548
Fix : Add PEFT from source to CI docker (#34969)
* Docker fix peft

* Test new docker

* uncomment
2024-11-27 14:10:47 +01:00
Mohamed Mekkouri
b76a292bde
Upgrade torch version to 2.5 in dockerfile for quantization CI (#34924)
* Upgrade Torch 2.5

* uncomment
2024-11-25 17:38:20 +01:00
Benjamin Bossan
b13916c09d
[AWQ, CI] Bump AWQ version used in docker image (#34922)
The old AWQ version is failing with the latest (unreleased)
transformers, giving the error:

> ImportError: cannot import name 'shard_checkpoint' from
'transformers.modeling_utils'

This has been resolved in awq v0.2.7:

https://github.com/casper-hansen/AutoAWQ/pull/644
2024-11-25 16:49:57 +01:00
Yih-Dar
f0e640adfa
Drop support for Python 3.8 (#34314)
* drop python 3.8

* update docker files

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-24 11:16:55 +02:00
Marc Sun
cac4a4876b
[Quantization] Switch to optimum-quanto (#31732)
* switch to optimum-quanto rebase squash

* fix import check

* again

* test try-except

* style
2024-10-02 15:14:34 +02:00
Younes Belkada
658b849aeb
Quantization / TST: Fix remaining quantization tests (#31000)
* Fix remaining quant tests

* Update test_quanto.py
2024-05-24 14:35:59 +02:00
Younes Belkada
fce78fd0e9
FIX / Quantization: Fix Dockerfile build (#30890)
* Update Dockerfile

* Update docker/transformers-quantization-latest-gpu/Dockerfile
2024-05-20 10:08:26 +02:00
Younes Belkada
4e17e7dcf8
TST / Quantization: Reverting to torch==2.2.1 (#30866)
Reverting to 2.2.1
2024-05-16 17:30:02 +02:00
Yih-Dar
2d83324ecf
Use torch 2.3 for CI (#30837)
2.3

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-15 19:31:52 +02:00
mobicham
59952994c4
Add HQQ quantization support (#29637)
* update HQQ transformers integration

* push import_utils.py

* add force_hooks check in modeling_utils.py

* fix | with Optional

* force bias as param

* check bias is Tensor

* force forward for multi-gpu

* review fixes pass

* remove torch grad()

* if any key in linear_tags fix

* add cpu/disk check

* isinstance return

* add multigpu test + refactor tests

* clean hqq_utils imports in hqq.py

* clean hqq_utils imports in quantizer_hqq.py

* delete hqq_utils.py

* Delete src/transformers/utils/hqq_utils.py

* ruff init

* remove torch.float16 from __init__ in test

* refactor test

* isinstance -> type in quantizer_hqq.py

* cpu/disk device_map check in quantizer_hqq.py

* remove type(module) nn.linear check in quantizer_hqq.py

* add BaseQuantizeConfig import inside HqqConfig init

* remove hqq import in hqq.py

* remove accelerate import from test_hqq.py

* quant config.py doc update

* add hqqconfig to main_classes doc

* make style

* __init__ fix

* ruff __init__

* skip_modules list

* hqqconfig format fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* hqqconfig doc fix

* test_hqq.py remove mistral comment

* remove self.using_multi_gpu is False

* torch_dtype default val set and logger.info

* hqq.py isinstance fix

* remove torch=None

* torch_device test_hqq

* rename test_hqq

* MODEL_ID in test_hqq

* quantizer_hqq setattr fix

* quantizer_hqq typo fix

* imports quantizer_hqq.py

* isinstance quantizer_hqq

* hqq_layer.bias reformat quantizer_hqq

* Step 2 as comment in quantizer_hqq

* prepare_for_hqq_linear() comment

* keep_in_fp32_modules fix

* HqqHfQuantizer reformat

* quantization.md hqqconfig

* quantization.md model example reformat

* quantization.md # space

* quantization.md space   })

* quantization.md space   })

* quantization_config fix doc

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* axis value check in quantization_config

* format

* dynamic config explanation

* quant config method in quantization.md

* remove shard-level progress

* .cuda fix modeling_utils

* test_hqq fixes

* make fix-copies

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-02 17:51:49 +01:00
zhong zhuang
b4c18a830a
[FEAT]: EETQ quantizer support (#30262)
* [FEAT]: EETQ quantizer support

* Update quantization.md

* Update docs/source/en/main_classes/quantization.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/integrations/__init__.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/integrations/__init__.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/integrations/eetq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/integrations/eetq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/integrations/eetq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update tests/quantization/eetq_integration/test_eetq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/auto.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/auto.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/auto.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_eetq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update tests/quantization/eetq_integration/test_eetq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_eetq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update tests/quantization/eetq_integration/test_eetq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update tests/quantization/eetq_integration/test_eetq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [FEAT]: EETQ quantizer support

* [FEAT]: EETQ quantizer support

* remove whitespaces

* update quantization.md

* style

* Update docs/source/en/quantization.md

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add copyright

* Update quantization.md

* Update docs/source/en/quantization.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/quantization.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Address the comments by amyeroberts

* style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-22 20:38:58 +01:00
Marc Sun
58a939c6b7
Fix quantization tests (#29914)
* revert back to torch 2.1.1

* run test

* switch to torch 2.2.1

* update dockerfile

* fix awq tests

* fix test

* run quanto tests

* update tests

* split quantization tests

* fix

* fix again

* final fix

* fix report artifact

* build docker again

* Revert "build docker again"

This reverts commit 399a5f9d93.

* debug

* revert

* style

* new notification system

* testing notification

* rebuild docker

* fix_prev_ci_results

* typo

* remove warning

* fix typo

* fix artifact name

* debug

* issue fixed

* debug again

* fix

* fix time

* test notif with failing test

* typo

* issues again

* final fix ?

* run all quantization tests again

* remove name to clear space

* revert modification done on workflow

* fix

* build docker

* build only quant docker

* fix quantization ci

* fix

* fix report

* better quantization_matrix

* add print

* revert to the basic one
2024-04-09 17:10:29 +02:00
Marc Sun
28de2f4de3
[Quantization] Quanto quantizer (#29023)
* start integration

* fix

* add and debug tests

* update tests

* make pytorch serialization work

* compatible with device_map and offload

* fix tests

* make style

* add ref

* guard against safetensors

* add float8 and style

* fix is_serializable

* Fix shard_checkpoint compatibility with quanto

* more tests

* docs

* adjust memory

* better

* style

* pass tests

* Update src/transformers/modeling_utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add is_safe_serialization instead

* Update src/transformers/quantizers/quantizer_quanto.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add QbitsTensor tests

* fix tests

* simplify activation list

* Update docs/source/en/quantization.md

Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* better comment

* Update tests/quantization/quanto_integration/test_quanto.py

Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* Update tests/quantization/quanto_integration/test_quanto.py

Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* find and fix edge case

* Update docs/source/en/quantization.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* pass weights_only_kwarg instead

* fix shard_checkpoint loading

* simplify update_missing_keys

* Update tests/quantization/quanto_integration/test_quanto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* recursion to get all tensors

* block serialization

* skip serialization tests

* fix

* change by cuda:0 for now

* fix regression

* update device_map

* fix doc

* add notebook

* update torch_dtype

* update doc

* typo

* typo

* remove comm

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
2024-03-15 11:51:29 -04:00
Ilyas Moutawwakil
4fc708f98c
Exllama kernels support for AWQ models (#28634)
* added exllama kernels support for awq models

* doc

* style

* Update src/transformers/modeling_utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* refactor

* moved exllama post init to after device dispatching

* bump autoawq version

* added exllama test

* style

* configurable exllama kernels

* copy exllama_config from gptq

* moved exllama version check to post init

* moved to quantization dockerfile

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-03-05 03:22:48 +01:00
Marc Sun
f54d82cace
[CI] Quantization workflow (#29046)
* [CI] Quantization workflow

* build dockerfile

* fix dockerfile

* update self-scheduled.yml

* test build dockerfile on push

* fix torch install

* update to python 3.10

* update aqlm version

* uncomment build dockerfile

* tests if the scheduler works

* fix docker

* do not trigger on push again

* add additional runs

* test again

* all good

* style

* Update .github/workflows/self-scheduled.yml

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* test build dockerfile with torch 2.2.0

* fix extra

* clean

* revert changes

* Revert "revert changes"

This reverts commit 4cb52b8822.

* revert correct change

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-28 10:09:25 -05:00