Yih-Dar
4143f94d51
uninstall kernels from docker images (#38083)
uninstall kernels
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-12 18:03:47 +02:00
Wenhua Cheng
b3492ff9f7
Add AutoRound quantization support (#37393)
* add auto-round support
* Update src/transformers/quantizers/auto.py
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
* fix style issue
Signed-off-by: wenhuach <wenhuach87@gmail.com>
* tiny change
* tiny change
* refine ut and doc
* revert unnecessary change
* tiny change
* try to fix style issue
* try to fix style issue
* try to fix style issue
* try to fix style issue
* try to fix style issue
* try to fix style issue
* try to fix style issue
* fix doc issue
* Update tests/quantization/autoround/test_auto_round.py
* fix comments
* Update tests/quantization/autoround/test_auto_round.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/autoround/test_auto_round.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* update doc
* Update src/transformers/quantizers/quantizer_auto_round.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* update
* update
* fix
* try to fix style issue
* Update src/transformers/quantizers/auto.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update docs/source/en/quantization/auto_round.md
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update docs/source/en/quantization/auto_round.md
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update docs/source/en/quantization/auto_round.md
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* update
* fix style issue
* update doc
* update doc
* Refine the doc
* refine doc
* revert one change
* set sym to True by default
* Enhance the unit test's robustness.
* update
* add torch dtype
* tiny change
* add awq convert test
* fix typo
* update
* fix packing format issue
* use one gpu
---------
Signed-off-by: wenhuach <wenhuach87@gmail.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Shen, Haihao <haihao.shen@intel.com>
2025-04-22 13:56:54 +02:00
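For reference, a minimal sketch of the AutoRound path added in #37393: a checkpoint already quantized with auto-round carries its quantization_config, so transformers dispatches to the new quantizer at load time. The model id below is an illustrative assumption, not taken from the commit, and `auto-round` must be installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical AutoRound-quantized checkpoint; any repo whose config
# carries an auto-round quantization_config should load the same way.
model_id = "OPEA/Meta-Llama-3.1-8B-Instruct-int4-sym-inc"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```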
Mohamed Mekkouri
897874748b
Disable kernels for quantization (#37446)
fix
2025-04-11 16:35:38 +02:00
fxmarty-amd
1a374799ce
Support loading Quark quantized models in Transformers (#36372)
* add quark quantizer
* add quark doc
* clean up doc
* fix tests
* make style
* more style fixes
* cleanup imports
* cleaning
* precise install
* Update docs/source/en/quantization/quark.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/quark_integration/test_quark.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* remove import guard as suggested
* update copyright headers
* add quark to transformers-quantization-latest-gpu Dockerfile
* make tests pass on transformers main + quark==0.7
* add missing F8_E4M3 and F8_E5M2 keys from str_to_torch_dtype
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-20 15:40:51 +01:00
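As a reference point, a minimal sketch of what #36372 enables: pre-quantized Quark checkpoints describe their scheme in the checkpoint config, so a plain from_pretrained suffices once the `quark` package is installed. The repo name is illustrative, not from the commit.

```python
from transformers import AutoModelForCausalLM

# Illustrative AMD Quark checkpoint; transformers reads the Quark
# quantization_config from the checkpoint and wires up the quantizer.
model = AutoModelForCausalLM.from_pretrained(
    "amd/Llama-3.1-8B-Instruct-FP8-KV",
    device_map="auto",
)
```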
Mohamed Mekkouri
65b8e38aac
Upgrading torch and CUDA versions in quantization docker (#36264)
* update
* small update
* no spqr quant
* testing
* testing
* test nightly
* gptqmodel
* flute
* fix hadamard
* running tests
* new docker
* fix docker
* run tests
* testing new docker
* new docker
* run tests
* new docker
* run tests
* final test
* update
* update
* run tests
* new docker
* launch tests
* test_docker
* running tests
* add comments
* fixing yml
* revert
2025-03-13 12:39:16 +01:00
Marc Sun
dae8708c36
Add compressed-tensors in quant dockerfile (#36239)
add compressed_tensors in the dockerfile
2025-02-17 17:48:57 +01:00
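For context, installing compressed_tensors in the image lets the CI exercise checkpoints quantized in that format; a minimal sketch under that assumption (the repo name is a hypothetical placeholder):

```python
from transformers import AutoModelForCausalLM

# Hypothetical compressed-tensors checkpoint; the quantization scheme is
# described in the checkpoint's config and applied automatically at load.
model = AutoModelForCausalLM.from_pretrained(
    "nm-testing/llama2.c-stories110M-w8a8-compressed",
    device_map="auto",
)
```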
Elvir Crnčević
845b0a2616
Efficient Inference Kernel for SpQR (#34976)
* Resolve vptq conflict
* Rename spqr package to spqr_quant
* Get rid of aqlm mention
* Start working on tests
* Resolve ruff code checks
* Ruff format
* Isort
* Test updates
* Add gpu tag
* Rename to modules_to_not_convert
* Config update
* Docs and config update
* Docs and config update
* Update to update_torch_dtype
* spqr config parameter validation
* Ruff update
* Apply ruff fixes
* Test fixes
* Ruff update
* Mark tests as @slow again; Ruff; Docstring update
* Ruff
* Remove absolute path
* Resolve typo
* Remove redundant log
* Check accelerate/spqr availability
* Ruff fix
* Check if the config contains proper shapes
* Ruff test
* Documentation update
* overview update
* Ruff checks
* Ruff code quality
* Make style
* Update docs/source/en/quantization/spqr.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update spqr.md
* Enable gptqmodel (#35012)
* gptqmodel
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update readme
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* gptqmodel needs to use checkpoint_format (#1)
* gptqmodel needs to use checkpoint_format
* fix quantize
* Update quantization_config.py
* Update quantization_config.py
* Update quantization_config.py
---------
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
* Revert quantizer_gptq.py (#2)
* revert quantizer_gptq.py change
* pass **kwargs
* limit gptqmodel and optimum version
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix warning
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix version check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert unrelated changes
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable gptqmodel tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix requires gptq
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Fix Transformer compat (#3)
* revert quantizer_gptq.py change
* pass **kwargs
* add meta info
* cleanup
* cleanup
* Update quantization_config.py
* hf_select_quant_linear pass checkpoint_format and meta
* fix GPTQTestCUDA
* Update test_gptq.py
* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2
* cleanup
* add backend
* cleanup
* cleanup
* no need check exllama version
* Update quantization_config.py
* lower checkpoint_format and backend
* check none
* cleanup
* Update quantization_config.py
* fix self.use_exllama == False
* spell
* fix unittest
* fix unittest
---------
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format again
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update gptqmodel version (#6)
* update gptqmodel version
* update gptqmodel version
* fix unit test (#5)
* update gptqmodel version
* update gptqmodel version
* "not self.use_exllama" is not equivalent to "self.use_exllama==False"
* fix unittest
* update gptqmodel version
* backend is loading_attributes (#7)
* fix format and tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix memory check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix device mismatch
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix result check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Update src/transformers/quantizers/quantizer_gptq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_gptq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_gptq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* update tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* review: update docs (#10)
* review: update docs (#12)
* review: update docs
* fix typo
* update tests for gptqmodel
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update document (#9)
* update overview.md
* cleanup
* Update overview.md
* Update overview.md
* Update overview.md
* update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
---------
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
* typo
* doc note for asymmetric quant
* typo with apple silicon(e)
* typo for marlin
* column name revert: review
* doc rocm support
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/overview.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/overview.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix: Nemotron Processor in GGUF conversion (#35708)
* fixing nemotron processor
* make style
* Update docs/source/en/quantization/spqr.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add missing TOC to doc
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-02-13 16:22:58 +01:00
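A minimal sketch of the SpQR path from #34976: SpQR in transformers is inference-only, so you load a checkpoint that was quantized offline and the new kernel handles dequantization. The model id is an illustrative assumption, and the `spqr_quant` kernel package plus a CUDA device are required.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative SpQR-quantized checkpoint; transformers picks up the SpQR
# quantization_config from the checkpoint.
model_id = "elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```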
Andrei Panferov
64c05eecd6
HIGGS Quantization Support (#34997)
* higgs init
* working with crunches
* per-model workspaces
* style
* style 2
* tests and style
* higgs tests passing
* protecting torch import
* removed torch.Tensor type annotations
* torch.nn.Module inheritance fix maybe
* hide inputs inside quantizer calls
* style structure something
* Update src/transformers/quantizers/quantizer_higgs.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* reworked num_sms
* Update src/transformers/integrations/higgs.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* revamped device checks
* docstring upd
* Update src/transformers/quantizers/quantizer_higgs.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* edited tests and device map assertions
* minor edits
* updated flute cuda version in docker
* Added p=1 and 2,3bit HIGGS
* flute version check update
* incorporated `modules_to_not_convert`
* less hardcoding
* Fixed comment
* Added docs
* Fixed gemma support
* example in docs
* fixed torch_dtype for HIGGS
* Update docs/source/en/quantization/higgs.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Collection link
* dequantize interface
* newer flute version, torch.compile support
* unittest message fix
* docs update compile
* isort
* ValueError instead of assert
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2024-12-23 16:54:49 +01:00
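For reference, a minimal sketch of the HIGGS flow from #34997: quantization happens on the fly at load time and relies on the FLUTE kernels mentioned in the log. The model id and bit-width are illustrative choices.

```python
from transformers import AutoModelForCausalLM, HiggsConfig

# On-the-fly HIGGS quantization; bits=4 is an illustrative setting
# (the PR also added p=1 and 2/3-bit variants).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=HiggsConfig(bits=4),
    device_map="auto",
)
```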
wejoncy
4e27a4009d
FEAT: Adding VPTQ quantization method to HFQuantizer (#34770)
* init vptq
* add integration
* add vptq support
fix readme
* add tests && format
* format
* address comments
* format
* format
* address comments
* format
* address comments
* remove debug code
* Revert "remove debug code"
This reverts commit ed3b3eaaba.
* fix test
---------
Co-authored-by: Yang Wang <wyatuestc@gmail.com>
2024-12-20 09:45:53 +01:00
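A minimal sketch of the VPTQ integration from #34770, which loads pre-quantized checkpoints; the repo name is an illustrative community model, not taken from the commit, and the `vptq` package must be installed.

```python
from transformers import AutoModelForCausalLM

# Illustrative pre-quantized VPTQ community checkpoint; the quantizer is
# selected from the checkpoint's quantization_config.
model = AutoModelForCausalLM.from_pretrained(
    "VPTQ-community/Meta-Llama-3.1-8B-Instruct-v8-k65536-256-woft",
    device_map="auto",
)
```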
Mohamed Mekkouri
f491096f7d
Fix docker CI: install autogptq from source (#35000)
* Fixed Docker
* Test ci
* Finally
* add comment
2024-11-28 16:31:36 +01:00
Mohamed Mekkouri
8f48ccf548
Fix: Add PEFT from source to CI docker (#34969)
* Docker fix peft
* Test new docker
* uncomment
2024-11-27 14:10:47 +01:00
Mohamed Mekkouri
b76a292bde
Upgrade torch version to 2.5 in dockerfile for quantization CI (#34924)
* Upgrade Torch 2.5
* uncomment
2024-11-25 17:38:20 +01:00
Benjamin Bossan
b13916c09d
[AWQ, CI] Bump AWQ version used in docker image (#34922)
The old AWQ version is failing with the latest (unreleased)
transformers, giving the error:
> ImportError: cannot import name 'shard_checkpoint' from 'transformers.modeling_utils'
This has been resolved in awq v0.2.7:
https://github.com/casper-hansen/AutoAWQ/pull/644
2024-11-25 16:49:57 +01:00
Yih-Dar
f0e640adfa
Drop support for Python 3.8 (#34314)
* drop python 3.8
* update docker files
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-24 11:16:55 +02:00
Marc Sun
cac4a4876b
[Quantization] Switch to optimum-quanto (#31732)
* switch to optimum-quanto rebase squash
* fix import check
* again
* test try-except
* style
2024-10-02 15:14:34 +02:00
Younes Belkada
658b849aeb
Quantization / TST: Fix remaining quantization tests (#31000)
* Fix remaining quant tests
* Update test_quanto.py
2024-05-24 14:35:59 +02:00
Younes Belkada
fce78fd0e9
FIX / Quantization: Fix Dockerfile build (#30890)
* Update Dockerfile
* Update docker/transformers-quantization-latest-gpu/Dockerfile
2024-05-20 10:08:26 +02:00
Younes Belkada
4e17e7dcf8
TST / Quantization: Reverting to torch==2.2.1 (#30866)
Reverting to 2.2.1
2024-05-16 17:30:02 +02:00
Yih-Dar
2d83324ecf
Use torch 2.3 for CI (#30837)
2.3
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-15 19:31:52 +02:00
mobicham
59952994c4
Add HQQ quantization support (#29637)
* update HQQ transformers integration
* push import_utils.py
* add force_hooks check in modeling_utils.py
* fix | with Optional
* force bias as param
* check bias is Tensor
* force forward for multi-gpu
* review fixes pass
* remove torch grad()
* if any key in linear_tags fix
* add cpu/disk check
* isinstance return
* add multigpu test + refactor tests
* clean hqq_utils imports in hqq.py
* clean hqq_utils imports in quantizer_hqq.py
* delete hqq_utils.py
* Delete src/transformers/utils/hqq_utils.py
* ruff init
* remove torch.float16 from __init__ in test
* refactor test
* isinstance -> type in quantizer_hqq.py
* cpu/disk device_map check in quantizer_hqq.py
* remove type(module) nn.linear check in quantizer_hqq.py
* add BaseQuantizeConfig import inside HqqConfig init
* remove hqq import in hqq.py
* remove accelerate import from test_hqq.py
* quant config.py doc update
* add hqqconfig to main_classes doc
* make style
* __init__ fix
* ruff __init__
* skip_modules list
* hqqconfig format fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* test_hqq.py remove mistral comment
* remove self.using_multi_gpu is False
* torch_dtype default val set and logger.info
* hqq.py isinstance fix
* remove torch=None
* torch_device test_hqq
* rename test_hqq
* MODEL_ID in test_hqq
* quantizer_hqq setattr fix
* quantizer_hqq typo fix
* imports quantizer_hqq.py
* isinstance quantizer_hqq
* hqq_layer.bias reformat quantizer_hqq
* Step 2 as comment in quantizer_hqq
* prepare_for_hqq_linear() comment
* keep_in_fp32_modules fix
* HqqHfQuantizer reformat
* quantization.md hqqconfig
* quantization.md model example reformat
* quantization.md # space
* quantization.md space })
* quantization.md space })
* quantization_config fix doc
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* axis value check in quantization_config
* format
* dynamic config explanation
* quant config method in quantization.md
* remove shard-level progress
* .cuda fix modeling_utils
* test_hqq fixes
* make fix-copies
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-02 17:51:49 +01:00
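For reference, a minimal sketch of the HQQ support added in #29637: weights are quantized on the fly at load time, with no calibration data needed. The nbits/group_size values and model id are illustrative; `pip install hqq` is required.

```python
import torch
from transformers import AutoModelForCausalLM, HqqConfig

# On-the-fly HQQ quantization of a full-precision checkpoint.
quant_config = HqqConfig(nbits=4, group_size=64)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    device_map="cuda",
)
```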
zhong zhuang
b4c18a830a
[FEAT]: EETQ quantizer support (#30262)
* [FEAT]: EETQ quantizer support
* Update quantization.md
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update docs/source/en/quantization.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update docs/source/en/quantization.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/integrations/__init__.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/integrations/__init__.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/integrations/eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/integrations/eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/integrations/eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/eetq_integration/test_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/quantizers/auto.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/quantizers/auto.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/quantizers/auto.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/eetq_integration/test_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/eetq_integration/test_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/eetq_integration/test_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* [FEAT]: EETQ quantizer support
* [FEAT]: EETQ quantizer support
* remove whitespaces
* update quantization.md
* style
* Update docs/source/en/quantization.md
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add copyright
* Update quantization.md
* Update docs/source/en/quantization.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/quantization.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Address the comments by amyeroberts
* style
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-22 20:38:58 +01:00
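A minimal sketch of the EETQ path from #30262: int8 weight-only quantization applied at load time. The model id is illustrative, and the `eetq` package must be installed.

```python
from transformers import AutoModelForCausalLM, EetqConfig

# EETQ int8 weight-only quantization, applied on the fly at load.
quant_config = EetqConfig("int8")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=quant_config,
    device_map="auto",
)
```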
Marc Sun
58a939c6b7
Fix quantization tests (#29914)
* revert back to torch 2.1.1
* run test
* switch to torch 2.2.1
* update dockerfile
* fix awq tests
* fix test
* run quanto tests
* update tests
* split quantization tests
* fix
* fix again
* final fix
* fix report artifact
* build docker again
* Revert "build docker again"
This reverts commit 399a5f9d93.
* debug
* revert
* style
* new notification system
* testing notification
* rebuild docker
* fix_prev_ci_results
* typo
* remove warning
* fix typo
* fix artifact name
* debug
* issue fixed
* debug again
* fix
* fix time
* test notif with failing test
* typo
* issues again
* final fix ?
* run all quantization tests again
* remove name to clear space
* revert modification done on workflow
* fix
* build docker
* build only quant docker
* fix quantization ci
* fix
* fix report
* better quantization_matrix
* add print
* revert to the basic one
2024-04-09 17:10:29 +02:00
Marc Sun
28de2f4de3
[Quantization] Quanto quantizer (#29023)
* start integration
* fix
* add and debug tests
* update tests
* make pytorch serialization works
* compatible with device_map and offload
* fix tests
* make style
* add ref
* guard against safetensors
* add float8 and style
* fix is_serializable
* Fix shard_checkpoint compatibility with quanto
* more tests
* docs
* adjust memory
* better
* style
* pass tests
* Update src/transformers/modeling_utils.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add is_safe_serialization instead
* Update src/transformers/quantizers/quantizer_quanto.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add QbitsTensor tests
* fix tests
* simplify activation list
* Update docs/source/en/quantization.md
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
* better comment
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
* find and fix edge case
* Update docs/source/en/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* pass weights_only_kwarg instead
* fix shard_checkpoint loading
* simplify update_missing_keys
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* recursion to get all tensors
* block serialization
* skip serialization tests
* fix
* change by cuda:0 for now
* fix regression
* update device_map
* fix doc
* add notebook
* update torch_dtype
* update doc
* typo
* typo
* remove comm
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
2024-03-15 11:51:29 -04:00
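For context, a minimal sketch of the quanto integration from #29023 (the backend later switched to optimum-quanto in #31732). The model id and weight dtype are illustrative choices.

```python
from transformers import AutoModelForCausalLM, QuantoConfig

# Quanto weight quantization at load time; int8/int4/int2 weights are
# supported, and this PR also wired up float8.
quant_config = QuantoConfig(weights="int8")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=quant_config,
    device_map="auto",
)
```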
Ilyas Moutawwakil
4fc708f98c
Exllama kernels support for AWQ models (#28634)
* added exllama kernels support for awq models
* doc
* style
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* refactor
* moved exllama post init to after device dispatching
* bump autoawq version
* added exllama test
* style
* configurable exllama kernels
* copy exllama_config from gptq
* moved exllama version check to post init
* moved to quantization dockerfile
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-03-05 03:22:48 +01:00
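A minimal sketch of the ExLlama-kernel path for AWQ models added in #28634: pass an AwqConfig selecting the exllama version when loading an AWQ checkpoint. The repo name is illustrative, not from the commit.

```python
from transformers import AutoModelForCausalLM, AwqConfig

# Route an existing AWQ checkpoint through the ExLlama kernels
# instead of the default GEMM kernels.
quant_config = AwqConfig(version="exllama")
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.1-AWQ",
    quantization_config=quant_config,
    device_map="auto",
)
```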
Marc Sun
f54d82cace
[CI] Quantization workflow (#29046)
* [CI] Quantization workflow
* build dockerfile
* fix dockerfile
* update self-scheduled.yml
* test build dockerfile on push
* fix torch install
* update to python 3.10
* update aqlm version
* uncomment build dockerfile
* tests if the scheduler works
* fix docker
* do not trigger on push again
* add additional runs
* test again
* all good
* style
* Update .github/workflows/self-scheduled.yml
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* test build dockerfile with torch 2.2.0
* fix extra
* clean
* revert changes
* Revert "revert changes"
This reverts commit 4cb52b8822.
* revert correct change
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-28 10:09:25 -05:00