120 Commits

Author SHA1 Message Date
David LaPalomento
b45cf0e90a
Guard against unset resolved_archive_file (#35628)
* archive_file may not be specified
When loading a pre-trained model from a gguf file, resolved_archive_file may not be set. Guard against that case in the safetensors availability check.

* Remap partial disk offload to cpu for GGUF files
GGUF files don't support disk offload so attempt to remap them to the CPU when device_map is auto. If device_map is anything else but None, raise a NotImplementedError.

* Don't remap auto device_map and raise RuntimeError
If device_map=auto and modules are selected for disk offload, don't attempt to map them to any other device. Raise a runtime error when a GGUF model is configured to map any modules to disk.
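The final behavior described above (raise instead of remapping) can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the helper name `check_gguf_device_map` and the `is_gguf` flag are hypothetical, and the sketch assumes the device map has already been resolved to a mapping from module names to devices.

```python
from typing import Mapping, Optional, Union

# A resolved device map: module name -> device index or device string.
DeviceMap = Mapping[str, Union[int, str]]


def check_gguf_device_map(device_map: Optional[DeviceMap], is_gguf: bool) -> None:
    """Raise RuntimeError if a GGUF model has any module mapped to disk.

    Hypothetical helper for illustration only.
    """
    if not is_gguf or device_map is None:
        return  # nothing to check
    offloaded = [name for name, device in device_map.items() if device == "disk"]
    if offloaded:
        raise RuntimeError(
            "GGUF checkpoints do not support disk offload; "
            f"modules mapped to disk: {offloaded}"
        )
```

Raising early keeps the failure explicit rather than silently moving modules to a device the caller did not choose.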

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-14 14:44:31 +01:00
Mohamed Mekkouri
cb586a3999
Add require_read_token to fp8 tests (#36189)
fix
2025-02-14 12:27:35 +01:00
Andrei Panferov
5f726f8b8e
New HIGGS quantization interfaces, JIT kernel compilation support. (#36148)
* new flute

* new higgs working

* small adjustments

* progress and quality

* small updates

* style

---------

Co-authored-by: Andrey Panferov <panferov.andrey3@wb.ru>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-02-14 12:26:45 +01:00
Elvir Crnčević
845b0a2616
Efficient Inference Kernel for SpQR (#34976)
* Resolve vptq conflict

* Rename spqr package to spqr_quant

* Get rid of aqlm mention

* Start working on tests

* Resolve ruff code checks

* Ruff format

* Isort

* Test updates

* Add gpu tag

* Rename to modules_to_not_convert

* Config update

* Docs and config update

* Docs and config update

* Update to update_torch_dtype

* spqr config parameter validation

* Ruff update

* Apply ruff fixes

* Test fixes

* Ruff update

* Mark tests as @slow again; Ruff; Docstring update

* Ruff

* Remove absolute path

* Resolve typo

* Remove redundant log

* Check accelerate/spqr availability

* Ruff fix

* Check if the config contains proper shapes

* Ruff test

* Documentation update

* overview update

* Ruff checks

* Ruff code quality

* Make style

* Update docs/source/en/quantization/spqr.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update spqr.md

* Enable gptqmodel (#35012)

* gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update readme

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* gptqmodel need use checkpoint_format (#1)

* gptqmodel need use checkpoint_format

* fix quantize

* Update quantization_config.py

* Update quantization_config.py

* Update quantization_config.py

---------

Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* Revert quantizer_gptq.py (#2)

* revert quantizer_gptq.py change

* pass **kwargs

* limit gptqmodel and optimum version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix warning

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix version check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert unrelated changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable gptqmodel tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix requires gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix Transformer compat (#3)

* revert quantizer_gptq.py change

* pass **kwargs

* add meta info

* cleanup

* cleanup

* Update quantization_config.py

* hf_select_quant_linear pass checkpoint_format and meta

* fix GPTQTestCUDA

* Update test_gptq.py

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* cleanup

* add backend

* cleanup

* cleanup

* no need check exllama version

* Update quantization_config.py

* lower checkpoint_format and backend

* check none

* cleanup

* Update quantization_config.py

* fix self.use_exllama == False

* spell

* fix unittest

* fix unittest

---------

Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format again

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update gptqmodel version (#6)

* update gptqmodel version

* update gptqmodel version

* fix unit test (#5)

* update gptqmodel version

* update gptqmodel version

* "not self.use_exllama" is not equivalent to "self.use_exllama==False"

* fix unittest

* update gptqmodel version

* backend is loading_attibutes (#7)

* fix format and tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix memory check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device mismatch

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix result check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* update tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* review: update docs (#10)

* review: update docs (#12)

* review: update docs

* fix typo

* update tests for gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update document (#9)

* update overview.md

* cleanup

* Update overview.md

* Update overview.md

* Update overview.md

* update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* typo

* doc note for asymmetric quant

* typo with apple silicon(e)

* typo for marlin

* column name revert: review

* doc rocm support

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix : Nemotron Processor in GGUF conversion (#35708)

* fixing nemotron processor

* make style

* Update docs/source/en/quantization/spqr.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add missing TOC to doc

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-02-13 16:22:58 +01:00
Mohamed Mekkouri
efe72fe21f
Adding FP8 Quantization to transformers (#36026)
* first commit

* adding kernels

* fix create_quantized_param

* fix quantization logic

* end2end

* fix style

* fix imports

* fix consistency

* update

* fix style

* update

* update after review

* make style

* update

* update

* fix

* update

* fix docstring

* update

* update after review

* update

* fix scheme

* update

* update

* fix

* update

* fix docstring

* add source

* fix test

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-13 13:01:19 +01:00
湛露先生
1590c66430
Fix words typos in ggml test. (#36060)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-02-06 15:32:40 +00:00
Isotr0py
e57b459997
Split and clean up GGUF quantization tests (#35502)
* clean up ggml test

Signed-off-by: Isotr0py <2037008807@qq.com>

* port remaining tests

Signed-off-by: Isotr0py <2037008807@qq.com>

* further cleanup

Signed-off-by: Isotr0py <2037008807@qq.com>

* format

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix broken tests

Signed-off-by: Isotr0py <2037008807@qq.com>

* update comment

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* reorganize tests

Signed-off-by: Isotr0py <2037008807@qq.com>

* k-quants use qwen2.5-0.5B

Signed-off-by: Isotr0py <2037008807@qq.com>

* move ggml tokenization test

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove dead code

Signed-off-by: Isotr0py <2037008807@qq.com>

* add assert for serialization test

Signed-off-by: Isotr0py <2037008807@qq.com>

* use str for parameterize

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-27 15:46:57 +01:00
Arthur
b912f5ee43
use torch.testing.assert_close instead to get more details about errors in CIs (#35659)
* use torch.testing.assert_close instead to get more details about errors in CIs

* fix

* style

* test_all

* revert for I bert

* fixes and updates

* more image processing fixes

* more image processors

* fix mamba and co

* style

* less strict

* ok I won't be strict

* skip and be done

* up
2025-01-24 16:55:28 +01:00
Mohamed Mekkouri
a7738f5a89
Fix : Nemotron tokenizer for GGUF format (#35836)
fix nemotron gguf
2025-01-22 12:28:40 +01:00
Mohamed Mekkouri
dbd8474125
Fix : BLOOM tie_word_embeddings in GGUF (#35812)
* fix bloom ggml

* fix falcon output

* make style
2025-01-21 15:35:54 +01:00
Mohamed Mekkouri
b80e334e71
Skip Falcon 7B GGML Test (#35783)
skip test
2025-01-20 15:00:34 +01:00
jiqing-feng
387663e571
Enable gptqmodel (#35012)
* gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update readme

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* gptqmodel need use checkpoint_format (#1)

* gptqmodel need use checkpoint_format

* fix quantize

* Update quantization_config.py

* Update quantization_config.py

* Update quantization_config.py

---------

Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* Revert quantizer_gptq.py (#2)

* revert quantizer_gptq.py change

* pass **kwargs

* limit gptqmodel and optimum version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix warning

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix version check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert unrelated changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable gptqmodel tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix requires gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix Transformer compat (#3)

* revert quantizer_gptq.py change

* pass **kwargs

* add meta info

* cleanup

* cleanup

* Update quantization_config.py

* hf_select_quant_linear pass checkpoint_format and meta

* fix GPTQTestCUDA

* Update test_gptq.py

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* cleanup

* add backend

* cleanup

* cleanup

* no need check exllama version

* Update quantization_config.py

* lower checkpoint_format and backend

* check none

* cleanup

* Update quantization_config.py

* fix self.use_exllama == False

* spell

* fix unittest

* fix unittest

---------

Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format again

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update gptqmodel version (#6)

* update gptqmodel version

* update gptqmodel version

* fix unit test (#5)

* update gptqmodel version

* update gptqmodel version

* "not self.use_exllama" is not equivalent to "self.use_exllama==False"

* fix unittest

* update gptqmodel version

* backend is loading_attibutes (#7)

* fix format and tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix memory check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device mismatch

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix result check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* update tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* review: update docs (#10)

* review: update docs (#12)

* review: update docs

* fix typo

* update tests for gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update document (#9)

* update overview.md

* cleanup

* Update overview.md

* Update overview.md

* Update overview.md

* update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* typo

* doc note for asymmetric quant

* typo with apple silicon(e)

* typo for marlin

* column name revert: review

* doc rocm support

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-15 14:22:49 +01:00
Mohamed Mekkouri
a11041ffad
Fix : add require_read_token for gemma2 gated model (#35687)
fix gemma2 gated model test
2025-01-14 11:47:05 +01:00
Mohamed Mekkouri
df2a812e95
Fix expected output for ggml test (#35686)
fix expected output
2025-01-14 11:46:55 +01:00
Mohamed Mekkouri
050636518a
Fix : HQQ config when hqq not available (#35655)
* fix

* make style

* adding require_hqq

* make style
2025-01-14 11:37:37 +01:00
Fanli Lin
2fa876d2d8
[tests] make cuda-only tests device-agnostic (#35607)
* initial commit

* remove unrelated files

* further remove

* Update test_trainer.py

* fix style
2025-01-13 14:48:39 +01:00
Yijun Lee
e5fd865eba
Add Gemma2 GGUF support (#34002)
* initial setup for ggml.py

* initial setup of GGUFGemma2Converter class

* Add gemma2 model to gguf.md doc

* Partial work on GGUF_TENSOR_MAPPING

* initial setup of GGUF_TENSOR_MAPPING for Gemma2

* refactor: rename GemmaConvert class to GemmaConverter for naming consistency

* feat: complete gemma2 tensor mapping implementation

* feat: add initial implementation of GGUFGemmaConverter

* feat: complete GGUFGemmaConverter implementation

* feat: add test code for gemma2

* refactor: minor code cleanup

* refactor: minor code cleanup

* fix: resolve suggestions

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Isotr0py <2037008807@qq.com>

---------

Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-03 14:50:07 +01:00
Matthew Douglas
6b1e86fd4d
Fix new BNB test failures (#35345)
2025-01-02 11:24:52 +01:00
Andrei Panferov
64c05eecd6
HIGGS Quantization Support (#34997)
* higgs init

* working with crunches

* per-model workspaces

* style

* style 2

* tests and style

* higgs tests passing

* protecting torch import

* removed torch.Tensor type annotations

* torch.nn.Module inheritance fix maybe

* hide inputs inside quantizer calls

* style structure something

* Update src/transformers/quantizers/quantizer_higgs.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* reworked num_sms

* Update src/transformers/integrations/higgs.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* revamped device checks

* docstring upd

* Update src/transformers/quantizers/quantizer_higgs.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* edited tests and device map assertions

* minor edits

* updated flute cuda version in docker

* Added p=1 and 2,3bit HIGGS

* flute version check update

* incorporated `modules_to_not_convert`

* less hardcoding

* Fixed comment

* Added docs

* Fixed gemma support

* example in docs

* fixed torch_dtype for HIGGS

* Update docs/source/en/quantization/higgs.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Collection link

* dequantize interface

* newer flute version, torch.compile support

* unittest message fix

* docs update compile

* isort

* ValueError instead of assert

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2024-12-23 16:54:49 +01:00
Mohamed Mekkouri
59178780a6
Fix : VPTQ test (#35394)
fix_test
2024-12-23 16:27:46 +01:00
wejoncy
4e27a4009d
FEAT : Adding VPTQ quantization method to HFQuantizer (#34770)
* init vptq

* add integration

* add vptq support

fix readme

* add tests && format

* format

* address comments

* format

* format

* address comments

* format

* address comments

* remove debug code

* Revert "remove debug code"

This reverts commit ed3b3eaaba.

* fix test

---------

Co-authored-by: Yang Wang <wyatuestc@gmail.com>
2024-12-20 09:45:53 +01:00
jiqing-feng
69e31eb1bf
change bnb tests (#34713)
* fix training tests

* fix xpu check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm pdb

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix 4bit logits check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix 4bit logits check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add xpu check on int8 training

* fix training tests

* add llama test on bnb

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* only cpu and xpu disable autocast training

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
2024-12-18 09:49:59 -05:00
Mohamed Mekkouri
85eb339231
Fix : model used to test ggml conversion of Falcon-7b is incorrect (#35083)
fixing test model
2024-12-16 13:21:44 +01:00
George
e4e404fdd0
Run model as compressed/uncompressed mode (#34719)
* draft, run model as compressed/uncompressed mode

* draft

* run run_compressed=False

* run_compressed as attr

* set run_compressed=False using quantization_config

* remove redundant line

* make is_qat_trainable dependent on run_compressed status

* add tests

* lint

* full in docstring

* add decompress

* comments

* decompress if model is compressed and not run_compressed

* apply_quant_config logic fix -- populate statedict properly

* comments

* remove non-compressed model

* make is_compressed as property

* cosmetic

* run apply_quant_config for non-compressed models -- populate scales and zeropoints

* add pathway for decompressing sparse models

* typo on is_quantization_compressed

* lint

* fix typo
2024-12-13 08:23:31 +01:00
Matthew Douglas
34f4080ff5
[CI] Fix bnb quantization tests with accelerate>=1.2.0 (#35172)
2024-12-09 13:55:16 -05:00
Mohamed Mekkouri
7238387f67
Fix typo in EETQ Tests (#35160)
fix
2024-12-09 14:13:36 +01:00
Mohamed Mekkouri
0e805e6d1e
Skipping aqlm non working inference tests till fix merged (#34865)
2024-11-26 11:09:30 +01:00
Mohamed Mekkouri
890ea7de93
Fix failing GGML test (#34871)
fix_test
2024-11-25 18:04:52 +01:00
Mohamed Mekkouri
4e6b19cd95
Fix : BitNet tests (#34895)
* fix_tests_bitnet

* fix format
2024-11-25 16:47:14 +01:00
Mohamed Mekkouri
54be2d7ae8
Bitnet test fix to avoid using gated model (#34863)
small test fix
2024-11-22 17:18:49 +01:00
farrosalferro
c57eafdaa1
Add Nemotron GGUF Loading Support (#34725)
* Add Nemotron GGUF Loading Support

* fix the Nemotron architecture assignation

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-11-21 11:37:34 +01:00
Marc Sun
3cb8676a91
Fix CI by tweaking torchao tests (#34832)
2024-11-20 20:28:51 +01:00
Marc Sun
67890de3b8
Torchao weights only + prequantized compatibility (#34355)
* weights only compatibility

* better tests from code review

* pin torch version

* add weights_only check
2024-11-20 17:24:45 +01:00
Isotr0py
e83aaaa86b
Fix use_parallel_residual and qkv_bias for StableLM GGUF config extraction (#34450)
* fix stablelm qkv_bias

* fix stablelm qkv_bias and use_parallel_residual

* remove original_model.config for stablelm gguf test
2024-11-05 18:26:20 +01:00
Benjamin Bossan
5e1fd4e204
FIX: Broken repr of TorchAoConfig (#34560)
FIX Broken repr of TorchAoConfig

The __repr__ method references a non-existent self.kwargs. This is now
fixed.

There does not appear to be a uniform way of defining __repr__ for
quantization configs. I copied the method as implemented for HQQ:

e2ac16b28a/src/transformers/utils/quantization_config.py (L285-L287)
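As a rough illustration of the kind of `__repr__` described above (one built from the config's own dictionary form rather than from an attribute that may not exist), here is a minimal sketch. The class `ToyQuantConfig` and its fields are hypothetical stand-ins, not the actual `TorchAoConfig` implementation.

```python
import json


class ToyQuantConfig:
    """Hypothetical quantization config used only to illustrate __repr__."""

    def __init__(self, quant_type: str, **quant_type_kwargs):
        self.quant_type = quant_type
        self.quant_type_kwargs = quant_type_kwargs

    def to_dict(self) -> dict:
        # Serialize only attributes that are guaranteed to exist.
        return {
            "quant_type": self.quant_type,
            "quant_type_kwargs": dict(self.quant_type_kwargs),
        }

    def __repr__(self) -> str:
        # Build the repr from to_dict() so it cannot reference a missing attribute.
        body = json.dumps(self.to_dict(), indent=2, sort_keys=True)
        return f"{self.__class__.__name__} {body}\n"
```

Calling `print(ToyQuantConfig("int4_weight_only", group_size=32))` would then show the class name followed by the JSON-formatted settings.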
2024-11-05 10:26:13 +01:00
Vladislav Bronzov
5251fe6271
Add GGUF for Mamba (#34200)
* add mamba architecture for gguf

* add logic for weights conversion, some fixes and refactoring

* add lm_head layers, unit test refactoring

* more fixes for tests

* remove lm_head creation

* remove unused comments
2024-10-30 16:52:17 +01:00
Marc Sun
004530aa05
Fix regression loading dtype (#34409)
* fix regression

* add test for torchao

* expected output

* better fix
2024-10-29 11:41:04 +01:00
Matthew Douglas
e447185b1f
Fix bnb training test failure (#34414)
* Fix bnb training test: compatibility with OPTSdpaAttention
2024-10-25 10:23:20 -04:00
김준재
dd267fca72
Add T5 GGUF loading support (#33389)
* add: GGUFT5Converter

* add: tensormapping for t5

* add: test code for t5

* fix: Remove whitespace from blank line

* add: t5 fp16 tests

* fix: whitespace formatting

* fix: minor formatting

* fix: testing every weights
2024-10-24 15:10:59 +02:00
Vladislav Bronzov
cb5ca3265f
Add GGUF for starcoder2 (#34094)
* add starcoder2 arch support for gguf

* fix q6 test
2024-10-14 10:22:49 +02:00
Vladislav Bronzov
c9afee5392
Add gguf support for gpt2 (#34044)
* add gpt2 gguf support

* add doc change

* small refactoring
2024-10-10 13:42:18 +02:00
Mohamed Mekkouri
36d410dab6
FEAT : Adding BitNet quantization method to HFQuantizer (#33410)
* rebasing changes

* fixing style

* adding some doc to functions

* remove bitblas

* change dtype

* fixing check_code_quality

* fixing import order

* adding doc to tree

* Small update on BitLinear

* adding some tests

* sorting imports

* small update

* reformatting

* reformatting

* reformatting with ruff

* adding assert

* changes after review

* update disk offloading

* adapting after review

* Update after review

* add is_serializable back

* fixing style

* adding serialization test

* make style

* small updates after review
2024-10-09 17:51:41 +02:00
Vladislav Bronzov
faa0f63b93
Add gguf support for StableLM (#33793)
* add stablelm gguf architecture support

* add additional quantization tests

* resolve merge conflict, add weight conversion tests for fp16
2024-10-09 12:16:13 +02:00
Vladislav Bronzov
22e102ad98
Bug fix gguf qwen2moe (#33940)
* fix qwen2moe tensors mapping, add unit tests

* add expert tensor split logic, test refactoring

* small params refactoring

* add comment to tensor reshaping
2024-10-05 16:19:01 +02:00
jiqing-feng
b916efcb3c
Enables CPU AWQ model with IPEX version. (#33460)
* enable cpu awq ipex linear

* add doc for cpu awq with ipex kernel

* add tests for cpu awq

* fix code style

* fix doc and tests

* Update docs/source/en/quantization/awq.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update tests/quantization/autoawq/test_awq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix comments

* fix log

* fix log

* fix style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-10-04 16:25:10 +02:00
Marc Sun
cac4a4876b
[Quantization] Switch to optimum-quanto (#31732)
* switch to optimum-quanto rebase squash

* fix import check

* again

* test try-except

* style
2024-10-02 15:14:34 +02:00
g-prz
fe484726aa
Add falcon gguf (#33437)
* feat(gguf): add falcon q2 k

* fix(gguf): remove useless renaming

* feat(gguf): separate falcon 7b and 40b

* feat(gguf): apply fixup

* fix(test): error rebase

* feat(gguf): add fp16 weight comparison for falcon

* feat(gguf): test weight of all layers

* test(gguf): add falcon 40b under skip decorator

* feat(gguf): quick example for extracting model size
2024-10-02 14:10:39 +02:00
mobicham
f5247aca01
Hqq serialization (#33141)
* HQQ model serialization attempt

* fix hqq dispatch and unexpected keys

* style

* remove check_old_param

* revert to check HQQLinear in quantizer_hqq.py

* revert to check HQQLinear in quantizer_hqq.py

* update HqqConfig default params

* make ci happy

* make ci happy

* revert to HQQLinear check in quantizer_hqq.py

* check hqq_min version 0.2.0

* set axis=1 as default in quantization_config.py

* validate_env with hqq>=0.2.0 version message

* deprecated hqq kwargs message

* make ci happy

* remove run_expected_keys_check hack + bump to 0.2.1 min hqq version

* fix unexpected_keys hqq update

* add pre_quantized check

* add update_expected_keys to base quantizer

* ci base.py fix?

* ci base.py fix?

* fix "quantization typo" src/transformers/utils/quantization_config.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix post merge

---------

Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-09-30 14:47:18 +02:00
Vladislav Bronzov
9d200cfbee
Add gguf support for bloom (#33473)
* add bloom arch support for gguf

* apply format

* small refactoring, bug fix in GGUF_TENSOR_MAPPING naming

* optimize bloom GGUF_TENSOR_MAPPING

* implement reverse reshaping for bloom gguf

* add qkv weights test

* add q_8 test for bloom
2024-09-27 12:13:40 +02:00
Benjamin Fineran
574a9e12bb
HFQuantizer implementation for compressed-tensors library (#31704)
* Add compressed-tensors HFQuantizer implementation

* flag serializable as False

* run

* revive lines deleted by ruff

* fixes to load+save from sparseml, edit config to quantization_config, and load back

* address satrat comment

* compressed_tensors to compressed-tensors and revert back is_serializable

* rename quant_method from sparseml to compressed-tensors

* tests

* edit tests

* clean up tests

* make style

* cleanup

* cleanup

* add test skip for when compressed tensors is not installed

* remove pydantic import + style

* delay torch import in test

* initial docs

* update main init for compressed tensors config

* make fix-copies

* docstring

* remove fill_docstring

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* review comments

* review comments

* comments - suppress warnings on state dict load, tests, fixes

* bug-fix - remove unnecessary call to apply quant lifecycle

* run_compressed compatibility

* revert changes not needed for compression

* no longer need unexpected keys fn

* unexpected keys not needed either

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* add to_diff_dict

* update docs and expand testing

* Update _toctree.yml with compressed-tensors

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update doc

* add note about saving a loaded model

---------

Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Sara Adkins <sara.adkins65@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Dipika <dipikasikka1@gmail.com>
2024-09-25 14:31:38 +02:00