Mohamed Mekkouri
47cc4da351
Changing the test model in Quanto KV cache ( #36670 )
...
changing model
2025-03-13 12:23:34 +01:00
Mohamed Mekkouri
0013ba61e5
Fix Failing GPTQ tests ( #36666 )
...
fix tests
2025-03-12 20:03:02 +01:00
Mohamed Mekkouri
a7fbab33ae
Fix Expected output for compressed-tensors tests ( #36425 )
...
fix
2025-02-26 21:17:24 +01:00
Fanli Lin
c3700b0eee
[tests] enable autoawq tests on XPU ( #36327 )
...
add autoawq
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-25 13:38:09 +01:00
Dmitry Rogozhkin
b4b9da6d9b
tests: revert change of torch_require_multi_gpu to be device agnostic ( #35721 )
...
* tests: revert change of torch_require_multi_gpu to be device agnostic
Commit 11c27dd33 modified `torch_require_multi_gpu()` to be device agnostic
instead of CUDA specific. This broke some tests which are rightfully
CUDA specific, such as:
* `tests/trainer/test_trainer_distributed.py::TestTrainerDistributed`
In the current Transformers test architecture, `require_torch_multi_accelerator()`
should be used to mark multi-GPU tests that are agnostic to device.
This change addresses the issue introduced by 11c27dd33 and reverts the
modification of `torch_require_multi_gpu()`.
Fixes: 11c27dd33 ("Enable BNB multi-backend support (#31098 )")
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* fix bug: modification of frozen set
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-25 13:36:10 +01:00
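For the entry above, a minimal sketch of how the two markers are meant to be used; the decorator names are taken verbatim from the commit message and assumed to be importable from `transformers.testing_utils`:
```
# Sketch only; names follow the commit message above.
from transformers.testing_utils import (
    require_torch_multi_accelerator,
    torch_require_multi_gpu,
)


@torch_require_multi_gpu  # CUDA-specific: runs only with 2+ CUDA GPUs
def test_trainer_distributed():
    ...


@require_torch_multi_accelerator  # device-agnostic: 2+ accelerators of any backend
def test_model_parallelism():
    ...
```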
jiqing-feng
9d6abf9778
enable torchao quantization on CPU ( #36146 )
...
* enable torchao quantization on CPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix int4
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable CPU torchao tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix cuda tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix cpu tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix style
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix cuda tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix torchao available
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix torchao available
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix torchao config that cannot be converted to JSON
* fix docs
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* rm to_dict to rebase
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* limited torchao version for CPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix skip
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Update src/transformers/testing_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix cpu test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-25 11:06:52 +01:00
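A hedged sketch of what the CPU path enabled above might look like in use; the model id is a placeholder, and `Int4CPULayout` plus the exact kwargs are assumptions about the torchao API rather than details from this commit:
```
# Hypothetical sketch: int4 weight-only torchao quantization on CPU.
import torch
from torchao.dtypes import Int4CPULayout  # assumed CPU layout class
from transformers import AutoModelForCausalLM, TorchAoConfig

quant_config = TorchAoConfig("int4_weight_only", group_size=128, layout=Int4CPULayout())
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # placeholder model id
    device_map="cpu",
    torch_dtype=torch.bfloat16,
    quantization_config=quant_config,
)
```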
Jerry Zhang
2af272c101
Add autoquant support for torchao quantizer ( #35503 )
...
* Add autoquant support for torchao quantizer
Summary:
As titled; also verified that the autoquantized model can be saved and loaded:
save: https://gist.github.com/jerryzh168/01d367aaf44dbbbfd4068a4a10a00061
load: https://gist.github.com/jerryzh168/d5c6c401b2abdf18e0b6771341f1525c
Test Plan:
tested locally with above script
model uploaded to https://huggingface.co/jerryzh168/llama3-8b-autoquant
Reviewers:
Subscribers:
Tasks:
Tags:
* add test
* ruff fix
* ruff reformat
* add docs and min_sqnr support
* format
* format
* fix test
* update doc
* format
* remove disable_compile
* format
2025-02-24 15:54:16 +01:00
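A hedged sketch of the autoquant flow added above (the linked gists hold the verified scripts); the model id is a placeholder, and `finalize_autoquant()` plus the save flags follow the torchao integration docs and are assumptions here:
```
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

model_id = "facebook/opt-125m"  # placeholder; the commit used a llama3-8b model
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=TorchAoConfig("autoquant")
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# autoquant picks per-layer quantization during the first forward passes,
# so run a warm-up generation before finalizing
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=8)
model.finalize_autoquant()  # assumed finalization step before saving

model.save_pretrained("opt-125m-autoquant", safe_serialization=False)
```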
Rahul Tuli
884a8ea1f0
Improve model loading for compressed tensor models ( #36152 )
...
* Disable warnings for stacked compressors
* Introduce two new hooks in HfQuantizer lifecycle
to allow updates to missing and unexpected keys
* Update missing and unexpected keys
for stacked compressors
* Add tests
* Fix: run_compressed cases
* Fix: uncompressed cases
* Rename compressed_tensor folder to compressed_tensors
Move RunCompressedTest to the same file
Update tests to unittest
2025-02-24 13:47:21 +01:00
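The two lifecycle hooks mentioned above, sketched with assumed names, signatures, and filter logic (the real HfQuantizer API may differ):
```
# Illustrative only -- hook names, signatures, and the suffix are assumptions.
class HfQuantizer:
    def update_missing_keys(self, model, missing_keys, prefix):
        # default: keep the keys reported by the state-dict load untouched
        return missing_keys

    def update_unexpected_keys(self, model, unexpected_keys, prefix):
        return unexpected_keys


class CompressedTensorsHfQuantizer(HfQuantizer):
    def update_missing_keys(self, model, missing_keys, prefix):
        # stacked compressors materialize some tensors themselves, so drop
        # their keys instead of warning about them (hypothetical suffix)
        return [k for k in missing_keys if not k.endswith(".weight_scale")]
```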
Fanli Lin
4dbf17c17f
[tests] enable bnb tests on xpu ( #36233 )
...
* fix failed test
* fix device
* fix more device cases
* add more cases
* fix empty cache
* Update test_4bit.py
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-24 11:30:15 +01:00
Fanli Lin
7c5bd24ffa
[tests] make quanto tests device-agnostic ( #36328 )
...
* make device-agnostic
* name change
2025-02-21 14:20:40 +01:00
andrewor14
fdcfdbfd22
Fix TorchAoConfig not JSON serializable ( #36206 )
...
**Summary:** TorchAoConfig optionally contains a
`torchao.dtypes.Layout` object which is a dataclass and not
JSON serializable, and so the following fails:
```
import json
from torchao.dtypes import TensorCoreTiledLayout
from transformers import TorchAoConfig
config = TorchAoConfig("int4_weight_only", layout=TensorCoreTiledLayout())
config.to_json_string()
json.dumps(config.to_dict())
```
This also causes `quantized_model.save_pretrained(...)` to
fail because the first step of this call is to JSON serialize
the config. Fixes https://github.com/pytorch/ao/issues/1704 .
**Test Plan:**
python tests/quantization/torchao_integration/test_torchao.py -k test_json_serializable
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-18 11:05:42 +01:00
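One plausible shape of such a fix, as a sketch rather than the actual patch: flatten the `Layout` dataclass into plain JSON data before dumping.
```
import dataclasses


def layout_to_jsonable(layout):
    # Record the concrete layout class plus its dataclass fields; the result
    # survives json.dumps, unlike the raw dataclass instance.
    return {"layout_class": type(layout).__name__, **dataclasses.asdict(layout)}
```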
David LaPalomento
b45cf0e90a
Guard against unset resolved_archive_file ( #35628 )
...
* archive_file may not be specified
When loading a pre-trained model from a gguf file, resolved_archive_file may not be set. Guard against that case in the safetensors availability check.
* Remap partial disk offload to cpu for GGUF files
GGUF files don't support disk offload, so attempt to remap them to the CPU when device_map is auto. If device_map is anything other than None, raise a NotImplementedError.
* Don't remap auto device_map and raise RuntimeError
If device_map=auto and modules are selected for disk offload, don't attempt to map them to any other device. Raise a runtime error when a GGUF model is configured to map any modules to disk.
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-14 14:44:31 +01:00
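A sketch of the guard described above; the function name and error message are illustrative, not the actual patch:
```
def _validate_gguf_device_map(device_map, is_gguf_model):
    if not is_gguf_model or device_map is None:
        return
    # device_map="auto" resolves to a dict earlier; any module mapped to
    # "disk" means partial disk offload, which GGUF loading cannot support.
    if isinstance(device_map, dict) and "disk" in device_map.values():
        raise RuntimeError(
            "Some modules were assigned to disk offload, but GGUF models do not "
            "support disk offload. Free up memory or pass device_map=None."
        )
```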
Mohamed Mekkouri
cb586a3999
Add require_read_token to fp8 tests ( #36189 )
...
fix
2025-02-14 12:27:35 +01:00
Andrei Panferov
5f726f8b8e
New HIGGS quantization interfaces, JIT kernel compilation support. ( #36148 )
...
* new flute
* new higgs working
* small adjustments
* progress and quality
* small updates
* style
---------
Co-authored-by: Andrey Panferov <panferov.andrey3@wb.ru>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-02-14 12:26:45 +01:00
Elvir Crnčević
845b0a2616
Efficient Inference Kernel for SpQR ( #34976 )
...
* Resolve vptq conflict
* Rename spqr package to spqr_quant
* Get rid of aqlm mention
* Start working on tests
* Resolve ruff code checks
* Ruff format
* Isort
* Test updates
* Add gpu tag
* Rename to modules_to_not_convert
* Config update
* Docs and config update
* Docs and config update
* Update to update_torch_dtype
* spqr config parameter validation
* Ruff update
* Apply ruff fixes
* Test fixes
* Ruff update
* Mark tests as @slow again; Ruff; Docstring update
* Ruff
* Remove absolute path
* Resolve typo
* Remove redundant log
* Check accelerate/spqr availability
* Ruff fix
* Check if the config contains proper shapes
* Ruff test
* Documentation update
* overview update
* Ruff checks
* Ruff code quality
* Make style
* Update docs/source/en/quantization/spqr.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update spqr.md
* Enable gptqmodel (#35012 )
* gptqmodel
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update readme
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* gptqmodel needs to use checkpoint_format (#1 )
* gptqmodel needs to use checkpoint_format
* fix quantize
* Update quantization_config.py
* Update quantization_config.py
* Update quantization_config.py
---------
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
* Revert quantizer_gptq.py (#2 )
* revert quantizer_gptq.py change
* pass **kwargs
* limit gptqmodel and optimum version
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix warning
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix version check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert unrelated changes
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable gptqmodel tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix requires gptq
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Fix Transformer compat (#3 )
* revert quantizer_gptq.py change
* pass **kwargs
* add meta info
* cleanup
* cleanup
* Update quantization_config.py
* hf_select_quant_linear pass checkpoint_format and meta
* fix GPTQTestCUDA
* Update test_gptq.py
* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2
* cleanup
* add backend
* cleanup
* cleanup
* no need check exllama version
* Update quantization_config.py
* lower checkpoint_format and backend
* check none
* cleanup
* Update quantization_config.py
* fix self.use_exllama == False
* spell
* fix unittest
* fix unittest
---------
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format again
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update gptqmodel version (#6 )
* update gptqmodel version
* update gptqmodel version
* fix unit test (#5 )
* update gptqmodel version
* update gptqmodel version
* "not self.use_exllama" is not equivalent to "self.use_exllama==False"
* fix unittest
* update gptqmodel version
* backend is loading_attributes (#7 )
* fix format and tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix memory check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix device mismatch
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix result check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Update src/transformers/quantizers/quantizer_gptq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_gptq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_gptq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* update tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* review: update docs (#10 )
* review: update docs (#12 )
* review: update docs
* fix typo
* update tests for gptqmodel
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update document (#9 )
* update overview.md
* cleanup
* Update overview.md
* Update overview.md
* Update overview.md
* update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
---------
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
* typo
* doc note for asymmetric quant
* typo with apple silicon(e)
* typo for marlin
* column name revert: review
* doc rocm support
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/overview.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/overview.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix : Nemotron Processor in GGUF conversion (#35708 )
* fixing nemotron processor
* make style
* Update docs/source/en/quantization/spqr.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add missing TOC to doc
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-02-13 16:22:58 +01:00
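For the SpQR part of the entry above: checkpoints are pre-quantized, so loading one is a plain `from_pretrained` call. A sketch, with a placeholder repo id:
```
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf",  # placeholder repo id
    torch_dtype=torch.half,
    device_map="auto",
)
```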
Mohamed Mekkouri
efe72fe21f
Adding FP8 Quantization to transformers ( #36026 )
...
* first commit
* adding kernels
* fix create_quantized_param
* fix quantization logic
* end2end
* fix style
* fix imports
* fix consistency
* update
* fix style
* update
* update after review
* make style
* update
* update
* fix
* update
* fix docstring
* update
* update after review
* update
* fix scheme
* update
* update
* fix
* update
* fix docstring
* add source
* fix test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-13 13:01:19 +01:00
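A hedged sketch of the new FP8 path, assuming it is exposed as `FineGrainedFP8Config`; the model id is a placeholder:
```
from transformers import AutoModelForCausalLM, FineGrainedFP8Config

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # placeholder model id
    device_map="cuda",
    quantization_config=FineGrainedFP8Config(),
)
```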
湛露先生
1590c66430
Fix typos in ggml test. ( #36060 )
...
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-02-06 15:32:40 +00:00
Isotr0py
e57b459997
Split and clean up GGUF quantization tests ( #35502 )
...
* clean up ggml test
Signed-off-by: Isotr0py <2037008807@qq.com>
* port remaining tests
Signed-off-by: Isotr0py <2037008807@qq.com>
* further cleanup
Signed-off-by: Isotr0py <2037008807@qq.com>
* format
Signed-off-by: Isotr0py <2037008807@qq.com>
* fix broken tests
Signed-off-by: Isotr0py <2037008807@qq.com>
* update comment
Signed-off-by: Isotr0py <2037008807@qq.com>
* fix
Signed-off-by: Isotr0py <2037008807@qq.com>
* reorganize tests
Signed-off-by: Isotr0py <2037008807@qq.com>
* k-quants use qwen2.5-0.5B
Signed-off-by: Isotr0py <2037008807@qq.com>
* move ggml tokenization test
Signed-off-by: Isotr0py <2037008807@qq.com>
* remove dead code
Signed-off-by: Isotr0py <2037008807@qq.com>
* add assert for serialization test
Signed-off-by: Isotr0py <2037008807@qq.com>
* use str for parameterize
Signed-off-by: Isotr0py <2037008807@qq.com>
---------
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-27 15:46:57 +01:00
Arthur
b912f5ee43
use torch.testing.assert_close instead to get more details about errors in CIs ( #35659 )
...
* use torch.testing.assert_close instead to get more details about errors in CIs
* fix
* style
* test_all
* revert for I bert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strict
* ok I won't be strict
* skip and be done
* up
2025-01-24 16:55:28 +01:00
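The pattern this change moves toward: on failure, `torch.testing.assert_close` reports which elements mismatch and the greatest absolute/relative differences, unlike a bare boolean `allclose` assert. A small self-contained example:
```
import torch

actual = torch.tensor([1.0, 2.0, 3.0001])
expected = torch.tensor([1.0, 2.0, 3.0])

# before: a failure only says the assertion was False
assert torch.allclose(actual, expected, atol=1e-3)

# after: a failure reports which elements differ and by how much
torch.testing.assert_close(actual, expected, rtol=0.0, atol=1e-3)
```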
Mohamed Mekkouri
a7738f5a89
Fix : Nemotron tokenizer for GGUF format ( #35836 )
...
fix nemotron gguf
2025-01-22 12:28:40 +01:00
Mohamed Mekkouri
dbd8474125
Fix : BLOOM tie_word_embeddings in GGUF ( #35812 )
...
* fix bloom ggml
* fix falcon output
* make style
2025-01-21 15:35:54 +01:00
Mohamed Mekkouri
b80e334e71
Skip Falcon 7B GGML Test ( #35783 )
...
skip test
2025-01-20 15:00:34 +01:00
jiqing-feng
387663e571
Enable gptqmodel ( #35012 )
...
* gptqmodel
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update readme
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* gptqmodel needs to use checkpoint_format (#1 )
* gptqmodel needs to use checkpoint_format
* fix quantize
* Update quantization_config.py
* Update quantization_config.py
* Update quantization_config.py
---------
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
* Revert quantizer_gptq.py (#2 )
* revert quantizer_gptq.py change
* pass **kwargs
* limit gptqmodel and optimum version
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix warning
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix version check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert unrelated changes
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable gptqmodel tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix requires gptq
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Fix Transformer compat (#3 )
* revert quantizer_gptq.py change
* pass **kwargs
* add meta info
* cleanup
* cleanup
* Update quantization_config.py
* hf_select_quant_linear pass checkpoint_format and meta
* fix GPTQTestCUDA
* Update test_gptq.py
* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2
* cleanup
* add backend
* cleanup
* cleanup
* no need check exllama version
* Update quantization_config.py
* lower checkpoint_format and backend
* check none
* cleanup
* Update quantization_config.py
* fix self.use_exllama == False
* spell
* fix unittest
* fix unittest
---------
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format again
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update gptqmodel version (#6 )
* update gptqmodel version
* update gptqmodel version
* fix unit test (#5 )
* update gptqmodel version
* update gptqmodel version
* "not self.use_exllama" is not equivalent to "self.use_exllama==False"
* fix unittest
* update gptqmodel version
* backend is loading_attributes (#7 )
* fix format and tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix memory check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix device mismatch
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix result check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Update src/transformers/quantizers/quantizer_gptq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_gptq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_gptq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* update tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* review: update docs (#10 )
* review: update docs (#12 )
* review: update docs
* fix typo
* update tests for gptqmodel
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update document (#9 )
* update overview.md
* cleanup
* Update overview.md
* Update overview.md
* Update overview.md
* update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
* Update gptq.md
---------
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
* typo
* doc note for asymmetric quant
* typo with apple silicon(e)
* typo for marlin
* column name revert: review
* doc rocm support
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/gptq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/overview.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/overview.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-15 14:22:49 +01:00
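At the API level, quantizing through the GPTQ path that this PR routes to GPTQModel looks unchanged; the new `checkpoint_format`/`backend` loading attributes are handled internally. A minimal sketch with a placeholder model id:
```
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)
```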
Mohamed Mekkouri
a11041ffad
Fix : add require_read_token for gemma2 gated model ( #35687 )
...
fix gemma2 gated model test
2025-01-14 11:47:05 +01:00
Mohamed Mekkouri
df2a812e95
Fix expected output for ggml test ( #35686 )
...
fix expected output
2025-01-14 11:46:55 +01:00
Mohamed Mekkouri
050636518a
Fix : HQQ config when hqq not available ( #35655 )
...
* fix
* make style
* adding require_hqq
* make style
2025-01-14 11:37:37 +01:00
Fanli Lin
2fa876d2d8
[tests] make cuda-only tests device-agnostic ( #35607 )
...
* initial commit
* remove unrelated files
* further remove
* Update test_trainer.py
* fix style
2025-01-13 14:48:39 +01:00
Yijun Lee
e5fd865eba
Add Gemma2 GGUF support ( #34002 )
...
* initial setup for ggml.py
* initial setup of GGUFGemma2Converter class
* Add gemma2 model to gguf.md doc
* Partial work on GGUF_TENSOR_MAPPING
* initial setup of GGUF_TENSOR_MAPPING for Gemma2
* refactor: rename GemmaConvert class to GemmaConverter for naming consistency
* feat: complete gemma2 tensor mapping implementation
* feat: add initial implementation of GGUFGemmaConverter
* feat: complete GGUFGemmaConverter implementation
* feat: add test code for gemma2
* refactor: minor code cleanup
* refactor: minor code cleanup
* fix: resolve suggestions
* Update tests/quantization/ggml/test_ggml.py
Co-authored-by: Isotr0py <2037008807@qq.com>
---------
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-03 14:50:07 +01:00
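Loading a GGUF checkpoint through transformers, as enabled for Gemma2 above; the repo and file names below are placeholders:
```
from transformers import AutoModelForCausalLM, AutoTokenizer

gguf_repo = "bartowski/gemma-2-2b-it-GGUF"  # placeholder repo id
gguf_file = "gemma-2-2b-it-Q4_K_M.gguf"     # placeholder file name

tokenizer = AutoTokenizer.from_pretrained(gguf_repo, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(gguf_repo, gguf_file=gguf_file)
```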
Matthew Douglas
6b1e86fd4d
Fix new BNB test failures ( #35345 )
2025-01-02 11:24:52 +01:00
Andrei Panferov
64c05eecd6
HIGGS Quantization Support ( #34997 )
...
* higgs init
* working with crunches
* per-model workspaces
* style
* style 2
* tests and style
* higgs tests passing
* protecting torch import
* removed torch.Tensor type annotations
* torch.nn.Module inheritance fix maybe
* hide inputs inside quantizer calls
* style structure something
* Update src/transformers/quantizers/quantizer_higgs.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* reworked num_sms
* Update src/transformers/integrations/higgs.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* revamped device checks
* docstring upd
* Update src/transformers/quantizers/quantizer_higgs.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* edited tests and device map assertions
* minor edits
* updated flute cuda version in docker
* Added p=1 and 2,3bit HIGGS
* flute version check update
* incorporated `modules_to_not_convert`
* less hardcoding
* Fixed comment
* Added docs
* Fixed gemma support
* example in docs
* fixed torch_dtype for HIGGS
* Update docs/source/en/quantization/higgs.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Collection link
* dequantize interface
* newer flute version, torch.compile support
* unittest message fix
* docs update compile
* isort
* ValueError instead of assert
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2024-12-23 16:54:49 +01:00
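A hedged sketch of on-the-fly HIGGS quantization, assuming the interface is a `HiggsConfig` quantization config; the model id and the `bits` argument are placeholders/assumptions:
```
from transformers import AutoModelForCausalLM, HiggsConfig

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",  # placeholder model id
    quantization_config=HiggsConfig(bits=4),
    device_map="cuda",
)
```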
Mohamed Mekkouri
59178780a6
Fix : VPTQ test ( #35394 )
...
fix_test
2024-12-23 16:27:46 +01:00
wejoncy
4e27a4009d
FEAT : Adding VPTQ quantization method to HFQuantizer ( #34770 )
...
* init vptq
* add integration
* add vptq support
fix readme
* add tests && format
* format
* address comments
* format
* format
* address comments
* format
* address comments
* remove debug code
* Revert "remove debug code"
This reverts commit ed3b3eaaba.
* fix test
---------
Co-authored-by: Yang Wang <wyatuestc@gmail.com>
2024-12-20 09:45:53 +01:00
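VPTQ checkpoints are pre-quantized, so loading one is a plain `from_pretrained` call. A sketch, with a placeholder repo id:
```
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "VPTQ-community/Meta-Llama-3.1-8B-Instruct-v8-k65536-256-woft",  # placeholder
    device_map="auto",
)
```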
jiqing-feng
69e31eb1bf
change bnb tests ( #34713 )
...
* fix training tests
* fix xpu check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* rm pdb
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix 4bit logits check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix 4bit logits check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* add xpu check on int8 training
* fix training tests
* add llama test on bnb
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* only cpu and xpu disable autocast training
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
2024-12-18 09:49:59 -05:00
Mohamed Mekkouri
85eb339231
Fix : model used to test ggml conversion of Falcon-7b is incorrect ( #35083 )
...
fixing test model
2024-12-16 13:21:44 +01:00
George
e4e404fdd0
Run model as compressed/uncompressed mode ( #34719 )
...
* draft, run model as compressed/uncompressed mode
* draft
* run run_compressed=False
* run_compressed as attr
* set run_compressed=False using quantization_config
* remove redundant line
* make is_qat_trainable dependent on run_compressed status
* add tests
* lint
* full in docstring
* add decompress
* comments
* decompress if model is compressed and not run_compressed
* apply_quant_config logic fix -- populate state dict properly
* comments
* remove non-compressed model
* make is_compressed a property
* cosmetic
* run apply_quant_config for non-compressed models -- populate scales and zero points
* add pathway for decompressing sparse models
* typo on is_quantization_compressed
* lint
* fix typo
2024-12-13 08:23:31 +01:00
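A sketch of the switch described above, assuming `run_compressed` is read from the compressed-tensors quantization config; the repo id and the override are assumptions:
```
from transformers import AutoModelForCausalLM, CompressedTensorsConfig

# run_compressed=True (default): execute in the compressed format.
# run_compressed=False: decompress weights at load time and run dense.
model = AutoModelForCausalLM.from_pretrained(
    "nm-testing/tinyllama-w4a16-compressed",  # placeholder compressed checkpoint
    quantization_config=CompressedTensorsConfig(run_compressed=False),
)
```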
Matthew Douglas
34f4080ff5
[CI] Fix bnb quantization tests with accelerate>=1.2.0 ( #35172 )
2024-12-09 13:55:16 -05:00
Mohamed Mekkouri
7238387f67
Fix typo in EETQ Tests ( #35160 )
...
fix
2024-12-09 14:13:36 +01:00
Mohamed Mekkouri
0e805e6d1e
Skipping aqlm non working inference tests till fix merged ( #34865 )
2024-11-26 11:09:30 +01:00
Mohamed Mekkouri
890ea7de93
Fix failing GGML test ( #34871 )
...
fix_test
2024-11-25 18:04:52 +01:00
Mohamed Mekkouri
4e6b19cd95
Fix : BitNet tests ( #34895 )
...
* fix_tests_bitnet
* fix format
2024-11-25 16:47:14 +01:00
Mohamed Mekkouri
54be2d7ae8
Bitnet test fix to avoid using gated model ( #34863 )
...
small test fix
2024-11-22 17:18:49 +01:00
farrosalferro
c57eafdaa1
Add Nemotron GGUF Loading Support ( #34725 )
...
* Add Nemotron GGUF Loading Support
* fix the Nemotron architecture assignation
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-11-21 11:37:34 +01:00
Marc Sun
3cb8676a91
Fix CI by tweaking torchao tests ( #34832 )
2024-11-20 20:28:51 +01:00
Marc Sun
67890de3b8
Torchao weights only + prequantized compatibility ( #34355 )
...
* weights-only compatibility
* better tests from code review
* pin torch version
* add weights_only check
2024-11-20 17:24:45 +01:00
Isotr0py
e83aaaa86b
Fix `use_parallel_residual` and `qkv_bias` for StableLM GGUF config extraction ( #34450 )
...
* fix stablelm qkv_bias
* fix stablelm qkv_bias and use_parallel_residual
* remove original_model.config for stablelm gguf test
2024-11-05 18:26:20 +01:00
Benjamin Bossan
5e1fd4e204
FIX: Broken repr of TorchAoConfig ( #34560 )
...
FIX Broken repr of TorchAoConfig
The __repr__ method references a non-existent self.kwargs. This is now
fixed.
There does not appear to be a uniform way of defining __repr__ for
quantization configs. I copied the method as implemented for HQQ:
e2ac16b28a/src/transformers/utils/quantization_config.py (L285-L287)
2024-11-05 10:26:13 +01:00
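The HQQ-style pattern the fix copies, paraphrased as a sketch (the class body below is a stand-in, not the actual config):
```
import json


class SomeQuantizationConfig:
    def to_dict(self):
        return {"quant_method": "torchao", "quant_type": "int4_weight_only"}

    def __repr__(self):
        # serialize the config dict instead of touching a non-existent self.kwargs
        config_dict = self.to_dict()
        return f"{self.__class__.__name__} {json.dumps(config_dict, indent=2, sort_keys=True)}\n"
```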
Vladislav Bronzov
5251fe6271
Add GGUF for Mamba ( #34200 )
...
* add mamba architecture for gguf
* add logic for weights conversion, some fixes and refactoring
* add lm_head layers, unit test refactoring
* more fixes for tests
* remove lm_head creation
* remove unused comments
2024-10-30 16:52:17 +01:00
Marc Sun
004530aa05
Fix regression loading dtype ( #34409 )
...
* fix regression
* add test for torchao
* expected output
* better fix
2024-10-29 11:41:04 +01:00
Matthew Douglas
e447185b1f
Fix bnb training test failure ( #34414 )
...
* Fix bnb training test: compatibility with OPTSdpaAttention
2024-10-25 10:23:20 -04:00
김준재
dd267fca72
Add T5 GGUF loading support ( #33389 )
...
* add: GGUFT5Converter
* add: tensormapping for t5
* add: test code for t5
* fix: Remove whitespace from blank line
* add: t5 fp16 tests
* fix: whitespace formatting
* fix: minor formatting
* fix: test every weight
2024-10-24 15:10:59 +02:00