transformers/tests/quantization
Marc Sun 9ea1eacd11
remove .to() restriction for 4-bit model (#33122)
* remove .to() restriction for 4-bit model

* Update src/transformers/modeling_utils.py

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* bitsandbytes: prevent dtype casting while allowing device movement with .to() or .cuda()

* quality fix

* Improve warning message for .to() and .cuda() on bnb quantized models

---------

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
2024-09-02 16:28:50 +02:00
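A minimal sketch of the .to()/.cuda() behavior the commit above describes, assuming a CUDA machine with bitsandbytes installed; the checkpoint name is an arbitrary example, and the exact warning/error text varies across transformers versions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load any causal LM in 4-bit via bitsandbytes
# (the checkpoint here is an arbitrary example).
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Device movement: previously blocked for 4-bit models, now allowed
# (the commit emits a warning here instead of raising).
model = model.to("cuda:0")  # model.cuda() behaves the same way

# Dtype casting is still rejected: the weights are packed 4-bit values
# and cannot simply be cast to a new floating-point dtype
# (a ValueError in current releases).
try:
    model.to(torch.float16)
except ValueError as err:
    print(err)
```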
aqlm_integration Cache: use batch_size instead of max_batch_size (#32657) 2024-08-16 11:48:45 +01:00
autoawq Skip tests properly (#31308) 2024-06-26 21:59:08 +01:00
bnb remove .to() restriction for 4-bit model (#33122) 2024-09-02 16:28:50 +02:00
eetq_integration [FEAT]: EETQ quantizer support (#30262) 2024-04-22 20:38:58 +01:00
fbgemm_fp8 Add new quant method (#32047) 2024-07-22 20:21:59 +02:00
ggml Support dequantizing GGUF FP16 format (#31783) 2024-07-24 17:59:59 +02:00
gptq 🚨 Remove dataset with restrictive license (#31452) 2024-06-17 17:56:51 +01:00
hqq Quantization / HQQ: Fix HQQ tests on our runner (#30668) 2024-05-06 11:33:52 +02:00
quanto_integration Skip tests properly (#31308) 2024-06-26 21:59:08 +01:00
torchao_integration Add TorchAOHfQuantizer (#32306) 2024-08-14 16:14:24 +02:00