# SpQR

The [SpQR](https://github.com/Vahe1994/SpQR) quantization algorithm involves a 16x16 tiled bi-level group 3-bit quantization structure with sparse outliers, as detailed in [SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression](https://arxiv.org/abs/2306.03078).

To quantize a model with SpQR, refer to the [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR) repository.

Load a pre-quantized SpQR model with [`~PreTrainedModel.from_pretrained`].

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

quantized_model = AutoModelForCausalLM.from_pretrained(
    "elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf",
    torch_dtype=torch.half,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf")
```
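
Once loaded, the model can be used like any other Transformers model. The snippet below is a minimal sanity-check sketch using the standard `generate` API; the prompt and generation parameters are illustrative, not part of the SpQR workflow.

```python
# Illustrative prompt; any text works here.
inputs = tokenizer("The key advantage of 3-bit quantization is", return_tensors="pt").to(quantized_model.device)

# Generate a short continuation with the quantized weights.
outputs = quantized_model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```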