* v1 fusing modules
* add fused mlp support
* up
* fix CI
* block save_pretrained
* fixup
* small fix
* add new condition
* add v1 docs
* add some comments
* style
* fix nit
* adapt from suggestion
* add check
* change arg names
* change variables name
* Update src/transformers/integrations/awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* style
* split up into 3 different private methods
* more conditions
* more checks
* add fused tests for custom models
* fix
* fix tests
* final update docs
* final fixes
* fix importlib metadata
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change it to `do_fuse`
* nit
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* few fixes
* revert
* fix test
* fix copies
* raise error if model is not quantized
* add test
* use quantization_config.config when fusing
* Update src/transformers/modeling_utils.py
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>