* Add fast image processor support for Swin2SR
* Add Swin2SR tests of fast image processing
* Update docs and remove unnecessary test func
* Fix docstring formatting
* Skip fast vs slow processing test
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* i guessreverted all CdGen classes
* style
* llava onevision
* fix copies
* fix some tests
* some more tests
* dump
* skip these
* nevermind, i am dumb
* revert fix not needed
* fixup
* fixup
* another fixup
* more fixup to make ci finally happy
* fixup after rebasing
* fix qwen tests
* add internVL + typos here and there
* image token index -> id
* style
* fix init weights
* revert blip-2 not supported
* address comments
* fix copies
* revert blip2 test file as well
* as discussed internally, revert back CdGen models
* fix some tests
* fix more tests for compile
* CI red
* fix copies
* enumerate explicitly allowed models
* address comments
* fix tests
* fixup
* style again
* add tests for new model class
* another fixup ( x _ x )
* [fixup] unused attributes can be removed post-deprecation
* Enable granite speech 3.3 tests
* skip sdpa test for granite speech
* Explicitly move model to device
* Use granite speech 2b in tests
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* args keep_torch_compile=False in _save and _wwrap_method
* Fix FSDP execution on evaluation for torch_compile mode
* add test trainer FSDP + Torch Compile
* fix quality code
* make style
* Revert " make style"
This reverts commit 77e797f8829c50992cc21496be3d9a3e480e1c97.
* make style
* enable xpu in test_trainer
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* enhance _device_agnostic_dispatch to cover value
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
* add default values for torch not available case
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
* [fix] one pixel should be added when length is odd
* [fix] add vision_aspect_ratio args & typo
* [fix] style
* [fix] do not fix fast file directly
* [fix] convert using modular
* remove duplicate codes
* match unpad logic with pad logic
* test odd-sized images for llava & aria
* test unpad odd-sized padding for llava family
* fix style
* add kwarg to onvision modular
* move vision_aspect_ratio from image_processor to processor
(llava_onevision)
* add num_tokens_to_discard to the forward of Dinov2ForImageClassification
* redefine forward in modular file, remove change to modeling_dinov2 file
* run make fixup
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Implements last migrations for generation from `config.vocab_size` to `config.get_text_config().vocab.size`
In doing so, we enable multimodal models to fully leverage all existing generation features.
* Let notification service succeed even when artifacts and reported jobs on github have mismatch
* Use default trace msg if no trace msg available
* Add pop_default helper fn
* style
Summary:
Currently when we try to quantize input_embedding for some models, the output embedding
(lm_head) will also be quantized the same way, since they are tied, and this may not be what
we want. To break the tie, we added the option to allow people to
1. load unquantized weight
2. tie weights
3. quantize
so that the tie will be broken
Test Plan:
```
from transformers import (
AutoModelForCausalLM,
AutoProcessor,
AutoTokenizer,
TorchAoConfig,
)
from torchao.quantization.quant_api import (
IntxWeightOnlyConfig,
Int8DynamicActivationIntxWeightConfig,
AOPerModuleConfig
)
from torchao.quantization.granularity import PerGroup, PerAxis
import torch
model_id = "microsoft/Phi-4-mini-instruct"
embedding_config = IntxWeightOnlyConfig(
weight_dtype=torch.int8,
granularity=PerAxis(0),
)
linear_config = Int8DynamicActivationIntxWeightConfig(
weight_dtype=torch.int4,
weight_granularity=PerGroup(32),
weight_scale_dtype=torch.bfloat16,
)
quant_config = AOPerModuleConfig({"_default": linear_config, "model.embed_tokens": embedding_config})
quantization_config = TorchAoConfig(quant_type=quant_config, include_embedding=True, untie_embedding_weights=True)
quantized_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32, device_map="auto", quantization_config=quantization_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)
print(quantized_model)
print("embed_tokens.weight:", quantized_model.model.embed_tokens.weight)
print("lm head weight:", quantized_model.lm_head.weight)
from transformers.modeling_utils import find_tied_parameters
print(find_tied_parameters(quantized_model))
```
Reviewers:
Subscribers:
Tasks:
Tags:
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* aligning for vllm
* using input shape rather than attn outputs
* remove demo
* revert Conv1D
* style
* style
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix copies
* Apply suggestions from code review
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* adding docs about vllm
* chore: style
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* rm already deprecated padding max length
* truncate_strategy AS AN ARG is already deprecated for a few years
* fix
* rm test_padding_to_max_length
* rm pad_to_max_length=True in other tests
* rm from common
* missed fnet
* Support `AOPerModuleConfig` and include_embedding
Summary:
This PR adds support per module configuration for torchao
Also added per module quantization examples:
1. Quantizing different layers with different quantization configs
2. Skip quantization for certain layers
Test Plan:
python tests/quantization/torchao_integration/test_torchao.py -k test_include_embedding
python tests/quantization/torchao_integration/test_torchao.py -k test_per_module_config_skip
Reviewers:
Subscribers:
Tasks:
Tags:
* format
* format
* inlcude embedding remove input embedding from module not to convert
* more docs
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_torchao.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update src/transformers/quantizers/quantizer_torchao.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Enhance documentation to explain chat-based few-shot prompting
Updates the documentation on few-shot prompting to illustrate how to structure examples using the chat-based format for instruction-tuned models.
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix typos
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Support FlaxPreTrainedModel to load model checkpoint from subfolder in local directory as safetensors format
Signed-off-by: Yan Zhao <zhao.y4@northeastern.edu>