* Initial commit for MyT5 model
* custom implementation of MyT5 tokenizer, unused files deleted
* unittest for myt5 tokenizer
* upadate of import structure and style
* removed remmanents of MyT5Config
* fixed docstrings
* Updates after review: filled documentaion file, new docstrings and tests added
* Fixed code style issues
* fixed copied from to refer to function
* updated loading myt5 tokenizer in tests, added sample byte map file to fixtures
* changes after review
* removed redundant copied from
* removed redundant copied from
* optimalization and loading model from hf
* [run_slow] myt5
* [run-slow] myt5
* Updated en documentation for myt5
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* onboard phimoe model
* removed debug code
* added unit tests
* updated docs
* formatted
* fixed unit tests
* fixed test case
* fixed format
* refactored code
* fixed expected outputs in the integration tests
* Added a warning msg
* Addressed comments
* Addressed comments
* fixed test cases
* added paper link
* Addressed comments
* Refactored PhimoeForCausalLM forward fn
* Refactored PhimoeRotaryEmbedding class
* fixed test cases
* fixed testcase
* fixed test case
* Addressed comments
* fixed test cases
* fixed testcases
* Used cache position instead to get the seq len
* intilize new embeddings from normal distrib
* Fix typo in comments
* Fix typo in comments
* Fix style
* Fix variables naming
* Add tests
* Fix style
* code consistency nit
* Add deepspeed support
* Add deepspeed support
* Conver embeddings weights to float32 before computations
* Add deepspeed tests
* Cover when vocab_size is smaller than embedding_size
* Style fix
* Add tests for vocab_size smaller than hiddin_size
* Style fix
* Nits in tests
* Nits in tests
* Check for deepspeed before importing it
* Increase vocab_size for positive definite covariance matrix test
* Add warning
* Add multivariate_resizing flag and implement resizing for lm_heads
* Fix typo
* Fix wrong bias indexing
* Fix bias is zero check
* remove multivariate_resizing flag from tests
* Intialize bias from old bias normal distribution
* Fixup
* Code usability
* Use mean_resizing instead of multivariate_resizing
* Fix up
* Fix comments and docs
* enable cpu awq ipex linear
* add doc for cpu awq with ipex kernel
* add tests for cpu awq
* fix code style
* fix doc and tests
* Update docs/source/en/quantization/awq.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/autoawq/test_awq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix comments
* fix log
* fix log
* fix style
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Error condition bug fix
* Update error message
* Update src/transformers/models/qwen2_vl/modeling_qwen2_vl.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Making change in the rest of the repo
* Formatting
* Formatting with ruff
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add support for `weights_only` flag when loading state_dict
Summary:
This is to enable loading a state_dict with wrapper tensor subclasses (used in torchao to
for quantized weights)
Test Plan:
tested locally with torchao weights, also need https://github.com/huggingface/transformers/pull/32306:
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TorchAoConfig
from torchao.utils import benchmark_model
import torchao
DEVICE_TYPE = "cuda"
def init_model_and_benchmark(model_id, torch_dtype=torch.bfloat16, quantization_config=None):
tokenizer = AutoTokenizer.from_pretrained(model_id)
if quantization_config is not None:
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=DEVICE_TYPE, torch_dtype=torch.\bfloat16, quantization_config=quantization_config)
else:
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=DEVICE_TYPE, torch_dtype=torch.\bfloat16, weights_only=False)
# sanity check: run the model
input_text = "What are we having for dinner?"
input_ids = tokenizer(input_text, return_tensors="pt").to(DEVICE_TYPE)
output = model.generate(**input_ids, max_new_tokens=1000)
print(tokenizer.decode(output[0], skip_special_tokens=True))
NUM_WARMUP = 1
NUM_RUNS = 5
if quantization_config is not None:
torchao.quantization.utils.recommended_inductor_config_setter()
model = torch.compile(model, mode="max-autotune")
benchmark_model(model.generate, NUM_WARMUP, kwargs=input_ids, device_type=DEVICE_TYPE)
print("running benchmark")
results = benchmark_model(model.generate, NUM_RUNS, kwargs=input_ids, device_type=DEVICE_TYPE)
return model, results
model_id = "jerryzh168/test-model"
torchao.quantization.utils.recommended_inductor_config_setter()
bf16_model, bf16_time = init_model_and_benchmark(model_id)
print(f"bf16: {bf16_time}")
```
Reviewers:
Subscribers:
Tasks:
Tags:
* format
* [PEFT] Support low_cpu_mem_usage for PEFT loading
PEFT added support for low_cpu_mem_usage=True when loading adapters in
https://github.com/huggingface/peft/pull/1961. This feature is now
available when installing PEFT v0.13.0. With this PR, this option is
also supported when loading PEFT adapters directly into transformers
models.
Additionally, with this PR,
https://github.com/huggingface/diffusers/pull/9510 will be unblocked,
which implements this option in diffusers.
* Fix typo
* fix beam indices in token_timestamps
* fix attention_mask in FA2
* correct translation example with the right example
* correct how somes tests are using outputs + correct num_frames
* fix shortform batch prev cond tests
* make fix-copies
* make fix-copies
* take care of shifting beam indices
* [run-slow] whisper
* [run-slow] whisper
* Fix: use unidic-lite instead of ipadic as the tokenizer dictionary of Japanese
Signed-off-by: Kan Takahiro <kan@Kans-Mac-mini.local>
* fix the default name
---------
Signed-off-by: Kan Takahiro <kan@Kans-Mac-mini.local>
Co-authored-by: Kan Takahiro <kan@Kans-Mac-mini.local>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* add unit tests for splinter_tokenizer
* add unit test for splinter tokenizer, pass in the question_token to be saved on save_pretrained called
* remove unused import
* remove vocab_splinter.txt, add Copied from, use fmt:on and fmt:off to prevent autoformatting on long lines
* remove all the spaces
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Use all state dict keys when checking if root module is initialized.
* Apply style corrections
* Add comment explaining change.
* Change comment phrasing.
* try fixing push-ci
* move to new runners
* move benchmark.yml to new runners
* move doctest_job.yml to new runners
* move doctests.yml to new runners
* move push-important-models.yml to new runners
* move self-pr-slow-ci.yml to new runners
* fix typo
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix working directory
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix working directory
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* improve code
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Update an keyerror on _save_check_point prevent confusion of missing metric keys
* Update grammar error and case sensitive.
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* adding update KeyError on _evaluate function to align with _save_checkpoint function
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* When we set self.dt_proj.bias = None, it removes the bias parameter from the model. When we later tried to assign a tensor to self.dt_proj.bias, it caused a TypeError because PyTorch expects a Parameter object.
* When we set self.dt_proj.bias = None, it removes the bias parameter from the model. When we later tried to assign a tensor to self.dt_proj.bias, it caused a TypeError because PyTorch expects a Parameter object.
* When we set self.dt_proj.bias = None, it removes the bias parameter from the model. When we later tried to assign a tensor to self.dt_proj.bias, it caused a TypeError because PyTorch expects a Parameter object.
* Trainer - deprecate tokenizer for processing_class
* Extend chage across Seq2Seq trainer and docs
* Add tests
* Update to FutureWarning and add deprecation version