* Optimize Qwen2VL vision model by precomputing cos/sin embeds before ViT blocks
* Make rotary_pos_emb optional & fix type
* Adapt pre-computed cos/sin to Qwen2.5VL
* More concise
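The gist of the optimization above, as a minimal sketch: the cos/sin tables are derived from the rotary angles once, before the loop over ViT blocks, rather than inside every attention layer. Block names and signatures here are illustrative, not the exact diff:
```python
import torch

def vision_blocks_forward(blocks, hidden_states, rotary_pos_emb):
    # rotary_pos_emb: per-position rotation angles from the vision rotary module.
    # Duplicate to the full head dim and take cos/sin ONCE, up front...
    emb = torch.cat((rotary_pos_emb, rotary_pos_emb), dim=-1)
    position_embeddings = (emb.cos(), emb.sin())
    # ...then pass the precomputed pair to every block instead of recomputing
    # it inside each attention layer.
    for block in blocks:
        hidden_states = block(hidden_states, position_embeddings=position_embeddings)
    return hidden_states
```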
* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :(
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
* fix training issues
* Update
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Add implementation for DataCollatorForMultipleChoice based on docs.
* Add DataCollatorForMultipleChoice to import structure.
* Remove custom DataCollatorForMultipleChoice implementations from example scripts.
* Remove custom implementations of DataCollatorForMultipleChoice from docs in English, Spanish, Japanese and Korean.
* Refactor torch version of DataCollatorForMultipleChoice to be more easily understandable.
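For reference, the collator follows the pattern the docs previously inlined in each example: flatten the (batch, num_choices) pairs so the tokenizer can pad them, then unflatten. A sketch based on that docs version (the library class may differ in details):
```python
from dataclasses import dataclass

import torch
from transformers.tokenization_utils_base import PreTrainedTokenizerBase


@dataclass
class DataCollatorForMultipleChoice:
    tokenizer: PreTrainedTokenizerBase

    def __call__(self, features):
        label_name = "label" if "label" in features[0] else "labels"
        labels = [feature.pop(label_name) for feature in features]
        batch_size = len(features)
        num_choices = len(features[0]["input_ids"])
        # Flatten (batch, num_choices) into one list so tokenizer.pad can handle it.
        flattened = [
            [{k: v[i] for k, v in feature.items()} for i in range(num_choices)]
            for feature in features
        ]
        flattened = sum(flattened, [])
        batch = self.tokenizer.pad(flattened, padding=True, return_tensors="pt")
        # Unflatten back to (batch, num_choices, seq_len) and reattach labels.
        batch = {k: v.view(batch_size, num_choices, -1) for k, v in batch.items()}
        batch["labels"] = torch.tensor(labels, dtype=torch.int64)
        return batch
```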
* Apply suggested changes and run make fixup.
* fix copies, style and fixup
* add missing documentation
* nits
* fix docstring
* style
* nits
* isort
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* update env command to log deepspeed version
* suppress deepspeed import logging
* Add reminder to include configs to repro description in bug report.
* make fixup
* [WIP] update import utils for deepspeed
* Change to using is_deepspeed_available() from integrations.
* make fixup
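The availability check itself is straightforward; a usage sketch of what the env command can now do:
```python
from transformers.integrations import is_deepspeed_available

if is_deepspeed_available():
    import deepspeed

    # Log the version in the env report instead of importing unconditionally.
    print(f"DeepSpeed version: {deepspeed.__version__}")
```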
* change order of unmasking of tokens
* library import
* class setup
* test function
* refactor
* add commit message
* test modified
* explicit initialisation of weights + made model smaller
* removed separate testing file
* fixup
* fixup core
* test attention mask with token types
* tests fixup
* removed PaliGemmaAttentionMaskTest class
---------
Co-authored-by: sambhavnoobcoder <indosambahv@gmail.com>
* Adding option to save/reload scaler
* Removing duplicate variable
* Adding save/reload test
* Small fixes on deterministic algorithm call
* Moving LLM test to another file to isolate its environment
* Moving back to old file and using subprocess to run the test in isolation
* Reverting accidental change
* Reverting accidental change
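The save/reload itself rests on the standard torch GradScaler API; a minimal sketch of the pattern the Trainer checkpointing now wires in (paths illustrative):
```python
import torch

scaler = torch.cuda.amp.GradScaler()
# ... training steps using scaler.scale(loss).backward() and scaler.update() ...

# Save the scaler state alongside the rest of the checkpoint.
torch.save(scaler.state_dict(), "checkpoint/scaler.pt")

# On resume, restore it into a fresh scaler.
scaler = torch.cuda.amp.GradScaler()
scaler.load_state_dict(torch.load("checkpoint/scaler.pt"))
```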
* multi-gpu: fix inputs_embeds + position_embeds
Fixing the following errors in a few models:
```
> hidden_states = inputs_embeds + pos_embeds
E RuntimeError: Expected all tensors to be on the same device, but found at least two devices, xpu:2 and xpu:3!
```
Fixes: #35762
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
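The fix pattern, sketched (the actual diff touches several models): move the positional embeddings onto the device holding the input embeddings before the add, so models sharded across devices (e.g. via device_map="auto") don't mix xpu:2 and xpu:3 tensors:
```python
import torch

def add_position_embeddings(inputs_embeds: torch.Tensor, pos_embeds: torch.Tensor) -> torch.Tensor:
    # Align devices before the elementwise add; a no-op when they already match.
    return inputs_embeds + pos_embeds.to(inputs_embeds.device)
```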
* multi-gpu: fix tensor device placements for various models
Fixes: #35762
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* Apply make fix-copies
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* feat: added warning to Trainer when label_names is not specified for PeftModel
* Update trainer.py
* feat: peft detect with `_is_peft_model`
* Update src/transformers/trainer.py
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* Applied formatting in trainer.py
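A sketch of the added check (`_is_peft_model` is a private helper in transformers.trainer; the exact warning text here is illustrative):
```python
import logging

from transformers.trainer import _is_peft_model

logger = logging.getLogger(__name__)

def maybe_warn_missing_label_names(model, args):
    # PEFT wrappers hide the base model's forward signature, so the Trainer
    # cannot infer label names; ask the user to set them explicitly.
    if _is_peft_model(model) and not args.label_names:
        logger.warning(
            "No label_names provided for a PeftModel. "
            "Set label_names in TrainingArguments so losses are computed correctly."
        )
```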
---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* add RAdamScheduleFree optimizer
* revert schedulefree version to the minimum requirement
* refine is_schedulefree_available so that it can take min_version
* refine documentation
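Usage sketch, assuming the new optimizer follows the existing schedule-free naming (e.g. schedule_free_adamw); requires the schedulefree package:
```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="schedule_free_radam",  # assumed name, matching the schedule-free family
)
```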
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add `base_model_pp_plan` to `PretrainedConfig`
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add `_pp_plan` to `PreTrainedModel`
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add both to Llama for testing
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Fix type error
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Update to suggested schema
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* `_pp_plan` keys are not patterns
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Simplify schema
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Fix typing error
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Update input name for Llama
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan to Aria
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan to Bamba
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan to Cohere 1 & 2
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan to diffllama and emu3
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan to Gemma 1 & 2
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan to GLM and GPT NeoX
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan to Granite and Helium
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan to Mistral and Mixtral
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan to OLMo 1 & 2
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan to Phi and Phi 3
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan for Qwen 2, 2 MoE, 2 VL and 2.5 VL
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add pp plan for Starcoder 2
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Add enum for accessing inputs and outputs
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Update type hints to use tuples
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* Change outer list to tuple
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
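Putting the commits above together, the resulting schema on e.g. LlamaConfig maps module names to tuples of input/output tensor names, roughly:
```python
# Sketch of the final schema: keys are plain module names (not patterns),
# values are (input names, output names) tuples.
base_model_pp_plan = {
    "embed_tokens": (["input_ids"], ["inputs_embeds"]),
    "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
    "norm": (["hidden_states"], ["hidden_states"]),
}
```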
---------
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* update awq doc
* Update docs/source/en/quantization/awq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/awq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/awq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/awq.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add note for inference
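For the inference note: loading a pre-quantized AWQ checkpoint is the usual from_pretrained call (model id illustrative; AWQ models are quantized ahead of time, not at load):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mistral-7B-OpenOrca-AWQ"  # any AWQ checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```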
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* make output_dir optional
* initiated a basic testing module to validate and verify the changes
* Test that output_dir defaults to 'tmp_trainer' when unspecified.
* test existing functionality of output_dir.
* test that output_dir is only created when needed
* final check
* added docstring and changed the default from tmp_trainer to trainer_output
* make style fixes to test file.
* another round of fixup
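Net effect, sketched (default name per the commits above):
```python
from transformers import TrainingArguments

args = TrainingArguments()   # output_dir is now optional
print(args.output_dir)       # expected: "trainer_output" when unspecified
```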
---------
Co-authored-by: sambhavnoobcoder <indosambahv@gmail.com>