* add: initial script to train clm fim
* fix: if training model from scratch, new tokens will be added and embeddings resized
* fix: fixed attention_mask errors when generating FIM data
* fix: file formatted using black
* add: run_fim_no_trainer.py and fixed some comments in run_fim.py
* add: added fim examples to the README.md and ran code fixup
* fix: little bug in both fim training scripts
* fix: remove comment from notebook and added a note on fim related params
* fix: minor typo in README
* add: suggested minor changes to README and run_fim.py
* add: gradient_accumulation_steps and gradient_checkpointing args
* add: improved model embedding resizing
* add: pad_to_multiple_of and attn_implementation params
* add: requested minor changes
* add: deepspeed zero compatibility
* add: resize embeddings layer with zero3 support for fim model initialization
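For context on what the FIM scripts above train on, here is a minimal sketch of the prefix-suffix-middle rearrangement used in fill-in-the-middle training. The sentinel strings and the `to_psm` helper are assumptions for illustration only, not the code in run_fim.py; this sketch works on characters with fixed split points to stay deterministic.

```python
# Illustrative sentinels (assumed); the real tokens are configurable and may differ.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def to_psm(text: str, start: int, end: int) -> str:
    """Rearrange text into prefix-suffix-middle order around text[start:end]."""
    if not 0 <= start <= end <= len(text):
        raise ValueError("need 0 <= start <= end <= len(text)")
    prefix, middle, suffix = text[:start], text[start:end], text[end:]
    # The model sees prefix and suffix first and learns to generate the middle last.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


example = to_psm("def add(a, b): return a + b", 15, 21)
```

Because sentinels like these are new vocabulary entries, the tokenizer and the embedding matrix have to grow to cover them, which is what the embedding-resizing commits above address.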
* fix stablelm dropout argument type error
* fix docs of _flash_attention_forward
* fix all docs of _flash_attention_forward
* fix docs of _flash_attention_forward in starcoder2
---------
Co-authored-by: oliang <oliang@tencent.com>
* fix image-to-text batch incorrect output issue
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add ci test
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* update ci test
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* left-padding test revisited
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Set `inputs` as kwarg in `TextClassificationPipeline`
This change aligns the `TextClassificationPipeline` with the rest of the pipelines and makes calls such as `pipeline(**{"inputs": "text"})` possible, which did not work while `*args` was used instead.
* Add `noqa: C409` on `tuple([inputs],)`
Even though it is discouraged by the linter, the cast `tuple(list(...),)` is required here, as otherwise the original list in `inputs` will be transformed into a `tuple` and the elements 1...N will be ignored by the `Pipeline`.
* Run `ruff format`
* Simplify `tuple` conversion with `(inputs,)`
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
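The two points in this change (accepting `inputs` as a keyword, and wrapping it in a one-element tuple) can be illustrated in plain Python. `old_style` and `new_style` are hypothetical stand-ins, not the actual `TextClassificationPipeline` signatures.

```python
def old_style(*args, **kwargs):
    # Hypothetical: the text is only picked up positionally,
    # so a call using inputs=... leaves args empty.
    return args


def new_style(inputs, **kwargs):
    # Hypothetical: `inputs` is a named parameter; the one-element tuple
    # keeps a list of texts together as a single argument.
    return (inputs,)


payload = {"inputs": ["first text", "second text"]}

seen_by_old = old_style(**payload)    # nothing arrives positionally
seen_by_new = new_style(**payload)    # the whole list arrives as one item
unpacked = tuple(payload["inputs"])   # a bare tuple() would split the list
```

`tuple([inputs],)` and `(inputs,)` build the same one-element tuple, which is why the later commit could simplify the expression without changing behavior.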
* Fix: Disable torch.autocast in RotaryEmbedding of Gemma and LLaMa for MPS devices
* Update src/transformers/models/gemma/modeling_gemma.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update llama and gemma rope to use cpu on mps device
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* try to fix gemma mem use
* fix: handle attention mask dim==2 case
* remove logits=logits.float()
* clean up + add llama
* apply formatting
* readability edit: swap order of items being multiplied
* revert change unrelated to PR
* revert black autoformat
* switch to one .to
* Accept style edits
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* added the max_matching_ngram_size parameter into the GenerationConfig, for the PromptLookupCandidateGenerator
* switched back to keyword arguments
* added PromptLookupCandidateGenerator docstring for its parameters
* ruff reformat
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
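To show what a maximum matching n-gram size controls in prompt lookup, here is a simplified pure-Python sketch: it looks for the longest trailing n-gram (up to the limit) that also occurs earlier in the sequence and proposes the tokens that followed it. `lookup_candidates` and its argument names are illustrative assumptions, not the `PromptLookupCandidateGenerator` implementation.

```python
from typing import List


def lookup_candidates(
    tokens: List[int],
    max_matching_ngram_size: int = 2,
    num_output_tokens: int = 3,
) -> List[int]:
    """Return tokens that followed an earlier occurrence of the trailing n-gram."""
    # Try the largest allowed n-gram first, then fall back to shorter ones.
    for size in range(min(max_matching_ngram_size, len(tokens) - 1), 0, -1):
        ngram = tokens[-size:]
        # Scan earlier windows only; the trailing n-gram itself is excluded.
        for start in range(len(tokens) - size):
            if tokens[start:start + size] == ngram:
                return tokens[start + size:start + size + num_output_tokens]
    return []


history = [5, 2, 9, 1, 2, 3, 7, 1, 2]
with_bigrams = lookup_candidates(history, max_matching_ngram_size=2)
with_unigrams = lookup_candidates(history, max_matching_ngram_size=1)
```

In this sketch a larger limit prefers longer, more specific matches, so the two calls above can propose different continuations for the same history.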
* Fix TrainingArguments regression with torch <2.0.0 for dataloader_prefetch_factor
dataloader_prefetch_factor was added to TrainingArguments in #28498 with the default value None, but torch versions <2.0.0 do not accept None and will raise an error if num_workers == 0 and prefetch_factor != 2.
* Add is_torch_available() check
* Use is_torch_greater_or_equal_than_2_0
add back check for dataloader_prefetch_factor
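One way to guard against this kind of regression is to forward `prefetch_factor` only when the installed torch can accept the value. The sketch below is an assumption-laden illustration in plain Python, not necessarily what these commits implement: `dataloader_kwargs` and the `torch_at_least_2_0` flag are hypothetical names.

```python
from typing import Any, Dict, Optional


def dataloader_kwargs(
    num_workers: int,
    prefetch_factor: Optional[int],
    torch_at_least_2_0: bool,
) -> Dict[str, Any]:
    """Build DataLoader keyword arguments without passing None to older torch."""
    kwargs: Dict[str, Any] = {"num_workers": num_workers}
    # Older torch rejects prefetch_factor=None, so leave the key out in that
    # case and let the DataLoader fall back to its own default.
    if torch_at_least_2_0 or prefetch_factor is not None:
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs


old_torch = dataloader_kwargs(0, None, torch_at_least_2_0=False)
new_torch = dataloader_kwargs(0, None, torch_at_least_2_0=True)
explicit = dataloader_kwargs(4, 8, torch_at_least_2_0=False)
```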