* Move normalization for numerical stability
* Apply suggestions from code review
Remove useless x=x line
* PR comment - normalize later to preserve var name meaning
* torchscript and trainer md es translation
* corrected md es files and even corrected spelling in en md
* made es corrections to trainer.md
* deleted entrenamiento... title on yml
* placed entrenamiento in right place
* translated es chat_templating.md w/ yml addition
* requested es changes to md and yml
* last es changes to md
* initial implementation of flash attention for gptj
* modify flash attention and overwrite test_flash_attn_2_generate_padding_right
* update flash attention support list
* remove the copy line in the `CodeGenBlock`
* address copy mechanism
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add GPTJ attention classes
* add expected outputs in the gptj test
* Ensure repo consistency with 'make fix-copies'
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add tests for batching support
* Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* fixes and comments
* use cosine distance for conv models
* skip mra model testing
* Update tests/models/vilt/test_modeling_vilt.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* finalize and make style
* check model type by input names
* Update tests/models/vilt/test_modeling_vilt.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fixed batch size for all testers
* Revert "fixed batch size for all testers"
This reverts commit 525f3a0a05.
* add batch_size for all testers
* dict from model output
* do not skip layoutlm
* bring back some code from git revert
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* clean-up
* where did minus go in tolerance
* make whisper happy
* deal with consequences of losing minus
* deal with consequences of losing minus
* maskformer needs its own test for happiness
* fix more models
* tag flaky CV models from Amy's approval
* make codestyle
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update legacy Repository usage in `examples/pytorch/text-classification/run_glue_no_trainer.py`
Marked for deprecation here https://huggingface.co/docs/huggingface_hub/guides/upload#legacy-upload-files-with-git-lfs
* Fix import order
* Replace all example usage of deprecated Repository
* Fix remaining repo call and rename args variable
* Revert removing creation of gitignore files and don't change research examples
* add: initial script to train clm fim
* fix: if training model from scratch, new tokens will be added and embeddings resized
* fix: fixed attention_mask errors when generating FIM data
* fix: file formatted using black
* add: run_fim_no_trainer.py and fixed some comments in run_fim.py
* add: added fim examples to the README.md and ran code fixup
* fix: little bug in both fim training scripts
* fix: remove comment from notebook and added a note on fim related params
* fix: minor typo in README
* add: suggested minor changes to README and run_fim.py
* add: gradient_accumulation_steps and gradient_checkpointing args
* add: improved model embedding resizing
* add: pad_to_multiple_of and attn_implementation params
* add: requested minor changes
* add: deepspeed zero compatibility
* add: resize embeddings layer with zero3 support for fim model initialization
* fix stablelm dropout argument type error
* fix docs of _flash_attention_forward
* fix all docs of _flash_attention_forward
* fix docs of _flash_attention_forward in starcoder2
---------
Co-authored-by: oliang <oliang@tencent.com>
* fix image-to-text batch incorrect output issue
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add ci test
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* update ci test
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>