transformers/docs/source
Younes Belkada 368a58e61c
[core ] Integrate Flash attention 2 in most used models (#25598)
* v1

* oops

* working v1

* fixup

* add some TODOs

* fixup

* padding support + try with module replacement

* nit

* alternative design

* oops

* add `use_cache` support for llama

* v1 falcon

* nit

* a bit of refactor

* nit

* nits nits

* add v1 padding support falcon (even though it seemed to work before)

* nit

* falcon works

* fixup

* v1 tests

* nit

* fix generation llama flash

* update tests

* fix tests + nits

* fix copies

* fix nit

* test- padding mask

* stype

* add more mem efficient support

* Update src/transformers/modeling_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fixup

* nit

* fixup

* remove it from config when saving

* fixup

* revert docstring

* add more checks

* use values

* oops

* new version

* fixup

* add same trick for falcon

* nit

* add another test

* change tests

* fix issues with GC and also falcon

* fixup

* oops

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add init_rope

* updates

* fix copies

* fixup

* fixup

* more clarification

* fixup

* right padding tests

* add docs

* add FA in docker image

* more clarifications

* add some figures

* add todo

* rectify comment

* Change to FA2

* Update docs/source/en/perf_infer_gpu_one.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* split in two lines

* change test name

* add more tests

* some clean up

* remove `rearrange` deps

* add more docs

* revert changes on dockerfile

* Revert "revert changes on dockerfile"

This reverts commit 8d72a66b4b.

* revert changes on dockerfile

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* address some comments

* docs

* use inheritance

* Update src/transformers/testing_utils.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* fixup

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

* final comments

* clean up

* style

* add cast + warning for PEFT models

* fixup

---------

Co-authored-by: Felix Marty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-09-22 17:42:10 +02:00
de          [TYPO] fix typo/format in quicktour.md (#25519)                                    2023-08-16 08:03:23 +02:00
en          [core ] Integrate Flash attention 2 in most used models (#25598)                   2023-09-22 17:42:10 +02:00
es          docs: update link huggingface map (#26077)                                         2023-09-11 12:57:04 +01:00
fr          Fix typos (#25936)                                                                 2023-09-04 11:15:12 +01:00
it          [Docs] Fix un-rendered images (#25561)                                             2023-08-17 12:08:11 +02:00
ja          Migrate doc files to Markdown. (#24376)                                            2023-06-20 18:07:47 -04:00
ko          🌐 [i18n-KO] Translated whisper.md to Korean (#26002)                              2023-09-18 22:12:41 +02:00
ms          Add BROS (#23190)                                                                  2023-09-14 18:02:37 +01:00
pt          docs: update link huggingface map (#26077)                                         2023-09-11 12:57:04 +01:00
zh          Fix small typo README.md (#25934)                                                  2023-09-06 14:07:29 +01:00
_config.py  Adding evaluate to the list of libraries required in generated notebooks (#20850)  2022-12-21 14:04:08 +01:00