RaymondLi0
63caa370e6
Starcoder2 model - bis ( #29215 )
...
* Copy model
* changes
* misc
* fixes
* add embed and residual dropout (#30 )
* misc
* remove rms norm and gated MLP
* remove copied mentions where it's not a copy anymore
* remove unused _shape
* copied from mistral instead
* fix copies
* fix copies
* add not doctested
* fix
* fix copyright
* Update docs/source/en/model_doc/starcoder2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/starcoder2/configuration_starcoder2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/starcoder2/configuration_starcoder2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix doc
* revert some changes
* add fa2 tests
* fix styling nit
* fix
* push dummy docs
---------
Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-02-28 01:24:34 +01:00
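A minimal usage sketch for the Starcoder2 architecture added above (embedding/residual dropout, no RMSNorm or gated MLP, largely copied from Mistral), assuming the `bigcode/starcoder2-7b` checkpoint name on the Hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; substitute the Starcoder2 checkpoint you use.
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-7b")
model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-7b")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```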
fxmarty
2cc8cf6ce7
Fix torch.compile with fullgraph=True when attention_mask input is used ( #29211 )
...
* fix torch.export.export for llama
* do not change doc title
* make fix copies
2024-02-22 16:40:06 +01:00
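The fix above removes a graph break on the `attention_mask` path; a hedged sketch of the pattern it enables (the commit body targets llama, and the checkpoint name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# fullgraph=True forbids graph breaks, so the attention_mask branch
# must now trace end to end.
compiled = torch.compile(model, fullgraph=True)
inputs = tokenizer("Hello", return_tensors="pt")
out = compiled(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask)
```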
Arthur
594c1277b2
[gemma] Adds support for Gemma 💎 ( #29167 )
...
* initial commit
* update
* update conversion checkpoint
* update conversion script
* nits
* some fixes
* nits
* merge
* fix permute
* nits
* fix
* nits
* nits
* nits
* fix rope
* fix both rope
* nits
* style
* make sure flax works
* fix flax init code
* fix forward
* nits
* print flax generation out
* current code
* nits
* SIIIIIIIIIIIIIIIIIII
* update
* add new tokenizer
* correct fast tokenizer
* fix conversion
* more comments
* fix modeling and conversion
* nits and nits
* nits testing
* add some tokenization tests
* add some edge cases
* add slow tests and fix them
* fixup
* fix copies for modeling
* fix copies
* add 7B slow tests
* fix
* fix
* fix tests
* make tokenizer CIs go green
* styling
* last tokenizer nits
* update jax tests
* fix flax for 7b
* add jit testing 🤗
* cleanups
* isolated nit, inv_freq for rotary_emb.inv_freq
* propagate to jax
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adjust test
* fix conversion script
* change name
* correct file names
* update conversion script
* Fix bos and eos token ids in the model configuration (#3 )
* update modelling
* update conversion script
* add static cache for gemma
* fix sdpa generate
* fix batched
* multiple fixes
* fix FA2
* final fix
* Rename a few missing strings and filenames (#4 )
* merge with upstream main
* fix copies
* fix copies
* fix fixup
* fix fixup
* fix
* fix
* final tests
* fix fx gemma tests
* fix fx bf16/fp16 tests
* update slow fx tests
* fx slow tests: one logits, one generation
* move jit test standalone
* Apply suggestions from code review
* nits
* tokenizer updates
* more tokenization updates: custom GemmaSentencepieceExtrator
* style
* Update src/transformers/cache_utils.py
* Update src/transformers/models/gemma/__init__.py
* Update tests/models/gemma/test_modeling_flax_gemma.py
* small nits
* style
* update tokenization test
* fix the rotary embedding
* with style
* fix slow tests
* WARNING this commit might be very important for precisions
* Update tests/models/gemma/test_modeling_flax_gemma.py
* Update src/transformers/models/gemma/configuration_gemma.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update src/transformers/models/gemma/modeling_flax_gemma.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* small nits here and there!
* forgotten nit
* remove on the fly computation of inv_freq
* revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_modeling_flax_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_tokenization_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update tests/models/gemma/test_modeling_gemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* nit conversion script link
* fix some tests
* add not doctest and pr doctest
* repo consistency
* fix last CIs 🚀
* update all readmes
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-02-21 14:21:28 +01:00
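A minimal sketch of loading the model this PR adds, including the static KV cache it mentions ("add static cache for gemma"); the `google/gemma-7b` checkpoint name and the opt-in interface are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")

# Opt in to the static KV cache the PR wires in (assumed interface).
model.generation_config.cache_implementation = "static"

inputs = tokenizer("The capital of France is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```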
Ekaterina Aidova
1d0ea7abe0
support SDPA Attention in stablelm ( #29106 )
...
* support SDPA Attention in stablelm
* add integration test
* add fallback for output_attentions
* Update src/transformers/models/stablelm/modeling_stablelm.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/stablelm/test_modeling_stablelm.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/stablelm/modeling_stablelm.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* handle non-contiguous states
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-21 13:12:49 +01:00
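Opting in to the SDPA path added here is a one-line change; note the fallback from the commit list: requesting `output_attentions=True` still routes through the eager implementation. A sketch, assuming the `stabilityai/stablelm-3b-4e1t` checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # falls back to eager when output_attentions=True
)
```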
JB (Don)
b8b16475d4
[Phi] Add support for sdpa ( #29108 )
2024-02-20 14:33:12 +01:00
Lysandre Debut
f497f564bb
Update all references to canonical models ( #29001 )
...
* Script & Manual edition
* Update
2024-02-16 08:16:58 +01:00
Jonathan Tow
de6029a059
Add StableLM ( #28810 )
...
* Add `StableLM`
* fix(model): re-create from `huggingface-cli add-new-model-like persimmon`
* fix: re-add changes to address comments
* fix(readme): add links to paper
* fix(tokenization_auto): remove `GPTNeoXTokenizerFastFast` ref
* fix(tests): re-add `@slow` decorator to integration tests
* fix(tests): import slow...
* fix(readme_hd): remove whitespace edit
* fix(tokenizer): auto tokenizer tuple
* skip doctests for `modeling_stablelm`
2024-02-14 07:15:18 +01:00
Junyang Lin
d6ffe74dfa
Add qwen2 ( #28436 )
...
* add config, modeling, and tokenization
* add auto and init
* update readme
* update readme
* update team name
* fixup
* fixup
* update config
* update code style
* update for fixup
* update for fixup
* update for fixup
* update for testing
* update for testing
* fix bug for config and tokenization
* fix bug for bos token
* not doctest
* debug tokenizer
* not doctest
* debug tokenization
* debug init for tokenizer
* fix style
* update init
* delete if in token auto
* add tokenizer doc
* add tokenizer in init
* Update dummy_tokenizers_objects.py
* update
* update
* debug
* Update tokenization_qwen2.py
* debug
* Update convert_slow_tokenizer.py
* add copies
* add copied from and make style
* update files map
* update test
* fix style
* fix merge reading and update tests
* fix tests
* fix tests
* fix style
* debug a variable in readme
* Update src/transformers/models/qwen2/configuration_qwen2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update test and copied from
* fix style
* update qwen2 tokenization and tests
* Update tokenization_qwen2.py
* delete the copied from after property
* fix style
* update tests
* update tests
* add copied from
* fix bugs
* update doc
* add warning for sliding window attention
* update qwen2 tokenization
* fix style
* Update src/transformers/models/qwen2/modeling_qwen2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix tokenizer fast
---------
Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com>
Co-authored-by: renxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-17 16:02:22 +01:00
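A usage sketch for the Qwen2 architecture added above; the checkpoint name below is an assumption, so substitute whichever Qwen2-family model you use:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen1.5-7B-Chat"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer(
    "Give me a short introduction to large language models.",
    return_tensors="pt",
)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```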
Yih-Dar
71f460578d
Update docs/source/en/perf_infer_gpu_one.md ( #28198 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-22 10:40:22 +01:00
Steven Liu
a52e180a0f
[docs] General doc fixes ( #28087 )
...
* doc fix friday
* deprecated objects
* update not_doctested
* update toctree
2023-12-18 10:44:09 -08:00
Younes Belkada
c7f076a00e
Adds VIP-llava to transformers ( #27932 )
...
* v1
* add-new-model-like
* revert
* fix forward and conversion script
* revert
* fix copies
* fixup
* fix
* Update docs/source/en/index.md
* Apply suggestions from code review
* push
* fix
* fixes here and there
* up
* fixup and fix tests
* Apply suggestions from code review
* add docs
* fixup
* fixes
* docstring
* add docstring
* fixup
* docstring
* fixup
* nit
* docs
* more copies
* fix copies
* nit
* update test
2023-12-13 10:42:24 +01:00
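A hedged sketch of the image-plus-text flow for the VipLlava model added above; the checkpoint name and prompt template are assumptions, so check the model card for the exact format:

```python
import requests
from PIL import Image
from transformers import AutoProcessor, VipLlavaForConditionalGeneration

model_id = "llava-hf/vip-llava-7b-hf"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = VipLlavaForConditionalGeneration.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"  # assumed template

inputs = processor(text=prompt, images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```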
Stas Bekman
9936143014
[doc] fix typo ( #27981 )
2023-12-12 20:32:42 +00:00
Arthur
accccdd008
[Add Mixtral] Adds support for the Mixtral MoE ( #27942 )
...
* up
* up
* test
* logits ok
* up
* up
* few fixes
* conversion script
* up
* nits
* nits
* update
* nuke
* more updates
* nits
* fix many issues
* nit
* scatter
* nit
* nuke megablocks
* nits
* fix conversion script
* nit
* remove
* nits
* nit
* update
* oupsssss
* change
* nits device
* nits
* fixup
* update
* merge
* add copied from
* fix the copy mentions
* update tests
* more fixes
* nits
* conversion script
* add parts of the readme
* Update tests/models/mixtral/test_modeling_mixtral.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* new test + conversion script
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix copies
* fix copies
* ooops
* fix config
* Apply suggestions from code review
* fix nits
* nit
* add copies
* add batched tests
* docs
* fix flash attention
* let's add more verbose
* add correct outputs
* support router outputs
* ignore copies where needed
* fix
* cat list if list is given for now
* nits
* Update docs/source/en/model_doc/mixtral.md
* finish router refactoring
* fix forward
* fix expected values
* nits
* fixup
* fix
* fix bug
* fix
* fix dtype mismatch
* fix
* grrr grrr I support item assignment
* fix CI
* docs
* fixup
* remove some copied from
* fix weird diff
* skip doctest fast on the config and modeling
* mark that is supports flash attention in the doc
* update
* Update src/transformers/models/mixtral/modeling_mixtral.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update docs/source/en/model_doc/mixtral.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert router logits config issue
* update doc accordingly
* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
* nits
* use torch testing assert close
* fixup
* doc nits
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00
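Several commits above revolve around router logits ("support router outputs", "finish router refactoring"); a sketch of inspecting them, assuming the `mistralai/Mixtral-8x7B-v0.1` checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("Hello my name is", return_tensors="pt")
# output_router_logits exposes the per-layer MoE routing scores.
outputs = model(**inputs, output_router_logits=True)
print(outputs.router_logits[0].shape)  # routing scores for the first layer
```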
fxmarty
80377eb018
F.scaled_dot_product_attention support ( #26572 )
...
* add sdpa
* wip
* cleaning
* add ref
* yet more cleaning
* and more :)
* wip llama
* working llama
* add output_attentions=True support
* bigcode sdpa support
* fixes
* gpt-bigcode support, require torch>=2.1.1
* add falcon support
* fix conflicts falcon
* style
* fix attention_mask definition
* remove output_attentions from attnmaskconverter
* support whisper without removing any Copied from statement
* fix mbart default to eager renaming
* fix typo in falcon
* fix is_causal in SDPA
* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained
* add warnings when falling back on the manual implementation
* precise doc
* wip replace _flash_attn_enabled by config.attn_implementation
* fix typo
* add tests
* style
* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
* obey to config.attn_implementation if a config is passed in from_pretrained
* fix is_torch_sdpa_available when torch is not installed
* remove dead code
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bart/modeling_bart.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove duplicate pretraining_tp code
* add dropout in llama
* precise comment on attn_mask
* add fmt: off for _unmask_unattended docstring
* precise num_masks comment
* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
* cleanup modeling_utils
* backward compatibility
* fix style as requested
* style
* improve documentation
* test pass
* style
* add _unmask_unattended tests
* skip meaningless tests for idefics
* hard_check SDPA requirements when specifically requested
* standardize the use of XXX_ATTENTION_CLASSES
* fix SDPA bug with mem-efficient backend on CUDA when using fp32
* fix test
* rely on SDPA is_causal parameter to handle the causal mask in some cases
* fix FALCON_ATTENTION_CLASSES
* remove _flash_attn_2_enabled occurrences
* fix test
* add OPT to the list of supported flash models
* improve test
* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
* remove remaining _flash_attn_2_enabled occurrence
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/perf_infer_gpu_one.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove use_attn_implementation
* fix docstring & slight bug
* make attn_implementation internal (_attn_implementation)
* typos
* fix tests
* deprecate use_flash_attention_2=True
* fix test
* add back llama that was removed by mistake
* fix tests
* remove _flash_attn_2_enabled occurrences bis
* add check & test that passed attn_implementation is valid
* fix falcon torchscript export
* fix device of mask in tests
* add tip about torch.jit.trace and move bt doc below sdpa
* fix parameterized.expand order
* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
* update sdpaattention class with the new cache
* Update src/transformers/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bark/modeling_bark.py
* address review comments
* WIP torch.jit.trace fix. left: test both eager & sdpa
* add test for torch.jit.trace for both eager/sdpa
* fix falcon with torch==2.0 that needs to use sdpa
* fix doc
* hopefully last fix
* fix key_value_length that has no default now in mask converter
* is it flaky?
* fix speculative decoding bug
* tests do pass
* fix following #27907
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-09 05:38:14 +09:00
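This PR introduces the internal `config._attn_implementation` attribute and the `attn_implementation` argument to `from_pretrained`, deprecating `use_flash_attention_2=True`. A sketch of the user-facing side, plus the torch primitive the "sdpa" path dispatches to (checkpoint name illustrative):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

# Explicit backend selection added by this PR.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", attn_implementation="sdpa"
)

# Under the hood, the sdpa path calls torch's fused kernel, roughly:
q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```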
fxmarty
1da1302ec8
Flash Attention 2 support for RoCm ( #27611 )
...
* support FA2
* fix typo
* fix broken tests
* fix more test errors
* left/right
* fix bug
* more test
* typo
* fix layout flash attention falcon
* do not support this case
* use allclose instead of equal
* fix various bugs with flash attention
* bump
* fix test
* fix mistral
* use skiptest instead of return that may be misleading
* add fix causal arg flash attention
* fix copies
* more explicit comment
* still use self.is_causal
* fix causal argument
* comment
* fixes
* update documentation
* add link
* wrong test
* simplify FA2 RoCm requirements
* update opt
* make flash_attn_uses_top_left_mask attribute private and precise comment
* better error handling
* fix copy & mistral
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/import_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* use is_flash_attn_greater_or_equal_2_10 instead of is_flash_attn_greater_or_equal_210
* fix merge
* simplify
* inline args
---------
Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-12-04 21:52:17 +09:00
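After this PR the flash-attention availability check also accepts the ROCm build; a sketch using the public helper (checkpoint name illustrative):

```python
import torch
from transformers import AutoModelForCausalLM
from transformers.utils import is_flash_attn_2_available

if is_flash_attn_2_available():  # True on supported CUDA and, now, ROCm setups
    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1",
        torch_dtype=torch.float16,
        attn_implementation="flash_attention_2",
    )
```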
fxmarty
c13a43aaf2
Reflect RoCm support in the documentation ( #27636 )
...
* reflect RoCm support in the documentation
* Update docs/source/en/main_classes/trainer.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* fix review comments
* use ROCm instead of RoCm
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-11-25 00:59:17 +09:00
Yoach Lacombe
9dd58c53dd
update Bark FA2 docs ( #27400 )
...
* update Bark FA2 docs
* update benchmark section
* Update bark.md
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* rephrase
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-11-10 13:40:30 +00:00
Steven Liu
77930f8a01
[docs] Update CPU/GPU inference docs ( #26881 )
...
* first draft
* remove non-existent paths
* edits
* feedback
* feedback and optimum
* Apply suggestions from code review
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* redirect to correct doc
* _redirects.yml
---------
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
2023-10-31 09:44:51 -07:00
Susnato Dhar
b5db8ca66f
Add flash attention for gpt_bigcode ( #26479 )
...
* added flash attention of gpt_bigcode
* changed docs
* Update src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py
* add FA-2 docs
* oops
* Update docs/source/en/perf_infer_gpu_one.md Last Nit
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* oops
* remove padding_mask
* change getattr->hasattr logic
* changed .md file
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-31 11:21:02 +00:00
Younes Belkada
ae9a344cce
[Mistral] Add Flash Attention-2 support for mistral ( #26464 )
...
* add FA-2 support for mistral
* fixup
* add sliding windows
* fixing few nits
* v1 slicing cache - logits do not match
* add comment
* fix bugs
* more mem efficient
* add warning once
* add warning once
* oops
* fixup
* more comments
* copy
* add safety checker
* fixup
* Update src/transformers/models/mistral/modeling_mistral.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copied from
* up
* raise when padding side is right
* fixup
* add doc + few minor changes
* fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-03 13:44:46 +02:00
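Two details from this PR matter in practice: the FA2 path respects Mistral's sliding window, and it "raise[s] when padding side is right", so batched generation needs left padding. A sketch (this predates `attn_implementation`, so the original `use_flash_attention_2` flag from that era is shown):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.padding_side = "left"  # the FA2 path raises on right padding

model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,  # later deprecated for attn_implementation=...
).to("cuda")
```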
titi
a8531f3bfd
Deleted duplicate sentence ( #26394 )
2023-09-26 10:11:28 +02:00
Younes Belkada
368a58e61c
[core] Integrate Flash attention 2 in most used models ( #25598 )
...
* v1
* oops
* working v1
* fixup
* add some TODOs
* fixup
* padding support + try with module replacement
* nit
* alternative design
* oops
* add `use_cache` support for llama
* v1 falcon
* nit
* a bit of refactor
* nit
* nits nits
* add v1 padding support falcon (even though it seemed to work before)
* nit
* falcon works
* fixup
* v1 tests
* nit
* fix generation llama flash
* update tests
* fix tests + nits
* fix copies
* fix nit
* test- padding mask
* style
* add more mem efficient support
* Update src/transformers/modeling_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fixup
* nit
* fixup
* remove it from config when saving
* fixup
* revert docstring
* add more checks
* use values
* oops
* new version
* fixup
* add same trick for falcon
* nit
* add another test
* change tests
* fix issues with GC and also falcon
* fixup
* oops
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add init_rope
* updates
* fix copies
* fixup
* fixup
* more clarification
* fixup
* right padding tests
* add docs
* add FA in docker image
* more clarifications
* add some figures
* add todo
* rectify comment
* Change to FA2
* Update docs/source/en/perf_infer_gpu_one.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* split in two lines
* change test name
* add more tests
* some clean up
* remove `rearrange` deps
* add more docs
* revert changes on dockerfile
* Revert "revert changes on dockerfile"
This reverts commit 8d72a66b4b.
* revert changes on dockerfile
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* address some comments
* docs
* use inheritance
* Update src/transformers/testing_utils.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* fixup
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
* final comments
* clean up
* style
* add cast + warning for PEFT models
* fixup
---------
Co-authored-by: Felix Marty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-09-22 17:42:10 +02:00
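The opt-in API this PR ships (since deprecated in favor of `attn_implementation="flash_attention_2"`): FA2 requires half-precision weights and a supported GPU. A sketch, with falcon as one of the "most used models" it covers:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    torch_dtype=torch.bfloat16,  # FA2 only runs in fp16/bf16
    use_flash_attention_2=True,
).to("cuda")
```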
Younes Belkada
dc0c102954
[Docs] More clarifications on BT + FA ( #25823 )
2023-08-29 13:52:25 +02:00
Younes Belkada
940d1a76b0
[Docs / BetterTransformer] Added more details about flash attention + SDPA ( #25265 )
...
* added more details about flash attention
* correct and add more details
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* few modifs
* more details
* up
* Apply suggestions from code review
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
* adapt from suggestion
* Apply suggestions from code review
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
* trigger CI
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix nits and copies
* add new section
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
2023-08-18 10:32:28 +02:00
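For context on the BT + SDPA pairing these docs describe: the BetterTransformer path is enabled through optimum (`pip install optimum`). A minimal sketch:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model = model.to_bettertransformer()       # dispatch attention to torch SDPA
# ... run inference ...
model = model.reverse_bettertransformer()  # restore original modules, e.g. before saving
```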
Younes Belkada
e7e9261a20
[Docs] Fix un-rendered images ( #25561 )
...
fix un-rendered images
2023-08-17 12:08:11 +02:00
Injin Paek
5dba88b2d2
fix: add TOC anchor link ( #25066 )
2023-07-25 08:02:33 -04:00
Joao Gante
4f1b31c2ee
Docs: 4 bit doc corrections ( #24572 )
...
4 bit doc corrections
2023-06-29 13:13:20 +01:00
Sylvain Gugger
eb849f6604
Migrate doc files to Markdown. ( #24376 )
...
* Rename index.mdx to index.md
* With saved modifs
* Address review comment
* Treat all files
* .mdx -> .md
* Remove special char
* Update utils/tests_fetcher.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
---------
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-06-20 18:07:47 -04:00