transformers/docs/source/zh
fxmarty 80377eb018
F.scaled_dot_product_attention support (#26572)
* add sdpa

* wip

* cleaning

* add ref

* yet more cleaning

* and more :)

* wip llama

* working llama

* add output_attentions=True support

* bigcode sdpa support

* fixes

* gpt-bigcode support, require torch>=2.1.1

* add falcon support

* fix conflicts falcon

* style

* fix attention_mask definition

* remove output_attentions from attnmaskconverter

* support whisper without removing any Copied from statement

* fix the mbart default-to-eager renaming

* fix typo in falcon

* fix is_causal in SDPA

* check is_flash_attn_2_available in the model's init as well, in case the model is not initialized through from_pretrained

* add warnings when falling back on the manual implementation

* make the doc more precise

* wip: replace _flash_attn_2_enabled with config.attn_implementation (see the usage sketch after this list)

* fix typo

* add tests

* style

* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it in place

* obey config.attn_implementation if a config is passed to from_pretrained

* fix is_torch_sdpa_available when torch is not installed

* remove dead code

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bart/modeling_bart.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove duplicate pretraining_tp code

* add dropout in llama

* make the comment on attn_mask more precise

* add fmt: off for _unmask_unattended docstring

* make the num_masks comment more precise

* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion

* cleanup modeling_utils

* backward compatibility

* fix style as requested

* style

* improve documentation

* test pass

* style

* add _unmask_unattended tests

* skip meaningless tests for idefics

* hard_check SDPA requirements when specifically requested

* standardize the use of XXX_ATTENTION_CLASSES

* fix SDPA bug with mem-efficient backend on CUDA when using fp32

* fix test

* rely on the SDPA is_causal parameter to handle the causal mask in some cases (see the sketch after this commit log)

* fix FALCON_ATTENTION_CLASSES

* remove _flash_attn_2_enabled occurrences

* fix test

* add OPT to the list of supported flash models

* improve test

* properly test on different SDPA backends and different dtypes, and handle the pad tokens separately in the test

* remove remaining _flash_attn_2_enabled occurrence

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/perf_infer_gpu_one.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove use_attn_implementation

* fix docstring & slight bug

* make attn_implementation internal (_attn_implementation)

* typos

* fix tests

* deprecate use_flash_attention_2=True

* fix test

* add back llama that was removed by mistake

* fix tests

* remove _flash_attn_2_enabled occurrences, again

* add check & test that passed attn_implementation is valid

* fix falcon torchscript export

* fix device of mask in tests

* add tip about torch.jit.trace and move bt doc below sdpa

* fix parameterized.expand order

* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there

* update the SDPA attention class with the new cache

* Update src/transformers/configuration_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bark/modeling_bark.py

* address review comments

* WIP torch.jit.trace fix; left to do: test both eager & sdpa

* add test for torch.jit.trace for both eager/sdpa

* fix falcon with torch==2.0 that needs to use sdpa

* fix doc

* hopefully last fix

* fix key_value_length, which no longer has a default in the mask converter

* is it flaky?

* fix speculative decoding bug

* tests do pass

* fix following #27907
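
Taken together, the items above replace the old per-model _flash_attn_2_enabled flag with a single attn_implementation setting resolved in from_pretrained. A minimal usage sketch of that interface follows; the checkpoint name and dtype are illustrative assumptions, not part of this change:

```python
import torch
from transformers import AutoModelForCausalLM

# Request the torch.nn.functional.scaled_dot_product_attention code path added by this PR.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # illustrative checkpoint, not part of this PR
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
)

# The deprecated use_flash_attention_2=True flag maps to attn_implementation="flash_attention_2",
# and the original manual attention stays available as attn_implementation="eager";
# any other string is rejected by the validity check mentioned above.
```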

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-09 05:38:14 +09:00
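
For reference, the new SDPA attention classes dispatch to the PyTorch primitive torch.nn.functional.scaled_dot_product_attention. A minimal sketch of that call with illustrative tensor shapes (torch >= 2.0 is required in general, torch >= 2.1.1 for the gpt-bigcode path as noted above):

```python
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 16, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# With no padding mask, is_causal=True lets SDPA build the causal mask itself,
# which is what the "rely on the SDPA is_causal parameter" item above refers to.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```
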
internal translate internal folder files to chinese (#27638) 2023-12-04 10:04:28 -08:00
main_classes F.scaled_dot_product_attention support (#26572) 2023-12-09 05:38:14 +09:00
_toctree.yml translate internal folder files to chinese (#27638) 2023-12-04 10:04:28 -08:00
accelerate.md Translated the accelerate.md file of the documentation to Chinese (#26161) 2023-10-11 10:54:22 -07:00
autoclass_tutorial.md translate autoclass_tutorial to chinese (#27269) 2023-11-03 09:16:55 -07:00
big_models.md translate big_models.md and performance.md to chinese (#27334) 2023-11-08 08:48:46 -08:00
create_a_model.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
custom_models.md 🌐 [i18n-ZH] Translate custom_models.md into Chinese (#27065) 2023-10-25 11:20:32 -07:00
debugging.md translate debugging.md to chinese (#27374) 2023-11-08 14:04:06 -08:00
fast_tokenizers.md [i18n-ZH] Translated fast_tokenizers.md to Chinese (#26910) 2023-10-18 10:45:41 -07:00
hpo_train.md translate hpo_train.md and perf_hardware.md to chinese (#27431) 2023-11-14 09:57:17 -08:00
index.md docs(zh): review and punctuation & space fix (#26627) 2023-10-06 09:24:28 -07:00
installation.md docs(zh): review and punctuation & space fix (#26627) 2023-10-06 09:24:28 -07:00
llm_tutorial.md translate model_sharing.md and llm_tutorial.md to chinese (#27283) 2023-11-07 15:34:33 -08:00
model_sharing.md translate model_sharing.md and llm_tutorial.md to chinese (#27283) 2023-11-07 15:34:33 -08:00
multilingual.md 🌐 [i18n-ZH] Translate multilingual into Chinese (#26935) 2023-10-23 10:35:17 -07:00
peft.md translate peft.md to chinese (#27215) 2023-11-02 10:42:29 -07:00
perf_hardware.md docs: replace torch.distributed.run by torchrun (#27528) 2023-11-27 16:26:33 +00:00
perf_torch_compile.md Perf torch compile (#27422) 2023-11-13 09:46:40 -08:00
performance.md translate big_models.md and performance.md to chinese (#27334) 2023-11-08 08:48:46 -08:00
pipeline_tutorial.md Translate pipeline_tutorial.md to chinese (#26954) 2023-10-23 08:58:00 -07:00
preprocessing.md Broken links fixed related to datasets docs (#27569) 2023-11-17 13:44:09 -08:00
quicktour.md docs(zh): review and punctuation & space fix (#26627) 2023-10-06 09:24:28 -07:00
run_scripts.md docs: replace torch.distributed.run by torchrun (#27528) 2023-11-27 16:26:33 +00:00
serialization.md 🌐 [i18n-ZH] Translate serialization.md into Chinese (#27076) 2023-10-30 08:50:29 -07:00
task_summary.md Translate task summary to chinese (#27180) 2023-11-01 09:28:34 -07:00
tf_xla.md Perf torch compile (#27422) 2023-11-13 09:46:40 -08:00
tflite.md 🌐 [i18n-ZH] Translate tflite.md into Chinese (#27134) 2023-10-31 12:50:48 -07:00
tokenizer_summary.md translate the en tokenizer_summary.md to Chinese (#27291) 2023-11-07 15:31:51 -08:00
training.md Broken links fixed related to datasets docs (#27569) 2023-11-17 13:44:09 -08:00
transformers_agents.md translate transformers_agents.md to Chinese (#27046) 2023-10-27 12:45:43 -07:00