transformers/docs/source/en
Younes Belkada 163ac3d3ee
Add Switch transformers (#19323)
* first commit

* add more comments

* add router v1

* clean up

- remove `tf` modeling files

* clean up

- remove `tf` modeling files

* clean up

* v0 routers

* added more routers

- Implemented `ExpertsChooseMaskedRouter`

- added tests
- 2 more routers to implement

* last router

* improved docstring

- completed the docstring in `router.py`
- added more args in the config

* v0 sparse mlp

* replace wrong naming

* forward pass run

* update MOE layer

* small router update

* fixup

* consistency

* remove scatter router

* remove abstract layer

* update test and model for integration testing

* v1 conversion

* update

* hardcode hack

* all keys match

* add gin conversion, without additional libraries

* update conversion script

* delete router file

* update tests wrt router deletion

* fix router issues

* update expert code

* update, logits match, code needs REFACTORING

* Refactor code

Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>

* add generate tests

Co-authored-by: younesbelkada <younesbelkada@gmail.com>

* add support for router loss

Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>

* fix forward error

* refactor a bit

* remove `FlaxSwitchTransformers` modules

* more tests pass

* Update code

Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>

* fixup

* fix tests

* fix doc

* fix doc + tokenization

* fix tokenizer test

* fix test

* fix loss output

* update code for backward pass

* add loss support

* update documentation

* fix documentation, clean tokenizer

* more doc fix, cleanup example_switch

* fix failing test

* fix test

* fix test

* fix loss issue

* move layer

* update doc and fix router capacity usage

* fixup

* add sparse mlp index for documentation on hub

* fixup

* test sparse mix architecture

* Apply suggestions from code review

* Update docs/source/en/model_doc/switch_transformers.mdx

* fixup on update

* fix tests

* fix another test

* attempt fix

* Update src/transformers/models/switch_transformers/configuration_switch_transformers.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/switch_transformers/convert_switch_transformers_original_flax_checkpoint_to_pytorch.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* try

* all tests pass

* fix jitter noise

* Apply suggestions from code review

* doc tests pass

* Update src/transformers/models/switch_transformers/modeling_switch_transformers.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/switch_transformers/modeling_switch_transformers.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove assert

* change config order

* fix readme japanese

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove parallelizable tests + add one liners

* remove ONNX config

* fix nits

- add `T5Tokenizer` in auto mapping
- remove `Switch Transformers` from ONNX supported models

* remove `_get_router`

* remove asserts

* add check in test for `router_dtype`

* add `SwitchTransformersConfig` in `run_pipeline_test`

* Update tests/pipelines/test_pipelines_summarization.py

* add huge model conversion script

* fix slow tests

- add better casting for `Linear8bitLt`
- remove `torchscript` tests

* add make dir

* style on new script

* fix nits

- doctest
- remove `_keys_to_ignore_on_load_unexpected`

* Update src/transformers/models/switch_transformers/configuration_switch_transformers.py

* add google as authors

* fix year

* remove last `assert` statements

* standardize vertical spaces

* fix failing import

* fix another failing test

* Remove strange `authorized_keys`

* removing todo and padding that is never used

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: ybelkada <younes@huggingface.co>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur@huggingface.co>
2022-11-15 13:06:45 +01:00
internal Generate: move generation_*.py src files into generation/*.py (#20096) 2022-11-09 15:34:08 +00:00
main_classes Generate: move generation_*.py src files into generation/*.py (#20096) 2022-11-09 15:34:08 +00:00
model_doc Add Switch transformers (#19323) 2022-11-15 13:06:45 +01:00
tasks docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
_config.py Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
_toctree.yml Add Switch transformers (#19323) 2022-11-15 13:06:45 +01:00
accelerate.mdx update to use interlibrary links instead of Markdown (#18500) 2022-08-08 10:53:52 -05:00
add_new_model.mdx add small updates only (#19847) 2022-10-24 10:18:20 -07:00
add_new_pipeline.mdx Update add_new_pipeline.mdx (#18224) 2022-07-21 07:55:30 +02:00
add_tensorflow_model.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
autoclass_tutorial.mdx Mention TF and Flax checkpoints (#18894) 2022-09-05 11:09:39 +02:00
benchmarks.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
bertology.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
big_models.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
community.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
contributing.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
converting_tensorflow_models.mdx Docs - Guide to add a new TensorFlow model (#19256) 2022-09-30 20:30:38 +01:00
create_a_model.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
custom_models.mdx Replace awkward timm link with the expected one (#20109) 2022-11-07 13:57:39 -05:00
debugging.mdx [doc] debug: fix import (#19042) 2022-09-14 16:29:58 -07:00
fast_tokenizers.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
glossary.mdx add cv + audio labels (#20114) 2022-11-09 07:40:15 -08:00
hpo_train.mdx update doc for perf_train_cpu_many (#19506) 2022-10-11 22:54:19 -04:00
index.mdx Add Switch transformers (#19323) 2022-11-15 13:06:45 +01:00
installation.mdx Move cache folder to huggingface/hub for consistency with hf_hub (#18492) 2022-08-05 13:14:00 -04:00
migration.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
model_sharing.mdx Just re-reading the whole doc every couple of months 😬 (#18489) 2022-08-06 09:38:55 +02:00
model_summary.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
multilingual.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
notebooks.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
pad_truncation.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
perf_hardware.mdx [WIP] [doc] performance/scalability revamp (#15723) 2022-05-16 13:36:41 +02:00
perf_infer_cpu.mdx fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc (#19891) 2022-11-03 10:50:03 -04:00
perf_infer_gpu_many.mdx Update perf_infer_gpu_many.mdx (#18744) 2022-08-24 10:37:52 +02:00
perf_infer_gpu_one.mdx [bnb] Move documentation (#18671) 2022-08-18 17:34:48 +02:00
perf_infer_special.mdx Improve performance docs (#17750) 2022-06-23 14:51:54 +02:00
perf_train_cpu_many.mdx update doc for perf_train_cpu_many (#19506) 2022-10-11 22:54:19 -04:00
perf_train_cpu.mdx fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc (#19891) 2022-11-03 10:50:03 -04:00
perf_train_gpu_many.mdx Improve performance docs (#17750) 2022-06-23 14:51:54 +02:00
perf_train_gpu_one.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
perf_train_special.mdx Improve performance docs (#17750) 2022-06-23 14:51:54 +02:00
perf_train_tpu.mdx Improve performance docs (#17750) 2022-06-23 14:51:54 +02:00
performance.mdx Improve performance docs (#17750) 2022-06-23 14:51:54 +02:00
perplexity.mdx Fix incorrect size of input for 1st strided window length in Perplexity of fixed-length models (#18906) 2022-09-06 15:20:12 -04:00
philosophy.mdx Update philosophy to include other preprocessing classes (#18550) 2022-08-10 13:20:39 -05:00
pipeline_tutorial.mdx Generate: move generation_*.py src files into generation/*.py (#20096) 2022-11-09 15:34:08 +00:00
pr_checks.mdx 📝 update documentation build section (#18548) 2022-08-09 18:22:55 -05:00
preprocessing.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
quicktour.mdx Fix doctest (#20023) 2022-11-02 19:37:25 +01:00
run_scripts.mdx Just re-reading the whole doc every couple of months 😬 (#18489) 2022-08-06 09:38:55 +02:00
sagemaker.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
serialization.mdx add MobileNetV2 model (#17845) 2022-11-14 01:00:10 -05:00
task_summary.mdx Generate: move generation_*.py src files into generation/*.py (#20096) 2022-11-09 15:34:08 +00:00
testing.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
tokenizer_summary.mdx Update tokenizer_summary.mdx (#20135) 2022-11-15 01:18:13 +01:00
torchscript.mdx Breakup export guide (#19271) 2022-10-03 13:18:29 -07:00
training.mdx Update training.mdx (#19791) 2022-10-21 09:46:44 -04:00
troubleshooting.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00