transformers/utils
Arthur 19ade2426a
[WIP]NLLB-MoE Adds the moe model (#22024)
* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix imports

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300

* fix more common tests

* style

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but implementing top2

* update

* local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it has to work with model parallelism that we do not support

* finish simplification

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer_norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse layers are tested. Had to change route_tokens a little bit

* add support for unsplit models when converting

* fixup

* style

* update tests

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉

* styling

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-27 19:42:00 +02:00
test_module AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
tf_ops Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
check_config_attributes.py [Time-Series] informer model (#21099) 2023-03-07 21:36:38 +01:00
check_config_docstrings.py LLaMA Implementation (#21955) 2023-03-16 09:00:53 -04:00
check_copies.py Apply ruff flake8-comprehensions (#21694) 2023-02-22 09:14:54 +01:00
check_doc_toc.py Apply ruff flake8-comprehensions (#21694) 2023-02-22 09:14:54 +01:00
check_doctest_list.py Update quality tooling for formatting (#21480) 2023-02-06 18:10:56 -05:00
check_dummies.py Cleanup quality (#21493) 2023-02-07 12:27:31 -05:00
check_inits.py refactor: Make direct_transformers_import util (#21652) 2023-02-16 11:32:32 -05:00
check_model_tester.py Add a new script to check model testers' config (#22063) 2023-03-13 19:11:19 +01:00
check_repo.py [WIP]NLLB-MoE Adds the moe model (#22024) 2023-03-27 19:42:00 +02:00
check_self_hosted_runner.py Update quality tooling for formatting (#21480) 2023-02-06 18:10:56 -05:00
check_table.py refactor: Make direct_transformers_import util (#21652) 2023-02-16 11:32:32 -05:00
check_task_guides.py Depth estimation task guide (#22205) 2023-03-17 08:36:23 -04:00
check_tf_ops.py Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
create_dummy_models.py Automatically create/update tiny models (#22275) 2023-03-23 19:14:17 +01:00
custom_init_isort.py Update quality tooling for formatting (#21480) 2023-02-06 18:10:56 -05:00
documentation_tests.txt Final update of doctest (#22299) 2023-03-22 01:00:33 +01:00
download_glue_data.py Raise exceptions instead of asserts (#13907) 2021-10-07 12:44:23 +05:30
extract_warnings.py Make Slack CI reporting stronger (#21823) 2023-02-28 17:12:44 +01:00
get_ci_error_statistics.py Make Slack CI reporting stronger (#21823) 2023-02-28 17:12:44 +01:00
get_github_job_time.py Make Slack CI reporting stronger (#21823) 2023-02-28 17:12:44 +01:00
get_modified_files.py exclude deleted files in the fixup script (#21436) 2023-02-03 12:57:02 -05:00
get_test_info.py Add an utility file to get information from test files (#21856) 2023-03-01 17:53:29 +01:00
notification_service_doc_tests.py Update quality tooling for formatting (#21480) 2023-02-06 18:10:56 -05:00
notification_service.py Show the number of huggingface_hub warnings in CI report (#22054) 2023-03-09 15:39:05 +01:00
past_ci_versions.py Make Slack CI reporting stronger (#21823) 2023-02-28 17:12:44 +01:00
prepare_for_doc_test.py Add a check regarding the number of occurrences of ``` (#18389) 2022-08-01 14:23:02 +02:00
print_env.py Print more library versions in CI (#17384) 2022-06-02 10:24:16 +02:00
release.py Clean README in post release job as well. (#17519) 2022-06-02 07:44:03 -04:00
sort_auto_mappings.py Automatically sort auto mappings (#17250) 2022-05-16 13:24:20 -04:00
tests_fetcher.py 🔥Rework pipeline testing by removing PipelineTestCaseMeta 🚀 (#21516) 2023-02-28 19:40:57 +01:00
update_metadata.py Add AutoModelForZeroShotImageClassification (#22087) 2023-03-13 12:46:14 +03:00
update_tiny_models.py Automatically create/update tiny models (#22275) 2023-03-23 19:14:17 +01:00