transformers/docs/source/en/tasks
Arthur 19ade2426a
[WIP]NLLB-MoE Adds the moe model (#22024)
* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix imports

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300

* fix more common tests

* style

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but implementing top2

* update

* local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it has to work with model parallelism that we do not support

* finish simplification

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer_norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse are tested. Had to change route_tokens a little bit

* add support for unslip models when converting

* fixup

* style

* update tests

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉

* styling

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-27 19:42:00 +02:00
asr.mdx Added "Open in Colab" to task guides (#21729) 2023-02-22 08:32:35 -05:00
audio_classification.mdx [Whisper] Add model for audio classification (#21754) 2023-03-07 16:20:21 +01:00
document_question_answering.mdx Add: document question answering task guide (#21518) 2023-02-13 09:24:56 -05:00
image_captioning.mdx [Tasks] Adds image captioning (#21512) 2023-02-10 22:52:12 +05:30
image_classification.mdx Fix doc links (#22274) 2023-03-20 17:07:31 +00:00
language_modeling.mdx Add Mega: Moving Average Equipped Gated Attention (#21766) 2023-03-24 08:17:27 -04:00
masked_language_modeling.mdx Add Mega: Moving Average Equipped Gated Attention (#21766) 2023-03-24 08:17:27 -04:00
monocular_depth_estimation.mdx Depth estimation task guide (#22205) 2023-03-17 08:36:23 -04:00
multiple_choice.mdx Add Mega: Moving Average Equipped Gated Attention (#21766) 2023-03-24 08:17:27 -04:00
object_detection.mdx Update quality tooling for formatting (#21480) 2023-02-06 18:10:56 -05:00
question_answering.mdx Add Mega: Moving Average Equipped Gated Attention (#21766) 2023-03-24 08:17:27 -04:00
semantic_segmentation.mdx Fix doc links (#22274) 2023-03-20 17:07:31 +00:00
sequence_classification.mdx Add Mega: Moving Average Equipped Gated Attention (#21766) 2023-03-24 08:17:27 -04:00
summarization.mdx [WIP]NLLB-MoE Adds the moe model (#22024) 2023-03-27 19:42:00 +02:00
token_classification.mdx Add Mega: Moving Average Equipped Gated Attention (#21766) 2023-03-24 08:17:27 -04:00
translation.mdx [WIP]NLLB-MoE Adds the moe model (#22024) 2023-03-27 19:42:00 +02:00
video_classification.mdx Automated compatible models list for task guides (#21338) 2023-01-27 13:19:28 -05:00
zero_shot_image_classification.mdx Zero-shot image classification task guide (#22132) 2023-03-13 10:57:17 -04:00
zero_shot_object_detection.mdx Add: task guide for zero shot object detection (#21829) 2023-02-28 10:23:08 -05:00