transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-13 01:30:04 +06:00

Arthur 19ade2426a [WIP]`NLLB-MoE` Adds the moe model (#22024). Fixes #21300. Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-03-27 19:42:00 +02:00
asr.mdx	Added "Open in Colab" to task guides (#21729)	2023-02-22 08:32:35 -05:00
audio_classification.mdx	[Whisper] Add model for audio classification (#21754)	2023-03-07 16:20:21 +01:00
document_question_answering.mdx	Add: document question answering task guide (#21518)	2023-02-13 09:24:56 -05:00
image_captioning.mdx	[Tasks] Adds image captioning (#21512)	2023-02-10 22:52:12 +05:30
image_classification.mdx	Fix doc links (#22274)	2023-03-20 17:07:31 +00:00
language_modeling.mdx	Add Mega: Moving Average Equipped Gated Attention (#21766)	2023-03-24 08:17:27 -04:00
masked_language_modeling.mdx	Add Mega: Moving Average Equipped Gated Attention (#21766)	2023-03-24 08:17:27 -04:00
monocular_depth_estimation.mdx	Depth estimation task guide (#22205)	2023-03-17 08:36:23 -04:00
multiple_choice.mdx	Add Mega: Moving Average Equipped Gated Attention (#21766)	2023-03-24 08:17:27 -04:00
object_detection.mdx	Update quality tooling for formatting (#21480)	2023-02-06 18:10:56 -05:00
question_answering.mdx	Add Mega: Moving Average Equipped Gated Attention (#21766)	2023-03-24 08:17:27 -04:00
semantic_segmentation.mdx	Fix doc links (#22274)	2023-03-20 17:07:31 +00:00
sequence_classification.mdx	Add Mega: Moving Average Equipped Gated Attention (#21766)	2023-03-24 08:17:27 -04:00
summarization.mdx	[WIP]`NLLB-MoE` Adds the moe model (#22024)	2023-03-27 19:42:00 +02:00
token_classification.mdx	Add Mega: Moving Average Equipped Gated Attention (#21766)	2023-03-24 08:17:27 -04:00
translation.mdx	[WIP]`NLLB-MoE` Adds the moe model (#22024)	2023-03-27 19:42:00 +02:00
video_classification.mdx	Automated compatible models list for task guides (#21338)	2023-01-27 13:19:28 -05:00
zero_shot_image_classification.mdx	Zero-shot image classification task guide (#22132)	2023-03-13 10:57:17 -04:00
zero_shot_object_detection.mdx	Add: task guide for zero shot object detection (#21829)	2023-02-28 10:23:08 -05:00