# mT5

## Overview

The mT5 model was presented in [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.

The abstract from the paper is the following:

The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.

Note: mT5 was only pre-trained on mC4, excluding any supervised training. Therefore, this model has to be fine-tuned before it is usable on a downstream task, unlike the original T5 model. Since mT5 was pre-trained without supervision, there is no real advantage to using a task prefix during single-task fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
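For single-task fine-tuning, the inputs are therefore plain text without a prefix. Below is a minimal sketch of a training-style forward pass; the `google/mt5-small` checkpoint and the German example sentences are only placeholders, and it assumes a recent version of Transformers with `sentencepiece` installed:

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

# Pre-trained (not fine-tuned) checkpoint; fine-tuning is required for downstream use.
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# No task prefix for single-task fine-tuning (see the note above).
inputs = tokenizer(
    "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien.",
    return_tensors="pt",
)
labels = tokenizer(text_target="Weiter Verhandlung in Syrien.", return_tensors="pt").input_ids

# One training-style step; in practice this runs inside a training loop or the Trainer.
loss = model(**inputs, labels=labels).loss
```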

Google has released the following variants:

- [google/mt5-small](https://huggingface.co/google/mt5-small)
- [google/mt5-base](https://huggingface.co/google/mt5-base)
- [google/mt5-large](https://huggingface.co/google/mt5-large)
- [google/mt5-xl](https://huggingface.co/google/mt5-xl)
- [google/mt5-xxl](https://huggingface.co/google/mt5-xxl)

This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The original code can be found [here](https://github.com/google-research/multilingual-t5).

## Documentation resources

- [Translation task guide](../tasks/translation)
- [Summarization task guide](../tasks/summarization)

## MT5Config

[[autodoc]] MT5Config

## MT5Tokenizer

[[autodoc]] MT5Tokenizer

See [`T5Tokenizer`] for all details.

## MT5TokenizerFast

[[autodoc]] MT5TokenizerFast

See [`T5TokenizerFast`] for all details.

## MT5Model

[[autodoc]] MT5Model

## MT5ForConditionalGeneration

[[autodoc]] MT5ForConditionalGeneration
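Because the released checkpoints are not fine-tuned, generation is only meaningful after fine-tuning. A minimal inference sketch with a checkpoint you have fine-tuned and saved yourself; the local path `./my-finetuned-mt5` is a placeholder:

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

# Placeholder path to a checkpoint saved after fine-tuning (e.g. with Trainer.save_model()).
tokenizer = AutoTokenizer.from_pretrained("./my-finetuned-mt5")
model = MT5ForConditionalGeneration.from_pretrained("./my-finetuned-mt5")

inputs = tokenizer(
    "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien.",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```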

## MT5EncoderModel

[[autodoc]] MT5EncoderModel
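[`MT5EncoderModel`] exposes just the encoder stack, which is handy when only multilingual text representations are needed. A minimal sketch, again assuming the `google/mt5-small` checkpoint:

```python
import torch
from transformers import AutoTokenizer, MT5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5EncoderModel.from_pretrained("google/mt5-small")

inputs = tokenizer("Studies have shown that owning a dog is good for you.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Encoder hidden states of shape (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```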

## TFMT5Model

[[autodoc]] TFMT5Model

## TFMT5ForConditionalGeneration

[[autodoc]] TFMT5ForConditionalGeneration

## TFMT5EncoderModel

[[autodoc]] TFMT5EncoderModel

## FlaxMT5Model

[[autodoc]] FlaxMT5Model

## FlaxMT5ForConditionalGeneration

[[autodoc]] FlaxMT5ForConditionalGeneration

## FlaxMT5EncoderModel

[[autodoc]] FlaxMT5EncoderModel