[Docs] Improve docs for MMS loading of other languages (#24292)

* Improve docs

* Apply suggestions from code review

* upload readme

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This commit is contained in:
Patrick von Platen 2023-06-15 14:29:32 +02:00 committed by GitHub
parent e6122c3f40
commit 604a21b1e6
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -44,11 +44,51 @@ MMS's architecture is based on the Wav2Vec2 model, so one can refer to [Wav2Vec2
The original code can be found [here](https://github.com/facebookresearch/fairseq/tree/main/examples/mms).
## Loading
By default MMS loads adapter weights for English. If you want to load adapter weights of another language
make sure to specify `target_lang=<your-chosen-target-lang>` as well as `"ignore_mismatched_sizes=True`.
The `ignore_mismatched_sizes=True` keyword has to be passed to allow the language model head to be resized according
to the vocabulary of the specified language.
Similarly, the processor should be loaded with the same target language
```py
from transformers import Wav2Vec2ForCTC, AutoProcessor
model_id = "facebook/mms-1b-all"
target_lang = "fra"
processor = AutoProcessor.from_pretrained(model_id, target_lang=target_lang)
model = Wav2Vec2ForCTC.from_pretrained(model_id, target_lang=target_lang, ignore_mismatched_sizes=True)
```
<Tip>
You can safely ignore a warning such as:
```text
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/mms-1b-all and are newly initialized because the shapes did not match:
- lm_head.bias: found shape torch.Size([154]) in the checkpoint and torch.Size([314]) in the model instantiated
- lm_head.weight: found shape torch.Size([154, 1280]) in the checkpoint and torch.Size([314, 1280]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
</Tip>
If you want to use the ASR pipeline, you can load your chosen target language as such:
```py
from transformers import pipeline
model_id = "facebook/mms-1b-all"
target_lang = "fra"
pipe = pipeline(model=model_id, model_kwargs={"target_lang": "fra", "ignore_mismatched_sizes": True})
```
## Inference
By default MMS loads adapter weights for English, but those can be easily switched out for another language.
Let's look at an example.
Next, let's look at how we can run MMS in inference and change adapter layers after having called [`~PretrainedModel.from_pretrained`]
First, we load audio data in different languages using the [Datasets](https://github.com/huggingface/datasets).
```py