Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
emanrissha 2025-07-02 05:45:07 +03:00 committed by GitHub
parent 306fd7a102
commit 62b66d795d


@@ -150,7 +150,19 @@ print(result)
## Notes
- MarianMT models are smaller than many other translation models. This allows faster inference and lower memory usage, and makes them well suited to CPU environments.
- Each MarianMT model is ~298MB on disk, and there are more than 1000 models. Check this [list](https://huggingface.co/Helsinki-NLP) for supported language pairs. Language codes may be inconsistent across models. Two-letter codes can be found [here](https://developers.google.com/admin-sdk/directory/v1/languages), while three-letter codes may require further searching.
- Models that require BPE preprocessing are not supported.
- All model names use the format `Helsinki-NLP/opus-mt-{src}-{tgt}`. Language codes formatted like `es_AR` usually follow the `{code}_{region}` pattern. For example, `es_AR` refers to Spanish from Argentina.
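- If a model can output multiple languages, prepend the desired output language code to each string in `src_txt` in the form `>>{code}<<`, as shown below. New multilingual models from the [Tatoeba-Challenge](https://github.com/Helsinki-NLP/Tatoeba-Challenge) require three-letter language codes.
```py
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-roa"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# the >>xxx<< prefix selects the target language; tokenizer.supported_language_codes lists the valid codes
src_txt = [
    ">>fra<< This is a sentence in English that we want to translate to French.",
    ">>por<< This should go to Portuguese.",
]
translated = model.generate(**tokenizer(src_txt, return_tensors="pt", padding=True))
print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])
```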
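- Older multilingual models use two-letter language codes.
```py
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# older multilingual models expect a two-letter code in the >>xx<< prefix
src_txt = [
    ">>fr<< This is a sentence in English that we want to translate to French.",
    ">>pt<< This should go to Portuguese.",
    ">>es<< And this to Spanish.",
]
translated = model.generate(**tokenizer(src_txt, return_tensors="pt", padding=True))
print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])
```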