[docs] Translation guide (#32547)

clarify
This commit is contained in:
Steven Liu 2024-08-08 13:43:14 -07:00 committed by GitHub
parent 54ac39c648
commit 85817d98fb
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -90,7 +90,7 @@ The next step is to load a T5 tokenizer to process the English-French language p
The preprocessing function you want to create needs to:
1. Prefix the input with a prompt so T5 knows this is a translation task. Some models capable of multiple NLP tasks require prompting for specific tasks.
2. Tokenize the input (English) and target (French) separately because you can't tokenize French text with a tokenizer pretrained on an English vocabulary.
2. Set the target language (French) in the `text_target` parameter to ensure the tokenizer processes the target text correctly. If you don't set `text_target`, the tokenizer processes the target text as English.
3. Truncate sequences to be no longer than the maximum length set by the `max_length` parameter.
```py