Add tip on setting tokenizer attributes (#28764)

* Add tip on setting tokenizer attributes

* Grammar

* Remove the bit that was causing doc builds to fail
This commit is contained in:
Matt 2024-02-01 14:44:58 +00:00 committed by GitHub
parent 709dc43239
commit 7bc6d76396
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -343,6 +343,15 @@ tokenizer.push_to_hub("model_name") # Upload your new template to the Hub!
The method [`~PreTrainedTokenizer.apply_chat_template`] which uses your chat template is called by the [`ConversationalPipeline`] class, so
once you set the correct chat template, your model will automatically become compatible with [`ConversationalPipeline`].
<Tip>
If you're fine-tuning a model for chat, in addition to setting a chat template, you should probably add any new chat
control tokens as special tokens in the tokenizer. Special tokens are never split,
ensuring that your control tokens are always handled as single tokens rather than being tokenized in pieces. You
should also set the tokenizer's `eos_token` attribute to the token that marks the end of assistant generations in your
template. This will ensure that text generation tools can correctly figure out when to stop generating text.
</Tip>
### What are "default" templates?
Before the introduction of chat templates, chat handling was hardcoded at the model class level. For backwards