Repeating an important warning in the chat template docs (#31796)

* Repeating an important warning in the chat template docs * Update docs/source/en/chat_templating.md Co-authored-by: Lysandre Debut <hi@lysand.re> * Reword for clarity * Reword for clarity --------- Co-authored-by: Lysandre Debut <hi@lysand.re>
2025-08-01 02:31:11 +06:00 · 2024-07-05 15:30:24 +01:00 · 2024-07-05 15:30:24 +01:00 · e786844425
commit e786844425
parent 1d3eaa6f7e
1 changed files with 12 additions and 1 deletions
--- a/docs/source/en/chat_templating.md
+++ b/docs/source/en/chat_templating.md
@ -199,7 +199,8 @@ effect that `add_generation_prompt` has will depend on the template being used.
 ## Can I use chat templates in training?
-Yes! We recommend that you apply the chat template as a preprocessing step for your dataset. After this, you
+Yes! This is a good way to ensure that the chat template matches the tokens the model sees during training.
 We recommend that you apply the chat template as a preprocessing step for your dataset. After this, you
 can simply continue like any other language model training task. When training, you should usually set 
 `add_generation_prompt=False`, because the added tokens to prompt an assistant response will not be helpful during 
 training. Let's see an example:
@ -233,6 +234,16 @@ The sun.</s>
 From here, just continue training like you would with a standard language modelling task, using the `formatted_chat` column.
 <Tip>
 If you format text with `apply_chat_template(tokenize=False)` and then tokenize it in a separate step, you should set the argument
 `add_special_tokens=False`. If you use `apply_chat_template(tokenize=True)`, you don't need to worry about this!
 By default, some tokenizers add special tokens like `<bos>` and `<eos>` to text they tokenize. Chat templates should 
 always include all of the special tokens they need, and so adding extra special tokens with
 the default `add_special_tokens=True` can result in incorrect or duplicated special tokens, which will hurt model
 performance.
 </Tip>
 ## Advanced: Extra inputs to chat templates
 The only argument that `apply_chat_template` requires is `messages`. However, you can pass any keyword