mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-31 10:12:23 +06:00
Update and reorder docs for chat templates (#27443)
* Update and reorder docs for chat templates * Fix Mistral docstring * Add section link and small fixes * Remove unneeded line in Mistral example * Add comment on saving memory * Fix generation prompts linl * Fix code block languages
This commit is contained in:
parent
fe472b1db4
commit
5468ab3555
@ -20,25 +20,11 @@ rendered properly in your Markdown viewer.
|
||||
|
||||
An increasingly common use case for LLMs is **chat**. In a chat context, rather than continuing a single string
|
||||
of text (as is the case with a standard language model), the model instead continues a conversation that consists
|
||||
of one or more **messages**, each of which includes a **role** as well as message text.
|
||||
of one or more **messages**, each of which includes a **role**, like "user" or "assistant", as well as message text.
|
||||
|
||||
Most commonly, these roles are "user" for messages sent by the user, and "assistant" for messages sent by the model.
|
||||
Some models also support a "system" role. System messages are usually sent at the beginning of the conversation
|
||||
and include directives about how the model should behave in the subsequent chat.
|
||||
|
||||
All language models, including models fine-tuned for chat, operate on linear sequences of tokens and do not intrinsically
|
||||
have special handling for roles. This means that role information is usually injected by adding control tokens
|
||||
between messages, to indicate both the message boundary and the relevant roles.
|
||||
|
||||
Unfortunately, there isn't (yet!) a standard for which tokens to use, and so different models have been trained
|
||||
with wildly different formatting and control tokens for chat. This can be a real problem for users - if you use the
|
||||
wrong format, then the model will be confused by your input, and your performance will be a lot worse than it should be.
|
||||
This is the problem that **chat templates** aim to resolve.
|
||||
|
||||
Chat conversations are typically represented as a list of dictionaries, where each dictionary contains `role`
|
||||
and `content` keys, and represents a single chat message. Chat templates are strings containing a Jinja template that
|
||||
specifies how to format a conversation for a given model into a single tokenizable sequence. By storing this information
|
||||
with the tokenizer, we can ensure that models get input data in the format they expect.
|
||||
Much like tokenization, different models expect very different input formats for chat. This is the reason we added
|
||||
**chat templates** as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations,
|
||||
represented as lists of messages, into a single tokenizable string in the format that the model expects.
|
||||
|
||||
Let's make this concrete with a quick example using the `BlenderBot` model. BlenderBot has an extremely simple default
|
||||
template, which mostly just adds whitespace between rounds of dialogue:
|
||||
@ -48,9 +34,9 @@ template, which mostly just adds whitespace between rounds of dialogue:
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
|
||||
|
||||
>>> chat = [
|
||||
... {"role": "user", "content": "Hello, how are you?"},
|
||||
... {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
|
||||
... {"role": "user", "content": "I'd like to show off how chat templating works!"},
|
||||
... {"role": "user", "content": "Hello, how are you?"},
|
||||
... {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
|
||||
... {"role": "user", "content": "I'd like to show off how chat templating works!"},
|
||||
... ]
|
||||
|
||||
>>> tokenizer.apply_chat_template(chat, tokenize=False)
|
||||
@ -59,28 +45,196 @@ template, which mostly just adds whitespace between rounds of dialogue:
|
||||
|
||||
Notice how the entire chat is condensed into a single string. If we use `tokenize=True`, which is the default setting,
|
||||
that string will also be tokenized for us. To see a more complex template in action, though, let's use the
|
||||
`meta-llama/Llama-2-7b-chat-hf` model. Note that this model has gated access, so you will have to
|
||||
[request access on the repo](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) if you want to run this code yourself:
|
||||
`mistralai/Mistral-7B-Instruct-v0.1` model.
|
||||
|
||||
```python
|
||||
>> from transformers import AutoTokenizer
|
||||
>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
|
||||
>>> from transformers import AutoTokenizer
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
|
||||
|
||||
>> chat = [
|
||||
>>> chat = [
|
||||
... {"role": "user", "content": "Hello, how are you?"},
|
||||
... {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
|
||||
... {"role": "user", "content": "I'd like to show off how chat templating works!"},
|
||||
... ]
|
||||
|
||||
>> tokenizer.use_default_system_prompt = False
|
||||
>> tokenizer.apply_chat_template(chat, tokenize=False)
|
||||
"<s>[INST] Hello, how are you? [/INST] I'm doing great. How can I help you today? </s><s>[INST] I'd like to show off how chat templating works! [/INST]"
|
||||
>>> tokenizer.apply_chat_template(chat, tokenize=False)
|
||||
"<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
|
||||
```
|
||||
|
||||
Note that this time, the tokenizer has added the control tokens [INST] and [/INST] to indicate the start and end of
|
||||
user messages (but not assistant messages!)
|
||||
user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not.
|
||||
|
||||
## How do chat templates work?
|
||||
## How do I use chat templates?
|
||||
|
||||
As you can see in the example above, chat templates are easy to use. Simply build a list of messages, with `role`
|
||||
and `content` keys, and then pass it to the [`~PreTrainedTokenizer.apply_chat_template`] method. Once you do that,
|
||||
you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea
|
||||
to use `add_generation_prompt=True` to add a [generation prompt](#what-are-generation-prompts).
|
||||
|
||||
Here's an example of preparing input for `model.generate()`, using the `Zephyr` assistant model:
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
|
||||
checkpoint = "HuggingFaceH4/zephyr-7b-beta"
|
||||
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
|
||||
model = AutoModelForCausalLM.from_pretrained(checkpoint) # You may want to use bfloat16 and/or move to GPU here
|
||||
|
||||
messages = [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "You are a friendly chatbot who always responds in the style of a pirate",
|
||||
},
|
||||
{"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
|
||||
]
|
||||
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
|
||||
print(tokenizer.decode(tokenized_chat[0]))
|
||||
```
|
||||
This will yield a string in the input format that Zephyr expects.
|
||||
```text
|
||||
<|system|>
|
||||
You are a friendly chatbot who always responds in the style of a pirate</s>
|
||||
<|user|>
|
||||
How many helicopters can a human eat in one sitting?</s>
|
||||
<|assistant|>
|
||||
```
|
||||
|
||||
Now that our input is formatted correctly for Zephyr, we can use the model to generate a response to the user's question:
|
||||
|
||||
```python
|
||||
outputs = model.generate(tokenized_chat, max_new_tokens=128)
|
||||
print(tokenizer.decode(outputs[0]))
|
||||
```
|
||||
|
||||
This will yield:
|
||||
|
||||
```text
|
||||
<|system|>
|
||||
You are a friendly chatbot who always responds in the style of a pirate</s>
|
||||
<|user|>
|
||||
How many helicopters can a human eat in one sitting?</s>
|
||||
<|assistant|>
|
||||
Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all.
|
||||
```
|
||||
|
||||
Arr, 'twas easy after all!
|
||||
|
||||
## Is there an automated pipeline for chat?
|
||||
|
||||
Yes, there is: [`ConversationalPipeline`]. This pipeline is designed to make it easy to use chat models. Let's try
|
||||
the `Zephyr` example again, but this time using the pipeline:
|
||||
|
||||
```python
|
||||
from transformers import pipeline
|
||||
|
||||
pipe = pipeline("conversational", "HuggingFaceH4/zephyr-7b-beta")
|
||||
messages = [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "You are a friendly chatbot who always responds in the style of a pirate",
|
||||
},
|
||||
{"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
|
||||
]
|
||||
print(pipe(messages))
|
||||
```
|
||||
|
||||
```text
|
||||
Conversation id: 76d886a0-74bd-454e-9804-0467041a63dc
|
||||
system: You are a friendly chatbot who always responds in the style of a pirate
|
||||
user: How many helicopters can a human eat in one sitting?
|
||||
assistant: Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all.
|
||||
```
|
||||
|
||||
[`ConversationalPipeline`] will take care of all the details of tokenization and calling `apply_chat_template` for you -
|
||||
once the model has a chat template, all you need to do is initialize the pipeline and pass it the list of messages!
|
||||
|
||||
## What are "generation prompts"?
|
||||
|
||||
You may have noticed that the `apply_chat_template` method has an `add_generation_prompt` argument. This argument tells
|
||||
the template to add tokens that indicate the start of a bot response. For example, consider the following chat:
|
||||
|
||||
```python
|
||||
messages = [
|
||||
{"role": "user", "content": "Hi there!"},
|
||||
{"role": "assistant", "content": "Nice to meet you!"},
|
||||
{"role": "user", "content": "Can I ask a question?"}
|
||||
]
|
||||
```
|
||||
|
||||
Here's what this will look like without a generation prompt, using the ChatML template we saw in the Zephyr example:
|
||||
|
||||
```python
|
||||
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
|
||||
"""<|im_start|>user
|
||||
Hi there!<|im_end|>
|
||||
<|im_start|>assistant
|
||||
Nice to meet you!<|im_end|>
|
||||
<|im_start|>user
|
||||
Can I ask a question?<|im_end|>
|
||||
"""
|
||||
```
|
||||
|
||||
And here's what it looks like **with** a generation prompt:
|
||||
|
||||
```python
|
||||
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
||||
"""<|im_start|>user
|
||||
Hi there!<|im_end|>
|
||||
<|im_start|>assistant
|
||||
Nice to meet you!<|im_end|>
|
||||
<|im_start|>user
|
||||
Can I ask a question?<|im_end|>
|
||||
<|im_start|>assistant
|
||||
"""
|
||||
```
|
||||
|
||||
Note that this time, we've added the tokens that indicate the start of a bot response. This ensures that when the model
|
||||
generates text it will write a bot response instead of doing something unexpected, like continuing the user's
|
||||
message. Remember, chat models are still just language models - they're trained to continue text, and chat is just a
|
||||
special kind of text to them! You need to guide them with the appropriate control tokens so they know what they're
|
||||
supposed to be doing.
|
||||
|
||||
Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don't have any
|
||||
special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact
|
||||
effect that `add_generation_prompt` has will depend on the template being used.
|
||||
|
||||
## Can I use chat templates in training?
|
||||
|
||||
Yes! We recommend that you apply the chat template as a preprocessing step for your dataset. After this, you
|
||||
can simply continue like any other language model training task. When training, you should usually set
|
||||
`add_generation_prompt=False`, because the added tokens to prompt an assistant response will not be helpful during
|
||||
training. Let's see an example:
|
||||
|
||||
```python
|
||||
from transformers import AutoTokenizer
|
||||
from datasets import Dataset
|
||||
|
||||
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
|
||||
|
||||
chat1 = [
|
||||
{"role": "user", "content": "Which is bigger, the moon or the sun?"},
|
||||
{"role": "assistant", "content": "The sun."}
|
||||
]
|
||||
chat2 = [
|
||||
{"role": "user", "content": "Which is bigger, a virus or a bacterium?"},
|
||||
{"role": "assistant", "content": "A bacterium."}
|
||||
]
|
||||
|
||||
dataset = Dataset.from_dict({"chat": [chat1, chat2]})
|
||||
dataset = dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)})
|
||||
print(dataset['formatted_chat'][0])
|
||||
```
|
||||
And we get:
|
||||
```text
|
||||
<|user|>
|
||||
Which is bigger, the moon or the sun?</s>
|
||||
<|assistant|>
|
||||
The sun.</s>
|
||||
```
|
||||
|
||||
From here, just continue training like you would with a standard language modelling task, using the `formatted_chat` column.
|
||||
|
||||
## Advanced: How do chat templates work?
|
||||
|
||||
The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the
|
||||
default template for that model class is used instead. Let's take a look at the template for `BlenderBot`:
|
||||
@ -154,7 +308,9 @@ Hopefully if you stare at this for a little bit you can see what this template i
|
||||
on the "role" of each message, which represents who sent it. User, assistant and system messages are clearly
|
||||
distinguishable to the model because of the tokens they're wrapped in.
|
||||
|
||||
## How do I create a chat template?
|
||||
## Advanced: Adding and editing chat templates
|
||||
|
||||
### How do I create a chat template?
|
||||
|
||||
Simple, just write a jinja template and set `tokenizer.chat_template`. You may find it easier to start with an
|
||||
existing template from another model and simply edit it for your needs! For example, we could take the LLaMA template
|
||||
@ -187,7 +343,7 @@ tokenizer.push_to_hub("model_name") # Upload your new template to the Hub!
|
||||
The method [`~PreTrainedTokenizer.apply_chat_template`] which uses your chat template is called by the [`ConversationalPipeline`] class, so
|
||||
once you set the correct chat template, your model will automatically become compatible with [`ConversationalPipeline`].
|
||||
|
||||
## What are "default" templates?
|
||||
### What are "default" templates?
|
||||
|
||||
Before the introduction of chat templates, chat handling was hardcoded at the model class level. For backwards
|
||||
compatibility, we have retained this class-specific handling as default templates, also set at the class level. If a
|
||||
@ -200,7 +356,7 @@ the class template is appropriate for your model, we strongly recommend overridi
|
||||
setting the `chat_template` attribute explicitly to make it clear to users that your model has been correctly configured
|
||||
for chat, and to future-proof in case the default templates are ever altered or deprecated.
|
||||
|
||||
## What template should I use?
|
||||
### What template should I use?
|
||||
|
||||
When setting the template for a model that's already been trained for chat, you should ensure that the template
|
||||
exactly matches the message formatting that the model saw during training, or else you will probably experience
|
||||
@ -229,7 +385,7 @@ tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set a
|
||||
This template wraps each message in `<|im_start|>` and `<|im_end|>` tokens, and simply writes the role as a string, which
|
||||
allows for flexibility in the roles you train with. The output looks like this:
|
||||
|
||||
```
|
||||
```text
|
||||
<|im_start|>system
|
||||
You are a helpful chatbot that will do its best not to say anything so stupid that people tweet about it.<|im_end|>
|
||||
<|im_start|>user
|
||||
@ -242,62 +398,12 @@ The "user", "system" and "assistant" roles are the standard for chat, and we rec
|
||||
particularly if you want your model to operate well with [`ConversationalPipeline`]. However, you are not limited
|
||||
to these roles - templating is extremely flexible, and any string can be a role.
|
||||
|
||||
## What are "generation prompts"?
|
||||
|
||||
You may notice that the `apply_chat_template` method has an `add_generation_prompt` argument. This argument tells
|
||||
the template to add tokens that indicate the start of a bot response. For example, consider the following chat:
|
||||
|
||||
```python
|
||||
messages = [
|
||||
{"role": "user", "content": "Hi there!"},
|
||||
{"role": "assistant", "content": "Nice to meet you!"},
|
||||
{"role": "user", "content": "Can I ask a question?"}
|
||||
]
|
||||
```
|
||||
|
||||
Here's what this will look like without a generation prompt, using the ChatML template we described above:
|
||||
|
||||
```python
|
||||
>> tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
|
||||
"""<|im_start|>user
|
||||
Hi there!<|im_end|>
|
||||
<|im_start|>assistant
|
||||
Nice to meet you!<|im_end|>
|
||||
<|im_start|>user
|
||||
Can I ask a question?<|im_end|>
|
||||
"""
|
||||
```
|
||||
|
||||
And here's what it looks like **with** a generation prompt:
|
||||
|
||||
```python
|
||||
>> tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
||||
"""<|im_start|>user
|
||||
Hi there!<|im_end|>
|
||||
<|im_start|>assistant
|
||||
Nice to meet you!<|im_end|>
|
||||
<|im_start|>user
|
||||
Can I ask a question?<|im_end|>
|
||||
<|im_start|>assistant
|
||||
"""
|
||||
```
|
||||
|
||||
Note that this time, we've added the tokens that indicate the start of a bot response. This ensures that when the model
|
||||
generates text it will write a bot response instead of doing something unexpected, like continuing the user's
|
||||
message. Remember, chat models are still just language models - they're trained to continue text, and chat is just a
|
||||
special kind of text to them! You need to guide them with the appropriate control tokens so they know what they're
|
||||
supposed to be doing.
|
||||
|
||||
Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don't have any
|
||||
special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact
|
||||
effect that `add_generation_prompt` has will depend on the template being used.
|
||||
|
||||
## I want to use chat templates! How should I get started?
|
||||
### I want to add some chat templates! How should I get started?
|
||||
|
||||
If you have any chat models, you should set their `tokenizer.chat_template` attribute and test it using
|
||||
[`~PreTrainedTokenizer.apply_chat_template`]. This applies even if you're not the model owner - if you're using a model
|
||||
with an empty chat template, or one that's still using the default class template, please open a [pull request](https://huggingface.co/docs/hub/repositories-pull-requests-discussions) to
|
||||
the model repository so that this attribute can be set properly!
|
||||
[`~PreTrainedTokenizer.apply_chat_template`], then push the updated tokenizer to the Hub. This applies even if you're
|
||||
not the model owner - if you're using a model with an empty chat template, or one that's still using the default class
|
||||
template, please open a [pull request](https://huggingface.co/docs/hub/repositories-pull-requests-discussions) to the model repository so that this attribute can be set properly!
|
||||
|
||||
Once the attribute is set, that's it, you're done! `tokenizer.apply_chat_template` will now work correctly for that
|
||||
model, which means it is also automatically supported in places like `ConversationalPipeline`!
|
||||
@ -306,7 +412,7 @@ By ensuring that models have this attribute, we can make sure that the whole com
|
||||
open-source models. Formatting mismatches have been haunting the field and silently harming performance for too long -
|
||||
it's time to put an end to them!
|
||||
|
||||
## Template writing tips
|
||||
## Advanced: Template writing tips
|
||||
|
||||
If you're unfamiliar with Jinja, we generally find that the easiest way to write a chat template is to first
|
||||
write a short Python script that formats messages the way you want, and then convert that script into a template.
|
||||
|
Loading…
Reference in New Issue
Block a user