From 5468ab355567063ede0cc6c685a8feed4bfb2d1b Mon Sep 17 00:00:00 2001 From: Matt Date: Tue, 14 Nov 2023 18:26:13 +0000 Subject: [PATCH] Update and reorder docs for chat templates (#27443) * Update and reorder docs for chat templates * Fix Mistral docstring * Add section link and small fixes * Remove unneeded line in Mistral example * Add comment on saving memory * Fix generation prompts linl * Fix code block languages --- docs/source/en/chat_templating.md | 286 ++++++++++++++++++++---------- 1 file changed, 196 insertions(+), 90 deletions(-) diff --git a/docs/source/en/chat_templating.md b/docs/source/en/chat_templating.md index 115c9f51677..82bdf591ae5 100644 --- a/docs/source/en/chat_templating.md +++ b/docs/source/en/chat_templating.md @@ -20,25 +20,11 @@ rendered properly in your Markdown viewer. An increasingly common use case for LLMs is **chat**. In a chat context, rather than continuing a single string of text (as is the case with a standard language model), the model instead continues a conversation that consists -of one or more **messages**, each of which includes a **role** as well as message text. +of one or more **messages**, each of which includes a **role**, like "user" or "assistant", as well as message text. -Most commonly, these roles are "user" for messages sent by the user, and "assistant" for messages sent by the model. -Some models also support a "system" role. System messages are usually sent at the beginning of the conversation -and include directives about how the model should behave in the subsequent chat. - -All language models, including models fine-tuned for chat, operate on linear sequences of tokens and do not intrinsically -have special handling for roles. This means that role information is usually injected by adding control tokens -between messages, to indicate both the message boundary and the relevant roles. - -Unfortunately, there isn't (yet!) a standard for which tokens to use, and so different models have been trained -with wildly different formatting and control tokens for chat. This can be a real problem for users - if you use the -wrong format, then the model will be confused by your input, and your performance will be a lot worse than it should be. -This is the problem that **chat templates** aim to resolve. - -Chat conversations are typically represented as a list of dictionaries, where each dictionary contains `role` -and `content` keys, and represents a single chat message. Chat templates are strings containing a Jinja template that -specifies how to format a conversation for a given model into a single tokenizable sequence. By storing this information -with the tokenizer, we can ensure that models get input data in the format they expect. +Much like tokenization, different models expect very different input formats for chat. This is the reason we added +**chat templates** as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations, +represented as lists of messages, into a single tokenizable string in the format that the model expects. Let's make this concrete with a quick example using the `BlenderBot` model. BlenderBot has an extremely simple default template, which mostly just adds whitespace between rounds of dialogue: @@ -48,9 +34,9 @@ template, which mostly just adds whitespace between rounds of dialogue: >>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill") >>> chat = [ -... {"role": "user", "content": "Hello, how are you?"}, -... 
{"role": "assistant", "content": "I'm doing great. How can I help you today?"}, -... {"role": "user", "content": "I'd like to show off how chat templating works!"}, +... {"role": "user", "content": "Hello, how are you?"}, +... {"role": "assistant", "content": "I'm doing great. How can I help you today?"}, +... {"role": "user", "content": "I'd like to show off how chat templating works!"}, ... ] >>> tokenizer.apply_chat_template(chat, tokenize=False) @@ -59,28 +45,196 @@ template, which mostly just adds whitespace between rounds of dialogue: Notice how the entire chat is condensed into a single string. If we use `tokenize=True`, which is the default setting, that string will also be tokenized for us. To see a more complex template in action, though, let's use the -`meta-llama/Llama-2-7b-chat-hf` model. Note that this model has gated access, so you will have to -[request access on the repo](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) if you want to run this code yourself: +`mistralai/Mistral-7B-Instruct-v0.1` model. ```python ->> from transformers import AutoTokenizer ->> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf") +>>> from transformers import AutoTokenizer +>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1") ->> chat = [ +>>> chat = [ ... {"role": "user", "content": "Hello, how are you?"}, ... {"role": "assistant", "content": "I'm doing great. How can I help you today?"}, ... {"role": "user", "content": "I'd like to show off how chat templating works!"}, ... ] ->> tokenizer.use_default_system_prompt = False ->> tokenizer.apply_chat_template(chat, tokenize=False) -"[INST] Hello, how are you? [/INST] I'm doing great. How can I help you today? [INST] I'd like to show off how chat templating works! [/INST]" +>>> tokenizer.apply_chat_template(chat, tokenize=False) +"[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today? [INST] I'd like to show off how chat templating works! [/INST]" ``` Note that this time, the tokenizer has added the control tokens [INST] and [/INST] to indicate the start and end of -user messages (but not assistant messages!) +user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not. -## How do chat templates work? +## How do I use chat templates? + +As you can see in the example above, chat templates are easy to use. Simply build a list of messages, with `role` +and `content` keys, and then pass it to the [`~PreTrainedTokenizer.apply_chat_template`] method. Once you do that, +you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea +to use `add_generation_prompt=True` to add a [generation prompt](#what-are-generation-prompts). 
+ +Here's an example of preparing input for `model.generate()`, using the `Zephyr` assistant model: + +```python +from transformers import AutoModelForCausalLM, AutoTokenizer + +checkpoint = "HuggingFaceH4/zephyr-7b-beta" +tokenizer = AutoTokenizer.from_pretrained(checkpoint) +model = AutoModelForCausalLM.from_pretrained(checkpoint) # You may want to use bfloat16 and/or move to GPU here + +messages = [ + { + "role": "system", + "content": "You are a friendly chatbot who always responds in the style of a pirate", + }, + {"role": "user", "content": "How many helicopters can a human eat in one sitting?"}, + ] +tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt") +print(tokenizer.decode(tokenized_chat[0])) +``` +This will yield a string in the input format that Zephyr expects. +```text +<|system|> +You are a friendly chatbot who always responds in the style of a pirate +<|user|> +How many helicopters can a human eat in one sitting? +<|assistant|> +``` + +Now that our input is formatted correctly for Zephyr, we can use the model to generate a response to the user's question: + +```python +outputs = model.generate(tokenized_chat, max_new_tokens=128) +print(tokenizer.decode(outputs[0])) +``` + +This will yield: + +```text +<|system|> +You are a friendly chatbot who always responds in the style of a pirate +<|user|> +How many helicopters can a human eat in one sitting? +<|assistant|> +Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all. +``` + +Arr, 'twas easy after all! + +## Is there an automated pipeline for chat? + +Yes, there is: [`ConversationalPipeline`]. This pipeline is designed to make it easy to use chat models. Let's try +the `Zephyr` example again, but this time using the pipeline: + +```python +from transformers import pipeline + +pipe = pipeline("conversational", "HuggingFaceH4/zephyr-7b-beta") +messages = [ + { + "role": "system", + "content": "You are a friendly chatbot who always responds in the style of a pirate", + }, + {"role": "user", "content": "How many helicopters can a human eat in one sitting?"}, +] +print(pipe(messages)) +``` + +```text +Conversation id: 76d886a0-74bd-454e-9804-0467041a63dc +system: You are a friendly chatbot who always responds in the style of a pirate +user: How many helicopters can a human eat in one sitting? +assistant: Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all. +``` + +[`ConversationalPipeline`] will take care of all the details of tokenization and calling `apply_chat_template` for you - +once the model has a chat template, all you need to do is initialize the pipeline and pass it the list of messages! + +## What are "generation prompts"? + +You may have noticed that the `apply_chat_template` method has an `add_generation_prompt` argument. This argument tells +the template to add tokens that indicate the start of a bot response. 
For example, consider the following chat: + +```python +messages = [ + {"role": "user", "content": "Hi there!"}, + {"role": "assistant", "content": "Nice to meet you!"}, + {"role": "user", "content": "Can I ask a question?"} +] +``` + +Here's what this will look like without a generation prompt, using the ChatML template we saw in the Zephyr example: + +```python +tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False) +"""<|im_start|>user +Hi there!<|im_end|> +<|im_start|>assistant +Nice to meet you!<|im_end|> +<|im_start|>user +Can I ask a question?<|im_end|> +""" +``` + +And here's what it looks like **with** a generation prompt: + +```python +tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) +"""<|im_start|>user +Hi there!<|im_end|> +<|im_start|>assistant +Nice to meet you!<|im_end|> +<|im_start|>user +Can I ask a question?<|im_end|> +<|im_start|>assistant +""" +``` + +Note that this time, we've added the tokens that indicate the start of a bot response. This ensures that when the model +generates text it will write a bot response instead of doing something unexpected, like continuing the user's +message. Remember, chat models are still just language models - they're trained to continue text, and chat is just a +special kind of text to them! You need to guide them with the appropriate control tokens so they know what they're +supposed to be doing. + +Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don't have any +special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact +effect that `add_generation_prompt` has will depend on the template being used. + +## Can I use chat templates in training? + +Yes! We recommend that you apply the chat template as a preprocessing step for your dataset. After this, you +can simply continue like any other language model training task. When training, you should usually set +`add_generation_prompt=False`, because the added tokens to prompt an assistant response will not be helpful during +training. Let's see an example: + +```python +from transformers import AutoTokenizer +from datasets import Dataset + +tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta") + +chat1 = [ + {"role": "user", "content": "Which is bigger, the moon or the sun?"}, + {"role": "assistant", "content": "The sun."} +] +chat2 = [ + {"role": "user", "content": "Which is bigger, a virus or a bacterium?"}, + {"role": "assistant", "content": "A bacterium."} +] + +dataset = Dataset.from_dict({"chat": [chat1, chat2]}) +dataset = dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)}) +print(dataset['formatted_chat'][0]) +``` +And we get: +```text +<|user|> +Which is bigger, the moon or the sun? +<|assistant|> +The sun. +``` + +From here, just continue training like you would with a standard language modelling task, using the `formatted_chat` column. + +## Advanced: How do chat templates work? The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the default template for that model class is used instead. Let's take a look at the template for `BlenderBot`: @@ -154,7 +308,9 @@ Hopefully if you stare at this for a little bit you can see what this template i on the "role" of each message, which represents who sent it. 
User, assistant and system messages are clearly distinguishable to the model because of the tokens they're wrapped in. -## How do I create a chat template? +## Advanced: Adding and editing chat templates + +### How do I create a chat template? Simple, just write a jinja template and set `tokenizer.chat_template`. You may find it easier to start with an existing template from another model and simply edit it for your needs! For example, we could take the LLaMA template @@ -187,7 +343,7 @@ tokenizer.push_to_hub("model_name") # Upload your new template to the Hub! The method [`~PreTrainedTokenizer.apply_chat_template`] which uses your chat template is called by the [`ConversationalPipeline`] class, so once you set the correct chat template, your model will automatically become compatible with [`ConversationalPipeline`]. -## What are "default" templates? +### What are "default" templates? Before the introduction of chat templates, chat handling was hardcoded at the model class level. For backwards compatibility, we have retained this class-specific handling as default templates, also set at the class level. If a @@ -200,7 +356,7 @@ the class template is appropriate for your model, we strongly recommend overridi setting the `chat_template` attribute explicitly to make it clear to users that your model has been correctly configured for chat, and to future-proof in case the default templates are ever altered or deprecated. -## What template should I use? +### What template should I use? When setting the template for a model that's already been trained for chat, you should ensure that the template exactly matches the message formatting that the model saw during training, or else you will probably experience @@ -229,7 +385,7 @@ tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set a This template wraps each message in `<|im_start|>` and `<|im_end|>` tokens, and simply writes the role as a string, which allows for flexibility in the roles you train with. The output looks like this: -``` +```text <|im_start|>system You are a helpful chatbot that will do its best not to say anything so stupid that people tweet about it.<|im_end|> <|im_start|>user @@ -242,62 +398,12 @@ The "user", "system" and "assistant" roles are the standard for chat, and we rec particularly if you want your model to operate well with [`ConversationalPipeline`]. However, you are not limited to these roles - templating is extremely flexible, and any string can be a role. -## What are "generation prompts"? - -You may notice that the `apply_chat_template` method has an `add_generation_prompt` argument. This argument tells -the template to add tokens that indicate the start of a bot response. 
For example, consider the following chat: - -```python -messages = [ - {"role": "user", "content": "Hi there!"}, - {"role": "assistant", "content": "Nice to meet you!"}, - {"role": "user", "content": "Can I ask a question?"} -] -``` - -Here's what this will look like without a generation prompt, using the ChatML template we described above: - -```python ->> tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False) -"""<|im_start|>user -Hi there!<|im_end|> -<|im_start|>assistant -Nice to meet you!<|im_end|> -<|im_start|>user -Can I ask a question?<|im_end|> -""" -``` - -And here's what it looks like **with** a generation prompt: - -```python ->> tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) -"""<|im_start|>user -Hi there!<|im_end|> -<|im_start|>assistant -Nice to meet you!<|im_end|> -<|im_start|>user -Can I ask a question?<|im_end|> -<|im_start|>assistant -""" -``` - -Note that this time, we've added the tokens that indicate the start of a bot response. This ensures that when the model -generates text it will write a bot response instead of doing something unexpected, like continuing the user's -message. Remember, chat models are still just language models - they're trained to continue text, and chat is just a -special kind of text to them! You need to guide them with the appropriate control tokens so they know what they're -supposed to be doing. - -Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don't have any -special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact -effect that `add_generation_prompt` has will depend on the template being used. - -## I want to use chat templates! How should I get started? +### I want to add some chat templates! How should I get started? If you have any chat models, you should set their `tokenizer.chat_template` attribute and test it using -[`~PreTrainedTokenizer.apply_chat_template`]. This applies even if you're not the model owner - if you're using a model -with an empty chat template, or one that's still using the default class template, please open a [pull request](https://huggingface.co/docs/hub/repositories-pull-requests-discussions) to -the model repository so that this attribute can be set properly! +[`~PreTrainedTokenizer.apply_chat_template`], then push the updated tokenizer to the Hub. This applies even if you're +not the model owner - if you're using a model with an empty chat template, or one that's still using the default class +template, please open a [pull request](https://huggingface.co/docs/hub/repositories-pull-requests-discussions) to the model repository so that this attribute can be set properly! Once the attribute is set, that's it, you're done! `tokenizer.apply_chat_template` will now work correctly for that model, which means it is also automatically supported in places like `ConversationalPipeline`! @@ -306,7 +412,7 @@ By ensuring that models have this attribute, we can make sure that the whole com open-source models. Formatting mismatches have been haunting the field and silently harming performance for too long - it's time to put an end to them! -## Template writing tips +## Advanced: Template writing tips If you're unfamiliar with Jinja, we generally find that the easiest way to write a chat template is to first write a short Python script that formats messages the way you want, and then convert that script into a template.
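As a rough sketch of that workflow, the example below first formats messages with an ordinary Python function, using a made-up `role: content` layout that isn't tied to any real model, and then expresses the same logic as a Jinja template assigned to `tokenizer.chat_template`:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")

messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
]

# Step 1: an ordinary Python function that produces the format we want
def format_chat(messages):
    formatted = ""
    for message in messages:
        formatted += message["role"] + ": " + message["content"] + "\n"
    return formatted

# Step 2: the same loop, translated statement by statement into Jinja
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ message['role'] + ': ' + message['content'] + '\n' }}"
    "{% endfor %}"
)

# The two should now produce the same output
print(format_chat(messages))
print(tokenizer.apply_chat_template(messages, tokenize=False))
```

The translation is mostly mechanical: each piece of text the Python function appends becomes a `{{ ... }}` expression, and each `for` loop or `if` statement becomes a matching `{% ... %}` block.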