Cleanup tool calling documentation and rename doc (#32337)
* Rename "Templates for Chat Models" doc to "Chat Templates" * Small formatting fix * Small formatting fix * Small formatting fix * Cleanup tool calling docs as well * Remove unneeded 'revision' * Move tip to below main code example * Little bonus section on template editing
parent 8a3c55eb21
commit b7ea171403
@@ -120,7 +120,7 @@
   - local: custom_models
     title: Share a custom model
   - local: chat_templating
-    title: Templates for chat models
+    title: Chat templates
   - local: trainer
     title: Trainer
   - local: sagemaker
@@ -14,7 +14,7 @@ rendered properly in your Markdown viewer.
 -->
 
-# Templates for Chat Models
+# Chat Templates
 
 ## Introduction
 
@@ -235,13 +235,14 @@ The sun.</s>
 From here, just continue training like you would with a standard language modelling task, using the `formatted_chat` column.
 
 <Tip>
-If you format text with `apply_chat_template(tokenize=False)` and then tokenize it in a separate step, you should set the argument
-`add_special_tokens=False`. If you use `apply_chat_template(tokenize=True)`, you don't need to worry about this!
 
 By default, some tokenizers add special tokens like `<bos>` and `<eos>` to text they tokenize. Chat templates should
-always include all of the special tokens they need, and so adding extra special tokens with
-the default `add_special_tokens=True` can result in incorrect or duplicated special tokens, which will hurt model
-performance.
+already include all the special tokens they need, and so additional special tokens will often be incorrect or
+duplicated, which will hurt model performance.
+
+Therefore, if you format text with `apply_chat_template(tokenize=False)`, you should set the argument
+`add_special_tokens=False` when you tokenize that text later. If you use `apply_chat_template(tokenize=True)`, you don't need to worry about this!
 
 </Tip>
 
 ## Advanced: Extra inputs to chat templates
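The reworded tip above shifts the emphasis: the chat template, not the tokenizer, is responsible for special tokens. A minimal sketch of the two workflows it describes (the checkpoint is only an example; any chat-model tokenizer behaves the same way):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")
messages = [{"role": "user", "content": "Hi there!"}]

# Two-step workflow: render the chat to a string first. The chat template
# already inserts every special token the model expects.
chat_text = tokenizer.apply_chat_template(messages, tokenize=False)

# When tokenizing that string later, disable the tokenizer's own
# special-token insertion so tokens like <bos> aren't duplicated.
encoding = tokenizer(chat_text, add_special_tokens=False)

# One-step workflow: tokenize directly; no extra argument is needed.
token_ids = tokenizer.apply_chat_template(messages, tokenize=True)
```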
@@ -325,7 +326,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 
 checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B"
 
-tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision="pr/13")
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
 model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")
 ```
 
@@ -370,7 +371,7 @@ messages = [
 Now, let's apply the chat template and generate a response:
 
 ```python
-inputs = tokenizer.apply_chat_template(messages, chat_template="tool_use", tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
+inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
 inputs = {k: v.to(model.device) for k, v in inputs.items()}
 out = model.generate(**inputs, max_new_tokens=128)
 print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
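This hunk assumes the `tools` list and `messages` defined earlier in the document. Paraphrased (not the verbatim original), that setup looks roughly like:

```python
def get_current_temperature(location: str, unit: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
    """
    return 22.0  # Dummy value; a real tool would query a weather service

tools = [get_current_temperature]

messages = [
    {"role": "system", "content": "You are a bot that responds to weather queries."},
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"},
]
```

The type hints and the `Args:` block in the docstring are what `apply_chat_template` parses into the tool schema, which is why they matter here.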
@@ -388,29 +389,47 @@ The model has called the function with valid arguments, in the format requested
 inferred that we're most likely referring to the Paris in France, and it remembered that, as the home of SI units,
 the temperature in France should certainly be displayed in Celsius.
 
-Let's append the model's tool call to the conversation. Note that we generate a random `tool_call_id` here. These IDs
-are not used by all models, but they allow models to issue multiple tool calls at once and keep track of which response
-corresponds to which call. You can generate them any way you like, but they should be unique within each chat.
+Next, let's append the model's tool call to the conversation.
 
 ```python
-tool_call_id = "vAHdf3"  # Random ID, should be unique for each tool call
 tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}}
-messages.append({"role": "assistant", "tool_calls": [{"id": tool_call_id, "type": "function", "function": tool_call}]})
+messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
 ```
 
 
 Now that we've added the tool call to the conversation, we can call the function and append the result to the
 conversation. Since we're just using a dummy function for this example that always returns 22.0, we can just append
-that result directly. Again, note the `tool_call_id` - this should match the ID used in the tool call above.
+that result directly.
 
+```python
+messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
+```
+
+<Tip>
+
+Some model architectures, notably Mistral/Mixtral, also require a `tool_call_id` here, which should be
+9 randomly-generated alphanumeric characters, and assigned to the `id` key of the tool call
+dictionary. The same key should also be assigned to the `tool_call_id` key of the tool response dictionary below, so
+that tool calls can be matched to tool responses. So, for Mistral/Mixtral models, the code above would be:
+
+```python
+tool_call_id = "9Ae3bDc2F" # Random ID, 9 alphanumeric characters
+tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}}
+messages.append({"role": "assistant", "tool_calls": [{"type": "function", "id": tool_call_id, "function": tool_call}]})
+```
+
+and
+
 ```python
 messages.append({"role": "tool", "tool_call_id": tool_call_id, "name": "get_current_temperature", "content": "22.0"})
 ```
 
+</Tip>
 
 Finally, let's let the assistant read the function outputs and continue chatting with the user:
 
 ```python
-inputs = tokenizer.apply_chat_template(messages, chat_template="tool_use", tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
+inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
 inputs = {k: v.to(model.device) for k, v in inputs.items()}
 out = model.generate(**inputs, max_new_tokens=128)
 print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
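The new tip hard-codes `"9Ae3bDc2F"` for brevity. If you want to actually generate a random 9-character alphanumeric ID as the tip describes, one illustrative sketch:

```python
import random
import string

# 9 random alphanumeric characters, unique enough within a single chat
tool_call_id = "".join(random.choices(string.ascii_letters + string.digits, k=9))
```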
@@ -426,14 +445,6 @@ Although this was a simple demo with dummy tools and a single call, the same tec
 multiple real tools and longer conversations. This can be a powerful way to extend the capabilities of conversational
 agents with real-time information, computational tools like calculators, or access to large databases.
 
-<Tip>
-Not all of the tool-calling features shown above are used by all models. Some use tool call IDs, others simply use the function name and
-match tool calls to results using the ordering, and there are several models that use neither and only issue one tool
-call at a time to avoid confusion. If you want your code to be compatible across as many models as possible, we
-recommend structuring your tools calls like we've shown here, and returning tool results in the order that
-they were issued by the model. The chat templates on each model should handle the rest.
-</Tip>
-
 ### Understanding tool schemas
 
 Each function you pass to the `tools` argument of `apply_chat_template` is converted into a
@@ -855,4 +866,25 @@ all implementations of Jinja:
 in the Jinja documentation for more.
 - Replace `True`, `False` and `None`, which are Python-specific, with `true`, `false` and `none`.
 - Directly rendering a dict or list may give different results in other implementations (for example, string entries
   might change from single-quoted to double-quoted). Adding the `tojson` filter can help to ensure consistency here.
+
+### Writing and debugging larger templates
+
+When this feature was introduced, most templates were quite small, the Jinja equivalent of a "one-liner" script.
+However, with new models and features like tool-use and RAG, some templates can be 100 lines long or more. When
+writing templates like these, it's a good idea to write them in a separate file, using a text editor. You can easily
+extract a chat template to a file:
+
+```python
+open("template.jinja", "w").write(tokenizer.chat_template)
+```
+
+Or load the edited template back into the tokenizer:
+
+```python
+tokenizer.chat_template = open("template.jinja").read()
+```
+
+As an added bonus, when you write a long, multi-line template in a separate file, line numbers in that file will
+exactly correspond to line numbers in template parsing or execution errors. This will make it much easier to
+identify the source of issues.
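Putting the two snippets from this new section together, an edit-and-debug loop might look like the following sketch (the checkpoint and filename are just the examples used above, and this assumes `chat_template` is a single string rather than a dict of named templates):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")

# Dump the current template so it can be edited in a proper text editor
with open("template.jinja", "w") as f:
    f.write(tokenizer.chat_template)

# ... edit template.jinja, then load it back ...
with open("template.jinja") as f:
    tokenizer.chat_template = f.read()

# Render a test conversation; Jinja errors now report line numbers that
# match the file you just edited
messages = [{"role": "user", "content": "Hello!"}]
print(tokenizer.apply_chat_template(messages, tokenize=False))
```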
@@ -14,7 +14,7 @@ rendered properly in your Markdown viewer.
 -->
 
-# Templates for Chat Models
+# Chat Templates
 
 ## Introduction
 