Render custom tool docs a bit better (#23269)

* Try on a couple of blocks to see

* Build the doc please

* Build the doc please

* Build the doc please

* add more

* Finish with all

* Style
Sylvain Gugger 2023-05-10 11:58:20 -04:00 committed by GitHub
parent 42017d82ba
commit eb5b5ce641
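
The hunks below all make the same mechanical change: a bare opening code fence gains a `text` info string so the block renders as plain text. A minimal sketch of that transformation follows. It assumes fences are runs of three or more backticks and that a fence is closed by a bare run at least as long as the one that opened it. The helper name `tag_bare_fences` is hypothetical and is not the tooling used for this commit.

```python
import re

# A fence line: optional indent, a run of 3+ backticks, then an optional info string.
FENCE = re.compile(r"^(\s*)(`{3,})(.*)$")


def tag_bare_fences(markdown: str, info: str = "text") -> str:
    """Add `info` to every opening fence that has no info string (hypothetical helper)."""
    out = []
    open_fence = None  # backtick run of the currently open fence, or None
    for line in markdown.split("\n"):
        match = FENCE.match(line)
        if match is None:
            out.append(line)
            continue
        indent, ticks, rest = match.groups()
        if open_fence is None:
            # Opening fence: remember its length and tag it if it is bare.
            open_fence = ticks
            if rest.strip() == "":
                line = f"{indent}{ticks}{info}"
        elif len(ticks) >= len(open_fence) and rest.strip() == "":
            # Closing fence: a bare run at least as long as the opener.
            open_fence = None
        # Any other fence-like line is content of the open block; keep it as-is.
        out.append(line)
    return "\n".join(out)
```

Fences that already carry a language such as `py` are left alone, and shorter backtick runs nested inside a four-backtick block are treated as content, which matches how the four-backtick prompt examples in this diff are written.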


@ -54,7 +54,7 @@ The prompt is structured broadly into four parts.
To better understand each part, let's look at a shortened version of how the `run` prompt can look like:
````
````text
I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will perform the task.
[...]
You can print intermediate results if it makes sense to do so.
@ -101,7 +101,7 @@ The second part (the bullet points below *"Tools"*) is dynamically added upon ca
exactly as many bullet points as there are tools in `agent.toolbox` and each bullet point consists of the name
and description of the tool:
```
```text
- <tool.name>: <tool.description>
```
@ -115,7 +115,7 @@ print(f"- {document_qa.name}: {document_qa.description}")
```
which gives:
```
```text
- document_qa: This is a tool that answers a question about a document (pdf). It takes an input named `document` which should be the document containing the information, as well as a `question` that is the question about the document. It returns a text that contains the answer to the question.
```
@ -143,7 +143,7 @@ executable code in practice.
Let's have a look at one example:
````
````text
Task: "Identify the oldest person in the `document` and create an image showcasing the result as a banner."
I will use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.
@ -166,7 +166,7 @@ The prompt examples are curated by the Transformers team and rigorously evaluate
to ensure that the agent's prompt is as good as possible to solve real use cases of the agent.
The final part of the prompt corresponds to:
```
```text
Task: "Draw me a picture of rivers and lakes"
I will use the following
@ -187,7 +187,7 @@ exactly in the same way it was previously done in the examples.
Without going into too much detail, the chat template has the same prompt structure with the
examples having a slightly different style, *e.g.*:
````
````text
[...]
=====
@ -225,8 +225,8 @@ to past exchanges as is done *e.g.* above by the user's input of "I tried **this
previously generated code of the agent.
Upon running `.chat`, the user's input or *task* is cast into an unfinished example of the form:
```
Human: <user-input>\n\nAssistent:
```text
Human: <user-input>\n\nAssistant:
```
which the agent completes. Contrary to the `run` command, the `chat` command then appends the completed example
to the prompt, thus giving the agent more context for the next `chat` turn.
@ -254,7 +254,7 @@ agent.run("Show me a tree", return_code=True)
gives:
```
```text
==Explanation from the agent==
I will use the following tool: `image_segmenter` to create a segmentation mask for the image.
@ -269,7 +269,8 @@ are present in the tool's name and description. Let's have a look.
```py
agent.toolbox["image_generator"].description
```
```
```text
'This is a tool that creates an image according to a prompt, which is a text description. It takes an input named `prompt` which contains the image description and outputs an image.
```
@ -280,7 +281,7 @@ agent.run("Create an image of a tree", return_code=True)
```
gives:
```
```text
==Explanation from the agent==
I will use the following tool `image_generator` to generate an image of a tree.
@ -307,7 +308,7 @@ used a lot for image generation tasks, *e.g.*
agent.run("Make an image of a house and a car", return_code=True)
```
returns
```
```text
==Explanation from the agent==
I will use the following tools `image_generator` to generate an image of a house and `image_transformer` to transform the image of a car into the image of a house.
@ -322,9 +323,11 @@ to understand the difference between `image_generator` and `image_transformer` a
We can help the agent here by changing the tool name and description of `image_transformer`. Let's instead call it `modifier`
to disassociate it a bit from "image" and "prompt":
```
```py
agent.toolbox["modifier"] = agent.toolbox.pop("image_transformer")
agent.toolbox["modifier"].description = agent.toolbox["modifier"].description.replace("transforms an image according to a prompt", "modifies an image")
agent.toolbox["modifier"].description = agent.toolbox["modifier"].description.replace(
"transforms an image according to a prompt", "modifies an image"
)
```
Now "modify" is a strong cue to use the new image processor which should help with the above prompt. Let's run it again.
@ -334,7 +337,7 @@ agent.run("Make an image of a house and a car", return_code=True)
```
Now we're getting:
```
```text
==Explanation from the agent==
I will use the following tools: `image_generator` to generate an image of a house, then `image_generator` to generate an image of a car.
@ -350,7 +353,7 @@ which is definitely closer to what we had in mind! However, we want to have both
agent.run("Create image: 'A house and car'", return_code=True)
```
```
```text
==Explanation from the agent==
I will use the following tool: `image_generator` to generate an image.
@ -389,7 +392,7 @@ of the tools, it has available to it as well as correctly insert the user's prom
</Tip>
Similarly, one can overwrite the `chat` prompt template. Note that the `chat` mode always uses the following format for the exchanges:
```
```text
Human: <<task>>
Assistant:
@ -441,7 +444,7 @@ print(f"Name: '{controlnet_transformer.name}'")
```
gives
```
```text
Description: 'This is a tool that transforms an image with ControlNet according to a prompt.
It takes two inputs: `image`, which should be the image to transform, and `prompt`, which should be the prompt to use to change it. It returns the modified image.'
Name: 'image_transformer'
@ -457,7 +460,7 @@ agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder",
This command should give you the following info:
```
```text
image_transformer has been replaced by <transformers_modules.diffusers.controlnet-canny-tool.bd76182c7777eba9612fc03c0
8718a60c0aa6312.image_transformation.ControlNetTransformationTool object at 0x7f1d3bfa3a00> as provided in `additional_tools`
```
@ -480,7 +483,7 @@ You can always have a look at the toolbox that is currently available to the age
print("\n".join([f"- {a}" for a in agent.toolbox.keys()]))
```
```
```text
- document_qa
- image_captioner
- image_qa
@ -518,7 +521,7 @@ Let's transform the image into a beautiful winter landscape:
image = agent.run("Transform the image: 'A frozen lake and snowy forest'", image=image)
```
```
```text
==Explanation from the agent==
I will use the following tool: `image_transformer` to transform the image.
@ -536,7 +539,7 @@ By default the image processing tool returns an image of size 512x512 pixels. Le
image = agent.run("Upscale the image", image)
```
```
```text
==Explanation from the agent==
I will use the following tool: `image_upscaler` to upscale the image.
@ -657,7 +660,7 @@ agent.run(
)
```
which outputs the following:
```
```text
==Code generated by the agent==
model = model_download_counter(task="text-to-video")
print(f"The model with the most downloads is {model}.")
@ -738,7 +741,7 @@ agent.run("Generate an image of the `prompt` after improving it.", prompt="A rab
```
The model adequately leverages the tool:
```
```text
==Explanation from the agent==
I will use the following tools: `StableDiffusionPromptGenerator` to improve the prompt, then `image_generator` to generate an image according to the improved prompt.