Mirror of https://github.com/huggingface/transformers.git, synced 2025-07-03 12:50:06 +06:00
[docs] update cache docs with new info (#38775)
* update docs with new info

* Update docs/source/en/kv_cache.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
This commit is contained in:
parent 324cc77dc3
commit e26ae89281
@@ -261,7 +261,9 @@ A cache can also work in iterative generation settings where there is back-and-forth
 For iterative generation with a cache, start by initializing an empty cache class and then you can feed in your new prompts. Keep track of dialogue history with a [chat template](./chat_templating).

-The example below demonstrates how to use a cache for iterative generation.
+The following example demonstrates [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf). If you’re using a different chat-style model, [`~PreTrainedTokenizer.apply_chat_template`] may process messages differently. It might cut out important tokens depending on how the Jinja template is written.
+
+For example, some models use special `<think> ... </think>` tokens during reasoning. These could get lost during re-encoding, causing indexing issues. You might need to manually remove or adjust extra tokens from the completions to keep things stable.

 ```py
 import torch
@@ -281,7 +283,6 @@ tokenizer = AutoTokenizer.from_pretrained(model_id)
 user_prompts = ["Hello, what's your name?", "Btw, yesterday I was on a rock concert."]

 past_key_values = DynamicCache()
-max_cache_length = past_key_values.get_max_length()

 messages = []
 for prompt in user_prompts:
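For reference, the loop that this excerpt truncates typically appends each user prompt to `messages`, re-encodes the dialogue history with the chat template, and reuses the same `DynamicCache` across turns. The snippet below is a minimal sketch under those assumptions; it relies on the `model`, `tokenizer`, `past_key_values`, `messages`, and `user_prompts` objects set up above, and settings such as `max_new_tokens=256` are illustrative rather than prescribed.

```python
for prompt in user_prompts:
    messages.append({"role": "user", "content": prompt})
    # Re-encode the full dialogue so far with the model's chat template.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
    ).to(model.device)
    # Reuse the same cache across turns so earlier context is not recomputed.
    outputs = model.generate(
        **inputs, do_sample=False, max_new_tokens=256, past_key_values=past_key_values
    )
    # Keep only the newly generated tokens as the assistant's reply.
    completion = tokenizer.decode(
        outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    messages.append({"role": "assistant", "content": completion})
```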
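For models whose completions contain reasoning spans such as `<think> ... </think>`, one way to keep re-encoding stable is to strip those spans before appending the text to the dialogue history. The sketch below assumes completions are plain strings; the `strip_think_tokens` helper is an illustrative name, not part of the Transformers API.

```python
import re

def strip_think_tokens(completion: str) -> str:
    """Remove `<think> ... </think>` spans from a completion (illustrative helper)."""
    # Drop everything between the opening and closing think tags, including the tags.
    cleaned = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL)
    # Trim whitespace left behind by the removal.
    return cleaned.strip()

# Example: keep only the user-visible answer in the message history.
raw = "<think>The user mentioned a concert yesterday.</think>That sounds like fun!"
messages_entry = {"role": "assistant", "content": strip_think_tokens(raw)}
```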