
# Generation features

The [~GenerationMixin.generate] API supports a couple of features for building applications on top of it.

This guide will show you how to use these features.

## Streaming

Streaming returns text as soon as it is generated so you don't have to wait for the entire response. It is important in user-facing applications because it reduces perceived latency and lets users see the generation progress.

> [!TIP]
> Learn more about streaming in the [Text Generation Inference](https://huggingface.co/docs/text-generation-inference/conceptual/streaming) docs.

Create an instance of [TextStreamer] with the tokenizer. Pass [TextStreamer] to the `streamer` parameter in [~GenerationMixin.generate] to stream the output one word at a time.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
inputs = tokenizer(["The secret to baking a good cake is "], return_tensors="pt")
streamer = TextStreamer(tokenizer)

_ = model.generate(**inputs, streamer=streamer, max_new_tokens=20)
```

The `streamer` parameter is compatible with any class with a [~TextStreamer.put] and [~TextStreamer.end] method. [~TextStreamer.put] pushes new tokens and [~TextStreamer.end] flags the end of generation. You can create your own streamer class as long as it includes these two methods, or you can use Transformers' basic streamer classes.
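For instance, a streamer doesn't have to print text at all. The class below is a hypothetical sketch that simply counts the tokens it receives; only the `put`/`end` protocol comes from the API described above.

```py
class TokenCountStreamer:
    """Toy streamer that counts tokens instead of decoding them."""

    def __init__(self):
        self.num_tokens = 0

    def put(self, value):
        # called with each batch of new token ids (the prompt arrives in the first call)
        self.num_tokens += value.numel()

    def end(self):
        # called once when generation is finished
        print(f"received {self.num_tokens} tokens in total")
```

Pass an instance of it to `streamer=` in [~GenerationMixin.generate] exactly as with [TextStreamer] above.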

## Watermarking

Watermarking is useful for detecting whether text is machine-generated. The watermarking strategy in Transformers randomly "colors" a subset of the tokens green. Green tokens have a small bias added to their logits, giving them a higher probability of being generated. You can detect watermarked text by comparing the proportion of green tokens in it to the proportion typically found in human-written text.

Watermarking is supported for any generative model in Transformers and doesn't require an extra classification model to detect the watermarked text.
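To make the proportion test concrete, here is a rough sketch of the kind of statistic involved. This is an illustration, not the exact computation [WatermarkDetector] performs, and the 25% chance rate is an assumed value for the example.

```py
from math import sqrt

def green_token_z_score(num_green: int, num_tokens: int, green_fraction: float = 0.25) -> float:
    """One-proportion z-test: how far the observed green-token count sits above chance."""
    expected = green_fraction * num_tokens
    std = sqrt(num_tokens * green_fraction * (1 - green_fraction))
    return (num_green - expected) / std

# 60 green tokens out of 100 is ~8 standard deviations above the assumed 25% chance rate
print(green_token_z_score(num_green=60, num_tokens=100))
```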

Create a [WatermarkingConfig] with the bias value to add to the logits and the watermarking algorithm. The example below uses the "selfhash" algorithm, where the green token selection depends only on the current token. Pass the [WatermarkingConfig] to [~GenerationMixin.generate].

> [!TIP]
> The [WatermarkDetector] class detects the proportion of green tokens in generated text, which is why it is recommended to strip the prompt text if it is much longer than the generated text. Padding can also affect [WatermarkDetector].

```py
from transformers import AutoTokenizer, AutoModelForCausalLM, WatermarkDetector, WatermarkingConfig

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side = "left"

inputs = tokenizer(["This is the beginning of a long story", "Alice and Bob are"], padding=True, return_tensors="pt")
# prompt length, useful for stripping the prompt before detection (see tip above)
input_len = inputs["input_ids"].shape[-1]

watermarking_config = WatermarkingConfig(bias=2.5, seeding_scheme="selfhash")
out = model.generate(**inputs, watermarking_config=watermarking_config, do_sample=False, max_length=20)
```

Create an instance of [WatermarkDetector] and pass the model output to it to detect whether the text is machine-generated. The [WatermarkDetector] must have the same [WatermarkingConfig] used during generation.

```py
detector = WatermarkDetector(model_config=model.config, device="cpu", watermarking_config=watermarking_config)
detection_out = detector(out, return_dict=True)
detection_out.prediction
array([ True,  True])
```
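Following the tip above, if the prompt is much longer than the continuation you can run detection on the generated tokens only. A minimal sketch, reusing `input_len` from the generation snippet:

```py
# keep only the tokens generated after the (left-padded) prompt
continuation = out[:, input_len:]
detection_out = detector(continuation, return_dict=True)
```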