# Utilities for Generation

This page lists all the utility functions used by [`~generation.GenerationMixin.generate`],
[`~generation.GenerationMixin.greedy_search`],
[`~generation.GenerationMixin.contrastive_search`],
[`~generation.GenerationMixin.sample`],
[`~generation.GenerationMixin.beam_search`],
[`~generation.GenerationMixin.beam_sample`],
[`~generation.GenerationMixin.group_beam_search`], and
[`~generation.GenerationMixin.constrained_beam_search`].

Most of those are only useful if you are studying the code of the generate methods in the library.
## Generate Outputs

The output of [`~generation.GenerationMixin.generate`] is an instance of a subclass of
[`~utils.ModelOutput`]. That output is a data structure containing all the information returned
by [`~generation.GenerationMixin.generate`], but that can also be used as a tuple or a dictionary.

Here's an example:
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
```
The `generation_output` object is a [`~generation.GreedySearchDecoderOnlyOutput`]. As we can
see in the documentation of that class below, it has the following attributes:

- `sequences`: the generated sequences of tokens
- `scores` (optional): the prediction scores of the language modeling head, for each generation step
- `hidden_states` (optional): the hidden states of the model, for each generation step
- `attentions` (optional): the attention weights of the model, for each generation step
Here we have the `scores` since we passed along `output_scores=True`, but we don't have `hidden_states` and
`attentions` because we didn't pass `output_hidden_states=True` or `output_attentions=True`.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get `None`. Here for instance `generation_output.scores` are all the generated prediction scores of the
language modeling head, and `generation_output.attentions` is `None`.
When using our `generation_output` object as a tuple, it only keeps the attributes that don't have `None` values.
Here, for instance, it has two elements, `sequences` then `scores`, so `generation_output[:2]` will return the
tuple `(generation_output.sequences, generation_output.scores)`.
When using our `generation_output` object as a dictionary, it only keeps the attributes that don't have `None`
values. Here, for instance, it has two keys that are `sequences` and `scores`.
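The `None`-skipping behavior described above can be sketched with a minimal, hypothetical container. `MiniModelOutput` below is an illustration of the idea, not the actual `ModelOutput` implementation:

```python
from collections import OrderedDict


class MiniModelOutput(OrderedDict):
    """Toy stand-in for ModelOutput: attributes double as dict keys,
    and None-valued attributes are skipped in the dict/tuple views."""

    def __init__(self, **kwargs):
        # Keep only the attributes that are not None
        super().__init__((k, v) for k, v in kwargs.items() if v is not None)

    def __getattr__(self, name):
        # Attributes that were None (or never set) read back as None
        return self.get(name)

    def to_tuple(self):
        return tuple(self.values())


out = MiniModelOutput(sequences=[0, 1, 2], scores=[0.5, 0.3], attentions=None)
print(list(out.keys()))  # only the non-None attributes appear as keys
print(out.to_tuple())    # tuple view also skips None attributes
print(out.attentions)    # reads as None instead of raising
```

The key design point mirrored here is that absent and `None` attributes are indistinguishable from the caller's side: both read as `None`, and neither shows up when iterating the object.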
We document here all output types.

### GreedySearchOutput

[[autodoc]] generation.GreedySearchDecoderOnlyOutput

[[autodoc]] generation.GreedySearchEncoderDecoderOutput

[[autodoc]] generation.FlaxGreedySearchOutput

### SampleOutput

[[autodoc]] generation.SampleDecoderOnlyOutput

[[autodoc]] generation.SampleEncoderDecoderOutput

[[autodoc]] generation.FlaxSampleOutput

### BeamSearchOutput

[[autodoc]] generation.BeamSearchDecoderOnlyOutput

[[autodoc]] generation.BeamSearchEncoderDecoderOutput

### BeamSampleOutput

[[autodoc]] generation.BeamSampleDecoderOnlyOutput

[[autodoc]] generation.BeamSampleEncoderDecoderOutput
## LogitsProcessor

A [`LogitsProcessor`] can be used to modify the prediction scores of a language model head for
generation.
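As a rough illustration of the interface, here is a pure-Python sketch of a min-length processor, in the spirit of [`MinLengthLogitsProcessor`] but deliberately simplified (plain lists instead of the library's batched tensors):

```python
import math


class ToyMinLengthProcessor:
    """Sketch of a LogitsProcessor-style callable: while the generated
    sequence is shorter than min_length, the EOS score is forced to -inf
    so EOS can never win greedy search or be sampled."""

    def __init__(self, min_length, eos_token_id):
        self.min_length = min_length
        self.eos_token_id = eos_token_id

    def __call__(self, input_ids, scores):
        # input_ids: generated token ids so far; scores: one float per vocab id
        if len(input_ids) < self.min_length:
            scores = list(scores)
            scores[self.eos_token_id] = -math.inf
        return scores


processor = ToyMinLengthProcessor(min_length=5, eos_token_id=0)
masked = processor([1, 2, 3], [0.9, 0.1, 0.2])  # too short: EOS (id 0) masked out
```

The generation loop calls every processor in its `LogitsProcessorList` on the raw scores at each step, so processors compose: each one receives the scores as modified by the previous one.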
[[autodoc]] LogitsProcessor
    - __call__

[[autodoc]] LogitsProcessorList
    - __call__

[[autodoc]] LogitsWarper
    - __call__

[[autodoc]] MinLengthLogitsProcessor
    - __call__

[[autodoc]] MinNewTokensLengthLogitsProcessor
    - __call__

[[autodoc]] TemperatureLogitsWarper
    - __call__

[[autodoc]] RepetitionPenaltyLogitsProcessor
    - __call__

[[autodoc]] TopPLogitsWarper
    - __call__

[[autodoc]] TopKLogitsWarper
    - __call__

[[autodoc]] TypicalLogitsWarper
    - __call__

[[autodoc]] NoRepeatNGramLogitsProcessor
    - __call__

[[autodoc]] SequenceBiasLogitsProcessor
    - __call__

[[autodoc]] NoBadWordsLogitsProcessor
    - __call__

[[autodoc]] PrefixConstrainedLogitsProcessor
    - __call__

[[autodoc]] HammingDiversityLogitsProcessor
    - __call__

[[autodoc]] ForcedBOSTokenLogitsProcessor
    - __call__

[[autodoc]] ForcedEOSTokenLogitsProcessor
    - __call__

[[autodoc]] InfNanRemoveLogitsProcessor
    - __call__

[[autodoc]] TFLogitsProcessor
    - __call__

[[autodoc]] TFLogitsProcessorList
    - __call__

[[autodoc]] TFLogitsWarper
    - __call__

[[autodoc]] TFTemperatureLogitsWarper
    - __call__

[[autodoc]] TFTopPLogitsWarper
    - __call__

[[autodoc]] TFTopKLogitsWarper
    - __call__

[[autodoc]] TFMinLengthLogitsProcessor
    - __call__

[[autodoc]] TFNoBadWordsLogitsProcessor
    - __call__

[[autodoc]] TFNoRepeatNGramLogitsProcessor
    - __call__

[[autodoc]] TFRepetitionPenaltyLogitsProcessor
    - __call__

[[autodoc]] TFForcedBOSTokenLogitsProcessor
    - __call__

[[autodoc]] TFForcedEOSTokenLogitsProcessor
    - __call__

[[autodoc]] FlaxLogitsProcessor
    - __call__

[[autodoc]] FlaxLogitsProcessorList
    - __call__

[[autodoc]] FlaxLogitsWarper
    - __call__

[[autodoc]] FlaxTemperatureLogitsWarper
    - __call__

[[autodoc]] FlaxTopPLogitsWarper
    - __call__

[[autodoc]] FlaxTopKLogitsWarper
    - __call__

[[autodoc]] FlaxForcedBOSTokenLogitsProcessor
    - __call__

[[autodoc]] FlaxForcedEOSTokenLogitsProcessor
    - __call__

[[autodoc]] FlaxMinLengthLogitsProcessor
    - __call__
## StoppingCriteria

A [`StoppingCriteria`] can be used to change when to stop generation (other than EOS token).
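The contract is simple: a criterion is a callable that the generation loop checks after every step and that returns `True` when generation should stop. A minimal sketch in the style of [`MaxLengthCriteria`] (pure Python, not the tensor-based implementation):

```python
class ToyMaxLengthCriteria:
    """Sketch of a StoppingCriteria-style callable: returns True once
    the sequence has reached max_length, telling the loop to stop."""

    def __init__(self, max_length):
        self.max_length = max_length

    def __call__(self, input_ids, scores=None):
        return len(input_ids) >= self.max_length


stop = ToyMaxLengthCriteria(max_length=4)

# A generation loop would check the criterion after each new token:
tokens = [10, 11]
while not stop(tokens):
    tokens.append(0)  # stand-in for generating one more token
```

A `StoppingCriteriaList` simply ORs its members together, so generation halts as soon as any single criterion fires.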
[[autodoc]] StoppingCriteria
    - __call__

[[autodoc]] StoppingCriteriaList
    - __call__

[[autodoc]] MaxLengthCriteria
    - __call__

[[autodoc]] MaxTimeCriteria
    - __call__
## Constraints

A [`Constraint`] can be used to force the generation to include specific tokens or sequences in the output.
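For intuition, a phrasal constraint has to track how far along a fixed token sequence the generation has progressed, reset that progress on a mismatch, and report when the phrase is complete. The class below is a highly simplified sketch of that bookkeeping; it is not the real [`PhrasalConstraint`] API, and the method names (`advance`, `update`, `completed`) are illustrative only:

```python
class ToyPhrasalConstraint:
    """Sketch of phrasal-constraint bookkeeping: step through a fixed
    token sequence, reset on a mismatch, and report completion."""

    def __init__(self, token_ids):
        self.token_ids = token_ids
        self.fulfilled_idx = 0  # how many tokens of the phrase are matched

    def advance(self):
        # The next token needed to make progress on the constraint
        return None if self.completed() else self.token_ids[self.fulfilled_idx]

    def update(self, token_id):
        if self.completed():
            return
        if token_id == self.token_ids[self.fulfilled_idx]:
            self.fulfilled_idx += 1
        else:
            self.fulfilled_idx = 0  # mismatch: the phrase must restart

    def completed(self):
        return self.fulfilled_idx == len(self.token_ids)
```

Constrained beam search uses this kind of progress information to decide which beams to keep, so that at least one beam eventually fulfills every constraint.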
[[autodoc]] Constraint

[[autodoc]] PhrasalConstraint

[[autodoc]] DisjunctiveConstraint

[[autodoc]] ConstraintListState

## BeamSearch

[[autodoc]] BeamScorer
    - process
    - finalize

[[autodoc]] BeamSearchScorer
    - process
    - finalize

[[autodoc]] ConstrainedBeamSearchScorer
    - process
    - finalize
## Utilities

[[autodoc]] top_k_top_p_filtering

[[autodoc]] tf_top_k_top_p_filtering

## Streamers

[[autodoc]] TextStreamer

[[autodoc]] TextIteratorStreamer
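Conceptually, `TextIteratorStreamer` is a queue that a generation thread feeds and the caller iterates over as text arrives. The sketch below reproduces only that producer/consumer pattern, assuming nothing about the real class internals; `fake_generate` stands in for running `model.generate(...)` in a background thread:

```python
import threading
from queue import Queue


class ToyIteratorStreamer:
    """Sketch of the iterator-streamer pattern: a producer thread pushes
    text chunks; the consumer iterates over them as they arrive."""

    _DONE = object()  # sentinel marking the end of generation

    def __init__(self):
        self.queue = Queue()

    def put(self, text):
        self.queue.put(text)

    def end(self):
        self.queue.put(self._DONE)

    def __iter__(self):
        while (chunk := self.queue.get()) is not self._DONE:
            yield chunk


streamer = ToyIteratorStreamer()

def fake_generate():
    # Stand-in for a model generating tokens in a worker thread
    for word in ["Hello", " world", "!"]:
        streamer.put(word)
    streamer.end()

threading.Thread(target=fake_generate).start()
text = "".join(streamer)  # blocks until each chunk is available
```

Running generation in a separate thread is what lets the main thread consume partial output immediately, which is the point of the iterator streamer in interactive UIs.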