mirror of https://github.com/huggingface/transformers.git
synced 2025-08-03 03:31:05 +06:00
Clarify and add missing typical_p argument docstring. (#21095)
* Clarify and add missing typical_p docstring.
* Make the docstring easier to understand.
* Clarify typical_p docstring
  Accept the suggestion by @stevhliu for paraphrasing the docstring.
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Use the same docstring as in GenerationConfig
  Follow the suggestion suggested by @stevhliu in the pull request conversation.
* Fix docstring spacing.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
parent f30bcd5357
commit 8896ebb9a9
@@ -144,6 +144,12 @@ class PretrainedConfig(PushToHubMixin):
         top_p (`float`, *optional*, defaults to 1):
             Value that will be used by default in the `generate` method of the model for `top_p`. If set to float < 1,
             only the most probable tokens with probabilities that add up to `top_p` or higher are kept for generation.
+        typical_p (`float`, *optional*, defaults to 1):
+            Local typicality measures how similar the conditional probability of predicting a target token next is to
+            the expected conditional probability of predicting a random token next, given the partial text already
+            generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that
+            add up to `typical_p` or higher are kept for generation. See [this
+            paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
         repetition_penalty (`float`, *optional*, defaults to 1):
             Parameter for repetition penalty that will be used by default in the `generate` method of the model. 1.0
             means no penalty.
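To make the new `typical_p` wording concrete, here is a minimal pure-Python sketch of the filtering step the docstring describes. It is an illustration under stated assumptions, not the library's implementation: the function name `typical_filter` and the toy distribution are hypothetical, and it works on a plain list of probabilities rather than on logits.

```python
import math


def typical_filter(probs, typical_p=1.0):
    """Return indices of the kept tokens, most locally typical first.

    Hypothetical helper for illustration only; `probs` is a list of
    next-token probabilities that sums to 1.
    """
    # Expected surprisal of the distribution (its entropy, in nats).
    entropy = -sum(p * math.log(p) for p in probs if p > 0)

    def deviation(i):
        # How far this token's surprisal is from the expected surprisal.
        if probs[i] <= 0:
            return math.inf
        return abs(-math.log(probs[i]) - entropy)

    # Most typical tokens (smallest deviation) come first.
    order = sorted(range(len(probs)), key=deviation)

    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= typical_p:  # smallest set whose mass reaches typical_p
            break
    return kept


probs = [0.5, 0.2, 0.15, 0.1, 0.05]
print(typical_filter(probs, typical_p=0.3))  # -> [1, 2]
```

In this toy distribution the most probable token (index 0) is dropped, because its surprisal is further from the entropy than that of indices 1 and 2. With `typical_p=1.0` every token is kept, which matches the documented default of no filtering.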
@@ -111,8 +111,11 @@ class GenerationConfig(PushToHubMixin):
             If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to
             `top_p` or higher are kept for generation.
         typical_p (`float`, *optional*, defaults to 1.0):
-            The amount of probability mass from the original distribution to be considered in typical decoding. If set
-            to 1.0 it takes no effect. See [this paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
+            Local typicality measures how similar the conditional probability of predicting a target token next is to
+            the expected conditional probability of predicting a random token next, given the partial text already
+            generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that
+            add up to `typical_p` or higher are kept for generation. See [this
+            paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
         diversity_penalty (`float`, *optional*, defaults to 0.0):
             This value is subtracted from a beam's score if it generates a token same as any beam from other group at a
             particular time. Note that `diversity_penalty` is only effective if `group beam search` is enabled.
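The `top_p` context lines in this hunk describe keeping the smallest set of most probable tokens whose probabilities add up to `top_p` or higher. A rough pure-Python sketch of that rule follows. The name `top_p_filter` and the example probabilities are hypothetical, and the real library operates on logits with additional options, so treat this only as an illustration of the wording.

```python
def top_p_filter(probs, top_p=1.0):
    """Return indices of the kept tokens, most probable first.

    Hypothetical helper for illustration only; `probs` is a list of
    next-token probabilities that sums to 1.
    """
    # Rank tokens from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:  # smallest set whose mass reaches top_p
            break
    return kept


probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, top_p=0.7))  # -> [0, 1]
```

The only difference from a typicality-based filter in a sketch like this is the ranking key: `top_p` ranks by probability, while `typical_p` ranks by how close a token's surprisal is to the entropy of the distribution.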