Clarify and add missing typical_p argument docstring. (#21095)

* Clarify and add missing typical_p docstring.

* Make the docstring easier to understand.

* Clarify typical_p docstring

Accept the suggestion by @stevhliu for paraphrasing the docstring.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Use the same docstring as in GenerationConfig

Follow the suggestion made by @stevhliu in the pull request conversation.

* Fix docstring spacing.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Sherman Siu 2023-01-17 09:23:47 -05:00 committed by GitHub
parent f30bcd5357
commit 8896ebb9a9
2 changed files with 11 additions and 2 deletions

@@ -144,6 +144,12 @@ class PretrainedConfig(PushToHubMixin):
 top_p (`float`, *optional*, defaults to 1):
 Value that will be used by default in the `generate` method of the model for `top_p`. If set to float < 1,
 only the most probable tokens with probabilities that add up to `top_p` or higher are kept for generation.
+typical_p (`float`, *optional*, defaults to 1):
+Local typicality measures how similar the conditional probability of predicting a target token next is to
+the expected conditional probability of predicting a random token next, given the partial text already
+generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that
+add up to `typical_p` or higher are kept for generation. See [this
+paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
 repetition_penalty (`float`, *optional*, defaults to 1):
 Parameter for repetition penalty that will be used by default in the `generate` method of the model. 1.0
 means no penalty.

@@ -111,8 +111,11 @@ class GenerationConfig(PushToHubMixin):
 If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to
 `top_p` or higher are kept for generation.
 typical_p (`float`, *optional*, defaults to 1.0):
-The amount of probability mass from the original distribution to be considered in typical decoding. If set
-to 1.0 it takes no effect. See [this paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
+Local typicality measures how similar the conditional probability of predicting a target token next is to
+the expected conditional probability of predicting a random token next, given the partial text already
+generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that
+add up to `typical_p` or higher are kept for generation. See [this
+paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
 diversity_penalty (`float`, *optional*, defaults to 0.0):
 This value is subtracted from a beam's score if it generates a token same as any beam from other group at a
 particular time. Note that `diversity_penalty` is only effective if `group beam search` is enabled.
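
For readers who want to see what the new `typical_p` wording describes, the following is a minimal pure-Python sketch of the selection rule. It is not the library's implementation: `typical_filter` is a hypothetical helper written for illustration, it works on a plain list of probabilities rather than on logits tensors, and it assumes that "locally typical" means a token's information content (`-log p`) is close to the entropy of the next-token distribution.

```python
import math


def typical_filter(probs, typical_p):
    """Return the indices kept by a locally typical filter (illustrative sketch only)."""
    # Entropy of the next-token distribution, i.e. the expected information content.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)

    # Distance between each token's information content (-log p) and the entropy.
    deviations = [
        (abs(-math.log(p) - entropy), i) for i, p in enumerate(probs) if p > 0
    ]
    deviations.sort()  # most locally typical tokens first

    # Keep the smallest prefix whose probabilities add up to typical_p or higher.
    kept, mass = [], 0.0
    for _, i in deviations:
        kept.append(i)
        mass += probs[i]
        if mass >= typical_p:
            break
    return sorted(kept)


probs = [0.5, 0.3, 0.15, 0.05]
print(typical_filter(probs, typical_p=0.2))
print(typical_filter(probs, typical_p=0.9))
```

Unlike `top_p`, the ranking here is by closeness to the entropy rather than by probability, so with a small `typical_p` the most probable token is not necessarily the first one kept.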