Clarify and add missing typical_p argument docstring. (#21095)

* Clarify and add missing typical_p docstring.

* Make the docstring easier to understand.

* Clarify typical_p docstring

  Accept the suggestion by @stevhliu for paraphrasing the docstring.

  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Use the same docstring as in GenerationConfig

  Follow the suggestion suggested by @stevhliu in the pull request conversation.

* Fix docstring spacing.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-03 03:31:05 +06:00 · 2023-01-17 09:23:47 -05:00
commit 8896ebb9a9
parent f30bcd5357
2 changed files with 11 additions and 2 deletions
--- a/src/transformers/configuration_utils.py
+++ b/src/transformers/configuration_utils.py
@@ -144,6 +144,12 @@ class PretrainedConfig(PushToHubMixin):
        top_p (`float`, *optional*, defaults to 1):
            Value that will be used by default in the `generate` method of the model for `top_p`. If set to float < 1,
            only the most probable tokens with probabilities that add up to `top_p` or higher are kept for generation.
+        typical_p (`float`, *optional*, defaults to 1):
+            Local typicality measures how similar the conditional probability of predicting a target token next is to
+            the expected conditional probability of predicting a random token next, given the partial text already
+            generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that
+            add up to `typical_p` or higher are kept for generation. See [this
+            paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
        repetition_penalty (`float`, *optional*, defaults to 1):
            Parameter for repetition penalty that will be used by default in the `generate` method of the model. 1.0
            means no penalty.
--- a/src/transformers/generation/configuration_utils.py
+++ b/src/transformers/generation/configuration_utils.py
@@ -111,8 +111,11 @@ class GenerationConfig(PushToHubMixin):
            If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to
            `top_p` or higher are kept for generation.
        typical_p (`float`, *optional*, defaults to 1.0):
-            The amount of probability mass from the original distribution to be considered in typical decoding. If set
-            to 1.0 it takes no effect. See [this paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
+            Local typicality measures how similar the conditional probability of predicting a target token next is to
+            the expected conditional probability of predicting a random token next, given the partial text already
+            generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that
+            add up to `typical_p` or higher are kept for generation. See [this
+            paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
        diversity_penalty (`float`, *optional*, defaults to 0.0):
            This value is subtracted from a beam's score if it generates a token same as any beam from other group at a
            particular time. Note that `diversity_penalty` is only effective if `group beam search` is enabled.
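
The new `typical_p` docstring describes a selection rule that is easier to follow with a small example. The sketch below is illustrative only and is not the code in `transformers`: the function name `typical_filter` and the toy distribution are assumptions made for this example. It ranks tokens by how close their surprisal (negative log-probability) is to the entropy of the next-token distribution, then keeps the smallest prefix of that ranking whose probabilities add up to `typical_p` or higher.

```python
import math


def typical_filter(probs, typical_p=1.0):
    """Return the sorted indices of tokens kept by a locally typical filter.

    Hypothetical helper for illustration; `probs` is a list of next-token
    probabilities that sums to 1.
    """
    if not 0.0 < typical_p <= 1.0:
        raise ValueError("typical_p must be in the interval (0, 1]")
    if typical_p >= 1.0:
        # The default value of 1 disables filtering.
        return list(range(len(probs)))

    # Entropy of the distribution: the expected surprisal of the next token.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)

    def deviation(index):
        # Distance between this token's surprisal and the expected surprisal.
        p = probs[index]
        return abs(-math.log(p) - entropy) if p > 0 else math.inf

    # Most locally typical tokens first (smallest deviation).
    order = sorted(range(len(probs)), key=deviation)

    kept, mass = [], 0.0
    for index in order:
        kept.append(index)
        mass += probs[index]
        if mass >= typical_p:
            break
    return sorted(kept)


# Toy distribution (an assumption for this example).
probs = [0.5, 0.25, 0.125, 0.125]
print(typical_filter(probs, typical_p=0.2))
print(typical_filter(probs, typical_p=0.5))
```

With this toy distribution, the token at index 1 is the most typical one even though index 0 is the most probable, which is the main difference from `top_p` filtering: tokens are ranked by closeness to the expected information content rather than by probability alone.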