mirror of https://github.com/huggingface/transformers.git (synced 2025-08-03 03:31:05 +06:00)
Docs: fix indentation in HammingDiversityLogitsProcessor (#25756)

This commit is contained in: parent 35c570c80e, commit 8b0a7bfcdc
@@ -1074,9 +1074,9 @@ class HammingDiversityLogitsProcessor(LogitsProcessor):

<Tip>

Diverse beam search can be particularly useful in scenarios where a variety of different outputs is desired, rather
than multiple similar sequences. It allows the model to explore different generation paths and provides a broader
coverage of possible outputs.

</Tip>
@@ -1087,9 +1087,8 @@ class HammingDiversityLogitsProcessor(LogitsProcessor):

</Tip>

Traditional beam search often generates very similar sequences across different beams.
`HammingDiversityLogitsProcessor` addresses this by penalizing beams that generate tokens already chosen by other
beams in the same time step.

How It Works:
- **Grouping Beams**: Beams are divided into groups. Each group selects tokens independently of the others.
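To make the grouping-and-penalty mechanism concrete, here is a minimal sketch, assuming a toy vocabulary and a single beam being scored; it is an illustration of the idea, not the transformers implementation:

```python
import torch

# Toy setup: one beam in the current group is being scored.
diversity_penalty = 1.5
vocab_size = 8
scores = torch.zeros(vocab_size)  # pre-penalty logits for this beam

# Tokens already selected at this time step by beams in earlier groups.
previous_group_tokens = torch.tensor([3, 3, 5])

# Count how often each token was picked, then subtract the penalty in
# proportion to that count: tokens popular with other groups lose score.
token_frequency = torch.bincount(previous_group_tokens, minlength=vocab_size)
penalized_scores = scores - diversity_penalty * token_frequency

print(penalized_scores)  # tokens 3 and 5 are now penalized: -3.0 and -1.5
```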
@@ -1104,48 +1103,30 @@ class HammingDiversityLogitsProcessor(LogitsProcessor):

Args:
    diversity_penalty (`float`):
        This value is subtracted from a beam's score if it generates a token that has already been chosen by a beam
        from another group at the same time step. Note that `diversity_penalty` is only effective if group beam
        search is enabled. A higher `diversity_penalty` enforces greater diversity among the beams, making it less
        likely for multiple beams to choose the same token; conversely, a lower penalty allows beams to choose
        similar tokens more freely. Adjusting this value can help strike a balance between diversity and natural
        likelihood.
    num_beams (`int`):
        Number of beams used for group beam search. Beam search is a method that maintains beams (or "multiple
        hypotheses") at each step, expanding each one and keeping the top-scoring sequences. A higher `num_beams`
        explores more potential sequences, which can increase the chance of finding a high-quality output but also
        increases computational cost.
    num_beam_groups (`int`):
        Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
        Each group of beams operates independently, selecting tokens without considering the choices of the other
        groups. This division promotes diversity by ensuring that beams within different groups explore different
        paths. For instance, if `num_beams` is 6 and `num_beam_groups` is 2, there will be 2 groups each containing
        3 beams, as sketched below. The choice of `num_beam_groups` should be made considering the desired level of
        output diversity and the total number of beams. See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for
        more details.
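The partitioning described under `num_beam_groups` can be sketched as follows; `beam_indices_per_group` is a hypothetical helper for illustration, not a transformers API:

```python
# Hypothetical helper (illustration only): split `num_beams` beams into
# `num_beam_groups` contiguous groups, as in the 6-beams/2-groups example.
def beam_indices_per_group(num_beams: int, num_beam_groups: int) -> list:
    if num_beams % num_beam_groups != 0:
        raise ValueError("`num_beams` should be divisible by `num_beam_groups`")
    group_size = num_beams // num_beam_groups
    return [list(range(g * group_size, (g + 1) * group_size)) for g in range(num_beam_groups)]

print(beam_indices_per_group(6, 2))  # [[0, 1, 2], [3, 4, 5]]
```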
Examples:

```python
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> import torch

>>> # Initialize the model and tokenizer
```
@@ -1154,67 +1135,37 @@ class HammingDiversityLogitsProcessor(LogitsProcessor):

```python
>>> # A long text about the solar system
>>> text = "The Solar System is a gravitationally bound system comprising the Sun and the objects that orbit it, either directly or indirectly. Of the objects that orbit the Sun directly, the largest are the eight planets, with the remainder being smaller objects, such as the five dwarf planets and small Solar System bodies. The Solar System formed 4.6 billion years ago from the gravitational collapse of a giant interstellar molecular cloud."
>>> inputs = tokenizer("summarize: " + text, return_tensors="pt")

>>> # Generate diverse summary
>>> outputs_diverse = model.generate(
...     **inputs,
...     num_beam_groups=2,
...     diversity_penalty=10.0,
...     max_length=100,
...     num_beams=4,
...     num_return_sequences=2,
... )
>>> summaries_diverse = tokenizer.batch_decode(outputs_diverse, skip_special_tokens=True)

>>> # Generate non-diverse summary
>>> outputs_non_diverse = model.generate(
...     **inputs,
...     max_length=100,
...     num_beams=4,
...     num_return_sequences=2,
... )
>>> summary_non_diverse = tokenizer.batch_decode(outputs_non_diverse, skip_special_tokens=True)

>>> # With `diversity_penalty`, the resulting beams are much more diverse
>>> print(summary_non_diverse)
['the solar system formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud. of the objects that orbit the Sun directly, the largest are the eight planets.',
 'the Solar System formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud. of the objects that orbit the Sun directly, the largest are the eight planets.']

>>> print(summaries_diverse)
['the solar system formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud. of the objects that orbit the Sun directly, the largest are the eight planets.',
 'the solar system formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud. of the objects that orbit the Sun directly, the largest are the eight planets. the rest of the objects are smaller objects, such as the five dwarf planets and small solar system bodies.']
```

"""

def __init__(self, diversity_penalty: float, num_beams: int, num_beam_groups: int):
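The constructor signature above takes the three arguments documented earlier. A sketch of the validation such a constructor might perform, assuming the constraints the docstring implies (a positive float penalty, at least two beams and groups, and `num_beam_groups` dividing `num_beams`); this is an assumption for illustration, not copied from the transformers source:

```python
# Assumed validation sketch for the constructor arguments (not the actual
# transformers implementation; constraints inferred from the docstring).
def validate_diversity_args(diversity_penalty: float, num_beams: int, num_beam_groups: int) -> None:
    if not isinstance(diversity_penalty, float) or diversity_penalty <= 0.0:
        raise ValueError("`diversity_penalty` should be a float strictly larger than 0.")
    if not isinstance(num_beams, int) or num_beams < 2:
        raise ValueError("`num_beams` should be an integer strictly larger than 1.")
    if not isinstance(num_beam_groups, int) or num_beam_groups < 2:
        raise ValueError("`num_beam_groups` should be an integer strictly larger than 1.")
    if num_beam_groups > num_beams or num_beams % num_beam_groups != 0:
        raise ValueError("`num_beam_groups` must evenly divide `num_beams`.")

validate_diversity_args(10.0, num_beams=4, num_beam_groups=2)  # passes silently
```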