Fix max_steps documentation regarding the end-of-training condition (#27624)

* fix max_steps doc

* Update src/transformers/training_args.py [ci skip]

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* propagate suggested change

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Quentin Gallouédec 2023-11-22 12:10:11 +01:00 committed by GitHub
parent c651eb23c3
commit b2c63c79c3
2 changed files with 10 additions and 8 deletions

src/transformers/training_args.py

@@ -234,8 +234,8 @@ class TrainingArguments:
            the last epoch before stopping training).
        max_steps (`int`, *optional*, defaults to -1):
            If set to a positive number, the total number of training steps to perform. Overrides `num_train_epochs`.
-           In case of using a finite iterable dataset the training may stop before reaching the set number of steps
-           when all data is exhausted
+           For a finite dataset, training is reiterated through the dataset (if all data is exhausted) until
+           `max_steps` is reached.
        lr_scheduler_type (`str` or [`SchedulerType`], *optional*, defaults to `"linear"`):
            The scheduler type to use. See the documentation of [`SchedulerType`] for all possible values.
        lr_scheduler_kwargs ('dict', *optional*, defaults to {}):
@@ -2181,9 +2181,9 @@ class TrainingArguments:
            Total number of training epochs to perform (if not an integer, will perform the decimal part percents
            of the last epoch before stopping training).
        max_steps (`int`, *optional*, defaults to -1):
-           If set to a positive number, the total number of training steps to perform. Overrides
-           `num_train_epochs`. In case of using a finite iterable dataset the training may stop before reaching
-           the set number of steps when all data is exhausted.
+           If set to a positive number, the total number of training steps to perform. Overrides `num_train_epochs`.
+           For a finite dataset, training is reiterated through the dataset (if all data is exhausted) until
+           `max_steps` is reached.
        gradient_accumulation_steps (`int`, *optional*, defaults to 1):
            Number of updates steps to accumulate the gradients for, before performing a backward/update pass.
@@ -2588,9 +2588,9 @@ class TrainingArguments:
            Total number of training epochs to perform (if not an integer, will perform the decimal part percents
            of the last epoch before stopping training).
        max_steps (`int`, *optional*, defaults to -1):
-           If set to a positive number, the total number of training steps to perform. Overrides
-           `num_train_epochs`. In case of using a finite iterable dataset the training may stop before reaching
-           the set number of steps when all data is exhausted.
+           If set to a positive number, the total number of training steps to perform. Overrides `num_train_epochs`.
+           For a finite dataset, training is reiterated through the dataset (if all data is exhausted) until
+           `max_steps` is reached.
        warmup_ratio (`float`, *optional*, defaults to 0.0):
            Ratio of total training steps used for a linear warmup from 0 to `learning_rate`.
        warmup_steps (`int`, *optional*, defaults to 0):

src/transformers/training_args_tf.py

@@ -92,6 +92,8 @@ class TFTrainingArguments(TrainingArguments):
            Total number of training epochs to perform.
        max_steps (`int`, *optional*, defaults to -1):
            If set to a positive number, the total number of training steps to perform. Overrides `num_train_epochs`.
+           For a finite dataset, training is reiterated through the dataset (if all data is exhausted) until
+           `max_steps` is reached.
        warmup_ratio (`float`, *optional*, defaults to 0.0):
            Ratio of total training steps used for a linear warmup from 0 to `learning_rate`.
        warmup_steps (`int`, *optional*, defaults to 0):
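The behavior the corrected wording describes can be sketched as a plain-Python loop. This is a simplified illustration of the end-of-training condition, not the actual `Trainer` implementation; `train_steps` is a hypothetical helper name:

```python
# Simplified sketch (not the real Trainer code): with max_steps set, a
# finite dataset is re-iterated from the start whenever it is exhausted,
# and training only stops once `max_steps` optimizer steps have been done.
def train_steps(dataset, max_steps):
    steps = 0
    epochs = 0
    while steps < max_steps:
        for batch in dataset:  # restart the dataset once it is exhausted
            steps += 1
            if steps >= max_steps:
                break
        epochs += 1
    return steps, epochs

# A 4-batch dataset trained for 10 steps is cycled through 3 times
# (4 + 4 + 2 batches) rather than stopping after one pass.
print(train_steps([0, 1, 2, 3], 10))  # -> (10, 3)
```

This is exactly the distinction the old docstring got wrong: training does not stop early when the data runs out, it wraps around until `max_steps` is reached.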