[examples] document resuming (#10776)

* document resuming in examples

* fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* put trainer code last, adjust notes

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Stas Bekman 2021-03-17 12:48:35 -07:00 committed by GitHub
parent 85a114ef47
commit 393739194e

@@ -95,6 +95,21 @@ Coming soon!
| [**`translation`**](https://github.com/huggingface/transformers/tree/master/examples/seq2seq) | WMT | ✅ | - | - | -
## Resuming training
You can resume training from a previous checkpoint like this:
1. Pass `--output_dir previous_output_dir` without `--overwrite_output_dir` to resume training from the latest checkpoint in `output_dir` (what you would use if the training was interrupted, for instance).
2. Pass `--model_name_or_path path_to_a_specific_checkpoint` to resume training from that checkpoint folder.
Should you want to turn an example into a notebook where you'd no longer have access to the command
line, 🤗 Trainer supports resuming from a checkpoint via `trainer.train(resume_from_checkpoint)`.
1. If `resume_from_checkpoint` is `True`, it will look for the last checkpoint in the value of `output_dir` passed via `TrainingArguments`.
2. If `resume_from_checkpoint` is a path to a specific checkpoint, it will resume training from that saved checkpoint folder, as in the sketch below.
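For instance, here is a minimal sketch of what this could look like in a notebook; `my_model` and `my_train_dataset` are placeholders for the model and dataset you would have built earlier, and the checkpoint path is illustrative:
```python
from transformers import Trainer, TrainingArguments

# `my_model` and `my_train_dataset` stand in for the model and dataset
# prepared earlier in the notebook.
training_args = TrainingArguments(output_dir="previous_output_dir")
trainer = Trainer(model=my_model, args=training_args, train_dataset=my_train_dataset)

# Case 1: look for the latest checkpoint in `output_dir` and resume from it.
trainer.train(resume_from_checkpoint=True)

# Case 2: resume from a specific checkpoint folder instead.
# trainer.train(resume_from_checkpoint="previous_output_dir/checkpoint-1000")
```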
## Distributed training and mixed precision
All the PyTorch scripts mentioned above work out of the box with distributed training and mixed precision, thanks to
@@ -104,7 +119,7 @@ use the following command:
```bash
python -m torch.distributed.launch \
    --nproc_per_node number_of_gpu_you_have path_to_script.py \
    --all_arguments_of_the_script
```
As an example, here is how you would fine-tune the BERT large model (with whole word masking) on the text
@@ -148,7 +163,7 @@ regular training script with its arguments (this is similar to the `torch.distri
```bash
python xla_spawn.py --num_cores num_tpu_you_have \
    path_to_script.py \
    --all_arguments_of_the_script
```
As an example, here is how you would fine-tune the BERT large model (with whole word masking) on the text