@@ -355,9 +355,9 @@ Notes:
 able to use significantly larger batch sizes using the same hardware (e.g. 3x and even bigger) which should lead to
 significantly shorter training time.

-3. To use the second version of Sharded data-parallelism, add ``--sharded_ddp zero_dp_2`` or ``--sharded_ddp zero_dp_3`
-   to the command line arguments, and make sure you have added the distributed launcher ``-m torch.distributed.launch
-   --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
+3. To use the second version of Sharded data-parallelism, add ``--sharded_ddp zero_dp_2`` or ``--sharded_ddp
+   zero_dp_3`` to the command line arguments, and make sure you have added the distributed launcher ``-m
+   torch.distributed.launch --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.

 For example here is how you could use it for ``run_translation.py`` with 2 GPUs:
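The diff cuts off before the example command itself. As a minimal sketch of what such an invocation could look like, assuming the ``t5-small`` model, the ``wmt16`` ``ro-en`` dataset config, and that ``run_translation.py`` lives under ``examples/seq2seq/`` in your checkout (adjust the script path, data arguments, and output directory to your setup):

.. code-block:: bash

   # Hypothetical script path and data arguments; --sharded_ddp zero_dp_2 is the flag described above.
   python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_translation.py \
       --model_name_or_path t5-small --output_dir output_dir --overwrite_output_dir \
       --do_train --num_train_epochs 1 --per_device_train_batch_size 4 \
       --dataset_name wmt16 --dataset_config_name ro-en \
       --source_lang en --target_lang ro \
       --sharded_ddp zero_dp_2

Swapping ``zero_dp_2`` for ``zero_dp_3`` selects the third ZeRO stage instead; everything else in the command stays the same.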