Mirror of https://github.com/huggingface/transformers.git (synced 2025-07-03 12:50:06 +06:00)
docs: fix typo (#37567)
Co-authored-by: Anthony <anthony.song@capitalone.com>
This commit is contained in:
parent 48dd89cf55
commit 346f1eebbd
@@ -111,7 +111,7 @@ This approach optimizes parallel data processing by reducing idle GPU utilization
 
 Data, pipeline and model parallelism combine to form [3D parallelism](https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/) to optimize memory and compute efficiency.
 
-Memory effiiciency is achieved by splitting the model across GPUs and also dividing it into stages to create a pipeline. This allows GPUs to work in parallel on micro-batches of data, reducing the memory usage of the model, optimizer, and activations.
+Memory efficiency is achieved by splitting the model across GPUs and also dividing it into stages to create a pipeline. This allows GPUs to work in parallel on micro-batches of data, reducing the memory usage of the model, optimizer, and activations.
 
 Compute efficiency is enabled by ZeRO data parallelism where each GPU only stores a slice of the model, optimizer, and activations. This allows higher communication bandwidth between data parallel nodes because communication can occur independently or in parallel with the other pipeline stages.
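The paragraph touched by this fix describes pipeline parallelism over micro-batches. Below is a minimal sketch of that idea, assuming a toy two-stage split in plain PyTorch; the device list, layer sizes, and hand-rolled loop are illustrative assumptions, not the transformers or DeepSpeed implementation.

```python
# Minimal, hypothetical sketch of pipeline parallelism with micro-batches
# (GPipe-style). Not DeepSpeed's scheduler; falls back to CPU without 2 GPUs.
import torch
from torch import nn

devices = ["cuda:0", "cuda:1"] if torch.cuda.device_count() >= 2 else ["cpu", "cpu"]

# Model parallelism: one model split into two pipeline stages on separate devices.
stage0 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(devices[0])
stage1 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(devices[1])

batch = torch.randn(32, 512)
micro_batches = batch.chunk(4)  # 4 micro-batches keep both stages busy

outputs = []
in_flight = None  # activation traveling from stage0 to stage1
for mb in micro_batches:
    if in_flight is not None:
        # While stage1 consumes the previous micro-batch...
        outputs.append(stage1(in_flight.to(devices[1])))
    # ...stage0 already starts the next one (true overlap needs async launches).
    in_flight = stage0(mb.to(devices[0]))
outputs.append(stage1(in_flight.to(devices[1])))  # drain the pipeline

result = torch.cat([o.cpu() for o in outputs])
print(result.shape)  # torch.Size([32, 512])
```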
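The context line on compute efficiency refers to ZeRO data parallelism, where each data-parallel rank holds only a shard of the model and optimizer state and gathers shards on demand. Here is a minimal single-process simulation of that sharding idea, assuming a hypothetical all_gather stand-in rather than the real torch.distributed collective.

```python
# Hypothetical single-process simulation of the ZeRO idea: each rank keeps a
# 1/world_size shard of the parameters (and, in the full scheme, optimizer
# state and gradients), reassembling a full tensor only when needed.
import torch

world_size = 4
full_param = torch.randn(1024)

# Partition the flat parameter across ranks (ZeRO stage 3 shards parameters).
shards = list(full_param.chunk(world_size))

def all_gather(shards):
    # Stand-in for torch.distributed.all_gather: rebuild the full tensor.
    return torch.cat(shards)

# Each rank stores world_size-times less state...
print(shards[0].numel(), "elements per rank vs", full_param.numel(), "total")
# ...and reconstructs the full parameter only around the forward/backward pass.
assert torch.equal(all_gather(shards), full_param)
```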