docs: fix typo (#37567)

Co-authored-by: Anthony <anthony.song@capitalone.com>
This commit is contained in:
Anthony Song 2025-04-17 09:54:44 -04:00 committed by GitHub
parent 48dd89cf55
commit 346f1eebbd
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -111,7 +111,7 @@ This approach optimizes parallel data processing by reducing idle GPU utilizatio
Data, pipeline and model parallelism combine to form [3D parallelism](https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/) to optimize memory and compute efficiency.
Memory effiiciency is achieved by splitting the model across GPUs and also dividing it into stages to create a pipeline. This allows GPUs to work in parallel on micro-batches of data, reducing the memory usage of the model, optimizer, and activations.
Memory efficiency is achieved by splitting the model across GPUs and also dividing it into stages to create a pipeline. This allows GPUs to work in parallel on micro-batches of data, reducing the memory usage of the model, optimizer, and activations.
Compute efficiency is enabled by ZeRO data parallelism where each GPU only stores a slice of the model, optimizer, and activations. This allows higher communication bandwidth between data parallel nodes because communication can occur independently or in parallel with the other pipeline stages.