Mirror of https://github.com/huggingface/transformers.git (synced 2025-07-03 12:50:06 +06:00)
docs: fix typo (#37567)
Co-authored-by: Anthony <anthony.song@capitalone.com>
This commit is contained in:
parent 48dd89cf55
commit 346f1eebbd
@@ -111,7 +111,7 @@ This approach optimizes parallel data processing by reducing idle GPU utilization
 
 Data, pipeline and model parallelism combine to form [3D parallelism](https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/) to optimize memory and compute efficiency.
 
-Memory effiiciency is achieved by splitting the model across GPUs and also dividing it into stages to create a pipeline. This allows GPUs to work in parallel on micro-batches of data, reducing the memory usage of the model, optimizer, and activations.
+Memory efficiency is achieved by splitting the model across GPUs and also dividing it into stages to create a pipeline. This allows GPUs to work in parallel on micro-batches of data, reducing the memory usage of the model, optimizer, and activations.
 
 Compute efficiency is enabled by ZeRO data parallelism where each GPU only stores a slice of the model, optimizer, and activations. This allows higher communication bandwidth between data parallel nodes because communication can occur independently or in parallel with the other pipeline stages.
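The paragraph touched by this fix describes pipeline parallelism over micro-batches. Below is a minimal sketch of that idea, assuming a toy two-stage split in plain PyTorch; the device list, layer sizes, and hand-rolled loop are illustrative assumptions, not the transformers or DeepSpeed implementation.

```python
# Minimal, hypothetical sketch of pipeline parallelism with micro-batches
# (GPipe-style). Not DeepSpeed's scheduler; falls back to CPU without 2 GPUs.
import torch
from torch import nn

devices = ["cuda:0", "cuda:1"] if torch.cuda.device_count() >= 2 else ["cpu", "cpu"]

# Model parallelism: one model split into two pipeline stages on separate devices.
stage0 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(devices[0])
stage1 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(devices[1])

batch = torch.randn(32, 512)
micro_batches = batch.chunk(4)  # 4 micro-batches keep both stages busy

outputs = []
in_flight = None  # activation traveling from stage0 to stage1
for mb in micro_batches:
    if in_flight is not None:
        # While stage1 consumes the previous micro-batch...
        outputs.append(stage1(in_flight.to(devices[1])))
    # ...stage0 already starts the next one (true overlap needs async launches).
    in_flight = stage0(mb.to(devices[0]))
outputs.append(stage1(in_flight.to(devices[1])))  # drain the pipeline

result = torch.cat([o.cpu() for o in outputs])
print(result.shape)  # torch.Size([32, 512])
```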
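The context line on compute efficiency refers to ZeRO data parallelism, where each data-parallel rank holds only a shard of the model and optimizer state and gathers shards on demand. Here is a minimal single-process simulation of that sharding idea, assuming a hypothetical all_gather stand-in rather than the real torch.distributed collective.

```python
# Hypothetical single-process simulation of the ZeRO idea: each rank keeps a
# 1/world_size shard of the parameters (and, in the full scheme, optimizer
# state and gradients), reassembling a full tensor only when needed.
import torch

world_size = 4
full_param = torch.randn(1024)

# Partition the flat parameter across ranks (ZeRO stage 3 shards parameters).
shards = list(full_param.chunk(world_size))

def all_gather(shards):
    # Stand-in for torch.distributed.all_gather: rebuild the full tensor.
    return torch.cat(shards)

# Each rank stores world_size-times less state...
print(shards[0].numel(), "elements per rank vs", full_param.numel(), "total")
# ...and reconstructs the full parameter only around the forward/backward pass.
assert torch.equal(all_gather(shards), full_param)
```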