From 346f1eebbdab244880967133aea7272d9b16f519 Mon Sep 17 00:00:00 2001
From: Anthony Song <38965603+tonyksong@users.noreply.github.com>
Date: Thu, 17 Apr 2025 09:54:44 -0400
Subject: [PATCH] docs: fix typo (#37567)

Co-authored-by: Anthony
---
 docs/source/en/perf_train_gpu_many.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/perf_train_gpu_many.md b/docs/source/en/perf_train_gpu_many.md
index 347db1a3f0b..3dd0845e671 100644
--- a/docs/source/en/perf_train_gpu_many.md
+++ b/docs/source/en/perf_train_gpu_many.md
@@ -111,7 +111,7 @@ This approach optimizes parallel data processing by reducing idle GPU utilizatio
 
 Data, pipeline and model parallelism combine to form [3D parallelism](https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/) to optimize memory and compute efficiency.
 
-Memory effiiciency is achieved by splitting the model across GPUs and also dividing it into stages to create a pipeline. This allows GPUs to work in parallel on micro-batches of data, reducing the memory usage of the model, optimizer, and activations.
+Memory efficiency is achieved by splitting the model across GPUs and also dividing it into stages to create a pipeline. This allows GPUs to work in parallel on micro-batches of data, reducing the memory usage of the model, optimizer, and activations.
 
 Compute efficiency is enabled by ZeRO data parallelism where each GPU only stores a slice of the model, optimizer, and activations. This allows higher communication bandwidth between data parallel nodes because communication can occur independently or in parallel with the other pipeline stages.