From 346f1eebbdab244880967133aea7272d9b16f519 Mon Sep 17 00:00:00 2001
From: Anthony Song <38965603+tonyksong@users.noreply.github.com>
Date: Thu, 17 Apr 2025 09:54:44 -0400
Subject: [PATCH] docs: fix typo (#37567)

Co-authored-by: Anthony
---
 docs/source/en/perf_train_gpu_many.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/perf_train_gpu_many.md b/docs/source/en/perf_train_gpu_many.md
index 347db1a3f0b..3dd0845e671 100644
--- a/docs/source/en/perf_train_gpu_many.md
+++ b/docs/source/en/perf_train_gpu_many.md
@@ -111,7 +111,7 @@ This approach optimizes parallel data processing by reducing idle GPU utilizatio
 
 Data, pipeline and model parallelism combine to form [3D parallelism](https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/) to optimize memory and compute efficiency.
 
-Memory effiiciency is achieved by splitting the model across GPUs and also dividing it into stages to create a pipeline. This allows GPUs to work in parallel on micro-batches of data, reducing the memory usage of the model, optimizer, and activations.
+Memory efficiency is achieved by splitting the model across GPUs and also dividing it into stages to create a pipeline. This allows GPUs to work in parallel on micro-batches of data, reducing the memory usage of the model, optimizer, and activations.
 
 Compute efficiency is enabled by ZeRO data parallelism where each GPU only stores a slice of the model, optimizer, and activations. This allows higher communication bandwidth between data parallel nodes because communication can occur independently or in parallel with the other pipeline stages.