mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-03 12:50:06 +06:00
parent
c772bff31a
commit
9dc1efa5d4
2
.github/workflows/self-scheduled.yml
vendored
2
.github/workflows/self-scheduled.yml
vendored
@ -366,7 +366,7 @@ jobs:
|
||||
run: |
|
||||
python3 -m pip uninstall -y deepspeed
|
||||
rm -rf DeepSpeed
|
||||
git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build
|
||||
git clone https://github.com/deepspeedai/DeepSpeed && cd DeepSpeed && rm -rf build
|
||||
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
|
@ -48,8 +48,8 @@ RUN python3 -m pip uninstall -y torch-tensorrt apex
|
||||
# Pre-build **nightly** release of DeepSpeed, so it would be ready for testing (otherwise, the 1st deepspeed test will timeout)
|
||||
RUN python3 -m pip uninstall -y deepspeed
|
||||
# This has to be run inside the GPU VMs running the tests. (So far, it fails here due to GPU checks during compilation.)
|
||||
# Issue: https://github.com/microsoft/DeepSpeed/issues/2010
|
||||
# RUN git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build && \
|
||||
# Issue: https://github.com/deepspeedai/DeepSpeed/issues/2010
|
||||
# RUN git clone https://github.com/deepspeedai/DeepSpeed && cd DeepSpeed && rm -rf build && \
|
||||
# DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_UTILS=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
|
||||
|
||||
RUN python3 -m pip install -U "itsdangerous<2.1.0"
|
||||
|
@ -34,8 +34,8 @@ RUN python3 -m pip uninstall -y torch-tensorrt apex
|
||||
# Pre-build **nightly** release of DeepSpeed, so it would be ready for testing (otherwise, the 1st deepspeed test will timeout)
|
||||
RUN python3 -m pip uninstall -y deepspeed
|
||||
# This has to be run inside the GPU VMs running the tests. (So far, it fails here due to GPU checks during compilation.)
|
||||
# Issue: https://github.com/microsoft/DeepSpeed/issues/2010
|
||||
# RUN git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build && \
|
||||
# Issue: https://github.com/deepspeedai/DeepSpeed/issues/2010
|
||||
# RUN git clone https://github.com/deepspeedai/DeepSpeed && cd DeepSpeed && rm -rf build && \
|
||||
# DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_UTILS=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
|
||||
|
||||
## For `torchdynamo` tests
|
||||
|
@ -30,7 +30,7 @@ DeepSpeed compiles CUDA C++ code and it can be a potential source of errors when
|
||||
|
||||
<Tip>
|
||||
|
||||
For any other installation issues, please [open an issue](https://github.com/microsoft/DeepSpeed/issues) with the DeepSpeed team.
|
||||
For any other installation issues, please [open an issue](https://github.com/deepspeedai/DeepSpeed/issues) with the DeepSpeed team.
|
||||
|
||||
</Tip>
|
||||
|
||||
@ -89,7 +89,7 @@ sudo ln -s /usr/bin/g++-7 /usr/local/cuda-10.2/bin/g++
|
||||
If you're still having issues with installing DeepSpeed or if you're building DeepSpeed at run time, you can try to prebuild the DeepSpeed modules before installing them. To make a local build for DeepSpeed:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/microsoft/DeepSpeed/
|
||||
git clone https://github.com/deepspeedai/DeepSpeed/
|
||||
cd DeepSpeed
|
||||
rm -rf build
|
||||
TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install . \
|
||||
@ -141,7 +141,7 @@ It is also possible to not specify `TORCH_CUDA_ARCH_LIST` and the build program
|
||||
For training on multiple machines with the same setup, you'll need to make a binary wheel:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/microsoft/DeepSpeed/
|
||||
git clone https://github.com/deepspeedai/DeepSpeed/
|
||||
cd DeepSpeed
|
||||
rm -rf build
|
||||
TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 \
|
||||
|
@ -28,7 +28,7 @@ This guide will walk you through how to deploy DeepSpeed training, the features
|
||||
|
||||
## Installation
|
||||
|
||||
DeepSpeed is available to install from PyPI or Transformers (for more detailed installation options, take a look at the DeepSpeed [installation details](https://www.deepspeed.ai/tutorials/advanced-install/) or the GitHub [README](https://github.com/microsoft/deepspeed#installation)).
|
||||
DeepSpeed is available to install from PyPI or Transformers (for more detailed installation options, take a look at the DeepSpeed [installation details](https://www.deepspeed.ai/tutorials/advanced-install/) or the GitHub [README](https://github.com/deepspeedai/DeepSpeed#installation)).
|
||||
|
||||
<Tip>
|
||||
|
||||
@ -114,10 +114,10 @@ DeepSpeed works with the [`Trainer`] class by way of a config file containing al
|
||||
|
||||
<Tip>
|
||||
|
||||
Find a complete list of DeepSpeed configuration options on the [DeepSpeed Configuration JSON](https://www.deepspeed.ai/docs/config-json/) reference. You can also find more practical examples of various DeepSpeed configuration examples on the [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples) repository or the main [DeepSpeed](https://github.com/microsoft/DeepSpeed) repository. To quickly find specific examples, you can:
|
||||
Find a complete list of DeepSpeed configuration options on the [DeepSpeed Configuration JSON](https://www.deepspeed.ai/docs/config-json/) reference. You can also find more practical examples of various DeepSpeed configuration examples on the [DeepSpeedExamples](https://github.com/deepspeedai/DeepSpeedExamples) repository or the main [DeepSpeed](https://github.com/deepspeedai/DeepSpeed) repository. To quickly find specific examples, you can:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/microsoft/DeepSpeedExamples
|
||||
git clone https://github.com/deepspeedai/DeepSpeedExamples
|
||||
cd DeepSpeedExamples
|
||||
find . -name '*json'
|
||||
# find examples with the Lamb optimizer
|
||||
@ -303,7 +303,7 @@ For more information about initializing large models with ZeRO-3 and accessing t
|
||||
|
||||
[ZeRO-Infinity](https://hf.co/papers/2104.07857) allows offloading model states to the CPU and/or NVMe to save even more memory. Smart partitioning and tiling algorithms allow each GPU to send and receive very small amounts of data during offloading such that a modern NVMe can fit an even larger total memory pool than is available to your training process. ZeRO-Infinity requires ZeRO-3.
|
||||
|
||||
Depending on the CPU and/or NVMe memory available, you can offload both the [optimizer states](https://www.deepspeed.ai/docs/config-json/#optimizer-offloading) and [parameters](https://www.deepspeed.ai/docs/config-json/#parameter-offloading), just one of them, or none. You should also make sure the `nvme_path` is pointing to an NVMe device, because while it still works with a normal hard drive or solid state drive, it'll be significantly slower. With a modern NVMe, you can expect peak transfer speeds of ~3.5GB/s for read and ~3GB/s for write operations. Lastly, [run a benchmark](https://github.com/microsoft/DeepSpeed/issues/998) on your training setup to determine the optimal `aio` configuration.
|
||||
Depending on the CPU and/or NVMe memory available, you can offload both the [optimizer states](https://www.deepspeed.ai/docs/config-json/#optimizer-offloading) and [parameters](https://www.deepspeed.ai/docs/config-json/#parameter-offloading), just one of them, or none. You should also make sure the `nvme_path` is pointing to an NVMe device, because while it still works with a normal hard drive or solid state drive, it'll be significantly slower. With a modern NVMe, you can expect peak transfer speeds of ~3.5GB/s for read and ~3GB/s for write operations. Lastly, [run a benchmark](https://github.com/deepspeedai/DeepSpeed/issues/998) on your training setup to determine the optimal `aio` configuration.
|
||||
|
||||
The example ZeRO-3/Infinity configuration file below sets most of the parameter values to `auto`, but you could also manually add these values.
|
||||
|
||||
@ -1157,7 +1157,7 @@ For Transformers>=4.28, if `synced_gpus` is automatically set to `True` if multi
|
||||
|
||||
## Troubleshoot
|
||||
|
||||
When you encounter an issue, you should consider whether DeepSpeed is the cause of the problem because often it isn't (unless it's super obviously and you can see DeepSpeed modules in the exception)! The first step should be to retry your setup without DeepSpeed, and if the problem persists, then you can report the issue. If the issue is a core DeepSpeed problem and unrelated to the Transformers integration, open an Issue on the [DeepSpeed repository](https://github.com/microsoft/DeepSpeed).
|
||||
When you encounter an issue, you should consider whether DeepSpeed is the cause of the problem because often it isn't (unless it's super obviously and you can see DeepSpeed modules in the exception)! The first step should be to retry your setup without DeepSpeed, and if the problem persists, then you can report the issue. If the issue is a core DeepSpeed problem and unrelated to the Transformers integration, open an Issue on the [DeepSpeed repository](https://github.com/deepspeedai/DeepSpeed).
|
||||
|
||||
For issues related to the Transformers integration, please provide the following information:
|
||||
|
||||
@ -1227,7 +1227,7 @@ This means the DeepSpeed loss scaler is unable to find a scaling coefficient to
|
||||
|
||||
## Resources
|
||||
|
||||
DeepSpeed ZeRO is a powerful technology for training and loading very large models for inference with limited GPU resources, making it more accessible to everyone. To learn more about DeepSpeed, feel free to read the [blog posts](https://www.microsoft.com/en-us/research/search/?q=deepspeed), [documentation](https://www.deepspeed.ai/getting-started/), and [GitHub repository](https://github.com/microsoft/deepspeed).
|
||||
DeepSpeed ZeRO is a powerful technology for training and loading very large models for inference with limited GPU resources, making it more accessible to everyone. To learn more about DeepSpeed, feel free to read the [blog posts](https://www.microsoft.com/en-us/research/search/?q=deepspeed), [documentation](https://www.deepspeed.ai/getting-started/), and [GitHub repository](https://github.com/deepspeedai/DeepSpeed).
|
||||
|
||||
The following papers are also a great resource for learning more about ZeRO:
|
||||
|
||||
|
@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.
|
||||
|
||||
# DeepSpeed
|
||||
|
||||
[DeepSpeed](https://github.com/microsoft/DeepSpeed), powered by Zero Redundancy Optimizer (ZeRO), is an optimization library for training and fitting very large models onto a GPU. It is available in several ZeRO stages, where each stage progressively saves more GPU memory by partitioning the optimizer state, gradients, parameters, and enabling offloading to a CPU or NVMe. DeepSpeed is integrated with the [`Trainer`] class and most of the setup is automatically taken care of for you.
|
||||
[DeepSpeed](https://github.com/deepspeedai/DeepSpeed), powered by Zero Redundancy Optimizer (ZeRO), is an optimization library for training and fitting very large models onto a GPU. It is available in several ZeRO stages, where each stage progressively saves more GPU memory by partitioning the optimizer state, gradients, parameters, and enabling offloading to a CPU or NVMe. DeepSpeed is integrated with the [`Trainer`] class and most of the setup is automatically taken care of for you.
|
||||
|
||||
However, if you want to use DeepSpeed without the [`Trainer`], Transformers provides a [`HfDeepSpeedConfig`] class.
|
||||
|
||||
|
@ -476,7 +476,7 @@ And GPU1 does the same by enlisting GPU3 to its aid.
|
||||
Since each dimension requires at least 2 GPUs, here you'd need at least 4 GPUs.
|
||||
|
||||
Implementations:
|
||||
- [DeepSpeed](https://github.com/microsoft/DeepSpeed)
|
||||
- [DeepSpeed](https://github.com/deepspeedai/DeepSpeed)
|
||||
- [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
|
||||
- [Varuna](https://github.com/microsoft/varuna)
|
||||
- [SageMaker](https://arxiv.org/abs/2111.05972)
|
||||
@ -497,7 +497,7 @@ This diagram is from a blog post [3D parallelism: Scaling to trillion-parameter
|
||||
Since each dimension requires at least 2 GPUs, here you'd need at least 8 GPUs.
|
||||
|
||||
Implementations:
|
||||
- [DeepSpeed](https://github.com/microsoft/DeepSpeed) - DeepSpeed also includes an even more efficient DP, which they call ZeRO-DP.
|
||||
- [DeepSpeed](https://github.com/deepspeedai/DeepSpeed) - DeepSpeed also includes an even more efficient DP, which they call ZeRO-DP.
|
||||
- [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
|
||||
- [Varuna](https://github.com/microsoft/varuna)
|
||||
- [SageMaker](https://arxiv.org/abs/2111.05972)
|
||||
|
@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.
|
||||
|
||||
# DeepSpeed Integration
|
||||
|
||||
[DeepSpeed](https://github.com/microsoft/DeepSpeed) は、[ZeRO 論文](https://arxiv.org/abs/1910.02054) で説明されているすべてを実装します。現在、次のものを完全にサポートしています。
|
||||
[DeepSpeed](https://github.com/deepspeedai/DeepSpeed) は、[ZeRO 論文](https://arxiv.org/abs/1910.02054) で説明されているすべてを実装します。現在、次のものを完全にサポートしています。
|
||||
|
||||
1. オプティマイザーの状態分割 (ZeRO ステージ 1)
|
||||
2. 勾配分割 (ZeRO ステージ 2)
|
||||
@ -32,7 +32,7 @@ DeepSpeed ZeRO-2 は、その機能が推論には役に立たないため、主
|
||||
DeepSpeed ZeRO-3 は、巨大なモデルを複数の GPU にロードできるため、推論にも使用できます。
|
||||
単一の GPU では不可能です。
|
||||
|
||||
🤗 Transformers は、2 つのオプションを介して [DeepSpeed](https://github.com/microsoft/DeepSpeed) を統合します。
|
||||
🤗 Transformers は、2 つのオプションを介して [DeepSpeed](https://github.com/deepspeedai/DeepSpeed) を統合します。
|
||||
|
||||
1. [`Trainer`] によるコア DeepSpeed 機能の統合。何でもやってくれるタイプです
|
||||
統合の場合 - カスタム構成ファイルを指定するか、テンプレートを使用するだけで、他に何もする必要はありません。たいていの
|
||||
@ -78,7 +78,7 @@ pip install deepspeed
|
||||
pip install transformers[deepspeed]
|
||||
```
|
||||
|
||||
または、[DeepSpeed の GitHub ページ](https://github.com/microsoft/deepspeed#installation) で詳細を確認してください。
|
||||
または、[DeepSpeed の GitHub ページ](https://github.com/deepspeedai/DeepSpeed#installation) で詳細を確認してください。
|
||||
[高度なインストール](https://www.deepspeed.ai/tutorials/advanced-install/)。
|
||||
|
||||
それでもビルドに苦労する場合は、まず [CUDA 拡張機能のインストール ノート](trainer#cuda-extension-installation-notes) を必ず読んでください。
|
||||
@ -89,7 +89,7 @@ pip install transformers[deepspeed]
|
||||
DeepSpeed のローカル ビルドを作成するには:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/microsoft/DeepSpeed/
|
||||
git clone https://github.com/deepspeedai/DeepSpeed/
|
||||
cd DeepSpeed
|
||||
rm -rf build
|
||||
TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install . \
|
||||
@ -113,7 +113,7 @@ CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.get_device_capa
|
||||
複数のマシンで同じセットアップを使用する必要がある場合は、バイナリ ホイールを作成します。
|
||||
|
||||
```bash
|
||||
git clone https://github.com/microsoft/DeepSpeed/
|
||||
git clone https://github.com/deepspeedai/DeepSpeed/
|
||||
cd DeepSpeed
|
||||
rm -rf build
|
||||
TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 \
|
||||
@ -154,7 +154,7 @@ _CudaDeviceProperties(name='GeForce RTX 3090', major=8, minor=6, total_memory=24
|
||||
目的のアーチを明示的に指定することをお勧めします。
|
||||
|
||||
提案されたことをすべて試してもまだビルドの問題が発生する場合は、GitHub の問題に進んでください。
|
||||
[ディープスピード](https://github.com/microsoft/DeepSpeed/issues)、
|
||||
[ディープスピード](https://github.com/deepspeedai/DeepSpeed/issues)、
|
||||
|
||||
<a id='deepspeed-multi-gpu'></a>
|
||||
|
||||
@ -481,11 +481,11 @@ deepspeed examples/pytorch/translation/run_translation.py ...
|
||||
設定ファイルで使用できる DeepSpeed 設定オプションの完全なガイドについては、次を参照してください。
|
||||
[次のドキュメント](https://www.deepspeed.ai/docs/config-json/) にアクセスしてください。
|
||||
|
||||
さまざまな実際のニーズに対応する数十の DeepSpeed 構成例を [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples)で見つけることができます。
|
||||
さまざまな実際のニーズに対応する数十の DeepSpeed 構成例を [DeepSpeedExamples](https://github.com/deepspeedai/DeepSpeedExamples)で見つけることができます。
|
||||
リポジトリ:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/microsoft/DeepSpeedExamples
|
||||
git clone https://github.com/deepspeedai/DeepSpeedExamples
|
||||
cd DeepSpeedExamples
|
||||
find . -name '*json'
|
||||
```
|
||||
@ -497,7 +497,7 @@ find . -name '*json'
|
||||
grep -i Lamb $(find . -name '*json')
|
||||
```
|
||||
|
||||
さらにいくつかの例が [メイン リポジトリ](https://github.com/microsoft/DeepSpeed) にもあります。
|
||||
さらにいくつかの例が [メイン リポジトリ](https://github.com/deepspeedai/DeepSpeed) にもあります。
|
||||
|
||||
DeepSpeed を使用する場合は、常に DeepSpeed 構成ファイルを指定する必要がありますが、一部の構成パラメータには
|
||||
コマンドライン経由で設定します。微妙な違いについては、このガイドの残りの部分で説明します。
|
||||
@ -868,7 +868,7 @@ ZeRO-Infinity は、GPU と CPU メモリを NVMe メモリで拡張すること
|
||||
書き込みでは、読み取り最大 3.5 GB/秒、書き込み最大 3 GB/秒のピーク速度が得られます)。
|
||||
|
||||
最適な`aio`構成ブロックを見つけるには、ターゲット設定でベンチマークを実行する必要があります。
|
||||
[ここで説明](https://github.com/microsoft/DeepSpeed/issues/998)。
|
||||
[ここで説明](https://github.com/deepspeedai/DeepSpeed/issues/998)。
|
||||
|
||||
<a id='deepspeed-zero2-zero3-performance'></a>
|
||||
|
||||
@ -1934,7 +1934,7 @@ SW: Model with 2783M total params, 65M largest layer params.
|
||||
問題が解決しない場合にのみ、Deepspeed について言及し、必要な詳細をすべて提供してください。
|
||||
|
||||
- 問題が統合部分ではなく DeepSpeed コアにあることが明らかな場合は、問題を提出してください。
|
||||
[Deepspeed](https://github.com/microsoft/DeepSpeed/) を直接使用します。よくわからない場合でも、ご安心ください。
|
||||
[Deepspeed](https://github.com/deepspeedai/DeepSpeed/) を直接使用します。よくわからない場合でも、ご安心ください。
|
||||
どちらの問題トラッカーでも問題ありません。投稿されたらそれを判断し、次の場合は別の問題トラッカーにリダイレクトします。
|
||||
そうである必要がある。
|
||||
|
||||
@ -1994,7 +1994,7 @@ SW: Model with 2783M total params, 65M largest layer params.
|
||||
|
||||
### Notes
|
||||
|
||||
- DeepSpeed には pip でインストール可能な PyPI パッケージがありますが、ハードウェアに最も適合するように、また有効にする必要がある場合は、[ソース](https://github.com/microsoft/deepspeed#installation) からインストールすることを強くお勧めします。
|
||||
- DeepSpeed には pip でインストール可能な PyPI パッケージがありますが、ハードウェアに最も適合するように、また有効にする必要がある場合は、[ソース](https://github.com/deepspeedai/DeepSpeed#installation) からインストールすることを強くお勧めします。
|
||||
1 ビット Adam などの特定の機能は、pypi ディストリビューションでは利用できません。
|
||||
- 🤗 Transformers で DeepSpeed を使用するために [`Trainer`] を使用する必要はありません - 任意のモデルを使用できます
|
||||
後者は [DeepSpeed 統合手順](https://www.deepspeed.ai/getting-started/#writing-deepspeed-models) に従って調整する必要があります。
|
||||
@ -2239,7 +2239,7 @@ RUN_SLOW=1 pytest tests/deepspeed
|
||||
|
||||
## Main DeepSpeed Resources
|
||||
|
||||
- [プロジェクトの github](https://github.com/microsoft/deepspeed)
|
||||
- [プロジェクトの github](https://github.com/deepspeedai/DeepSpeed)
|
||||
- [使用方法ドキュメント](https://www.deepspeed.ai/getting-started/)
|
||||
- [API ドキュメント](https://deepspeed.readthedocs.io/en/latest/index.html)
|
||||
- [ブログ投稿](https://www.microsoft.com/en-us/research/search/?q=deepspeed)
|
||||
@ -2251,4 +2251,4 @@ RUN_SLOW=1 pytest tests/deepspeed
|
||||
- [ZeRO-Infinity: 極限スケールの深層学習のための GPU メモリの壁を打ち破る](https://arxiv.org/abs/2104.07857)
|
||||
|
||||
最後に、HuggingFace [`Trainer`] は DeepSpeed のみを統合していることを覚えておいてください。
|
||||
DeepSpeed の使用に関して問題や質問がある場合は、[DeepSpeed GitHub](https://github.com/microsoft/DeepSpeed/issues) に問題を提出してください。
|
||||
DeepSpeed の使用に関して問題や質問がある場合は、[DeepSpeed GitHub](https://github.com/deepspeedai/DeepSpeed/issues) に問題を提出してください。
|
||||
|
@ -199,7 +199,7 @@ _python_、_numpy_、および _pytorch_ の RNG 状態は、そのチェック
|
||||
torchrun --nproc_per_node=2 trainer-program.py ...
|
||||
```
|
||||
|
||||
[`accelerate`](https://github.com/huggingface/accelerate) または [`deepspeed`](https://github.com/microsoft/DeepSpeed) がインストールされている場合は、次を使用して同じことを達成することもできます。の一つ:
|
||||
[`accelerate`](https://github.com/huggingface/accelerate) または [`deepspeed`](https://github.com/deepspeedai/DeepSpeed) がインストールされている場合は、次を使用して同じことを達成することもできます。の一つ:
|
||||
|
||||
```bash
|
||||
accelerate launch --num_processes 2 trainer-program.py ...
|
||||
@ -291,7 +291,7 @@ export CUDA_VISIBLE_DEVICES=1,0
|
||||
[`Trainer`] は、トレーニングを劇的に改善する可能性のあるライブラリをサポートするように拡張されました。
|
||||
時間とはるかに大きなモデルに適合します。
|
||||
|
||||
現在、サードパーティのソリューション [DeepSpeed](https://github.com/microsoft/DeepSpeed) および [PyTorch FSDP](https://pytorch.org/docs/stable/fsdp.html) をサポートしています。論文 [ZeRO: メモリの最適化兆パラメータ モデルのトレーニングに向けて、Samyam Rajbhandari、Jeff Rasley、Olatunji Ruwase、Yuxiong He 著](https://arxiv.org/abs/1910.02054)。
|
||||
現在、サードパーティのソリューション [DeepSpeed](https://github.com/deepspeedai/DeepSpeed) および [PyTorch FSDP](https://pytorch.org/docs/stable/fsdp.html) をサポートしています。論文 [ZeRO: メモリの最適化兆パラメータ モデルのトレーニングに向けて、Samyam Rajbhandari、Jeff Rasley、Olatunji Ruwase、Yuxiong He 著](https://arxiv.org/abs/1910.02054)。
|
||||
|
||||
この提供されるサポートは、この記事の執筆時点では新しくて実験的なものです。 DeepSpeed と PyTorch FSDP のサポートはアクティブであり、それに関する問題は歓迎しますが、FairScale 統合は PyTorch メインに統合されているため、もうサポートしていません ([PyTorch FSDP 統合](#pytorch-fully-sharded-data-parallel))
|
||||
|
||||
@ -301,7 +301,7 @@ export CUDA_VISIBLE_DEVICES=1,0
|
||||
|
||||
この記事の執筆時点では、Deepspeed を使用するには、CUDA C++ コードをコンパイルする必要があります。
|
||||
|
||||
すべてのインストールの問題は、[Deepspeed](https://github.com/microsoft/DeepSpeed/issues) の対応する GitHub の問題を通じて対処する必要がありますが、ビルド中に発生する可能性のある一般的な問題がいくつかあります。
|
||||
すべてのインストールの問題は、[Deepspeed](https://github.com/deepspeedai/DeepSpeed/issues) の対応する GitHub の問題を通じて対処する必要がありますが、ビルド中に発生する可能性のある一般的な問題がいくつかあります。
|
||||
CUDA 拡張機能を構築する必要がある PyTorch 拡張機能。
|
||||
|
||||
したがって、次の操作を実行中に CUDA 関連のビルドの問題が発生した場合は、次のとおりです。
|
||||
|
@ -360,7 +360,7 @@ by [@anton-l](https://github.com/anton-l)。
|
||||
SageMakerは、より効率的な処理のためにTPとDPを組み合わせて使用します。
|
||||
|
||||
代替名:
|
||||
- [DeepSpeed](https://github.com/microsoft/DeepSpeed)はこれを「テンソルスライシング」と呼びます。詳細は[DeepSpeedの特徴](https://www.deepspeed.ai/training/#model-parallelism)をご覧ください。
|
||||
- [DeepSpeed](https://github.com/deepspeedai/DeepSpeed)はこれを「テンソルスライシング」と呼びます。詳細は[DeepSpeedの特徴](https://www.deepspeed.ai/training/#model-parallelism)をご覧ください。
|
||||
|
||||
実装例:
|
||||
- [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)には、モデル固有の内部実装があります。
|
||||
@ -384,7 +384,7 @@ DeepSpeedの[パイプラインチュートリアル](https://www.deepspeed.ai/t
|
||||
各次元には少なくとも2つのGPUが必要ですので、ここでは少なくとも4つのGPUが必要です。
|
||||
|
||||
実装例:
|
||||
- [DeepSpeed](https://github.com/microsoft/DeepSpeed)
|
||||
- [DeepSpeed](https://github.com/deepspeedai/DeepSpeed)
|
||||
- [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
|
||||
- [Varuna](https://github.com/microsoft/varuna)
|
||||
- [SageMaker](https://arxiv.org/abs/2111.05972)
|
||||
@ -403,7 +403,7 @@ DeepSpeedの[パイプラインチュートリアル](https://www.deepspeed.ai/t
|
||||
各次元には少なくとも2つのGPUが必要ですので、ここでは少なくとも8つのGPUが必要です。
|
||||
|
||||
実装例:
|
||||
- [DeepSpeed](https://github.com/microsoft/DeepSpeed) - DeepSpeedには、さらに効率的なDPであるZeRO-DPと呼ばれるものも含まれています。
|
||||
- [DeepSpeed](https://github.com/deepspeedai/DeepSpeed) - DeepSpeedには、さらに効率的なDPであるZeRO-DPと呼ばれるものも含まれています。
|
||||
- [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
|
||||
- [Varuna](https://github.com/microsoft/varuna)
|
||||
- [SageMaker](https://arxiv.org/abs/2111.05972)
|
||||
|
@ -28,7 +28,7 @@ GPU가 제한된 환경에서 ZeRO는 최적화 메모리와 계산을 GPU에서
|
||||
|
||||
## 설치[[installation]]
|
||||
|
||||
DeepSpeed는 PyPI 또는 Transformers에서 설치할 수 있습니다(자세한 설치 옵션은 DeepSpeed [설치 상세사항](https://www.deepspeed.ai/tutorials/advanced-install/) 또는 GitHub [README](https://github.com/microsoft/deepspeed#installation)를 참조하세요).
|
||||
DeepSpeed는 PyPI 또는 Transformers에서 설치할 수 있습니다(자세한 설치 옵션은 DeepSpeed [설치 상세사항](https://www.deepspeed.ai/tutorials/advanced-install/) 또는 GitHub [README](https://github.com/deepspeedai/DeepSpeed#installation)를 참조하세요).
|
||||
|
||||
<Tip>
|
||||
|
||||
@ -114,10 +114,10 @@ DeepSpeed는 트레이닝 실행 방법을 구성하는 모든 매개변수가
|
||||
|
||||
<Tip>
|
||||
|
||||
DeepSpeed 구성 옵션의 전체 목록은 [DeepSpeed Configuration JSON](https://www.deepspeed.ai/docs/config-json/)에서 확인할 수 있습니다. 또한 [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples) 리포지토리 또는 기본 [DeepSpeed](https://github.com/microsoft/DeepSpeed) 리포지토리에서 다양한 DeepSpeed 구성 예제에 대한 보다 실용적인 예제를 찾을 수 있습니다. 구체적인 예제를 빠르게 찾으려면 다음과 같이 하세요:
|
||||
DeepSpeed 구성 옵션의 전체 목록은 [DeepSpeed Configuration JSON](https://www.deepspeed.ai/docs/config-json/)에서 확인할 수 있습니다. 또한 [DeepSpeedExamples](https://github.com/deepspeedai/DeepSpeedExamples) 리포지토리 또는 기본 [DeepSpeed](https://github.com/deepspeedai/DeepSpeed) 리포지토리에서 다양한 DeepSpeed 구성 예제에 대한 보다 실용적인 예제를 찾을 수 있습니다. 구체적인 예제를 빠르게 찾으려면 다음과 같이 하세요:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/microsoft/DeepSpeedExamples
|
||||
git clone https://github.com/deepspeedai/DeepSpeedExamples
|
||||
cd DeepSpeedExamples
|
||||
find . -name '*json'
|
||||
# Lamb 옵티마이저 샘플 찾기
|
||||
@ -303,7 +303,7 @@ ZeRO-3로 대규모 모델을 초기화하고 매개변수에 액세스하는
|
||||
|
||||
[ZeRO-Infinity](https://hf.co/papers/2104.07857)를 사용하면 모델 상태를 CPU 및/또는 NVMe로 오프로드하여 더 많은 메모리를 절약할 수 있습니다. 스마트 파티셔닝 및 타일링 알고리즘을 통해 각 GPU는 오프로딩 중에 매우 적은 양의 데이터를 주고받을 수 있으므로 최신 NVMe는 훈련 프로세스에 사용할 수 있는 것보다 훨씬 더 큰 총 메모리 풀에 맞출 수 있습니다. ZeRO-Infinity에는 ZeRO-3가 필요합니다.
|
||||
|
||||
사용 가능한 CPU 및/또는 NVMe 메모리에 따라 [옵티마이저](https://www.deepspeed.ai/docs/config-json/#optimizer-offloading)와 [매개변수](https://www.deepspeed.ai/docs/config-json/#parameter-offloading) 중 하나만 오프로드하거나 아무것도 오프로드하지 않을 수 있습니다. 또한 일반 하드 드라이브나 솔리드 스테이트 드라이브에서도 작동하지만 속도가 현저히 느려지므로 `nvme_path`가 NVMe 장치를 가리키고 있는지 확인해야 합니다. 최신 NVMe를 사용하면 읽기 작업의 경우 최대 3.5GB/s, 쓰기 작업의 경우 최대 3GB/s의 전송 속도를 기대할 수 있습니다. 마지막으로, 트레이닝 설정에서 [벤치마크 실행하기](https://github.com/microsoft/DeepSpeed/issues/998)을 통해 최적의 'aio' 구성을 결정합니다.
|
||||
사용 가능한 CPU 및/또는 NVMe 메모리에 따라 [옵티마이저](https://www.deepspeed.ai/docs/config-json/#optimizer-offloading)와 [매개변수](https://www.deepspeed.ai/docs/config-json/#parameter-offloading) 중 하나만 오프로드하거나 아무것도 오프로드하지 않을 수 있습니다. 또한 일반 하드 드라이브나 솔리드 스테이트 드라이브에서도 작동하지만 속도가 현저히 느려지므로 `nvme_path`가 NVMe 장치를 가리키고 있는지 확인해야 합니다. 최신 NVMe를 사용하면 읽기 작업의 경우 최대 3.5GB/s, 쓰기 작업의 경우 최대 3GB/s의 전송 속도를 기대할 수 있습니다. 마지막으로, 트레이닝 설정에서 [벤치마크 실행하기](https://github.com/deepspeedai/DeepSpeed/issues/998)을 통해 최적의 'aio' 구성을 결정합니다.
|
||||
|
||||
아래 예제 ZeRO-3/Infinity 구성 파일은 대부분의 매개변수 값을 `auto`으로 설정하고 있지만, 수동으로 값을 추가할 수도 있습니다.
|
||||
|
||||
@ -1141,7 +1141,7 @@ rank1:
|
||||
|
||||
## 트러블슈팅[[troubleshoot]]
|
||||
|
||||
문제가 발생하면 DeepSpeed가 문제의 원인이 아닌 경우가 많으므로(아주 명백하고 예외적으로 DeepSpeed 모듈을 볼 수 있는 경우가 아니라면) DeepSpeed가 문제의 원인인지 고려해야 합니다! 첫 번째 단계는 DeepSpeed 없이 설정을 다시 시도하고 문제가 지속되면 문제를 신고하는 것입니다. 문제가 핵심적인 DeepSpeed 문제이고 transformers와 관련이 없는 경우, [DeepSpeed 리포지토리](https://github.com/microsoft/DeepSpeed)에서 이슈를 개설하세요.
|
||||
문제가 발생하면 DeepSpeed가 문제의 원인이 아닌 경우가 많으므로(아주 명백하고 예외적으로 DeepSpeed 모듈을 볼 수 있는 경우가 아니라면) DeepSpeed가 문제의 원인인지 고려해야 합니다! 첫 번째 단계는 DeepSpeed 없이 설정을 다시 시도하고 문제가 지속되면 문제를 신고하는 것입니다. 문제가 핵심적인 DeepSpeed 문제이고 transformers와 관련이 없는 경우, [DeepSpeed 리포지토리](https://github.com/deepspeedai/DeepSpeed)에서 이슈를 개설하세요.
|
||||
|
||||
transformers와 관련된 이슈를 개설할 때에는 다음 정보를 제공해 주세요:
|
||||
|
||||
@ -1211,7 +1211,7 @@ NVMe 및 ZeRO-3를 설정한 경우 NVMe로 오프로드를 실험해 보세요(
|
||||
|
||||
## 리소스[[resources]]
|
||||
|
||||
DeepSpeed ZeRO는 제한된 GPU 리소스로 추론을 위해 매우 큰 모델을 훈련하고 로드하는 강력한 기술로, 누구나 쉽게 사용할 수 있습니다. DeepSpeed에 대해 자세히 알아보려면 [블로그 포스트](https://www.microsoft.com/en-us/research/search/?q=deepspeed), [공식 문서](https://www.deepspeed.ai/getting-started/), [깃허브 리포지토리](https://github.com/microsoft/deepspeed)를 참조하세요.
|
||||
DeepSpeed ZeRO는 제한된 GPU 리소스로 추론을 위해 매우 큰 모델을 훈련하고 로드하는 강력한 기술로, 누구나 쉽게 사용할 수 있습니다. DeepSpeed에 대해 자세히 알아보려면 [블로그 포스트](https://www.microsoft.com/en-us/research/search/?q=deepspeed), [공식 문서](https://www.deepspeed.ai/getting-started/), [깃허브 리포지토리](https://github.com/deepspeedai/DeepSpeed)를 참조하세요.
|
||||
|
||||
다음 문서도 ZeRO에 대해 자세히 알아볼 수 있는 훌륭한 자료입니다:
|
||||
|
||||
|
@ -386,7 +386,7 @@ DeepSpeed [pipeline tutorial](https://www.deepspeed.ai/tutorials/pipeline/)에
|
||||
각 차원마다 적어도 2개의 GPU가 필요하므로 최소한 4개의 GPU가 필요합니다.
|
||||
|
||||
구현:
|
||||
- [DeepSpeed](https://github.com/microsoft/DeepSpeed)
|
||||
- [DeepSpeed](https://github.com/deepspeedai/DeepSpeed)
|
||||
- [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
|
||||
- [Varuna](https://github.com/microsoft/varuna)
|
||||
- [SageMaker](https://arxiv.org/abs/2111.05972)
|
||||
@ -405,7 +405,7 @@ DeepSpeed [pipeline tutorial](https://www.deepspeed.ai/tutorials/pipeline/)에
|
||||
각 차원마다 적어도 2개의 GPU가 필요하므로 최소한 8개의 GPU가 필요합니다.
|
||||
|
||||
구현:
|
||||
- [DeepSpeed](https://github.com/microsoft/DeepSpeed) - DeepSpeed는 더욱 효율적인 DP인 ZeRO-DP라고도 부릅니다.
|
||||
- [DeepSpeed](https://github.com/deepspeedai/DeepSpeed) - DeepSpeed는 더욱 효율적인 DP인 ZeRO-DP라고도 부릅니다.
|
||||
- [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
|
||||
- [Varuna](https://github.com/microsoft/varuna)
|
||||
- [SageMaker](https://arxiv.org/abs/2111.05972)
|
||||
|
@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.
|
||||
|
||||
# DeepSpeed集成
|
||||
|
||||
[DeepSpeed](https://github.com/microsoft/DeepSpeed)实现了[ZeRO论文](https://arxiv.org/abs/1910.02054)中描述的所有内容。目前,它提供对以下功能的全面支持:
|
||||
[DeepSpeed](https://github.com/deepspeedai/DeepSpeed)实现了[ZeRO论文](https://arxiv.org/abs/1910.02054)中描述的所有内容。目前,它提供对以下功能的全面支持:
|
||||
|
||||
1. 优化器状态分区(ZeRO stage 1)
|
||||
2. 梯度分区(ZeRO stage 2)
|
||||
@ -31,7 +31,7 @@ DeepSpeed ZeRO-2主要用于训练,因为它的特性对推理没有用处。
|
||||
|
||||
DeepSpeed ZeRO-3也可以用于推理,因为它允许将单个GPU无法加载的大模型加载到多个GPU上。
|
||||
|
||||
🤗 Transformers通过以下两种方式集成了[DeepSpeed](https://github.com/microsoft/DeepSpeed):
|
||||
🤗 Transformers通过以下两种方式集成了[DeepSpeed](https://github.com/deepspeedai/DeepSpeed):
|
||||
|
||||
1. 通过[`Trainer`]集成核心的DeepSpeed功能。这是一种“为您完成一切”式的集成 - 您只需提供自定义配置文件或使用我们的模板配置文件。本文档的大部分内容都集中在这个功能上。
|
||||
2. 如果您不使用[`Trainer`]并希望在自己的Trainer中集成DeepSpeed,那么像`from_pretrained`和`from_config`这样的核心功能函数将包括ZeRO stage 3及以上的DeepSpeed的基础部分,如`zero.Init`。要利用此功能,请阅读有关[非Trainer DeepSpeed集成](#nontrainer-deepspeed-integration)的文档。
|
||||
@ -72,7 +72,7 @@ pip install deepspeed
|
||||
pip install transformers[deepspeed]
|
||||
```
|
||||
|
||||
或在 [DeepSpeed 的 GitHub 页面](https://github.com/microsoft/deepspeed#installation) 和
|
||||
或在 [DeepSpeed 的 GitHub 页面](https://github.com/deepspeedai/DeepSpeed#installation) 和
|
||||
[高级安装](https://www.deepspeed.ai/tutorials/advanced-install/) 中查找更多详细信息。
|
||||
|
||||
如果构建过程中仍然遇到问题,请首先确保阅读 [CUDA 扩展安装注意事项](trainer#cuda-extension-installation-notes)。
|
||||
@ -83,7 +83,7 @@ pip install transformers[deepspeed]
|
||||
|
||||
|
||||
```bash
|
||||
git clone https://github.com/microsoft/DeepSpeed/
|
||||
git clone https://github.com/deepspeedai/DeepSpeed/
|
||||
cd DeepSpeed
|
||||
rm -rf build
|
||||
TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install . \
|
||||
@ -105,7 +105,7 @@ CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.get_device_capa
|
||||
|
||||
|
||||
```bash
|
||||
git clone https://github.com/microsoft/DeepSpeed/
|
||||
git clone https://github.com/deepspeedai/DeepSpeed/
|
||||
cd DeepSpeed
|
||||
rm -rf build
|
||||
TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 \
|
||||
@ -142,7 +142,7 @@ _CudaDeviceProperties(name='GeForce RTX 3090', major=8, minor=6, total_memory=24
|
||||
|
||||
您也可以完全省略 `TORCH_CUDA_ARCH_LIST`,然后构建程序将自动查询构建所在的 GPU 的架构。这可能与目标机器上的 GPU 不匹配,因此最好明确指定所需的架构。
|
||||
|
||||
如果尝试了所有建议的方法仍然遇到构建问题,请继续在 [Deepspeed](https://github.com/microsoft/DeepSpeed/issues)的 GitHub Issue 上提交问题。
|
||||
如果尝试了所有建议的方法仍然遇到构建问题,请继续在 [Deepspeed](https://github.com/deepspeedai/DeepSpeed/issues)的 GitHub Issue 上提交问题。
|
||||
|
||||
|
||||
<a id='deepspeed-multi-gpu'></a>
|
||||
@ -471,10 +471,10 @@ deepspeed examples/pytorch/translation/run_translation.py ...
|
||||
|
||||
有关可以在 DeepSpeed 配置文件中使用的完整配置选项的详细指南,请参阅[以下文档](https://www.deepspeed.ai/docs/config-json/)。
|
||||
|
||||
您可以在 [DeepSpeedExamples 仓库](https://github.com/microsoft/DeepSpeedExamples)中找到解决各种实际需求的数十个 DeepSpeed 配置示例。
|
||||
您可以在 [DeepSpeedExamples 仓库](https://github.com/deepspeedai/DeepSpeedExamples)中找到解决各种实际需求的数十个 DeepSpeed 配置示例。
|
||||
|
||||
```bash
|
||||
git clone https://github.com/microsoft/DeepSpeedExamples
|
||||
git clone https://github.com/deepspeedai/DeepSpeedExamples
|
||||
cd DeepSpeedExamples
|
||||
find . -name '*json'
|
||||
```
|
||||
@ -485,7 +485,7 @@ find . -name '*json'
|
||||
grep -i Lamb $(find . -name '*json')
|
||||
```
|
||||
|
||||
还可以在[主仓](https://github.com/microsoft/DeepSpeed)中找到更多示例。
|
||||
还可以在[主仓](https://github.com/deepspeedai/DeepSpeed)中找到更多示例。
|
||||
|
||||
在使用 DeepSpeed 时,您总是需要提供一个 DeepSpeed 配置文件,但是一些配置参数必须通过命令行进行配置。您将在本指南的剩余章节找到这些细微差别。
|
||||
|
||||
@ -797,7 +797,7 @@ ZeRO-Infinity 通过使用 NVMe 内存扩展 GPU 和 CPU 内存,从而允许
|
||||
|
||||
确保您的 `nvme_path` 实际上是一个 NVMe,因为它与普通硬盘或 SSD 一起工作,但速度会慢得多。快速可扩展的训练是根据现代 NVMe 传输速度设计的(截至本文撰写时,可以达到 ~3.5GB/s 读取,~3GB/s 写入的峰值速度)。
|
||||
|
||||
为了找出最佳的 `aio` 配置块,您必须在目标设置上运行一个基准测试,具体操作请参见[说明](https://github.com/microsoft/DeepSpeed/issues/998)。
|
||||
为了找出最佳的 `aio` 配置块,您必须在目标设置上运行一个基准测试,具体操作请参见[说明](https://github.com/deepspeedai/DeepSpeed/issues/998)。
|
||||
|
||||
|
||||
|
||||
@ -1789,7 +1789,7 @@ SW: Model with 2783M total params, 65M largest layer params.
|
||||
|
||||
因此,如果问题明显与DeepSpeed相关,例如您可以看到有一个异常并且可以看到DeepSpeed模块涉及其中,请先重新测试没有DeepSpeed的设置。只有当问题仍然存在时,才向Deepspeed提供所有必需的细节。
|
||||
|
||||
- 如果您明确问题是在Deepspeed核心中而不是集成部分,请直接向[Deepspeed](https://github.com/microsoft/DeepSpeed/)提交问题。如果您不确定,请不要担心,无论使用哪个issue跟踪问题都可以,一旦您发布问题,我们会弄清楚并将其重定向到另一个issue跟踪(如果需要的话)。
|
||||
- 如果您明确问题是在Deepspeed核心中而不是集成部分,请直接向[Deepspeed](https://github.com/deepspeedai/DeepSpeed/)提交问题。如果您不确定,请不要担心,无论使用哪个issue跟踪问题都可以,一旦您发布问题,我们会弄清楚并将其重定向到另一个issue跟踪(如果需要的话)。
|
||||
|
||||
|
||||
|
||||
@ -2086,7 +2086,7 @@ RUN_SLOW=1 pytest tests/deepspeed
|
||||
|
||||
## 主要的DeepSpeed资源
|
||||
|
||||
- [项目GitHub](https://github.com/microsoft/deepspeed)
|
||||
- [项目GitHub](https://github.com/deepspeedai/DeepSpeed)
|
||||
- [使用文档](https://www.deepspeed.ai/getting-started/)
|
||||
- [API文档](https://deepspeed.readthedocs.io/en/latest/index.html)
|
||||
- [博客文章](https://www.microsoft.com/en-us/research/search/?q=deepspeed)
|
||||
@ -2097,4 +2097,4 @@ RUN_SLOW=1 pytest tests/deepspeed
|
||||
- [ZeRO-Offload: Democratizing Billion-Scale Model Training](https://arxiv.org/abs/2101.06840)
|
||||
- [ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning](https://arxiv.org/abs/2104.07857)
|
||||
|
||||
最后,请记住,HuggingFace [`Trainer`]仅集成了DeepSpeed,因此如果您在使用DeepSpeed时遇到任何问题或疑问,请在[DeepSpeed GitHub](https://github.com/microsoft/DeepSpeed/issues)上提交一个issue。
|
||||
最后,请记住,HuggingFace [`Trainer`]仅集成了DeepSpeed,因此如果您在使用DeepSpeed时遇到任何问题或疑问,请在[DeepSpeed GitHub](https://github.com/deepspeedai/DeepSpeed/issues)上提交一个issue。
|
||||
|
@ -182,7 +182,7 @@ my_app.py ... --log_level error --log_level_replica error --log_on_each_node 0
|
||||
python -m torch.distributed.launch --nproc_per_node=2 trainer-program.py ...
|
||||
```
|
||||
|
||||
如果你安装了 [`accelerate`](https://github.com/huggingface/accelerate) 或 [`deepspeed`](https://github.com/microsoft/DeepSpeed),你还可以通过以下任一方法实现相同的效果:
|
||||
如果你安装了 [`accelerate`](https://github.com/huggingface/accelerate) 或 [`deepspeed`](https://github.com/deepspeedai/DeepSpeed),你还可以通过以下任一方法实现相同的效果:
|
||||
|
||||
|
||||
```bash
|
||||
@ -281,7 +281,7 @@ export CUDA_VISIBLE_DEVICES=1,0
|
||||
|
||||
[`Trainer`] 已经被扩展,以支持可能显著提高训练时间并适应更大模型的库。
|
||||
|
||||
目前,它支持第三方解决方案 [DeepSpeed](https://github.com/microsoft/DeepSpeed) 和 [PyTorch FSDP](https://pytorch.org/docs/stable/fsdp.html),它们实现了论文 [ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, by Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He](https://arxiv.org/abs/1910.02054) 的部分内容。
|
||||
目前,它支持第三方解决方案 [DeepSpeed](https://github.com/deepspeedai/DeepSpeed) 和 [PyTorch FSDP](https://pytorch.org/docs/stable/fsdp.html),它们实现了论文 [ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, by Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He](https://arxiv.org/abs/1910.02054) 的部分内容。
|
||||
|
||||
截至撰写本文,此提供的支持是新的且实验性的。尽管我们欢迎围绕 DeepSpeed 和 PyTorch FSDP 的issues,但我们不再支持 FairScale 集成,因为它已经集成到了 PyTorch 主线(参见 [PyTorch FSDP 集成](#pytorch-fully-sharded-data-parallel))。
|
||||
|
||||
@ -293,7 +293,7 @@ export CUDA_VISIBLE_DEVICES=1,0
|
||||
|
||||
撰写时,Deepspeed 需要在使用之前编译 CUDA C++ 代码。
|
||||
|
||||
虽然所有安装问题都应通过 [Deepspeed](https://github.com/microsoft/DeepSpeed/issues) 的 GitHub Issues处理,但在构建依赖CUDA 扩展的任何 PyTorch 扩展时,可能会遇到一些常见问题。
|
||||
虽然所有安装问题都应通过 [Deepspeed](https://github.com/deepspeedai/DeepSpeed/issues) 的 GitHub Issues处理,但在构建依赖CUDA 扩展的任何 PyTorch 扩展时,可能会遇到一些常见问题。
|
||||
|
||||
因此,如果在执行以下操作时遇到与 CUDA 相关的构建问题:
|
||||
|
||||
|
@ -383,8 +383,8 @@ def deepspeed_init(trainer, num_training_steps, inference=False):
|
||||
Returns: optimizer, lr_scheduler
|
||||
|
||||
We may use `deepspeed_init` more than once during the life of Trainer, when we do - it's a temp hack based on:
|
||||
https://github.com/microsoft/DeepSpeed/issues/1394#issuecomment-937405374 until Deepspeed fixes a bug where it
|
||||
can't resume from a checkpoint after it did some stepping https://github.com/microsoft/DeepSpeed/issues/1612
|
||||
https://github.com/deepspeedai/DeepSpeed/issues/1394#issuecomment-937405374 until Deepspeed fixes a bug where it
|
||||
can't resume from a checkpoint after it did some stepping https://github.com/deepspeedai/DeepSpeed/issues/1612
|
||||
|
||||
"""
|
||||
from deepspeed.utils import logger as ds_logger
|
||||
|
@ -568,7 +568,7 @@ class TrainingArguments:
|
||||
fsdp_min_num_params or fsdp_transformer_layer_cls_to_wrap.
|
||||
|
||||
deepspeed (`str` or `dict`, *optional*):
|
||||
Use [Deepspeed](https://github.com/microsoft/deepspeed). This is an experimental feature and its API may
|
||||
Use [Deepspeed](https://github.com/deepspeedai/DeepSpeed). This is an experimental feature and its API may
|
||||
evolve in the future. The value is either the location of DeepSpeed json config file (e.g.,
|
||||
`ds_config.json`) or an already loaded json file as a `dict`"
|
||||
|
||||
|
@ -898,7 +898,7 @@ class TrainerIntegrationDeepSpeed(TrainerIntegrationDeepSpeedWithCustomConfig, T
|
||||
self.check_trainer_state_are_the_same(state, state1)
|
||||
|
||||
# Finally, should be able to resume with the same trainer/same deepspeed engine instance
|
||||
# XXX: but currently this not possible due DS bug: https://github.com/microsoft/DeepSpeed/issues/1612
|
||||
# XXX: but currently this not possible due DS bug: https://github.com/deepspeedai/DeepSpeed/issues/1612
|
||||
# trainer.train(resume_from_checkpoint=checkpoint)
|
||||
# a workaround needs to be used that re-creates the deepspeed engine
|
||||
|
||||
@ -975,7 +975,7 @@ class TrainerIntegrationDeepSpeed(TrainerIntegrationDeepSpeedWithCustomConfig, T
|
||||
def test_load_best_model(self, stage, dtype):
|
||||
# Test that forced deepspeed reinit doesn't break the model. the forced re-init after
|
||||
# loading the best model in Trainer is there to workaround this bug in Deepspeed
|
||||
# https://github.com/microsoft/DeepSpeed/issues/1612
|
||||
# https://github.com/deepspeedai/DeepSpeed/issues/1612
|
||||
#
|
||||
# The test is derived from a repro script submitted in this Issue:
|
||||
# https://github.com/huggingface/transformers/issues/17114
|
||||
|
Loading…
Reference in New Issue
Block a user