mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-03 04:40:06 +06:00
Integrate AMD GPU in CI/CD environment (#26007)
* Add a Dockerfile for PyTorch + ROCm based on official AMD released artifact * Add a new artifact single-amdgpu testing on main * Attempt to test the workflow without merging. * Changed BERT to check if things are triggered * Meet the dependencies graph on workflow * Revert BERT changes * Add check_runners_amdgpu to correctly mount and check availability * Rename setup to setup_gpu for CUDA and add setup_amdgpu for AMD * Fix all the needs.setup -> needs.setup_[gpu|amdgpu] dependencies * Fix setup dependency graph to use check_runner_amdgpu * Let's do the runner status check only on AMDGPU target * Update the Dockerfile.amd to put ourselves in / rather than /var/lib * Restore the whole setup for CUDA too. * Let's redisable them * Change BERT to trigger tests * Restore BERT * Add torchaudio with rocm 5.6 to AMD Dockerfile (#26050) fix dockerfile Co-authored-by: Felix Marty <felix@hf.co> * Place AMD GPU tests in a separate workflow (correct branch) (#26105) AMDGPU CI lives in an other workflow * Fix invalid job name is dependencies. * Remove tests multi-amdgpu for now. * Use single-amdgpu * Use --net=host for now. * Remote host networking. * Removed duplicated check_runners_amdgpu step * Let's tag machine-types with mi210 for now. * Machine type should be only mi210 * Remove unnecessary push.branches item * Apply review suggestions moving from `x-amdgpu` to `x-gpu` introducing `amd-gpu` and `miXXX` labels. * Remove amdgpu from step names. * finalize * delete --------- Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com> Co-authored-by: Felix Marty <felix@hf.co> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
This commit is contained in:
parent
37c205eb5d
commit
2d71307dc0
73
.github/workflows/build-docker-images.yml
vendored
73
.github/workflows/build-docker-images.yml
vendored
@ -34,19 +34,19 @@ jobs:
|
||||
sudo du -sh /usr/share/
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v2
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v3
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v2
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v3
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-all-latest-gpu
|
||||
build-args: |
|
||||
@ -59,7 +59,7 @@ jobs:
|
||||
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
|
||||
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
|
||||
if: inputs.image_postfix != '-push-ci'
|
||||
uses: docker/build-push-action@v3
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-all-latest-gpu
|
||||
build-args: |
|
||||
@ -83,19 +83,19 @@ jobs:
|
||||
sudo du -sh /usr/share/
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v2
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v3
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v2
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v3
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
|
||||
build-args: |
|
||||
@ -120,13 +120,13 @@ jobs:
|
||||
sudo du -sh /usr/share/
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v2
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v3
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v2
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
@ -136,7 +136,7 @@ jobs:
|
||||
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
|
||||
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
|
||||
if: inputs.image_postfix != '-push-ci'
|
||||
uses: docker/build-push-action@v3
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
|
||||
build-args: |
|
||||
@ -152,19 +152,19 @@ jobs:
|
||||
steps:
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v2
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v3
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v2
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v3
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-doc-builder
|
||||
push: true
|
||||
@ -188,19 +188,19 @@ jobs:
|
||||
sudo du -sh /usr/share/
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v2
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v3
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v2
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v3
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-gpu
|
||||
build-args: |
|
||||
@ -208,6 +208,41 @@ jobs:
|
||||
push: true
|
||||
tags: huggingface/transformers-pytorch-gpu
|
||||
|
||||
latest-pytorch-amd:
|
||||
name: "Latest PyTorch (AMD) [dev]"
|
||||
runs-on: [self-hosted, docker-gpu, amd-gpu, single-gpu, mi210]
|
||||
steps:
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
- name: Check out code
|
||||
uses: actions/checkout@v3
|
||||
- name: Login to DockerHub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
- name: Build and push
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-amd-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }}
|
||||
# Push CI images still need to be re-built daily
|
||||
-
|
||||
name: Build and push (for Push CI) in a daily basis
|
||||
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
|
||||
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
|
||||
if: inputs.image_postfix != '-push-ci'
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-pytorch-amd-gpu
|
||||
build-args: |
|
||||
REF=main
|
||||
push: true
|
||||
tags: huggingface/transformers-pytorch-amd-gpu-push-ci
|
||||
|
||||
latest-tensorflow:
|
||||
name: "Latest TensorFlow [dev]"
|
||||
# Push CI doesn't need this image
|
||||
@ -216,19 +251,19 @@ jobs:
|
||||
steps:
|
||||
-
|
||||
name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v2
|
||||
uses: docker/setup-buildx-action@v3
|
||||
-
|
||||
name: Check out code
|
||||
uses: actions/checkout@v3
|
||||
-
|
||||
name: Login to DockerHub
|
||||
uses: docker/login-action@v2
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_PASSWORD }}
|
||||
-
|
||||
name: Build and push
|
||||
uses: docker/build-push-action@v3
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: ./docker/transformers-tensorflow-gpu
|
||||
build-args: |
|
||||
|
330
.github/workflows/self-push-amd.yml
vendored
Normal file
330
.github/workflows/self-push-amd.yml
vendored
Normal file
@ -0,0 +1,330 @@
|
||||
name: Self-hosted runner AMD GPU (push)
|
||||
|
||||
on:
|
||||
workflow_run:
|
||||
workflows: ["Self-hosted runner (push-caller)"]
|
||||
branches: ["main"]
|
||||
types: [completed]
|
||||
push:
|
||||
branches:
|
||||
- ci_*
|
||||
- ci-*
|
||||
paths:
|
||||
- "src/**"
|
||||
- "tests/**"
|
||||
- ".github/**"
|
||||
- "templates/**"
|
||||
- "utils/**"
|
||||
repository_dispatch:
|
||||
|
||||
env:
|
||||
HF_HOME: /mnt/cache
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
OMP_NUM_THREADS: 8
|
||||
MKL_NUM_THREADS: 8
|
||||
PYTEST_TIMEOUT: 60
|
||||
TF_FORCE_GPU_ALLOW_GROWTH: true
|
||||
RUN_PT_TF_CROSS_TESTS: 1
|
||||
|
||||
jobs:
|
||||
check_runner_status:
|
||||
name: Check Runner Status
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout transformers
|
||||
uses: actions/checkout@v3
|
||||
with:
|
||||
fetch-depth: 2
|
||||
|
||||
- name: Check Runner Status
|
||||
run: python utils/check_self_hosted_runner.py --target_runners amd-mi210-single-gpu-ci-runner-docker --token ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
|
||||
|
||||
check_runners:
|
||||
name: Check Runners
|
||||
needs: check_runner_status
|
||||
strategy:
|
||||
matrix:
|
||||
machine_type: [single-gpu]
|
||||
runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', mi210]
|
||||
container:
|
||||
# --device /dev/dri/renderD128 == AMDGPU:0 (indexing for AMDGPU starts at 128 ...)
|
||||
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
|
||||
options: --device /dev/kfd --device /dev/dri/renderD128 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
- name: ROCM-SMI
|
||||
run: |
|
||||
rocm-smi
|
||||
|
||||
setup_gpu:
|
||||
name: Setup
|
||||
needs: check_runners
|
||||
strategy:
|
||||
matrix:
|
||||
machine_type: [single-gpu]
|
||||
runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', mi210]
|
||||
container:
|
||||
# --device /dev/dri/renderD128 == AMDGPU:0 (indexing for AMDGPU starts at 128 ...)
|
||||
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
|
||||
options: --device /dev/kfd --device /dev/dri/renderD128 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
outputs:
|
||||
matrix: ${{ steps.set-matrix.outputs.matrix }}
|
||||
test_map: ${{ steps.set-matrix.outputs.test_map }}
|
||||
steps:
|
||||
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
|
||||
# We also take into account the `push` event (we might want to test some changes in a branch)
|
||||
- name: Prepare custom environment variables
|
||||
shell: bash
|
||||
# `CI_BRANCH_PUSH`: The branch name from the push event
|
||||
# `CI_BRANCH_WORKFLOW_RUN`: The name of the branch on which this workflow is triggered by `workflow_run` event
|
||||
# `CI_BRANCH`: The non-empty branch name from the above two (one and only one of them is empty)
|
||||
# `CI_SHA_PUSH`: The commit SHA from the push event
|
||||
# `CI_SHA_WORKFLOW_RUN`: The commit SHA that triggers this workflow by `workflow_run` event
|
||||
# `CI_SHA`: The non-empty commit SHA from the above two (one and only one of them is empty)
|
||||
run: |
|
||||
CI_BRANCH_PUSH=${{ github.event.ref }}
|
||||
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
|
||||
CI_BRANCH_WORKFLOW_RUN=${{ github.event.workflow_run.head_branch }}
|
||||
CI_SHA_PUSH=${{ github.event.head_commit.id }}
|
||||
CI_SHA_WORKFLOW_RUN=${{ github.event.workflow_run.head_sha }}
|
||||
echo $CI_BRANCH_PUSH
|
||||
echo $CI_BRANCH_WORKFLOW_RUN
|
||||
echo $CI_SHA_PUSH
|
||||
echo $CI_SHA_WORKFLOW_RUN
|
||||
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
|
||||
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
|
||||
|
||||
- name: print environment variables
|
||||
run: |
|
||||
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
|
||||
echo "env.CI_SHA = ${{ env.CI_SHA }}"
|
||||
|
||||
- name: Update clone using environment variables
|
||||
working-directory: /transformers
|
||||
run: |
|
||||
echo "original branch = $(git branch --show-current)"
|
||||
git fetch && git checkout ${{ env.CI_BRANCH }}
|
||||
echo "updated branch = $(git branch --show-current)"
|
||||
git checkout ${{ env.CI_SHA }}
|
||||
echo "log = $(git log -n 1)"
|
||||
|
||||
- name: Cleanup
|
||||
working-directory: /transformers
|
||||
run: |
|
||||
rm -rf tests/__pycache__
|
||||
rm -rf tests/models/__pycache__
|
||||
rm -rf reports
|
||||
|
||||
- name: Show installed libraries and their versions
|
||||
working-directory: /transformers
|
||||
run: pip freeze
|
||||
|
||||
- name: Fetch the tests to run
|
||||
working-directory: /transformers
|
||||
# TODO: add `git-python` in the docker images
|
||||
run: |
|
||||
pip install --upgrade git-python
|
||||
python3 utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
|
||||
|
||||
- name: Report fetched tests
|
||||
uses: actions/upload-artifact@v3
|
||||
with:
|
||||
name: test_fetched
|
||||
path: /transformers/test_preparation.txt
|
||||
|
||||
- id: set-matrix
|
||||
name: Organize tests into models
|
||||
working-directory: /transformers
|
||||
# The `keys` is used as GitHub actions matrix for jobs, i.e. `models/bert`, `tokenization`, `pipeline`, etc.
|
||||
# The `test_map` is used to get the actual identified test files under each key.
|
||||
# If no test to run (so no `test_map.json` file), create a dummy map (empty matrix will fail)
|
||||
run: |
|
||||
if [ -f test_map.json ]; then
|
||||
keys=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); d = list(test_map.keys()); print(d)')
|
||||
test_map=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); print(test_map)')
|
||||
else
|
||||
keys=$(python3 -c 'keys = ["dummy"]; print(keys)')
|
||||
test_map=$(python3 -c 'test_map = {"dummy": []}; print(test_map)')
|
||||
fi
|
||||
echo $keys
|
||||
echo $test_map
|
||||
echo "matrix=$keys" >> $GITHUB_OUTPUT
|
||||
echo "test_map=$test_map" >> $GITHUB_OUTPUT
|
||||
|
||||
run_tests_single_gpu:
|
||||
name: Model tests
|
||||
needs: setup_gpu
|
||||
# `dummy` means there is no test to run
|
||||
if: contains(fromJson(needs.setup_gpu.outputs.matrix), 'dummy') != true
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
folders: ${{ fromJson(needs.setup_gpu.outputs.matrix) }}
|
||||
machine_type: [single-gpu]
|
||||
runs-on: [self-hosted, docker-gpu, amd-gpu, '${{ matrix.machine_type }}', mi210]
|
||||
container:
|
||||
# --device /dev/dri/renderD128 == AMDGPU:0 (indexing for AMDGPU starts at 128 ...)
|
||||
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
|
||||
options: --device /dev/kfd --device /dev/dri/renderD128 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
|
||||
# We also take into account the `push` event (we might want to test some changes in a branch)
|
||||
- name: Prepare custom environment variables
|
||||
shell: bash
|
||||
# For the meaning of these environment variables, see the job `Setup`
|
||||
run: |
|
||||
CI_BRANCH_PUSH=${{ github.event.ref }}
|
||||
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
|
||||
CI_BRANCH_WORKFLOW_RUN=${{ github.event.workflow_run.head_branch }}
|
||||
CI_SHA_PUSH=${{ github.event.head_commit.id }}
|
||||
CI_SHA_WORKFLOW_RUN=${{ github.event.workflow_run.head_sha }}
|
||||
echo $CI_BRANCH_PUSH
|
||||
echo $CI_BRANCH_WORKFLOW_RUN
|
||||
echo $CI_SHA_PUSH
|
||||
echo $CI_SHA_WORKFLOW_RUN
|
||||
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
|
||||
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
|
||||
|
||||
- name: print environment variables
|
||||
run: |
|
||||
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
|
||||
echo "env.CI_SHA = ${{ env.CI_SHA }}"
|
||||
|
||||
- name: Update clone using environment variables
|
||||
working-directory: /transformers
|
||||
run: |
|
||||
echo "original branch = $(git branch --show-current)"
|
||||
git fetch && git checkout ${{ env.CI_BRANCH }}
|
||||
echo "updated branch = $(git branch --show-current)"
|
||||
git checkout ${{ env.CI_SHA }}
|
||||
echo "log = $(git log -n 1)"
|
||||
|
||||
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
|
||||
working-directory: /transformers
|
||||
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
|
||||
|
||||
- name: Echo folder ${{ matrix.folders }}
|
||||
shell: bash
|
||||
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
|
||||
# set the artifact folder names (because the character `/` is not allowed).
|
||||
run: |
|
||||
echo "${{ matrix.folders }}"
|
||||
echo "${{ fromJson(needs.setup_gpu.outputs.test_map)[matrix.folders] }}"
|
||||
matrix_folders=${{ matrix.folders }}
|
||||
matrix_folders=${matrix_folders/'models/'/'models_'}
|
||||
echo "$matrix_folders"
|
||||
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
|
||||
|
||||
- name: ROCM-SMI
|
||||
run: |
|
||||
rocm-smi
|
||||
|
||||
- name: Environment
|
||||
working-directory: /transformers
|
||||
run: |
|
||||
python3 utils/print_env.py
|
||||
|
||||
- name: Show installed libraries and their versions
|
||||
working-directory: /transformers
|
||||
run: pip freeze
|
||||
|
||||
- name: Run all non-slow selected tests on GPU
|
||||
working-directory: /transformers
|
||||
run: |
|
||||
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup_gpu.outputs.test_map)[matrix.folders] }}
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ failure() }}
|
||||
continue-on-error: true
|
||||
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
|
||||
if: ${{ always() }}
|
||||
uses: actions/upload-artifact@v3
|
||||
with:
|
||||
name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
|
||||
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}
|
||||
|
||||
send_results:
|
||||
name: Send results to webhook
|
||||
runs-on: ubuntu-latest
|
||||
if: always()
|
||||
needs: [
|
||||
check_runner_status,
|
||||
check_runners,
|
||||
setup_gpu,
|
||||
run_tests_single_gpu,
|
||||
# run_tests_multi_gpu,
|
||||
# run_tests_torch_cuda_extensions_single_gpu,
|
||||
# run_tests_torch_cuda_extensions_multi_gpu
|
||||
]
|
||||
steps:
|
||||
- name: Preliminary job status
|
||||
shell: bash
|
||||
# For the meaning of these environment variables, see the job `Setup`
|
||||
run: |
|
||||
echo "Runner availability: ${{ needs.check_runner_status.result }}"
|
||||
echo "Setup status: ${{ needs.setup_gpu.result }}"
|
||||
echo "Runner status: ${{ needs.check_runners.result }}"
|
||||
|
||||
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
|
||||
# We also take into account the `push` event (we might want to test some changes in a branch)
|
||||
- name: Prepare custom environment variables
|
||||
shell: bash
|
||||
# For the meaning of these environment variables, see the job `Setup`
|
||||
run: |
|
||||
CI_BRANCH_PUSH=${{ github.event.ref }}
|
||||
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
|
||||
CI_BRANCH_WORKFLOW_RUN=${{ github.event.workflow_run.head_branch }}
|
||||
CI_SHA_PUSH=${{ github.event.head_commit.id }}
|
||||
CI_SHA_WORKFLOW_RUN=${{ github.event.workflow_run.head_sha }}
|
||||
echo $CI_BRANCH_PUSH
|
||||
echo $CI_BRANCH_WORKFLOW_RUN
|
||||
echo $CI_SHA_PUSH
|
||||
echo $CI_SHA_WORKFLOW_RUN
|
||||
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
|
||||
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
|
||||
|
||||
- name: print environment variables
|
||||
run: |
|
||||
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
|
||||
echo "env.CI_SHA = ${{ env.CI_SHA }}"
|
||||
|
||||
- uses: actions/checkout@v3
|
||||
# To avoid failure when multiple commits are merged into `main` in a short period of time.
|
||||
# Checking out to an old commit beyond the fetch depth will get an error `fatal: reference is not a tree: ...
|
||||
# (Only required for `workflow_run` event, where we get the latest HEAD on `main` instead of the event commit)
|
||||
with:
|
||||
fetch-depth: 20
|
||||
|
||||
- name: Update clone using environment variables
|
||||
run: |
|
||||
echo "original branch = $(git branch --show-current)"
|
||||
git fetch && git checkout ${{ env.CI_BRANCH }}
|
||||
echo "updated branch = $(git branch --show-current)"
|
||||
git checkout ${{ env.CI_SHA }}
|
||||
echo "log = $(git log -n 1)"
|
||||
|
||||
- uses: actions/download-artifact@v3
|
||||
- name: Send message to Slack
|
||||
env:
|
||||
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
|
||||
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
|
||||
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
|
||||
CI_SLACK_CHANNEL_ID_AMD: ${{ secrets.CI_SLACK_CHANNEL_ID_AMD }}
|
||||
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
|
||||
CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_AMD }}
|
||||
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
|
||||
CI_EVENT: push
|
||||
CI_TITLE_PUSH: ${{ github.event.head_commit.message }}
|
||||
CI_TITLE_WORKFLOW_RUN: ${{ github.event.workflow_run.head_commit.message }}
|
||||
CI_SHA: ${{ env.CI_SHA }}
|
||||
RUNNER_STATUS: ${{ needs.check_runner_status.result }}
|
||||
RUNNER_ENV_STATUS: ${{ needs.check_runners.result }}
|
||||
SETUP_STATUS: ${{ needs.setup_gpu.result }}
|
||||
|
||||
# We pass `needs.setup_gpu.outputs.matrix` as the argument. A processing in `notification_service.py` to change
|
||||
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
|
||||
run: |
|
||||
pip install slack_sdk
|
||||
pip show slack_sdk
|
||||
python utils/notification_service.py "${{ needs.setup_gpu.outputs.matrix }}"
|
31
docker/transformers-pytorch-amd-gpu/Dockerfile
Normal file
31
docker/transformers-pytorch-amd-gpu/Dockerfile
Normal file
@ -0,0 +1,31 @@
|
||||
FROM rocm/pytorch:rocm5.6_ubuntu20.04_py3.8_pytorch_2.0.1
|
||||
LABEL maintainer="Hugging Face"
|
||||
|
||||
ARG DEBIAN_FRONTEND=noninteractive
|
||||
|
||||
RUN apt update && \
|
||||
apt install -y --no-install-recommends git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg && \
|
||||
apt clean && \
|
||||
rm -rf /var/lib/apt/lists/*
|
||||
|
||||
RUN python3 -m pip install --no-cache-dir --upgrade pip setuptools ninja git+https://github.com/facebookresearch/detectron2.git pytesseract "itsdangerous<2.1.0"
|
||||
|
||||
# If set to nothing, will install the latest version
|
||||
ARG PYTORCH='2.0.1'
|
||||
ARG TORCH_VISION='0.15.2'
|
||||
ARG TORCH_AUDIO='2.0.2'
|
||||
ARG ROCM='5.6'
|
||||
|
||||
RUN git clone --depth 1 --branch v$TORCH_AUDIO https://github.com/pytorch/audio.git
|
||||
RUN cd audio && USE_ROCM=1 USE_CUDA=0 python setup.py install
|
||||
|
||||
ARG REF=main
|
||||
WORKDIR /
|
||||
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
|
||||
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch,testing,video]
|
||||
|
||||
RUN python3 -m pip uninstall -y tensorflow flax
|
||||
|
||||
# When installing in editable mode, `transformers` is not recognized as a package.
|
||||
# this line must be added in order for python to be aware of transformers.
|
||||
RUN cd transformers && python3 setup.py develop
|
Loading…
Reference in New Issue
Block a user