13 Commits

Author SHA1 Message Date
Joao Gante
0863eef248
[tests] remove pt_tf equivalence tests (#36253)
2025-02-19 11:55:11 +00:00
Yih-Dar
3d79dcbda0
update push CI workflow files for security (#33142)
* update for security 1

* update for security 2

* update for security 3

* update for security 4

* update for security 5

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-08-28 18:15:58 +02:00
Sai-Suraj-27
af638c4afe
fix: Added missing huggingface_hub installation to workflows (#32891)
Added missing huggingface_hub installation to workflows.
2024-08-22 12:51:12 +01:00
fxmarty
37bba2a32d
CI: update to ROCm 6.0.2 and test MI300 (#30266)
* update to ROCm 6.0.2 and test MI300

* add callers for mi300

* update dockerfile

* fix trainer tests

* remove apex

* style

* Update tests/trainer/test_trainer_seq2seq.py

* Update tests/trainer/test_trainer_seq2seq.py

* Update tests/trainer/test_trainer_seq2seq.py

* Update tests/trainer/test_trainer_seq2seq.py

* update to torch 2.3

* add workflow dispatch target

* we may need branches: mi300-ci after all

* nit

* fix docker build

* nit

* add check runner

* remove docker-gpu

* fix issues

* fix

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-13 18:14:36 +02:00
Yih-Dar
fbb41cd420
consistent job / pytest report / artifact name correspondence (#30392)
* better names

* run better names

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-04-24 22:32:42 +02:00
Yih-Dar
440bd3c3c0
update github actions packages' version to suppress warnings (#30249)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-04-15 15:08:09 +02:00
Yih-Dar
95346e9dcd
Add artifact name in job step to maintain job / artifact correspondence (#28682)
* avoid using job name

* apply to other files

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-31 15:58:17 +01:00
Patrick von Platen
cbbe30749b
[Whisper] Fix slow test (#28407)
* [Whisper] Fix slow test

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-10 22:35:36 +01:00
Yih-Dar
d903abfccc
Fix AMD CI not showing GPU (#27555)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-17 10:44:37 +01:00
Yih-Dar
64e21ca2a4
Make some jobs run on the GitHub Actions runners (#27512)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-15 10:43:16 +01:00
Yih-Dar
12cc123359
Better way to run AMD CI with different flavors (#26634)
* Enable testing against mi250

* Change BERT to trigger tests

* Revert BERT's change

* AMD CI

* AMD CI

---------

Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-16 16:24:30 +02:00
Funtowicz Morgan
3632fb3c25
[AMD] Add initial version for run_tests_multi_gpu (#26346)
* Add initial version for run_tests_multi_gpu

* Trigger change in BERT

* fix typo setup -> setup_gpu

* Add tag mi210

* Enable multi-gpu jobs

* One more

* Use dynamic device allocation

* Attempt to fix syntax for docker create

* fix script path

* fix

* temp machine type

* fix label

* Enable multi-gpu tests

* Rename multi-amd-gpu to multi-gpu

* Let's not be lazy dude

* Update rocm-smi output

* Add gpu_flavour in the matrix

* Fix typos

* merge single/multi dispatch into the matrix

* Format.

* Revert BERT's change

---------

Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
2023-10-03 11:13:45 +02:00
Funtowicz Morgan
2d71307dc0
Integrate AMD GPU in CI/CD environment (#26007)
* Add a Dockerfile for PyTorch + ROCm based on official AMD released artifact

* Add a new artifact single-amdgpu testing on main

* Attempt to test the workflow without merging.

* Changed BERT to check if things are triggered

* Meet the dependencies graph on workflow

* Revert BERT changes

* Add check_runners_amdgpu to correctly mount and check availability

* Rename setup to setup_gpu for CUDA and add setup_amdgpu for AMD

* Fix all the needs.setup -> needs.setup_[gpu|amdgpu] dependencies

* Fix setup dependency graph to use check_runner_amdgpu

* Let's do the runner status check only on AMDGPU target

* Update the Dockerfile.amd to put ourselves in / rather than /var/lib

* Restore the whole setup for CUDA too.

* Let's redisable them

* Change BERT to trigger tests

* Restore BERT

* Add torchaudio with rocm 5.6 to AMD Dockerfile (#26050)

fix dockerfile

Co-authored-by: Felix Marty <felix@hf.co>

* Place AMD GPU tests in a separate workflow (correct branch) (#26105)

AMDGPU CI lives in another workflow

* Fix invalid job name in dependencies.

* Remove tests multi-amdgpu for now.

* Use single-amdgpu

* Use --net=host for now.

* Remote host networking.

* Removed duplicated check_runners_amdgpu step

* Let's tag machine-types with mi210 for now.

* Machine type should be only mi210

* Remove unnecessary push.branches item

* Apply review suggestions moving from `x-amdgpu` to `x-gpu` introducing `amd-gpu` and `miXXX` labels.

* Remove amdgpu from step names.

* finalize

* delete

---------

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-09-20 14:48:49 +02:00