fxmarty
|
37bba2a32d
|
CI: update to ROCm 6.0.2 and test MI300 (#30266)
* update to ROCm 6.0.2 and test MI300
* add callers for mi300
* update dockerfile
* fix trainer tests
* remove apex
* style
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* update to torch 2.3
* add workflow dispatch target
* we may need branches: mi300-ci after all
* nit
* fix docker build
* nit
* add check runner
* remove docker-gpu
* fix issues
* fix
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
|
2024-05-13 18:14:36 +02:00 |
|
Yih-Dar
|
fbb41cd420
|
consistent job / pytest report / artifact name correspondence (#30392)
* better names
* run better names
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
|
2024-04-24 22:32:42 +02:00 |
|
Yih-Dar
|
440bd3c3c0
|
update GitHub Actions packages' versions to suppress warnings (#30249)
update
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
|
2024-04-15 15:08:09 +02:00 |
|
Yih-Dar
|
95346e9dcd
|
Add artifact name in job step to maintain job / artifact correspondence (#28682)
* avoid using job name
* apply to other files
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
|
2024-01-31 15:58:17 +01:00 |
|
Patrick von Platen
|
cbbe30749b
|
[Whisper] Fix slow test (#28407)
* [Whisper] Fix slow test
* update
* update
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
|
2024-01-10 22:35:36 +01:00 |
|
Ella Charlaix
|
39acfe84ba
|
Add deepspeed test to amd scheduled CI (#27633)
* add deepspeed scheduled test for amd
* fix image
* add dockerfile
* add comment
* enable tests
* trigger
* remove trigger for this branch
* trigger
* change runner env to trigger the docker build image test
* use new docker image
* remove test suffix from docker image tag
* replace test docker image with original image
* push new image
* Trigger
* add back amd tests
* fix typo
* add amd tests back
* fix
* comment until docker image build scheduled test fix
* remove deprecated deepspeed build option
* upgrade torch
* update docker & make tests pass
* Update docker/transformers-pytorch-deepspeed-amd-gpu/Dockerfile
* fix
* tmp disable test
* precompile deepspeed to avoid timeout during tests
* fix comment
* trigger deepspeed tests with new image
* comment tests
* trigger
* add sklearn dependency to fix slow tests
* enable back other tests
* final update
---------
Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: Félix Marty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
|
2023-12-11 16:33:36 +01:00 |
|
Yih-Dar
|
e0d2e69582
|
restructure AMD scheduled CI (#27743)
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
|
2023-12-04 15:32:05 +01:00 |
|
fxmarty
|
f93c1e9ece
|
Add ROCm scheduled CI & upgrade ROCm CI to PyTorch 2.1 (#26940)
* add scheduled ci on amdgpu
* fix likely typo
* more tests, avoid parallelism
* precise comment
* fix report channel
* trigger docker build on this branch
* fix
* fix
* run rocm scheduled ci
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
|
2023-11-21 14:55:13 +01:00 |
|