Yih-Dar
3d34b92116
Switch to use A10 progressively ( #38936 )
...
* try
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 16:10:35 +00:00
Yih-Dar
5009252a05
Better CI ( #38552 )
...
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
better CI
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 17:59:14 +02:00
Yih-Dar
eb74cf977b
Use one utils/notification_service.py
( #38379 )
...
* step 1
* step 2
* step 3
* step 4
* step 5
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-26 16:15:29 +02:00
Yih-Dar
d0c9c66d1c
new failure CI reports for all jobs ( #38298 )
...
Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run
Build documentation / build (push) Waiting to run
Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run
Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions
Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run
Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions
Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions
Secret Leaks / trufflehog (push) Waiting to run
Update Transformers metadata / build_and_package (push) Waiting to run
Check Tiny Models / Check tiny models (push) Has been cancelled
* new failures
* report_repo_id
* report_repo_id
* report_repo_id
* More fixes
* More fixes
* More fixes
* ruff
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-24 19:15:02 +02:00
Yih-Dar
7819911b0c
Use T4 single GPU runner with more CPU RAM ( #37961 )
...
larger T4 single GPU
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-05 16:17:45 +02:00
Yih-Dar
4f139f5a50
Send trainer/fsdp/deepspeed CI job reports to a single channel ( #37411 )
...
* send trainer/fsdd/deepspeed channel
* update
* change name
* no .
* final
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-10 13:17:31 +02:00
Yih-Dar
8064cd9b4f
fix deepspeed job ( #37284 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-08 15:19:33 +02:00
Joao Gante
0863eef248
[tests] remove pt_tf
equivalence tests ( #36253 )
2025-02-19 11:55:11 +00:00
Stas Bekman
9dc1efa5d4
DeepSpeed github repo move sync ( #36021 )
...
deepspeed github repo move
2025-02-05 08:19:31 -08:00
Yih-Dar
fce1fcfe71
Ping team members for new failed tests in daily CI ( #34171 )
...
* ping
* fix
* fix
* fix
* remove runner
* update members
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-17 16:11:52 +02:00
Yih-Dar
75c878da1e
Update daily ci to use new cluster ( #33627 )
...
* update
* re-enable daily CI
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-09-20 21:05:30 +02:00
Yih-Dar
5334b61c33
Revive AMD scheduled CI ( #33448 )
...
Revive AMD scheduled CI
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-09-12 15:52:15 +02:00
Yih-Dar
d4564df1d4
Revive Nightly/Past CI ( #31159 )
...
* build
* build
* build
* build
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-20 18:57:24 +02:00
Yih-Dar
fbb41cd420
consistent job / pytest report / artifact name correspondence ( #30392 )
...
* better names
* run better names
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-04-24 22:32:42 +02:00
Yih-Dar
440bd3c3c0
update github actions packages' version to suppress warnings ( #30249 )
...
update
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-04-15 15:08:09 +02:00
Marc Sun
bb76f81e40
[CI] Quantization workflow fix ( #30158 )
...
* fix workflow
* call ci
* Update .github/workflows/self-scheduled-caller.yml
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-04-10 11:51:06 +02:00
Marc Sun
6cdbd73e01
[CI] Fix setup ( #30147 )
...
* [CI] fix setup
* fix
* test
* Revert "test"
This reverts commit 7df416d450
.
2024-04-09 18:10:00 +02:00
Marc Sun
58a939c6b7
Fix quantization tests ( #29914 )
...
* revert back to torch 2.1.1
* run test
* switch to torch 2.2.1
* udapte dockerfile
* fix awq tests
* fix test
* run quanto tests
* update tests
* split quantization tests
* fix
* fix again
* final fix
* fix report artifact
* build docker again
* Revert "build docker again"
This reverts commit 399a5f9d93
.
* debug
* revert
* style
* new notification system
* testing notfication
* rebuild docker
* fix_prev_ci_results
* typo
* remove warning
* fix typo
* fix artifact name
* debug
* issue fixed
* debug again
* fix
* fix time
* test notif with faling test
* typo
* issues again
* final fix ?
* run all quantization tests again
* remove name to clear space
* revert modfiication done on workflow
* fix
* build docker
* build only quant docker
* fix quantization ci
* fix
* fix report
* better quantization_matrix
* add print
* revert to the basic one
2024-04-09 17:10:29 +02:00
Yih-Dar
b17b54d3dd
Refactor daily CI workflow ( #30012 )
...
* separate jobs
* separate jobs
* use channel name directly instead of ID
* use channel name directly instead of ID
* use channel name directly instead of ID
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-04-05 15:49:51 +02:00
Marc Sun
f54d82cace
[CI] Quantization workflow ( #29046 )
...
* [CI] Quantization workflow
* build dockerfile
* fix dockerfile
* update self-cheduled.yml
* test build dockerfile on push
* fix torch install
* udapte to python 3.10
* update aqlm version
* uncomment build dockerfile
* tests if the scheduler works
* fix docker
* do not trigger on psuh again
* add additional runs
* test again
* all good
* style
* Update .github/workflows/self-scheduled.yml
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* test build dockerfile with torch 2.2.0
* fix extra
* clean
* revert changes
* Revert "revert changes"
This reverts commit 4cb52b8822
.
* revert correct change
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-28 10:09:25 -05:00
Yih-Dar
93f8617afd
Use DS_DISABLE_NINJA=1
( #29290 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-02-26 17:41:01 +08:00
Yih-Dar
4735866141
Split daily CI using 2 level matrix ( #28773 )
...
* update / add new workflow files
* Add comment
* Use env.NUM_SLICES
* use scripts
* use scripts
* use scripts
* Fix
* using one script
* Fix
* remove unused file
* update
* fail-fast: false
* remove unused file
* fix
* fix
* use matrix
* inputs
* style
* update
* fix
* fix
* no model name
* add doc
* allow args
* style
* pass argument
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-31 18:04:43 +01:00
Yih-Dar
95346e9dcd
Add artifact name in job step to maintain job / artifact correspondence ( #28682 )
...
* avoid using job name
* apply to other files
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-31 15:58:17 +01:00
Patrick von Platen
cbbe30749b
[Whisper] Fix slow test ( #28407 )
...
* [Whisper] Fix slow test
* update
* update
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-10 22:35:36 +01:00
Ella Charlaix
39acfe84ba
Add deepspeed test to amd scheduled CI ( #27633 )
...
* add deepspeed scheduled test for amd
* fix image
* add dockerfile
* add comment
* enable tests
* trigger
* remove trigger for this branch
* trigger
* change runner env to trigger the docker build image test
* use new docker image
* remove test suffix from docker image tag
* replace test docker image with original image
* push new image
* Trigger
* add back amd tests
* fix typo
* add amd tests back
* fix
* comment until docker image build scheduled test fix
* remove deprecated deepspeed build option
* upgrade torch
* update docker & make tests pass
* Update docker/transformers-pytorch-deepspeed-amd-gpu/Dockerfile
* fix
* tmp disable test
* precompile deepspeed to avoid timeout during tests
* fix comment
* trigger deepspeed tests with new image
* comment tests
* trigger
* add sklearn dependency to fix slow tests
* enable back other tests
* final update
---------
Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: Félix Marty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-11 16:33:36 +01:00
Yih-Dar
9f1f11a2e7
Show new failing tests in a more clear way in slack report ( #27881 )
...
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-07 15:09:30 +01:00
Yih-Dar
64e21ca2a4
Make some jobs run on the GitHub Actions runners ( #27512 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-15 10:43:16 +01:00
Yih-Dar
00dc856233
At most 2 GPUs for CI ( #27435 )
...
At most 2 GPUs
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-10 16:19:06 +01:00
Yih-Dar
9dc4ce9ea7
Disable CI runner check ( #27170 )
...
Disable runner check
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-31 11:59:21 +01:00
Yih-Dar
6ae71ec836
Update runs-on
in workflow files ( #26435 )
...
* update
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-09-27 19:25:52 +02:00
Yih-Dar
11cb6e0f7e
Unpin DeepSpeed and require DS >= 0.9.3 ( #24541 )
...
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-06-28 14:01:22 +02:00
Yih-Dar
7631db0fdc
Pin deepspeed
to 0.9.2
for now ( #24024 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-06-05 20:00:28 +02:00
Yih-Dar
e69feab8a1
Update workflow files ( #23658 )
...
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-05-22 21:26:51 +02:00
Yih-Dar
aa4316757d
Change schedule CI time ( #22884 )
...
* fix
* Update .github/workflows/self-nightly-past-ci-caller.yml
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-04-20 14:01:08 +02:00
Yih-Dar
648bd5a8aa
Show diff between 2 CI runs on Slack reports ( #22798 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-04-19 19:27:37 +02:00
Yih-Dar
656d41ab4c
Remove DS_BUILD_AIO=1
( #22741 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-04-13 18:08:22 +02:00
Yih-Dar
0fe6c6bdca
(Re-)Enable Nightly + Past CI ( #22393 )
...
* Enable Nightly + Past CI
* put schedule
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-30 21:06:35 +02:00
Yih-Dar
aab895c396
Make Slack CI reporting stronger ( #21823 )
...
* Use token
* Avoid failure
* better error
* Fix
* fix style
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-02-28 17:12:44 +01:00
Yih-Dar
bf9a5882a7
Update some GH action versions ( #20537 )
...
* update actions versions
* update actions versions
* update actions versions
* update actions versions
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-12-06 16:54:40 +01:00
Yih-Dar
67d32f4649
Replace set-output
by $GITHUB_OUTPUT
( #20547 )
...
* remove set-output
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-12-05 18:25:13 +01:00
Yih-Dar
e8d448edcf
extract warnings in GH workflows ( #20487 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-29 15:58:54 +01:00
Yih-Dar
700e0cd65f
Add missing report button for Example test ( #20293 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-17 15:55:00 +01:00
Yih-Dar
c06d555647
Show installed libraries and their versions in GA jobs ( #20069 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-04 18:03:18 +01:00
Sylvain Gugger
9ac586b3c8
Rework pipeline tests ( #19366 )
...
* Rework pipeline tests
* Try to fix Flax tests
* Try to put it before
* Use a new decorator instead
* Remove ignore marker since it doesn't work
* Filter pipeline tests
* Woopsie
* Use the fitlered list
* Clean up and fake modif
* Remove init
* Revert fake modif
2022-10-07 18:01:58 -04:00
Yih-Dar
ba7f2173cc
Add runner availability check ( #19054 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-19 12:27:06 +02:00
Yih-Dar
7a8118947f
Add checks for more workflow jobs ( #18905 )
...
* add check for scheduled CI
* Add check to other CIs
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-07 12:51:37 +02:00
Yih-Dar
7d5fde991d
unpin slack_sdk version ( #18901 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-06 18:42:00 +02:00
Yih-Dar
ecdf9b06bc
Remove cached torch_extensions on CI runners ( #18868 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-02 18:17:58 +02:00
Yih-Dar
0ab465a5d2
pin Slack SDK to 3.18.1 to avoid failing issue ( #18869 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-02 16:49:08 +02:00
Yih-Dar
d2704c4143
Add machine type in the artifact of Examples directory job ( #18459 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-04 18:52:01 +02:00