Zach Mueller
1211e616a4
Use inherited tempdir makers for tests + fix failing DS tests ( #35600 )
* Use existing APIs to make tempdir folders
* Fixup deepspeed too
* output_dir -> tmp_dir
2025-01-10 10:01:58 -05:00
amyeroberts
b7474f211d
Trainer - deprecate tokenizer for processing_class ( #32385 )
* Trainer - deprecate tokenizer for processing_class
* Extend change across Seq2Seq trainer and docs
* Add tests
* Update to FutureWarning and add deprecation version
2024-10-02 14:08:46 +01:00
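The deprecation mechanics listed above (old kwarg still accepted, `FutureWarning` emitted, value forwarded to the new name) can be sketched in isolation; `MyTrainer` below is an illustrative stand-in, not the real `Trainer`:

```python
import warnings


class MyTrainer:
    """Illustrative stand-in for the tokenizer -> processing_class rename."""

    def __init__(self, processing_class=None, tokenizer=None):
        if tokenizer is not None:
            warnings.warn(
                "`tokenizer` is deprecated and will be removed in a future "
                "version; use `processing_class` instead.",
                FutureWarning,
            )
            # Forward the old kwarg so existing callers keep working.
            if processing_class is None:
                processing_class = tokenizer
        self.processing_class = processing_class
```

Callers passing `tokenizer=...` still work but see a `FutureWarning`; new code passes `processing_class=...` directly.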
Zach Mueller
0b066bed14
Revert PR 32299, flag users when Zero-3 was missed ( #32851 )
Revert PR 32299
2024-08-16 12:35:41 -04:00
Zach Mueller
82efc53513
Yell at the user if zero-3 init wasn't performed, but expected to have been done ( #32299 )
* Test this zach
* Test for improper init w/o zero3
* Move back
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Get rid of stars in warning
* Make private
* Make clear
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-01 15:18:43 -04:00
Fanli Lin
25e5e3fa56
[tests] fix deepspeed zero3 config for test_stage3_nvme_offload ( #31881 )
fix config
2024-07-16 16:11:37 +02:00
amyeroberts
1de7dc7403
Skip tests properly ( #31308 )
* Skip tests properly
* [test_all]
* Add 'reason' as kwarg for skipTest
* [test_all] Fix up
* [test_all]
2024-06-26 21:59:08 +01:00
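"Skipping properly" here means calling `self.skipTest(...)` so the runner records a skip, rather than `return`-ing early, which is silently counted as a pass. A minimal sketch of the difference:

```python
import unittest

HAS_ACCELERATOR = False  # assumption for the sketch: required hardware absent


class ExampleTests(unittest.TestCase):
    def test_silently_passes(self):
        # Anti-pattern: an early return is reported as a *pass*.
        if not HAS_ACCELERATOR:
            return
        self.fail("never reached")

    def test_properly_skipped(self):
        # Correct: skipTest marks the test as skipped, with a reason.
        if not HAS_ACCELERATOR:
            self.skipTest(reason="no accelerator available")
        self.fail("never reached")
```

Running both, the first shows up as a pass and the second as a skip with its reason attached, which is what the test report should say.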
Fanli Lin
077c139f57
[tests] rename test_config_object to test_ds_config_object ( #31403 )
fix name
2024-06-19 11:19:15 +02:00
Zach Mueller
60d5f8f9f0
🚨 🚨 🚨 Deprecate evaluation_strategy to eval_strategy 🚨 🚨 🚨 ( #30190 )
* Alias
* Note alias
* Tests and src
* Rest
* Clean
* Change typing?
* Fix tests
* Deprecation versions
2024-04-18 12:49:43 -04:00
Sourab Mangrulkar
b262808656
fix failing trainer ds tests ( #29057 )
2024-02-16 17:18:45 +05:30
Lysandre Debut
f497f564bb
Update all references to canonical models ( #29001 )
* Script & Manual edition
* Update
2024-02-16 08:16:58 +01:00
Joao Gante
beb2a09687
DeepSpeed: hardcode torch.arange dtype on float usage to avoid incorrect initialization ( #28760 )
2024-01-31 14:39:07 +00:00
Xuehai Pan
976189a6df
Fix initialization for missing parameters in from_pretrained under ZeRO-3 ( #28245 )
* Fix initialization for missing parameters in `from_pretrained` under ZeRO-3
* Test initialization for missing parameters under ZeRO-3
* Add more tests
* Only enable deepspeed context for per-module level parameters
* Enable deepspeed context only once
* Move class definition inside test case body
2024-01-09 14:58:21 +00:00
Ella Charlaix
39acfe84ba
Add deepspeed test to amd scheduled CI ( #27633 )
* add deepspeed scheduled test for amd
* fix image
* add dockerfile
* add comment
* enable tests
* trigger
* remove trigger for this branch
* trigger
* change runner env to trigger the docker build image test
* use new docker image
* remove test suffix from docker image tag
* replace test docker image with original image
* push new image
* Trigger
* add back amd tests
* fix typo
* add amd tests back
* fix
* comment until docker image build scheduled test fix
* remove deprecated deepspeed build option
* upgrade torch
* update docker & make tests pass
* Update docker/transformers-pytorch-deepspeed-amd-gpu/Dockerfile
* fix
* tmp disable test
* precompile deepspeed to avoid timeout during tests
* fix comment
* trigger deepspeed tests with new image
* comment tests
* trigger
* add sklearn dependency to fix slow tests
* enable back other tests
* final update
---------
Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: Félix Marty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-11 16:33:36 +01:00
Hz, Ji
c5d7754b11
device-agnostic deepspeed testing ( #27342 )
2023-11-09 12:34:13 +01:00
Sourab Mangrulkar
7ecd229ba4
Smangrul/fix failing ds ci tests ( #27358 )
* fix failing DeepSpeed CI tests due to `safetensors` being default
* debug
* remove debug statements
* resolve comments
* Update test_deepspeed.py
2023-11-09 11:47:24 +05:30
Sourab Mangrulkar
b477327394
fix the deepspeed tests ( #26021 )
* fix the deepspeed tests
* resolve comment
2023-09-13 10:26:53 +05:30
Sourab Mangrulkar
6bc517ccd4
deepspeed resume from ckpt fixes and adding support for deepspeed optimizer and HF scheduler ( #25863 )
* Add support for deepspeed optimizer and HF scheduler
* fix bug
* fix the import
* fix issue with deepspeed scheduler saving for hf optim + hf scheduler scenario
* fix loading of hf scheduler when loading deepspeed checkpoint
* fix import of `DeepSpeedSchedulerWrapper`
* add tests
* add the comment and skip the failing tests
* address comment
2023-09-05 22:31:20 +05:30
Younes Belkada
4b79697865
🚨 🚨 🚨 [Refactor] Move third-party related utility files into integrations/ folder 🚨 🚨 🚨 ( #25599 )
* move deepspeed to `lib_integrations.deepspeed`
* more refactor
* oops
* fix slow tests
* Fix docs
* fix docs
* address feedback
* address feedback
* final modifs for PEFT
* fixup
* ok now
* trigger CI
* trigger CI again
* Update docs/source/en/main_classes/deepspeed.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* import from `integrations`
* address feedback
* revert removal of `deepspeed` module
* revert removal of `deepspeed` module
* fix conflicts
* ooops
* oops
* add deprecation warning
* place it on the top
* put `FutureWarning`
* fix conflicts with not_doctested.txt
* add back `bitsandbytes` module with a depr warning
* fix
* fix
* fixup
* oops
* fix doctests
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-08-25 17:13:34 +02:00
Sourab Mangrulkar
a73b1d59a3
accelerate deepspeed and gradient accumulation integrate ( #23236 )
* mixed precision support via accelerate
* fix issues
* fix for the sharded ddp case
* fix flax and tf failing tests
* refactor the place to create `Accelerator` object
* move ddp prep to accelerate
* fix 😅
* resolving comments
* move fsdp handling to accelerate
* fixes
* fix saving
* shift torch dynamo handling to accelerate
* shift deepspeed integration and save & load utils to accelerate
* fix accelerate launcher support
* oops
* fix 🐛
* save ckpt fix
* Trigger CI
* nasty 🐛 😅
* as deepspeed needs grad_acc fixes, transfer grad_acc to accelerate
* make tests happy
* quality ✨
* loss tracked needs to account for grad_acc
* fixing the deepspeed tests
* quality ✨
* 😅 😅 😅
* tests 😡
* quality ✨
* Trigger CI
* resolve comments and fix the issue with the previous merge from branch
* Trigger CI
* accelerate took over deepspeed integration
---------
Co-authored-by: Stas Bekman <stas@stason.org>
2023-05-31 15:16:22 +05:30
Yih-Dar
fe1f5a639d
Fix decorator order ( #22708 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-04-11 17:59:15 +02:00
Stas Bekman
ec24132b6c
[deepspeed] offload + non-cpuadam optimizer exception ( #22043 )
* [deepspeed] offload + non-cpuadam optimizer exception
* flip
* revert min version
2023-03-09 08:12:57 -08:00
Stas Bekman
633062639b
[deepspeed tests] fix issues introduced by #21700 ( #21769 )
* [deepspeed tests] fix issues introduced by #21700
* fix
* fix
2023-02-23 13:22:25 -08:00
Aaron Gokaslan
5e8c8eb5ba
Apply ruff flake8-comprehensions ( #21694 )
2023-02-22 09:14:54 +01:00
Stas Bekman
8ea994d3c5
[tests] add missing report_to none ( #21505 )
[tests] report_to none
2023-02-08 09:32:40 -08:00
Sylvain Gugger
6f79d26442
Update quality tooling for formatting ( #21480 )
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
2023-02-06 18:10:56 -05:00
Sylvain Gugger
36d4647993
Refine Bf16 test for deepspeed ( #17734 )
* Refine BF16 check in CPU/GPU
* Fixes
* Renames
2022-06-16 11:27:58 -04:00
Stas Bekman
d28b7aa8cb
[deepspeed / testing] reset global state ( #17553 )
* [deepspeed] fix load_best_model test
* [deepspeed] add state reset on unittest tearDown
2022-06-06 07:49:25 -07:00
Stas Bekman
26e5e129b4
[deepspeed] fix load_best_model test ( #17550 )
2022-06-03 11:19:03 -07:00
Stas Bekman
2f59ad1609
[trainer/deepspeed] load_best_model (reimplement re-init) ( #17151 )
* [trainer/deepspeed] load_best_model
* to sync with DS PR #1947
* simplify
* rework load_best_model test
* cleanup
* bump deepspeed>=0.6.5
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2022-06-02 09:14:21 -07:00
Stas Bekman
f861504466
[Deepspeed] add many more models to the model zoo test ( #12695 )
* model zoo take 2
* add deberta
* new param for zero2
* doc update
* doc update
* add layoutlm
* bump deepspeed
* add deberta-v2, funnel, longformer
* new models
* style
* add t5_v1
* update TAPAS status
* reorg problematic models
* move doc to another PR
* style
* fix checkpoint check test
* making progress on more models running
* cleanup
* new version
* cleanup
2022-05-10 08:22:42 -07:00
Stas Bekman
ce2fef2ad2
[trainer / deepspeed] fix hyperparameter_search ( #16740 )
* [trainer / deepspeed] fix hyperparameter_search
* require optuna
* style
* oops
* add dep in the right place
* create deepspeed-testing dep group
* Trigger CI
2022-04-14 17:24:38 -07:00
Sylvain Gugger
4975002df5
Reorganize file utils ( #16264 )
* Split file_utils in several submodules
* Fixes
* Add back more objects
* More fixes
* Who exactly decided to import that from there?
* Second suggestion to code with code review
* Revert wrong move
* Fix imports
* Adapt all imports
* Adapt all imports everywhere
* Revert this import, will fix in a separate commit
2022-03-23 10:26:33 -04:00
Stas Bekman
580dd87c55
[Deepspeed] add support for bf16 mode ( #14569 )
* [WIP] add support for bf16 mode
* prep for bf16
* prep for bf16
* fix; zero2/bf16 is ok
* check bf16 is available
* test fixes
* enable zero3_bf16
* config files
* docs
* split stage_dtype; merge back to non-dtype-specific config file
* fix doc
* cleanup
* cleanup
* bfloat16 => bf16 to match the PR changes
* s/zero_gather_fp16_weights_on_model_save/zero_gather_16bit_weights_on_model_save/; s/save_fp16_model/save_16bit_model/
* test fixes/skipping
* move
* fix
* Update docs/source/main_classes/deepspeed.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* backticks
* cleanup
* cleanup
* cleanup
* new version
* add note about grad accum in bf16
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-11 17:53:53 -08:00
Stas Bekman
b842d7277a
fix deepspeed tests ( #15881 )
* fix deepspeed tests
* style
* more fixes
2022-03-01 19:27:28 -08:00
Lysandre Debut
29c10a41d0
[Test refactor 1/5] Per-folder tests reorganization ( #15725 )
* Per-folder tests reorganization
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-02-23 15:46:28 -05:00
Stas Bekman
4f5faaf044
[deepspeed] fix a bug in a test ( #15493 )
* [deepspeed] fix a bug in a test
* consistency
2022-02-03 08:55:45 -08:00
Stas Bekman
b66c5ab20c
[deepspeed] fix --load_best_model_at_end ( #14652 )
* [deepspeed] fix load_best_model_at_end
* try with pull_request_target
* revert: try with pull_request_target
* style
* add test
* cleanup
2021-12-06 21:57:47 -08:00
Stas Bekman
956a483173
[deepspeed] zero inference ( #14253 )
* [deepspeed] zero inference
* only z3 makes sense for inference
* fix and style
* docs
* rework
* fix test
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* responding to suggestions
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-11-23 14:09:15 -08:00
Stas Bekman
1c76a51615
solve the port conflict ( #14362 )
2021-11-10 19:11:45 -08:00
Jeff Rasley
d0e96c6de6
[deepspeed] Enable multiple test runs on single box, defer to DS_TEST_PORT if set ( #14331 )
* defer to DS_TEST_PORT if set
* style
Co-authored-by: Stas Bekman <stas@stason.org>
2021-11-08 12:40:29 -08:00
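The pattern this commit introduces, which the port-conflict fix above builds on, is to let an environment variable override the default rendezvous port so concurrent test runs on one box don't collide. A rough sketch (the default port value here is illustrative):

```python
import os
import socket

DEFAULT_MASTER_PORT = 10999  # illustrative default, not the actual value


def get_master_port():
    """Defer to DS_TEST_PORT if set, so each concurrent test run on a
    single box can be pointed at a distinct distributed-launcher port."""
    return int(os.environ.get("DS_TEST_PORT", DEFAULT_MASTER_PORT))


def find_free_port():
    """Fallback: ask the OS for any free port (racy, but fine for tests)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

A second runner on the same machine just exports `DS_TEST_PORT=29501` (or similar) before launching its tests.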
Olatunji Ruwase
42f359d015
Use DS callable API to allow hf_scheduler + ds_optimizer ( #13216 )
* Use DS callable API to allow hf_scheduler + ds_optimizer
* Preserve backward-compatibility
* Restore backward compatibility
* Tweak arg positioning
* Tweak arg positioning
* bump the required version
* Undo indent
* Update src/transformers/trainer.py
* style
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-08-30 10:01:06 -07:00
Stas Bekman
98364ea74f
[tests] fix logging_steps requirements ( #12860 )
2021-07-23 08:05:48 -07:00
Stas Bekman
5dd0c956a8
non-native optimizers are mostly ok with zero-offload ( #12690 )
2021-07-13 20:18:51 -07:00
Stas Bekman
78f5fe1416
[Deepspeed] adapt multiple models, add zero_to_fp32 tests ( #12477 )
* zero_to_fp32 tests
* args change
* remove unnecessary work
* use transformers.trainer_utils.get_last_checkpoint
* document the new features
* cleanup
* wip
* fix fsmt
* add bert
* cleanup
* add xlm-roberta
* electra works
* cleanup
* sync
* split off the model zoo tests
* cleanup
* cleanup
* cleanup
* cleanup
* reformat
* cleanup
* casing
* deepspeed>=0.4.3
* adjust distilbert
* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-07-13 12:07:32 -07:00
Stas Bekman
ebe5413589
[trainer] 2 bug fixes and a rename ( #12309 )
* bug fixes and a rename
* add extended DDP test
2021-06-22 11:13:23 -07:00
Stas Bekman
11d86d3de4
[Deepspeed Wav2vec2] integration ( #11638 )
* wip
* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044
* cleanup
* workaround
* working 5/8 modes
* solve fp32 distributed zero3
* style
* sync
* sync
* rework
* deprecation
* cleanup
* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged
* clean up
* add a guide
* more prose
* more prose
* fix
* more prose
* sub_group_size was too big
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor
* bug fix
* make the true check explicit
* new deepspeed release
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-08 12:32:03 -07:00
Stas Bekman
32290d87f6
[Deepspeed] various fixes ( #12058 )
* replace deprecated config
* sub_group_size was too big
* complete deprecation removal
2021-06-08 08:36:15 -07:00
Stas Bekman
2c73b93099
[Deepspeed] Assert on mismatches between ds and hf args ( #12021 )
* wip
* add mismatch validation + test
* renames
* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* renames
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-04 08:58:23 -07:00
Stas Bekman
61c5063491
[deepspeed] add nvme test skip rule ( #11997 )
* add nvme skip rule
* fix
2021-06-02 12:06:37 -07:00
Stas Bekman
640318befa
[deepspeed] Move code and doc into standalone files ( #11984 )
* move code and docs
* style
* moved
* restore
2021-06-02 09:56:00 -07:00