Stas Bekman
98364ea74f
[tests] fix logging_steps requirements ( #12860 )
2021-07-23 08:05:48 -07:00
Stas Bekman
5dd0c956a8
non-native optimizers are mostly ok with zero-offload ( #12690 )
2021-07-13 20:18:51 -07:00
Stas Bekman
78f5fe1416
[Deepspeed] adapt multiple models, add zero_to_fp32 tests ( #12477 )
...
* zero_to_fp32 tests
* args change
* remove unnecessary work
* use transformers.trainer_utils.get_last_checkpoint
* document the new features
* cleanup
* wip
* fix fsmt
* add bert
* cleanup
* add xlm-roberta
* electra works
* cleanup
* sync
* split off the model zoo tests
* cleanup
* cleanup
* cleanup
* cleanup
* reformat
* cleanup
* casing
* deepspeed>=0.4.3
* adjust distilbert
* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-07-13 12:07:32 -07:00
Stas Bekman
ebe5413589
[trainer] 2 bug fixes and a rename ( #12309 )
...
* bug fixes and a rename
* add extended DDP test
2021-06-22 11:13:23 -07:00
Stas Bekman
11d86d3de4
[Deepspeed Wav2vec2] integration ( #11638 )
...
* wip
* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044
* cleanup
* workaround
* working 5/8 modes
* solve fp32 distributed zero3
* style
* sync
* sync
* rework
* deprecation
* cleanup
* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged
* clean up
* add a guide
* more prose
* more prose
* fix
* more prose
* sub_group_size was too big
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor
* bug fix
* make the true check explicit
* new deepspeed release
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-08 12:32:03 -07:00
Stas Bekman
32290d87f6
[Deepspeed] various fixes ( #12058 )
...
* replace deprecated config
* sub_group_size was too big
* complete deprecation removal
2021-06-08 08:36:15 -07:00
Stas Bekman
2c73b93099
[Deepspeed] Assert on mismatches between ds and hf args ( #12021 )
...
* wip
* add mismatch validation + test
* renames
* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* renames
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-04 08:58:23 -07:00
Stas Bekman
61c5063491
[deepspeed] add nvme test skip rule ( #11997 )
...
* add nvme skip rule
* fix
2021-06-02 12:06:37 -07:00
Stas Bekman
640318befa
[deepspeed] Move code and doc into standalone files ( #11984 )
...
* move code and docs
* style
* moved
* restore
2021-06-02 09:56:00 -07:00
Stas Bekman
7ec596ecda
[DeepSpeed] decouple DeepSpeedConfigHF
from Trainer
( #11966 )
...
* decouple DeepSpeedConfigHF from Trainer
* add LoggingLevel ctx manager; add new test
* cleanup
* add docs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* implemented suggested renames
* formatter workaround
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-01 13:24:52 -07:00
Stas Bekman
a26f4d6208
[Deepspeed] support zero.Init
in from_config
( #11805 )
...
* support zero.Init in from_config
* no need for eval test
2021-05-21 09:07:46 -07:00
Stas Bekman
619200cc42
[cuda ext tests] fixing tests ( #11619 )
...
* fixing tests
* cleanup
2021-05-06 13:35:28 -07:00
Stas Bekman
4e7bf94e72
[DeepSpeed] fp32 support ( #11499 )
...
* prep for deepspeed==0.3.16
* new version
* too soon
* support and test fp32 mode
* troubleshooting doc start
* workaround no longer needed
* add fp32 doc
* style
* cleanup, add tf32 note
* clarify
* release was made
2021-04-30 12:51:48 -07:00
Stas Bekman
bc2571e61c
[Deepspeed] ZeRO-Infinity integration plus config revamp ( #11418 )
...
* adding Z-inf
* revamp config process
* up version requirement
* wip
* massive rewrite
* cleanup
* cleanup
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* consistent json commas
* act on suggestions
* leave this feature for 0.3.16
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-26 10:40:32 -07:00
Bhadresh Savani
1d30ec95c7
[Examples] Fixes inconsistency around eval vs val and predict vs test ( #11380 )
...
* added changes for uniformity
* modified files
* corrected typo
* fixed qa scripts
* fix typos
* fixed predict typo in qa no trainer
* fixed test file
* reverted trainer changes
* reverted trainer changes in custom exmaples
* updated readme
* added changes in deepspeed test
* added changes for predict and eval
2021-04-26 09:24:31 -07:00
Patrick von Platen
32dbb2d954
make style ( #11442 )
2021-04-26 13:50:34 +02:00
Sylvain Gugger
dabeb15292
Examples reorg ( #11350 )
...
* Base move
* Examples reorganization
* Update references
* Put back test data
* Move conftest
* More fixes
* Move test data to test fixtures
* Update path
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments and clean
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-21 11:11:20 -04:00
Stas Bekman
83206ca6a8
[deepspeed] test on one node 2 gpus max ( #11237 )
...
* test on one node 2 gpus max
* fix the other place
* refactor
* fix
* cleanup
* more exact version
2021-04-14 11:06:59 -07:00
Stas Bekman
3d339ee659
[Deepspeed] zero3 tests band aid ( #11235 )
...
* temp band-aid
* style
2021-04-13 17:58:09 -04:00
Stas Bekman
66446909b2
[tests] relocate core integration tests ( #11146 )
...
* relocate core integration tests
* add sys.path context manager
* cleanup
* try
* try2
* fix path
* doc
* style
* add dep
* add 2 more deps
2021-04-08 13:13:17 -07:00