Commit Graph

8 Commits

Author SHA1 Message Date
Philipp Schmid
f689743e74
SageMaker: Fix sagemaker DDP & metric logs (#13181)
* Barrier -> barrier

* added logger for metrics

* removed stream handler in trainer

* moved handler

* removed streamhandler from trainer

* updated test image and instance type added datasets version to test

* Update tests/sagemaker/scripts/pytorch/requirements.txt

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-08-23 10:18:07 +02:00
Bhadresh Savani
04dbea31a9
[Examples] Added context manager to datasets map (#12367)
* added cotext manager to datasets map

* fixed style and spaces

* fixed warning of deprecation

* changed desc
2021-06-28 09:14:00 -07:00
Stas Bekman
4a872caef4
remove extra white space from log format (#12360) 2021-06-25 13:20:14 -07:00
Sylvain Gugger
f086652b16
Add option to log only once in multinode training (#11819)
* Add option to long only once in multinode training

* Use an alternate property
2021-05-25 08:03:43 -04:00
Stas Bekman
c9035e4537
fix: The 'warn' method is deprecated (#11105)
* The 'warn' method is deprecated

* fix test
2021-04-07 09:20:06 -04:00
Sylvain Gugger
cd56f3fe7e
Merge trainers (#10975)
* Replace is_sagemaker_distributed_available

* Merge SageMakerTrainer into Trainer

* Test with shorter condition

* Put back deleted line

* Deprecate SageMakerTrainer and SageMakerTrainingArguments

* Apply suggestions from code review

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
2021-03-31 10:01:30 -04:00
Sylvain Gugger
acc3bd9d2a
Enforce string-formatting with f-strings (#10980)
* First third

* Styling and fix mistake

* Quality

* All the rest

* Treat %s and %d

* typo

* Missing )

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-31 10:00:27 -04:00
Philipp Schmid
604c085087
Sagemaker test (#10925)
* init

* first working test

* added todo for setup.py

* working test for single node multi node ddp and smd

* added tensorflow single node test

* added directory for pytorch and tensorflow due to different requirements.txt

* added directory for pytorch and tensorflow

* added comment for run_glue until it is available

* added output_dir to it

* smaller dataset to make test running faster

* adjust HP and script

* adjusted parameter for tensorflow

* refactored test scripts

* adjusted make file

* init

* first working test

* added todo for setup.py

* working test for single node multi node ddp and smd

* added tensorflow single node test

* added directory for pytorch and tensorflow due to different requirements.txt

* added directory for pytorch and tensorflow

* added comment for run_glue until it is available

* added output_dir to it

* smaller dataset to make test running faster

* adjust HP and script

* adjusted parameter for tensorflow

* refactored test scripts

* adjusted make file

* updated dlc container

* commented in all tests

* added both ecr images

* added new master branches

* debug

* added new datasets version

* init

* strange rebase bug

* removed changes

* changed min version for tests to work

* updated DLC

* added model parallel test

* removed test files

* removed test files

* tested with ned dlc

* added correct sagemaker sdk version

* adjust DLCs for official one

* reworked tests

* quality

* removed default profile added documentation to it

* added step in release for sagemaker tests

* reverted version for example script removed duplicated script and added install from master to requirements.txt

* removed mistaken .DS_Stores from mac

* fixed tests

* added Sylvains feedback

* make style

* added lysandre's feedback
2021-03-30 08:28:02 +02:00