Sourab Mangrulkar
|
238d2e3c44
|
fix resuming from ckpt when using FSDP with FULL_STATE_DICT (#27891)
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT
* update tests
* fix tests
|
2023-12-16 19:41:43 +05:30 |
|
Hz, Ji
|
82c7e87987
|
device agnostic fsdp testing (#27120)
* make fsdp test cases device agnostic
* make style
|
2023-11-01 07:17:06 +01:00 |
|
Yih-Dar
|
3e93dd295b
|
Skip TrainerIntegrationFSDP::test_basic_run_with_cpu_offload if torch < 2.1 (#26764)
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
|
2023-10-12 18:22:09 +02:00 |
|
Sourab Mangrulkar
|
86ffd5ffa2
|
fix name error when accelerate is not available (#26278)
* fix name error when accelerate is not available
* fix `is_fsdp_available`
|
2023-09-20 08:02:55 +02:00 |
|
Sourab Mangrulkar
|
382ba670ed
|
FSDP tests and checkpointing fixes (#26180)
* add fsdp tests
* Update test_fsdp.py
* Update test_fsdp.py
* fixes
* checks
* Update trainer.py
* fix
* fixes for saving/resuming checkpoints
* fixes
* add tests and delete debug statements
* fixing tests
* Update test_fsdp.py
* fix tests
* fix tests
* minor nits
* fix code style and quality
* refactor and modularize test code
* reduce the time of tests
* reduce the test time
* fix test
* reduce test time
* reduce test time
* fix failing tests
* fix
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* resolve comments
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
|
2023-09-20 10:26:16 +05:30 |
|