Howard Liberty | f16caf44bb | 2024-04-22 13:15:28 +01:00
Add FSDP config for CPU RAM efficient loading through accelerate (#30002)
* Add FSDP config for CPU RAM efficient loading
* Style fix
* Update src/transformers/training_args.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add sync_module_states and cpu_ram_efficient_loading validation logic
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Style
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
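The validation logic mentioned in the commit above (`sync_module_states` plus `cpu_ram_efficient_loading`) can be sketched in plain Python. This is a hedged approximation, not the actual `training_args.py` code; the function name, defaults, and error message are illustrative:

```python
def validate_fsdp_config(fsdp_config: dict) -> dict:
    """Approximate sketch of the validation added in this commit.

    With cpu_ram_efficient_loading, only rank 0 loads the real weights
    (the remaining ranks start with empty weights), so sync_module_states
    must be enabled to broadcast rank 0's weights to the other ranks.
    """
    cpu_ram = fsdp_config.get("cpu_ram_efficient_loading", False)
    sync = fsdp_config.get("sync_module_states", False)
    if cpu_ram and not sync:
        raise ValueError(
            "When `cpu_ram_efficient_loading` is enabled, "
            "`sync_module_states` must also be enabled."
        )
    return fsdp_config
```

Passing both flags together validates cleanly; enabling only the RAM-efficient flag raises.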
Zach Mueller | 60d5f8f9f0 | 2024-04-18 12:49:43 -04:00
🚨🚨🚨Deprecate evaluation_strategy to eval_strategy 🚨🚨🚨 (#30190)
* Alias
* Note alias
* Tests and src
* Rest
* Clean
* Change typing?
* Fix tests
* Deprecation versions
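The "Alias" step in the commit above follows a standard deprecation pattern: the old keyword keeps working but warns and is folded into the new name. A minimal stand-in sketch (not the actual `TrainingArguments` code; the function is hypothetical):

```python
import warnings


def resolve_eval_strategy(eval_strategy="no", evaluation_strategy=None):
    # Sketch of the alias: the deprecated keyword still works, but emits a
    # FutureWarning and its value is copied into the new parameter name.
    if evaluation_strategy is not None:
        warnings.warn(
            "`evaluation_strategy` is deprecated and will be removed; "
            "use `eval_strategy` instead.",
            FutureWarning,
        )
        eval_strategy = evaluation_strategy
    return eval_strategy
```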
Sourab Mangrulkar | 350c5d1566 | 2024-03-13 22:03:02 +05:30
Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA (#29587)
* fsdp+qlora related changes
* fixes
* Update quantization_config.py
* support fsdp+qlora and dsz3+qlora
* Update quantization_config.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* handle fsdp+qlora and dsz3+qlora correctly while model loading
* fix param count
* quality
* fsdp related changes
* fsdp changes only when using LoRA/QLoRA
* add accelerate version check
* refactor, update min accelerate version and add tests
1. Update minimum accelerate version to 0.26.0
2. Clean the trainer wrt accelerate version checks
3. FSDP refactor and test for fsdp config
4. use `itemsize` instead of `dtype2bytes` dict
* fix test
* Address comments
Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* fix the conditional flag
* fix conditional flag
* address comments
Co-Authored-By: Zach Mueller <7831895+muellerzr@users.noreply.github.com>
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Zach Mueller <7831895+muellerzr@users.noreply.github.com>
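The key enabler for FSDP+QLoRA and ZeRO3+QLoRA is storing the 4-bit quantized weights in a dtype the sharding framework can flatten uniformly alongside the unquantized parameters. A hedged sketch of the relevant quantization-config fields as a plain dict (key names follow the `BitsAndBytesConfig` API; values are illustrative):

```python
# Illustrative dict mirroring the BitsAndBytesConfig fields involved.
# For FSDP/ZeRO-3 to shard the quantized weights, the storage dtype should
# match the compute/mixed-precision dtype so flat parameters are uniform.
quantization_config = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_storage": "bfloat16",  # the storage-dtype knob this PR adds
}
```

Item 4 in the refactor list refers to replacing a hand-maintained dtype-to-bytes lookup dict with the dtype's own `itemsize` attribute when counting parameter bytes.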
Lysandre Debut | f497f564bb | 2024-02-16 08:16:58 +01:00
Update all references to canonical models (#29001)
* Script & Manual edition
* Update
Sourab Mangrulkar | 238d2e3c44 | 2023-12-16 19:41:43 +05:30
fix resuming from ckpt when using FSDP with FULL_STATE_DICT (#27891)
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT
* update tests
* fix tests
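For context on the fix above: `FULL_STATE_DICT` is one of three FSDP state-dict types, selected through the trainer's `fsdp_config`; with it, a single consolidated checkpoint is gathered (rather than per-rank shards), so the resume path must look for that consolidated file. A small sketch with illustrative values:

```python
# The three FSDP state-dict types (names as in torch's FSDP API).
STATE_DICT_TYPES = ("FULL_STATE_DICT", "LOCAL_STATE_DICT", "SHARDED_STATE_DICT")

# Illustrative fsdp_config fragment selecting the consolidated format that
# this commit fixes resuming from.
fsdp_config = {
    "state_dict_type": "FULL_STATE_DICT",
}
```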
Hz, Ji | 82c7e87987 | 2023-11-01 07:17:06 +01:00
device agnostic fsdp testing (#27120)
* make fsdp test cases device agnostic
* make style
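"Device agnostic" here means the test cases resolve their target device at runtime instead of hard-coding `cuda`. A minimal sketch of the pattern; `TRANSFORMERS_TEST_DEVICE` is the environment variable the test suite honors, while the function name and fallback are illustrative:

```python
import os


def resolve_test_device(default: str = "cuda") -> str:
    # Tests ask for a device name rather than assuming CUDA, so the same
    # FSDP cases can run on e.g. "cpu", "cuda", or other accelerator backends.
    return os.environ.get("TRANSFORMERS_TEST_DEVICE", default)
```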
Yih-Dar | 3e93dd295b | 2023-10-12 18:22:09 +02:00
Skip TrainerIntegrationFSDP::test_basic_run_with_cpu_offload if torch < 2.1 (#26764)
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Sourab Mangrulkar | 86ffd5ffa2 | 2023-09-20 08:02:55 +02:00
fix name error when accelerate is not available (#26278)
* fix name error when accelerate is not available
* fix `is_fsdp_available`
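The class of bug fixed above is a `NameError` at import time when an optional dependency is missing: a symbol only defined inside a guarded import gets referenced unconditionally. The usual guard pattern, sketched generically (the helper name here is illustrative, not the transformers implementation):

```python
import importlib.util


def is_package_available(name: str) -> bool:
    # Returning False instead of letting ImportError/NameError escape keeps
    # the module importable when the optional dependency is absent.
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False
```

Code paths that need the optional package then branch on this check rather than referencing names that may never have been defined.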
Sourab Mangrulkar | 382ba670ed | 2023-09-20 10:26:16 +05:30
FSDP tests and checkpointing fixes (#26180)
* add fsdp tests
* Update test_fsdp.py
* Update test_fsdp.py
* fixes
* checks
* Update trainer.py
* fix
* fixes for saving/resuming checkpoints
* fixes
* add tests and delete debug statements
* fixing tests
* Update test_fsdp.py
* fix tests
* fix tests
* minor nits
* fix code style and quality
* refactor and modularize test code
* reduce the time of tests
* reduce the test time
* fix test
* reduce test time
* reduce test time
* fix failing tests
* fix
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* resolve comments
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
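The "refactor and modularize test code" step above is typically done by generating the FSDP test grid programmatically instead of hand-writing each case. A hedged sketch under the assumption that the suite parameterizes over sharding strategy and state-dict type (option names follow the trainer's FSDP flags; the exact dimensions `test_fsdp.py` covers may differ):

```python
import itertools

# Illustrative test-matrix dimensions.
SHARDING_STRATEGIES = ["full_shard", "shard_grad_op"]
STATE_DICT_TYPES = ["FULL_STATE_DICT", "SHARDED_STATE_DICT"]


def fsdp_test_cases():
    # Yield one config dict per combination, so adding a dimension extends
    # every case at once instead of requiring new hand-written tests.
    for strategy, sd_type in itertools.product(SHARDING_STRATEGIES, STATE_DICT_TYPES):
        yield {"fsdp": strategy, "fsdp_config": {"state_dict_type": sd_type}}
```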