Commit Graph

18 Commits

Author SHA1 Message Date
Yuan Wu
a41b6d9b5c
Fix the issue where the fsdp config does not work (#37549)
* Fix the issue where the fsdp config does not work

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Check the fsdp_config type

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add the accelerate_fsdp_config test

Signed-off-by: yuanwu <yuan.wu@intel.com>

* fix make style error

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add key check

Signed-off-by: yuanwu <yuan.wu@intel.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-28 10:44:51 +02:00
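
The fix above concerns how `fsdp_config` reaches `TrainingArguments`, which accepts either a dict or a path to a JSON file. Below is a minimal sketch of the type-and-key normalization the commits describe; the helper name is hypothetical:

```python
import json

def normalize_fsdp_config(fsdp_config):
    # Hypothetical helper: accept a dict or a path to a JSON file.
    if isinstance(fsdp_config, str):
        with open(fsdp_config, encoding="utf-8") as f:
            fsdp_config = json.load(f)
    # The type check the second commit adds, sketched:
    if not isinstance(fsdp_config, dict):
        raise ValueError(f"fsdp_config must be a dict or a JSON file path, got {type(fsdp_config)}")
    return fsdp_config
```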
cyyever
371c44d0ef
Remove old code for PyTorch, Accelerator and tokenizers (#37234)
* Remove unneeded library version checks

Signed-off-by: cyy <cyyever@outlook.com>

* Remove PyTorch condition

Signed-off-by: cyy <cyyever@outlook.com>

* Remove PyTorch condition

Signed-off-by: cyy <cyyever@outlook.com>

* Fix ROCm get_device_capability

Signed-off-by: cyy <cyyever@outlook.com>

* Revert "Fix ROCm get_device_capability"

This reverts commit 0e756434bd.

* Remove unnecessary check

Signed-off-by: cyy <cyyever@outlook.com>

* Revert changes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-04-10 20:54:21 +02:00
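
For context, the version gates this PR deletes look roughly like the sketch below; once the library's minimum supported versions rise, such checks are always true and become dead code. The helper is modeled on `transformers.utils` naming but written here from scratch:

```python
from packaging import version
import torch

def is_torch_greater_or_equal(min_version: str) -> bool:
    # A gate of this shape is removable once min_version is the install floor.
    return version.parse(torch.__version__) >= version.parse(min_version)
```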
Yih-Dar
e7ad077012
byebye torch 2.0 (#37277)
* bump to Torch 2.1 due to broken `torch.compile` compatibility

* dep table

* remove usage of is_torch_greater_or_equal_than_2_1

* remove usage of is_torch_greater_or_equal_than_2_1

* remove if is_torch_greater_or_equal("2.1.0")

* remove torch >= "2.1.0"

* deal with 2.0.0

* PyTorch 2.0+ --> PyTorch 2.1+

* ruff 1

* difficult ruff

* address comment

* address comment

---------

Co-authored-by: Jirka B <j.borovec+github@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-07 15:19:47 +02:00
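
An illustrative before/after of the guard removal this bump enables; the flag mirrors `is_torch_greater_or_equal_than_2_1` from the commit messages, and the guarded values are made-up stand-ins:

```python
from packaging import version
import torch

is_torch_greater_or_equal_than_2_1 = version.parse(torch.__version__) >= version.parse("2.1.0")

# Before the bump: a branch kept only for torch 2.0.
attn_impl = "sdpa" if is_torch_greater_or_equal_than_2_1 else "eager"
# After the bump: torch >= 2.1 is the install floor, so the guard collapses.
attn_impl = "sdpa"
```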
byi8220
a4e55fcff8
Disable delay_optimizer_creation in Trainer to support fsdp2 (#37147)
* github why you do this

* fix

* make fixup

* disable cpu offload test

* fixup

* tmp reworks

* git branch movement

* make fixup

* add require_fsdp_v2_version

* dep issues

* update ruff and fixup
2025-04-04 20:11:37 +02:00
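
A hedged sketch of why the flag matters: with FSDP2 the optimizer should be built from parameters that have already been sharded, so Trainer must not delay its creation. The attribute names follow accelerate's FSDP plugin but are assumptions here:

```python
def should_delay_optimizer_creation(accelerator, delay_for_other_reasons: bool) -> bool:
    # Sketch: FSDP2 needs the optimizer constructed from sharded params,
    # so optimizer creation is no longer delayed in that case.
    plugin = getattr(accelerator.state, "fsdp_plugin", None)
    is_fsdp2 = plugin is not None and getattr(plugin, "fsdp_version", 1) == 2
    return delay_for_other_reasons and not is_fsdp2
```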
Ilyas Moutawwakil
89f6956015
HPU support (#36424)
* test

* fix

* fix

* skip some and run some first

* test fsdp

* fix

* patches for generate

* test distributed

* copy

* don't test distributed loss for hpu

* require fp16 and run first

* changes from marc's PR fixing zero3

* better alternative

* return True for fp16 support on Gaudi without creating a bridge

* fix

* fix tested dtype in deepspeed inference test

* test

* fix

* test

* fix

* skip

* require fp16

* run fsdp tests first

* Apply suggestions from code review

* address comments

* address comments and refactor test

* reduce precision

* avoid doing Gaudi1-specific stuff in the generation loop

* document test_gradient_accumulation_loss_alignment_with_model_loss test a bit more
2025-03-12 09:08:12 +01:00
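
One recurring theme above is reporting fp16 support on Gaudi without creating a device bridge; a hedged sketch of such a capability check, with the function name and dispatch invented for illustration:

```python
import torch

def accelerator_supports_fp16(device_type: str) -> bool:
    # Sketch: Gaudi (HPU) can report fp16 support statically, without
    # initializing a device bridge; CUDA falls back to a runtime probe.
    if device_type == "hpu":
        return True
    if device_type == "cuda":
        return torch.cuda.is_available()
    return False
```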
Fanli Lin
2fa876d2d8
[tests] make cuda-only tests device-agnostic (#35607)
* initial commit

* remove unrelated files

* further remove

* Update test_trainer.py

* fix style
2025-01-13 14:48:39 +01:00
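
The core of the device-agnostic pattern is replacing hard-coded `"cuda"` with the `torch_device` helper from `transformers.testing_utils`; the wrapper below is illustrative:

```python
import torch
from transformers.testing_utils import torch_device  # "cuda", "xpu", "hpu", ...

def move_to_accelerator(t: torch.Tensor) -> torch.Tensor:
    # Device-agnostic: replaces hard-coded t.to("cuda") in the tests.
    return t.to(torch_device)
```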
Wing Lian
b0c0ba7b4d
FSDP grad accum fix (#34645)
* add gradient accumulation steps tests for fsdp

* invert no_sync context to fix training for fsdp
2024-11-15 22:28:06 +01:00
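
The fix inverts a `no_sync` context: gradients should be synchronized only on accumulation boundaries and the expensive reduction skipped in between. A minimal sketch of the intended pattern, with `model`, `dataloader`, and `optimizer` as placeholders:

```python
import contextlib

def train_with_grad_accum(model, dataloader, optimizer, grad_accum_steps: int):
    for step, batch in enumerate(dataloader):
        is_boundary = (step + 1) % grad_accum_steps == 0
        # no_sync() defers FSDP's gradient reduction; applying it on
        # non-boundary steps (not the other way around) is the fix.
        ctx = contextlib.nullcontext() if is_boundary else model.no_sync()
        with ctx:
            loss = model(**batch).loss / grad_accum_steps
            loss.backward()
        if is_boundary:
            optimizer.step()
            optimizer.zero_grad()
```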
Marc Sun
fd06ad5438
🚨🚨🚨 Update min version of accelerate to 0.26.0 (#32627)
* Update min version of accelerate to 0.26.0

* dev-ci

* update min version in import

* remove useless check

* dev-ci

* style

* dev-ci

* dev-ci
2024-08-20 11:42:36 +02:00
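
A hedged sketch of the import-time gate such a bump implies; the message text is illustrative:

```python
from packaging import version
import accelerate

if version.parse(accelerate.__version__) < version.parse("0.26.0"):
    raise ImportError("Trainer requires accelerate>=0.26.0; run `pip install -U accelerate`.")
```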
Younes Belkada
3f93fd0694
Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP (#31161)
* fix llama fsdp

* fixup

* adding FSDP tests for CPU offloading

* fixes

* fix tests

* fix tests

* add it for mixtral

* propagate the changes on other models

* Update src/transformers/models/phi/modeling_phi.py

* Delete utils/testing_scripts/fsdp_cpu_offloading.py

Remove script - FSDP + CPU offloading is tested in the test suite

* Delete utils/testing_scripts/dummy_fsdp_config.yml

* Update + add cache_positions docstring

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-26 14:50:08 +01:00
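
The CPU-offloading setup the new tests exercise can be requested through `TrainingArguments`; the values below are illustrative, not the tests' actual configuration:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard offload auto_wrap",  # FSDP with CPU offloading
    fsdp_config={"transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"]},
)
```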
Howard Liberty
f16caf44bb
Add FSDP config for CPU RAM efficient loading through accelerate (#30002)
* Add FSDP config for CPU RAM efficient loading

* Style fix

* Update src/transformers/training_args.py

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* Update src/transformers/training_args.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add sync_module_states and cpu_ram_efficient_loading validation logic

* Update src/transformers/training_args.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Style

---------

Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-22 13:15:28 +01:00
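
A sketch of the validation logic the commits mention: with `cpu_ram_efficient_loading`, only rank 0 holds real weights at load time, so `sync_module_states` must broadcast them to the other ranks. The error text is assumed:

```python
fsdp_config = {"cpu_ram_efficient_loading": True, "sync_module_states": True}

if fsdp_config.get("cpu_ram_efficient_loading") and not fsdp_config.get("sync_module_states"):
    raise ValueError(
        "cpu_ram_efficient_loading requires sync_module_states=True, "
        "otherwise non-zero ranks would keep empty weights."
    )
```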
Zach Mueller
60d5f8f9f0
🚨🚨🚨Deprecate evaluation_strategy to eval_strategy🚨🚨🚨 (#30190)
* Alias

* Note alias

* Tests and src

* Rest

* Clean

* Change typing?

* Fix tests

* Deprecation versions
2024-04-18 12:49:43 -04:00
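
Usage after the rename; the old keyword remains as a deprecated alias that emits a warning:

```python
from transformers import TrainingArguments

# Deprecated spelling (still accepted, warns):
#   TrainingArguments(output_dir="out", evaluation_strategy="steps")
args = TrainingArguments(output_dir="out", eval_strategy="steps", eval_steps=500)
```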
Sourab Mangrulkar
350c5d1566
Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA (#29587)
* fsdp+qlora related changes

* fixes

* Update quantization_config.py

* support fsdp+qlora and dsz3+qlora

* Update quantization_config.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* handle fsdp+qlora and dsz3+qlora correctly while model loading

* fix param count

* quality

* fsdp related changes

* fsdp changes only when using LoRA/QLoRA

* add accelerate version check

* refactor, update min accelerate version and add tests

1. Update minimum accelerate version to 0.26.0
2. Clean the trainer wrt accelerate version checks
3. FSDP refactor and test for fsdp config
4. use `itemsize` instead of `dtype2bytes` dict

* fix test

* Address comments

Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* fix the conditional flag

* fix conditional flag

* address comments

Co-Authored-By: Zach Mueller <7831895+muellerzr@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Zach Mueller <7831895+muellerzr@users.noreply.github.com>
2024-03-13 22:03:02 +05:30
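
The loading path this PR enables pairs 4-bit quantization with FSDP or ZeRO3 sharding; `bnb_4bit_quant_storage` is the knob that gives the packed weights a uniform dtype the sharding machinery can split. Model id and dtypes below are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # uniform storage dtype for sharding
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb
)
```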
Lysandre Debut
f497f564bb
Update all references to canonical models (#29001)
* Script & Manual edition

* Update
2024-02-16 08:16:58 +01:00
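
An example of what the edition changed; the unprefixed id still resolves, but references now use the canonical org-prefixed form:

```python
from transformers import AutoTokenizer

# before: AutoTokenizer.from_pretrained("bert-base-uncased")
tok = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
```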
Sourab Mangrulkar
238d2e3c44
fix resuming from ckpt when using FSDP with FULL_STATE_DICT (#27891)
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT

* update tests

* fix tests
2023-12-16 19:41:43 +05:30
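
The configuration whose resume path is fixed here, sketched with illustrative paths; the `state_dict_type` key naming is an assumption:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",
    fsdp_config={"state_dict_type": "FULL_STATE_DICT"},  # consolidated checkpoints
)
# Resuming is the path this PR repairs:
# trainer.train(resume_from_checkpoint="out/checkpoint-500")
```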
Hz, Ji
82c7e87987
device agnostic fsdp testing (#27120)
* make fsdp test cases device agnostic

* make style
2023-11-01 07:17:06 +01:00
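
The same device-agnostic idea applied to device counting; `backend_device_count` lives in `transformers.testing_utils`, though this exact usage is a sketch:

```python
from transformers.testing_utils import backend_device_count, torch_device

# Replaces torch.cuda.device_count() so the fsdp tests run on any accelerator.
world_size = backend_device_count(torch_device)
```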
Yih-Dar
3e93dd295b
Skip TrainerIntegrationFSDP::test_basic_run_with_cpu_offload if torch < 2.1 (#26764)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-10-12 18:22:09 +02:00
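
A hedged sketch of the version-gated skip, with the test body elided:

```python
import unittest
from packaging import version
import torch

class TrainerIntegrationFSDP(unittest.TestCase):
    @unittest.skipIf(
        version.parse(torch.__version__) < version.parse("2.1.0"),
        "FSDP CPU offload requires torch >= 2.1",
    )
    def test_basic_run_with_cpu_offload(self):
        ...  # body elided; the version gate is the point
```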
Sourab Mangrulkar
86ffd5ffa2
fix name error when accelerate is not available (#26278)
* fix name error when accelerate is not available

* fix `is_fsdp_available`
2023-09-20 08:02:55 +02:00
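
The failure mode is a NameError when module-level names are referenced without accelerate installed; a sketch of the guarded-import pattern, using `is_accelerate_available` from `transformers.utils`:

```python
from transformers.utils import is_accelerate_available

if is_accelerate_available():
    from accelerate import FullyShardedDataParallelPlugin
else:
    FullyShardedDataParallelPlugin = None  # referenced names must still exist
```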
Sourab Mangrulkar
382ba670ed
FSDP tests and checkpointing fixes (#26180)
* add fsdp tests

* Update test_fsdp.py

* Update test_fsdp.py

* fixes

* checks

* Update trainer.py

* fix

* fixes for saving/resuming checkpoints

* fixes

* add tests and delete debug statements

* fixing tests

* Update test_fsdp.py

* fix tests

* fix tests

* minor nits

* fix code style and quality

* refactor and modularize test code

* reduce the time of tests

* reduce the test time

* fix test

* reduce test time

* reduce test time

* fix failing tests

* fix

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* resolve comments

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-09-20 10:26:16 +05:30
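
The save/resume round-trip these fixes and tests cover, sketched with placeholder model and dataset:

```python
from transformers import Trainer, TrainingArguments

def run_and_resume(model, train_dataset):
    # Checkpoint mid-run, then resume from the latest checkpoint in output_dir.
    args = TrainingArguments(output_dir="out", max_steps=100, save_steps=50)
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()
    trainer.train(resume_from_checkpoint=True)  # the path these fixes cover
```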