Commit Graph

158 Commits

Author SHA1 Message Date
Jonathon Belotti
4c5ed1d0c9
fix: non-atomic checkpoint save (#27820) 2023-12-08 14:08:54 +01:00
Charbel Abi Daher
2ca73e5ee3
Fixed passing scheduler-specific kwargs via TrainingArguments lr_scheduler_kwargs (#27595)
* Fix passing scheduler-specific kwargs through TrainingArguments `lr_scheduler_kwargs`

* Added test for lr_scheduler_kwargs
2023-11-28 08:33:45 +01:00
Dave Berenbaum
8eb9e29d8d
dvclive callback: warn instead of fail when logging non-scalars (#27608)
* dvclive callback: warn instead of fail when logging non-scalars

* tests: log lr as scalar
2023-11-21 09:29:51 +01:00
Arthur
651408a077
[Styling] stylify using ruff (#27144)
* try to stylify using ruff

* might need to remove these changes?

* use ruf format andruff check

* use isinstance instead of type comparision

* use # fmt: skip

* use # fmt: skip

* nits

* soem styling changes

* update ci job

* nits isinstance

* more files update

* nits

* more nits

* small nits

* check and format

* revert wrong changes

* actually use formatter instead of checker

* nits

* well docbuilder is overwriting this commit

* revert notebook changes

* try to nuke docbuilder

* style

* fix feature exrtaction test

* remve `indent-width = 4`

* fixup

* more nits

* update the ruff version that we use

* style

* nuke docbuilder styling

* leve the print for detected changes

* nits

* Remove file I/O

Co-authored-by: charliermarsh
 <charlie.r.marsh@gmail.com>

* style

* nits

* revert notebook changes

* Add # fmt skip when possible

* Add # fmt skip when possible

* Fix

* More `  # fmt: skip` usage

* More `  # fmt: skip` usage

* More `  # fmt: skip` usage

* NIts

* more fixes

* fix tapas

* Another way to skip

* Recommended way

* Fix two more fiels

* Remove asynch
Remove asynch

---------

Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
2023-11-16 17:43:19 +01:00
Hz, Ji
1ffc4dee5b
enable memory tracker metrics for npu (#27280) 2023-11-06 13:44:21 +00:00
Lysandre Debut
113ebf80ac
Safetensors serialization by default (#27064)
* Safetensors serialization by default

* First pass on the tests

* Second pass on the tests

* Third pass on the tests

* Fix TF weight loading from TF-format safetensors

* Specific encoder-decoder fixes for weight crossloading

* Add VisionEncoderDecoder fixes for TF too

* Change filename test for pt-to-tf

* One missing fix for TFVisionEncoderDecoder

* Fix the other crossload test

* Support for flax + updated tests

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Sanchit's comments

* Sanchit's comments 2

* Nico's comments

* Fix tests

* cleanup

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-10-31 19:16:49 +01:00
Younes Belkada
309a90664f
[FEAT] Add Neftune into transformers Trainer (#27141)
* add v1 neftune

* use `unwrap_model` instead

* add test + docs

* Apply suggestions from code review

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* more details

* fixup

* Update docs/source/en/main_classes/trainer.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* refactor a bit

* more elaborated test

* fix unwrap issue

---------

Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-10-31 16:03:59 +01:00
Hz, Ji
5bbf671276
Device agnostic trainer testing (#27131) 2023-10-30 18:16:40 +00:00
Younes Belkada
5fbed2d7ca
[Trainer / GC] Add gradient_checkpointing_kwargs in trainer and training arguments (#27068)
* add `gradient_checkpointing_kwargs` in trainer and training arguments

* add comment

* add test - currently failing

* now tests pass
2023-10-30 12:41:48 +01:00
Zach Mueller
34a640642b
Save TB logs as part of push_to_hub (#27022)
* Support runs/

* Upload runs folder as part of push to hub

* Add a test

* Add to test deps

* Update with proposed solution from Slack

* Ensure that repo gets deleted in tests
2023-10-26 12:13:19 -04:00
Wang, Yi
8f609ab9e0
enable optuna multi-objectives feature (#25969)
* enable optuna multi-objectives feature

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update hpo doc

* update docstring

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* extend direction to List[str] type

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Update src/transformers/integrations/integration_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-09-12 18:01:22 +01:00
Zach Mueller
be0e189bd3
Revert frozen training arguments (#25903)
* Revert frozen training arguments

* TODO
2023-09-01 11:24:12 -04:00
Zach Mueller
ca51499248
Make training args fully immutable (#25435)
* Make training args fully immutable

* Working tests, PyTorch

* In test_trainer

* during testing

* Use proper dataclass way

* Fix test

* Another one

* Fix tf

* Lingering slow

* Exception

* Clean
2023-08-15 11:47:47 -04:00
Sylvain Gugger
baf1daa58e
Migrate Trainer from Repository to upload_folder (#25095)
* First draft

* Deal with progress bars

* Update src/transformers/utils/hub.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Address review comments

* Forgot one

* Pin hf_hub

* Add argument for push all and fix tests

* Fix tests

* Address review comments

---------

Co-authored-by: Lucain <lucainp@gmail.com>
2023-08-07 17:47:22 +02:00
Zach Mueller
0284285501
Fix pad across processes dim in trainer and not being able to set the timeout (#24775)
* dim, and rm copy

* Don't rm copy for now

* Oops

* pad index

* Should be a working test

* Tickle down ddp timeout

* Put fix back in now that testing locally is done

* Better comment specifying timeout

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-12 10:01:51 -04:00
Alex Hall
b6295b26c5
Refactor hyperparameter search backends (#24384)
* Refactor hyperparameter search backends

* Simpler refactoring without abstract base class

* black

* review comments:
specify name in class
use methods instead of callable class attributes
name constant better

* review comments: safer bool checking, log multiple available backends

* test ALL_HYPERPARAMETER_SEARCH_BACKENDS vs HPSearchBackend in unit test, not module. format with black.

* copyright
2023-06-22 14:28:25 -04:00
Zach Mueller
ebd94b0f6f
🚨🚨🚨 Replace DataLoader logic for Accelerate in Trainer, remove unneeded tests 🚨🚨🚨 (#24028)
* Working integration

* Fix failing test

* Revert label host logic

* Bring it back!
2023-06-12 11:23:37 -04:00
Tim Dettmers
796162c512
Paged Optimizer + Lion Optimizer for Trainer (#23217)
* Added lion and paged optimizers and made original tests pass.

* Added tests for paged and lion optimizers.

* Added and fixed optimizer tests.

* Style and quality checks.

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
2023-05-24 12:53:28 +02:00
Maxime Méloux
9b435204b1
Add Trainer support for ReduceLROnPlateau (#23010)
* Add Trainer support for ReduceLROnPlateau

Fixes #16503

* Remove training argument and add default instance

---------

Co-authored-by: mmeloux <maxime.meloux@loria.fr>
2023-04-28 09:17:30 -04:00
Viktor Scherbakov
871598be55
Implemented safetensors checkpoints save/load for Trainer (#22498)
* implemented safetensors save/load

* remove duplicated file

* added tests

* more tests

* style fix

* fix tf tests

* change to list comprehension

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* review fixes + safe load for sharded checkpoint

* style fix

* remove rogue import

* remove partial to avoid undefined exception

* use naming alias instead of safetensors.torch

* fix safe sharding in tests

* grammar

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* update docs

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* update docs

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* minor corrections

* style

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-04-04 09:05:04 -04:00
Yih-Dar
5110e5748e
🔥py38 + torch 2 🔥🔥🔥🚀 (#22204)
* py38 + torch 2

* increment cache versions

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 22:59:23 +01:00
Lucain
923110b74f
Remove set_access_token usage + fail tests if FutureWarning (#22051)
* Remove set_access_token usage + fail tests if FutureWarning

* do not fail on FutureWarning in CI

---------

Co-authored-by: testbot <lucainp@hf.co>
2023-03-09 09:23:48 -05:00
Sylvain Gugger
b29e2dcaff
Fix flaky test for log level (#21776)
* Fix flaky test for log level

* Fix other flaky test
2023-02-28 16:24:14 -05:00
ydshieh
aa3787c8f0 Skip test_log_level for now 2023-02-23 12:11:20 +01:00
Sylvain Gugger
b19d64d852
Respect documentation on passive log level (#21700)
* Respect documentation on passive log level

* Fix test and set log level in examples

* Add doc
2023-02-22 09:39:18 +01:00
Aaron Gokaslan
5e8c8eb5ba
Apply ruff flake8-comprehensions (#21694) 2023-02-22 09:14:54 +01:00
Sylvain Gugger
cc8407522a
Fix epoch number when resuming training (#21478) 2023-02-06 19:34:34 -05:00
Sylvain Gugger
6f79d26442
Update quality tooling for formatting (#21480)
* Result of black 23.1

* Update target to Python 3.7

* Switch flake8 to ruff

* Configure isort

* Configure isort

* Apply isort with line limit

* Put the right black version

* adapt black in check copies

* Fix copies
2023-02-06 18:10:56 -05:00
Sylvain Gugger
05e72aa0c4
Adapt repository creation to latest hf_hub (#21158)
* Adapt repository creation to latest hf_hub

* Update all examples

* Fix other tests, add Flax examples

* Address review comments
2023-01-18 11:14:00 -05:00
Yih-Dar
b3a0aad37d
Fix past CI (#20967)
* Fix for Past CI

* make style

* clean up

* unindent 2 blocks

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-01-12 18:04:21 +01:00
Thomas-MMJ
7ef3f19c3c
fix typo output not ouput in bitsandbytes trainer test (#20839)
fix typo output not ouput

typo was causing an error on pytest collection
2022-12-20 03:16:26 -05:00
Sylvain Gugger
08b4621899
Repurpose torchdynamo training args towards torch._dynamo (#20498)
* Repurpose torchdynamo training args towards torch._dynamo

* Add doc
2022-11-30 11:10:45 -05:00
Stas Bekman
a547d5bda5
[AnyPrecisionAdamW] test fix (#20454) 2022-11-25 09:02:10 -08:00
atturaioe
84c9cc6d15
Add AnyPrecisionAdamW optimizer (#18961)
* Add AnyPrecisionAdamW optimizer

* Add optim_args argument to TrainingArgs

* Add tests for AnyPrecisionOptimizer

* Change AnyPrecisionAdam default params to float32

* Move default_anyprecision_kwargs in trainer test

* Rename AnyPrecisionAdamW
2022-11-18 09:27:08 -05:00
Yih-Dar
16242e1bf0
Run torchdynamo tests (#19056)
* Enable torchdynamo tests

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-15 11:10:16 -07:00
Younes Belkada
1ccd2515ed
small change (#18584) 2022-08-12 20:04:38 +02:00
Wei
7ea6ccc2b3
Enable torchdynamo with torch_tensorrt(fx path) (#17765)
* enable fx2trt

* Update perf_train_gpu_one.mdx

* Update perf_train_gpu_one.mdx

* add lib check

* update

* format

* update

* fix import check

* fix isort

* improve doc

* refactor ctx manager

* fix isort

* black format

* isort fix

* fix format

* update args

* update black

* cleanups

* Update perf_train_gpu_one.mdx

* code refactor

* code refactor to init

* remove redundancy

* isort

* replace self.args with args

Co-authored-by: Stas Bekman <stas@stason.org>
2022-07-13 12:43:28 -04:00
jianan-gu
b7d8bd378c
Enhance IPEX integration in Trainer (#18072)
* enhance ipex import

* refine codes

* refine style

* add link

* style

Co-authored-by: Stas Bekman <stas@stason.org>
2022-07-11 21:34:09 -07:00
Yih-Dar
664688b94f
higher atol to avoid flaky trainer test failure (#17979)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-01 17:53:16 +02:00
Yih-Dar
fe14046421
skip some ipex tests until it works with torch 1.12 (#17964)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-30 18:05:29 +02:00
Yih-Dar
f717d47fe0
Fix test_number_of_steps_in_training_with_ipex (#17889)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-28 08:55:02 +02:00
Lysandre Debut
6a5272b205
Prepare transformers for v0.8.0 huggingface-hub release (#17716)
* Prepare CI for v0.8.0

* pin hfh (revert before merge)

* Revert "pin hfh (revert before merge)"

This reverts commit a0103140e1.

* Test rc3

* Test latest rc

* Unpin to the RC

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-06-21 11:51:18 -04:00
Stas Bekman
a2d34b7c04
deprecate is_torch_bf16_available (#17738)
* deprecate is_torch_bf16_available

* address suggestions
2022-06-20 08:40:11 -04:00
jianan-gu
3b29c9fdb7
Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference (#17153)
* add jit mode option and model wrap

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refine code

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add ut and refine code

* code refine

* refine code

* add inference doc

* Update src/transformers/trainer.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* add cpu inference performance doc

* Update perf_infer_cpu.mdx

* Update perf_infer_cpu.mdx

* Update performance.mdx

* Update _toctree.yml

* refine jit func naming

* Update _toctree.yml

* Delete perf_infer_gpu_one.mdx

* Update perf_infer_cpu.mdx

* Update docs/source/en/perf_infer_cpu.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* add none check before jit

* Update docs/source/en/perf_infer_cpu.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/perf_infer_cpu.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-06-14 07:56:47 -04:00
jianan-gu
34097b3304
Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch (#17138)
* init PR

* fix import ipex

* minor fix on bf16

* refine optimizer

* refine args notes

* refine code

* refine ipex optimize args

* refine half_precision_backend

* black format

* isort format

* isort format files

* flake8 format

* doc builder format

* refine codes

* remove jit and optim bits

* black preview format

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refine code

* refine notes

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* code refine

* add ipex ut

* add performance cpu doc

* link to the cpu doc from main perf doc

* install ipex into CI's docker

* Update perf_train_cpu.mdx

* Update docs/source/en/perf_train_cpu.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update perf_train_cpu.mdx

* Update perf_train_cpu.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-06-08 09:41:57 -04:00
Animesh Jain
897a8dd89f
Support compilation via Torchdynamo, AOT Autograd, NVFuser (#17308)
* Support compilation via Torchdynamo, AOT Autograd, NVFuser

* Address comments

* Lint

* Stas comments - missing quality test

* Lintere

* Quality test

* Doc lint

* Reset CUDA peak mem

* Add CustomTrainer

* require a single gpu

Co-authored-by: Stas Bekman <stas@stason.org>
2022-05-25 11:16:09 -04:00
Stas Bekman
3601aa8fc9
[tests] fix copy-n-paste error (#17312)
* [tests] fix copy-n-paste error

* fix
2022-05-18 16:00:47 -07:00
Yih-Dar
66b3e106a1
Make TrainerHyperParameterSigOptIntegrationTest slow test (#17288)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-16 14:18:09 -04:00
Antoni Baum
edcc66d27c
Remove unnecessary columns for all dataset types in Trainer (#17166)
* Remove unneeded columns for IterableDataset

* Add test

* Update trainer tests

* Edit docstring

* Lint

* Apply feedback

* Apply feedback
2022-05-11 11:11:26 -04:00
Zachary Mueller
2fbb237967
Add the auto_find_batch_size capability from Accelerate into Trainer (#17068)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

- Adds auto_batch_size finder 
- Moves training loop to an inner training loop
2022-05-09 12:29:18 -04:00
Sylvain Gugger
1c9fcd0e04
Fix RNG reload in resume training from epoch checkpoint (#17055)
* Fix RNG reload in resume training from epoch checkpoint

* Fix test
2022-05-03 10:31:24 -04:00
Sylvain Gugger
a8fa2f91f4
Make Trainer compatible with sharded checkpoints (#17053)
* Make Trainer compatible with sharded checkpoints

* Add doc
2022-05-03 09:55:10 -04:00
Manuel R. Ciosici
3104036e7f
Add support for bitsandbytes (#15622)
* Add initial BNB integration

* fixup! Add initial BNB integration

* Add bnb test decorator

* Update Adamw8bit option name

* Use the full bnb package name

* Overide bnb for all embedding layers

* Fix package name

* Formatting

* Remove unnecessary import

* Update src/transformers/trainer.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename AdamwBNB optimizer option

* Add training test checking that bnb memory utilization is lower

* fix merge

* fix merge; fix + extend new test

* cleanup

* expand bnb

* move all require_* candidates to testing_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-04-19 16:01:29 -04:00
code-review-doctor
a2392415e9
Some tests misusing assertTrue for comparisons fix (#16771)
* Fix issue avoid-misusing-assert-true found at https://codereview.doctor

* fix tests

* fix tf

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-04-19 14:44:08 +02:00
Sander Land
d7c8ce57d4
Avoid accessing .dataset of a DataLoader in Trainer (#16451)
* Avoid accessing .dataset of a dataloader

* style

* fix

* cleaning up, reverting some misunderstandings

* black

* add train_dataset argument to get_train_dataloader, and fix other instances of length checks

* flake8

* address comments

* fix bug

* cleanup

* add test

* Update tests/trainer/test_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* under torch

* merge

* stylistic suggestion

Co-authored-by: Sander Land <sander@chatdesk.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-29 15:00:18 -04:00
Sylvain Gugger
4975002df5
Reorganize file utils (#16264)
* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wront move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit
2022-03-23 10:26:33 -04:00
David Hall
5b7dcc7342
Seed _get_train_sampler's generator with arg seed to improve reproducibility (#15961)
* Seed get_train_sampler's generator with arg seed to improve reproducibility

and make the world_size<=1 code path more similar to the others

* move test file into trainer test explicitly

* dumb typo

* make style lint happy

* per discussion, switch to data_seed

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-08 13:45:41 -05:00
Lysandre Debut
29c10a41d0
[Test refactor 1/5] Per-folder tests reorganization (#15725)
* Per-folder tests reorganization

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-02-23 15:46:28 -05:00