Tim Dettmers
796162c512
Paged Optimizer + Lion Optimizer for Trainer ( #23217 )
...
* Added lion and paged optimizers and made original tests pass.
* Added tests for paged and lion optimizers.
* Added and fixed optimizer tests.
* Style and quality checks.
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
2023-05-24 12:53:28 +02:00
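A minimal usage sketch for this entry, assuming the option strings the PR introduces ("paged_adamw_32bit", "lion_32bit", etc.) and that bitsandbytes is installed:

```python
from transformers import TrainingArguments

# Paged optimizers page optimizer state between GPU and CPU memory to absorb
# OOM spikes; Lion is the sign-based optimizer. Both are backed by bitsandbytes.
args = TrainingArguments(
    output_dir="out",
    optim="paged_adamw_32bit",  # or "lion_32bit", "paged_lion_32bit", ...
)
```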
Maxime Méloux
9b435204b1
Add Trainer support for ReduceLROnPlateau ( #23010 )
...
* Add Trainer support for ReduceLROnPlateau
Fixes #16503
* Remove training argument and add default instance
---------
Co-authored-by: mmeloux <maxime.meloux@loria.fr>
2023-04-28 09:17:30 -04:00
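A minimal sketch of selecting this scheduler, assuming the value string is "reduce_lr_on_plateau" and that it steps on the evaluation metric (so an evaluation strategy is required):

```python
from transformers import TrainingArguments

# ReduceLROnPlateau lowers the LR when the monitored eval metric stops improving.
args = TrainingArguments(
    output_dir="out",
    lr_scheduler_type="reduce_lr_on_plateau",
    evaluation_strategy="epoch",
    metric_for_best_model="eval_loss",
)
```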
Zachary Mueller
03462875cc
Introduce PartialState as the device handler in the Trainer ( #22752 )
...
* Use accelerate for device management
* Add accelerate to setup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-04-17 15:09:45 -04:00
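A sketch of the accelerate API this PR adopts; PartialState is part of accelerate's public API:

```python
from accelerate import PartialState

# PartialState initializes the (distributed) environment once per process and
# exposes the correct device, replacing Trainer's bespoke device handling.
state = PartialState()
print(state.device, state.process_index, state.num_processes)
```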
Stas Bekman
1306b7d3ae
[tests] switch to torchrun ( #22712 )
2023-04-12 08:25:45 -07:00
Viktor Scherbakov
871598be55
Implemented safetensors checkpoints save/load for Trainer ( #22498 )
...
* implemented safetensors save/load
* remove duplicated file
* added tests
* more tests
* style fix
* fix tf tests
* change to list comprehension
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* review fixes + safe load for sharded checkpoint
* style fix
* remove rogue import
* remove partial to avoid undefined exception
* use naming alias instead of safetensors.torch
* fix safe sharding in tests
* grammar
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update docs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update docs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* minor corrections
* style
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-04-04 09:05:04 -04:00
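A minimal sketch of opting into safetensors checkpoints via the `save_safetensors` training argument this PR wires up:

```python
from transformers import TrainingArguments

# With save_safetensors=True, Trainer writes model.safetensors instead of the
# pickle-based pytorch_model.bin; sharded checkpoints are handled safely too.
args = TrainingArguments(output_dir="out", save_safetensors=True)
```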
Yih-Dar
5110e5748e
🔥 py38 + torch 2 🔥 🔥 🔥 🚀 ( #22204 )
...
* py38 + torch 2
* increment cache versions
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 22:59:23 +01:00
Dean Wyatte
2f4cdd97f5
handle numpy inputs in whole word mask data collator ( #22032 )
2023-03-10 10:50:29 -05:00
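A sketch of the case this fix covers: feeding numpy arrays to the whole-word-mask collator. The feature layout is an assumption; treat it as illustrative only:

```python
import numpy as np
from transformers import AutoTokenizer, DataCollatorForWholeWordMask

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, return_tensors="np")

# Before the fix, numpy input_ids tripped the collator; lists and torch tensors worked.
ids = np.array(tokenizer("hello world")["input_ids"])
batch = collator([{"input_ids": ids}])
```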
Lucain
923110b74f
Remove set_access_token usage + fail tests if FutureWarning ( #22051 )
...
* Remove set_access_token usage + fail tests if FutureWarning
* do not fail on FutureWarning in CI
---------
Co-authored-by: testbot <lucainp@hf.co>
2023-03-09 09:23:48 -05:00
Sylvain Gugger
b29e2dcaff
Fix flaky test for log level ( #21776 )
...
* Fix flaky test for log level
* Fix other flaky test
2023-02-28 16:24:14 -05:00
ydshieh
aa3787c8f0
Skip test_log_level for now
2023-02-23 12:11:20 +01:00
Sylvain Gugger
b19d64d852
Respect documentation on passive log level ( #21700 )
...
* Respect documentation on passive log level
* Fix test and set log level in examples
* Add doc
2023-02-22 09:39:18 +01:00
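A sketch of the documented behavior being restored: with the default log_level="passive", Trainer leaves the transformers logger alone, so an application-level verbosity setting wins:

```python
from transformers import TrainingArguments
from transformers.utils import logging

logging.set_verbosity_info()  # application choice, respected under "passive"
args = TrainingArguments(output_dir="out", log_level="passive")
```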
Aaron Gokaslan
5e8c8eb5ba
Apply ruff flake8-comprehensions ( #21694 )
2023-02-22 09:14:54 +01:00
Sylvain Gugger
cc8407522a
Fix epoch number when resuming training ( #21478 )
2023-02-06 19:34:34 -05:00
Sylvain Gugger
6f79d26442
Update quality tooling for formatting ( #21480 )
...
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
2023-02-06 18:10:56 -05:00
jeffhataws
c59d71b282
Add AWS Neuron torchrun support ( #20806 )
...
* Add XLA torchrun support
* Clarify that currently DDP doesn't work with torch.distributed XLA backend yet
* Enable DDP with torchrun and XLA (now available in PT-XLA 1.13)
* Add check for AWS Neuron availability and AWS Neuron specific compiler flag
* Change the new test's name to TestTrainerDistributedNeuronCore
* Remove "assert" and replace raised exception
* Remove compiler flag as it is optional. If needed, will be another PR.
* Use TORCHELASTIC_RUN_ID to determine whether torchrun is used
2023-01-18 11:21:19 -05:00
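Per the last bullet, torchrun (torch.distributed.elastic) exports TORCHELASTIC_RUN_ID, so its presence signals a torchrun launch; a minimal check:

```python
import os

# True when the process was started by torchrun / torch.distributed.elastic.
launched_with_torchrun = "TORCHELASTIC_RUN_ID" in os.environ
```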
Sylvain Gugger
05e72aa0c4
Adapt repository creation to latest hf_hub ( #21158 )
...
* Adapt repository creation to latest hf_hub
* Update all examples
* Fix other tests, add Flax examples
* Address review comments
2023-01-18 11:14:00 -05:00
Yih-Dar
b3a0aad37d
Fix past CI ( #20967 )
...
* Fix for Past CI
* make style
* clean up
* unindent 2 blocks
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-01-12 18:04:21 +01:00
Thomas-MMJ
7ef3f19c3c
fix typo ("ouput" → "output") in bitsandbytes trainer test ( #20839 )
...
fix typo: "ouput" → "output"
the typo was causing an error during pytest collection
2022-12-20 03:16:26 -05:00
Sylvain Gugger
08b4621899
Repurpose torchdynamo training args towards torch._dynamo ( #20498 )
...
* Repurpose torchdynamo training args towards torch._dynamo
* Add doc
2022-11-30 11:10:45 -05:00
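After this change the `torchdynamo` training argument names a torch._dynamo backend. A sketch, assuming "inductor" is among the backends available on the installed torch:

```python
from transformers import TrainingArguments

# Selects the torch._dynamo "inductor" backend instead of the old bespoke
# context managers the argument used to toggle.
args = TrainingArguments(output_dir="out", torchdynamo="inductor")
```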
Stas Bekman
a547d5bda5
[AnyPrecisionAdamW] test fix ( #20454 )
2022-11-25 09:02:10 -08:00
atturaioe
84c9cc6d15
Add AnyPrecisionAdamW optimizer ( #18961 )
...
* Add AnyPrecisionAdamW optimizer
* Add optim_args argument to TrainingArgs
* Add tests for AnyPrecisionOptimizer
* Change AnyPrecisionAdam default params to float32
* Move default_anyprecision_kwargs in trainer test
* Rename AnyPrecisionAdamW
2022-11-18 09:27:08 -05:00
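A sketch of selecting the new optimizer via `optim` plus the `optim_args` pass-through added here. The option names mirror torchdistx's AnyPrecisionAdamW and are assumptions; torchdistx must be installed:

```python
from transformers import TrainingArguments

# optim_args is forwarded as comma-separated key=value pairs to the optimizer.
args = TrainingArguments(
    output_dir="out",
    optim="adamw_anyprecision",
    optim_args="use_kahan_summation=True,momentum_dtype=bfloat16",
)
```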
Alexander Markov
610acc5ae9
Data collator for token classification pads labels column when it receives pytorch tensors ( #20244 )
...
* token cls data_collator pads labels column
* remove walrus operator for code quality
* remove redundant space
* remove comment that was fixed
* PR comments fix
Co-authored-by: Alexander Markov <amarkov.me@gmail.com>
2022-11-16 12:18:46 -05:00
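A sketch of the case this PR fixes: label columns arriving as torch tensors are now padded just like list labels:

```python
import torch
from transformers import AutoTokenizer, DataCollatorForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

enc = tokenizer("hello world")
features = [{"input_ids": enc["input_ids"], "labels": torch.tensor([0, 1, 0, 1])}]
batch = collator(features)  # labels padded with label_pad_token_id (-100)
```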
Yih-Dar
16242e1bf0
Run torchdynamo tests ( #19056 )
...
* Enable torchdynamo tests
* make style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-15 11:10:16 -07:00
Younes Belkada
1ccd2515ed
small change ( #18584 )
2022-08-12 20:04:38 +02:00
Wei
7ea6ccc2b3
Enable torchdynamo with torch_tensorrt (fx path) ( #17765 )
...
* enable fx2trt
* Update perf_train_gpu_one.mdx
* Update perf_train_gpu_one.mdx
* add lib check
* update
* format
* update
* fix import check
* fix isort
* improve doc
* refactor ctx manager
* fix isort
* black format
* isort fix
* fix format
* update args
* update black
* cleanups
* Update perf_train_gpu_one.mdx
* code refactor
* code refactor to init
* remove redundancy
* isort
* replace self.args with args
Co-authored-by: Stas Bekman <stas@stason.org>
2022-07-13 12:43:28 -04:00
jianan-gu
b7d8bd378c
Enhance IPEX integration in Trainer ( #18072 )
...
* enhance ipex import
* refine codes
* refine style
* add link
* style
Co-authored-by: Stas Bekman <stas@stason.org>
2022-07-11 21:34:09 -07:00
neverix
8b332a6a16
Make predict() close progress bars after finishing ( #17952 ) ( #18078 )
...
* Make Trainer.predict call on_evaluate (#17952 )
* Add on_predict
* Small fix
* Small and different fix
* Add tests
2022-07-08 16:44:24 -04:00
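A sketch of hooking the on_predict event this PR adds to the callback API:

```python
from transformers import TrainerCallback

class PredictLogger(TrainerCallback):
    # Called once predict() finishes, after metrics are computed.
    def on_predict(self, args, state, control, metrics, **kwargs):
        print("predict finished:", metrics)
```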
Yih-Dar
664688b94f
higher atol to avoid flaky trainer test failure ( #17979 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-01 17:53:16 +02:00
Yih-Dar
fe14046421
skip some ipex tests until it works with torch 1.12 ( #17964 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-30 18:05:29 +02:00
Yih-Dar
f717d47fe0
Fix test_number_of_steps_in_training_with_ipex ( #17889 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-28 08:55:02 +02:00
Lysandre Debut
6a5272b205
Prepare transformers for v0.8.0 huggingface-hub release ( #17716 )
...
* Prepare CI for v0.8.0
* pin hfh (revert before merge)
* Revert "pin hfh (revert before merge)"
This reverts commit a0103140e1.
* Test rc3
* Test latest rc
* Unpin to the RC
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-06-21 11:51:18 -04:00
Stas Bekman
a2d34b7c04
deprecate is_torch_bf16_available ( #17738 )
...
* deprecate is_torch_bf16_available
* address suggestions
2022-06-20 08:40:11 -04:00
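The deprecated check was ambiguous about the device; a sketch using the split replacements (assumed to be the gpu/cpu variants):

```python
from transformers.utils import (
    is_torch_bf16_cpu_available,
    is_torch_bf16_gpu_available,
)

print(is_torch_bf16_gpu_available(), is_torch_bf16_cpu_available())
```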
jianan-gu
3b29c9fdb7
Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference ( #17153 )
...
* add jit mode option and model wrap
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refine code
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add ut and refine code
* code refine
* refine code
* add inference doc
* Update src/transformers/trainer.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add cpu inference performance doc
* Update perf_infer_cpu.mdx
* Update perf_infer_cpu.mdx
* Update performance.mdx
* Update _toctree.yml
* refine jit func naming
* Update _toctree.yml
* Delete perf_infer_gpu_one.mdx
* Update perf_infer_cpu.mdx
* Update docs/source/en/perf_infer_cpu.mdx
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add none check before jit
* Update docs/source/en/perf_infer_cpu.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/perf_infer_cpu.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-06-14 07:56:47 -04:00
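A minimal sketch of the new switch, assuming it is exposed as `jit_mode_eval`:

```python
from transformers import TrainingArguments

# jit_mode_eval=True wraps the model with torch.jit.trace for evaluation and
# prediction, which can speed up CPU inference.
args = TrainingArguments(output_dir="out", jit_mode_eval=True)
```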
jianan-gu
34097b3304
Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch ( #17138 )
...
* init PR
* fix import ipex
* minor fix on bf16
* refine optimizer
* refine args notes
* refine code
* refine ipex optimize args
* refine half_precision_backend
* black format
* isort format
* isort format files
* flake8 format
* doc builder format
* refine codes
* remove jit and optim bits
* black preview format
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refine code
* refine notes
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* code refine
* add ipex ut
* add performance cpu doc
* link to the cpu doc from main perf doc
* install ipex into CI's docker
* Update perf_train_cpu.mdx
* Update docs/source/en/perf_train_cpu.mdx
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update perf_train_cpu.mdx
* Update perf_train_cpu.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-06-08 09:41:57 -04:00
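A sketch of CPU training with the two features this PR adds, assuming a bf16-capable CPU and intel_extension_for_pytorch installed:

```python
from transformers import TrainingArguments

# use_ipex applies ipex.optimize to the model; bf16 + no_cuda enables CPU AMP.
args = TrainingArguments(
    output_dir="out",
    use_ipex=True,
    bf16=True,
    no_cuda=True,
)
```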
Animesh Jain
897a8dd89f
Support compilation via Torchdynamo, AOT Autograd, NVFuser ( #17308 )
...
* Support compilation via Torchdynamo, AOT Autograd, NVFuser
* Address comments
* Lint
* Stas comments - missing quality test
* Linter
* Quality test
* Doc lint
* Reset CUDA peak mem
* Add CustomTrainer
* require a single gpu
Co-authored-by: Stas Bekman <stas@stason.org>
2022-05-25 11:16:09 -04:00
Stas Bekman
3601aa8fc9
[tests] fix copy-n-paste error ( #17312 )
...
* [tests] fix copy-n-paste error
* fix
2022-05-18 16:00:47 -07:00
Yih-Dar
66b3e106a1
Make TrainerHyperParameterSigOptIntegrationTest slow test ( #17288 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-16 14:18:09 -04:00
Antoni Baum
47412c7d43
Ensure tensors are at least 1d for pad and concat ( #17179 )
...
* Ensure tensors are at least 1d for pad and concat
* Compatibility
* Fix
* Fix
* Add test
* Retrigger CI
* Consistency with master
* Retrigger CI
2022-05-11 13:19:08 -04:00
Antoni Baum
edcc66d27c
Remove unnecessary columns for all dataset types in Trainer ( #17166 )
...
* Remove unneeded columns for IterableDataset
* Add test
* Update trainer tests
* Edit docstring
* Lint
* Apply feedback
* Apply feedback
2022-05-11 11:11:26 -04:00
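For context, the column pruning is driven by the existing flag; after this PR it also applies to IterableDataset:

```python
from transformers import TrainingArguments

# remove_unused_columns=True (the default) drops dataset columns that the
# model's forward() signature does not accept.
args = TrainingArguments(output_dir="out", remove_unused_columns=True)
```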
Zachary Mueller
2fbb237967
Add the auto_find_batch_size capability from Accelerate into Trainer ( #17068 )
...
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
- Adds auto_batch_size finder
- Moves training loop to an inner training loop
2022-05-09 12:29:18 -04:00
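A minimal sketch of the flag; on CUDA OOM the inner training loop restarts with a smaller batch size until training fits (requires accelerate):

```python
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", auto_find_batch_size=True)
```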
Sylvain Gugger
1c9fcd0e04
Fix RNG reload in resume training from epoch checkpoint ( #17055 )
...
* Fix RNG reload in resume training from epoch checkpoint
* Fix test
2022-05-03 10:31:24 -04:00
Sylvain Gugger
a8fa2f91f4
Make Trainer compatible with sharded checkpoints ( #17053 )
...
* Make Trainer compatible with sharded checkpoints
* Add doc
2022-05-03 09:55:10 -04:00
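A sketch of producing the sharded checkpoints Trainer can now consume; `max_shard_size` is the standard save_pretrained knob:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
# Writes pytorch_model-00001-of-0000N.bin shards plus an index file.
model.save_pretrained("out/sharded", max_shard_size="200MB")
```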
Manuel R. Ciosici
3104036e7f
Add support for bitsandbytes ( #15622 )
...
* Add initial BNB integration
* fixup! Add initial BNB integration
* Add bnb test decorator
* Update Adamw8bit option name
* Use the full bnb package name
* Override bnb for all embedding layers
* Fix package name
* Formatting
* Remove unnecessary import
* Update src/transformers/trainer.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename AdamwBNB optimizer option
* Add training test checking that bnb memory utilization is lower
* fix merge
* fix merge; fix + extend new test
* cleanup
* expand bnb
* move all require_* candidates to testing_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-04-19 16:01:29 -04:00
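A minimal sketch of opting into the bitsandbytes optimizer; the option string shown is the post-rename name used in current versions:

```python
from transformers import TrainingArguments

# 8-bit AdamW keeps optimizer state in int8, cutting its memory use sharply.
args = TrainingArguments(output_dir="out", optim="adamw_bnb_8bit")
```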
code-review-doctor
a2392415e9
Fix some tests misusing assertTrue for comparisons ( #16771 )
...
* Fix issue avoid-misusing-assert-true found at https://codereview.doctor
* fix tests
* fix tf
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-04-19 14:44:08 +02:00
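The bug class this PR fixes, in miniature: assertTrue treats a second positional argument as the failure message, so "comparisons" written with it always pass:

```python
import unittest

class Example(unittest.TestCase):
    def test_compare(self):
        # self.assertTrue(1 + 1, 3)  # passes! 2 is truthy, 3 is just the msg
        self.assertEqual(1 + 1, 2)   # the correct comparison assertion
```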
Sander Land
d7c8ce57d4
Avoid accessing .dataset of a DataLoader in Trainer ( #16451 )
...
* Avoid accessing .dataset of a dataloader
* style
* fix
* cleaning up, reverting some misunderstandings
* black
* add train_dataset argument to get_train_dataloader, and fix other instances of length checks
* flake8
* address comments
* fix bug
* cleanup
* add test
* Update tests/trainer/test_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* under torch
* merge
* stylistic suggestion
Co-authored-by: Sander Land <sander@chatdesk.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-29 15:00:18 -04:00
Sylvain Gugger
4975002df5
Reorganize file utils ( #16264 )
...
* Split file_utils in several submodules
* Fixes
* Add back more objects
* More fixes
* Who exactly decided to import that from there?
* Second suggestion from code review
* Revert wrong move
* Fix imports
* Adapt all imports
* Adapt all imports everywhere
* Revert this import, will fix in a separate commit
2022-03-23 10:26:33 -04:00
David Hall
5b7dcc7342
Seed _get_train_sampler's generator with arg seed to improve reproducibility ( #15961 )
...
* Seed get_train_sampler's generator with arg seed to improve reproducibility
and make the world_size<=1 code path more similar to the others
* move test file into trainer test explicitly
* dumb typo
* make style lint happy
* per discussion, switch to data_seed
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-08 13:45:41 -05:00
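A minimal sketch of the new argument: `data_seed` decouples data-sampling order from the main `seed`:

```python
from transformers import TrainingArguments

# Hold data order fixed across runs while letting model init vary (or vice versa).
args = TrainingArguments(output_dir="out", seed=42, data_seed=1234)
```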
Lysandre Debut
29c10a41d0
[Test refactor 1/5] Per-folder tests reorganization ( #15725 )
...
* Per-folder tests reorganization
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-02-23 15:46:28 -05:00