* Adding option to save/reload scaler
* Removing duplicate variable
* Adding save/reload test
* Small fixes on deterministic algorithm call
* Moving LLM test to another file to isolate its environment
* Moving back to old file and using subprocess to run test isolated
* Reverting back accidental change
* Reverting back accidental change
* add RAdamScheduleFree optimizer
* revert schedulefree version to the minimum requirement
* refine is_schedulefree_available so that it can take min_version
* refine documentation
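For context, a minimal sketch of what a version-aware availability check like `is_schedulefree_available(min_version=...)` can look like (helper and constant names here are illustrative, not the exact transformers implementation):

```python
import importlib.metadata

from packaging import version

# Illustrative minimum version; the real requirement is defined in the library.
SCHEDULEFREE_MIN_VERSION = "1.2.6"


def is_schedulefree_available(min_version: str = SCHEDULEFREE_MIN_VERSION) -> bool:
    """Return True if `schedulefree` is installed and at least `min_version`."""
    try:
        installed = importlib.metadata.version("schedulefree")
    except importlib.metadata.PackageNotFoundError:
        return False
    return version.parse(installed) >= version.parse(min_version)
```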
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* use torch.testing.assert_close instead to get more details about errors in CIs
* fix
* style
* test_all
* revert for IBert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strict
* ok I won't be strict
* skip and be done
* up
* update codecarbon
* replace directly-specified-test-dirs with tmp_dir
* pass tmp_dir to all get_regression_trainer
* test_trainer.py: Use tmp_dir consistently for all output_dir arguments
* fix some with...as tmp_dir blocks
* reflect the comments to improve test_trainer.py
* refresh .gitignore
* Updated docstring for _determine_best_metric.
* Updated docstring for metric_for_best_model.
* Added test case for save strategy.
* Updated incorrect test case.
* Changed eval_strategy to match save_strategy.
* Separated test cases for metric.
* Allow load_best_model when save_strategy == "best".
* Updated docstring for metric_for_best_model.
* fix gradient accumulation (GA) bugs and add unit test (see the loss-normalization sketch after this commit list)
* narrow down model loss unit test diff gap
* format code to make ruff happy
* send num_items_in_batch argument to decoder
* fix GA loss bug in BertLMHeadModel
* use TinyStories-33M to narrow down diff gap
* format code
* missing .config
* avoid adding extra args
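A rough sketch of the loss normalization behind the GA fix: when `num_items_in_batch` is passed down (e.g. to a decoder such as `BertLMHeadModel`), the loss is summed and divided by the token count of the full accumulated batch instead of being averaged per micro-batch. Names below are illustrative, not the exact transformers code:

```python
import torch.nn.functional as F


def causal_lm_loss(logits, labels, num_items_in_batch=None):
    # Shift so that tokens < n predict token n.
    logits = logits[..., :-1, :].contiguous()
    labels = labels[..., 1:].contiguous()
    reduction = "sum" if num_items_in_batch is not None else "mean"
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
        reduction=reduction,
    )
    if num_items_in_batch is not None:
        # Normalize by the token count of the whole accumulated batch so that
        # gradient accumulation matches training with a larger batch size.
        loss = loss / num_items_in_batch
    return loss
```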
---------
Co-authored-by: kangsheng <kangsheng@meituan.com>
* Update trainer for easier handling of accumulate + proper reporting
* test
* Fixup tests
* Full fix
* Fix style
* rm comment
* Fix tests
* Minimize test + remove py 311 check
* Unused import
* Forward contrib credits from discussions
* Fix reported metrics
* Refactor, good as it's going to get
* rm pad tok id check
* object detection and audio are being annoying
* Fin
* Fin x2
---------
Co-authored-by: Gyanateet Dutta <Ryukijano@users.noreply.github.com>
* Add _determine_best_metric and new saving logic.
1. The logic to determine the best metric was separated out from
`_save_checkpoint`.
2. In `_maybe_log_save_evaluate`, whether or not a new best metric was
achieved is determined after each evaluation, and if the save strategy
is "best", the TrainerControl is updated accordingly.
* Added SaveStrategy.
Same as IntervalStrategy, but with a new attribute called BEST.
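Roughly (a sketch; the actual enum lives in `transformers.trainer_utils`):

```python
from enum import Enum


class SaveStrategy(str, Enum):
    NO = "no"
    STEPS = "steps"
    EPOCH = "epoch"
    BEST = "best"  # new: save only when a new best metric is reached
```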
* IntervalStrategy -> SaveStrategy
* IntervalStrategy -> SaveStrategy for save_strategy.
* Interval -> Save in docstring.
* Updated docstring for save_strategy.
* Added SaveStrategy and made according changes.
`save_strategy` previously followed `IntervalStrategy` but now follows
`SaveStrategy`.
The code and the docstring were updated accordingly.
* Changes from `make fixup`.
* Removed redundant metrics argument.
* Added new test_save_best_checkpoint test.
1. Covers both the case where `metric_for_best_model` is explicitly
provided and the case where it is not.
2. The first case should save two checkpoints, whereas the second
should save three.
* Changed should_training_end saving logic.
The Trainer saves a checkpoint at the end of training by default as
long as `save_strategy != SaveStrategy.NO`. This condition was extended
to also cover `SaveStrategy.BEST`, because it would be counterintuitive
to request only the best checkpoint and still have the last one saved
as well.
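Reusing the `SaveStrategy` sketch above, the modified end-of-training condition is roughly:

```python
def should_save_final_checkpoint(save_strategy: SaveStrategy) -> bool:
    # With "best", only best checkpoints are kept, so skip the extra
    # end-of-training save that the other non-"no" strategies get.
    return save_strategy not in (SaveStrategy.NO, SaveStrategy.BEST)
```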
* `args.metric_for_best_model` default to loss.
* Undo metric_for_best_model update.
* Remove checking metric_for_best_model.
* Added test cases for loss and no metric.
* Added error for metric and changed default best_metric.
* Removed unused import.
* `new_best_metric` -> `is_new_best_metric`
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Applied `is_new_best_metric` to all.
Changes were made for consistency and also to fix a potential bug.
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* bookmark
* Bookmark
* Bookmark
* Actually implement
* Pass in kwarg explicitly
* Adjust for if we do or don't have labels
* Bookmark fix for object detection
* bookmark
* Fin
* closer
* Negate accelerate grad accum div
* Fixup not training long enough
* Add in compute_loss to take full model output
* Document
* compute_loss -> compute_loss_fn
* Add a test
* Refactor
* Refactor
* Uncomment tests
* Update tests/trainer/test_trainer.py
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* Fix FSDP Initialization for resume training
* Added init_fsdp function to work with dummy values
* Fix FSDP initialization for resuming training
* Added CUDA decorator for tests
* Added torch_gpu decorator to FSDP tests
* Fixup for failing code quality tests
* Trainer - deprecate tokenizer for processing_class
* Extend change across Seq2Seq trainer and docs
* Add tests
* Update to FutureWarning and add deprecation version
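A minimal sketch of the deprecation pattern (illustrative only; the warning text and removal version in the real Trainer may differ):

```python
import warnings


class MyTrainer:
    """Illustrative stand-in for the Trainer signature change."""

    def __init__(self, model=None, processing_class=None, tokenizer=None, **kwargs):
        if tokenizer is not None:
            warnings.warn(
                "`tokenizer` is deprecated; use `processing_class` instead.",
                FutureWarning,
            )
            if processing_class is None:
                processing_class = tokenizer
        self.model = model
        self.processing_class = processing_class
```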
* Add AdEMAMix optimizer
* Fix test
* Update tests/trainer/test_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix galore lr display with lr schedulers
* style
* add some tests to check for displayed lrs
* copy-paste error for warmup steps
* standardize the default lr to be only in the optimizer
* trying out my luck with the reads
* add gather_use_object arguments
* fix name and pass the CI test for Seq2SeqTrainer
* make style
* make it to functools
* fix typo
* add accelerate version:
* adding warning
* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* make style
* Update src/transformers/training_args.py
* move the check function to the initial part
* add test for eval_use_gather_object
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Implement JSON dump conversion for torch_dtype in TrainingArguments
* Add unit test for converting torch_dtype in TrainingArguments to JSON
* move unit test for converting torch_dtype into TrainerIntegrationTest class
* reformatting using ruff
* convert dict_torch_dtype_to_str to private method _dict_torch_dtype_to_str
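The idea, sketched as a standalone helper (the real implementation is the private `TrainingArguments._dict_torch_dtype_to_str`; behavior here is assumed to match):

```python
import json

import torch


def dict_torch_dtype_to_str(d: dict) -> None:
    """Recursively replace torch.dtype values (e.g. torch.float16) with plain
    strings ("float16") so the dict can be serialized with json.dumps."""
    for key, value in d.items():
        if isinstance(value, torch.dtype):
            d[key] = str(value).split(".")[-1]
        elif isinstance(value, dict):
            dict_torch_dtype_to_str(value)


args_dict = {"torch_dtype": torch.bfloat16, "learning_rate": 5e-5}
dict_torch_dtype_to_str(args_dict)
print(json.dumps(args_dict))  # {"torch_dtype": "bfloat16", "learning_rate": 5e-05}
```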
---------
Co-authored-by: jun.4 <jun.4@kakaobrain.com>
* Introduce configured_state
* Include note on tuning
* Allow for users to have defined a state already
* Include tests
* Add note on hparam tuning
* Guard a bit better
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Finish rebase
* Finish rebase
* Guard carefully
* Fixup test
* Refactor
* Fin refactor
* Comment
* Update wrt feedback
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Added cache clearing for GPU efficiency.
* Added cache clearing for GPU efficiency.
* Added batch_eval_metrics capability
* Ran make fixup
* Fixed bug
* Fixed whitespace issue
* Fixed outdated condition
* Updated docstrings with instructions for batch_eval_metrics. Updated end-of-dataloader logic
* Added first version of batch_eval_metrics Trainer test
* Fixed batch_eval_metrics Trainer tests for both eval and predict
* Fixed batch_eval_metrics behavior for new Trainer variables
* Fixed batch_eval_metrics Trainer tests
* Ran fixup
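Usage-wise, with `batch_eval_metrics=True` the `compute_metrics` callable is invoked on every evaluation batch and receives `compute_result=True` on the last one, so it can accumulate statistics and only return the final metrics then. A rough sketch (assumes predictions and labels arrive as tensors per batch):

```python
import torch


class BatchAccuracy:
    """Accumulates accuracy batch by batch for use with batch_eval_metrics=True."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def __call__(self, eval_pred, compute_result=False):
        preds = torch.as_tensor(eval_pred.predictions).argmax(dim=-1)
        labels = torch.as_tensor(eval_pred.label_ids)
        self.correct += int((preds == labels).sum())
        self.total += labels.numel()
        if compute_result:  # last eval batch: report and reset
            accuracy = self.correct / max(self.total, 1)
            self.correct = self.total = 0
            return {"accuracy": accuracy}
```

Passed as `compute_metrics=BatchAccuracy()` alongside `batch_eval_metrics=True`, this keeps only running counts in memory instead of all accumulated logits.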
* Trainer: load checkpoint model with multiple adapters
* Trainer._load_from_checkpoint support multiple active adapters
* PeftModel.set_adapter does not support multiple adapters yet
* Trainer._load_from_checkpoint test multiple adapters
---------
Co-authored-by: Clara Luise Pohland <clara-luise.pohland@telekom.de>
* Start rework
* Fix failing test
* Include max
* Update src/transformers/trainer.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add functions to trainer.py to get the number of parameters that require grad, the optimizer group for a parameter, and the learning rates of the param groups (see the sketch after this commit list)
* add tests and raise ValueError when optimizer is None
* add second layer to test and freeze its weights
* check if torch is available before running tests
* use decorator to check if torch is available
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix test indentation
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
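A standalone sketch of the three helpers referenced in the first commit above (names are paraphrased from the commit message; the real versions are `Trainer` methods and raise `ValueError` when the optimizer has not been set up yet):

```python
import torch


def get_num_trainable_parameters(model: torch.nn.Module) -> int:
    """Number of parameters that require grad."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def get_learning_rates(optimizer: torch.optim.Optimizer) -> list:
    """Learning rate of each param group."""
    if optimizer is None:
        raise ValueError("Optimizer is None; cannot read learning rates.")
    return [group["lr"] for group in optimizer.param_groups]


def get_optimizer_group(optimizer: torch.optim.Optimizer, param: torch.nn.Parameter):
    """Return the param group that contains `param`."""
    if optimizer is None:
        raise ValueError("Optimizer is None; cannot look up param groups.")
    for group in optimizer.param_groups:
        if any(p is param for p in group["params"]):
            return group
    return None
```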
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>