* feat: add flexible Liger Kernel configuration to TrainingArguments
Add support for granular Liger Kernel configuration through a new
`liger_kernel_config` parameter in TrainingArguments. This allows users
to selectively enable/disable specific kernels (rope, swiglu, cross_entropy,
etc.) instead of the current approach that rely on default configuration.
Features:
- Add `liger_kernel_config` dict parameter to TrainingArguments
- Support selective kernel application for all supported models
- Maintain full backward compatibility with existing `use_liger_kernel` flag
Example usage:
```python
TrainingArguments(
use_liger_kernel=True,
liger_kernel_config={
"rope": True,
"swiglu": True,
"cross_entropy": False,
"fused_linear_cross_entropy": True
}
)
Closes#38905
* Address comments and update Liger section in Trainer docs
* enable misc test cases on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* tweak bamba ground truth on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* remove print
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* one more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* since 1.11.0, torchao.prototype.low_bit_optim is promoted to
torchao.optim
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix review comments
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* use device agnostic APIs in test cases
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add one more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* xpu now supports integer device id, aligning to CUDA behaviors
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update to use device_properties
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update comment
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix comments
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* use device agnostic APIs in tests
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* more
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add reset_peak_memory_stats API
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* update
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* tmp commit (imports broken)
* working version; update tests
* remove line break
* shorter msg
* dola checks need num_beams=1; other minor PR comments
* update early trainer failing on bad gen config
* make fixup
* test msg
* enable trainer test cases on xpu
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* args keep_torch_compile=False in _save and _wwrap_method
* Fix FSDP execution on evaluation for torch_compile mode
* add test trainer FSDP + Torch Compile
* fix quality code
* make style
* Revert " make style"
This reverts commit 77e797f8829c50992cc21496be3d9a3e480e1c97.
* make style
* enable xpu in test_trainer
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* enhance _device_agnostic_dispatch to cover value
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
* add default values for torch not available case
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
* [fix] make legacy bnb code work
* [fix] use get with default instead of getter
* add test for bnb 8bit optim skip embed
* [fix] style
* add require annotation of bnb
---------
Co-authored-by: jaycha <jaycha@ncsoft.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Added support for seed in `DataCollatorForWholeWordMask`, and also wrote tests.
Also fixed bugs where the code hardcoded values for mask replacement probability and random replacement probability, instead of using the values passed by the user.
* formatting issues
* Used better way to generate seed in TF. Made tests more consistent.
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* fix: format codes
* chore: fix copy mismatch issue
* fix: format codes
* chore: fix copy mismatch issue
* chore: fix copy mismatch issue
* chore: fix copy mismatch issue
* chore: restore previous words
* chore: revert unexpected changes
* feat: Saving tokenizer in collator when processing_class is None
* chore: Style issue
* chore: Typo
* dbg: Check why test failed
* dbg: Remove logics and another test failed which successed before, so should be the stablibility issue
* test: Init unit-test
* chore: Style
* chore: Add err log
* fix: Case
* Update tests/trainer/test_trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* chore: Try to use get_regression_trainer
* fix: Impl and style
* fix: Style
* fix: Case
* fix: Import err
* fix: Missed import
* fix: Import block un-sorted problem
* fix: Try another tokenizer
* fix: Test logic
* chore: Light updates
* chore: Reformat
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Just import torch AdamW instead
* Update docs too
* Make AdamW undocumented
* make fixup
* Add a basic wrapper class
* Add it back to the docs
* Just remove AdamW entirely
* Remove some AdamW references
* Drop AdamW from the public init
* make fix-copies
* Cleanup some references
* make fixup
* Delete lots of transformers.AdamW references
* Remove extra references to adamw_hf
* Set best_model_checkpoint only when ckpt exists.
Rather than set it explicitly without checking if the checkpoint directory even exists as before, now we moved the setting logic inside of _save_checkpoint and are only setting it if it exists.
* Added best_global_step to TrainerState.
* Added tests for best_model_checkpoint.
* Fixed hard-coded values in test to prevent fail.
* Added helper func and removed hard-coded best_step.
* Added side effect patch generator for _eval.
* Added evaluate side effect func.
* Removed erroneous patching.
* Fixed minor bug.
* Applied Ruff.
* Fixed Ruff problem in make style.
* Used Trainer.set_initial_training_values.
* fix wandb hp search unable to resume from sweep_id
* format styles
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Don't accidentally mutate the base_model_tp_plan
* Co-authored by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Trigger tests
* Marking grad accum test as slow
* Add a flaky decorator
* Add a flaky decorator
* Use cyril's codeblock
* Don't copy() when it's None
* Use cyril's new codeblock
* make fixup
* test
* fix
* fix
* skip some and run some first
* test fsdp
* fix
* patches for generate
* test distributed
* copy
* don't test distributed loss for hpu
* require fp16 and run first
* changes from marc's PR fixing zero3
* better alternative
* return True when fp16 support on gaudi without creating bridge
* fix
* fix tested dtype in deepspeed inference test
* test
* fix
* test
* fix
* skip
* require fp16
* run first fsdp
* Apply suggestions from code review
* address comments
* address comments and refactor test
* reduce precison
* avoid doing gaudi1 specific stuff in the genreation loop
* document test_gradient_accumulation_loss_alignment_with_model_loss test a bit more
Fixed 2 issues regarding `tests/trainer/test_data_collator.py::TFDataCollatorIntegrationTest::test_all_mask_replacement`:
1. I got the error `RuntimeError: "bernoulli_tensor_cpu_p_" not implemented for 'Long'`. This is because the `mask_replacement_prob=1` and `torch.bernoulli` doesn't accept this type (which would be a `torch.long` dtype instead. I fixed this by manually casting the probability arguments in the `__post_init__` function of `DataCollatorForLanguageModeling`.
2. I also got the error `tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute Equal as input #1(zero-based) was expected to be a int64 tensor but is a int32 tensor [Op:Equal]` due to the line `tf.reduce_all((batch["input_ids"] == inputs) | (batch["input_ids"] == tokenizer.mask_token_id))` in `test_data_collator.py`. This occurs because the type of the `inputs` variable is `tf.int32`. Solved this by manually casting it to `tf.int64` in the test, as the expected return type of `batch["input_ids"]` is `tf.int64`.
* fix: prevent model access error during Optuna hyperparameter tuning
The `transformers.integrations.integration_utils.run_hp_search_optuna` function releases model memory and sets trainer.model to None after each trial. This causes an AttributeError when subsequent Trainer.train calls attempt to access the model before reinitialization. This is only an issue when `fp16_full_eval` or `bf16_full_eval` flags are enabled.
* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix: prevent second save in the end of training
* fix: prevent second save in the end of training
* test: added test for no duplicate save on epoch save strategy
* fix: removed TrainerControl
* chore: style formatting
---------
Co-authored-by: JaktensTid <jaktenstid1@gmail.com>
* Add implementation for DataCollatorForMultipleChoice based on docs.
* Add DataCollatorForMultipleChoice to import structure.
* Remove custom DataCollatorForMultipleChoice implementations from example scripts.
* Remove custom implementations of DataCollatorForMultipleChoice from docs in English, Spanish, Japanese and Korean.
* Refactor torch version of DataCollatorForMultipleChoice to be more easily understandable.
* Apply suggested changes and run make fixup.
* fix copies, style and fixup
* add missing documentation
* nits
* fix docstring
* style
* nits
* isort
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* Adding option to save/reload scaler
* Removing duplicate variable
* Adding save/reload test
* Small fixes on deterministic algorithm call
* Moving LLM test to another file to isolate its environment
* Moving back to old file and using subprocess to run test isolated
* Reverting back accidental change
* Reverting back accidental change
* add RAdamScheduleFree optimizer
* revert schedulefree version to the minimum requirement
* refine is_schedulefree_available so that it can take min_version
* refine documents
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* layernorm_decay_fix
* W293 fix
* ruff format fix
* black format
* ruff format
* erase last layer
* add test_get_parameter_names_rmsnorm
* rmsnorm fix
* use torch.testing.assertclose instead to get more details about error in cis
* fix
* style
* test_all
* revert for I bert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strick
* ok I won't be strict
* skip and be done
* up
* DataCollatorForLanguageModeling class was updated with new parameters that provides more control over the token masking and relacing
* DataCollatorForLanguageModeling class was updated with new parameters that provides more control over the token masking and relacing
* Addressed review comments, modified the docstring and made a test for the DataCollatorForLanguageModeling