* remove it everywhere
* Update trainer_pt_utils.py
* Update trainer_pt_utils.py
* style
* sort list in test
* CIs
* use recursion same way as before (for intermediate layer names)
* first commit
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* use release PyTorch
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* feat: add flexible Liger Kernel configuration to TrainingArguments
Add support for granular Liger Kernel configuration through a new
`liger_kernel_config` parameter in TrainingArguments. This allows users
to selectively enable/disable specific kernels (rope, swiglu, cross_entropy,
etc.) instead of relying only on the default configuration.
Features:
- Add `liger_kernel_config` dict parameter to TrainingArguments
- Support selective kernel application for all supported models
- Maintain full backward compatibility with existing `use_liger_kernel` flag
Example usage:
```python
from transformers import TrainingArguments

TrainingArguments(
    output_dir="output",
    use_liger_kernel=True,
    liger_kernel_config={
        "rope": True,
        "swiglu": True,
        "cross_entropy": False,
        "fused_linear_cross_entropy": True,
    },
)
```
Closes #38905
* Address comments and update Liger section in Trainer docs
* switch to device-agnostic autocast in nemotron to align xpu behavior with cuda
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix issue
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* use torch.autocast, as other modeling code does, for decision_transformer, gpt2 and imagegpt
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* refine
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update get_autocast_gpu_dtype to device agnostic one
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix comments
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Update bamba model card
* Update the doc for bamba
* Update docs/source/en/model_doc/bamba.md
Bamba paragraph
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
Bamba collection url
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
Update Padding-Free Training to Notes heading
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
update examples
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
Update additional info
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
consistent casing
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
simplify sentences
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Include pipeline and cli examples + fix formatting
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
update cli id
* Update quantization example
* Fix auto code formatter changes
* Update cli command + include BambaModel
* Update docs/source/en/model_doc/bamba.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* we need to check against mapping to be safe
* need to check only when inferring from image type, otherwise it messes up custom code
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Update trocr.md
Docs: add community fine‑tuning notebook link to TrOCR page
* apply suggested changes from PR review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/trocr.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* log: Add logging when user uses split_batches and per_device_train_batch_size
* refactor: remove whitespace from blank line
* Update src/transformers/training_args.py
Change logging level to info
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Fix HQQ model param device transfer issue
* modify a comment
* clean up the code and add test for hqq device/dtype
* fix code quality of imports in hqq test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>