* feat: add flexible Liger Kernel configuration to TrainingArguments
Add support for granular Liger Kernel configuration through a new
`liger_kernel_config` parameter in TrainingArguments. This allows users
to selectively enable/disable specific kernels (rope, swiglu, cross_entropy,
etc.) instead of the current approach that rely on default configuration.
Features:
- Add `liger_kernel_config` dict parameter to TrainingArguments
- Support selective kernel application for all supported models
- Maintain full backward compatibility with existing `use_liger_kernel` flag
Example usage:
```python
TrainingArguments(
use_liger_kernel=True,
liger_kernel_config={
"rope": True,
"swiglu": True,
"cross_entropy": False,
"fused_linear_cross_entropy": True
}
)
Closes#38905
* Address comments and update Liger section in Trainer docs
* siwtch to device agnostic autocast in nemotron to align xpu behavior w/
cuda
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix issue
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* use torch.cast as other modeling code for decision_transformer&gpt2&imagegpt
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* refine
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update get_autocast_gpu_dtype to device agnostic one
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix comments
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Update bamba model card
* Update the doc for bamba
* Update docs/source/en/model_doc/bamba.md
Bamba paragraph
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
Bamba collection url
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
Update Padding-Free Training to Notes heading
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
update examples
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
Update additional info
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
consistent casing
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
simplify sentences
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Include pipeline and cli examples + fix formatting
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
update cli id
* Update quantization example
* Fix auto code formatter changes
* Update cli command + include BambaModel
* Update docs/source/en/model_doc/bamba.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* we need to check against mapping to be safe
* need to check only when inferring from image type, otherwise messes custom code
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Update trocr.md
Docs: add community fine‑tuning notebook link to TrOCR page
* apply suggested changes from PR review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/trocr.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* log: Add logging when user uses split_batches and per_device_train_batch_size
* refactor: remove whitespace from blank line
* Update src/transformers/training_args.py
Change logging level to info
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Fix HQQ model param device transfer issue
* modify a comment
* clear the code and add test for hqq device/dtype
* fix test hqq code quality of imports
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Correctly fix init
Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com>
* add back the block, breaking BC but this is correct author's code
* override the test for params needing it
---------
Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com>
* enable misc test cases on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* tweak bamba ground truth on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* remove print
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* one more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* No more Tuple, List, Dict
* make fixup
* More style fixes
* Docstring fixes with regex replacement
* Trigger tests
* Redo fixes after rebase
* Fix copies
* [test all]
* update
* [test all]
* update
* [test all]
* make style after rebase
* Patch the hf_argparser test
* Patch the hf_argparser test
* style fixes
* style fixes
* style fixes
* Fix docstrings in Cohere test
* [test all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Moved the sources to the right
* small Changes
* Some Changes to moonshine
* Added the install to pipline
* updated the monshine model card
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated Documentation According to changes
* Fixed the model with the commits
* Changes to the roc_bert
* Final Update to the branch
* Adds Quantizaiton to the model
* Finsihed Fixing the Roc_bert docs
* Fixed Moshi
* Fixed Problems
* Fixed Problems
* Fixed Problems
* Fixed Problems
* Fixed Problems
* Fixed Problems
* Added the install to pipline
* updated the monshine model card
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated Documentation According to changes
* Fixed the model with the commits
* Fixed the problems
* Final Fix
* Final Fix
* Final Fix
* Update roc_bert.md
---------
Co-authored-by: Your Name <sohamprabhu@Mac.fios-router.home>
Co-authored-by: Your Name <sohamprabhu@Sohams-MacBook-Air.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* init
* chore: various changes to LightGlue
* chore: various changes to LightGlue
* chore: various changes to LightGlue
* chore: various changes to LightGlue
* Fixed dynamo bug and image padding tests
* refactor: applied refactoring changes from SuperGlue's concat, batch and stack functions to LightGlue file
* tests: removed sdpa support and changed expected values
* chore: added some docs and refactoring
* chore: fixed copy to superpoint.image_processing_superpoint.convert_to_grayscale
* feat: adding batch implementation
* feat: added validation for preprocess and post process method to LightGlueImageProcessor
* chore: changed convert_lightglue_to_hf script to comply with new standard
* chore: changed lightglue test values to match new lightglue config pushed to hub
* chore: simplified convert_lightglue_to_hf conversion map
* feat: adding batching implementation
* chore: make style
* feat: added threshold to post_process_keypoint_matching method
* fix: added missing instructions that turns keypoints back to absolute coordinate before matching forward
* fix: added typehint and docs
* chore: make style
* [run-slow] lightglue
* fix: add matches different from -1 to compute valid matches in post_process_keypoint_matching
* tests: added CUDA proof tests similar to SuperGlue
* chore: various changes to modeling_lightglue.py
- Added "Copies from" statements for copied functions from modeling_superglue.py
- Added missing docstrings
- Removed unused functions or classes
- Removed unnecessary statements
- Added missing typehints
- Added comments to the main forward method
* chore: various changes to convert_lightglue_to_hf.py
- Added model saving
- Added model reloading
* chore: fixed imports in lightglue files
* [run-slow] lightglue
* chore: make style
* [run-slow] lightglue
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* [run-slow] lightglue
* chore: Applied some suggestions from review
- Added missing typehints
- Refactor "cuda" to device variable
- Variable renaming
- LightGlue output order changed
- Make style
* fix: added missing grayscale argument in image processor in case use of SuperPoint keypoint detector
* fix: changed lightglue HF repo to lightglue_superpoint with grayscale default to True
* refactor: make keypoints `(batch_size, num_keypoints, keypoint_dim)` through forward and unsqueeze only before attention layer
* refactor: refactor do_layer_keypoint_pruning
* tests: added tests with no early stop and keypoint pruning
* refactor: various refactoring to modeling_lightglue.py
- Removed unused functions
- Renamed variables for consistency
- Added comments for clarity
- Set methods to private in LightGlueForKeypointMatching
- Replaced tensor initialization to list then concatenation
- Used more pythonic list comprehension for repetitive instructions
* refactor: added comments and renamed filter_matches to get_matches_from_scores
* tests: added copied from statement with superglue tests
* docs: added comment to prepare_keypoint_matching_output function in tests
* [run-slow] lightglue
* refactor: reordered _concat_early_stopped_outputs in LightGlue class
* [run-slow] lightglue
* docs: added lightglue.md model doc
* docs: added Optional typehint to LightGlueKeypointMatchingOutput
* chore: removed pad_images function
* chore: set do_grayscale default value to True in LightGlueImageProcessor
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* docs: added missing LightGlueConfig typehint in nn.Module __init__ methods
* docs: removed unnecessary code in docs
* docs: import SuperPointConfig only from a TYPE_CHECKING context
* chore: use PretrainedConfig arguments `num_hidden_layers` and `num_attention_heads` instead of `num_layers` and `num_heads`
* chore: added organization as arg in convert_lightglue_to_hf.py script
* refactor: set device variable
* chore: added "gelu" in LightGlueConfig as hidden_act parameter
* docs: added comments to reshape.flip.reshape instruction to perform cross attention
* refactor: used batched inference for keypoint detector forward pass
* fix: added fix for SDPA tests
* docs: fixed docstring for LightGlueImageProcessor
* [run-slow] lightglue
* refactor: removed unused line
* refactor: added missing arguments in LightGlueConfig init method
* docs: added missing LightGlueConfig typehint in init methods
* refactor: added checkpoint url as default variable to verify models output only if it is the default url
* fix: moved print message inside if statement
* fix: added log assignment r removal in convert script
* fix: got rid of confidence_thresholds as registered buffers
* refactor: applied suggestions from SuperGlue PR
* docs: changed copyright to 2025
* refactor: modular LightGlue
* fix: removed unnecessary import
* feat: added plot_keypoint_matching method to LightGlueImageProcessor with matplotlib soft dependency
* fix: added missing import error for matplotlib
* Updated convert script to push on ETH org
* fix: added missing licence
* fix: make fix-copies
* refactor: use cohere apply_rotary_pos_emb function
* fix: update model references to use ETH-CVG/lightglue_superpoint
* refactor: add and use intermediate_size attribute in config to inherit CLIPMLP for LightGlueMLP
* refactor: explicit variables instead of slicing
* refactor: use can_return_tuple decorator in LightGlue model
* fix: make fix-copies
* docs: Update model references in `lightglue.md` to use the correct pretrained model from ETH-CVG
* Refactor LightGlue configuration and processing classes
- Updated type hints for `keypoint_detector_config` in `LightGlueConfig` to use `SuperPointConfig` directly.
- Changed `size` parameter in `LightGlueImageProcessor` to be optional.
- Modified `position_embeddings` in `LightGlueAttention` and `LightGlueAttentionBlock` to be optional tuples.
- Cleaned up import statements across multiple files for better readability and consistency.
* refactor: Update LightGlue configuration to enforce eager attention implementation
- Added `attn_implementation="eager"` to `keypoint_detector_config` in `LightGlueConfig` and `LightGlueAttention` classes.
- Removed unnecessary logging related to attention implementation fallback.
- Cleaned up import statements for better readability.
* refactor: renamed message into attention_output
* fix: ensure device compatibility in LightGlueMatchAssignmentLayer descriptor normalization
- Updated the normalization of `m_descriptors` to use the correct device for the tensor, ensuring compatibility across different hardware setups.
* refactor: removed Conv layers from init_weights since LightGlue doesn't have any
* refactor: replace add_start_docstrings with auto_docstring in LightGlue models
- Updated LightGlue model classes to utilize the new auto_docstring utility for automatic documentation generation.
- Removed legacy docstring handling to streamline the code and improve maintainability.
* refactor: simplify LightGlue image processing tests by inheriting from SuperGlue
- Refactored `LightGlueImageProcessingTester` and `LightGlueImageProcessingTest` to inherit from their SuperGlue counterparts, reducing code duplication.
- Removed redundant methods and properties, streamlining the test setup and improving maintainability.
* test: forced eager attention implementation to LightGlue model tests
- Updated `LightGlueModelTester` to include `attn_implementation="eager"` in the model configuration.
- This change aligns the test setup with the recent updates in LightGlue configuration for eager attention.
* refactor: update LightGlue model references
* fix: import error
* test: enhance LightGlue image processing tests with setup method
- Added a setup method in `LightGlueImageProcessingTest` to initialize `LightGlueImageProcessingTester`.
- Included a docstring for `LightGlueImageProcessingTester` to clarify its purpose.
* refactor: added LightGlue image processing implementation to modular file
* refactor: moved attention blocks into the transformer layer
* fix: added missing import
* fix: added missing import in __all__ variable
* doc: added comment about enforcing eager attention because of SuperPoint
* refactor: added SuperPoint eager attention comment and moved functions to the closest they are used
---------
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Earlier PR put executorch specific sdpa and mask function in the export function. This prevent any customization that can be done to sdpa, prior to export. By moving this to __init__, we still keep the original behavior but allow users like optimum-executorch to override sdpa by setting model.config._attn_implementation.
* fixing the problem align_to_words=True leading to duplicate solutions
* adding tests
* some fixes
* some fixes
* changing the handle_duplicate_answers=False by default
* some fixese
* some fixes
* make the duplicate handling the default behaviour and merge duplicates
* make the duplicate handling the default behaviour