* feat: add flexible Liger Kernel configuration to TrainingArguments
Add support for granular Liger Kernel configuration through a new
`liger_kernel_config` parameter in TrainingArguments. This allows users
to selectively enable/disable specific kernels (rope, swiglu, cross_entropy,
etc.) instead of relying on the default configuration.
Features:
- Add `liger_kernel_config` dict parameter to TrainingArguments
- Support selective kernel application for all supported models
- Maintain full backward compatibility with existing `use_liger_kernel` flag
Example usage:
```python
from transformers import TrainingArguments

args = TrainingArguments(
    use_liger_kernel=True,
    liger_kernel_config={
        "rope": True,
        "swiglu": True,
        "cross_entropy": False,
        "fused_linear_cross_entropy": True,
    },
)
```
Closes #38905
* Address comments and update Liger section in Trainer docs
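A minimal sketch of how a per-kernel dict like `liger_kernel_config` can be forwarded as keyword arguments to a patching function, with `use_liger_kernel` still acting as the on/off switch. `patch_model` and `apply_liger` are hypothetical stand-ins, not the Trainer or Liger Kernel API, and the default flags below are assumptions chosen for illustration.

```python
def patch_model(rope=True, swiglu=True, rms_norm=True,
                cross_entropy=False, fused_linear_cross_entropy=True):
    """Hypothetical stand-in for a Liger patching function.

    Returns the sorted names of the kernels it would enable; the defaults are illustrative.
    """
    flags = {
        "rope": rope,
        "swiglu": swiglu,
        "rms_norm": rms_norm,
        "cross_entropy": cross_entropy,
        "fused_linear_cross_entropy": fused_linear_cross_entropy,
    }
    return sorted(name for name, enabled in flags.items() if enabled)


def apply_liger(use_liger_kernel, liger_kernel_config=None):
    """Hypothetical wiring between the two TrainingArguments fields."""
    if not use_liger_kernel:
        return []  # the boolean flag still gates everything (backward compatible)
    # None keeps the patching function's defaults; a dict overrides individual kernels.
    return patch_model(**(liger_kernel_config or {}))


print(apply_liger(True, {"rope": False}))
# ['fused_linear_cross_entropy', 'rms_norm', 'swiglu']
```

With this design an unknown key such as `{"flash_rope": True}` raises `TypeError` instead of being silently ignored.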
* switch to device-agnostic autocast in Nemotron to align XPU behavior with CUDA
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix issue
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* use torch autocast, as other modeling code does, for decision_transformer, gpt2 and imagegpt
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* refine
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update get_autocast_gpu_dtype to the device-agnostic one
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix comments
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Fix HQQ model param device transfer issue
* modify a comment
* clean up the code and add test for hqq device/dtype
* fix code quality of imports in hqq test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Correctly fix init
Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com>
* add back the block; this breaks BC, but it matches the original author's code
* override the test for params needing it
---------
Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com>
* enable misc test cases on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* tweak bamba ground truth on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* remove print
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* one more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* No more Tuple, List, Dict
* make fixup
* More style fixes
* Docstring fixes with regex replacement
* Trigger tests
* Redo fixes after rebase
* Fix copies
* [test all]
* update
* [test all]
* update
* [test all]
* make style after rebase
* Patch the hf_argparser test
* Patch the hf_argparser test
* style fixes
* style fixes
* style fixes
* Fix docstrings in Cohere test
* [test all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
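The "No more Tuple, List, Dict" change refers to annotating with the built-in generic types from PEP 585 (Python 3.9+) instead of the `typing` aliases. A small before/after sketch; `summarize` is a made-up function for illustration.

```python
# Before (typing aliases):
#   from typing import Dict, List, Tuple
#   def summarize(batch: List[List[int]]) -> Tuple[int, Dict[str, int]]: ...

# After (built-in generics, no typing import needed):
def summarize(batch: list[list[int]]) -> tuple[int, dict[str, int]]:
    """Return the number of sequences and their min/max lengths."""
    lengths = [len(seq) for seq in batch]
    return len(batch), {"min": min(lengths), "max": max(lengths)}


print(summarize([[1, 2], [3]]))  # (2, {'min': 1, 'max': 2})
```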
* init
* chore: various changes to LightGlue
* chore: various changes to LightGlue
* chore: various changes to LightGlue
* chore: various changes to LightGlue
* Fixed dynamo bug and image padding tests
* refactor: applied refactoring changes from SuperGlue's concat, batch and stack functions to LightGlue file
* tests: removed sdpa support and changed expected values
* chore: added some docs and refactoring
* chore: fixed copy to superpoint.image_processing_superpoint.convert_to_grayscale
* feat: adding batch implementation
* feat: added validation for preprocess and post process method to LightGlueImageProcessor
* chore: changed convert_lightglue_to_hf script to comply with new standard
* chore: changed lightglue test values to match new lightglue config pushed to hub
* chore: simplified convert_lightglue_to_hf conversion map
* feat: adding batching implementation
* chore: make style
* feat: added threshold to post_process_keypoint_matching method
* fix: added missing instructions that turn keypoints back into absolute coordinates before the matching forward pass
* fix: added typehint and docs
* chore: make style
* [run-slow] lightglue
* fix: add matches different from -1 to compute valid matches in post_process_keypoint_matching
* tests: added CUDA proof tests similar to SuperGlue
* chore: various changes to modeling_lightglue.py
- Added "Copied from" statements for functions copied from modeling_superglue.py
- Added missing docstrings
- Removed unused functions or classes
- Removed unnecessary statements
- Added missing typehints
- Added comments to the main forward method
* chore: various changes to convert_lightglue_to_hf.py
- Added model saving
- Added model reloading
* chore: fixed imports in lightglue files
* [run-slow] lightglue
* chore: make style
* [run-slow] lightglue
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* [run-slow] lightglue
* chore: Applied some suggestions from review
- Added missing typehints
- Refactor "cuda" to device variable
- Variable renaming
- LightGlue output order changed
- Make style
* fix: added missing grayscale argument in image processor for when the SuperPoint keypoint detector is used
* fix: changed lightglue HF repo to lightglue_superpoint with grayscale defaulting to True
* refactor: keep keypoints shaped `(batch_size, num_keypoints, keypoint_dim)` throughout forward and unsqueeze only before the attention layer
* refactor: refactor do_layer_keypoint_pruning
* tests: added tests with no early stop and keypoint pruning
* refactor: various refactoring to modeling_lightglue.py
- Removed unused functions
- Renamed variables for consistency
- Added comments for clarity
- Set methods to private in LightGlueForKeypointMatching
- Replaced tensor initialization with list construction followed by concatenation
- Used more pythonic list comprehension for repetitive instructions
* refactor: added comments and renamed filter_matches to get_matches_from_scores
* tests: added copied from statement with superglue tests
* docs: added comment to prepare_keypoint_matching_output function in tests
* [run-slow] lightglue
* refactor: reordered _concat_early_stopped_outputs in LightGlue class
* [run-slow] lightglue
* docs: added lightglue.md model doc
* docs: added Optional typehint to LightGlueKeypointMatchingOutput
* chore: removed pad_images function
* chore: set do_grayscale default value to True in LightGlueImageProcessor
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* docs: added missing LightGlueConfig typehint in nn.Module __init__ methods
* docs: removed unnecessary code in docs
* docs: import SuperPointConfig only from a TYPE_CHECKING context
* chore: use PretrainedConfig arguments `num_hidden_layers` and `num_attention_heads` instead of `num_layers` and `num_heads`
* chore: added organization as arg in convert_lightglue_to_hf.py script
* refactor: set device variable
* chore: added "gelu" in LightGlueConfig as hidden_act parameter
* docs: added comments to reshape.flip.reshape instruction to perform cross attention
* refactor: used batched inference for keypoint detector forward pass
* fix: added fix for SDPA tests
* docs: fixed docstring for LightGlueImageProcessor
* [run-slow] lightglue
* refactor: removed unused line
* refactor: added missing arguments in LightGlueConfig init method
* docs: added missing LightGlueConfig typehint in init methods
* refactor: added checkpoint URL as a default variable to verify the model's output only if it is the default URL
* fix: moved print message inside if statement
* fix: added log assignment r removal in convert script
* fix: got rid of confidence_thresholds as registered buffers
* refactor: applied suggestions from SuperGlue PR
* docs: changed copyright to 2025
* refactor: modular LightGlue
* fix: removed unnecessary import
* feat: added plot_keypoint_matching method to LightGlueImageProcessor with matplotlib soft dependency
* fix: added missing import error for matplotlib
* Updated convert script to push on ETH org
* fix: added missing licence
* fix: make fix-copies
* refactor: use cohere apply_rotary_pos_emb function
* fix: update model references to use ETH-CVG/lightglue_superpoint
* refactor: add and use intermediate_size attribute in config to inherit CLIPMLP for LightGlueMLP
* refactor: explicit variables instead of slicing
* refactor: use can_return_tuple decorator in LightGlue model
* fix: make fix-copies
* docs: Update model references in `lightglue.md` to use the correct pretrained model from ETH-CVG
* Refactor LightGlue configuration and processing classes
- Updated type hints for `keypoint_detector_config` in `LightGlueConfig` to use `SuperPointConfig` directly.
- Changed `size` parameter in `LightGlueImageProcessor` to be optional.
- Modified `position_embeddings` in `LightGlueAttention` and `LightGlueAttentionBlock` to be optional tuples.
- Cleaned up import statements across multiple files for better readability and consistency.
* refactor: Update LightGlue configuration to enforce eager attention implementation
- Added `attn_implementation="eager"` to `keypoint_detector_config` in `LightGlueConfig` and `LightGlueAttention` classes.
- Removed unnecessary logging related to attention implementation fallback.
- Cleaned up import statements for better readability.
* refactor: renamed message into attention_output
* fix: ensure device compatibility in LightGlueMatchAssignmentLayer descriptor normalization
- Updated the normalization of `m_descriptors` to use the correct device for the tensor, ensuring compatibility across different hardware setups.
* refactor: removed Conv layers from init_weights since LightGlue doesn't have any
* refactor: replace add_start_docstrings with auto_docstring in LightGlue models
- Updated LightGlue model classes to utilize the new auto_docstring utility for automatic documentation generation.
- Removed legacy docstring handling to streamline the code and improve maintainability.
* refactor: simplify LightGlue image processing tests by inheriting from SuperGlue
- Refactored `LightGlueImageProcessingTester` and `LightGlueImageProcessingTest` to inherit from their SuperGlue counterparts, reducing code duplication.
- Removed redundant methods and properties, streamlining the test setup and improving maintainability.
* test: forced eager attention implementation to LightGlue model tests
- Updated `LightGlueModelTester` to include `attn_implementation="eager"` in the model configuration.
- This change aligns the test setup with the recent updates in LightGlue configuration for eager attention.
* refactor: update LightGlue model references
* fix: import error
* test: enhance LightGlue image processing tests with setup method
- Added a setup method in `LightGlueImageProcessingTest` to initialize `LightGlueImageProcessingTester`.
- Included a docstring for `LightGlueImageProcessingTester` to clarify its purpose.
* refactor: added LightGlue image processing implementation to modular file
* refactor: moved attention blocks into the transformer layer
* fix: added missing import
* fix: added missing import in __all__ variable
* doc: added comment about enforcing eager attention because of SuperPoint
* refactor: added SuperPoint eager attention comment and moved functions closer to where they are used
---------
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
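A minimal sketch of the match filtering described in the post-processing commits: a match index of -1 marks an unmatched keypoint, and remaining matches are kept only if their score exceeds a threshold. This is plain Python over lists for illustration; `filter_matches` and its argument layout are assumptions, not the signature of `LightGlueImageProcessor.post_process_keypoint_matching`.

```python
def filter_matches(keypoints0, keypoints1, matches0, scores0, threshold=0.0):
    """Hypothetical helper: keep (kp0, kp1, score) for valid matches above `threshold`.

    matches0[i] is the index of the keypoint in image 1 matched to keypoint i of
    image 0, or -1 when keypoint i has no match.
    """
    kept = []
    for i, (j, score) in enumerate(zip(matches0, scores0)):
        if j != -1 and score > threshold:
            kept.append((keypoints0[i], keypoints1[j], score))
    return kept


keypoints0 = [(0, 0), (1, 1), (2, 2)]
keypoints1 = [(10, 10), (11, 11)]
matches0 = [1, -1, 0]
scores0 = [0.9, 0.0, 0.3]
print(filter_matches(keypoints0, keypoints1, matches0, scores0, threshold=0.5))
# [((0, 0), (11, 11), 0.9)]
```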
* fix the problem of align_to_words=True leading to duplicate answers
* adding tests
* some fixes
* some fixes
* change handle_duplicate_answers to default to False
* some fixes
* some fixes
* make the duplicate handling the default behaviour and merge duplicates
* make the duplicate handling the default behaviour
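A sketch of merging duplicate answers, assuming duplicates are answers that resolve to the same character span (as can happen when several token spans are aligned to the same words with `align_to_words=True`) and that their scores are summed. Both the summing rule and the `merge_duplicate_answers` helper are assumptions; keeping the maximum score would be an equally plausible choice.

```python
def merge_duplicate_answers(answers):
    """Hypothetical helper: merge answers that share the same (start, end) span.

    Scores of duplicates are summed (an assumption); results are sorted by score.
    """
    merged = {}
    for answer in answers:
        key = (answer["start"], answer["end"])
        if key in merged:
            merged[key]["score"] += answer["score"]
        else:
            merged[key] = dict(answer)  # copy so the input dicts are not mutated
    return sorted(merged.values(), key=lambda a: a["score"], reverse=True)


answers = [
    {"answer": "Paris", "start": 10, "end": 15, "score": 0.25},
    {"answer": "Paris", "start": 10, "end": 15, "score": 0.25},
    {"answer": "France", "start": 20, "end": 26, "score": 0.4},
]
print(merge_duplicate_answers(answers))
```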
* adding model and conversion scripts
* add imports to test vjepa conversion
* fix imports and make conversion work
* fix computation for short side
* replace attention with library attention function
* cleanup more attention classes
* remove config overrides
* add test cases, fix some of the failing ones
* fix the model outputs
* fix outputs of the model per review
* fix too big model test case
* fix styling __init__.py
* fix initialization test
* remove all asserts per review
* update sorting/unsorting logic as per feedback
* remove is_video per review
* remove another is_video segment
* remove unwanted stuff
* small fixes
* add docstrings for the model
* revert adding vjepa2 config here
* update styling
* add config docstrings (wip)
* fix dpr issue
* remove issues causing test failures
* update styles
* merge predictor configs into main config
* remove processing code, add video processor
* remove permute which is not necessary now
* fix styles
* updated vjepa2 to be in video_processing_auto
* update comment for preprocessing
* test integration test and fix the outputs
* update test values, change test to look at repeated frames for a given image
* add a simple video processing test
* refactoring pixel_values_videos and upload ckpts to original
* fix torch_fx test cases
* remove unused config
* add all config docstrings
* add more integration tests
* add basic doc
* revert unwanted styling changes
* working make fixup
* Fix model_type in config
* Add ForVideoClassification model
* update attention implementation to fit new hf standards
* fix the preprocessing logic, ensure it matches the original model
* remove use_rope logic, cleanup
* fix docstrings
* Further cleanup, update doc
* Fix model prefix
* fix get_vision_features
* VJEPA2Embeddings style refactor
* nit, style comment
* change modules default values
* Only `str` activation in config
* GradientCheckpointingLayer
* fixup
* fix conversion script
* Remove return_dict
* remove None return typehint
* Refactor VJEPA2Layer, remove use_SiLU
* Fix fx tests
* dpr -> drop_path_rates
* move *ModelOutput on top
* format docs a bit
* update docs
* update docs
* update doc example
* remove prune_heads from model
* remove unused config params
* refactor embed signature
* Add vjepa to docs
* Fix config docstring
* attention head
* update defaults
* Update docs/source/en/model_doc/vjepa2.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/vjepa2.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Fix import
* Min refactoring
* Update HUB_SOURCE and HUB_REPO in conversion script
* Add missing headers
* VJEPA -> V-JEPA in docs
* Add image to doc
* fix style
* fix init weights
* change checkpoint name in modeling tests
* Initial cls head setup
* remove RoPE attention from head (not needed)
* remove swigluffn - not needed
* Add siglip layer
* Replace with siglip layer
* Rename Siglip -> VJEPA2
* remove unused modules
* remove siglip mlp
* nit
* remove MLP
* Refactor head cross attention
* refactor VJEPA2HeadCrossAttentionLayer
* nit renaming
* fixup
* remove commented code
* Add cls head params to config
* depth from config
* move pooler + classifier to the model
* Update for cls model signature
* move layers, rename a bit
* fix docs
* update weights init
* remove typehint for init
* add to auto-mapping
* enable tests
* Add conversion script
* fixup
* add to docs
* fix docs
* nit
* refactor for mapping
* clean
* Add integration test
* Fixing multi gpu test
* update no-split modules
* update video cls test tolerance
* Increase test_inference_image tolerance
* Update no-split modules for multi gpu
* Apply suggestions from code review
* fixing multi-gpu
* fix docstring
* Add cls snippet to docs
* Update checkpoint
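The "sorting unsorting logic" commit concerns reordering tokens and later restoring their original order. The general technique is an argsort followed by its inverse permutation; the sketch below shows it in plain Python and is not taken from the VJEPA2 code (the helper names are hypothetical).

```python
def argsort(values):
    """Indices that would sort `values` in ascending order."""
    return sorted(range(len(values)), key=values.__getitem__)


def sort_then_restore(tokens, position_ids):
    # Permutation that orders tokens by their position ids.
    order = argsort(position_ids)
    sorted_tokens = [tokens[i] for i in order]
    # The argsort of a permutation is its inverse, which undoes the reordering.
    inverse = argsort(order)
    restored = [sorted_tokens[i] for i in inverse]
    return sorted_tokens, restored


sorted_tokens, restored = sort_then_restore(["a", "b", "c", "d"], [2, 0, 3, 1])
print(sorted_tokens, restored)  # ['b', 'd', 'a', 'c'] ['a', 'b', 'c', 'd']
```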
* Refactor DBRX tests to use CausalLMModelTest base classes
- Changed DbrxModelTester to inherit from CausalLMModelTester
- Changed DbrxModelTest to inherit from CausalLMModelTest
- Removed duplicate methods that are already in base classes
- Added required class attributes for model classes
- Updated pipeline_model_mapping to include feature-extraction
- Kept DBRX-specific configuration and test methods
- Disabled RoPE tests as DBRX's rotary embedding doesn't accept a config parameter
This refactoring reduces code duplication and follows the pattern established
in other causal LM model tests like Gemma.
* Apply style fixes
* Trigger tests
* Refactor DBRX test
* Make sure the DBRX-specific settings are handled
* Use the attribute_map
* Fix attribute map
---------
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
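The `attribute_map` commits rely on mapping standard config names onto model-specific ones, so shared test base classes can read `hidden_size` even when the config stores a differently named field. The class below is a minimal stand-in showing only the read path; it is not the `PretrainedConfig` implementation, and the DBRX-style field names (`d_model`, `n_heads`, `n_layers`) are assumed for illustration.

```python
class AliasedConfig:
    """Minimal stand-in for attribute aliasing (read path only)."""

    # Standard name -> model-specific name (assumed DBRX-style fields).
    attribute_map = {
        "hidden_size": "d_model",
        "num_attention_heads": "n_heads",
        "num_hidden_layers": "n_layers",
    }

    def __init__(self, d_model, n_heads, n_layers):
        self.d_model = d_model
        self.n_heads = n_heads
        self.n_layers = n_layers

    def __getattr__(self, name):
        # Only called when normal lookup fails: redirect standard names.
        mapped = type(self).attribute_map.get(name)
        if mapped is None:
            raise AttributeError(name)
        return getattr(self, mapped)


cfg = AliasedConfig(d_model=64, n_heads=4, n_layers=2)
print(cfg.hidden_size, cfg.num_attention_heads, cfg.num_hidden_layers)  # 64 4 2
```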
* Unbreak optimum-executorch
* use static cache if the config has layer_types but no sliding_window
* revert view on kv_arange
---------
Co-authored-by: Guang Yang <guangyang@fb.com>
* remove it from all py files
* remove it from the doc
* remove it from examples
* style
* remove traces of _fast_init
* Update test_peft_integration.py
* CIs
* apply updates to SmolVLM (still needs a workaround for the chat template)
* add other models
* dump qwen omni for now, come back later
* port qwen omni from their impl
* wait, all Qwen models sample videos the same way!
* clean up
* make smolvlm backwards compatible and fix padding
* fix some tests
* fix smolvlm tests
* more clean up and test fixing
* delete unused arg
* fix
* address comments
* style
* fix test
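For the video-sampling commits, a common approach is to choose a frame count from the target frames per second and then spread the indices evenly over the clip. The sketch below assumes that approach, including rounding the count down to a multiple of 2; `sample_frame_indices` is hypothetical and is not the processors' actual code.

```python
def sample_frame_indices(total_frames, video_fps, target_fps, frame_factor=2):
    """Hypothetical helper: evenly spaced frame indices at roughly `target_fps`."""
    # Frames to keep if the clip is resampled at target_fps.
    num_frames = int(total_frames / video_fps * target_fps)
    # Round down to a multiple of frame_factor (assumed), keep at least frame_factor
    # frames, and never ask for more frames than the clip has.
    num_frames = max(frame_factor, num_frames - num_frames % frame_factor)
    num_frames = min(num_frames, total_frames)
    if num_frames == 1:
        return [0]
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


print(sample_frame_indices(total_frames=100, video_fps=25, target_fps=2))
```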
* adding model and conversion scripts
* add imports to test vjepa conversion
* fix imports and make conversion work
* fix computation for short side
* replace attention with library attention function
* cleanup more attention classes
* remove config overrides
* add test cases, fix some of the failing ones
* fix the model outputs
* fix outputs of the model per review
* fix too big model test case
* fix styling __init__.py
* fix initialization test
* remove all asserts per review
* update sorting/unsorting logic as per feedback
* remove is_video per review
* remove another is_video segment
* remove unwanted stuff
* small fixes
* add docstrings for the model
* revert adding vjepa2 config here
* update styling
* add config docstrings (wip)
* fix dpr issue
* remove issues causing test failures
* update styles
* merge predictor configs into main config
* remove processing code, add video processor
* remove permute which is not necessary now
* fix styles
* updated vjepa2 to be in video_processing_auto
* update comment for preprocessing
* test integration test and fix the outputs
* update test values, change test to look at repeated frames for a given image
* add a simple video processing test
* refactoring pixel_values_videos and upload ckpts to original
* fix torch_fx test cases
* remove unused config
* add all config docstrings
* add more integration tests
* add basic doc
* revert unwanted styling changes
* working make fixup
* Fix model_type in config
* update attention implementation to fit new hf standards
* fix the preprocessing logic, ensure it matches the original model
* remove use_rope logic, cleanup
* fix docstrings
* Further cleanup, update doc
* Fix model prefix
* fix get_vision_features
* VJEPA2Embeddings style refactor
* nit, style comment
* change modules default values
* Only `str` activation in config
* GradientCheckpointingLayer
* fixup
* fix conversion script
* Remove return_dict
* remove None return typehint
* Refactor VJEPA2Layer, remove use_SiLU
* Fix fx tests
* dpr -> drop_path_rates
* move *ModelOutput on top
* format docs a bit
* update docs
* update docs
* update doc example
* remove prune_heads from model
* remove unused config params
* refactor embed signature
* Add vjepa to docs
* Fix config docstring
* update defaults
* Update docs/source/en/model_doc/vjepa2.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/vjepa2.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Fix import
* Min refactoring
* Update HUB_SOURCE and HUB_REPO in conversion script
* Add missing headers
* VJEPA -> V-JEPA in docs
* Add image to doc
* fix style
* fix init weights
* change checkpoint name in modeling tests
---------
Co-authored-by: Koustuv Sinha <koustuv.sinha@mail.mcgill.ca>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Koustuv Sinha <koustuvsinha@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* since 0.11.0, torchao.prototype.low_bit_optim has been promoted to torchao.optim
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix review comments
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
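A sketch of tolerating both import locations of the low-bit optimizers. It tries the promoted `torchao.optim` module first and falls back to `torchao.prototype.low_bit_optim`; this uses a try-import fallback rather than a version check, and `import_first` is a hypothetical helper.

```python
import importlib
from types import ModuleType
from typing import Optional


def import_first(*module_names: str) -> Optional[ModuleType]:
    """Hypothetical helper: return the first module that imports successfully, else None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:  # includes ModuleNotFoundError
            continue
    return None


# Prefer the promoted location, fall back to the old prototype path (None if torchao is absent).
low_bit_optim = import_first("torchao.optim", "torchao.prototype.low_bit_optim")
```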
* enable glm4 integration cases on XPU, set xpu expectation for blip2
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* refine wording
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* refine test case names
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* run
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* add gemma2 and chameleon
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix review comments
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>