* Update igbird_pegasus.md
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Updating Gemma 3n docs and docstrings to clarify the relationship
between the newly trained audio encoder used in Gemma 3n and the USM
model from the original paper.
* Add Fast Image Processor for Chameleon
* add warning to resize and move blend_rgba to convert_to_rgb
* Remove unrelated files
* Update image_processing_chameleon_fast to use auto_docstring
* fix equivalence test
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* add fast image processor nougat
* test fixes
* docstring white space
* last fixes
* docstring_type
* tolerance unit test
* fix tolerance
* fix rtol
* remove trailing white space
* remove white space
* note for tolerance unit test
* fix tests
* remove print
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* Update PEGASUS-X model card
* Add cache_implementation argument in quantization code example
* Update CLI example
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Remove TensorFlow and Flax badges
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs: first draft to more standard SuperPoint documentation
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs: reverted changes on Auto classes
* docs: addressed the rest of the comments
* docs: remove outdated reference to keypoint detection task guide in SuperPoint documentation
* Update superpoint.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Gemma 3n
* initial commit of Gemma 3n scaffold
* Fixing param pass through on Gemma3p5RMSNorm
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)
* Adding gemma3p5 text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Removing altup configs to accept the suggested configs
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating altup config
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3n (#3)
* Initial Gemma3nTextModel (#4)
NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update src/transformers/cache_utils.py
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3.5
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Normalizing on altup_num_inputs config option
* regenerating modeling file after syncing to HEAD
* Use torch.std(..., unbiased=False) for activation sparsity (#8)
* Refactoring to a single QVK Norm (#13)
* AltUp: support scale_corrected_output (#14)
* Converts einsums to nn.Linear (#7)
* Converts einsums to nn.Linear
* Removing unused variables
* Aligning SharedKVCache with HybridCache (#11)
* Aligning SharedKVStore with HybridCache
* Remove KVStore. Refactor apply_rotary_pos_emb for sharing
* Addressing review comments
* Supporting split modality embeddings in Gemma3n (#10)
* Adding the Embedder class
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation
* Apply suggestions from code review
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Addressing review comments, prop drilling audio and vision configs to the text config
* Removing TODOs that have been addressed
* Simplify Embedder init and add audio embeddings
* Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder
* Refactoring vision and audio embeddings into ConditionalGeneration model
---------
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating attention mask for Gemma 3.5 (#15)
* xxx_token_index to xxx_token_id
* removing deprecated last_cache_position
* Removing references to SigLIP
* Always init per-layer inputs
* Using torch.finfo().min for epsilon_tensor
* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas
* fix modular GEMMA3N_INPUTS_DOCSTRING
* Gemma3nAttention inherits from Gemma3Attention
* Modular inheritance fixes
* CausalLM conversion script for 4B model (#16)
* Add Gemma3n Audio Encoder (#6)
* initial commit of Gemma 3.5 scaffold
* Fixing param pass through on Gemma3nRMSNorm
* Adds Einsum layer to Gemma 3.5
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma3n overall and text config with vision and audio config placeholders (#3)
* Adding gemma3n text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Removing altup configs to accept the suggested configs
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating altup config
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3.5 (#3)
* Initial Gemma3nTextModel (#4)
NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3.5
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update src/transformers/cache_utils.py
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3.5
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Normalizing on altup_num_inputs config option
* Adding audio encoder config
* Adds high-level components for Audio Encoder
* Implement uniform reducer for Audio Encoder
* Adding placeholders for Conformer components in Audio Encoder
* Adding placeholders for SubSampleConvProjection components in Audio Encoder
* Adding SequenceLayer component placeholders
* Implementing Gemma3nAudioEncoder with nn.Sequential
* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential
* Implementing Conformer model with SequenceLayers
* Use OrderedDict in nn.Sequential initializers
* Implements sl.Residual in Torch with nn.Sequential and OrderedDict
* Adopting a base SequenceLayer class with default forward() method
* Implementing sl.GatedLinearUnit in Torch
* Implementing sl.Swish in Torch
* Implementing sl.ReLU in Torch
* Implementing sl.Scale in Torch
* Removing sl.Dropout after tree-shaking
* Implementing sl.RMSNorm in Torch with fake shape
* Implementing sl.GroupNorm in Torch
* Implementing sl.Conv2d in Torch
* Implementing sl.Dense in Torch
* Removing sl.Delay layers, which act as pass-throughs
* Connecting shapes to configs in initializers
* Removing sl.Emit
* Implementing sl.ExpandDims in Torch
* Adding sl.GradientClipping to Torch
* Implementing sl.DenseShaped in Torch
* Implementing sl.LDPA in Torch
* Removing unused sl.CombinedQKVProj class
* Fixing erroneous type hint
* Implementing sl.DepthwiseConv1D in Torch
* Implementing sl.MaskInvalid in Torch
* Fixes for initialization
* Fixes for saving weights
* Removing einsums per feedback from HF staff
* Removing Sequence Layers idioms from audio encoder
* Fixes for reviewer comments
* CausalLM conversion script for 4B model
* inv_timescales to non-persistent buffer
* Addressing audio encoder Attention feedback
* Addressing Gemma3nAudioSSCPConvBlock feedback
* Addressing Gemma3nAudioConformerAttention feedback
* Addressing padding feedback
* Weights conversion loads audio state dict
* Always use vision_config so saving works
* Token id updates for configs
* Stubs for interleaving audio embs
* Addressing reviewer feedback
---------
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
* Fixing cache access error
* Removing duplicate code from a bad merge
* Gemma 3n Text + Vision Part 1 (#17)
* testing utilities for numerics comparisons
* Corrected einsum to nn.Linear weights conversion
* Inherit scaled word embs from Gemma3 not Bart
* Fixing transposes for collapsed linears
* More transpose fixes
* numpy api fix
* RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True
* Force AltUp to float32
* Updating debugging script for AudioEncoder debugging
* Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs
* Correcting attention einsum conversions
* RMSNorm in type of x
* Fixing duplicate laurel norm/gating
* KV sharing using the right previous indices
* Refactor kv shared index computation. Correct frac_shared_layers
* Use num_shared_layers instead of inferring from a fraction
* fixing a bug for logging
* Fix shared data_ptrs in altup inits
* rope: adjust proj -> norm -> rope to preserve computation (#20)
* rope: adjust proj -> norm -> rope to preserve computation
* Removing some breaking language model fluff in ConditionalGeneration
* Consolidate query_states transforms
---------
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Vectorize the loops in AltUp (#19)
* Vectorize the loops in AltUp
* fix typo
* Expanding to support batched inputs
* remove extra debug script
* Fix AltUp.forward
---------
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel
* Convert norm to 1/sqrt (#21)
* Convert norm to 1/sqrt
* Scale shift change per Phil's rec
* Adding default activation sparsity
* Fixing 2B config in weights conversion script
* Fixing RMSNorm parameters - adding scale_shift and with_scale
* Correcting query pre-attention scaling
* Adding query_rescale_scalar to text config
* Adding layer_idx to MLP
* Permafix for input_layernorm
* Use 1/sqrt instead of rsqrt in DecoderLayer
* Fix o_proj conversion
* Conversion script update for vision encoder
* Removing logging for debugging timm model
* Fixing bugs in Gemma3nForConditionalGeneration for text generation
* Generating the modeling_gemma3n.py file
* Removing the addition of an erroneous line in the modeling file
* Adding gemma3n text model to modeling_auto
* Bugfix: Updating the interleaving of inputs_embeds and vision_embeds
* Updating the modeling file with the latest bugfix changes
* Updating models/auto for Gemma 3n
* using AutoTokenizer in forward test
* Adding processing_gemma3n.py
* Gemma 3n configured for AutoModel. Conversion script updated.
* Removing errant merge artifacts
---------
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
* Removing errant debugging statements from Gemma 3
* Gemma3n audio model (#18)
* testing utilities for numerics comparisons
* Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock
* Add audio version of forward script based on RyanMullins' implementation
* Updating to match encoder tests. WIP: config question needs resolving
* Updates to audio classes to enable end-to-end running
* Removing vestigial classes, cleaning up print statements
* Adding SiLU / Swish to audio conformer feed forward block
* Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio
* Adding outputs to audio test
* Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model
* Update forward test to load from local weights
* Update conversion to process / output audio layers
* Update __all__ to export audio encoder
* AutoModel registration for Gemma 3n Audio
* Use AutoModel for ConditionalGeneration.audio_tower
* Fixing input_proj_linear transpose
* Fixing Gemma3NanoAudioConformerAttention.post conversion
* Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion
* Correcting indentation issue on Gemma3p5RMSNorm
---------
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Text + Vision Part 2 (#23)
* Updates for ConditionalGeneration.get_image_features
* Adding a WIP draft of image_processing_gemma3p5.py
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Modular conversion after github suggested change
* Text + image gives good results
* Fixing image size preset
* Updating configs for the 2B variant in the conversion script
* Using final generation config in conversion script
---------
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Audio Integration (#12)
* initial commit of Gemma 3n scaffold
* Fixing param pass through on Gemma3nRMSNorm
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)
* Adding Gemma 3n text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Removing altup configs to accept the suggested configs
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating altup config
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3n (#3)
* Initial Gemma3nTextModel (#4)
NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update src/transformers/cache_utils.py
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3n
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Normalizing on altup_num_inputs config option
* Adding audio encoder config
* Adds high-level components for Audio Encoder
* Implement uniform reducer for Audio Encoder
* Adding placeholders for Conformer components in Audio Encoder
* Adding placeholders for SubSampleConvProjection components in Audio Encoder
* Adding SequenceLayer component placeholders
* Implementing Gemma3nAudioEncoder with nn.Sequential
* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential
* Implementing Conformer model with SequenceLayers
* Use OrderedDict in nn.Sequential initializers
* Implements sl.Residual in Torch with nn.Sequential and OrderedDict
* Adopting a base SequenceLayer class with default forward() method
* Implementing sl.GatedLinearUnit in Torch
* Implementing sl.Swish in Torch
* Implementing sl.ReLU in Torch
* Implementing sl.Scale in Torch
* Removing sl.Dropout after tree-shaking
* Implementing sl.RMSNorm in Torch with fake shape
* Implementing sl.GroupNorm in Torch
* Implementing sl.Conv2d in Torch
* Implementing sl.Dense in Torch
* Removing sl.Delay layers, which act as pass-throughs
* Connecting shapes to configs in initializers
* Removing sl.Emit
* Implementing sl.ExpandDims in Torch
* Adding sl.GradientClipping to Torch
* Implementing sl.DenseShaped in Torch
* Implementing sl.LDPA in Torch
* Removing unused sl.CombinedQKVProj class
* Fixing erroneous type hint
* Implementing sl.DepthwiseConv1D in Torch
* Implementing sl.MaskInvalid in Torch
* Fixes for initialization
* Fixes for saving weights
* Removing einsums per feedback from HF staff
* Removing Sequence Layers idioms from audio encoder
* Fixes for reviewer comments
* Converting sl.Frontend to FeatureExtractor
* Updates for ConditionalGeneration.get_image_features
* Adding a WIP draft of image_processing_gemma3n.py
* Update modular
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Modular conversion after github suggested change
* Text + image gives good results
* Fixing image size preset
* Draft of audio data in chat template
* Removing image processing. Using SigLIP instead.
* Audio input going end-to-end
* Fixing dtype issues in audio encoder
* x-lib formatting consistency
* Adding example data
* Save preprocessor_config.json from conversion script
* Instrumentation for debugging
* Additional instrumentation for preprocessing debugging
* Updates to preprocessor, padding; produces correct end-to-end results on sample
* Tackling configuration TODOs
* Start of feature extractor refactor
* Adds Numpy version of USM extractor, removes Torch version and dependencies
* Fixing AltUp.correct coef permute
* Supporting batches of single audio segment inputs
* Docstrings updates for config
* In-lining audio feature extraction
* Adjustments to conversion script and smoke test script
---------
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
* Gemma 3n renaming
* Removing test data and utilities
* Renaming test files
* Gemma 3n refactor
* Fix tokenizer config in conversion script
* Address reviewer feedback
* FeatureExtractor returns float32 by default
* Adding basic tests for audio, and input name for audio encoder
* Audio integration test, updates to model_id for other integration tests
* Use scales for q and k norms (#26)
* Update audio integration test to use HF dataset
* Reviewer feedback
* Expand embedding table to full vocab size in weights conversion
* Mix-n-match MatFormers for Gemma 3n (#25)
* Remove in-place operations (#30)
* chore: removing inplace ops
* remove [tensor] * n pattern
* chore: reviewer feedback in AudioEncoder and AltUp
* More grad clipping
* Dynamo compatibility
* fix: cache slicing error
* chore: simplify shared kv cache slicing
* chore: vision encoder rename in timm
* fix: image processor do_normalize=False
* fixup: style
* chore: model_doc
* fix: docs for code quality
* chore: repo consistency
* fix: RMSNorm in float as in prior Gemmas
* fix: per_layer_inputs = None
* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint
* chore: repo consistency
* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)
* Add initial unit tests for Gemma3nAudioFeatureExtractor
* Add basic unit tests for Gemma3nProcessor (#28)
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
* parameterize tests
---------
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
* chore: code style
* fix: test cases
* style and consistency
* fix config in the test to be coherent with layer cache sharing
* fix hidden states in tests and code
* inits and mappings
* fix modality prefixes
* test order and prefixes
* fix test exception
* fix class order and reduce model size for faster tests
* restore _checkpoint_conversion_mapping to load Causal from Conditional
* fix config mapping!
* fix: reviewer feedback
---------
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* fix import test
* add model args
* auto_docstring
* replace test path
* consistency
* skip tests for now
* fix docstring for doc builder
* skip unused attr
---------
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
* add dia model
* add tokenizer files
* cleanup some stuff
* brute copy-paste code
* rough cleanup of the modeling code
* nuke some stuff
* more nuking
* more cleanups
* updates
* add multiLayerEmbedding vectorization
* nits
* more modeling simplifications
* updates
* update rope
* update rope
* just fixup
* update configuration files
* more cleanup!
* default config values
* update
* forgotten comma
* another comma!
* update, more cleanups
* just more nits
* more config cleanups
* time for the encoder
* fix
* small nit
* nits
* n
* refactor a bit
* cleanup
* update cv script
* fix last issues
* fix last nits
* styling
* small fixes
* just run 1 generation
* fixes
* nits
* fix conversion
* fix
* more fixes
* full generate
* ouf!
* fixes!
* updates
* fix
* fix cvrt
* fixup
* nits
* delete wrong test
* update
* update
* test tokenization
* let's start changing things bit by bit - fix encoder step
* removing custom generation, moving to GenerationMixin
* add encoder decoder attention masks for generation
* mask changes, correctness checked against ad29837 in dia repo
* refactor a bit already --> next cache
* too important not to push :)
* minimal cleanup + more todos
* make main overwrite modeling utils
* add cfg filter & eos filter
* add eos countdown & delay pattern
* update eos countdown
* add max step eos countdown
* fix tests
* fix some things
* fix generation with testing
* move cfg & eos stuff to logits processor
* make RepetitionPenaltyLogitsProcessor flexible
- can accept 3D scores like (batch_size, channel, vocab)
* fix input_ids concatenation dimension in GenerationMixin for flexibility
* Add DiaHangoverLogitsProcessor and DiaExponentialDecayLengthPenalty classes; refactor logits processing in DiaForConditionalGeneration to utilize new configurations and improve flexibility.
* Add stopping criteria
* refactor
* move delay pattern from processor to modeling like musicgen.
- add docs
- change eos countdown to eos delay pattern
* fix processor & fix tests
* refactor types
* refactor imports
* format code
* fix docstring to pass ci
* add docstring to DiaConfig & add DiaModel to test
* fix docstring
* add docstring
* fix some bugs
* check
* porting / merging results from other branch - IMPORTANT: it very likely breaks generation, the goal is to have a proper forward path first
* experimental testing of left padding for first channel
* whoops
* Fix merge to make generation work
* fix cfg filter
* add position ids
* add todos, break things
* revert changes to generation --> we will force 2d but go 3d on custom stuff
* refactor a lot, change prepare decoder ids to work with left padding (needs testing), add todos
* some first fixes to get to 10. in generation
* some more generation fixes / adjustment
* style + rope fixes
* move cfg out, simplify a few things, more todos
* nit
* start working on custom logit processors
* nit
* quick fixes
* cfg top k
* more refactor of logits processing, needs a decision if gen config gets the new attributes or if we move it to config or similar
* let's keep changes to core code minimal, only eos scaling is questionable atm
* simpler eos delay logits processor
* that was for debugging :D
* proof of concept rope
* small fix on device mismatch
* cfg fixes + delay logits max len
* transformers rope
* modular dia
* more cleanup
* keep modeling consistently 3D, generate handles 2D internally
* decoder starts with bos if nothing
* post processing prototype
* style
* lol
* force sample / greedy + fixes on padding
* style
* fixup tokenization
* nits
* revert
* start working on dia tests
* fix a lot of tests
* more test fixes
* nit
* more test fixes + some features to simplify code more
* more cleanup
* forgot that one
* autodocs
* small consistency fixes
* fix regression
* small fixes
* dia feature extraction
* docs
* wip processor
* fix processor order
* processing goes brrr
* transpose before
* small fix
* fix major bug but needs now a closer look into the custom processors esp cfg
* small thing on logits
* nits
* simplify indices and shifts
* add simpler version of padding tests back (temporarily)
* add logit processor tests
* starting tests on processor
* fix mask application during generation
* some fixes on the weights conversion
* style + fixup logits order
* simplify conversion
* nit
* remove padding tests
* nits on modeling
* hmm
* fix tests
* trigger
* probably gonna be reverted, just a quick design around audio tokenizer
* fixup typing
* post merge + more typing
* initial design for audio tokenizer
* more design changes
* nit
* more processor tests and style related things
* add to init
* protect import
* not sure why tbh
* add another protect
* more fixes
* wow
* it aint stopping :D
* another missed type issue
* ...
* change design around audio tokenizer to prioritize init and go for auto - in regards to the review
* change to new causal mask function + docstrings
* change ternary
* docs
* remove todo, i dont think its essential tbh
* remove pipeline as current pipelines do not fit in the current scheme, same as csm
* closer to wrapping up the processor
* text to audio, just for demo purposes (will likely be reverted)
* check if it's this
* save audio function
* ensure no grad
* fixes on prefixed audio, hop length is used via preprocess dac, device fixes
* integration tests (tested locally on a100) + some processor utils / fixes
* style
* nits
* another round of smaller things
* docs + some fixes (generate one might be big)
* mystery solved
* small fix on conversion
* add abstract audio tokenizer, change init check to abstract class
* nits
* update docs + fix some processing :D
* change inheritance scheme for audio tokenizer
* delete dead / unnecessary code in copied generate loop
* last nits on new pipeline behavior (+ todo on tests) + style
* trigger
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
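As a rough illustration of the delay pattern mentioned in the Dia commits above ("move delay pattern from processor to modeling like musicgen"), the sketch below shifts each audio codebook channel right by its own delay and pads the gaps. The function names, the `pad_id` value, and the example delays are hypothetical and are not taken from the Dia implementation.

```python
def apply_delay_pattern(codes, delays, pad_id):
    """Shift channel i right by delays[i] steps, padding with pad_id.

    codes:  list of channels, each a list of token ids of equal length
    delays: one non-negative delay per channel (illustrative values)
    """
    if len(codes) != len(delays):
        raise ValueError("need exactly one delay per channel")
    total_len = len(codes[0]) + max(delays)
    delayed = []
    for channel, delay in zip(codes, delays):
        row = [pad_id] * delay + list(channel)
        row += [pad_id] * (total_len - len(row))  # right-pad to a common length
        delayed.append(row)
    return delayed


def revert_delay_pattern(delayed, delays):
    """Undo apply_delay_pattern by slicing each channel at its delay."""
    seq_len = len(delayed[0]) - max(delays)
    return [row[d:d + seq_len] for row, d in zip(delayed, delays)]


codes = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
delayed = apply_delay_pattern(codes, delays=[0, 1, 2], pad_id=0)
restored = revert_delay_pattern(delayed, delays=[0, 1, 2])
```

Keeping the shift and its inverse as a pair makes it easy to check that post-processing recovers the original frames.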
* remove trust_remote_code
* again
* Revert "Skip some tests for now (#38931)"
This reverts commit 31d30b7224.
* again
* style
* again
* again
* style
* fix integration test
* fix tests
* style
* fix
* fix
* fix the last ones
* style
* last one
* fix last
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Initial submit
* Fix bugs:
1. add __init__ file
2. tied word embedding
3. support flash/flex attention
4. model saving and loading
* Code refactor:
* Rename encdecgemma to t5gemma.
* Split attention into self- and cross-attention
* Split stack into encoder and decoder
* Add test cases
* Add auto configuration
* Update configurations.
* Fix bugs related to copy and attribute checks
* Fix type union
* Fix merge errors
* run ruff format
* Run make style and update tests.
* Add t5gemma model doc.
* ruff and style formatting.
* Add missed module config.
* Add dummy checkpoint link to pass tests (needs updating when real checkpoints are uploaded).
* Update model doc.
* Minor updates following Arthur's comments:
* replace docstrings with auto_docstrings
* remove checkpoint layers
* remove deprecate_kwargs
* fix rebase errors
* Fix docstring issues.
* fix t5gemma doc issue.
* run ruff format
* Updates:
* split encoder-only model out
* make t5gemmamodel encoder-decoder only
* update token and sequence classification
* update tests
* Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code, etc.)
* Update quicktour.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add Arcee model support to transformers
- Add ArceeConfig and model mappings for all task types (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification)
- Add auto-loading support through AutoModel, AutoConfig, and AutoTokenizer
- Use LlamaTokenizer for tokenization
- Add FX graph support for Arcee models
- Create lazy loading module structure for Arcee
* feat: update YARN scaling and RoPE validation for Arcee model
* feat: add auto_docstring checkpoint config to Arcee model classes
* docs: add pre-trained model weights reference to Arcee configuration files
* refactor: move RoPE utilities to dedicated modeling_rope_utils module
* Add comprehensive test suite for Arcee model
- Add test_modeling_arcee.py following standard transformers test patterns
- Include tests for all model variants (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification)
- Add specific test for ReLU² activation in ArceeMLP
- Add RoPE scaling tests including YARN support
- Follow CausalLMModelTest pattern used by similar models
* Add documentation for Arcee model
- Add comprehensive model documentation with usage examples
- Include all model variants in autodoc
- Add to table of contents in proper alphabetical order
- Fixes documentation coverage for Arcee model classes
* Make style/fixup
* fix copyright year
* Sync modular conversion
* revert in legacy supported models in src/transformers/utils/fx
* cleaned redundant code in modular_arcee.py
* cleaned testing
* removed pretraining tp
* fix styles
* integration testing
---------
Co-authored-by: Pranav <veldurthipranav@gmail.com>
Co-authored-by: Pranav <56645758+pranav4501@users.noreply.github.com>
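The ReLU² activation named in the Arcee test commit above is commonly defined as the square of ReLU. A minimal sketch under that assumption follows; `relu_squared` is a hypothetical helper written in plain Python for illustration, not the `ArceeMLP` code.

```python
def relu_squared(x: float) -> float:
    """Return max(x, 0) ** 2, i.e. ReLU followed by squaring."""
    positive_part = x if x > 0.0 else 0.0
    return positive_part * positive_part


# Negative inputs are zeroed; positive inputs grow quadratically.
values = [relu_squared(v) for v in (-2.0, 0.0, 0.5, 3.0)]
```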
* Typos
- corrected bf16 training argument
- corrected header for SDPA
* improved readability for SDPA suggested by @stevhliu
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add working idefics2 fast and improvements for fast nested images processing
* add fast image processors idefics 3 and smolvlm
* cleanup tests
* fix doc idefics2
* PR review and fix issues after merge
* Force providing disable_grouping to group_images_by_shape
* simplify group_images_by_shape
* fix modular
* Fix nits after review
* feat: add flexible Liger Kernel configuration to TrainingArguments
Add support for granular Liger Kernel configuration through a new
`liger_kernel_config` parameter in TrainingArguments. This allows users
to selectively enable/disable specific kernels (rope, swiglu, cross_entropy,
etc.) instead of the current approach that relies on the default configuration.
Features:
- Add `liger_kernel_config` dict parameter to TrainingArguments
- Support selective kernel application for all supported models
- Maintain full backward compatibility with existing `use_liger_kernel` flag
Example usage:
```python
from transformers import TrainingArguments

TrainingArguments(
    use_liger_kernel=True,
    # Enable or disable individual Liger kernels.
    liger_kernel_config={
        "rope": True,
        "swiglu": True,
        "cross_entropy": False,
        "fused_linear_cross_entropy": True
    }
)
```
Closes #38905
* Address comments and update Liger section in Trainer docs
* Update bamba model card
* Update the doc for bamba
* Update docs/source/en/model_doc/bamba.md
Bamba paragraph
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
Bamba collection url
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
Update Padding-Free Training to Notes heading
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
update examples
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
Update additional info
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
consistent casing
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
simplify sentences
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Include pipeline and cli examples + fix formatting
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bamba.md
update cli id
* Update quantization example
* Fix auto code formatter changes
* Update cli command + include BambaModel
* Update docs/source/en/model_doc/bamba.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update trocr.md
Docs: add community fine‑tuning notebook link to TrOCR page
* apply suggested changes from PR review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/trocr.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* No more Tuple, List, Dict
* make fixup
* More style fixes
* Docstring fixes with regex replacement
* Trigger tests
* Redo fixes after rebase
* Fix copies
* [test all]
* update
* [test all]
* update
* [test all]
* make style after rebase
* Patch the hf_argparser test
* Patch the hf_argparser test
* style fixes
* style fixes
* style fixes
* Fix docstrings in Cohere test
* [test all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
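Assuming "No more Tuple, List, Dict" above refers to replacing `typing.Tuple`, `typing.List`, and `typing.Dict` with the built-in generics available since Python 3.9 (PEP 585), an annotation in the new style might look like the sketch below. `count_labels` is a hypothetical function used only to show the syntax.

```python
from typing import Optional


def count_labels(
    pairs: list[tuple[str, int]],
    ignore: Optional[set[str]] = None,
) -> dict[str, int]:
    """Sum counts per label, annotated with built-in generic types."""
    totals: dict[str, int] = {}
    for label, count in pairs:
        if ignore and label in ignore:
            continue  # skip labels the caller asked to ignore
        totals[label] = totals.get(label, 0) + count
    return totals


totals = count_labels([("a", 1), ("b", 2), ("a", 3)])
```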
* Moved the sources to the right
* small Changes
* Some Changes to moonshine
* Added the install to pipeline
* updated the moonshine model card
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated Documentation According to changes
* Fixed the model with the commits
* Changes to the roc_bert
* Final Update to the branch
* Adds Quantization to the model
* Finished Fixing the Roc_bert docs
* Fixed Moshi
* Fixed Problems
* Fixed Problems
* Fixed Problems
* Fixed Problems
* Fixed Problems
* Fixed Problems
* Added the install to pipeline
* updated the moonshine model card
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/moonshine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated Documentation According to changes
* Fixed the model with the commits
* Fixed the problems
* Final Fix
* Final Fix
* Final Fix
* Update roc_bert.md
---------
Co-authored-by: Your Name <sohamprabhu@Mac.fios-router.home>
Co-authored-by: Your Name <sohamprabhu@Sohams-MacBook-Air.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* init
* chore: various changes to LightGlue
* chore: various changes to LightGlue
* chore: various changes to LightGlue
* chore: various changes to LightGlue
* Fixed dynamo bug and image padding tests
* refactor: applied refactoring changes from SuperGlue's concat, batch and stack functions to LightGlue file
* tests: removed sdpa support and changed expected values
* chore: added some docs and refactoring
* chore: fixed copy to superpoint.image_processing_superpoint.convert_to_grayscale
* feat: adding batch implementation
* feat: added validation for preprocess and post process method to LightGlueImageProcessor
* chore: changed convert_lightglue_to_hf script to comply with new standard
* chore: changed lightglue test values to match new lightglue config pushed to hub
* chore: simplified convert_lightglue_to_hf conversion map
* feat: adding batching implementation
* chore: make style
* feat: added threshold to post_process_keypoint_matching method
* fix: added missing instructions that turn keypoints back to absolute coordinates before the matching forward pass
* fix: added typehint and docs
* chore: make style
* [run-slow] lightglue
* fix: add matches different from -1 to compute valid matches in post_process_keypoint_matching
* tests: added CUDA proof tests similar to SuperGlue
* chore: various changes to modeling_lightglue.py
- Added "Copies from" statements for copied functions from modeling_superglue.py
- Added missing docstrings
- Removed unused functions or classes
- Removed unnecessary statements
- Added missing typehints
- Added comments to the main forward method
* chore: various changes to convert_lightglue_to_hf.py
- Added model saving
- Added model reloading
* chore: fixed imports in lightglue files
* [run-slow] lightglue
* chore: make style
* [run-slow] lightglue
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* [run-slow] lightglue
* chore: Applied some suggestions from review
- Added missing typehints
- Refactor "cuda" to device variable
- Variable renaming
- LightGlue output order changed
- Make style
* fix: added missing grayscale argument in image processor for use with the SuperPoint keypoint detector
* fix: changed lightglue HF repo to lightglue_superpoint with grayscale default to True
* refactor: make keypoints `(batch_size, num_keypoints, keypoint_dim)` through forward and unsqueeze only before attention layer
* refactor: refactor do_layer_keypoint_pruning
* tests: added tests with no early stop and keypoint pruning
* refactor: various refactoring to modeling_lightglue.py
- Removed unused functions
- Renamed variables for consistency
- Added comments for clarity
- Set methods to private in LightGlueForKeypointMatching
- Replaced tensor initialization with list accumulation followed by concatenation
- Used more pythonic list comprehension for repetitive instructions
* refactor: added comments and renamed filter_matches to get_matches_from_scores
* tests: added copied from statement with superglue tests
* docs: added comment to prepare_keypoint_matching_output function in tests
* [run-slow] lightglue
* refactor: reordered _concat_early_stopped_outputs in LightGlue class
* [run-slow] lightglue
* docs: added lightglue.md model doc
* docs: added Optional typehint to LightGlueKeypointMatchingOutput
* chore: removed pad_images function
* chore: set do_grayscale default value to True in LightGlueImageProcessor
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* docs: added missing LightGlueConfig typehint in nn.Module __init__ methods
* docs: removed unnecessary code in docs
* docs: import SuperPointConfig only from a TYPE_CHECKING context
* chore: use PretrainedConfig arguments `num_hidden_layers` and `num_attention_heads` instead of `num_layers` and `num_heads`
* chore: added organization as arg in convert_lightglue_to_hf.py script
* refactor: set device variable
* chore: added "gelu" in LightGlueConfig as hidden_act parameter
* docs: added comments to reshape.flip.reshape instruction to perform cross attention
* refactor: used batched inference for keypoint detector forward pass
* fix: added fix for SDPA tests
* docs: fixed docstring for LightGlueImageProcessor
* [run-slow] lightglue
* refactor: removed unused line
* refactor: added missing arguments in LightGlueConfig init method
* docs: added missing LightGlueConfig typehint in init methods
* refactor: added checkpoint url as default variable to verify models output only if it is the default url
* fix: moved print message inside if statement
* fix: added log assignment r removal in convert script
* fix: got rid of confidence_thresholds as registered buffers
* refactor: applied suggestions from SuperGlue PR
* docs: changed copyright to 2025
* refactor: modular LightGlue
* fix: removed unnecessary import
* feat: added plot_keypoint_matching method to LightGlueImageProcessor with matplotlib soft dependency
* fix: added missing import error for matplotlib
* Updated convert script to push on ETH org
* fix: added missing licence
* fix: make fix-copies
* refactor: use cohere apply_rotary_pos_emb function
* fix: update model references to use ETH-CVG/lightglue_superpoint
* refactor: add and use intermediate_size attribute in config to inherit CLIPMLP for LightGlueMLP
* refactor: explicit variables instead of slicing
* refactor: use can_return_tuple decorator in LightGlue model
* fix: make fix-copies
* docs: Update model references in `lightglue.md` to use the correct pretrained model from ETH-CVG
* Refactor LightGlue configuration and processing classes
- Updated type hints for `keypoint_detector_config` in `LightGlueConfig` to use `SuperPointConfig` directly.
- Changed `size` parameter in `LightGlueImageProcessor` to be optional.
- Modified `position_embeddings` in `LightGlueAttention` and `LightGlueAttentionBlock` to be optional tuples.
- Cleaned up import statements across multiple files for better readability and consistency.
* refactor: Update LightGlue configuration to enforce eager attention implementation
- Added `attn_implementation="eager"` to `keypoint_detector_config` in `LightGlueConfig` and `LightGlueAttention` classes.
- Removed unnecessary logging related to attention implementation fallback.
- Cleaned up import statements for better readability.
* refactor: renamed message into attention_output
* fix: ensure device compatibility in LightGlueMatchAssignmentLayer descriptor normalization
- Updated the normalization of `m_descriptors` to use the correct device for the tensor, ensuring compatibility across different hardware setups.
* refactor: removed Conv layers from init_weights since LightGlue doesn't have any
* refactor: replace add_start_docstrings with auto_docstring in LightGlue models
- Updated LightGlue model classes to utilize the new auto_docstring utility for automatic documentation generation.
- Removed legacy docstring handling to streamline the code and improve maintainability.
* refactor: simplify LightGlue image processing tests by inheriting from SuperGlue
- Refactored `LightGlueImageProcessingTester` and `LightGlueImageProcessingTest` to inherit from their SuperGlue counterparts, reducing code duplication.
- Removed redundant methods and properties, streamlining the test setup and improving maintainability.
* test: forced eager attention implementation to LightGlue model tests
- Updated `LightGlueModelTester` to include `attn_implementation="eager"` in the model configuration.
- This change aligns the test setup with the recent updates in LightGlue configuration for eager attention.
* refactor: update LightGlue model references
* fix: import error
* test: enhance LightGlue image processing tests with setup method
- Added a setup method in `LightGlueImageProcessingTest` to initialize `LightGlueImageProcessingTester`.
- Included a docstring for `LightGlueImageProcessingTester` to clarify its purpose.
* refactor: added LightGlue image processing implementation to modular file
* refactor: moved attention blocks into the transformer layer
* fix: added missing import
* fix: added missing import in __all__ variable
* doc: added comment about enforcing eager attention because of SuperPoint
* refactor: added SuperPoint eager attention comment and moved functions closer to where they are used
---------
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
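The two `post_process_keypoint_matching` commits above (adding a threshold, and treating matches different from -1 as valid) describe a filtering step that can be sketched as follows. This is a simplified pure-Python illustration under those two assumptions only; `filter_valid_matches` and its argument layout are hypothetical and do not mirror the `LightGlueImageProcessor` API.

```python
def filter_valid_matches(keypoints0, keypoints1, matches, scores, threshold=0.0):
    """Keep pairs whose match index is not -1 and whose score exceeds threshold.

    matches[i] is the index into keypoints1 matched to keypoints0[i],
    or -1 when keypoints0[i] has no match (assumed convention).
    """
    pairs = []
    for i, (j, score) in enumerate(zip(matches, scores)):
        if j != -1 and score > threshold:
            pairs.append((keypoints0[i], keypoints1[j], score))
    return pairs


kp0 = [(0, 0), (1, 1), (2, 2)]
kp1 = [(5, 5), (6, 6)]
strict = filter_valid_matches(kp0, kp1, [1, -1, 0], [0.9, 0.0, 0.3], threshold=0.5)
loose = filter_valid_matches(kp0, kp1, [1, -1, 0], [0.9, 0.0, 0.3])
```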
* Updated Albert model Card
* Update docs/source/en/model_doc/albert.md
added the quotes in <hfoption id="Pipeline">
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/albert.md
updated checkpoints
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/albert.md
changed !Tips description
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/albert.md
updated text
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/albert.md
updated transformer-cli implementation
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/albert.md
changed text
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/albert.md
removed repeated description
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update albert.md
removed lines
* Update albert.md
updated pipeline code
* Update albert.md
updated auto model code, removed quantization as model size is not large, removed the attention visualizer part
* Update docs/source/en/model_doc/albert.md
updated notes
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update albert.md
reduced a repeating point in notes
* Update docs/source/en/model_doc/albert.md
updated transformer-CLI
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/albert.md
removed extra notes
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>