* Gemma 3n
* initial commit of Gemma 3n scaffold
* Fixing param pass through on Gemm3p5RMSNorm
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)
* Adding gemma3p5 text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Removing altup configs to accept the suggested configs
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating altup config
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3n (#3)
* Initial Gemm3nTextModel (#4)
NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update src/transformers/cache_utils.py
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3.5
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Normalizing on altup_num_inputs config option
* regenerating modeling file after syncing to HEAD
* Use torch.std(..., unbiased=False) for activation sparsity (#8)
* Refactoring to a single QVK Norm (#13)
* AltUp: support scale_corrected_output (#14)
* Converts einsums to nn.Linear (#7)
* Converts einsums to nn.Linear
* Removing unused variables
* Aligning SharedKVCache with HybridCache (#11)
* Alinging SharedKVStore with HybridCache
* Remove KVStore. Refactor apply_rotary_pos_emb for sharing
* Addressing review comments
* Supporting split modality embeddings in Gemma3n (#10)
* Adding the Embedder class
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation
* Apply suggestions from code review
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Update modular
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
* Addressing review comments, prop drilling audio and vision configs to the text config
* Removing TODO's that have been addressed
* Simplify Embedder init and add audio embeddings
* Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder
* Refactoring vision and audio embeddings into ConditionalGeneration model
---------
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating attention mask for Gemma 3.5 (#15)
* xxx_token_index to xxx_token_id
* remvoing deprecated last_cache_position
* Removing references to SigLIP
* Always init per-layer inputs
* Using torch.finfo().min for epsilon_tensor
* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas
* fix modular GEMMA3N_INPUTS_DOCSTRING
* Gemma3nAttention inherits from Gemma3Attention
* Modular inheritance fixes
* CausalLM conversion script for 4B model (#16)
* Add Gemma3n Audio Encoder (#6)
* initial commit of Gemma 3.5 scaffold
* Fixing param pass through on Gemm3nRMSNorm
* Adds Einsum layer to Gemma 3.5
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma3n overall and text config with vision and audio config placeholders (#3)
* Adding gemma3n text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Removing altup configs to accept the suggested configs
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating altup config
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3.5 (#3)
* Initial Gemm3nTextModel (#4)
NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3.5
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update src/transformers/cache_utils.py
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3.5
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Normalizing on altup_num_inputs config option
* Adding audio encoder config
* Adds high-level components for Audio Encoder
* Implement uniform reducer for Audio Encoder
* Adding placeholders for Conformer components in Audio Encoder
* Adding placeholders for SubSampleConvProjection components in Audio Encoder
* Adding SequenceLayer component placeholders
* Implementing Gemma3nAudioEncoder with nn.Sequential
* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential
* Implementing Conformer model with SequenceLayers
* Use OrderedDict in nn.Sequential initializers
* Implements sl.Residual in Torch with nn.Sequential and OrderedDict
* Adopting a base SequenceLayer class with default forward() method
* Implementing sl.GatedLinearUnit in Torch
* Implementing sl.Swish in Torch
* Implementing sl.ReLU in Torch
* Implementing sl.Scale in Torch
* Removing sl.Dropout after tree-shaking
* Implementing sl.RMSNorm in Torch with fake shape
* Implementing sl.GroupNorm in Torch
* Implementing sl.Conv2d in Torch
* Implementing sl.Dense in Torch
* Removing sl.Delay layers, which act as pass-throughs
* Connecting shapes to configs in initializers
* Removing sl.Emit
* Implementing sl.ExpandDims in Torch
* Adding sl.GradientClipping to Torch
* Implementing sl.DenseShaped in Torch
* Implementing sl.LDPA in Torch
* Removing unused sl.CombinedQKVProj class
* Fixing erroneous type hint
* Implemnenting sl.DepthwiseConv1D in Torch
* Implementing sl.MaskInvalid in Torch
* Fixes for initialization
* Fixes for saving weights
* Removing einsums per feedback from HF staff
* Removing Sequence Layers idioms from audio encoder
* Fixes for reviewer comments
* CausalLM conversion script for 4B model
* inv_timescales to non-persistent buffer
* Addressing audio encoder Attention feedback
* Addressing Gemma3nAudioSSCPConvBlock feedback
* Addressing Gemma3nAudioConformerAttention feedback
* Addressing padding feedback
* Weights conversion loads audio state dict
* Always use vision_config so saving works
* Token id updates for configs
* Stubs for interleaving audio embs
* Addressing reviewer feedback
---------
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
* Fixing cache access error
* Removing duplicate code from a bad merge
* Gemma 3n Text + Vision Part 1 (#17)
* testing utilities for numerics comparisons
* Corrected einsum to nn.Linear weights conversion
* Inherit scaled word embs from Gemma3 not Bart
* Fixing transposes for collapsed linears
* More transpose fixes
* numpy api fix
* RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True
* Force AltUp to float32
* Updating debugging script for AudioEncoder debugging
* Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs
* Correcting attention einsum conversions
* RMSNorm in type of x
* Fixing douplicate laurel norm/gating
* KV sharing using the right previous indices
* Refactor kv shared index computation. Correct frac_shared_layers
* Use num_shared_layers instead of inferring from a fraction
* fixing a bug for logging
* Fix shared data_ptrs in altup inits
* rope: adjust proj -> norm -> rope to preserve computation (#20)
* rope: adjust proj -> norm -> rope to preserve computation
* Removing some breaking language model fluff in ConditionalGeneration
* Consolidate query_states transforms
---------
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Vectorize the loops in AltUp (#19)
* Vectorize the loops in AltUp
* fix typo
* Expanding to support batched inputs
* remove extra debug script
* Fix AltUp.forward
---------
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel
* Convert norm to 1/sqrt (#21)
* Convert norm to 1/sqrt
* Scale shift change per Phil's rec
* Adding default activation sparsity
* Fixing 2B config in weights conversion script
* Fixing RMSNorm parameters - adding scale_shift and with_scale
* Correcting query pre-attention scaling
* Adding query_rescale_scalar to text config
* Adding layer_idx to MLP
* Permafix for input_layernorm
* Use 1/sqrt instead of rsqrt in DecoderLayer
* Fix o_proj conversion
* Conversion script update for vision encoder
* Removing logging for debugging timm model
* Fixing bugs in Gemma3nForConditionalGeneration for text generation
* Generating the modeling_gemma3n.py file
* Removing the addition of an erroneous line in the modeling file
* Adding gemma3n text model to modeling_auto
* Bugfix: Updating the interleaving of inputs_embeds and vision_embeds
* Updating the modeling file with the latest bugfix changes
* Updating models/auto for Gemma 3n
* using AutoTokenizer in forward test
* Adding processing_gemma3n.py
* Gemma 3n configured for AutoModel. Conversion script updated.
* Removing errant merge artifacts
---------
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
* Removing errant debugging statements from Gemma 3
* Gemma3n audio model (#18)
* testing utilities for numerics comparisons
* Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock
* Add audio version of forward script based on RyanMullins' implementation
* Updating to match encoder tests. WIP: config question needs resolving
* Updates to audio classes to enable end-to-end running
* Removing vestigial classes, cleaning up print statements
* Adding SiLU / Swish to audio conformer feed forward block
* Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio
* Adding outputs to audio test
* Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model
* Update forward test to load from local weights
* Update conversion to process / output audio layers
* Update __all__ to export audio encoder
* AutoModel registration for Gemma 3n Audio
* Use AutoModel for ConditionalGeneration.audio_tower
* Fixing input_proj_linear transpose
* Fixing Gemma3NanoAudioConformerAttention.post conversion
* Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion
* Correcting indentation issue on Gemma3p5RMSNorm
---------
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Text + Vision Part 2 (#23)
* Updates for ConditionalGeneration.get_image_features
* Adding a WIP draft of image_processing_gemma3p5.py
* Update src/transformers/models/gemma3p5/modular_gemma3p5.py
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Modular conversion after github suggested change
* Text + image gives good results
* Fixing image size preset
* Updating configs for the 2B variant in the conversion script
* Using final generation config in conversion script
---------
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Audio Integration (#12)
* initial commit of Gemma 3n scaffold
* Fixing param pass through on Gemm3nRMSNorm
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)
* Adding Gemma 3n text configs
* Adding audio config placeholders
* Adding a placeholder for vision configs
* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
* Updating text configs
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Removing altup configs to accept the suggested configs
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating altup config
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Addressing review comments and updating text configs
* Adding a config for activation sparsity
* Updating configs to pass through options to super class init and adjust some name prefixes
* Updating laurel and altup with corrected config values
* Normalizing sub_config initializers
---------
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3n (#3)
* Initial Gemma3nTextModel (#4)
NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update modular
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Update src/transformers/cache_utils.py
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q,k,v RMS norm to use the right 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3n
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module - changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained
---------
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Normalizing on altup_num_inputs config option
* Adding audio encoder config
* Adds high-level components for Audio Encoder
* Implement uniform reducer for Audio Encoder
* Adding placeholders for Conformer components in Audio Encoder
* Adding placeholders for SubSampleConvProjection components in Audio Encoder
* Adding SequenceLayer component placeholders
* Implementing Gemma3nAudioEncoder with nn.Sequential
* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential
* Implementing Conformer model with SequenceLayers
* Use OrderedDict in nn.Sequential initializers
* Implements sl.Residual in Torch with nn.Sequential and OrderedDict
* Adopting a base SequenceLayer class with default forward() method
* Implementing sl.GatedLinearUnit in Torch
* Implementing sl.Swish in Torch
* Implementing sl.ReLU in Torch
* Implementing sl.Scale in Torch
* Removing sl.Dropout after tree-shaking
* Implementing sl.RMSNorm in Torch with fake shape
* Implementing sl.GroupNorm in Torch
* Implementing sl.Conv2d in Torch
* Implementing sl.Dense in Torch
* Removing sl.Delay layers, which act as pass-throughs
* Connecting shapes to configs in initializers
* Removing sl.Emit
* Implementing sl.ExpandDims in Torch
* Adding sl.GradientClipping to Torch
* Implementing sl.DenseShaped in Torch
* Implementing sl.LDPA in Torch
* Removing unused sl.CombinedQKVProj class
* Fixing erroneous type hint
* Implemnenting sl.DepthwiseConv1D in Torch
* Implementing sl.MaskInvalid in Torch
* Fixes for initialization
* Fixes for saving weights
* Removing einsums per feedback from HF staff
* Removing Sequence Layers idioms from audio encoder
* Fixes for reviewer comments
* Converting sl.Frontend to FeatureExtractor
* Updates for ConditionalGeneration.get_image_features
* Adding a WIP draft of image_processing_gemma3n.py
* Update modular
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Modular conversion after github suggested change
* Text + image gives good results
* Fixing image size preset
* Draft of audio data in chat template
* Removing image processing. Using SigLIP instead.
* Audio input going end-to-end
* Fixing dtype issues in audio encoder
* x-lib formatting consistency
* Adding example data
* Save preprocessor_config.json from conversion script
* Instrumentaiton for debugging
* Additional instrumentation for preprocessing debugging
* Updates to preprocessor, padding; produces correct end-to-end results on sample
* Tackling configuraiton TODOs
* Start of feature extractor refatcor
* Adds Numpy version of USM extractor, removes Torch version and dependencies
* Fixing AltUp.correct coef permute
* Supporting batches of single audio segment inputs
* Docstrings updates for config
* In-lining audio feature extraction
* Adjustments to conversion script and smoke test script
---------
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
* Gemma 3n renaming
* Removing test data and utilities
* Renaming test files
* Gemma 3n refactor
* Fix tokenizer config in conversion script
* Address reviewer feedback
* FeatureExtractor returns float32 by default
* Adding basic tests for audio, and input name for audio encoder
* Audio integration test, updates to model_id for other integration tests
* Use scales for q and k norms (#26)
* Update audio integration test to use HF dataset
* Reviewer feedback
* Expand embedding table to full vocab size in weights conversion
* Mix-n-match MatFormers for Gemma 3n (#25)
* Remove in-place operations (#30)
* chore: removing inplace ops
* remove [tensor] * n pattern
* chore: reviewer feedback in AudioEncoder and AltUp
* More grad clipping
* Dynamo compatibility
* fix: cache slicing error
* chore: simplify shared kv cache slicing
* chore: vision encoder rename in timm
* fix: image processor do_normalize=False
* fixup: style
* chore: model_doc
* fix: docs for code quality
* chore: repo consistency
* fix: RMSNorm in float as in prior Gemmas
* fix: per_layer_inputs = None
* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint
* chore: repo consistency
* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)
* Add initial unit tests for Gemma3nAudioFeatureExtractor
* Add basic unit tests for Gemma3nProcessor (#28)
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
* parameterize tests
---------
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
* chore: code style
* fix: test cases
* style and consistency
* fix config in the test to be coherent with layer cache sharing
* fix hidden states in tests and code
* inits and mappings
* fix modality prefixes
* test order and prefixes
* fix test exception
* fix class order and reduce model size for faster tests
* restore _checkpoint_conversion_mapping to load Caual from Conditional
* fix config mapping!
* fix: reviewer feedback
---------
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* fix import test
* add model args
* auto_docstring
* replace test path
* consistency
* skip tests for now
* fix docstring for doc builder
* skip unused attr
---------
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
* add dia model
* add tokenizer files
* cleanup some stuff
* brut copy paste code
* rough cleanup of the modeling code
* nuke some stuff
* more nuking
* more cleanups
* updates
* add mulitLayerEmbedding vectorization
* nits
* more modeling simplifications
* updates
* update rope
* update rope
* just fixup
* update configuration files
* more cleanup!
* default config values
* update
* forgotten comma
* another comma!
* update, more cleanups
* just more nits
* more config cleanups
* time for the encoder
* fix
* sa=mall nit
* nits
* n
* refacto a bit
* cleanup
* update cv scipt
* fix last issues
* fix last nits
* styling
* small fixes
* just run 1 generation
* fixes
* nits
* fix conversion
* fix
* more fixes
* full generate
* ouf!
* fixes!
* updates
* fix
* fix cvrt
* fixup
* nits
* delete wrong test
* update
* update
* test tokenization
* let's start changing things bit by bit - fix encoder step
* removing custom generation, moving to GenerationMixin
* add encoder decoder attention masks for generation
* mask changes, correctness checked against ad29837 in dia repo
* refactor a bit already --> next cache
* too important not to push :)
* minimal cleanup + more todos
* make main overwrite modeling utils
* add cfg filter & eos filter
* add eos countdown & delay pattern
* update eos countdown
* add max step eos countdown
* fix tests
* fix some things
* fix generation with testing
* move cfg & eos stuff to logits processor
* make RepetitionPenaltyLogitsProcessor flexible
- can accept 3D scores like (batch_size, channel, vocab)
* fix input_ids concatenation dimension in GenerationMixin for flexibility
* Add DiaHangoverLogitsProcessor and DiaExponentialDecayLengthPenalty classes; refactor logits processing in DiaForConditionalGeneration to utilize new configurations and improve flexibility.
* Add stopping criteria
* refactor
* move delay pattern from processor to modeling like musicgen.
- add docs
- change eos countdown to eos delay pattern
* fix processor & fix tests
* refactor types
* refactor imports
* format code
* fix docstring to pass ci
* add docstring to DiaConfig & add DiaModel to test
* fix docstring
* add docstring
* fix some bugs
* check
* porting / merging results from other branch - IMPORTANT: it very likely breaks generation, the goal is to have a proper forward path first
* experimental testing of left padding for first channel
* whoops
* Fix merge to make generation work
* fix cfg filter
* add position ids
* add todos, break things
* revert changes to generation --> we will force 2d but go 3d on custom stuff
* refactor a lot, change prepare decoder ids to work with left padding (needs testing), add todos
* some first fixes to get to 10. in generation
* some more generation fixes / adjustment
* style + rope fixes
* move cfg out, simplify a few things, more todos
* nit
* start working on custom logit processors
* nit
* quick fixes
* cfg top k
* more refactor of logits processing, needs a decision if gen config gets the new attributes or if we move it to config or similar
* lets keep changes to core code minimal, only eos scaling is questionable atm
* simpler eos delay logits processor
* that was for debugging :D
* proof of concept rope
* small fix on device mismatch
* cfg fixes + delay logits max len
* transformers rope
* modular dia
* more cleanup
* keep modeling consistently 3D, generate handles 2D internally
* decoder starts with bos if nothing
* post processing prototype
* style
* lol
* force sample / greedy + fixes on padding
* style
* fixup tokenization
* nits
* revert
* start working on dia tests
* fix a lot of tests
* more test fixes
* nit
* more test fixes + some features to simplify code more
* more cleanup
* forgot that one
* autodocs
* small consistency fixes
* fix regression
* small fixes
* dia feature extraction
* docs
* wip processor
* fix processor order
* processing goes brrr
* transpose before
* small fix
* fix major bug but needs now a closer look into the custom processors esp cfg
* small thing on logits
* nits
* simplify indices and shifts
* add simpler version of padding tests back (temporarily)
* add logit processor tests
* starting tests on processor
* fix mask application during generation
* some fixes on the weights conversion
* style + fixup logits order
* simplify conversion
* nit
* remove padding tests
* nits on modeling
* hmm
* fix tests
* trigger
* probably gonna be reverted, just a quick design around audio tokenizer
* fixup typing
* post merge + more typing
* initial design for audio tokenizer
* more design changes
* nit
* more processor tests and style related things
* add to init
* protect import
* not sure why tbh
* add another protect
* more fixes
* wow
* it aint stopping :D
* another missed type issue
* ...
* change design around audio tokenizer to prioritize init and go for auto - in regards to the review
* change to new causal mask function + docstrings
* change ternary
* docs
* remove todo, i dont think its essential tbh
* remove pipeline as current pipelines do not fit in the current scheme, same as csm
* closer to wrapping up the processor
* text to audio, just for demo purposes (will likely be reverted)
* check if it's this
* save audio function
* ensure no grad
* fixes on prefixed audio, hop length is used via preprocess dac, device fixes
* integration tests (tested locally on a100) + some processor utils / fixes
* style
* nits
* another round of smaller things
* docs + some fixes (generate one might be big)
* msytery solved
* small fix on conversion
* add abstract audio tokenizer, change init check to abstract class
* nits
* update docs + fix some processing :D
* change inheritance scheme for audio tokenizer
* delete dead / unnecessary code in copied generate loop
* last nits on new pipeline behavior (+ todo on tests) + style
* trigger
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
* No more Tuple, List, Dict
* make fixup
* More style fixes
* Docstring fixes with regex replacement
* Trigger tests
* Redo fixes after rebase
* Fix copies
* [test all]
* update
* [test all]
* update
* [test all]
* make style after rebase
* Patch the hf_argparser test
* Patch the hf_argparser test
* style fixes
* style fixes
* style fixes
* Fix docstrings in Cohere test
* [test all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* make it go brrrr
* date time
* update
* fix
* up
* uppp
* up
* no number i
* udpate
* fix
* [paligemma] fix processor with suffix (#38365)
fix pg processor
* [video utils] group and reorder by number of frames (#38374)
fix
* Fix convert to original state dict for VLMs (#38385)
* fix convert to original state dict
* fix
* lint
* Update modeling_utils.py
* update
* warn
* no verbose
* fginal
* ouft
* style
---------
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
* updates
* fixup
* fix tests
* fix test
* fix
* let it be here for now, till monday
* two more fixes
* persimmon
* fixup
* fix
* fixup
* make sure fuyu runs now that LM has new attn API
* fixup + tests
* qwen vl uses new mask interface as well
* qwen image features format
* update
* remove image_sizes
* address comments
* i am dumb...
* initial design
* update all video processors
* add tests
* need to add qwen2-vl (not tested yet)
* add qwen2-vl in auto map
* fix copies
* isort
* resolve confilicts kinda
* nit:
* qwen2-vl is happy now
* qwen2-5 happy
* other models are happy
* fix copies
* fix tests
* add docs
* CI green now?
* add more tests
* even more changes + tests
* doc builder fail
* nit
* Update src/transformers/models/auto/processing_auto.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* small update
* imports correctly
* dump, otherwise this is getting unmanagebale T-T
* dump
* update
* another update
* update
* tests
* move
* modular
* docs
* test
* another update
* init
* remove flakiness in tests
* fixup
* clean up and remove commented lines
* docs
* skip this one!
* last fix after rebasing
* run fixup
* delete slow files
* remove unnecessary tests + clean up a bit
* small fixes
* fix tests
* more updates
* docs
* fix tests
* update
* style
* fix qwen2-5-vl
* fixup
* fixup
* unflatten batch when preparing
* dump, come back soon
* add docs and fix some tests
* how to guard this with new dummies?
* chat templates in qwen
* address some comments
* remove `Fast` suffix
* fixup
* oops should be imported from transforms
* typo in requires dummies
* new model added with video support
* fixup once more
* last fixup I hope
* revert image processor name + comments
* oh, this is why fetch test is failing
* fix tests
* fix more tests
* fixup
* add new models: internvl, smolvlm
* update docs
* imprt once
* fix failing tests
* do we need to guard it here again, why?
* new model was added, update it
* remove testcase from tester
* fix tests
* make style
* not related CI fail, lets' just fix here
* mark flaky for now, filas 15 out of 100
* style
* maybe we can do this way?
* don't download images in setup class
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* i guessreverted all CdGen classes
* style
* llava onevision
* fix copies
* fix some tests
* some more tests
* dump
* skip these
* nevermind, i am dumb
* revert fix not needed
* fixup
* fixup
* another fixup
* more fixup to make ci finally happy
* fixup after rebasing
* fix qwen tests
* add internVL + typos here and there
* image token index -> id
* style
* fix init weights
* revert blip-2 not supported
* address comments
* fix copies
* revert blip2 test file as well
* as discussed internally, revert back CdGen models
* fix some tests
* fix more tests for compile
* CI red
* fix copies
* enumerate explicitly allowed models
* address comments
* fix tests
* fixup
* style again
* add tests for new model class
* another fixup ( x _ x )
* [fixup] unused attributes can be removed post-deprecation
* Let notification service succeed even when artifacts and reported jobs on github have mismatch
* Use default trace msg if no trace msg available
* Add pop_default helper fn
* style
* copy the last changes from broken PR
* small format
* some fixes and refactoring after review
* format
* add config attr for loss
* some fixes and refactoring
* fix copies
* fix style
* add test for d-fine resnet
* fix decoder layer prop
* fix dummies
* format init
* remove extra print
* refactor modeling, move resnet into separate folder
* fix resnet config
* change resnet on hgnet_v2, add clamp into decoder
* fix init
* fix config doc
* fix init
* fix dummies
* fix config docs
* fix hgnet_v2 config typo
* format modular
* add image classification for hgnet, some refactoring
* format tests
* fix dummies
* fix init
* fix style
* fix init for hgnet v2
* fix index.md, add init rnage for hgnet
* fix conversion
* add missing attr to encoder
* add loss for d-fine, add additional output for rt-detr decoder
* tests and docs fixes
* fix rt_detr v2 conversion
* some fixes for loos and decoder output
* some fixes for loss
* small fix for converted modeling
* add n model config, some todo comments for modular
* convert script adjustments and fixes, small refact
* remove extra output for rt_detr
* make some outputs optionsl, fix conversion
* some posr merge fixes
* small fix
* last field fix
* fix not split for hgnet_v2
* disable parallelism test for hgnet_v2 image classification
* skip multi gpu for d-fine
* adjust after merge init
* remove extra comment
* fix repo name references
* small fixes for tests
* Fix checkpoint path
* Fix consistency
* Fixing docs
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* added the configuartion for sam_hq
* added the modeelling for sam_hq
* added the sam hq mask decoder with hq features
* added the code for the samhq
* added the code for the samhq
* added the code for the samhq
* Delete src/transformers/models/sam_hq/modelling_sam_hq.py
* added the code for the samhq
* added the code for the samhq
* added the chnages for the modeelling
* added the code for sam hq for image processing
* added code for the sam hq model
* added the required changes
* added the changes
* added the key mappings for the sam hq
* adding the working code of samhq
* added the required files
* adding the pt object
* added the push to hub account
* added the args for the sam maks decoder
* added the args for the sam hq vision config
* aded the some more documentation
* removed the unecessary spaces
* all required chnages
* removed the image processor
* added the required file
* added the changes for the checkcopies
* added the code for modular file
* added the changes for the __init file
* added the code for the interm embeds
* added the code for sam hq
* added the changes for modular file
* added the test file
* added the changes required
* added the changes required
* added the code for the
* added the cl errors
* added the changes
* added the required changes
* added the some code
* added the code for the removing image processor
* added the test dimensins
* added the code for the removing extra used variables
* added the code for modeluar file hf_mlp for a better name
* removed abbrevaation in core functionality
* removed abbrevaation in core functionality
* .contiguous() method is often used to ensure that the tensor is stored in a contiguous block of memory
* added the code which is after make fixup
* added some test for the intermediate embeddings test
* added the code for the torch support in sam hq
* added the code for the updated modular file
* added the changes for documentations as mentioned
* removed the heading
* add the changes for the code
* first mentioned issue resolved
* added the changes code to processor
* added the easy loading to init file
* added the changes to code
* added the code to changes
* added the code to work
* added the code for sam hq
* added the code for sam hq
* added the code for the point pad value
* added the small test for the image embeddings and intermediate embedding
* added the code
* added the code
* added the code for the tests
* added the code
* added ythe code for the processor file
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code for tests and some checks
* added some code
* added the code
* added the code
* added some code
* added some code
* added the changes for required
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code
* added the code
* added some changes
* added some changes
* removed spaces and quality checks
* added some code
* added some code
* added some code
* added code quality checks
* added the checks for quality checks
* addded some code which fixes test_inference_mask_generation_no_point
* added code for the test_inference_mask_generation_one_point_one_bb
* added code for the test_inference_mask_generation_one_point_one_bb_zero
* added code for the test_inference_mask_generation_one_box
* added some code in modelling for testing
* added some code which sort maks with high score
* added some code
* added some code
* added some code for the move KEYS_TO_MODIFY_MAPPING
* added some code for the unsqueeze removal
* added some code for the unsqueeze removal
* added some code
* added some code
* add some code
* added some code
* added some code
* added some testign values changed
* added changes to code in sam hq for readbility purpose
* added pre commit checks
* added the fix samvisionmodel for compatibilty
* added the changes made on sam by cyyever
* fixed the tests for samhq
* added some the code
* added some code related to init file issue during merge conflicts
* remobved the merge conflicts
* added changes mentioned by aruther and mobap
* added changes mentioned by aruther and mobap
* solving quality checks
* added the changes for input clearly
* added the changes
* added changes in mask generation file rgearding model inputs and sam hq quargs in processor file
* added changes in processor file
* added the Setup -> setupclass conversion
* added the code mentioned for processor
* added changes for the code
* added some code
* added some code
* added some code
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* initial commit
* add convert internvl
* add first end-to-end working internvl
* nit prompt and image proc
* add working chat template
* add conversion llama-based models
* add tests
* pass all tests
* fix isort
* fix modular after main merge
* add video processing for internvl
* add support for interlaced images and videos
* Remove processing and config from modular, add more tests
* add llama model tests
* Modify processor for compatibility with refactored got ocr image processor
* add comments in processor
* Add docs and nits
* change video processing to use custom sample_indices_fn
* rebase and fix tests
* add processor tests
* Add changes Raushan review
* Use the new attention interface for the vision model
* nits
* add support for custom video_load_backend
* remove mention to InternVLTokenizer
* refactor vision model to simplify logic
* refactor processor for better readibility
* fix copies
* fix require av processor test
* refactor internVL vision
* Update processor and fix processing tests
* fix docstring
* update convert_weights for internvl3
* change image processor to fast by default
* remove do_center_crop=True in convert_weights
* force use_cache to True
* push_to_hub before reloading
* fix internVLVision for larger models
* update convert weight for qk norm
* fix convert_weights
* fix eos_token_id in convert
* update docs and integration tests
* make modifs after review
* fix wrong k_norm and reduce modular
* change image_token_index to image_token_id
* change checkpoint to OpenGVLab org
* last nits
* explicitely del self.num_key_value_groups
* add extra special tokens
* use only `xxx_token_id` for multimodal tokens
* update modeling files as well
* fixup
* why fixup doesn't fix modular docstring first?
* janus, need to update configs in the hub still
* last fixup