transformers/tests/models
Ryan Mullins c63cfd6a83
Gemma 3n (#39059)
* Gemma 3n

* initial commit of Gemma 3n scaffold

* Fixing param pass-through on Gemma3p5RMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default
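
For reference, a minimal sketch of an RMSNorm with an optional learned scale, i.e. the `with_scale` behavior referenced above; the class name and defaults here are illustrative, not the exact Gemma 3n module:

```python
import torch
from torch import nn

class RMSNormSketch(nn.Module):
    """Illustrative RMSNorm with an optional learned scale (hypothetical API)."""

    def __init__(self, dim: int, eps: float = 1e-6, with_scale: bool = True):
        super().__init__()
        self.eps = eps
        # with_scale=True adds a learnable per-channel weight on the normalized output.
        self.weight = nn.Parameter(torch.ones(dim)) if with_scale else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize in float32 for stability, then cast back to the input dtype.
        out = x.float() * torch.rsqrt(x.float().pow(2).mean(-1, keepdim=True) + self.eps)
        if self.weight is not None:
            out = out * self.weight.float()
        return out.type_as(x)
```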

* Adds LAuReL to Gemma 3n
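
LAuReL (Learned Augmented Residual Layer) generalizes the plain residual `x + f(x)` by adding a cheap learned path alongside the identity. A hedged low-rank sketch of the idea, not the exact Gemma 3n block:

```python
import torch
from torch import nn

class LaurelBlockSketch(nn.Module):
    """Low-rank learned residual in the spirit of LAuReL-LR (illustrative names)."""

    def __init__(self, dim: int, rank: int = 64):
        super().__init__()
        self.linear_left = nn.Linear(dim, rank, bias=False)   # project down to the low rank
        self.linear_right = nn.Linear(rank, dim, bias=False)  # project back up

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity path plus a learned low-rank correction.
        return x + self.linear_right(self.linear_left(x))
```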

* Adds AltUp to Gemma 3n
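
AltUp (Alternating Updates) widens the model by carrying several parallel copies of the hidden state while running the expensive transformer block on only one of them: the other copies are predicted, then corrected using the computed copy's error. A heavily simplified sketch of the predict/correct cycle with hypothetical names:

```python
import torch
from torch import nn

class AltUpSketch(nn.Module):
    """Simplified AltUp: predict all copies, compute on one, correct the rest."""

    def __init__(self, dim: int, num_inputs: int = 4):
        super().__init__()
        # Learned mixing matrices acting across the copy axis.
        self.prediction_coefs = nn.Linear(num_inputs, num_inputs, bias=False)
        self.correction_coefs = nn.Linear(1, num_inputs, bias=False)

    def predict(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: [num_inputs, batch, seq, dim]; mix along the copy axis.
        mixed = self.prediction_coefs(hidden.permute(1, 2, 3, 0))
        return hidden + mixed.permute(3, 0, 1, 2)

    def correct(self, predicted: torch.Tensor, activated: torch.Tensor) -> torch.Tensor:
        # Broadcast the error of the single computed copy back onto every copy.
        error = (activated - predicted[0]).unsqueeze(-1)  # [batch, seq, dim, 1]
        return predicted + self.correction_coefs(error).permute(3, 0, 1, 2)
```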

* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)

* Adding gemma3p5 text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive, and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing
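
KV cache sharing lets the last few decoder layers reuse key/value states cached by earlier layers instead of computing and storing their own, shrinking the cache. A schematic of the layer indexing under assumed config names:

```python
# Schematic only: which layer's KV cache each layer reads (assumed config names).
num_hidden_layers = 30
num_kv_shared_layers = 10  # the last 10 layers do not cache their own K/V

first_shared = num_hidden_layers - num_kv_shared_layers
for layer_idx in range(num_hidden_layers):
    if layer_idx < first_shared:
        kv_source = layer_idx         # computes and caches its own K/V
    else:
        kv_source = first_shared - 1  # e.g. reuse the last self-caching layer
    # attention at layer_idx would read K/V from cache[kv_source]
```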

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating KV cache sharing implementation to use a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* regenerating modeling file after syncing to HEAD

* Use torch.std(..., unbiased=False) for activation sparsity (#8)
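
In the activation-sparsity MLP, `unbiased=False` makes `torch.std` use the population formula (`sqrt(mean((x - mean)^2))`), which feeds a Gaussian-quantile cutoff that zeros most pre-activations. A hedged sketch of that gating, not the verbatim model code:

```python
import torch

def sparse_gelu_sketch(x: torch.Tensor, target_sparsity: float = 0.95) -> torch.Tensor:
    """Zero roughly `target_sparsity` of pre-activations via a statistics-based cutoff."""
    mean = x.mean(dim=-1, keepdim=True)
    std = torch.std(x, dim=-1, keepdim=True, unbiased=False)  # population std
    # Standard-normal quantile at the target sparsity level.
    z = torch.distributions.Normal(0.0, 1.0).icdf(torch.tensor(target_sparsity))
    cutoff = mean + std * z
    return torch.nn.functional.gelu(torch.relu(x - cutoff))
```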

* Refactoring to a single QVK Norm (#13)

* AltUp: support scale_corrected_output (#14)

* Converts einsums to nn.Linear (#7)

* Converts einsums to nn.Linear
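
Converting an einsum projection to `nn.Linear` mostly comes down to transposing the weight, since `nn.Linear` stores weights as `[out_features, in_features]`. A quick equivalence check:

```python
import torch
from torch import nn

d_in, d_out = 16, 32
w = torch.randn(d_in, d_out)          # einsum-style weight: [in, out]
x = torch.randn(2, 5, d_in)

linear = nn.Linear(d_in, d_out, bias=False)
with torch.no_grad():
    linear.weight.copy_(w.T)          # nn.Linear expects [out, in]

assert torch.allclose(torch.einsum("btd,dh->bth", x, w), linear(x), atol=1e-5)
```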

* Removing unused variables

* Aligning SharedKVCache with HybridCache (#11)

* Aligning SharedKVStore with HybridCache

* Remove KVStore. Refactor apply_rotary_pos_emb for sharing

* Addressing review comments

* Supporting split modality embeddings in Gemma3n (#10)

* Adding the Embedder class

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation

* Apply suggestions from code review

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, prop drilling audio and vision configs to the text config

* Removing TODO's that have been addressed

* Simplify Embedder init and add audio embeddings

* Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder

* Refactoring vision and audio embeddings into ConditionalGeneration model

---------

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating attention mask for Gemma 3.5 (#15)

* xxx_token_index to xxx_token_id

* Removing deprecated last_cache_position

* Removing references to SigLIP

* Always init per-layer inputs

* Using torch.finfo().min for epsilon_tensor
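
`torch.finfo(dtype).min` is the most negative finite value a floating dtype can represent, which makes it a safer additive-mask fill than a hard-coded large negative constant that may overflow in half precision. For example:

```python
import torch

dtype = torch.bfloat16
attn_mask = torch.zeros(1, 1, 4, 4, dtype=dtype)
# Masked positions get the dtype's minimum, so softmax drives them to ~0.
attn_mask[..., 2:] = torch.finfo(dtype).min
```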

* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas

* fix modular GEMMA3N_INPUTS_DOCSTRING

* Gemma3nAttention inherits from Gemma3Attention

* Modular inheritance fixes

* CausalLM conversion script for 4B model (#16)

* Add Gemma3n Audio Encoder (#6)

* initial commit of Gemma 3.5 scaffold

* Fixing param pass-through on Gemma3nRMSNorm

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3n overall and text config with vision and audio config placeholders (#3)

* Adding gemma3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3.5 (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive, and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating KV cache sharing implementation to use a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers
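
Passing an `OrderedDict` to `nn.Sequential` names each submodule, which keeps state-dict keys readable (e.g. `conv.weight` instead of `0.weight`). For example:

```python
from collections import OrderedDict
from torch import nn

block = nn.Sequential(OrderedDict([
    ("conv", nn.Conv2d(3, 8, kernel_size=3, padding=1)),
    ("norm", nn.GroupNorm(num_groups=2, num_channels=8)),
    ("act", nn.ReLU()),
]))
```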

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implementing sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* CausalLM conversion script for 4B model

* inv_timescales to non-persistent buffer
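
Registering `inv_timescales` with `persistent=False` keeps the tensor on the module (so it follows `.to(device)` and dtype casts) while excluding it from the state dict, since it can be rebuilt from config. A sketch with assumed sizes:

```python
import torch
from torch import nn

class TimescalesSketch(nn.Module):
    def __init__(self, channels: int = 128, max_timescale: float = 10_000.0):
        super().__init__()
        exponent = torch.arange(channels // 2, dtype=torch.float32) / (channels // 2)
        # Not saved in the checkpoint; recomputed deterministically at init time.
        self.register_buffer("inv_timescales", max_timescale ** -exponent, persistent=False)
```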

* Addressing audio encoder Attention feedback

* Addressing Gemma3nAudioSSCPConvBlock feedback

* Addressing Gemma3nAudioConformerAttention feedback

* Addressing padding feedback

* Weights conversion loads audio state dict

* Always use vision_config so saving works

* Token id updates for configs

* Stubs for interleaving audio embs

* Addressing reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Fixing cache access error

* Removing duplicate code from a bad merge

* Gemma 3n Text + Vision Part 1 (#17)

* testing utilities for numerics comparisons

* Corrected einsum to nn.Linear weights conversion

* Inherit scaled word embs from Gemma3 not Bart

* Fixing transposes for collapsed linears

* More transpose fixes

* NumPy API fix

* RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True

* Force AltUp to float32

* Updating debugging script for AudioEncoder debugging

* Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs

* Correcting attention einsum conversions

* RMSNorm in type of x

* Fixing duplicate laurel norm/gating

* KV sharing using the right previous indices

* Refactor kv shared index computation. Correct frac_shared_layers

* Use num_shared_layers instead of inferring from a fraction

* Fixing a logging bug

* Fix shared data_ptrs in altup inits

* rope: adjust proj -> norm -> rope to preserve computation (#20)

* rope: adjust proj -> norm -> rope to preserve computation
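
The ordering matters because RoPE does not commute with the per-head norm; normalizing the projected queries before rotating them is what preserves the reference computation. A runnable toy version of the proj -> norm -> rope path (half-split RoPE convention; `nn.RMSNorm` requires torch >= 2.4; all names here are illustrative):

```python
import torch
from torch import nn

def apply_rope(x, cos, sin):
    # Half-split rotary embedding on the last dimension.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)

batch, seq, heads, head_dim = 1, 4, 2, 8
q_proj = nn.Linear(16, heads * head_dim, bias=False)
q_norm = nn.RMSNorm(head_dim)  # per-head norm

q = q_proj(torch.randn(batch, seq, 16)).view(batch, seq, heads, head_dim)
q = q_norm(q)                                            # normalize BEFORE rotating
angles = torch.arange(seq).float()[None, :, None, None]  # toy position angles
q = apply_rope(q, torch.cos(angles), torch.sin(angles))  # then apply RoPE
```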

* Removing some breaking language model fluff in ConditionalGeneration

* Consolidate query_states transforms

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Vectorize the loops in AltUp (#19)

* Vectorize the loops in AltUp

* fix typo

* Expanding to support batched inputs

* remove extra debug script

* Fix AltUp.forward

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel

* Convert norm to 1/sqrt (#21)

* Convert norm to 1/sqrt

* Scale shift change per Phil's rec

* Adding default activation sparsity

* Fixing 2B config in weights conversion script

* Fixing RMSNorm parameters - adding scale_shift and with_scale

* Correcting query pre-attention scaling

* Adding query_rescale_scalar to text config

* Adding layer_idx to MLP

* Permafix for input_layernorm

* Use 1/sqrt instead of rsqrt in DecoderLayer
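
`torch.rsqrt(x)` and `1.0 / torch.sqrt(x)` are mathematically identical but can round differently, especially in reduced precision, so matching a reference implementation bit-for-bit can require one specific form:

```python
import torch

x = torch.rand(8, dtype=torch.bfloat16) + 0.5
print(torch.rsqrt(x))       # fused reciprocal square root
print(1.0 / torch.sqrt(x))  # explicit division; may differ in the last bit
```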

* Fix o_proj conversion

* Conversion script update for vision encoder

* Removing logging for debugging timm model

* Fixing bugs in Gemma3nForConditionalGeneration for text generation

* Generating the modeling_gemma3n.py file

* Removing the addition of an erroneous line in the modeling file

* Adding gemma3n text model to modeling_auto

* Bugfix: Updating the interleaving of inputs_embeds and vision_embeds

* Updating the modeling file with the latest bugfix changes

* Updating models/auto for Gemma 3n

* using AutoTokenizer in forward test

* Adding processing_gemma3n.py

* Gemma 3n configured for AutoModel. Conversion script updated.

* Removing errant merge artifacts

---------

Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Removing errant debugging statements from Gemma 3

* Gemma3n audio model (#18)

* testing utilities for numerics comparisons

* Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock

* Add audio version of forward script based on RyanMullins' implementation

* Updating to match encoder tests. WIP: config question needs resolving

* Updates to audio classes to enable end-to-end running

* Removing vestigial classes, cleaning up print statements

* Adding SiLU / Swish to audio conformer feed forward block

* Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio

* Adding outputs to audio test

* Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model

* Update forward test to load from local weights

* Update conversion to process / output audio layers

* Update __all__ to export audio encoder

* AutoModel registration for Gemma 3n Audio

* Use AutoModel for ConditionalGeneration.audio_tower

* Fixing input_proj_linear transpose

* Fixing Gemma3NanoAudioConformerAttention.post conversion

* Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion

* Correcting indentation issue on Gemma3p5RMSNorm

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Text + Vision Part 2 (#23)

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3p5.py

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after GitHub-suggested change

* Text + image gives good results

* Fixing image size preset

* Updating configs for the 2B variant in the conversion script

* Using final generation config in conversion script

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Audio Integration (#12)

* initial commit of Gemma 3n scaffold

* Fixing param pass-through on Gemma3nRMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)

* Adding Gemma 3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive, and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3n

* Updating KV cache sharing implementation to use a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implementing sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* Converting sl.Frontend to FeatureExtractor

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3n.py

* Update modular

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after GitHub-suggested change

* Text + image gives good results

* Fixing image size preset

* Draft of audio data in chat template

* Removing image processing. Using SigLIP instead.

* Audio input going end-to-end

* Fixing dtype issues in audio encoder

* x-lib formatting consistency

* Adding example data

* Save preprocessor_config.json from conversion script

* Instrumentation for debugging

* Additional instrumentation for preprocessing debugging

* Updates to preprocessor, padding; produces correct end-to-end results on sample

* Tackling configuration TODOs

* Start of feature extractor refactor

* Adds Numpy version of USM extractor, removes Torch version and dependencies

* Fixing AltUp.correct coef permute

* Supporting batches of single audio segment inputs

* Docstrings updates for config

* In-lining audio feature extraction

* Adjustments to conversion script and smoke test script

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>

* Gemma 3n renaming

* Removing test data and utilities

* Renaming test files

* Gemma 3n refactor

* Fix tokenizer config in conversion script

* Address reviewer feedback

* FeatureExtractor returns float32 by default

* Adding basic tests for audio, and input name for audio encoder

* Audio integration test, updates to model_id for other integration tests

* Use scales for q and k norms (#26)

* Update audio integration test to use HF dataset

* Reviewer feedback

* Expand embedding table to full vocab size in weights conversion
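
Expanding the embedding table to the full vocab size in a conversion script is row-padding of the weight matrix. A hedged sketch (function name and zero-padding choice are assumptions, not the script's exact logic):

```python
import torch

def expand_embeddings(weight: torch.Tensor, full_vocab_size: int) -> torch.Tensor:
    """Pad an embedding weight of shape [vocab, dim] with zero rows (illustrative)."""
    vocab, dim = weight.shape
    if vocab >= full_vocab_size:
        return weight
    pad = torch.zeros(full_vocab_size - vocab, dim, dtype=weight.dtype)
    return torch.cat([weight, pad], dim=0)
```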

* Mix-n-match MatFormers for Gemma 3n (#25)

* Remove in-place operations (#30)

* chore: removing inplace ops

* remove [tensor] * n pattern
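
`[tensor] * n` builds a list of n references to the same tensor, so any later in-place write shows up in every "copy"; aliasing like this also complicates torch.compile/Dynamo tracing. A small demonstration of the hazard and the fix:

```python
import torch

aliased = [torch.zeros(3)] * 4     # four references to ONE tensor
aliased[0].add_(1)                 # in-place write mutates every entry
print(aliased[3])                  # tensor([1., 1., 1.])

safe = [torch.zeros(3) for _ in range(4)]  # independent tensors
safe[0].add_(1)
print(safe[3])                     # tensor([0., 0., 0.])
```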

* chore: reviewer feedback in AudioEncoder and AltUp

* More grad clipping

* Dynamo compatibility

* fix: cache slicing error

* chore: simplify shared kv cache slicing

* chore: vision encoder rename in timm

* fix: image processor do_normalize=False

* fixup: style

* chore: model_doc

* fix: docs for code quality

* chore: repo consistency

* fix: RMSNorm in float as in prior Gemmas

* fix: per_layer_inputs = None

* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint

* chore: repo consistency

* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)

* Add initial unit tests for Gemma3nAudioFeatureExtractor

* Add basic unit tests for Gemma3nProcessor (#28)

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* parameterize tests

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* chore: code style

* fix: test cases

* style and consistency

* fix config in the test to be coherent with layer cache sharing

* fix hidden states in tests and code

* inits and mappings

* fix modality prefixes

* test order and prefixes

* fix test exception

* fix class order and reduce model size for faster tests

* restore _checkpoint_conversion_mapping to load Causal from Conditional

* fix config mapping!

* fix: reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix import test

* add model args

* auto_docstring

* replace test path

* consistency

* skip tests for now

* fix docstring for doc builder

* skip unused attr

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-26 17:55:47 +02:00
albert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
align Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
altclip 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
arcee Add Arcee model support (#38621) 2025-06-24 15:05:29 +02:00
aria Don't run AriaForConditionalGenerationModelTest on CircleCI (#38615) 2025-06-06 11:30:31 +02:00
audio_spectrogram_transformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
auto Add Dia model (#38405) 2025-06-26 11:04:23 +00:00
autoformer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
aya_vision Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
bamba enable misc test cases on XPU (#38852) 2025-06-18 09:20:49 +02:00
bark enable more test cases on xpu (#38572) 2025-06-06 09:29:51 +02:00
bart Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
barthez Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
bartpho Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
beit Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
bert [tests] remove TF tests (uses of require_tf) (#38944) 2025-06-25 17:29:10 +00:00
bert_generation Fix typos in strings and comments (#37799) 2025-04-28 11:39:11 +01:00
bert_japanese Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
bertweet Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
big_bird Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
bigbird_pegasus [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
biogpt Bart: new cache format (#35314) 2025-05-16 13:26:54 +02:00
bit Add ImageProcessorFast to BiT processor (#37180) 2025-04-14 17:07:48 +02:00
bitnet Add Bitnet model (#37742) 2025-04-28 15:08:46 +02:00
blenderbot Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
blenderbot_small Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
blip Fix more flaky test_initialization (#38932) 2025-06-20 17:28:32 +02:00
blip_2 [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
bloom Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
bridgetower Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
bros Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
byt5 Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
camembert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
canine Skip torchscript tests for 2 models (#38643) 2025-06-06 20:17:37 +02:00
chameleon Update some tests for torch 2.7.1 (#38701) 2025-06-10 11:46:52 +02:00
chinese_clip Add Fast Chinese-CLIP Processor (#37012) 2025-04-15 18:31:20 +02:00
clap Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
clip Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
clipseg Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
clvp Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
code_llama remove unhandled parameter (#38145) 2025-06-02 15:57:32 +02:00
codegen Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
cohere Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
cohere2 Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
colpali Add ColQwen2 to 🤗 transformers (#35778) 2025-06-02 12:58:01 +00:00
colqwen2 Update some tests for torch 2.7.1 (#38701) 2025-06-10 11:46:52 +02:00
conditional_detr 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
convbert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
convnext Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
convnextv2 Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
cpm Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
cpmant Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
csm Update CsmForConditionalGenerationIntegrationTest (#38424) 2025-05-28 10:20:43 +02:00
ctrl Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
cvt Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
d_fine enable misc test cases on XPU (#38852) 2025-06-18 09:20:49 +02:00
dab_detr 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
dac Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
data2vec Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
dbrx Refactor DBRX tests to use CausalLMModelTest base classes (#38475) 2025-06-13 16:22:12 +01:00
deberta Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
deberta_v2 Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
decision_transformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
deepseek_v3 Skip sdpa dispatch on flash test due to unsupported head dims (#39010) 2025-06-24 20:16:56 +02:00
deformable_detr 🚨🚨 Fix initialization of Mask2Former (#38864) 2025-06-18 09:46:22 +02:00
deit Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
depth_anything Skip some export tests on torch 2.7 (#38677) 2025-06-12 12:47:15 +02:00
depth_pro Fix more flaky test_initialization (#38932) 2025-06-20 17:28:32 +02:00
detr enable more test cases on xpu (#38572) 2025-06-06 09:29:51 +02:00
dia Add Dia model (#38405) 2025-06-26 11:04:23 +00:00
diffllama Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
dinat Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
dinov2 Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
dinov2_with_registers Fix more flaky test_initialization (#38932) 2025-06-20 17:28:32 +02:00
distilbert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
dit Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
donut 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
dots1 [Model] add dots1 (#38143) 2025-06-25 11:38:25 +02:00
dpr Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
dpt Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
efficientnet Add EfficientNet Image PreProcessor (#37055) 2025-04-16 21:59:24 +02:00
electra Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
emu3 update emu3 test (#38543) 2025-06-03 11:02:01 +02:00
encodec Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
encoder_decoder Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
ernie Remove old code for PyTorch, Accelerator and tokenizers (#37234) 2025-04-10 20:54:21 +02:00
esm Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
falcon 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
falcon_h1 [Falcon H1] Fix slow path forward pass (#38320) 2025-05-26 15:30:35 +02:00
falcon_mamba Fix FalconMambaIntegrationTests (#38566) 2025-06-19 13:50:33 +02:00
fastspeech2_conformer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
flaubert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
flava Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
fnet 🚨 rm already deprecated pad_to_max_length arg (#37617) 2025-05-01 15:21:55 +02:00
focalnet Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
fsmt Fix fsmt tests (#38904) 2025-06-19 10:56:34 +02:00
funnel Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
fuyu 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
gemma Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
gemma2 Unbreak optimum-executorch (#38646) 2025-06-13 11:13:32 +02:00
gemma3 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
gemma3n Gemma 3n (#39059) 2025-06-26 17:55:47 +02:00
git 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
glm Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
glm4 enable misc test cases on XPU (#38852) 2025-06-18 09:20:49 +02:00
glm4v GLM-4.1V Model support (#38431) 2025-06-25 10:43:05 +02:00
glpn 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
got_ocr2 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
gpt_bigcode Remove head mask in generative models (#35786) 2025-05-15 10:44:19 +02:00
gpt_neo Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
gpt_neox 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
gpt_neox_japanese Remove old code for PyTorch, Accelerator and tokenizers (#37234) 2025-04-10 20:54:21 +02:00
gpt_sw3 Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
gpt2 [tests] remove TF tests (uses of require_tf) (#38944) 2025-06-25 17:29:10 +00:00
gptj Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
granite switch to device agnostic device calling for test cases (#38247) 2025-05-26 10:18:53 +02:00
granite_speech Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
granitemoe switch to device agnostic device calling for test cases (#38247) 2025-05-26 10:18:53 +02:00
granitemoehybrid switch to device agnostic device calling for test cases (#38247) 2025-05-26 10:18:53 +02:00
granitemoeshared switch to device agnostic device calling for test cases (#38247) 2025-05-26 10:18:53 +02:00
grounding_dino enable more test cases on xpu (#38572) 2025-06-06 09:29:51 +02:00
groupvit Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
helium Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
herbert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
hgnet_v2 Add D-FINE Model into Transformers (#36261) 2025-04-29 12:17:55 +01:00
hiera 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
hubert Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
ibert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
idefics Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
idefics2 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
idefics3 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
ijepa Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
imagegpt [test] update test_past_key_values_format (#37614) 2025-04-22 11:07:34 +01:00
informer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
instructblip Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
instructblipvideo Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
internvl Internvl fix (#38946) 2025-06-26 13:44:59 +02:00
jamba Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
janus Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
jetmoe 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
kosmos2 [VLMs] support attention backends (#37576) 2025-05-08 18:18:54 +02:00
kyutai_speech_to_text add _keep_in_fp32_modules_strict (#39058) 2025-06-26 13:55:28 +00:00
layoutlm Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
layoutlmv2 Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
layoutlmv3 [tests] remove TF tests (uses of require_tf) (#38944) 2025-06-25 17:29:10 +00:00
layoutxlm Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
led Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
levit Add Fast LeViT Processor (#37154) 2025-04-14 17:07:36 +02:00
lightglue Add LightGlue model (#31718) 2025-06-17 18:10:23 +02:00
lilt Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
llama Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
llama4 Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
llava Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
llava_next Fix llava_next tests (#38813) 2025-06-13 15:19:41 +02:00
llava_next_video Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
llava_onevision Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
longformer Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
longt5 Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
luke 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
lxmert Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
m2m_100 🔴🔴🔴 [Attention] Refactor Attention Interface for Bart-based Models (#38108) 2025-05-22 17:12:58 +02:00
mamba Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
mamba2 align xpu's autocast behavior w/ cuda by using device agnostic torch APIs (#38284) 2025-06-19 11:48:23 +00:00
marian Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
markuplm No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
mask2former 🚨🚨 Fix initialization of Mask2Former (#38864) 2025-06-18 09:46:22 +02:00
maskformer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
mbart Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
mbart50 Use lru_cache for tokenization tests (#36818) 2025-03-28 15:09:35 +01:00
megatron_bert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
megatron_gpt2 Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
mgp_str Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
mimi Add kyutai stt (#38909) 2025-06-24 18:01:15 +02:00
minimax Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
mistral fix mistral and mistral3 tests (#38978) 2025-06-23 17:07:18 +02:00
mistral3 Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
mixtral Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
mlcd Add MLCD model (#36182) 2025-04-15 11:33:09 +01:00
mllama Fix mllama (#38704) 2025-06-12 16:15:35 +02:00
mluke Fix typos in strings and comments (#37799) 2025-04-28 11:39:11 +01:00
mobilebert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
mobilenet_v1 Add Fast Image Processor for MobileNetV1 (#37111) 2025-04-23 15:55:41 -04:00
mobilenet_v2 Add Fast Mobilenet-V2 Processor (#37113) 2025-04-14 17:08:47 +02:00
mobilevit Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
mobilevitv2 Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
modernbert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
moonshine Skip torchscript tests for 2 models (#38643) 2025-06-06 20:17:37 +02:00
moshi Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
mpnet Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
mpt Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
mra Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
mt5 Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
musicgen enable misc test cases on XPU (#38852) 2025-06-18 09:20:49 +02:00
musicgen_melody enable misc test cases on XPU (#38852) 2025-06-18 09:20:49 +02:00
mvp Fix typos in strings and comments (#37799) 2025-04-28 11:39:11 +01:00
myt5 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
nemotron switch to device agnostic device calling for test cases (#38247) 2025-05-26 10:18:53 +02:00
nllb Use lru_cache for tokenization tests (#36818) 2025-03-28 15:09:35 +01:00
nllb_moe Fix typos in strings and comments (#37799) 2025-04-28 11:39:11 +01:00
nougat Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
nystromformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
olmo Unbreak optimum-executorch (#38646) 2025-06-13 11:13:32 +02:00
olmo2 Make HF implementation match original OLMo 2 models for lower precisions (#38131) 2025-05-19 15:35:23 +02:00
olmoe Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
omdet_turbo enable more test cases on xpu (#38572) 2025-06-06 09:29:51 +02:00
oneformer Fix OneFormer integration test (#38016) 2025-05-12 16:02:41 +02:00
openai Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
opt Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
owlv2 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
owlvit Add Fast owlvit Processor (#37164) 2025-04-14 17:58:09 +02:00
paligemma Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
paligemma2 Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
patchtsmixer 🔴🔴🔴 [Attention] Refactor Attention Interface for Bart-based Models (#38108) 2025-05-22 17:12:58 +02:00
patchtst Force torch>=2.6 with torch.load to avoid vulnerability issue (#37785) 2025-04-25 16:57:09 +02:00
pegasus Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
pegasus_x 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
perceiver Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
persimmon 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
phi 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
phi3 Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
phi4_multimodal Fix phi4_multimodal tests (#38816) 2025-06-18 09:39:17 +02:00
phimoe Fix MoE gradient test (#38438) 2025-05-28 16:44:20 +01:00
phobert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
pix2struct Fix more flaky test_initialization (#38932) 2025-06-20 17:28:32 +02:00
pixtral Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
plbart 🔴🔴🔴 [Attention] Refactor Attention Interface for Bart-based Models (#38108) 2025-05-22 17:12:58 +02:00
poolformer Add Fast Image Processor for PoolFormer (#37182) 2025-04-23 15:55:33 -04:00
pop2piano [tests] remove TF tests (uses of require_tf) (#38944) 2025-06-25 17:29:10 +00:00
prompt_depth_anything Skip some export tests on torch 2.7 (#38677) 2025-06-12 12:47:15 +02:00
prophetnet [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
pvt Add Fast PVT Processor (#37204) 2025-04-23 15:55:20 -04:00
pvt_v2 Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
qwen2 Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
qwen2_5_omni Fix qwen_2_5 omni (#38658) 2025-06-12 14:43:54 +02:00
qwen2_5_vl Fix qwen2_5_vl tests (#38845) 2025-06-17 10:55:24 +02:00
qwen2_audio Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
qwen2_moe Fix MoE gradient test (#38438) 2025-05-28 16:44:20 +01:00
qwen2_vl Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
qwen3 Fix qwen3 tests (#38862) 2025-06-17 15:21:36 +02:00
qwen3_moe Fix qwen3_moe tests (#38865) 2025-06-18 14:36:03 +02:00
rag Fix rag (#38585) 2025-06-23 17:42:46 +02:00
recurrent_gemma Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
reformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
regnet Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
rembert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
resnet Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
roberta Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
roberta_prelayernorm Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
roc_bert Remove old code for PyTorch, Accelerator and tokenizers (#37234) 2025-04-10 20:54:21 +02:00
roformer Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
rt_detr enable more test cases on xpu (#38572) 2025-06-06 09:29:51 +02:00
rt_detr_v2 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
rwkv 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
sam [tests] remove TF tests (uses of require_tf) (#38944) 2025-06-25 17:29:10 +00:00
sam_hq Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
seamless_m4t [seamless_m4t] Skip some tests when speech is not available (#38430) 2025-06-02 09:17:28 +00:00
seamless_m4t_v2 [seamless_m4t] Skip some tests when speech is not available (#38430) 2025-06-02 09:17:28 +00:00
segformer Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
seggpt Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
sew Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
sew_d Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
shieldgemma2 Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
siglip [tests] expand flex-attn test for vision models (#38434) 2025-06-03 07:40:44 +00:00
siglip2 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
smollm3 Add SmolLM3 (#38755) 2025-06-25 15:12:15 +00:00
smolvlm Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
speech_encoder_decoder Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
speech_to_text Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
speecht5 [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
splinter Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
squeezebert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
stablelm 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
starcoder2 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
superglue Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
superpoint Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
swiftformer Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
swin Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
swin2sr Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
swinv2 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
switch_transformers [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
t5 Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
t5gemma Encoder-Decoder Gemma (#38332) 2025-06-25 09:05:10 +00:00
table_transformer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
tapas [tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051) 2025-06-26 16:25:00 +01:00
textnet Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
time_series_transformer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
timesfm Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
timesformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
timm_backbone Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
timm_wrapper Add kwargs for timm.create_model in TimmWrapper (#38860) 2025-06-20 12:00:09 +00:00
trocr 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
tvp Add Optional to remaining types (#37808) 2025-04-28 14:20:45 +01:00
udop Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
umt5 [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
unispeech Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
unispeech_sat Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
univnet No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
upernet Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
video_llava Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
videomae [tests] expand flex-attn test for vision models (#38434) 2025-06-03 07:40:44 +00:00
vilt Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
vipllava [tests] expand flex-attn test for vision models (#38434) 2025-06-03 07:40:44 +00:00
vision_encoder_decoder Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
vision_text_dual_encoder [tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051) 2025-06-26 16:25:00 +01:00
visual_bert 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
vit Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
vit_mae Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
vit_msn Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
vitdet Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
vitmatte Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
vitpose Skip some export tests on torch 2.7 (#38677) 2025-06-12 12:47:15 +02:00
vitpose_backbone Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
vits Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
vivit 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
vjepa2 Add V-JEPA for video classification model (#38788) 2025-06-13 17:56:15 +01:00
wav2vec2 [tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051) 2025-06-26 16:25:00 +01:00
wav2vec2_bert 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
wav2vec2_conformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
wav2vec2_phoneme Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
wav2vec2_with_lm Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
wavlm Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
whisper [tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051) 2025-06-26 16:25:00 +01:00
x_clip 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
xglm Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
xlm Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
xlm_roberta Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
xlm_roberta_xl Remove old code for PyTorch, Accelerator and tokenizers (#37234) 2025-04-10 20:54:21 +02:00
xlnet Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
xmod Remove old code for PyTorch, Accelerator and tokenizers (#37234) 2025-04-10 20:54:21 +02:00
yolos 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
yoso Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
zamba Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
zamba2 Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
zoedepth Skip some export tests on torch 2.7 (#38677) 2025-06-12 12:47:15 +02:00
__init__.py Move test model folders (#17034) 2022-05-03 14:42:02 +02:00