transformers/docs/source/en/model_doc
Ryan Mullins c63cfd6a83
Gemma 3n (#39059)
* Gemma 3n

* initial commit of Gemma 3n scaffold

* Fixing param pass through on Gemma3p5RMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)

* Adding gemma3p5 text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* regenerating modeling file after syncing to HEAD

* Use torch.std(..., unbiased=False) for activation sparsity (#8)
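The `unbiased=False` choice matters numerically: `torch.std` defaults to Bessel's correction (dividing by n − 1), while the JAX reference uses the population standard deviation (dividing by n). A minimal sketch of a std-based activation-sparsity cutoff — the function and parameter names here are illustrative, not the actual Gemma 3n implementation:

```python
import torch

def sparsify_activations(x: torch.Tensor, target_sparsity: float = 0.95) -> torch.Tensor:
    """Zero activations below a per-row cutoff of mean + z * std.

    unbiased=False gives the population std (divide by n), matching the
    reference numerics; the default unbiased=True divides by n - 1.
    """
    mean = x.mean(dim=-1, keepdim=True)
    std = torch.std(x, dim=-1, keepdim=True, unbiased=False)
    # z-score for the target sparsity level via the standard normal quantile
    z = torch.distributions.Normal(0.0, 1.0).icdf(torch.tensor(target_sparsity))
    return torch.relu(x - (mean + z * std))
```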

* Refactoring to a single QVK Norm (#13)

* AltUp: support scale_corrected_output (#14)

* Converts einsums to nn.Linear (#7)

* Converts einsums to nn.Linear

* Removing unused variables
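The equivalence behind this conversion: an einsum contraction over the feature dimension is exactly an `nn.Linear`, provided the weight is transposed, since `nn.Linear` stores weights as `(out_features, in_features)`. A toy check (shapes here are arbitrary, not Gemma 3n's actual projections):

```python
import torch
from torch import nn

d_in, d_out = 8, 16
w = torch.randn(d_in, d_out)            # einsum-style weight: (in, out)
x = torch.randn(2, 5, d_in)             # (batch, seq, features)

out_einsum = torch.einsum("btd,df->btf", x, w)

linear = nn.Linear(d_in, d_out, bias=False)
with torch.no_grad():
    linear.weight.copy_(w.T)            # Linear stores (out, in): transpose

out_linear = linear(x)
assert torch.allclose(out_einsum, out_linear, atol=1e-5)
```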

* Aligning SharedKVCache with HybridCache (#11)

* Aligning SharedKVStore with HybridCache

* Remove KVStore. Refactor apply_rotary_pos_emb for sharing

* Addressing review comments

* Supporting split modality embeddings in Gemma3n (#10)

* Adding the Embedder class

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation

* Apply suggestions from code review

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, prop drilling audio and vision configs to the text config

* Removing TODO's that have been addressed

* Simplify Embedder init and add audio embeddings

* Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder

* Refactoring vision and audio embeddings into ConditionalGeneration model

---------

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating attention mask for Gemma 3.5 (#15)

* xxx_token_index to xxx_token_id

* Removing deprecated last_cache_position

* Removing references to SigLIP

* Always init per-layer inputs

* Using torch.finfo().min for epsilon_tensor
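`torch.finfo(dtype).min` is the most negative finite value for the tensor's dtype, which is safer than a hard-coded constant like `-1e9` when running in float16/bfloat16. A minimal masking sketch (not the actual Gemma 3n mask code):

```python
import torch

scores = torch.randn(1, 4, 4)
causal = torch.tril(torch.ones(4, 4, dtype=torch.bool))

# dtype-aware "minus infinity": finfo().min stays representable in every
# float dtype, whereas a hard-coded -1e9 overflows in float16.
masked = scores.masked_fill(~causal, torch.finfo(scores.dtype).min)
probs = masked.softmax(dim=-1)
```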

* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas

* fix modular GEMMA3N_INPUTS_DOCSTRING

* Gemma3nAttention inherits from Gemma3Attention

* Modular inheritance fixes

* CausalLM conversion script for 4B model (#16)

* Add Gemma3n Audio Encoder (#6)

* initial commit of Gemma 3.5 scaffold

* Fixing param pass through on Gemma3nRMSNorm

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3n overall and text config with vision and audio config placeholders (#3)

* Adding gemma3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3.5 (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implementing sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* CausalLM conversion script for 4B model

* inv_timescales to non-persistent buffer

* Addressing audio encoder Attention feedback

* Addressing Gemma3nAudioSSCPConvBlock feedback

* Addressing Gemma3nAudioConformerAttention feedback

* Addressing padding feedback

* Weights conversion loads audio state dict

* Always use vision_config so saving works

* Token id updates for configs

* Stubs for interleaving audio embs

* Addressing reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Fixing cache access error

* Removing duplicate code from a bad merge

* Gemma 3n Text + Vision Part 1 (#17)

* testing utilities for numerics comparisons

* Corrected einsum to nn.Linear weights conversion

* Inherit scaled word embs from Gemma3 not Bart

* Fixing transposes for collapsed linears

* More transpose fixes

* numpy api fix

* RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True

* Force AltUp to float32

* Updating debugging script for AudioEncoder debugging

* Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs

* Correcting attention einsum conversions

* RMSNorm in type of x

* Fixing duplicate laurel norm/gating

* KV sharing using the right previous indices

* Refactor kv shared index computation. Correct frac_shared_layers

* Use num_shared_layers instead of inferring from a fraction
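The num_shared_layers refactor replaces a fraction-derived count with an explicit integer. A purely hypothetical index mapping to illustrate the idea — the real Gemma 3n scheme also distinguishes global vs. sliding-window attention layers:

```python
def kv_source_layer(layer_idx: int, num_layers: int, num_shared_layers: int) -> int:
    """Return the layer whose KV cache `layer_idx` should read.

    Hypothetical scheme: the last `num_shared_layers` layers write no KV of
    their own and reuse the cache of the last non-shared layer.
    """
    first_shared = num_layers - num_shared_layers
    return layer_idx if layer_idx < first_shared else first_shared - 1
```

For example, with 12 layers and 4 shared, layers 8–11 would all read layer 7's cache.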

* fixing a bug for logging

* Fix shared data_ptrs in altup inits

* rope: adjust proj -> norm -> rope to preserve computation (#20)

* rope: adjust proj -> norm -> rope to preserve computation

* Removing some breaking language model fluff in ConditionalGeneration

* Consolidate query_states transforms

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Vectorize the loops in AltUp (#19)

* Vectorize the loops in AltUp

* fix typo

* Expanding to support batched inputs

* remove extra debug script

* Fix AltUp.forward

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>
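Vectorizing the AltUp mixing loops amounts to replacing a per-output Python loop with one contraction over the stacked-inputs dimension. A shape-level sketch (dimensions are arbitrary; the real AltUp also has separate prediction and correction steps):

```python
import torch

num_inputs, batch, seq, dim = 4, 2, 5, 8
x = torch.randn(num_inputs, batch, seq, dim)      # stacked AltUp inputs
coefs = torch.randn(num_inputs, num_inputs)       # mixing coefficients

# Loop version: build each mixed output slot one at a time.
looped = torch.stack([
    sum(coefs[i, j] * x[j] for j in range(num_inputs))
    for i in range(num_inputs)
])

# Vectorized: a single einsum over the stacking dimension.
vectorized = torch.einsum("ij,j...->i...", coefs, x)

assert torch.allclose(looped, vectorized, atol=1e-5)
```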

* Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel

* Convert norm to 1/sqrt (#21)

* Convert norm to 1/sqrt

* Scale shift change per Phil's rec
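For reference, the norm in question is an RMS norm; writing the scale factor as an explicit `1/sqrt` division instead of `torch.rsqrt` trades a fused op for closer agreement with the reference numerics. A minimal sketch without the learned scale (the real Gemma3nRMSNorm adds with_scale/scale_shift handling):

```python
import torch

def rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # 1/sqrt via explicit division rather than torch.rsqrt: same math,
    # but the two can differ in the last ulp, which matters when matching
    # a JAX reference implementation bit-for-bit.
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x / torch.sqrt(variance + eps)
```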

* Adding default activation sparsity

* Fixing 2B config in weights conversion script

* Fixing RMSNorm parameters - adding scale_shift and with_scale

* Correcting query pre-attention scaling

* Adding query_rescale_scalar to text config

* Adding layer_idx to MLP

* Permafix for input_layernorm

* Use 1/sqrt instead of rsqrt in DecoderLayer

* Fix o_proj conversion

* Conversion script update for vision encoder

* Removing logging for debugging timm model

* Fixing bugs in Gemma3nForConditionalGeneration for text generation

* Generating the modeling_gemma3n.py file

* Removing the addition of an erroneous line in the modeling file

* Adding gemma3n text model to modeling_auto

* Bugfix: Updating the interleaving of inputs_embeds and vision_embeds

* Updating the modeling file with the latest bugfix changes

* Updating models/auto for Gemma 3n

* using AutoTokenizer in forward test

* Adding processing_gemma3n.py

* Gemma 3n configured for AutoModel. Conversion script updated.

* Removing errant merge artifacts

---------

Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Removing errant debugging statements from Gemma 3

* Gemma3n audio model (#18)

* testing utilities for numerics comparisons

* Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock

* Add audio version of forward script based on RyanMullins' implementation

* Updating to match encoder tests. WIP: config question needs resolving

* Updates to audio classes to enable end-to-end running

* Removing vestigial classes, cleaning up print statements

* Adding SiLU / Swish to audio conformer feed forward block

* Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio

* Adding outputs to audio test

* Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model

* Update forward test to load from local weights

* Update conversion to process / output audio layers

* Update __all__ to export audio encoder

* AutoModel registration for Gemma 3n Audio

* Use AutoModel for ConditionalGeneration.audio_tower

* Fixing input_proj_linear transpose

* Fixing Gemma3NanoAudioConformerAttention.post conversion

* Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion

* Correcting indentation issue on Gemma3p5RMSNorm

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Text + Vision Part 2 (#23)

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3p5.py

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after github suggested change

* Text + image gives good results

* Fixing image size preset

* Updating configs for the 2B variant in the conversion script

* Using final generation config in conversion script

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Audio Integration (#12)

* initial commit of Gemma 3n scaffold

* Fixing param pass through on Gemma3nRMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)

* Adding Gemma 3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3n

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implementing sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* Converting sl.Frontend to FeatureExtractor

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3n.py

* Update modular

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after github suggested change

* Text + image gives good results

* Fixing image size preset

* Draft of audio data in chat template

* Removing image processing. Using SigLIP instead.

* Audio input going end-to-end

* Fixing dtype issues in audio encoder

* x-lib formatting consistency

* Adding example data

* Save preprocessor_config.json from conversion script

* Instrumentation for debugging

* Additional instrumentation for preprocessing debugging

* Updates to preprocessor, padding; produces correct end-to-end results on sample

* Tackling configuration TODOs

* Start of feature extractor refactor

* Adds Numpy version of USM extractor, removes Torch version and dependencies

* Fixing AltUp.correct coef permute

* Supporting batches of single audio segment inputs

* Docstrings updates for config

* In-lining audio feature extraction

* Adjustments to conversion script and smoke test script

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>

* Gemma 3n renaming

* Removing test data and utilities

* Renaming test files

* Gemma 3n refactor

* Fix tokenizer config in conversion script

* Address reviewer feedback

* FeatureExtractor returns float32 by default

* Adding basic tests for audio, and input name for audio encoder

* Audio integration test, updates to model_id for other integration tests

* Use scales for q and k norms (#26)

* Update audio integration test to use HF dataset

* Reviewer feedback

* Expand embedding table to full vocab size in weights conversion

* Mix-n-match MatFormers for Gemma 3n (#25)

* Remove in-place operations (#30)

* chore: removing inplace ops

* remove [tensor] * n pattern
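The `[tensor] * n` pattern is subtle: it repeats the same tensor object n times, so a later in-place op on one element silently mutates all of them (and in-place ops also complicate torch.compile/Dynamo tracing). A small demonstration of the hazard and an out-of-place alternative:

```python
import torch

t = torch.zeros(3)

# `[t] * 4` aliases one tensor four times; the in-place add_ hits every slot.
aliased = [t] * 4
aliased[0].add_(1.0)
assert all(float(a.sum()) == 3.0 for a in aliased)

# Out-of-place: independent clones, mutated with functional ops only.
safe = [t.clone() for _ in range(4)]
safe[0] = safe[0] + 1.0
assert float(safe[0].sum()) == 6.0 and float(safe[1].sum()) == 3.0
```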

* chore: reviewer feedback in AudioEncoder and AltUp

* More grad clipping

* Dynamo compatibility

* fix: cache slicing error

* chore: simplify shared kv cache slicing

* chore: vision encoder rename in timm

* fix: image processor do_normalize=False

* fixup: style

* chore: model_doc

* fix: docs for code quality

* chore: repo consistency

* fix: RMSNorm in float as in prior Gemmas

* fix: per_layer_inputs = None

* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint

* chore: repo consistency

* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)

* Add initial unit tests for Gemma3nAudioFeatureExtractor

* Add basic unit tests for Gemma3nProcessor (#28)

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* parameterize tests

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* chore: code style

* fix: test cases

* style and consistency

* fix config in the test to be coherent with layer cache sharing

* fix hidden states in tests and code

* inits and mappings

* fix modality prefixes

* test order and prefixes

* fix test exception

* fix class order and reduce model size for faster tests

* restore _checkpoint_conversion_mapping to load Causal from Conditional

* fix config mapping!

* fix: reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix import test

* add model args

* auto_docstring

* replace test path

* consistency

* skip tests for now

* fix docstring for doc builder

* skip unused attr

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-26 17:55:47 +02:00
albert.md Remove merge conflict artifacts in Albert model doc (#38849) 2025-06-16 14:21:18 -07:00
align.md Updated the Model docs - for the ALIGN model (#38072) 2025-05-28 09:19:09 -07:00
altclip.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
arcee.md Add Arcee model support (#38621) 2025-06-24 15:05:29 +02:00
aria.md Updated Aria model card (#38472) 2025-06-05 14:36:54 -07:00
audio-spectrogram-transformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
auto.md Add Dia model (#38405) 2025-06-26 11:04:23 +00:00
autoformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
aya_vision.md Updated aya_vision.md (#38749) 2025-06-16 10:46:30 -07:00
bamba.md Update bamba model card (#38853) 2025-06-18 16:01:25 -07:00
bark.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bart.md New bart model card (#37858) 2025-05-27 11:51:41 -07:00
barthez.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bartpho.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
beit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bert-generation.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bert-japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bert.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
bertweet.md Updated BERTweet model card. (#37981) 2025-05-27 11:51:22 -07:00
big_bird.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bigbird_pegasus.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
biogpt.md Update BioGPT model card (#38214) 2025-05-23 13:03:47 -07:00
bit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bitnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
blenderbot-small.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
blenderbot.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
blip-2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
blip.md Update blip model card (#38513) 2025-06-20 13:46:19 -07:00
bloom.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
bort.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bridgetower.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
bros.md No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
byt5.md Standardize ByT5 model card format (#38699) 2025-06-09 15:02:50 -07:00
camembert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
canine.md New canine model card (#38631) 2025-06-10 09:30:05 -07:00
chameleon.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
chinese_clip.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
clap.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
clip.md Updated the model card for CLIP (#37040) 2025-04-02 14:57:38 -07:00
clipseg.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
clvp.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
code_llama.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
codegen.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
cohere.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
cohere2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
colpali.md Add ColQwen2 to 🤗 transformers (#35778) 2025-06-02 12:58:01 +00:00
colqwen2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
conditional_detr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
convbert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
convnext.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
convnextv2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
cpm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
cpmant.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
csm.md [CSM] update model id (#38211) 2025-05-27 17:03:55 +02:00
ctrl.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
cvt.md Update CvT documentation with improved usage examples and additional … (#38731) 2025-06-17 10:30:03 -07:00
d_fine.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dab-detr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dac.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
data2vec.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dbrx.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deberta-v2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deberta.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
decision_transformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deepseek_v3.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deformable_detr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deplot.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
depth_anything_v2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
depth_anything.md Update model card for Depth Anything (#37065) 2025-04-04 11:36:05 -07:00
depth_pro.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
deta.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
detr.md No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
dia.md Add Dia model (#38405) 2025-06-26 11:04:23 +00:00
dialogpt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
diffllama.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dinat.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dinov2_with_registers.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dinov2.md Add usage example for DINOv2 (#37398) 2025-05-01 08:54:22 -07:00
distilbert.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
dit.md [Docs] New DiT model card (#38721) 2025-06-12 10:26:50 -07:00
donut.md Add Fast Image Processor for Donut (#37081) 2025-04-14 16:24:01 +02:00
dots1.md [Model] add dots1 (#38143) 2025-06-25 11:38:25 +02:00
dpr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
dpt.md 36978 | Fast image processor for DPT model (#37481) 2025-06-18 17:33:29 +00:00
efficientformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
efficientnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
electra.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
emu3.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
encodec.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
encoder-decoder.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
ernie_m.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
ernie.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
esm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
falcon_h1.md [MODEL] Add Falcon H1 (#38249) 2025-05-21 10:43:11 +02:00
falcon_mamba.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
falcon.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
falcon3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fastspeech2_conformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
flan-t5.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
flan-ul2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
flaubert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
flava.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
fnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
focalnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
fsmt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
funnel.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
fuyu.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
gemma.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
gemma2.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
gemma3.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
gemma3n.md Gemma 3n (#39059) 2025-06-26 17:55:47 +02:00
git.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
glm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
glm4.md Add glm4 (#37388) 2025-04-09 14:02:04 +02:00
glm4v.md GLM-4.1V Model support (#38431) 2025-06-25 10:43:05 +02:00
glpn.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
got_ocr2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
gpt_bigcode.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
gpt_neo.md New gpt neo model card (#38505) 2025-06-04 09:56:47 -07:00
gpt_neox_japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt_neox.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt-sw3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gpt2.md Aligning modling code for GPT2 to work with vLLM (fallback) (#36934) 2025-05-02 09:55:16 +02:00
gptj.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
gptsan-japanese.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
granite_speech.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
granite.md Update granite.md (#37791) 2025-05-27 12:55:15 -07:00
granitemoe.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
granitemoehybrid.md Add GraniteMoeHybrid support for 4.0 (#37658) 2025-05-06 06:47:43 +02:00
granitemoeshared.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
granitevision.md Update Granite Vision Model Path / Tests (#35998) 2025-02-03 20:06:03 +01:00
graphormer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
grounding-dino.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
groupvit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
helium.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
herbert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
hgnet_v2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
hiera.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
hubert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
ibert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
idefics.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
idefics2.md Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
idefics3.md Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
ijepa.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
imagegpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
informer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
instructblip.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
instructblipvideo.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
internvl.md 🔴 Video processors as a separate class (#35206) 2025-05-12 11:55:51 +02:00
jamba.md 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
janus.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
jetmoe.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
jukebox.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
kosmos-2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
kyutai_speech_to_text.md [Kyutai-STT] correct model type + model id (#39035) 2025-06-25 16:09:00 +00:00
layoutlm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
layoutlmv2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
layoutlmv3.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
layoutxlm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
led.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
levit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
lightglue.md Add LightGlue model (#31718) 2025-06-17 18:10:23 +02:00
lilt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
llama.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
llama2.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
llama3.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
llama4.md Add llama4 (#37307) 2025-04-05 22:02:22 +02:00
llava_next_video.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
llava_next.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
llava_onevision.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
llava.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
longformer.md Fix broken tag in Longformer model card (#38828) 2025-06-16 07:44:40 -07:00
longt5.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
luke.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
lxmert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
m2m_100.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
madlad-400.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mamba.md Simplify and update trl examples (#38772) 2025-06-13 12:03:49 +00:00
mamba2.md Simplify and update trl examples (#38772) 2025-06-13 12:03:49 +00:00
marian.md 🔴🔴🔴 [Attention] Refactor Attention Interface for Bart-based Models (#38108) 2025-05-22 17:12:58 +02:00
markuplm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mask2former.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
maskformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
matcha.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mbart.md Remove head mask in generative models (#35786) 2025-05-15 10:44:19 +02:00
mctct.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mega.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
megatron_gpt2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
megatron-bert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mgp-str.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mimi.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
minimax.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mistral.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
mistral3.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
mixtral.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mlcd.md Add MLCD model (#36182) 2025-04-15 11:33:09 +01:00
mllama.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
mluke.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mms.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mobilebert.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
mobilenet_v1.md Model card for mobilenet v1 and v2 (#37948) 2025-05-28 09:20:19 -07:00
mobilenet_v2.md Model card for mobilenet v1 and v2 (#37948) 2025-05-28 09:20:19 -07:00
mobilevit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mobilevitv2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
modernbert.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
moonshine.md Updated moonshine modelcard (#38711) 2025-06-12 10:27:17 -07:00
moshi.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mpnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mpt.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
mra.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mt5.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
musicgen_melody.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
musicgen.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
mvp.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
myt5.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nat.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nemotron.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nezha.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nllb-moe.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nllb.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nougat.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
nystromformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
olmo.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
olmo2.md Updated model card for OLMo2 (#38394) 2025-05-27 16:24:36 -07:00
olmoe.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
omdet-turbo.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
oneformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
open-llama.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
openai-gpt.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
opt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
owlv2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
owlvit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
paligemma.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
patchtsmixer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
patchtst.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pegasus_x.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pegasus.md Add missing div in Pegasus model card (#38773) 2025-06-12 10:27:07 -07:00
perceiver.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
persimmon.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
phi.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
phi3.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
phi4_multimodal.md Update Phi4 converter (#37594) 2025-04-17 23:08:24 +02:00
phimoe.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
phobert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pix2struct.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pixtral.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
plbart.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
poolformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pop2piano.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
prompt_depth_anything.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
prophetnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pvt_v2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
pvt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
qdqbert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
qwen2_5_omni.md [doc] fix the code examples in qwen doc (#37803) 2025-04-28 11:56:32 +01:00
qwen2_5_vl.md 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
qwen2_audio.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
qwen2_moe.md Add Qwen2 MoE model card (#38649) 2025-06-11 15:14:01 -07:00
qwen2_vl.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
qwen2.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
qwen3_moe.md Adding Qwen3 and Qwen3MoE (#36878) 2025-03-31 09:50:49 +02:00
qwen3.md Adding Qwen3 and Qwen3MoE (#36878) 2025-03-31 09:50:49 +02:00
rag.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
realm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
recurrent_gemma.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
reformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
regnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
rembert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
resnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
retribert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
roberta-prelayernorm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
roberta.md [docs] updated roberta model card (#38777) 2025-06-13 12:02:44 -07:00
roc_bert.md Update roc bert docs (#38835) 2025-06-17 11:02:18 -07:00
roformer.md [docs]: update roformer.md model card (#37946) 2025-05-23 16:27:56 -07:00
rt_detr_v2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
rt_detr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
rwkv.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
sam_hq.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
sam.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
seamless_m4t_v2.md Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
seamless_m4t.md Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
segformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
seggpt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
sew-d.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
sew.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
shieldgemma2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
siglip.md chore: update model card for SigLIP (#37585) 2025-04-18 13:30:41 -07:00
siglip2.md chore: update SigLIP2 model card (#37624) 2025-04-25 12:46:17 -07:00
smollm3.md Add SmolLM3 (#38755) 2025-06-25 15:12:15 +00:00
smolvlm.md Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
speech_to_text_2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
speech_to_text.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
speech-encoder-decoder.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
speecht5.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
splinter.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
squeezebert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
stablelm.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
starcoder2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
superglue.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
superpoint.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
swiftformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
swin.md docs(swin): Update Swin model card to standard format (#37628) 2025-05-21 16:16:43 -07:00
swin2sr.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
swinv2.md docs(swinv2): Update SwinV2 model card to new standard format (#37942) 2025-05-23 13:04:13 -07:00
switch_transformers.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
t5.md Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
t5gemma.md Encoder-Decoder Gemma (#38332) 2025-06-25 09:05:10 +00:00
t5v1.1.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
table-transformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
tapas.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tapex.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
textnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
time_series_transformer.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
timesfm.md Add TimesFM Time Series Forecasting Model (#34082) 2025-04-16 15:00:53 +02:00
timesformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
timm_wrapper.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
trajectory_transformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
transfo-xl.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
trocr.md Docs: Add custom fine-tuning tutorial to TrOCR model page (#38847) 2025-06-18 09:38:58 -07:00
tvlt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
tvp.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
udop.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
ul2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
umt5.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
unispeech-sat.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
unispeech.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
univnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
upernet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
van.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
video_llava.md No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
videomae.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vilt.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vipllava.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vision-encoder-decoder.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vision-text-dual-encoder.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
visual_bert.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vit_hybrid.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vit_mae.md Updated the model card for ViTMAE (#38302) 2025-05-28 09:19:43 -07:00
vit_msn.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vit.md [docs] Model docs (#36469) 2025-03-21 15:35:22 -07:00
vitdet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vitmatte.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vitpose.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vits.md Update VITS model card (#37335) 2025-04-15 13:16:05 -07:00
vivit.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
vjepa2.md Add V-JEPA for video classification model (#38788) 2025-06-13 17:56:15 +01:00
wav2vec2_phoneme.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
wav2vec2-bert.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
wav2vec2-conformer.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
wav2vec2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
wavlm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
whisper.md [Whisper] handle deprecation of forced_decoder_ids (#38232) 2025-05-22 09:16:38 +00:00
xclip.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xglm.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xlm-prophetnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xlm-roberta-xl.md Created model card for xlm-roberta-xl (#38597) 2025-06-09 13:00:38 -07:00
xlm-roberta.md Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout (#38596) 2025-06-09 12:26:31 -07:00
xlm-v.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xlm.md Created model card for XLM model (#38595) 2025-06-09 12:26:23 -07:00
xlnet.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xls_r.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xlsr_wav2vec2.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
xmod.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
yolos.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
yoso.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
zamba.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
zamba2.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
zoedepth.md added fast image processor for ZoeDepth and expanded tests accordingly (#38515) 2025-06-04 22:59:17 +00:00