Mirror of https://github.com/huggingface/transformers.git (synced 2025-07-19 12:38:23 +06:00)
Gemma 3n

* initial commit of Gemma 3n scaffold
* Fixing param pass-through on Gemma3p5RMSNorm
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Adds AltUp to Gemma 3n
* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)
  * Adding gemma3p5 text configs
  * Adding audio config placeholders
  * Adding a placeholder for vision configs
  * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
  * Updating text configs
  * Update src/transformers/models/gemma3p5/modular_gemma3p5.py (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
  * Removing altup configs to accept the suggested configs
  * Update src/transformers/models/gemma3p5/modular_gemma3p5.py (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
  * Updating altup config
  * Update modular (×4) (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
  * Addressing review comments and updating text configs
  * Adding a config for activation sparsity
  * Updating configs to pass through options to super class init and adjust some name prefixes
  * Updating laurel and altup with corrected config values
  * Normalizing sub_config initializers

  Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating MLP with activation sparsity (#2)
* Updating DecoderBlock for Gemma 3n (#3)
* Initial Gemma3nTextModel (#4). NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive, and this will remain a suitable baseline for downstream implementations to reference.
* Adding KV Cache Sharing
* Adds Einsum layer to Gemma 3n
* Updating EinsumLayer API
* Refactored kv cache sharing in attention
* Adding KVStore for cache sharing
* Update modular (×3) (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
* Update src/transformers/cache_utils.py (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
* Undoing erroneous force push
* Reverting RMSNorm to with_scale by default
* Adds LAuReL to Gemma 3n
* Updating KV Cache Sharing implementation
* Updating the q and k norm definitions in the attention module
* Fixing name error for q, k, v RMS norm to use the right 3n module
* Updating MLP with activation sparsity
* Updating DecoderBlock for Gemma 3.5
* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
* Isolating KV Cache logic to relevant components
* Fixing logic error in Gemma3nAttention.forward
* Refactoring caching contributions and fixing kv_store initialization
* Simplifying Configs
* Remove errant self from super init call
* Bug fix in the Attention module: changing self.head_dim to config.head_dim
* Bug fixes in the LaurelBlock and RMS Norm super init call
* removing redundant code from a merge
* Adding per_layer_inputs to TextModel
* Adding preprocess embeddings with altup
* Adds per-layer-to-single output and a host of TODOs
* Integrating altup predict with the model workflow and other minor bug fixes
* Using nn.Embedding temporarily for text model
* It goes forward
* Minor refactor of attention sparsity and RoPE initialization
* Fixing duplicate rope_scaling param bug when loading from pretrained

  Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
  Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Normalizing on altup_num_inputs config option
* regenerating modeling file after syncing to HEAD
* Use torch.std(..., unbiased=False) for activation sparsity (#8) (see the sparsity sketch after this log)
* Refactoring to a single QKV Norm (#13)
* AltUp: support scale_corrected_output (#14)
* Converts einsums to nn.Linear (#7) (see the equivalence sketch after this log)
  * Converts einsums to nn.Linear
  * Removing unused variables
* Aligning SharedKVCache with HybridCache (#11)
  * Aligning SharedKVStore with HybridCache
  * Remove KVStore. Refactor apply_rotary_pos_emb for sharing
  * Addressing review comments
* Supporting split modality embeddings in Gemma3n (#10)
  * Adding the Embedder class
  * Update modular (×6) (Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>)
  * Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation
  * Apply suggestions from code review (Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>)
  * Update modular (Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>)
  * Addressing review comments, prop drilling audio and vision configs to the text config
  * Removing TODOs that have been addressed
  * Simplify Embedder init and add audio embeddings
  * Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder
  * Refactoring vision and audio embeddings into ConditionalGeneration model

  Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
  Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Updating attention mask for Gemma 3.5 (#15)
* xxx_token_index to xxx_token_id
* removing deprecated last_cache_position
* Removing references to SigLIP
* Always init per-layer inputs
* Using torch.finfo().min for epsilon_tensor
* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas
* fix modular GEMMA3N_INPUTS_DOCSTRING
* Gemma3nAttention inherits from Gemma3Attention
* Modular inheritance fixes
* CausalLM conversion script for 4B model (#16)
* Add Gemma3n Audio Encoder (#6)
  * initial commit of Gemma 3.5 scaffold
  * Fixing param pass-through on Gemma3nRMSNorm
  * Adds Einsum layer to Gemma 3.5
  * Updating EinsumLayer API
  * Undoing erroneous force push
  * Reverting RMSNorm to with_scale by default
  * Adds LAuReL to Gemma 3n
  * Adds AltUp to Gemma 3n
  * Adding Gemma3n overall and text config with vision and audio config placeholders (#3)
    * Adding gemma3n text configs
    * Adding audio config placeholders
    * Adding a placeholder for vision configs
    * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
    * Updating text configs
    * Update modular (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
    * Removing altup configs to accept the suggested configs
    * Update modular (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
    * Updating altup config
    * Update modular (×4) (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
    * Addressing review comments and updating text configs
    * Adding a config for activation sparsity
    * Updating configs to pass through options to super class init and adjust some name prefixes
    * Updating laurel and altup with corrected config values
    * Normalizing sub_config initializers

    Co-authored-by: Ryan Mullins <ryanmullins@google.com>
  * Updating MLP with activation sparsity (#2)
  * Updating DecoderBlock for Gemma 3.5 (#3)
  * Initial Gemma3nTextModel (#4). NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive, and this will remain a suitable baseline for downstream implementations to reference.
  * Adding KV Cache Sharing
  * Adds Einsum layer to Gemma 3.5
  * Updating EinsumLayer API
  * Refactored kv cache sharing in attention
  * Adding KVStore for cache sharing
  * Update modular (×3) (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
  * Update src/transformers/cache_utils.py (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
  * Undoing erroneous force push
  * Reverting RMSNorm to with_scale by default
  * Adds LAuReL to Gemma 3n
  * Updating KV Cache Sharing implementation
  * Updating the q and k norm definitions in the attention module
  * Fixing name error for q, k, v RMS norm to use the right Gemma 3n module
  * Updating MLP with activation sparsity
  * Updating DecoderBlock for Gemma 3.5
  * Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
  * Isolating KV Cache logic to relevant components
  * Fixing logic error in Gemma3nAttention.forward
  * Refactoring caching contributions and fixing kv_store initialization
  * Simplifying Configs
  * Remove errant self from super init call
  * Bug fix in the Attention module: changing self.head_dim to config.head_dim
  * Bug fixes in the LaurelBlock and RMS Norm super init call
  * removing redundant code from a merge
  * Adding per_layer_inputs to TextModel
  * Adding preprocess embeddings with altup
  * Adds per-layer-to-single output and a host of TODOs
  * Integrating altup predict with the model workflow and other minor bug fixes
  * Using nn.Embedding temporarily for text model
  * It goes forward
  * Minor refactor of attention sparsity and RoPE initialization
  * Fixing duplicate rope_scaling param bug when loading from pretrained

  Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
  Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
  * Normalizing on altup_num_inputs config option
  * Adding audio encoder config
  * Adds high-level components for Audio Encoder
  * Implement uniform reducer for Audio Encoder
  * Adding placeholders for Conformer components in Audio Encoder
  * Adding placeholders for SubSampleConvProjection components in Audio Encoder
  * Adding SequenceLayer component placeholders
  * Implementing Gemma3nAudioEncoder with nn.Sequential
  * Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential
  * Implementing Conformer model with SequenceLayers
  * Use OrderedDict in nn.Sequential initializers
  * Implements sl.Residual in Torch with nn.Sequential and OrderedDict
  * Adopting a base SequenceLayer class with default forward() method
  * Implementing sl.GatedLinearUnit in Torch
  * Implementing sl.Swish in Torch
  * Implementing sl.ReLU in Torch
  * Implementing sl.Scale in Torch
  * Removing sl.Dropout after tree-shaking
  * Implementing sl.RMSNorm in Torch with fake shape
  * Implementing sl.GroupNorm in Torch
  * Implementing sl.Conv2d in Torch
  * Implementing sl.Dense in Torch
  * Removing sl.Delay layers, which act as pass-throughs
  * Connecting shapes to configs in initializers
  * Removing sl.Emit
  * Implementing sl.ExpandDims in Torch
  * Adding sl.GradientClipping to Torch
  * Implementing sl.DenseShaped in Torch
  * Implementing sl.LDPA in Torch
  * Removing unused sl.CombinedQKVProj class
  * Fixing erroneous type hint
  * Implementing sl.DepthwiseConv1D in Torch
  * Implementing sl.MaskInvalid in Torch
  * Fixes for initialization
  * Fixes for saving weights
  * Removing einsums per feedback from HF staff
  * Removing Sequence Layers idioms from audio encoder
  * Fixes for reviewer comments
  * CausalLM conversion script for 4B model
  * inv_timescales to non-persistent buffer
  * Addressing audio encoder Attention feedback
  * Addressing Gemma3nAudioSSCPConvBlock feedback
  * Addressing Gemma3nAudioConformerAttention feedback
  * Addressing padding feedback
  * Weights conversion loads audio state dict
  * Always use vision_config so saving works
  * Token id updates for configs
  * Stubs for interleaving audio embs
  * Addressing reviewer feedback

  Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
  Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
* Fixing cache access error
* Removing duplicate code from a bad merge
* Gemma 3n Text + Vision Part 1 (#17)
  * testing utilities for numerics comparisons
  * Corrected einsum to nn.Linear weights conversion
  * Inherit scaled word embs from Gemma3 not Bart
  * Fixing transposes for collapsed linears
  * More transpose fixes
  * numpy api fix
  * RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True (see the RMSNorm sketch after this log)
  * Force AltUp to float32
  * Updating debugging script for AudioEncoder debugging
  * Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs
  * Correcting attention einsum conversions
  * RMSNorm in type of x
  * Fixing duplicate laurel norm/gating
  * KV sharing using the right previous indices (see the KV-sharing sketch after this log)
  * Refactor kv shared index computation. Correct frac_shared_layers
  * Use num_shared_layers instead of inferring from a fraction
  * fixing a bug for logging
  * Fix shared data_ptrs in altup inits
  * rope: adjust proj -> norm -> rope to preserve computation (#20)
    * rope: adjust proj -> norm -> rope to preserve computation
    * Removing some breaking language model fluff in ConditionalGeneration
    * Consolidate query_states transforms

    Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
    Co-authored-by: Ryan Mullins <ryanmullins@google.com>
  * Vectorize the loops in AltUp (#19)
    * Vectorize the loops in AltUp
    * fix typo
    * Expanding to support batched inputs
    * remove extra debug script
    * Fix AltUp.forward

    Co-authored-by: Ryan Mullins <ryanmullins@google.com>
  * Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel
  * Convert norm to 1/sqrt (#21)
    * Convert norm to 1/sqrt
    * Scale shift change per Phil's rec
  * Adding default activation sparsity
  * Fixing 2B config in weights conversion script
  * Fixing RMSNorm parameters: adding scale_shift and with_scale
  * Correcting query pre-attention scaling
  * Adding query_rescale_scalar to text config
  * Adding layer_idx to MLP
  * Permafix for input_layernorm
  * Use 1/sqrt instead of rsqrt in DecoderLayer
  * Fix o_proj conversion
  * Conversion script update for vision encoder
  * Removing logging for debugging timm model
  * Fixing bugs in Gemma3nForConditionalGeneration for text generation
  * Generating the modeling_gemma3n.py file
  * Removing the addition of an erroneous line in the modeling file
  * Adding gemma3n text model to modeling_auto
  * Bugfix: Updating the interleaving of inputs_embeds and vision_embeds
  * Updating the modeling file with the latest bugfix changes
  * Updating models/auto for Gemma 3n
  * using AutoTokenizer in forward test
  * Adding processing_gemma3n.py
  * Gemma 3n configured for AutoModel. Conversion script updated.
  * Removing errant merge artifacts

  Co-authored-by: Mayank Chaturvedi <imayank@google.com>
  Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
  Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
  Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
  Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
* Removing errant debugging statements from Gemma 3
* Gemma3n audio model (#18)
  * testing utilities for numerics comparisons
  * Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock
  * Add audio version of forward script based on RyanMullins' implementation
  * Updating to match encoder tests. WIP: config question needs resolving
  * Updates to audio classes to enable end-to-end running
  * Removing vestigial classes, cleaning up print statements
  * Adding SiLU / Swish to audio conformer feed forward block
  * Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio
  * Adding outputs to audio test
  * Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model
  * Update forward test to load from local weights
  * Update conversion to process / output audio layers
  * Update __all__ to export audio encoder
  * AutoModel registration for Gemma 3n Audio
  * Use AutoModel for ConditionalGeneration.audio_tower
  * Fixing input_proj_linear transpose
  * Fixing Gemma3NanoAudioConformerAttention.post conversion
  * Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion
  * Correcting indentation issue on Gemma3p5RMSNorm

  Co-authored-by: Ryan Mullins <ryanmullins@google.com>
* Text + Vision Part 2 (#23)
  * Updates for ConditionalGeneration.get_image_features
  * Adding a WIP draft of image_processing_gemma3p5.py
  * Update src/transformers/models/gemma3p5/modular_gemma3p5.py (Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>)
  * Modular conversion after github suggested change
  * Text + image gives good results
  * Fixing image size preset
  * Updating configs for the 2B variant in the conversion script
  * Using final generation config in conversion script

  Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
  Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
* Audio Integration (#12)
  * initial commit of Gemma 3n scaffold
  * Fixing param pass-through on Gemma3nRMSNorm
  * Adds Einsum layer to Gemma 3n
  * Updating EinsumLayer API
  * Undoing erroneous force push
  * Reverting RMSNorm to with_scale by default
  * Adds LAuReL to Gemma 3n
  * Adds AltUp to Gemma 3n
  * Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)
    * Adding Gemma 3n text configs
    * Adding audio config placeholders
    * Adding a placeholder for vision configs
    * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig
    * Updating text configs
    * Update modular (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
    * Removing altup configs to accept the suggested configs
    * Update modular (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
    * Updating altup config
    * Update modular (×4) (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
    * Addressing review comments and updating text configs
    * Adding a config for activation sparsity
    * Updating configs to pass through options to super class init and adjust some name prefixes
    * Updating laurel and altup with corrected config values
    * Normalizing sub_config initializers

    Co-authored-by: Ryan Mullins <ryanmullins@google.com>
  * Updating MLP with activation sparsity (#2)
  * Updating DecoderBlock for Gemma 3n (#3)
  * Initial Gemma3nTextModel (#4). NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive, and this will remain a suitable baseline for downstream implementations to reference.
  * Adding KV Cache Sharing
  * Adds Einsum layer to Gemma 3n
  * Updating EinsumLayer API
  * Refactored kv cache sharing in attention
  * Adding KVStore for cache sharing
  * Update modular (×3) (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
  * Update src/transformers/cache_utils.py (Co-authored-by: Ryan Mullins <ryanmullins@google.com>)
  * Undoing erroneous force push
  * Reverting RMSNorm to with_scale by default
  * Adds LAuReL to Gemma 3n
  * Updating KV Cache Sharing implementation
  * Updating the q and k norm definitions in the attention module
  * Fixing name error for q, k, v RMS norm to use the right 3n module
  * Updating MLP with activation sparsity
  * Updating DecoderBlock for Gemma 3n
  * Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code
  * Isolating KV Cache logic to relevant components
  * Fixing logic error in Gemma3nAttention.forward
  * Refactoring caching contributions and fixing kv_store initialization
  * Simplifying Configs
  * Remove errant self from super init call
  * Bug fix in the Attention module: changing self.head_dim to config.head_dim
  * Bug fixes in the LaurelBlock and RMS Norm super init call
  * removing redundant code from a merge
  * Adding per_layer_inputs to TextModel
  * Adding preprocess embeddings with altup
  * Adds per-layer-to-single output and a host of TODOs
  * Integrating altup predict with the model workflow and other minor bug fixes
  * Using nn.Embedding temporarily for text model
  * It goes forward
  * Minor refactor of attention sparsity and RoPE initialization
  * Fixing duplicate rope_scaling param bug when loading from pretrained

  Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
  Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
  * Normalizing on altup_num_inputs config option
  * Adding audio encoder config
  * Adds high-level components for Audio Encoder
  * Implement uniform reducer for Audio Encoder
  * Adding placeholders for Conformer components in Audio Encoder
  * Adding placeholders for SubSampleConvProjection components in Audio Encoder
  * Adding SequenceLayer component placeholders
  * Implementing Gemma3nAudioEncoder with nn.Sequential
  * Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential
  * Implementing Conformer model with SequenceLayers
  * Use OrderedDict in nn.Sequential initializers
  * Implements sl.Residual in Torch with nn.Sequential and OrderedDict
  * Adopting a base SequenceLayer class with default forward() method
  * Implementing sl.GatedLinearUnit in Torch
  * Implementing sl.Swish in Torch
  * Implementing sl.ReLU in Torch
  * Implementing sl.Scale in Torch
  * Removing sl.Dropout after tree-shaking
  * Implementing sl.RMSNorm in Torch with fake shape
  * Implementing sl.GroupNorm in Torch
  * Implementing sl.Conv2d in Torch
  * Implementing sl.Dense in Torch
  * Removing sl.Delay layers, which act as pass-throughs
  * Connecting shapes to configs in initializers
  * Removing sl.Emit
  * Implementing sl.ExpandDims in Torch
  * Adding sl.GradientClipping to Torch
  * Implementing sl.DenseShaped in Torch
  * Implementing sl.LDPA in Torch
  * Removing unused sl.CombinedQKVProj class
  * Fixing erroneous type hint
  * Implementing sl.DepthwiseConv1D in Torch
  * Implementing sl.MaskInvalid in Torch
  * Fixes for initialization
  * Fixes for saving weights
  * Removing einsums per feedback from HF staff
  * Removing Sequence Layers idioms from audio encoder
  * Fixes for reviewer comments
  * Converting sl.Frontend to FeatureExtractor
  * Updates for ConditionalGeneration.get_image_features
  * Adding a WIP draft of image_processing_gemma3n.py
  * Update modular (Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>)
  * Modular conversion after github suggested change
  * Text + image gives good results
  * Fixing image size preset
  * Draft of audio data in chat template
  * Removing image processing. Using SigLIP instead.
  * Audio input going end-to-end
  * Fixing dtype issues in audio encoder
  * x-lib formatting consistency
  * Adding example data
  * Save preprocessor_config.json from conversion script
  * Instrumentation for debugging
  * Additional instrumentation for preprocessing debugging
  * Updates to preprocessor, padding; produces correct end-to-end results on sample
  * Tackling configuration TODOs
  * Start of feature extractor refactor
  * Adds NumPy version of USM extractor, removes Torch version and dependencies
  * Fixing AltUp.correct coef permute
  * Supporting batches of single audio segment inputs
  * Docstring updates for config
  * In-lining audio feature extraction
  * Adjustments to conversion script and smoke test script

  Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
  Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
  Co-authored-by: pculliton <phillipculliton@gmail.com>
* Gemma 3n renaming
* Removing test data and utilities
* Renaming test files
* Gemma 3n refactor
* Fix tokenizer config in conversion script
* Address reviewer feedback
* FeatureExtractor returns float32 by default
* Adding basic tests for audio, and input name for audio encoder
* Audio integration test, updates to model_id for other integration tests
* Use scales for q and k norms (#26)
* Update audio integration test to use HF dataset
* Reviewer feedback
* Expand embedding table to full vocab size in weights conversion
* Mix-n-match MatFormers for Gemma 3n (#25)
* Remove in-place operations (#30)
  * chore: removing inplace ops
  * remove `[tensor] * n` pattern
  * chore: reviewer feedback in AudioEncoder and AltUp
* More grad clipping
* Dynamo compatibility
* fix: cache slicing error
* chore: simplify shared kv cache slicing
* chore: vision encoder rename in timm
* fix: image processor do_normalize=False
* fixup: style
* chore: model_doc
* fix: docs for code quality
* chore: repo consistency
* fix: RMSNorm in float as in prior Gemmas
* fix: per_layer_inputs = None
* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint
* chore: repo consistency
* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)
  * Add initial unit tests for Gemma3nAudioFeatureExtractor
  * Add basic unit tests for Gemma3nProcessor (#28) (Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>)
  * parameterize tests

  Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
* chore: code style
* fix: test cases
* style and consistency
* fix config in the test to be coherent with layer cache sharing
* fix hidden states in tests and code
* inits and mappings
* fix modality prefixes
* test order and prefixes
* fix test exception
* fix class order and reduce model size for faster tests
* restore _checkpoint_conversion_mapping to load Causal from Conditional
* fix config mapping!
* fix: reviewer feedback

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix import test
* add model args
* auto_docstring
* replace test path
* consistency
* skip tests for now
* fix docstring for doc builder
* skip unused attr

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
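Several entries in the log above concern the RMSNorm design: `with_scale` on by default, an explicit `scale_shift` (0.0 when `with_scale=True`), computing the norm in float as in prior Gemmas, and returning it in the dtype of `x`. The log names the knobs but not the layer itself; the following is a minimal PyTorch sketch of that pattern, not the shipped `Gemma3nRMSNorm`, and the interaction between `scale_shift` and weight initialization is an assumption for illustration.

```python
import torch
from torch import nn


class RMSNormSketch(nn.Module):
    """Illustrative RMSNorm with the knobs named in the log:
    `with_scale` (learned scale, on by default) and `scale_shift`
    (a constant added to the learned weight)."""

    def __init__(self, dim: int, eps: float = 1e-6,
                 with_scale: bool = True, scale_shift: float = 0.0):
        super().__init__()
        self.eps = eps
        self.scale_shift = scale_shift
        # Ones-init so that scale_shift=0.0 starts as an identity scale.
        self.weight = nn.Parameter(torch.ones(dim)) if with_scale else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize in float32 ("RMSNorm in float as in prior Gemmas") ...
        out = x.float()
        out = out * torch.rsqrt(out.pow(2).mean(-1, keepdim=True) + self.eps)
        if self.weight is not None:
            out = out * (self.weight.float() + self.scale_shift)
        # ... then cast back to the input dtype ("RMSNorm in type of x").
        return out.type_as(x)
```

Under this reading, the classic Gemma `(1 + weight)` scale corresponds to zero-initialized weights with `scale_shift=1.0`, while `scale_shift=0.0` uses the learned weight as-is.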
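"Converts einsums to nn.Linear (#7)" and the follow-up fixes ("Corrected einsum to nn.Linear weights conversion", "Fixing transposes for collapsed linears") refer to replacing checkpoint einsum contractions with standard linear layers. The equivalence that conversion relies on is just a weight transpose; a self-contained demonstration (tensor names hypothetical):

```python
import torch
from torch import nn

# A checkpoint-style einsum projection: weight stored as (d_in, d_out),
# applied as x @ w over the last dimension.
d_in, d_out = 8, 16
w = torch.randn(d_in, d_out)
x = torch.randn(2, 5, d_in)  # (batch, seq, d_in)
y_einsum = torch.einsum("btd,df->btf", x, w)

# nn.Linear stores its weight as (out_features, in_features) and computes
# x @ weight.T, so loading the einsum weight requires a transpose.
linear = nn.Linear(d_in, d_out, bias=False)
with torch.no_grad():
    linear.weight.copy_(w.T)

torch.testing.assert_close(y_einsum, linear(x))  # same result
```

The "transposes for collapsed linears" entries track exactly this detail: forgetting the transpose silently produces a valid-shaped but wrong projection whenever `d_in == d_out`.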
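Activation sparsity in the MLP also recurs throughout the log ("Updating MLP with activation sparsity (#2)", a dedicated config option, "Use torch.std(..., unbiased=False) for activation sparsity (#8)", "Adding default activation sparsity"). The log confirms only that the population standard deviation (`unbiased=False`) is used; the cutoff rule below is an assumed sketch for illustration, zeroing activations that fall under a mean-plus-scaled-std threshold:

```python
import torch


def sparsify_activations(x: torch.Tensor, std_multiplier: float = 1.0) -> torch.Tensor:
    """Hypothetical statistics-based gating: keep only the portion of each
    activation above a per-token cutoff. Only the use of
    torch.std(..., unbiased=False) is confirmed by the log."""
    mean = x.mean(dim=-1, keepdim=True)
    std = x.std(dim=-1, keepdim=True, unbiased=False)  # population std, per (#8)
    cutoff = mean + std * std_multiplier
    return torch.relu(x - cutoff)
```

One common way to pick `std_multiplier` is from a target sparsity via the inverse normal CDF, which would zero roughly that fraction of approximately Gaussian activations; whether Gemma 3n does exactly this is not stated in the log.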
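KV cache sharing is the thread running through the cache-related entries ("Refactored kv cache sharing in attention", "Aligning SharedKVCache with HybridCache (#11)", "KV sharing using the right previous indices", "Use num_shared_layers instead of inferring from a fraction", "chore: simplify shared kv cache slicing"). The log fixes the index computation but never spells it out; below is a conceptual sketch of one consistent reading, in which the last `num_shared_layers` layers reuse the KV of the most recent earlier layer with the same attention type. All names are hypothetical.

```python
def shared_kv_source(layer_idx: int, num_layers: int,
                     num_shared_layers: int, layer_types: list[str]) -> int:
    """Return the layer whose KV cache `layer_idx` should read.

    Conceptual sketch only: the last `num_shared_layers` layers do not
    compute their own KV; each reuses the most recent earlier layer of
    the same attention type ("the right previous indices").
    """
    first_shared = num_layers - num_shared_layers
    if layer_idx < first_shared:
        return layer_idx  # this layer computes and caches its own KV
    for prev in range(first_shared - 1, -1, -1):
        if layer_types[prev] == layer_types[layer_idx]:
            return prev
    return layer_idx  # fallback: no earlier layer of this type


# Example: alternating sliding/full attention, last 4 of 10 layers shared.
types = ["sliding", "full"] * 5
assert shared_kv_source(8, 10, 4, types) == 4  # sliding layer reuses layer 4
assert shared_kv_source(9, 10, 4, types) == 5  # full layer reuses layer 5
```

"Aligning SharedKVCache with HybridCache (#11)" suggests the sliding/full distinction is what makes a previous index "right", which the type match above models.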
* albert.md
* align.md
* altclip.md
* arcee.md
* aria.md
* audio-spectrogram-transformer.md
* auto.md
* autoformer.md
* aya_vision.md
* bamba.md
* bark.md
* bart.md
* barthez.md
* bartpho.md
* beit.md
* bert-generation.md
* bert-japanese.md
* bert.md
* bertweet.md
* big_bird.md
* bigbird_pegasus.md
* biogpt.md
* bit.md
* bitnet.md
* blenderbot-small.md
* blenderbot.md
* blip-2.md
* blip.md
* bloom.md
* bort.md
* bridgetower.md
* bros.md
* byt5.md
* camembert.md
* canine.md
* chameleon.md
* chinese_clip.md
* clap.md
* clip.md
* clipseg.md
* clvp.md
* code_llama.md
* codegen.md
* cohere.md
* cohere2.md
* colpali.md
* colqwen2.md
* conditional_detr.md
* convbert.md
* convnext.md
* convnextv2.md
* cpm.md
* cpmant.md
* csm.md
* ctrl.md
* cvt.md
* d_fine.md
* dab-detr.md
* dac.md
* data2vec.md
* dbrx.md
* deberta-v2.md
* deberta.md
* decision_transformer.md
* deepseek_v3.md
* deformable_detr.md
* deit.md
* deplot.md
* depth_anything_v2.md
* depth_anything.md
* depth_pro.md
* deta.md
* detr.md
* dia.md
* dialogpt.md
* diffllama.md
* dinat.md
* dinov2_with_registers.md
* dinov2.md
* distilbert.md
* dit.md
* donut.md
* dots1.md
* dpr.md
* dpt.md
* efficientformer.md
* efficientnet.md
* electra.md
* emu3.md
* encodec.md
* encoder-decoder.md
* ernie_m.md
* ernie.md
* esm.md
* falcon_h1.md
* falcon_mamba.md
* falcon.md
* falcon3.md
* fastspeech2_conformer.md
* flan-t5.md
* flan-ul2.md
* flaubert.md
* flava.md
* fnet.md
* focalnet.md
* fsmt.md
* funnel.md
* fuyu.md
* gemma.md
* gemma2.md
* gemma3.md
* gemma3n.md
* git.md
* glm.md
* glm4.md
* glm4v.md
* glpn.md
* got_ocr2.md
* gpt_bigcode.md
* gpt_neo.md
* gpt_neox_japanese.md
* gpt_neox.md
* gpt-sw3.md
* gpt2.md
* gptj.md
* gptsan-japanese.md
* granite_speech.md
* granite.md
* granitemoe.md
* granitemoehybrid.md
* granitemoeshared.md
* granitevision.md
* graphormer.md
* grounding-dino.md
* groupvit.md
* helium.md
* herbert.md
* hgnet_v2.md
* hiera.md
* hubert.md
* ibert.md
* idefics.md
* idefics2.md
* idefics3.md
* ijepa.md
* imagegpt.md
* informer.md
* instructblip.md
* instructblipvideo.md
* internvl.md
* jamba.md
* janus.md
* jetmoe.md
* jukebox.md
* kosmos-2.md
* kyutai_speech_to_text.md
* layoutlm.md
* layoutlmv2.md
* layoutlmv3.md
* layoutxlm.md
* led.md
* levit.md
* lightglue.md
* lilt.md
* llama.md
* llama2.md
* llama3.md
* llama4.md
* llava_next_video.md
* llava_next.md
* llava_onevision.md
* llava.md
* longformer.md
* longt5.md
* luke.md
* lxmert.md
* m2m_100.md
* madlad-400.md
* mamba.md
* mamba2.md
* marian.md
* markuplm.md
* mask2former.md
* maskformer.md
* matcha.md
* mbart.md
* mctct.md
* mega.md
* megatron_gpt2.md
* megatron-bert.md
* mgp-str.md
* mimi.md
* minimax.md
* mistral.md
* mistral3.md
* mixtral.md
* mlcd.md
* mllama.md
* mluke.md
* mms.md
* mobilebert.md
* mobilenet_v1.md
* mobilenet_v2.md
* mobilevit.md
* mobilevitv2.md
* modernbert.md
* moonshine.md
* moshi.md
* mpnet.md
* mpt.md
* mra.md
* mt5.md
* musicgen_melody.md
* musicgen.md
* mvp.md
* myt5.md
* nat.md
* nemotron.md
* nezha.md
* nllb-moe.md
* nllb.md
* nougat.md
* nystromformer.md
* olmo.md
* olmo2.md
* olmoe.md
* omdet-turbo.md
* oneformer.md
* open-llama.md
* openai-gpt.md
* opt.md
* owlv2.md
* owlvit.md
* paligemma.md
* patchtsmixer.md
* patchtst.md
* pegasus_x.md
* pegasus.md
* perceiver.md
* persimmon.md
* phi.md
* phi3.md
* phi4_multimodal.md
* phimoe.md
* phobert.md
* pix2struct.md
* pixtral.md
* plbart.md
* poolformer.md
* pop2piano.md
* prompt_depth_anything.md
* prophetnet.md
* pvt_v2.md
* pvt.md
* qdqbert.md
* qwen2_5_omni.md
* qwen2_5_vl.md
* qwen2_audio.md
* qwen2_moe.md
* qwen2_vl.md
* qwen2.md
* qwen3_moe.md
* qwen3.md
* rag.md
* realm.md
* recurrent_gemma.md
* reformer.md
* regnet.md
* rembert.md
* resnet.md
* retribert.md
* roberta-prelayernorm.md
* roberta.md
* roc_bert.md
* roformer.md
* rt_detr_v2.md
* rt_detr.md
* rwkv.md
* sam_hq.md
* sam.md
* seamless_m4t_v2.md
* seamless_m4t.md
* segformer.md
* seggpt.md
* sew-d.md
* sew.md
* shieldgemma2.md
* siglip.md
* siglip2.md
* smollm3.md
* smolvlm.md
* speech_to_text_2.md
* speech_to_text.md
* speech-encoder-decoder.md
* speecht5.md
* splinter.md
* squeezebert.md
* stablelm.md
* starcoder2.md
* superglue.md
* superpoint.md
* swiftformer.md
* swin.md
* swin2sr.md
* swinv2.md
* switch_transformers.md
* t5.md
* t5gemma.md
* t5v1.1.md
* table-transformer.md
* tapas.md
* tapex.md
* textnet.md
* time_series_transformer.md
* timesfm.md
* timesformer.md
* timm_wrapper.md
* trajectory_transformer.md
* transfo-xl.md
* trocr.md
* tvlt.md
* tvp.md
* udop.md
* ul2.md
* umt5.md
* unispeech-sat.md
* unispeech.md
* univnet.md
* upernet.md
* van.md
* video_llava.md
* videomae.md
* vilt.md
* vipllava.md
* vision-encoder-decoder.md
* vision-text-dual-encoder.md
* visual_bert.md
* vit_hybrid.md
* vit_mae.md
* vit_msn.md
* vit.md
* vitdet.md
* vitmatte.md
* vitpose.md
* vits.md
* vivit.md
* vjepa2.md
* wav2vec2_phoneme.md
* wav2vec2-bert.md
* wav2vec2-conformer.md
* wav2vec2.md
* wavlm.md
* whisper.md
* xclip.md
* xglm.md
* xlm-prophetnet.md
* xlm-roberta-xl.md
* xlm-roberta.md
* xlm-v.md
* xlm.md
* xlnet.md
* xls_r.md
* xlsr_wav2vec2.md
* xmod.md
* yolos.md
* yoso.md
* zamba.md
* zamba2.md
* zoedepth.md