transformers/tests/models
Ryan Mullins c63cfd6a83
Gemma 3n (#39059)
* Gemma 3n

* initial commit of Gemma 3n scaffold

* Fixing param pass-through on Gemma3p5RMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default
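
For reference, a minimal sketch of an RMSNorm with an optional learned scale, i.e. the `with_scale` behavior referenced above; the class name and defaults here are illustrative, not the exact Gemma 3n module:

```python
import torch
from torch import nn

class RMSNormSketch(nn.Module):
    """Illustrative RMSNorm with an optional learned scale (hypothetical API)."""

    def __init__(self, dim: int, eps: float = 1e-6, with_scale: bool = True):
        super().__init__()
        self.eps = eps
        # with_scale=True adds a learnable per-channel weight on the normalized output.
        self.weight = nn.Parameter(torch.ones(dim)) if with_scale else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize in float32 for stability, then cast back to the input dtype.
        out = x.float() * torch.rsqrt(x.float().pow(2).mean(-1, keepdim=True) + self.eps)
        if self.weight is not None:
            out = out * self.weight.float()
        return out.type_as(x)
```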

* Adds LAuReL to Gemma 3n
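
LAuReL (Learned Augmented Residual Layer) generalizes the plain residual `x + f(x)` by adding a cheap learned path alongside the identity. A hedged low-rank sketch of the idea, not the exact Gemma 3n block:

```python
import torch
from torch import nn

class LaurelBlockSketch(nn.Module):
    """Low-rank learned residual in the spirit of LAuReL-LR (illustrative names)."""

    def __init__(self, dim: int, rank: int = 64):
        super().__init__()
        self.linear_left = nn.Linear(dim, rank, bias=False)   # project down to the low rank
        self.linear_right = nn.Linear(rank, dim, bias=False)  # project back up

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity path plus a learned low-rank correction.
        return x + self.linear_right(self.linear_left(x))
```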

* Adds AltUp to Gemma 3n
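
AltUp (Alternating Updates) widens the model by carrying several parallel copies of the hidden state while running the expensive transformer block on only one of them: the other copies are predicted, then corrected using the computed copy's error. A heavily simplified sketch of the predict/correct cycle with hypothetical names:

```python
import torch
from torch import nn

class AltUpSketch(nn.Module):
    """Simplified AltUp: predict all copies, compute on one, correct the rest."""

    def __init__(self, dim: int, num_inputs: int = 4):
        super().__init__()
        # Learned mixing matrices acting across the copy axis.
        self.prediction_coefs = nn.Linear(num_inputs, num_inputs, bias=False)
        self.correction_coefs = nn.Linear(1, num_inputs, bias=False)

    def predict(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: [num_inputs, batch, seq, dim]; mix along the copy axis.
        mixed = self.prediction_coefs(hidden.permute(1, 2, 3, 0))
        return hidden + mixed.permute(3, 0, 1, 2)

    def correct(self, predicted: torch.Tensor, activated: torch.Tensor) -> torch.Tensor:
        # Broadcast the error of the single computed copy back onto every copy.
        error = (activated - predicted[0]).unsqueeze(-1)  # [batch, seq, dim, 1]
        return predicted + self.correction_coefs(error).permute(3, 0, 1, 2)
```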

* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)

* Adding gemma3p5 text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive, and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing
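
KV cache sharing lets the last few decoder layers reuse key/value states cached by earlier layers instead of computing and storing their own, shrinking the cache. A schematic of the layer indexing under assumed config names:

```python
# Schematic only: which layer's KV cache each layer reads (assumed config names).
num_hidden_layers = 30
num_kv_shared_layers = 10  # the last 10 layers do not cache their own K/V

first_shared = num_hidden_layers - num_kv_shared_layers
for layer_idx in range(num_hidden_layers):
    if layer_idx < first_shared:
        kv_source = layer_idx         # computes and caches its own K/V
    else:
        kv_source = first_shared - 1  # e.g. reuse the last self-caching layer
    # attention at layer_idx would read K/V from cache[kv_source]
```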

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating KV cache sharing implementation to use a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* regenerating modeling file after syncing to HEAD

* Use torch.std(..., unbiased=False) for activation sparsity (#8)
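
In the activation-sparsity MLP, `unbiased=False` makes `torch.std` use the population formula (`sqrt(mean((x - mean)^2))`), which feeds a Gaussian-quantile cutoff that zeros most pre-activations. A hedged sketch of that gating, not the verbatim model code:

```python
import torch

def sparse_gelu_sketch(x: torch.Tensor, target_sparsity: float = 0.95) -> torch.Tensor:
    """Zero roughly `target_sparsity` of pre-activations via a statistics-based cutoff."""
    mean = x.mean(dim=-1, keepdim=True)
    std = torch.std(x, dim=-1, keepdim=True, unbiased=False)  # population std
    # Standard-normal quantile at the target sparsity level.
    z = torch.distributions.Normal(0.0, 1.0).icdf(torch.tensor(target_sparsity))
    cutoff = mean + std * z
    return torch.nn.functional.gelu(torch.relu(x - cutoff))
```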

* Refactoring to a single QVK Norm (#13)

* AltUp: support scale_corrected_output (#14)

* Converts einsums to nn.Linear (#7)

* Converts einsums to nn.Linear
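
Converting an einsum projection to `nn.Linear` mostly comes down to transposing the weight, since `nn.Linear` stores weights as `[out_features, in_features]`. A quick equivalence check:

```python
import torch
from torch import nn

d_in, d_out = 16, 32
w = torch.randn(d_in, d_out)          # einsum-style weight: [in, out]
x = torch.randn(2, 5, d_in)

linear = nn.Linear(d_in, d_out, bias=False)
with torch.no_grad():
    linear.weight.copy_(w.T)          # nn.Linear expects [out, in]

assert torch.allclose(torch.einsum("btd,dh->bth", x, w), linear(x), atol=1e-5)
```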

* Removing unused variables

* Aligning SharedKVCache with HybridCache (#11)

* Aligning SharedKVStore with HybridCache

* Remove KVStore. Refactor apply_rotary_pos_emb for sharing

* Addressing review comments

* Supporting split modality embeddings in Gemma3n (#10)

* Adding the Embedder class

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation

* Apply suggestions from code review

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, prop drilling audio and vision configs to the text config

* Removing TODO's that have been addressed

* Simplify Embedder init and add audio embeddings

* Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder

* Refactoring vision and audio embeddings into ConditionalGeneration model

---------

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating attention mask for Gemma 3.5 (#15)

* xxx_token_index to xxx_token_id

* Removing deprecated last_cache_position

* Removing references to SigLIP

* Always init per-layer inputs

* Using torch.finfo().min for epsilon_tensor
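
`torch.finfo(dtype).min` is the most negative finite value a floating dtype can represent, which makes it a safer additive-mask fill than a hard-coded large negative constant that may overflow in half precision. For example:

```python
import torch

dtype = torch.bfloat16
attn_mask = torch.zeros(1, 1, 4, 4, dtype=dtype)
# Masked positions get the dtype's minimum, so softmax drives them to ~0.
attn_mask[..., 2:] = torch.finfo(dtype).min
```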

* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas

* fix modular GEMMA3N_INPUTS_DOCSTRING

* Gemma3nAttention inherits from Gemma3Attention

* Modular inheritance fixes

* CausalLM conversion script for 4B model (#16)

* Add Gemma3n Audio Encoder (#6)

* initial commit of Gemma 3.5 scaffold

* Fixing param pass-through on Gemma3nRMSNorm

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3n overall and text config with vision and audio config placeholders (#3)

* Adding gemma3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3.5 (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive, and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating KV cache sharing implementation to use a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers
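
Passing an `OrderedDict` to `nn.Sequential` names each submodule, which keeps state-dict keys readable (e.g. `conv.weight` instead of `0.weight`). For example:

```python
from collections import OrderedDict
from torch import nn

block = nn.Sequential(OrderedDict([
    ("conv", nn.Conv2d(3, 8, kernel_size=3, padding=1)),
    ("norm", nn.GroupNorm(num_groups=2, num_channels=8)),
    ("act", nn.ReLU()),
]))
```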

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implementing sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* CausalLM conversion script for 4B model

* inv_timescales to non-persistent buffer
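
Registering `inv_timescales` with `persistent=False` keeps the tensor on the module (so it follows `.to(device)` and dtype casts) while excluding it from the state dict, since it can be rebuilt from config. A sketch with assumed sizes:

```python
import torch
from torch import nn

class TimescalesSketch(nn.Module):
    def __init__(self, channels: int = 128, max_timescale: float = 10_000.0):
        super().__init__()
        exponent = torch.arange(channels // 2, dtype=torch.float32) / (channels // 2)
        # Not saved in the checkpoint; recomputed deterministically at init time.
        self.register_buffer("inv_timescales", max_timescale ** -exponent, persistent=False)
```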

* Addressing audio encoder Attention feedback

* Addressing Gemma3nAudioSSCPConvBlock feedback

* Addressing Gemma3nAudioConformerAttention feedback

* Addressing padding feedback

* Weights conversion loads audio state dict

* Always use vision_config so saving works

* Token id updates for configs

* Stubs for interleaving audio embs

* Addressing reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Fixing cache access error

* Removing duplicate code from a bad merge

* Gemma 3n Text + Vision Part 1 (#17)

* testing utilities for numerics comparisons

* Corrected einsum to nn.Linear weights conversion

* Inherit scaled word embs from Gemma3 not Bart

* Fixing transposes for collapsed linears

* More transpose fixes

* NumPy API fix

* RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True

* Force AltUp to float32

* Updating debugging script for AudioEncoder debugging

* Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs

* Correcting attention einsum conversions

* RMSNorm in type of x

* Fixing duplicate laurel norm/gating

* KV sharing using the right previous indices

* Refactor kv shared index computation. Correct frac_shared_layers

* Use num_shared_layers instead of inferring from a fraction

* Fixing a logging bug

* Fix shared data_ptrs in altup inits

* rope: adjust proj -> norm -> rope to preserve computation (#20)

* rope: adjust proj -> norm -> rope to preserve computation
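
The ordering matters because RoPE does not commute with the per-head norm; normalizing the projected queries before rotating them is what preserves the reference computation. A runnable toy version of the proj -> norm -> rope path (half-split RoPE convention; `nn.RMSNorm` requires torch >= 2.4; all names here are illustrative):

```python
import torch
from torch import nn

def apply_rope(x, cos, sin):
    # Half-split rotary embedding on the last dimension.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)

batch, seq, heads, head_dim = 1, 4, 2, 8
q_proj = nn.Linear(16, heads * head_dim, bias=False)
q_norm = nn.RMSNorm(head_dim)  # per-head norm

q = q_proj(torch.randn(batch, seq, 16)).view(batch, seq, heads, head_dim)
q = q_norm(q)                                            # normalize BEFORE rotating
angles = torch.arange(seq).float()[None, :, None, None]  # toy position angles
q = apply_rope(q, torch.cos(angles), torch.sin(angles))  # then apply RoPE
```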

* Removing some breaking language model fluff in ConditionalGeneration

* Consolidate query_states transforms

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Vectorize the loops in AltUp (#19)

* Vectorize the loops in AltUp

* fix typo

* Expanding to support batched inputs

* remove extra debug script

* Fix AltUp.forward

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel

* Convert norm to 1/sqrt (#21)

* Convert norm to 1/sqrt

* Scale shift change per Phil's rec

* Adding default activation sparsity

* Fixing 2B config in weights conversion script

* Fixing RMSNorm parameters - adding scale_shift and with_scale

* Correcting query pre-attention scaling

* Adding query_rescale_scalar to text config

* Adding layer_idx to MLP

* Permafix for input_layernorm

* Use 1/sqrt instead of rsqrt in DecoderLayer
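
`torch.rsqrt(x)` and `1.0 / torch.sqrt(x)` are mathematically identical but can round differently, especially in reduced precision, so matching a reference implementation bit-for-bit can require one specific form:

```python
import torch

x = torch.rand(8, dtype=torch.bfloat16) + 0.5
print(torch.rsqrt(x))       # fused reciprocal square root
print(1.0 / torch.sqrt(x))  # explicit division; may differ in the last bit
```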

* Fix o_proj conversion

* Conversion script update for vision encoder

* Removing logging for debugging timm model

* Fixing bugs in Gemma3nForConditionalGeneration for text generation

* Generating the modeling_gemma3n.py file

* Removing the addition of an erroneous line in the modeling file

* Adding gemma3n text model to modeling_auto

* Bugfix: Updating the interleaving of inputs_embeds and vision_embeds

* Updating the modeling file with the latest bugfix changes

* Updating models/auto for Gemma 3n

* using AutoTokenizer in forward test

* Adding processing_gemma3n.py

* Gemma 3n configured for AutoModel. Conversion script updated.

* Removing errant merge artifacts

---------

Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Removing errant debugging statements from Gemma 3

* Gemma3n audio model (#18)

* testing utilities for numerics comparisons

* Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock

* Add audio version of forward script based on RyanMullins' implementation

* Updating to match encoder tests. WIP: config question needs resolving

* Updates to audio classes to enable end-to-end running

* Removing vestigial classes, cleaning up print statements

* Adding SiLU / Swish to audio conformer feed forward block

* Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio

* Adding outputs to audio test

* Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model

* Update forward test to load from local weights

* Update conversion to process / output audio layers

* Update __all__ to export audio encoder

* AutoModel registration for Gemma 3n Audio

* Use AutoModel for ConditionalGeneration.audio_tower

* Fixing input_proj_linear transpose

* Fixing Gemma3NanoAudioConformerAttention.post conversion

* Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion

* Correcting indentation issue on Gemma3p5RMSNorm

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Text + Vision Part 2 (#23)

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3p5.py

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after GitHub-suggested change

* Text + image gives good results

* Fixing image size preset

* Updating configs for the 2B variant in the conversion script

* Using final generation config in conversion script

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Audio Integration (#12)

* initial commit of Gemma 3n scaffold

* Fixing param pass-through on Gemma3nRMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)

* Adding Gemma 3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks; however, changes will be strictly additive, and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3n

* Updating KV cache sharing implementation to use a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implementing sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* Converting sl.Frontend to FeatureExtractor

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3n.py

* Update modular

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after GitHub-suggested change

* Text + image gives good results

* Fixing image size preset

* Draft of audio data in chat template

* Removing image processing. Using SigLIP instead.

* Audio input going end-to-end

* Fixing dtype issues in audio encoder

* x-lib formatting consistency

* Adding example data

* Save preprocessor_config.json from conversion script

* Instrumentation for debugging

* Additional instrumentation for preprocessing debugging

* Updates to preprocessor, padding; produces correct end-to-end results on sample

* Tackling configuration TODOs

* Start of feature extractor refactor

* Adds Numpy version of USM extractor, removes Torch version and dependencies

* Fixing AltUp.correct coef permute

* Supporting batches of single audio segment inputs

* Docstrings updates for config

* In-lining audio feature extraction

* Adjustments to conversion script and smoke test script

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>

* Gemma 3n renaming

* Removing test data and utilities

* Renaming test files

* Gemma 3n refactor

* Fix tokenizer config in conversion script

* Address reviewer feedback

* FeatureExtractor returns float32 by default

* Adding basic tests for audio, and input name for audio encoder

* Audio integration test, updates to model_id for other integration tests

* Use scales for q and k norms (#26)

* Update audio integration test to use HF dataset

* Reviewer feedback

* Expand embedding table to full vocab size in weights conversion
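
Expanding the embedding table to the full vocab size in a conversion script is row-padding of the weight matrix. A hedged sketch (function name and zero-padding choice are assumptions, not the script's exact logic):

```python
import torch

def expand_embeddings(weight: torch.Tensor, full_vocab_size: int) -> torch.Tensor:
    """Pad an embedding weight of shape [vocab, dim] with zero rows (illustrative)."""
    vocab, dim = weight.shape
    if vocab >= full_vocab_size:
        return weight
    pad = torch.zeros(full_vocab_size - vocab, dim, dtype=weight.dtype)
    return torch.cat([weight, pad], dim=0)
```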

* Mix-n-match MatFormers for Gemma 3n (#25)

* Remove in-place operations (#30)

* chore: removing inplace ops

* remove [tensor] * n pattern
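
`[tensor] * n` builds a list of n references to the same tensor, so any later in-place write shows up in every "copy"; aliasing like this also complicates torch.compile/Dynamo tracing. A small demonstration of the hazard and the fix:

```python
import torch

aliased = [torch.zeros(3)] * 4     # four references to ONE tensor
aliased[0].add_(1)                 # in-place write mutates every entry
print(aliased[3])                  # tensor([1., 1., 1.])

safe = [torch.zeros(3) for _ in range(4)]  # independent tensors
safe[0].add_(1)
print(safe[3])                     # tensor([0., 0., 0.])
```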

* chore: reviewer feedback in AudioEncoder and AltUp

* More grad clipping

* Dynamo compatibility

* fix: cache slicing error

* chore: simplify shared kv cache slicing

* chore: vision encoder rename in timm

* fix: image processor do_normalize=False

* fixup: style

* chore: model_doc

* fix: docs for code quality

* chore: repo consistency

* fix: RMSNorm in float as in prior Gemmas

* fix: per_layer_inputs = None

* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint

* chore: repo consistency

* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)

* Add initial unit tests for Gemma3nAudioFeatureExtractor

* Add basic unit tests for Gemma3nProcessor (#28)

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* parameterize tests

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* chore: code style

* fix: test cases

* style and consistency

* fix config in the test to be coherent with layer cache sharing

* fix hidden states in tests and code

* inits and mappings

* fix modality prefixes

* test order and prefixes

* fix test exception

* fix class order and reduce model size for faster tests

* restore _checkpoint_conversion_mapping to load Causal from Conditional

* fix config mapping!

* fix: reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix import test

* add model args

* auto_docstring

* replace test path

* consistency

* skip tests for now

* fix docstring for doc builder

* skip unused attr

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-26 17:55:47 +02:00
albert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
align Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
altclip 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
arcee Add Arcee model support (#38621) 2025-06-24 15:05:29 +02:00
aria Don't run AriaForConditionalGenerationModelTest on CircleCI (#38615) 2025-06-06 11:30:31 +02:00
audio_spectrogram_transformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
auto Add Dia model (#38405) 2025-06-26 11:04:23 +00:00
autoformer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
aya_vision Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
bamba enable misc test cases on XPU (#38852) 2025-06-18 09:20:49 +02:00
bark enable more test cases on xpu (#38572) 2025-06-06 09:29:51 +02:00
bart Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
barthez Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
bartpho Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
beit Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
bert [tests] remove TF tests (uses of require_tf) (#38944) 2025-06-25 17:29:10 +00:00
bert_generation Fix typos in strings and comments (#37799) 2025-04-28 11:39:11 +01:00
bert_japanese Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
bertweet Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
big_bird Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
bigbird_pegasus [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
biogpt Bart: new cache format (#35314) 2025-05-16 13:26:54 +02:00
bit Add ImageProcessorFast to BiT processor (#37180) 2025-04-14 17:07:48 +02:00
bitnet Add Bitnet model (#37742) 2025-04-28 15:08:46 +02:00
blenderbot Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
blenderbot_small Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
blip Fix more flaky test_initialization (#38932) 2025-06-20 17:28:32 +02:00
blip_2 [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
bloom Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
bridgetower Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
bros Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
byt5 Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
camembert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
canine Skip torchscript tests for 2 models (#38643) 2025-06-06 20:17:37 +02:00
chameleon Update some tests for torch 2.7.1 (#38701) 2025-06-10 11:46:52 +02:00
chinese_clip Add Fast Chinese-CLIP Processor (#37012) 2025-04-15 18:31:20 +02:00
clap Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
clip Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
clipseg Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
clvp Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
code_llama remove unhandled parameter (#38145) 2025-06-02 15:57:32 +02:00
codegen Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
cohere Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
cohere2 Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
colpali Add ColQwen2 to 🤗 transformers (#35778) 2025-06-02 12:58:01 +00:00
colqwen2 Update some tests for torch 2.7.1 (#38701) 2025-06-10 11:46:52 +02:00
conditional_detr 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
convbert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
convnext Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
convnextv2 Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
cpm Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
cpmant Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
csm Update CsmForConditionalGenerationIntegrationTest (#38424) 2025-05-28 10:20:43 +02:00
ctrl Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
cvt Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
d_fine enable misc test cases on XPU (#38852) 2025-06-18 09:20:49 +02:00
dab_detr 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
dac Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
data2vec Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
dbrx Refactor DBRX tests to use CausalLMModelTest base classes (#38475) 2025-06-13 16:22:12 +01:00
deberta Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
deberta_v2 Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
decision_transformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
deepseek_v3 Skip sdpa dispatch on flash test due to unsupported head dims (#39010) 2025-06-24 20:16:56 +02:00
deformable_detr 🚨🚨 Fix initialization of Mask2Former (#38864) 2025-06-18 09:46:22 +02:00
deit Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
depth_anything Skip some export tests on torch 2.7 (#38677) 2025-06-12 12:47:15 +02:00
depth_pro Fix more flaky test_initialization (#38932) 2025-06-20 17:28:32 +02:00
detr enable more test cases on xpu (#38572) 2025-06-06 09:29:51 +02:00
dia Add Dia model (#38405) 2025-06-26 11:04:23 +00:00
diffllama Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
dinat Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
dinov2 Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
dinov2_with_registers Fix more flaky test_initialization (#38932) 2025-06-20 17:28:32 +02:00
distilbert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
dit Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
donut 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
dots1 [Model] add dots1 (#38143) 2025-06-25 11:38:25 +02:00
dpr Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
dpt Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
efficientnet Add EfficientNet Image PreProcessor (#37055) 2025-04-16 21:59:24 +02:00
electra Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
emu3 update emu3 test (#38543) 2025-06-03 11:02:01 +02:00
encodec Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
encoder_decoder Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
ernie Remove old code for PyTorch, Accelerator and tokenizers (#37234) 2025-04-10 20:54:21 +02:00
esm Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
falcon 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
falcon_h1 [Falcon H1] Fix slow path forward pass (#38320) 2025-05-26 15:30:35 +02:00
falcon_mamba Fix FalconMambaIntegrationTests (#38566) 2025-06-19 13:50:33 +02:00
fastspeech2_conformer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
flaubert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
flava Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
fnet 🚨 rm already deprecated pad_to_max_length arg (#37617) 2025-05-01 15:21:55 +02:00
focalnet Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
fsmt Fix fsmt tests (#38904) 2025-06-19 10:56:34 +02:00
funnel Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
fuyu 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
gemma Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
gemma2 Unbreak optimum-executorch (#38646) 2025-06-13 11:13:32 +02:00
gemma3 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
gemma3n Gemma 3n (#39059) 2025-06-26 17:55:47 +02:00
git 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
glm Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
glm4 enable misc test cases on XPU (#38852) 2025-06-18 09:20:49 +02:00
glm4v GLM-4.1V Model support (#38431) 2025-06-25 10:43:05 +02:00
glpn 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
got_ocr2 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
gpt_bigcode Remove head mask in generative models (#35786) 2025-05-15 10:44:19 +02:00
gpt_neo Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
gpt_neox 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
gpt_neox_japanese Remove old code for PyTorch, Accelerator and tokenizers (#37234) 2025-04-10 20:54:21 +02:00
gpt_sw3 Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
gpt2 [tests] remove TF tests (uses of require_tf) (#38944) 2025-06-25 17:29:10 +00:00
gptj Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
granite switch to device agnostic device calling for test cases (#38247) 2025-05-26 10:18:53 +02:00
granite_speech Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
granitemoe switch to device agnostic device calling for test cases (#38247) 2025-05-26 10:18:53 +02:00
granitemoehybrid switch to device agnostic device calling for test cases (#38247) 2025-05-26 10:18:53 +02:00
granitemoeshared switch to device agnostic device calling for test cases (#38247) 2025-05-26 10:18:53 +02:00
grounding_dino enable more test cases on xpu (#38572) 2025-06-06 09:29:51 +02:00
groupvit Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
helium Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
herbert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
hgnet_v2 Add D-FINE Model into Transformers (#36261) 2025-04-29 12:17:55 +01:00
hiera 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
hubert Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
ibert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
idefics Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
idefics2 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
idefics3 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
ijepa Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
imagegpt [test] update test_past_key_values_format (#37614) 2025-04-22 11:07:34 +01:00
informer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
instructblip Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
instructblipvideo Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
internvl Internvl fix (#38946) 2025-06-26 13:44:59 +02:00
jamba Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
janus Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
jetmoe 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
kosmos2 [VLMs] support attention backends (#37576) 2025-05-08 18:18:54 +02:00
kyutai_speech_to_text add _keep_in_fp32_modules_strict (#39058) 2025-06-26 13:55:28 +00:00
layoutlm Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
layoutlmv2 Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
layoutlmv3 [tests] remove TF tests (uses of require_tf) (#38944) 2025-06-25 17:29:10 +00:00
layoutxlm Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
led Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
levit Add Fast LeViT Processor (#37154) 2025-04-14 17:07:36 +02:00
lightglue Add LightGlue model (#31718) 2025-06-17 18:10:23 +02:00
lilt Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
llama Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
llama4 Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
llava Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
llava_next Fix llava_next tests (#38813) 2025-06-13 15:19:41 +02:00
llava_next_video Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
llava_onevision Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
longformer Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
longt5 Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
luke 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
lxmert Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
m2m_100 🔴🔴🔴 [Attention] Refactor Attention Interface for Bart-based Models (#38108) 2025-05-22 17:12:58 +02:00
mamba Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
mamba2 align xpu's autocast behavior w/ cuda by using device agnostic torch APIs (#38284) 2025-06-19 11:48:23 +00:00
marian Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
markuplm No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
mask2former 🚨🚨 Fix initialization of Mask2Former (#38864) 2025-06-18 09:46:22 +02:00
maskformer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
mbart Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
mbart50 Use lru_cache for tokenization tests (#36818) 2025-03-28 15:09:35 +01:00
megatron_bert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
megatron_gpt2 Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
mgp_str Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
mimi Add kyutai stt (#38909) 2025-06-24 18:01:15 +02:00
minimax Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
mistral fix mistral and mistral3 tests (#38978) 2025-06-23 17:07:18 +02:00
mistral3 Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
mixtral Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
mlcd Add MLCD model (#36182) 2025-04-15 11:33:09 +01:00
mllama Fix mllama (#38704) 2025-06-12 16:15:35 +02:00
mluke Fix typos in strings and comments (#37799) 2025-04-28 11:39:11 +01:00
mobilebert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
mobilenet_v1 Add Fast Image Processor for MobileNetV1 (#37111) 2025-04-23 15:55:41 -04:00
mobilenet_v2 Add Fast Mobilenet-V2 Processor (#37113) 2025-04-14 17:08:47 +02:00
mobilevit Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
mobilevitv2 Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
modernbert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
moonshine Skip torchscript tests for 2 models (#38643) 2025-06-06 20:17:37 +02:00
moshi Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
mpnet Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
mpt Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
mra Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
mt5 Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
musicgen enable misc test cases on XPU (#38852) 2025-06-18 09:20:49 +02:00
musicgen_melody enable misc test cases on XPU (#38852) 2025-06-18 09:20:49 +02:00
mvp Fix typos in strings and comments (#37799) 2025-04-28 11:39:11 +01:00
myt5 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
nemotron switch to device agnostic device calling for test cases (#38247) 2025-05-26 10:18:53 +02:00
nllb Use lru_cache for tokenization tests (#36818) 2025-03-28 15:09:35 +01:00
nllb_moe Fix typos in strings and comments (#37799) 2025-04-28 11:39:11 +01:00
nougat Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
nystromformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
olmo Unbreak optimum-executorch (#38646) 2025-06-13 11:13:32 +02:00
olmo2 Make HF implementation match original OLMo 2 models for lower precisions (#38131) 2025-05-19 15:35:23 +02:00
olmoe Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
omdet_turbo enable more test cases on xpu (#38572) 2025-06-06 09:29:51 +02:00
oneformer Fix OneFormer integration test (#38016) 2025-05-12 16:02:41 +02:00
openai Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
opt Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
owlv2 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
owlvit Add Fast owlvit Processor (#37164) 2025-04-14 17:58:09 +02:00
paligemma Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
paligemma2 Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
patchtsmixer 🔴🔴🔴 [Attention] Refactor Attention Interface for Bart-based Models (#38108) 2025-05-22 17:12:58 +02:00
patchtst Force torch>=2.6 with torch.load to avoid vulnerability issue (#37785) 2025-04-25 16:57:09 +02:00
pegasus Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
pegasus_x 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
perceiver Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
persimmon 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
phi 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
phi3 Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
phi4_multimodal Fix phi4_multimodal tests (#38816) 2025-06-18 09:39:17 +02:00
phimoe Fix MoE gradient test (#38438) 2025-05-28 16:44:20 +01:00
phobert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
pix2struct Fix more flaky test_initialization (#38932) 2025-06-20 17:28:32 +02:00
pixtral Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
plbart 🔴🔴🔴 [Attention] Refactor Attention Interface for Bart-based Models (#38108) 2025-05-22 17:12:58 +02:00
poolformer Add Fast Image Processor for PoolFormer (#37182) 2025-04-23 15:55:33 -04:00
pop2piano [tests] remove TF tests (uses of require_tf) (#38944) 2025-06-25 17:29:10 +00:00
prompt_depth_anything Skip some export tests on torch 2.7 (#38677) 2025-06-12 12:47:15 +02:00
prophetnet [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
pvt Add Fast PVT Processor (#37204) 2025-04-23 15:55:20 -04:00
pvt_v2 Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
qwen2 Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
qwen2_5_omni Fix qwen_2_5 omni (#38658) 2025-06-12 14:43:54 +02:00
qwen2_5_vl Fix qwen2_5_vl tests (#38845) 2025-06-17 10:55:24 +02:00
qwen2_audio Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
qwen2_moe Fix MoE gradient test (#38438) 2025-05-28 16:44:20 +01:00
qwen2_vl Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
qwen3 Fix qwen3 tests (#38862) 2025-06-17 15:21:36 +02:00
qwen3_moe Fix qwen3_moe tests (#38865) 2025-06-18 14:36:03 +02:00
rag Fix rag (#38585) 2025-06-23 17:42:46 +02:00
recurrent_gemma Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
reformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
regnet Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
rembert Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
resnet Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
roberta Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
roberta_prelayernorm Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
roc_bert Remove old code for PyTorch, Accelerator and tokenizers (#37234) 2025-04-10 20:54:21 +02:00
roformer Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
rt_detr enable more test cases on xpu (#38572) 2025-06-06 09:29:51 +02:00
rt_detr_v2 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
rwkv 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
sam [tests] remove TF tests (uses of require_tf) (#38944) 2025-06-25 17:29:10 +00:00
sam_hq Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
seamless_m4t [seamless_m4t] Skip some tests when speech is not available (#38430) 2025-06-02 09:17:28 +00:00
seamless_m4t_v2 [seamless_m4t] Skip some tests when speech is not available (#38430) 2025-06-02 09:17:28 +00:00
segformer Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
seggpt Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
sew Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
sew_d Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
shieldgemma2 Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
siglip [tests] expand flex-attn test for vision models (#38434) 2025-06-03 07:40:44 +00:00
siglip2 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
smollm3 Add SmolLM3 (#38755) 2025-06-25 15:12:15 +00:00
smolvlm Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
speech_encoder_decoder Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
speech_to_text Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
speecht5 [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
splinter Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
squeezebert Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
stablelm 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
starcoder2 🚨 🚨 Inherited CausalLM Tests (#37590) 2025-05-23 18:29:31 +01:00
superglue Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
superpoint Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
swiftformer Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
swin Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
swin2sr Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
swinv2 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
switch_transformers [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
t5 Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
t5gemma Encoder-Decoder Gemma (#38332) 2025-06-25 09:05:10 +00:00
table_transformer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
tapas [tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051) 2025-06-26 16:25:00 +01:00
textnet Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
time_series_transformer 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
timesfm Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
timesformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
timm_backbone Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
timm_wrapper Add kwargs for timm.create_model in TimmWrapper (#38860) 2025-06-20 12:00:09 +00:00
trocr 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
tvp Add Optional to remaining types (#37808) 2025-04-28 14:20:45 +01:00
udop Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
umt5 [generation] bring back tests on vision models (#38603) 2025-06-06 08:23:15 +00:00
unispeech Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
unispeech_sat Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
univnet No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
upernet Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
video_llava Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
videomae [tests] expand flex-attn test for vision models (#38434) 2025-06-03 07:40:44 +00:00
vilt Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
vipllava [tests] expand flex-attn test for vision models (#38434) 2025-06-03 07:40:44 +00:00
vision_encoder_decoder Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
vision_text_dual_encoder [tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051) 2025-06-26 16:25:00 +01:00
visual_bert 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
vit Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
vit_mae Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
vit_msn Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
vitdet Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
vitmatte Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157) 2025-06-23 14:17:25 +00:00
vitpose Skip some export tests on torch 2.7 (#38677) 2025-06-12 12:47:15 +02:00
vitpose_backbone Remove dead protected imports (#38980) 2025-06-23 13:44:50 +02:00
vits Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
vivit 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
vjepa2 Add V-JEPA for video classification model (#38788) 2025-06-13 17:56:15 +01:00
wav2vec2 [tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051) 2025-06-26 16:25:00 +01:00
wav2vec2_bert 🚨 🚨 Setup -> setupclass conversion (#37282) 2025-04-08 17:15:37 +01:00
wav2vec2_conformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
wav2vec2_phoneme Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
wav2vec2_with_lm Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
wavlm Remove script datasets in tests (#38940) 2025-06-25 14:31:20 +00:00
whisper [tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051) 2025-06-26 16:25:00 +01:00
x_clip 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
xglm Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
xlm Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
xlm_roberta Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
xlm_roberta_xl Remove old code for PyTorch, Accelerator and tokenizers (#37234) 2025-04-10 20:54:21 +02:00
xlnet Deprecate TF + JAX (#38758) 2025-06-11 17:28:06 +01:00
xmod Remove old code for PyTorch, Accelerator and tokenizers (#37234) 2025-04-10 20:54:21 +02:00
yolos 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
yoso Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
zamba Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
zamba2 Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
zoedepth Skip some export tests on torch 2.7 (#38677) 2025-06-12 12:47:15 +02:00
__init__.py Move test model folders (#17034) 2022-05-03 14:42:02 +02:00