transformers/utils/check_config_attributes.py
Arthur 19d58d31f1
Add MLLama (#33703)
* current changes

* nit

* Add cross_attenttion_mask to processor

* multi-image fixed

* Add cross_attenttion_mask to processor

* cross attn works in all cases

* WIP refactoring function for image processor

* WIP refactoring image processor functions

* Refactor preprocess to use global loops instead of list nested list comps

* Docstrings

* Add channels unification

* fix dtype issues

* Update docsrings and format

* Consistent max_image_tiles

* current script

* updates

* Add convert to rgb

* Add image processor tests

* updates!

* update

* god damn it I am dumb sometimes

* Precompute aspect ratios

* now this works, full match

* fix 😉

* nits

* style

* fix model and conversion

* nit

* nit

* kinda works

* hack for sdpa non-contiguous bias

* nits here and there

* latest c hanges

* merge?

* run forward

* Add aspect_ratio_mask

* vision attention mask

* update script and config variable names

* nit

* nits

* be able to load

* style

* nits

* there

* nits

* make forward run

* small update

* enable generation multi-turn

* nit

* nit

* Clean up a bit for errors and typos

* A bit more constant fixes

* 90B keys and shapes match

* Fix for 11B model

* Fixup, remove debug part

* Docs

* Make max_aspect_ratio_id to be minimal

* Update image processing code to match new implementation

* Adjust conversion for final checkpoint state

* Change dim in repeat_interleave (accordig to meta code)

* tmp fix for num_tiles

* Fix for conversion (gate<->up, q/k_proj rope permute)

* nits

* codestyle

* Vision encoder fixes

* pass cross attn mask further

* Refactor aspect ratio mask

* Disable text-only generation

* Fix cross attention layers order, remove q/k norm rotation for cross atention layers

* Refactor gated position embeddings

* fix bugs but needs test with new weights

* rope scaling should be llama3

* Fix rope scaling name

* Remove debug for linear layer

* fix copies

* Make mask prepare private func

* Remove linear patch embed

* Make precomputed embeddings as nn.Embedding module

* MllamaPrecomputedAspectRatioEmbedding with config init

* Remove unused self.output_dim

* nit, intermediate layers

* Rename ln and pos_embed

* vision_chunk_size -> image_size

* return_intermediate -> intermediate_layers_indices

* vision_input_dim -> hidden_size

* Fix copied from statements

* fix most tests

* Fix more copied from

* layer_id->layer_idx

* Comment

* Fix tests for processor

* Copied from for _prepare_4d_causal_attention_mask_with_cache_position

* Style fix

* Add MllamaForCausalLM

* WIP fixing tests

* Remove duplicated layers

* Remove dummy file

* Fix style

* Fix consistency

* Fix some TODOs

* fix language_model instantiation, add docstring

* Move docstring, remove todos for precomputed embeds (we cannot init them properly)

* Add initial docstrings

* Fix

* fix some tests

* lets skip these

* nits, remove print, style

* Add one more copied from

* Improve test message

* Make validate func private

* Fix dummy objects

* Refactor `data_format` a bit + add comment

* typos/nits

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* fix dummy objects and imports

* Add chat template config json

* remove num_kv_heads from vision attention

* fix

* move some commits and add more tests

* fix test

* Remove `update_key_name` from modeling utils

* remove num-kv-heads again

* some prelimiary docs

* Update chat template + tests

* nit, conversion script max_num_tiles from params

* Fix warning for text-only generation

* Update conversion script for instruct models

* Update chat template in converstion + test

* add tests for CausalLM model

* model_max_length, avoid null chat_template

* Refactor conversion script

* Fix forward

* Fix integration tests

* Refactor vision config + docs

* Fix default

* Refactor text config

* Doc fixes

* Remove unused args, fix docs example

* Squashed commit of the following:

commit b51ce5a2efffbecdefbf6fc92ee87372ec9d8830
Author: qubvel <qubvel@gmail.com>
Date:   Wed Sep 18 13:39:15 2024 +0000

    Move model + add output hidden states and output attentions

* Fix num_channels

* Add mllama text and mllama vision models

* Fixing repo consistency

* Style fix

* Fixing repo consistency

* Fixing unused config params

* Fix failed tests after refactoring

* hidden_activation -> hidden_act  for text mlp

* Remove from_pretrained from sub-configs

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/mllama/convert_mllama_weights_to_hf.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Reuse lambda in conversion script

* Remove run.py

* Update docs/source/en/model_doc/mllama.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/mllama/processing_mllama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove unused LlamaTokenizerFast

* Fix logging

* Refactor gating

* Remove cycle for collecting intermediate states

* Refactor text-only check, add integration test for text-only

* Revert from pretrained to configs

* Fix example

* Add auto `bos_token` adding in processor

* Fix tips

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Enable supports_gradient_checkpointing model flag

* add eager/sdpa options

* don't skip attn tests and bring back GC skips (did i really remove those?)

* Fix signature, but get error with None gradient

* Fix output attention tests

* Disable GC back

* Change no split modules

* Fix dropout

* Style

* Add Mllama to sdpa list

* Add post init for vision model

* Refine config for MllamaForCausalLMModelTest and skipped tests for CausalLM model

* if skipped, say it, don't pass

* Clean vision tester config

* Doc for args

* Update tests/models/mllama/test_modeling_mllama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add cross_attention_mask to test

* typehint

* Remove todo

* Enable gradient checkpointing

* Docstring

* Style

* Fixing and skipping some tests for new cache

* Mark flaky test

* Skip `test_sdpa_can_compile_dynamic` test

* Fixing some offload tests

* Add direct GenerationMixin inheritance

* Remove unused code

* Add initializer_range to vision config

* update the test to make sure we show if split

* fix gc?

* Fix repo consistency

* Undo modeling utils debug changes

* Fix link

* mllama -> Mllama

* [mllama] -> [Mllama]

* Enable compile test for CausalLM model (text-only)

* Fix TextModel prefix

* Update doc

* Docs for forward, type hints, and vision model prefix

* make sure to reset

* fix init

* small script refactor and styling

* nit

* updates!

* some nits

* Interpolate embeddings for 560 size and update integration tests

* nit

* does not suppor static cache!

* update

* fix

* nit2

* this?

* Fix conversion

* Style

* 4x memory improvement with image cache AFAIK

* Token decorator for tests

* Skip failing tests

* update processor errors

* fix split issues

* style

* weird

* style

* fix failing tests

* update

* nit fixing the whisper tests

* fix path

* update

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: pavel <ubuntu@ip-10-90-0-11.ec2.internal>
Co-authored-by: qubvel <qubvel@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-09-25 19:56:25 +02:00

358 lines
15 KiB
Python

# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import inspect
import os
import re
from transformers.configuration_utils import PretrainedConfig
from transformers.utils import direct_transformers_import
# All paths are set with the intent you should run this script from the root of the repo with the command
# python utils/check_config_docstrings.py
PATH_TO_TRANSFORMERS = "src/transformers"
# This is to make sure the transformers module imported is the one in the repo.
transformers = direct_transformers_import(PATH_TO_TRANSFORMERS)
CONFIG_MAPPING = transformers.models.auto.configuration_auto.CONFIG_MAPPING
SPECIAL_CASES_TO_ALLOW = {
# 'max_position_embeddings' is not used in modeling file, but needed for eval frameworks like Huggingface's lighteval (https://github.com/huggingface/lighteval/blob/af24080ea4f16eaf1683e353042a2dfc9099f038/src/lighteval/models/base_model.py#L264).
# periods and offsers are not used in modeling file, but used in the configuration file to define `layers_block_type` and `layers_num_experts`.
"JambaConfig": [
"max_position_embeddings",
"attn_layer_offset",
"attn_layer_period",
"expert_layer_offset",
"expert_layer_period",
],
"Qwen2Config": ["use_sliding_window"],
"Qwen2MoeConfig": ["use_sliding_window"],
"Qwen2VLConfig": ["use_sliding_window"],
# `cache_implementation` should be in the default generation config, but we don't yet support per-model
# generation configs (TODO joao)
"Gemma2Config": ["tie_word_embeddings", "cache_implementation"],
# used to compute the property `self.chunk_length`
"EncodecConfig": ["overlap"],
# used to compute the property `self.layers_block_type`
"RecurrentGemmaConfig": ["block_types"],
# used as in the config to define `intermediate_size`
"MambaConfig": ["expand"],
# used as in the config to define `intermediate_size`
"FalconMambaConfig": ["expand"],
# used as `self.bert_model = BertModel(config, ...)`
"DPRConfig": True,
"FuyuConfig": True,
# not used in modeling files, but it's an important information
"FSMTConfig": ["langs"],
# used internally in the configuration class file
"GPTNeoConfig": ["attention_types"],
# used internally in the configuration class file
"EsmConfig": ["is_folding_model"],
# used during training (despite we don't have training script for these models yet)
"Mask2FormerConfig": ["ignore_value"],
# `ignore_value` used during training (despite we don't have training script for these models yet)
# `norm` used in conversion script (despite not using in the modeling file)
"OneFormerConfig": ["ignore_value", "norm"],
# used internally in the configuration class file
"T5Config": ["feed_forward_proj"],
# used internally in the configuration class file
# `tokenizer_class` get default value `T5Tokenizer` intentionally
"MT5Config": ["feed_forward_proj", "tokenizer_class"],
"UMT5Config": ["feed_forward_proj", "tokenizer_class"],
# used internally in the configuration class file
"LongT5Config": ["feed_forward_proj"],
# used internally in the configuration class file
"Pop2PianoConfig": ["feed_forward_proj"],
# used internally in the configuration class file
"SwitchTransformersConfig": ["feed_forward_proj"],
# having default values other than `1e-5` - we can't fix them without breaking
"BioGptConfig": ["layer_norm_eps"],
# having default values other than `1e-5` - we can't fix them without breaking
"GLPNConfig": ["layer_norm_eps"],
# having default values other than `1e-5` - we can't fix them without breaking
"SegformerConfig": ["layer_norm_eps"],
# having default values other than `1e-5` - we can't fix them without breaking
"CvtConfig": ["layer_norm_eps"],
# having default values other than `1e-5` - we can't fix them without breaking
"PerceiverConfig": ["layer_norm_eps"],
# used internally to calculate the feature size
"InformerConfig": ["num_static_real_features", "num_time_features"],
# used internally to calculate the feature size
"TimeSeriesTransformerConfig": ["num_static_real_features", "num_time_features"],
# used internally to calculate the feature size
"AutoformerConfig": ["num_static_real_features", "num_time_features"],
# used internally to calculate `mlp_dim`
"SamVisionConfig": ["mlp_ratio"],
# For (head) training, but so far not implemented
"ClapAudioConfig": ["num_classes"],
# Not used, but providing useful information to users
"SpeechT5HifiGanConfig": ["sampling_rate"],
# used internally in the configuration class file
"UdopConfig": ["feed_forward_proj"],
# Actually used in the config or generation config, in that case necessary for the sub-components generation
"SeamlessM4TConfig": [
"max_new_tokens",
"t2u_max_new_tokens",
"t2u_decoder_attention_heads",
"t2u_decoder_ffn_dim",
"t2u_decoder_layers",
"t2u_encoder_attention_heads",
"t2u_encoder_ffn_dim",
"t2u_encoder_layers",
"t2u_max_position_embeddings",
],
# Actually used in the config or generation config, in that case necessary for the sub-components generation
"SeamlessM4Tv2Config": [
"max_new_tokens",
"t2u_decoder_attention_heads",
"t2u_decoder_ffn_dim",
"t2u_decoder_layers",
"t2u_encoder_attention_heads",
"t2u_encoder_ffn_dim",
"t2u_encoder_layers",
"t2u_max_position_embeddings",
"t2u_variance_pred_dropout",
"t2u_variance_predictor_embed_dim",
"t2u_variance_predictor_hidden_dim",
"t2u_variance_predictor_kernel_size",
],
"MllamaTextConfig": [
"initializer_range",
],
"MllamaVisionConfig": [
"initializer_range",
"supported_aspect_ratios",
],
}
# TODO (ydshieh): Check the failing cases, try to fix them or move some cases to the above block once we are sure
SPECIAL_CASES_TO_ALLOW.update(
{
"CLIPSegConfig": True,
"DeformableDetrConfig": True,
"DinatConfig": True,
"DonutSwinConfig": True,
"FastSpeech2ConformerConfig": True,
"FSMTConfig": True,
"LayoutLMv2Config": True,
"MaskFormerSwinConfig": True,
"MT5Config": True,
# For backward compatibility with trust remote code models
"MptConfig": True,
"MptAttentionConfig": True,
"OneFormerConfig": True,
"PerceiverConfig": True,
"RagConfig": True,
"SpeechT5Config": True,
"SwinConfig": True,
"Swin2SRConfig": True,
"Swinv2Config": True,
"SwitchTransformersConfig": True,
"TableTransformerConfig": True,
"TapasConfig": True,
"UniSpeechConfig": True,
"UniSpeechSatConfig": True,
"WavLMConfig": True,
"WhisperConfig": True,
# TODO: @Arthur (for `alignment_head` and `alignment_layer`)
"JukeboxPriorConfig": True,
# TODO: @Younes (for `is_decoder`)
"Pix2StructTextConfig": True,
"IdeficsConfig": True,
"IdeficsVisionConfig": True,
"IdeficsPerceiverConfig": True,
}
)
def check_attribute_being_used(config_class, attributes, default_value, source_strings):
"""Check if any name in `attributes` is used in one of the strings in `source_strings`
Args:
config_class (`type`):
The configuration class for which the arguments in its `__init__` will be checked.
attributes (`List[str]`):
The name of an argument (or attribute) and its variant names if any.
default_value (`Any`):
A default value for the attribute in `attributes` assigned in the `__init__` of `config_class`.
source_strings (`List[str]`):
The python source code strings in the same modeling directory where `config_class` is defined. The file
containing the definition of `config_class` should be excluded.
"""
attribute_used = False
for attribute in attributes:
for modeling_source in source_strings:
# check if we can find `config.xxx`, `getattr(config, "xxx", ...)` or `getattr(self.config, "xxx", ...)`
if (
f"config.{attribute}" in modeling_source
or f'getattr(config, "{attribute}"' in modeling_source
or f'getattr(self.config, "{attribute}"' in modeling_source
):
attribute_used = True
# Deal with multi-line cases
elif (
re.search(
rf'getattr[ \t\v\n\r\f]*\([ \t\v\n\r\f]*(self\.)?config,[ \t\v\n\r\f]*"{attribute}"',
modeling_source,
)
is not None
):
attribute_used = True
# `SequenceSummary` is called with `SequenceSummary(config)`
elif attribute in [
"summary_type",
"summary_use_proj",
"summary_activation",
"summary_last_dropout",
"summary_proj_to_labels",
"summary_first_dropout",
]:
if "SequenceSummary" in modeling_source:
attribute_used = True
if attribute_used:
break
if attribute_used:
break
# common and important attributes, even if they do not always appear in the modeling files
attributes_to_allow = [
"bos_index",
"eos_index",
"pad_index",
"unk_index",
"mask_index",
"image_size",
"use_cache",
"out_features",
"out_indices",
"sampling_rate",
# backbone related arguments passed to load_backbone
"use_pretrained_backbone",
"backbone",
"backbone_config",
"use_timm_backbone",
"backbone_kwargs",
]
attributes_used_in_generation = ["encoder_no_repeat_ngram_size"]
# Special cases to be allowed
case_allowed = True
if not attribute_used:
case_allowed = False
for attribute in attributes:
# Allow if the default value in the configuration class is different from the one in `PretrainedConfig`
if attribute in ["is_encoder_decoder"] and default_value is True:
case_allowed = True
elif attribute in ["tie_word_embeddings"] and default_value is False:
case_allowed = True
# Allow cases without checking the default value in the configuration class
elif attribute in attributes_to_allow + attributes_used_in_generation:
case_allowed = True
elif attribute.endswith("_token_id"):
case_allowed = True
# configuration class specific cases
if not case_allowed:
allowed_cases = SPECIAL_CASES_TO_ALLOW.get(config_class.__name__, [])
case_allowed = allowed_cases is True or attribute in allowed_cases
return attribute_used or case_allowed
def check_config_attributes_being_used(config_class):
"""Check the arguments in `__init__` of `config_class` are used in the modeling files in the same directory
Args:
config_class (`type`):
The configuration class for which the arguments in its `__init__` will be checked.
"""
# Get the parameters in `__init__` of the configuration class, and the default values if any
signature = dict(inspect.signature(config_class.__init__).parameters)
parameter_names = [x for x in list(signature.keys()) if x not in ["self", "kwargs"]]
parameter_defaults = [signature[param].default for param in parameter_names]
# If `attribute_map` exists, an attribute can have different names to be used in the modeling files, and as long
# as one variant is used, the test should pass
reversed_attribute_map = {}
if len(config_class.attribute_map) > 0:
reversed_attribute_map = {v: k for k, v in config_class.attribute_map.items()}
# Get the path to modeling source files
config_source_file = inspect.getsourcefile(config_class)
model_dir = os.path.dirname(config_source_file)
# Let's check against all frameworks: as long as one framework uses an attribute, we are good.
modeling_paths = [os.path.join(model_dir, fn) for fn in os.listdir(model_dir) if fn.startswith("modeling_")]
# Get the source code strings
modeling_sources = []
for path in modeling_paths:
if os.path.isfile(path):
with open(path, encoding="utf8") as fp:
modeling_sources.append(fp.read())
unused_attributes = []
for config_param, default_value in zip(parameter_names, parameter_defaults):
# `attributes` here is all the variant names for `config_param`
attributes = [config_param]
# some configuration classes have non-empty `attribute_map`, and both names could be used in the
# corresponding modeling files. As long as one of them appears, it is fine.
if config_param in reversed_attribute_map:
attributes.append(reversed_attribute_map[config_param])
if not check_attribute_being_used(config_class, attributes, default_value, modeling_sources):
unused_attributes.append(attributes[0])
return sorted(unused_attributes)
def check_config_attributes():
"""Check the arguments in `__init__` of all configuration classes are used in python files"""
configs_with_unused_attributes = {}
for _config_class in list(CONFIG_MAPPING.values()):
# Skip deprecated models
if "models.deprecated" in _config_class.__module__:
continue
# Some config classes are not in `CONFIG_MAPPING` (e.g. `CLIPVisionConfig`, `Blip2VisionConfig`, etc.)
config_classes_in_module = [
cls
for name, cls in inspect.getmembers(
inspect.getmodule(_config_class),
lambda x: inspect.isclass(x)
and issubclass(x, PretrainedConfig)
and inspect.getmodule(x) == inspect.getmodule(_config_class),
)
]
for config_class in config_classes_in_module:
unused_attributes = check_config_attributes_being_used(config_class)
if len(unused_attributes) > 0:
configs_with_unused_attributes[config_class.__name__] = unused_attributes
if len(configs_with_unused_attributes) > 0:
error = "The following configuration classes contain unused attributes in the corresponding modeling files:\n"
for name, attributes in configs_with_unused_attributes.items():
error += f"{name}: {attributes}\n"
raise ValueError(error)
if __name__ == "__main__":
check_config_attributes()