Add support for fast image processors in add-new-model-like CLI (#36313)

* add support for fast image processors in add-new-model-like

* fix header not found add-fast-image-processor-cli

* Encourage adding fast image processor

* nit

* start improve doc

* update docs

* make requested modifs
Yoni Gozlan 2025-03-13 14:16:37 -04:00 committed by GitHub
parent 48ef468c74
commit 69bc848480
3 changed files with 157 additions and 18 deletions


@ -476,7 +476,7 @@ When both implementations produce the same output, verify the outputs are within
```python
torch.allclose(original_output, output, atol=1e-3)
```
This is typically the most difficult part of the process. Congratulations if you've made it this far!
And if you're stuck or struggling with this step, don't hesitate to ask for help on your pull request.
@ -541,6 +541,48 @@ input_ids = tokenizer(input_str).input_ids
When both implementations have the same `input_ids`, add a tokenizer test file. This file is analogous to the modeling test files. The tokenizer test files should contain a couple of hardcoded integration tests.
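Such a hardcoded integration test can be sketched as follows. This is a self-contained illustration only: `WhitespaceTokenizer` is a toy stand-in, and a real test would load the actual tokenizer with `AutoTokenizer.from_pretrained` and compare against ids produced by the original implementation.

```python
class WhitespaceTokenizer:
    """Toy stand-in: maps each whitespace-separated token to a hardcoded id."""

    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        return [self.vocab[token] for token in text.split()]


def test_tokenizer_integration():
    # A real test would instead use AutoTokenizer.from_pretrained(...)
    tokenizer = WhitespaceTokenizer({"hello": 0, "world": 1})
    # Hardcoded expected ids, as produced by the reference implementation
    expected_ids = [0, 1]
    assert tokenizer("hello world") == expected_ids


test_tokenizer_integration()
```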
## Implement image processor
> [!TIP]
> Fast image processors use the [torchvision](https://pytorch.org/vision/stable/index.html) library and can perform image processing on the GPU, significantly improving processing speed.
> We recommend adding a fast image processor ([`BaseImageProcessorFast`]) in addition to the "slow" image processor ([`BaseImageProcessor`]) to provide users with the best performance. Feel free to tag [@yonigozlan](https://github.com/yonigozlan) for help adding a [`BaseImageProcessorFast`].
While this example doesn't include an image processor, you may need to implement one if your model requires image inputs. The image processor is responsible for converting images into a format suitable for your model. Before implementing a new one, check whether an existing image processor in the Transformers library can be reused, as many models share similar image processing techniques. Note that you can also use [modular](./modular_transformers) for image processors to reuse existing components.
If you do need to implement a new image processor, refer to an existing image processor to understand the expected structure. Slow image processors ([`BaseImageProcessor`]) and fast image processors ([`BaseImageProcessorFast`]) are designed differently, so make sure you follow the correct structure based on the processor type you're implementing.
Run the following command to generate the necessary imports and create a prefilled template for the fast image processor (skip this if you already created it with the `transformers-cli add-new-model-like` command). Modify the template to fit your model.
```bash
transformers-cli add-fast-image-processor --model-name your_model_name
```
Add tests for the image processor in `tests/models/your_model_name/test_image_processing_your_model_name.py`. These tests should be similar to those for other image processors and should verify that the image processor correctly handles image inputs. If your image processor includes unique features or processing methods, ensure you add specific tests for those as well.
## Implement processor
If your model accepts multiple modalities, like text and images, you need to add a processor. The processor centralizes the preprocessing of different modalities before passing them to the model.
The processor should call the appropriate modality-specific processors within its `__call__` function to handle each type of input correctly. Be sure to check existing processors in the library to understand their expected structure. Transformers uses the following convention in the `__call__` function signature.
```python
def __call__(
    self,
    images: ImageInput = None,
    text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
    audio=None,
    videos=None,
    **kwargs: Unpack[YourModelProcessorKwargs],
) -> BatchFeature:
    ...
```
`YourModelProcessorKwargs` is a `TypedDict` that includes all the typical processing arguments and any extra arguments a specific processor may require.
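As an illustration, here is a minimal self-contained sketch of such a kwargs `TypedDict`. The field names are hypothetical; the real class subclasses `ProcessingKwargs` from `transformers.processing_utils` and is consumed in the signature through `typing.Unpack`.

```python
from typing import TypedDict


class YourModelProcessorKwargs(TypedDict, total=False):
    # Illustrative fields only; the real class builds on ProcessingKwargs
    padding: bool  # forwarded to the tokenizer
    max_length: int
    do_resize: bool  # forwarded to the image processor
    do_normalize: bool


def process(**kwargs) -> dict:
    # A real processor dispatches these to the modality-specific processors
    return dict(kwargs)


features = process(padding=True, do_resize=False)
```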
Add tests for the processor in `tests/models/your_model_name/test_processor_your_model_name.py`. These tests should be similar to those for other processors and should verify that the processor correctly handles the different modalities.
## Integration tests
Now that you have a model and tokenizer, add end-to-end integration tests for the model and tokenizer to `tests/models/brand_new_llama/test_modeling_brand_new_llama.py`.
@ -620,4 +662,4 @@ There are four timelines for model additions depending on the model contributor
- **Hub-first release**: Transformers [remote-code](./models#custom-models) feature allows Transformers-based projects to be shared directly on the Hub. This is a good option if you don't have the bandwidth to add a model directly to Transformers.
If a model ends up being very popular, then it's very likely that we'll integrate it in Transformers ourselves to enable better support (documentation, maintenance, optimization, etc.) for it. A Hub-first release is the most frictionless way to add a model.


@ -414,11 +414,35 @@ def get_fast_image_processing_content_header(content: str) -> str:
"""
Get the header of the slow image processor file.
"""
# get all lines before and including the line containing """Image processor
content_header = re.search(r"^(.*?\n)*?\"\"\"Image processor.*", content)
# get all the commented lines at the beginning of the file
content_header = re.search(r"^# coding=utf-8\n(#[^\n]*\n)*", content, re.MULTILINE)
if not content_header:
logger.warning("Couldn't find the content header in the slow image processor file. Using a default header.")
return (
f"# coding=utf-8\n"
f"# Copyright {CURRENT_YEAR} The HuggingFace Team. All rights reserved.\n"
f"#\n"
f'# Licensed under the Apache License, Version 2.0 (the "License");\n'
f"# you may not use this file except in compliance with the License.\n"
f"# You may obtain a copy of the License at\n"
f"#\n"
f"# http://www.apache.org/licenses/LICENSE-2.0\n"
f"#\n"
f"# Unless required by applicable law or agreed to in writing, software\n"
f'# distributed under the License is distributed on an "AS IS" BASIS,\n'
f"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n"
f"# See the License for the specific language governing permissions and\n"
f"# limitations under the License.\n"
f"\n"
)
content_header = content_header.group(0)
# replace the year in the copyright
content_header = re.sub(r"# Copyright (\d+)\s", f"# Copyright {CURRENT_YEAR} ", content_header)
content_header = content_header.replace("Image processor", "Fast Image processor")
# get the line starting with """Image processor in content if it exists
match = re.search(r'^"""Image processor.*$', content, re.MULTILINE)
if match:
content_header += match.group(0).replace("Image processor", "Fast Image processor")
return content_header
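The new header-extraction logic above can be exercised in isolation. Below is a small demo applying the same regexes to a hypothetical slow image processor file (the sample content and year are made up for illustration):

```python
import re

# Hypothetical slow image processor file content
sample = (
    "# coding=utf-8\n"
    "# Copyright 2022 The HuggingFace Team. All rights reserved.\n"
    "# Licensed under the Apache License, Version 2.0\n"
    "import math\n"
    '"""Image processor class for YourModel."""\n'
)

CURRENT_YEAR = 2025

# Grab all the commented lines at the beginning of the file
header = re.search(r"^# coding=utf-8\n(#[^\n]*\n)*", sample, re.MULTILINE).group(0)
# Update the year in the copyright line
header = re.sub(r"# Copyright (\d+)\s", f"# Copyright {CURRENT_YEAR} ", header)
# Append the docstring line, renamed for the fast image processor
match = re.search(r'^"""Image processor.*$', sample, re.MULTILINE)
if match:
    header += match.group(0).replace("Image processor", "Fast Image processor")
print(header)
```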


@ -29,6 +29,7 @@ from ..models import auto as auto_module
from ..models.auto.configuration_auto import model_type_to_module_name
from ..utils import is_flax_available, is_tf_available, is_torch_available, logging
from . import BaseTransformersCLICommand
from .add_fast_image_processor import add_fast_image_processor
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
@ -66,6 +67,9 @@ class ModelPatterns:
image_processor_class (`str`, *optional*):
The image processor class associated with this model (leave to `None` for models that don't use an image
processor).
image_processor_fast_class (`str`, *optional*):
The fast image processor class associated with this model (leave to `None` for models that don't use a fast
image processor).
feature_extractor_class (`str`, *optional*):
The feature extractor class associated with this model (leave to `None` for models that don't use a feature
extractor).
@ -82,6 +86,7 @@ class ModelPatterns:
config_class: Optional[str] = None
tokenizer_class: Optional[str] = None
image_processor_class: Optional[str] = None
image_processor_fast_class: Optional[str] = None
feature_extractor_class: Optional[str] = None
processor_class: Optional[str] = None
@ -107,6 +112,7 @@ ATTRIBUTE_TO_PLACEHOLDER = {
"config_class": "[CONFIG_CLASS]",
"tokenizer_class": "[TOKENIZER_CLASS]",
"image_processor_class": "[IMAGE_PROCESSOR_CLASS]",
"image_processor_fast_class": "[IMAGE_PROCESSOR_FAST_CLASS]",
"feature_extractor_class": "[FEATURE_EXTRACTOR_CLASS]",
"processor_class": "[PROCESSOR_CLASS]",
"checkpoint": "[CHECKPOINT]",
@ -339,7 +345,13 @@ def replace_model_patterns(
# contains the camel-cased name, but will be treated before.
attributes_to_check = ["config_class"]
# Add relevant preprocessing classes
for attr in ["tokenizer_class", "image_processor_class", "feature_extractor_class", "processor_class"]:
for attr in [
"tokenizer_class",
"image_processor_class",
"image_processor_fast_class",
"feature_extractor_class",
"processor_class",
]:
if getattr(old_model_patterns, attr) is not None and getattr(new_model_patterns, attr) is not None:
attributes_to_check.append(attr)
@ -763,10 +775,10 @@ def retrieve_info_for_model(model_type, frameworks: Optional[List[str]] = None):
tokenizer_class = None
image_processor_classes = auto_module.image_processing_auto.IMAGE_PROCESSOR_MAPPING_NAMES.get(model_type, None)
if isinstance(image_processor_classes, tuple):
image_processor_class = image_processor_classes[0] # we take the slow image processor class.
image_processor_class, image_processor_fast_class = image_processor_classes
else:
image_processor_class = image_processor_classes
image_processor_fast_class = None
feature_extractor_class = auto_module.feature_extraction_auto.FEATURE_EXTRACTOR_MAPPING_NAMES.get(model_type, None)
processor_class = auto_module.processing_auto.PROCESSOR_MAPPING_NAMES.get(model_type, None)
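The (slow, fast) unpacking performed above can be sketched in isolation. The mapping contents here are made up for illustration; in the real auto-mapping, entries with a fast variant are tuples while slow-only entries may be plain strings:

```python
# Hypothetical mapping mirroring IMAGE_PROCESSOR_MAPPING_NAMES entries
IMAGE_PROCESSOR_MAPPING_NAMES = {
    "your_model": ("YourModelImageProcessor", "YourModelImageProcessorFast"),
    "slow_only_model": "SlowOnlyImageProcessor",
}


def get_image_processor_classes(model_type):
    classes = IMAGE_PROCESSOR_MAPPING_NAMES.get(model_type)
    if isinstance(classes, tuple):
        # (slow, fast) pair
        slow, fast = classes
    else:
        # Slow-only entry: no fast image processor registered
        slow, fast = classes, None
    return slow, fast
```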
@ -800,6 +812,7 @@ def retrieve_info_for_model(model_type, frameworks: Optional[List[str]] = None):
config_class=config_class,
tokenizer_class=tokenizer_class,
image_processor_class=image_processor_class,
image_processor_fast_class=image_processor_fast_class,
feature_extractor_class=feature_extractor_class,
processor_class=processor_class,
)
@ -957,6 +970,7 @@ def add_model_to_main_init(
processing_classes = [
old_model_patterns.tokenizer_class,
old_model_patterns.image_processor_class,
old_model_patterns.image_processor_fast_class,
old_model_patterns.feature_extractor_class,
old_model_patterns.processor_class,
]
@ -1034,7 +1048,7 @@ AUTO_CLASSES_PATTERNS = {
' ("{model_type}", "{pretrained_archive_map}"),',
],
"feature_extraction_auto.py": [' ("{model_type}", "{feature_extractor_class}"),'],
"image_processing_auto.py": [' ("{model_type}", "{image_processor_class}"),'],
"image_processing_auto.py": [' ("{model_type}", "{image_processor_classes}"),'],
"modeling_auto.py": [' ("{model_type}", "{any_pt_class}"),'],
"modeling_tf_auto.py": [' ("{model_type}", "{any_tf_class}"),'],
"modeling_flax_auto.py": [' ("{model_type}", "{any_flax_class}"),'],
@ -1068,14 +1082,27 @@ def add_model_to_auto_classes(
)
elif "{config_class}" in pattern:
new_patterns.append(pattern.replace("{config_class}", old_model_patterns.config_class))
elif "{image_processor_class}" in pattern:
elif "{image_processor_classes}" in pattern:
if (
old_model_patterns.image_processor_class is not None
and new_model_patterns.image_processor_class is not None
):
new_patterns.append(
pattern.replace("{image_processor_class}", old_model_patterns.image_processor_class)
)
if (
old_model_patterns.image_processor_fast_class is not None
and new_model_patterns.image_processor_fast_class is not None
):
new_patterns.append(
pattern.replace(
'"{image_processor_classes}"',
f'("{old_model_patterns.image_processor_class}", "{old_model_patterns.image_processor_fast_class}")',
)
)
else:
new_patterns.append(
pattern.replace(
'"{image_processor_classes}"', f'("{old_model_patterns.image_processor_class}",)'
)
)
elif "{feature_extractor_class}" in pattern:
if (
old_model_patterns.feature_extractor_class is not None
@ -1101,7 +1128,6 @@ def add_model_to_auto_classes(
new_model_line = new_model_line.replace(
old_model_patterns.model_camel_cased, new_model_patterns.model_camel_cased
)
add_content_to_file(full_name, new_model_line, add_after=old_model_line)
# Tokenizers require special handling
@ -1198,6 +1224,10 @@ def duplicate_doc_file(
# We only add the image processor if necessary
if old_model_patterns.image_processor_class != new_model_patterns.image_processor_class:
new_blocks.append(new_block)
elif "ImageProcessorFast" in block_class:
# We only add the image processor if necessary
if old_model_patterns.image_processor_fast_class != new_model_patterns.image_processor_fast_class:
new_blocks.append(new_block)
elif "FeatureExtractor" in block_class:
# We only add the feature extractor if necessary
if old_model_patterns.feature_extractor_class != new_model_patterns.feature_extractor_class:
@ -1281,6 +1311,7 @@ def create_new_model_like(
add_copied_from: bool = True,
frameworks: Optional[List[str]] = None,
old_checkpoint: Optional[str] = None,
create_fast_image_processor: bool = False,
):
"""
Creates a new model module like a given model of the Transformers library.
@ -1295,6 +1326,8 @@ def create_new_model_like(
old_checkpoint (`str`, *optional*):
The name of the base checkpoint for the old model. Should be passed along when it can't be automatically
recovered from the `model_type`.
create_fast_image_processor (`bool`, *optional*, defaults to `False`):
Whether or not to add a fast image processor to the new model, if the old model had only a slow one.
"""
# Retrieve all the old model info.
model_info = retrieve_info_for_model(model_type, frameworks=frameworks)
@ -1309,7 +1342,13 @@ def create_new_model_like(
)
keep_old_processing = True
for processing_attr in ["image_processor_class", "feature_extractor_class", "processor_class", "tokenizer_class"]:
for processing_attr in [
"image_processor_class",
"image_processor_fast_class",
"feature_extractor_class",
"processor_class",
"tokenizer_class",
]:
if getattr(old_model_patterns, processing_attr) != getattr(new_model_patterns, processing_attr):
keep_old_processing = False
@ -1416,7 +1455,11 @@ def create_new_model_like(
duplicate_doc_file(doc_file, old_model_patterns, new_model_patterns, frameworks=frameworks)
insert_model_in_doc_toc(old_model_patterns, new_model_patterns)
# 6. Warn the user for duplicate patterns
# 6. Add fast image processor if necessary
if create_fast_image_processor:
add_fast_image_processor(model_name=new_model_patterns.model_lower_cased)
# 7. Warn the user for duplicate patterns
if old_model_patterns.model_type == old_model_patterns.checkpoint:
print(
"The model you picked has the same name for the model type and the checkpoint name "
@ -1484,6 +1527,7 @@ class AddNewModelLikeCommand(BaseTransformersCLICommand):
self.add_copied_from,
self.frameworks,
self.old_checkpoint,
self.create_fast_image_processor,
) = get_user_input()
self.path_to_repo = path_to_repo
@ -1503,6 +1547,7 @@ class AddNewModelLikeCommand(BaseTransformersCLICommand):
add_copied_from=self.add_copied_from,
frameworks=self.frameworks,
old_checkpoint=self.old_checkpoint,
create_fast_image_processor=self.create_fast_image_processor,
)
@ -1594,6 +1639,7 @@ def get_user_input():
old_model_info = retrieve_info_for_model(old_model_type)
old_tokenizer_class = old_model_info["model_patterns"].tokenizer_class
old_image_processor_class = old_model_info["model_patterns"].image_processor_class
old_image_processor_fast_class = old_model_info["model_patterns"].image_processor_fast_class
old_feature_extractor_class = old_model_info["model_patterns"].feature_extractor_class
old_processor_class = old_model_info["model_patterns"].processor_class
old_frameworks = old_model_info["frameworks"]
@ -1634,7 +1680,13 @@ def get_user_input():
old_processing_classes = [
c if not isinstance(c, tuple) else c[0]
for c in [old_image_processor_class, old_feature_extractor_class, old_tokenizer_class, old_processor_class]
for c in [
old_image_processor_class,
old_image_processor_fast_class,
old_feature_extractor_class,
old_tokenizer_class,
old_processor_class,
]
if c is not None
]
old_processing_classes = ", ".join(old_processing_classes)
@ -1645,9 +1697,11 @@ def get_user_input():
)
if keep_processing:
image_processor_class = old_image_processor_class
image_processor_fast_class = old_image_processor_fast_class
feature_extractor_class = old_feature_extractor_class
processor_class = old_processor_class
tokenizer_class = old_tokenizer_class
create_fast_image_processor = False
else:
if old_tokenizer_class is not None:
tokenizer_class = get_user_field(
@ -1663,6 +1717,13 @@ def get_user_input():
)
else:
image_processor_class = None
if old_image_processor_fast_class is not None:
image_processor_fast_class = get_user_field(
"What will be the name of the fast image processor class for this model? ",
default_value=f"{model_camel_cased}ImageProcessorFast",
)
else:
image_processor_fast_class = None
if old_feature_extractor_class is not None:
feature_extractor_class = get_user_field(
"What will be the name of the feature extractor class for this model? ",
@ -1677,6 +1738,16 @@ def get_user_input():
)
else:
processor_class = None
if old_image_processor_class is not None and old_image_processor_fast_class is None:
create_fast_image_processor = get_user_field(
"A fast image processor can be created from the slow one, but modifications might be needed. "
"Should we add a fast image processor class for this model (recommended, yes/no)? ",
convert_to=convert_to_bool,
default_value="yes",
fallback_message="Please answer yes/no, y/n, true/false or 1/0.",
)
else:
create_fast_image_processor = False
model_patterns = ModelPatterns(
model_name,
@ -1688,6 +1759,7 @@ def get_user_input():
config_class=config_class,
tokenizer_class=tokenizer_class,
image_processor_class=image_processor_class,
image_processor_fast_class=image_processor_fast_class,
feature_extractor_class=feature_extractor_class,
processor_class=processor_class,
)
@ -1706,6 +1778,7 @@ def get_user_input():
default_value="yes",
fallback_message="Please answer yes/no, y/n, true/false or 1/0.",
)
if all_frameworks:
frameworks = None
else:
@ -1715,4 +1788,4 @@ def get_user_input():
)
frameworks = list(set(frameworks.split(" ")))
return (old_model_type, model_patterns, add_copied_from, frameworks, old_checkpoint)
return (old_model_type, model_patterns, add_copied_from, frameworks, old_checkpoint, create_fast_image_processor)
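The yes/no prompt added above passes `convert_to_bool` to `get_user_field`. A minimal sketch of such a converter, matching the fallback message "Please answer yes/no, y/n, true/false or 1/0" (assumed behavior; the real helper lives in the CLI module):

```python
def convert_to_bool(answer: str) -> bool:
    """Interpret a yes/no-style answer as a boolean (illustrative sketch)."""
    normalized = answer.strip().lower()
    if normalized in {"yes", "y", "true", "1"}:
        return True
    if normalized in {"no", "n", "false", "0"}:
        return False
    raise ValueError(f"Could not interpret {answer!r} as a boolean.")
```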