Add support for fast image processors in add-new-model-like CLI (#36313)

* add support for fast image processors in add-new-model-like

* fix header not found add-fast-image-processor-cli

* Encourage adding fast image processor

* nit

* start improve doc

* update docs

* make requested modifs
Yoni Gozlan 2025-03-13 14:16:37 -04:00 committed by GitHub
parent 48ef468c74
commit 69bc848480
3 changed files with 157 additions and 18 deletions


@ -476,7 +476,7 @@ When both implementations produce the same output, verify the outputs are within
```python
torch.allclose(original_output, output, atol=1e-3)
```
This is typically the most difficult part of the process. Congratulations if you've made it this far!
And if you're stuck or struggling with this step, don't hesitate to ask for help on your pull request.
@ -541,6 +541,48 @@ input_ids = tokenizer(input_str).input_ids
When both implementations have the same `input_ids`, add a tokenizer test file. This file is analogous to the modeling test files. The tokenizer test files should contain a couple of hardcoded integration tests.
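Such a hardcoded integration test can be sketched as follows. This is a self-contained illustration only: `WhitespaceTokenizer` is a toy stand-in, and a real test would load the actual tokenizer with `AutoTokenizer.from_pretrained` and compare against ids produced by the original implementation.

```python
class WhitespaceTokenizer:
    """Toy stand-in: maps each whitespace-separated token to a hardcoded id."""

    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        return [self.vocab[token] for token in text.split()]


def test_tokenizer_integration():
    # A real test would instead use AutoTokenizer.from_pretrained(...)
    tokenizer = WhitespaceTokenizer({"hello": 0, "world": 1})
    # Hardcoded expected ids, as produced by the reference implementation
    expected_ids = [0, 1]
    assert tokenizer("hello world") == expected_ids


test_tokenizer_integration()
```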
## Implement image processor
> [!TIP]
> Fast image processors use the [torchvision](https://pytorch.org/vision/stable/index.html) library and can perform image processing on the GPU, significantly improving processing speed.
> We recommend adding a fast image processor ([`BaseImageProcessorFast`]) in addition to the "slow" image processor ([`BaseImageProcessor`]) to provide users with the best performance. Feel free to tag [@yonigozlan](https://github.com/yonigozlan) for help adding a [`BaseImageProcessorFast`].
While this example doesn't include an image processor, you may need to implement one if your model requires image inputs. The image processor is responsible for converting images into a format suitable for your model. Before implementing a new one, check whether an existing image processor in the Transformers library can be reused, as many models share similar image processing techniques. Note that you can also use [modular](./modular_transformers) for image processors to reuse existing components.
If you do need to implement a new image processor, refer to an existing image processor to understand the expected structure. Slow image processors ([`BaseImageProcessor`]) and fast image processors ([`BaseImageProcessorFast`]) are designed differently, so make sure you follow the correct structure based on the processor type you're implementing.
Run the following command to generate the necessary imports and create a prefilled template for the fast image processor (skip this if you already created it with the `transformers-cli add-new-model-like` command). Modify the template to fit your model.
```bash
transformers-cli add-fast-image-processor --model-name your_model_name
```
Add tests for the image processor in `tests/models/your_model_name/test_image_processing_your_model_name.py`. These tests should be similar to those for other image processors and should verify that the image processor correctly handles image inputs. If your image processor includes unique features or processing methods, ensure you add specific tests for those as well.
## Implement processor
If your model accepts multiple modalities, like text and images, you need to add a processor. The processor centralizes the preprocessing of different modalities before passing them to the model.
The processor should call the appropriate modality-specific processors within its `__call__` function to handle each type of input correctly. Be sure to check existing processors in the library to understand their expected structure. Transformers uses the following convention in the `__call__` function signature.
```python
def __call__(
    self,
    images: ImageInput = None,
    text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
    audio=None,
    videos=None,
    **kwargs: Unpack[YourModelProcessorKwargs],
) -> BatchFeature:
    ...
```
`YourModelProcessorKwargs` is a `TypedDict` that includes all the typical processing arguments and any extra arguments a specific processor may require.
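As an illustration, here is a minimal self-contained sketch of such a kwargs `TypedDict`. The field names are hypothetical; the real class subclasses `ProcessingKwargs` from `transformers.processing_utils` and is consumed in the signature through `typing.Unpack`.

```python
from typing import TypedDict


class YourModelProcessorKwargs(TypedDict, total=False):
    # Illustrative fields only; the real class builds on ProcessingKwargs
    padding: bool  # forwarded to the tokenizer
    max_length: int
    do_resize: bool  # forwarded to the image processor
    do_normalize: bool


def process(**kwargs) -> dict:
    # A real processor dispatches these to the modality-specific processors
    return dict(kwargs)


features = process(padding=True, do_resize=False)
```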
Add tests for the processor in `tests/models/your_model_name/test_processor_your_model_name.py`. These tests should be similar to those for other processors and should verify that the processor correctly handles the different modalities.
## Integration tests
Now that you have a model and tokenizer, add end-to-end integration tests for the model and tokenizer to `tests/models/brand_new_llama/test_modeling_brand_new_llama.py`.
@ -620,4 +662,4 @@ There are four timelines for model additions depending on the model contributor
- **Hub-first release**: Transformers [remote-code](./models#custom-models) feature allows Transformers-based projects to be shared directly on the Hub. This is a good option if you don't have the bandwidth to add a model directly to Transformers.
If a model ends up being very popular, then it's very likely that we'll integrate it in Transformers ourselves to enable better support (documentation, maintenance, optimization, etc.) for it. A Hub-first release is the most frictionless way to add a model.


@ -414,11 +414,35 @@ def get_fast_image_processing_content_header(content: str) -> str:
"""
Get the header of the slow image processor file.
"""
# get all lines before and including the line containing """Image processor
content_header = re.search(r"^(.*?\n)*?\"\"\"Image processor.*", content)
# get all the commented lines at the beginning of the file
content_header = re.search(r"^# coding=utf-8\n(#[^\n]*\n)*", content, re.MULTILINE)
if not content_header:
logger.warning("Couldn't find the content header in the slow image processor file. Using a default header.")
return (
f"# coding=utf-8\n"
f"# Copyright {CURRENT_YEAR} The HuggingFace Team. All rights reserved.\n"
f"#\n"
f'# Licensed under the Apache License, Version 2.0 (the "License");\n'
f"# you may not use this file except in compliance with the License.\n"
f"# You may obtain a copy of the License at\n"
f"#\n"
f"# http://www.apache.org/licenses/LICENSE-2.0\n"
f"#\n"
f"# Unless required by applicable law or agreed to in writing, software\n"
f'# distributed under the License is distributed on an "AS IS" BASIS,\n'
f"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n"
f"# See the License for the specific language governing permissions and\n"
f"# limitations under the License.\n"
f"\n"
)
content_header = content_header.group(0)
# replace the year in the copyright
content_header = re.sub(r"# Copyright (\d+)\s", f"# Copyright {CURRENT_YEAR} ", content_header)
content_header = content_header.replace("Image processor", "Fast Image processor")
# get the line starting with """Image processor in content if it exists
match = re.search(r'^"""Image processor.*$', content, re.MULTILINE)
if match:
content_header += match.group(0).replace("Image processor", "Fast Image processor")
return content_header
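The new header-extraction logic above can be exercised in isolation. Below is a small demo applying the same regexes to a hypothetical slow image processor file (the sample content and year are made up for illustration):

```python
import re

# Hypothetical slow image processor file content
sample = (
    "# coding=utf-8\n"
    "# Copyright 2022 The HuggingFace Team. All rights reserved.\n"
    "# Licensed under the Apache License, Version 2.0\n"
    "import math\n"
    '"""Image processor class for YourModel."""\n'
)

CURRENT_YEAR = 2025

# Grab all the commented lines at the beginning of the file
header = re.search(r"^# coding=utf-8\n(#[^\n]*\n)*", sample, re.MULTILINE).group(0)
# Update the year in the copyright line
header = re.sub(r"# Copyright (\d+)\s", f"# Copyright {CURRENT_YEAR} ", header)
# Append the docstring line, renamed for the fast image processor
match = re.search(r'^"""Image processor.*$', sample, re.MULTILINE)
if match:
    header += match.group(0).replace("Image processor", "Fast Image processor")
print(header)
```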


@ -29,6 +29,7 @@ from ..models import auto as auto_module
from ..models.auto.configuration_auto import model_type_to_module_name
from ..utils import is_flax_available, is_tf_available, is_torch_available, logging
from . import BaseTransformersCLICommand
from .add_fast_image_processor import add_fast_image_processor
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
@ -66,6 +67,9 @@ class ModelPatterns:
image_processor_class (`str`, *optional*):
The image processor class associated with this model (leave to `None` for models that don't use an image
processor).
image_processor_fast_class (`str`, *optional*):
The fast image processor class associated with this model (leave to `None` for models that don't use a fast
image processor).
feature_extractor_class (`str`, *optional*):
The feature extractor class associated with this model (leave to `None` for models that don't use a feature
extractor).
@ -82,6 +86,7 @@ class ModelPatterns:
config_class: Optional[str] = None
tokenizer_class: Optional[str] = None
image_processor_class: Optional[str] = None
image_processor_fast_class: Optional[str] = None
feature_extractor_class: Optional[str] = None
processor_class: Optional[str] = None
@ -107,6 +112,7 @@ ATTRIBUTE_TO_PLACEHOLDER = {
"config_class": "[CONFIG_CLASS]",
"tokenizer_class": "[TOKENIZER_CLASS]",
"image_processor_class": "[IMAGE_PROCESSOR_CLASS]",
"image_processor_fast_class": "[IMAGE_PROCESSOR_FAST_CLASS]",
"feature_extractor_class": "[FEATURE_EXTRACTOR_CLASS]",
"processor_class": "[PROCESSOR_CLASS]",
"checkpoint": "[CHECKPOINT]",
@ -339,7 +345,13 @@ def replace_model_patterns(
# contains the camel-cased name, but will be treated before.
attributes_to_check = ["config_class"]
# Add relevant preprocessing classes
for attr in ["tokenizer_class", "image_processor_class", "feature_extractor_class", "processor_class"]:
for attr in [
"tokenizer_class",
"image_processor_class",
"image_processor_fast_class",
"feature_extractor_class",
"processor_class",
]:
if getattr(old_model_patterns, attr) is not None and getattr(new_model_patterns, attr) is not None:
attributes_to_check.append(attr)
@ -763,10 +775,10 @@ def retrieve_info_for_model(model_type, frameworks: Optional[List[str]] = None):
tokenizer_class = None
image_processor_classes = auto_module.image_processing_auto.IMAGE_PROCESSOR_MAPPING_NAMES.get(model_type, None)
if isinstance(image_processor_classes, tuple):
image_processor_class = image_processor_classes[0] # we take the slow image processor class.
image_processor_class, image_processor_fast_class = image_processor_classes
else:
image_processor_class = image_processor_classes
image_processor_fast_class = None
feature_extractor_class = auto_module.feature_extraction_auto.FEATURE_EXTRACTOR_MAPPING_NAMES.get(model_type, None)
processor_class = auto_module.processing_auto.PROCESSOR_MAPPING_NAMES.get(model_type, None)
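The (slow, fast) unpacking performed above can be sketched in isolation. The mapping contents here are made up for illustration; in the real auto-mapping, entries with a fast variant are tuples while slow-only entries may be plain strings:

```python
# Hypothetical mapping mirroring IMAGE_PROCESSOR_MAPPING_NAMES entries
IMAGE_PROCESSOR_MAPPING_NAMES = {
    "your_model": ("YourModelImageProcessor", "YourModelImageProcessorFast"),
    "slow_only_model": "SlowOnlyImageProcessor",
}


def get_image_processor_classes(model_type):
    classes = IMAGE_PROCESSOR_MAPPING_NAMES.get(model_type)
    if isinstance(classes, tuple):
        # (slow, fast) pair
        slow, fast = classes
    else:
        # Slow-only entry: no fast image processor registered
        slow, fast = classes, None
    return slow, fast
```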
@ -800,6 +812,7 @@ def retrieve_info_for_model(model_type, frameworks: Optional[List[str]] = None):
config_class=config_class,
tokenizer_class=tokenizer_class,
image_processor_class=image_processor_class,
image_processor_fast_class=image_processor_fast_class,
feature_extractor_class=feature_extractor_class,
processor_class=processor_class,
)
@ -957,6 +970,7 @@ def add_model_to_main_init(
processing_classes = [
old_model_patterns.tokenizer_class,
old_model_patterns.image_processor_class,
old_model_patterns.image_processor_fast_class,
old_model_patterns.feature_extractor_class,
old_model_patterns.processor_class,
]
@ -1034,7 +1048,7 @@ AUTO_CLASSES_PATTERNS = {
' ("{model_type}", "{pretrained_archive_map}"),',
],
"feature_extraction_auto.py": [' ("{model_type}", "{feature_extractor_class}"),'],
"image_processing_auto.py": [' ("{model_type}", "{image_processor_class}"),'],
"image_processing_auto.py": [' ("{model_type}", "{image_processor_classes}"),'],
"modeling_auto.py": [' ("{model_type}", "{any_pt_class}"),'],
"modeling_tf_auto.py": [' ("{model_type}", "{any_tf_class}"),'],
"modeling_flax_auto.py": [' ("{model_type}", "{any_flax_class}"),'],
@ -1068,14 +1082,27 @@ def add_model_to_auto_classes(
)
elif "{config_class}" in pattern:
new_patterns.append(pattern.replace("{config_class}", old_model_patterns.config_class))
elif "{image_processor_class}" in pattern:
elif "{image_processor_classes}" in pattern:
if (
old_model_patterns.image_processor_class is not None
and new_model_patterns.image_processor_class is not None
):
new_patterns.append(
pattern.replace("{image_processor_class}", old_model_patterns.image_processor_class)
)
if (
old_model_patterns.image_processor_fast_class is not None
and new_model_patterns.image_processor_fast_class is not None
):
new_patterns.append(
pattern.replace(
'"{image_processor_classes}"',
f'("{old_model_patterns.image_processor_class}", "{old_model_patterns.image_processor_fast_class}")',
)
)
else:
new_patterns.append(
pattern.replace(
'"{image_processor_classes}"', f'("{old_model_patterns.image_processor_class}",)'
)
)
elif "{feature_extractor_class}" in pattern:
if (
old_model_patterns.feature_extractor_class is not None
@ -1101,7 +1128,6 @@ def add_model_to_auto_classes(
new_model_line = new_model_line.replace(
old_model_patterns.model_camel_cased, new_model_patterns.model_camel_cased
)
add_content_to_file(full_name, new_model_line, add_after=old_model_line)
# Tokenizers require special handling
@ -1198,6 +1224,10 @@ def duplicate_doc_file(
# We only add the image processor if necessary
if old_model_patterns.image_processor_class != new_model_patterns.image_processor_class:
new_blocks.append(new_block)
elif "ImageProcessorFast" in block_class:
# We only add the image processor if necessary
if old_model_patterns.image_processor_fast_class != new_model_patterns.image_processor_fast_class:
new_blocks.append(new_block)
elif "FeatureExtractor" in block_class:
# We only add the feature extractor if necessary
if old_model_patterns.feature_extractor_class != new_model_patterns.feature_extractor_class:
@ -1281,6 +1311,7 @@ def create_new_model_like(
add_copied_from: bool = True,
frameworks: Optional[List[str]] = None,
old_checkpoint: Optional[str] = None,
create_fast_image_processor: bool = False,
):
"""
Creates a new model module like a given model of the Transformers library.
@ -1295,6 +1326,8 @@ def create_new_model_like(
old_checkpoint (`str`, *optional*):
The name of the base checkpoint for the old model. Should be passed along when it can't be automatically
recovered from the `model_type`.
create_fast_image_processor (`bool`, *optional*, defaults to `False`):
Whether or not to add a fast image processor to the new model, if the old model had only a slow one.
"""
# Retrieve all the old model info.
model_info = retrieve_info_for_model(model_type, frameworks=frameworks)
@ -1309,7 +1342,13 @@ def create_new_model_like(
)
keep_old_processing = True
for processing_attr in ["image_processor_class", "feature_extractor_class", "processor_class", "tokenizer_class"]:
for processing_attr in [
"image_processor_class",
"image_processor_fast_class",
"feature_extractor_class",
"processor_class",
"tokenizer_class",
]:
if getattr(old_model_patterns, processing_attr) != getattr(new_model_patterns, processing_attr):
keep_old_processing = False
@ -1416,7 +1455,11 @@ def create_new_model_like(
duplicate_doc_file(doc_file, old_model_patterns, new_model_patterns, frameworks=frameworks)
insert_model_in_doc_toc(old_model_patterns, new_model_patterns)
# 6. Warn the user for duplicate patterns
# 6. Add fast image processor if necessary
if create_fast_image_processor:
add_fast_image_processor(model_name=new_model_patterns.model_lower_cased)
# 7. Warn the user for duplicate patterns
if old_model_patterns.model_type == old_model_patterns.checkpoint:
print(
"The model you picked has the same name for the model type and the checkpoint name "
@ -1484,6 +1527,7 @@ class AddNewModelLikeCommand(BaseTransformersCLICommand):
self.add_copied_from,
self.frameworks,
self.old_checkpoint,
self.create_fast_image_processor,
) = get_user_input()
self.path_to_repo = path_to_repo
@ -1503,6 +1547,7 @@ class AddNewModelLikeCommand(BaseTransformersCLICommand):
add_copied_from=self.add_copied_from,
frameworks=self.frameworks,
old_checkpoint=self.old_checkpoint,
create_fast_image_processor=self.create_fast_image_processor,
)
@ -1594,6 +1639,7 @@ def get_user_input():
old_model_info = retrieve_info_for_model(old_model_type)
old_tokenizer_class = old_model_info["model_patterns"].tokenizer_class
old_image_processor_class = old_model_info["model_patterns"].image_processor_class
old_image_processor_fast_class = old_model_info["model_patterns"].image_processor_fast_class
old_feature_extractor_class = old_model_info["model_patterns"].feature_extractor_class
old_processor_class = old_model_info["model_patterns"].processor_class
old_frameworks = old_model_info["frameworks"]
@ -1634,7 +1680,13 @@ def get_user_input():
old_processing_classes = [
c if not isinstance(c, tuple) else c[0]
for c in [old_image_processor_class, old_feature_extractor_class, old_tokenizer_class, old_processor_class]
for c in [
old_image_processor_class,
old_image_processor_fast_class,
old_feature_extractor_class,
old_tokenizer_class,
old_processor_class,
]
if c is not None
]
old_processing_classes = ", ".join(old_processing_classes)
@ -1645,9 +1697,11 @@ def get_user_input():
)
if keep_processing:
image_processor_class = old_image_processor_class
image_processor_fast_class = old_image_processor_fast_class
feature_extractor_class = old_feature_extractor_class
processor_class = old_processor_class
tokenizer_class = old_tokenizer_class
create_fast_image_processor = False
else:
if old_tokenizer_class is not None:
tokenizer_class = get_user_field(
@ -1663,6 +1717,13 @@ def get_user_input():
)
else:
image_processor_class = None
if old_image_processor_fast_class is not None:
image_processor_fast_class = get_user_field(
"What will be the name of the fast image processor class for this model? ",
default_value=f"{model_camel_cased}ImageProcessorFast",
)
else:
image_processor_fast_class = None
if old_feature_extractor_class is not None:
feature_extractor_class = get_user_field(
"What will be the name of the feature extractor class for this model? ",
@ -1677,6 +1738,16 @@ def get_user_input():
)
else:
processor_class = None
if old_image_processor_class is not None and old_image_processor_fast_class is None:
create_fast_image_processor = get_user_field(
"A fast image processor can be created from the slow one, but modifications might be needed. "
"Should we add a fast image processor class for this model (recommended, yes/no)? ",
convert_to=convert_to_bool,
default_value="yes",
fallback_message="Please answer yes/no, y/n, true/false or 1/0.",
)
else:
create_fast_image_processor = False
model_patterns = ModelPatterns(
model_name,
@ -1688,6 +1759,7 @@ def get_user_input():
config_class=config_class,
tokenizer_class=tokenizer_class,
image_processor_class=image_processor_class,
image_processor_fast_class=image_processor_fast_class,
feature_extractor_class=feature_extractor_class,
processor_class=processor_class,
)
@ -1706,6 +1778,7 @@ def get_user_input():
default_value="yes",
fallback_message="Please answer yes/no, y/n, true/false or 1/0.",
)
if all_frameworks:
frameworks = None
else:
@ -1715,4 +1788,4 @@ def get_user_input():
)
frameworks = list(set(frameworks.split(" ")))
return (old_model_type, model_patterns, add_copied_from, frameworks, old_checkpoint)
return (old_model_type, model_patterns, add_copied_from, frameworks, old_checkpoint, create_fast_image_processor)
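The yes/no prompt added above passes `convert_to_bool` to `get_user_field`. A minimal sketch of such a converter, matching the fallback message "Please answer yes/no, y/n, true/false or 1/0" (assumed behavior; the real helper lives in the CLI module):

```python
def convert_to_bool(answer: str) -> bool:
    """Interpret a yes/no-style answer as a boolean (illustrative sketch)."""
    normalized = answer.strip().lower()
    if normalized in {"yes", "y", "true", "1"}:
        return True
    if normalized in {"no", "n", "false", "0"}:
        return False
    raise ValueError(f"Could not interpret {answer!r} as a boolean.")
```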