
Image Processor
An image processor is in charge of preparing input features for vision models and post-processing their outputs. This includes transformations such as resizing, normalization, and conversion to PyTorch, TensorFlow, Flax, and NumPy tensors. It may also include model-specific post-processing such as converting logits to segmentation masks.
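To make the normalization step concrete, here is a minimal pure-Python sketch of rescaling and per-channel standardization. It is not the library's implementation: the helper name `rescale_and_normalize`, the nested-list image layout, and the mean/std values of 0.5 are assumptions chosen for illustration. Real image processors operate on arrays or tensors and read these statistics from the model's preprocessing configuration.

```python
from typing import List, Sequence

# Assumed layout for this sketch: height x width x channels, as nested lists.
Image = List[List[List[float]]]


def rescale_and_normalize(image: Image, mean: Sequence[float], std: Sequence[float]) -> Image:
    """Scale 0-255 pixel values to [0, 1], then standardize each channel."""
    return [
        [
            [(value / 255.0 - mean[c]) / std[c] for c, value in enumerate(pixel)]
            for pixel in row
        ]
        for row in image
    ]


# A 1x2 RGB image: one white pixel and one black pixel.
image = [[[255, 255, 255], [0, 0, 0]]]
# Illustrative statistics only; each checkpoint defines its own mean and std.
normalized = rescale_and_normalize(image, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
```

With these illustrative statistics, the white pixel maps to 1.0 and the black pixel to -1.0 in every channel.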
Fast image processors are available for a few models and more will be added in the future. They are based on the torchvision library and provide a significant speed-up, especially when processing on GPU.
They have the same API as the base image processors and can be used as drop-in replacements.
To use a fast image processor, you need to install the torchvision library and set the use_fast argument to True when instantiating the image processor:
from transformers import AutoImageProcessor
processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)
Note that use_fast will be set to True by default in a future release.
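Because fast image processors depend on torchvision, it can be useful to request one only when torchvision is actually installed. The sketch below shows one possible way to do that with the standard library. The helpers `torchvision_available` and `load_image_processor` are hypothetical names introduced for this example, not part of the library.

```python
import importlib.util


def torchvision_available() -> bool:
    """Return True if the torchvision package can be found in this environment."""
    return importlib.util.find_spec("torchvision") is not None


def load_image_processor(checkpoint: str):
    """Load an image processor, requesting the fast variant only if torchvision is installed.

    Hypothetical helper for illustration; it downloads the checkpoint's
    preprocessing config on first use.
    """
    # Imported lazily so the availability check above works without transformers.
    from transformers import AutoImageProcessor

    return AutoImageProcessor.from_pretrained(checkpoint, use_fast=torchvision_available())
```

Calling `load_image_processor("facebook/detr-resnet-50")` would then fall back to the base image processor on machines without torchvision.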
When using a fast image processor, you can also set the device argument to specify the device on which the processing should be done. By default, the processing is done on the same device as the inputs if the inputs are tensors, or on the CPU otherwise.
from torchvision.io import read_image
from transformers import DetrImageProcessorFast
images = read_image("image.jpg")
processor = DetrImageProcessorFast.from_pretrained("facebook/detr-resnet-50")
images_processed = processor(images, return_tensors="pt", device="cuda")
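The default device rule described above can be summarized in a few lines. This is only a sketch of the documented behavior, not the library's code: `resolve_device` is a hypothetical helper, and it assumes that tensor inputs expose a `device` attribute (as PyTorch tensors do) while plain Python inputs do not.

```python
from typing import Any, Optional


def resolve_device(inputs: Any, device: Optional[str] = None) -> str:
    """Pick the processing device following the rule described in the text."""
    # 1. An explicit device argument takes precedence.
    if device is not None:
        return str(device)
    # 2. Otherwise, tensor-like inputs are processed on the device they live on.
    input_device = getattr(inputs, "device", None)
    if input_device is not None:
        return str(input_device)
    # 3. Anything else (for example a list of PIL images) is processed on the CPU.
    return "cpu"


class FakeTensor:
    """Stand-in for a tensor on a GPU, used only to exercise the sketch."""

    device = "cuda:0"


on_gpu = resolve_device(FakeTensor())            # follows the input's device
on_cpu = resolve_device([1, 2, 3])               # no device attribute, so CPU
forced = resolve_device(FakeTensor(), "cpu")     # explicit argument wins
```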
Here are some speed comparisons between the base and fast image processors for the DETR and RT-DETR models, and how they impact overall inference time:




These benchmarks were run on an AWS EC2 g5.2xlarge instance, utilizing an NVIDIA A10G Tensor Core GPU.
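To run a similar comparison on your own hardware, a simple timing helper along the following lines may be enough. This is a generic sketch and not the script used for the benchmarks above; `time_callable` and its `warmup` and `runs` parameters are names assumed for this example.

```python
import time
from statistics import mean
from typing import Callable


def time_callable(fn: Callable[[], object], warmup: int = 2, runs: int = 10) -> float:
    """Return the mean wall-clock time of fn() in seconds over `runs` calls."""
    # Warm-up calls are not timed; they absorb one-off costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

You could then compare, for example, `time_callable(lambda: base_processor(images, return_tensors="pt"))` against the same call on a fast processor, where `base_processor` is whichever base image processor you loaded. When processing on GPU, CUDA kernels run asynchronously, so the timed callable should call `torch.cuda.synchronize()` before returning for the measurement to be meaningful.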
ImageProcessingMixin
autodoc image_processing_utils.ImageProcessingMixin - from_pretrained - save_pretrained
BatchFeature
autodoc BatchFeature
BaseImageProcessor
autodoc image_processing_utils.BaseImageProcessor
BaseImageProcessorFast
autodoc image_processing_utils_fast.BaseImageProcessorFast