Mirror of https://github.com/huggingface/transformers.git, synced 2025-07-31 02:02:21 +06:00
Update old existing feature extractor references (#24552)
* Update old existing feature extractor references
* Typo
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* Address comments from review - update 'feature extractor'

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
parent 10c2ac7bc6
commit ae454f41d4
@@ -354,12 +354,12 @@ Als Nächstes sehen Sie sich das Bild mit dem Merkmal 🤗 Datensätze [Bild] (h
 ### Merkmalsextraktor
-Laden Sie den Merkmalsextraktor mit [`AutoFeatureExtractor.from_pretrained`]:
+Laden Sie den Merkmalsextraktor mit [`AutoImageProcessor.from_pretrained`]:
 ```py
->>> from transformers import AutoFeatureExtractor
+>>> from transformers import AutoImageProcessor
->>> feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
+>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
 ```
 ### Datenerweiterung
@@ -371,9 +371,9 @@ Bei Bildverarbeitungsaufgaben ist es üblich, den Bildern als Teil der Vorverarb
 ```py
 >>> from torchvision.transforms import Compose, Normalize, RandomResizedCrop, ColorJitter, ToTensor
->>> normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
+>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
 >>> _transforms = Compose(
-... [RandomResizedCrop(feature_extractor.size), ColorJitter(brightness=0.5, hue=0.5), ToTensor(), normalize]
+... [RandomResizedCrop(image_processor.size["height"]), ColorJitter(brightness=0.5, hue=0.5), ToTensor(), normalize]
 ... )
 ```
@@ -263,7 +263,7 @@ To use, create an image processor associated with the model you're using. For ex
 ViTImageProcessor {
   "do_normalize": true,
   "do_resize": true,
-  "feature_extractor_type": "ViTImageProcessor",
+  "image_processor_type": "ViTImageProcessor",
   "image_mean": [
     0.5,
     0.5,

@@ -295,7 +295,7 @@ Modify any of the [`ViTImageProcessor`] parameters to create your custom image p
 ViTImageProcessor {
   "do_normalize": false,
   "do_resize": true,
-  "feature_extractor_type": "ViTImageProcessor",
+  "image_processor_type": "ViTImageProcessor",
   "image_mean": [
     0.3,
     0.3,
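For reference, the custom-parameter flow this doc excerpt refers to looks roughly like the sketch below (an illustration, not part of the diff; the parameter values simply mirror the config dumps above):

```python
from transformers import ViTImageProcessor

# Default processor, then one with overridden parameters (as in the modified
# config shown in the excerpt). Printing either displays the config, which now
# carries an `image_processor_type` entry instead of `feature_extractor_type`.
image_processor = ViTImageProcessor()
custom_processor = ViTImageProcessor(
    do_normalize=False,
    image_mean=[0.3, 0.3, 0.3],
    image_std=[0.3, 0.3, 0.3],
)
print(custom_processor)
```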
@@ -50,10 +50,10 @@ product between the projected image and text features is then used as a similar
 To feed images to the Transformer encoder, each image is split into a sequence of fixed-size non-overlapping patches,
 which are then linearly embedded. A [CLS] token is added to serve as representation of an entire image. The authors
 also add absolute position embeddings, and feed the resulting sequence of vectors to a standard Transformer encoder.
-The [`CLIPFeatureExtractor`] can be used to resize (or rescale) and normalize images for the model.
+The [`CLIPImageProcessor`] can be used to resize (or rescale) and normalize images for the model.
 The [`CLIPTokenizer`] is used to encode the text. The [`CLIPProcessor`] wraps
-[`CLIPFeatureExtractor`] and [`CLIPTokenizer`] into a single instance to both
+[`CLIPImageProcessor`] and [`CLIPTokenizer`] into a single instance to both
 encode the text and prepare the images. The following example shows how to get the image-text similarity scores using
 [`CLIPProcessor`] and [`CLIPModel`].
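The example referenced in that excerpt lives in the CLIP docs; a rough sketch of the usage (checkpoint name and image URL are illustrative) is:

```python
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The processor tokenizes the text and preprocesses the image in one call.
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True
)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)      # label probabilities
```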
@@ -46,9 +46,9 @@ Tips:
 Donut's [`VisionEncoderDecoder`] model accepts images as input and makes use of
 [`~generation.GenerationMixin.generate`] to autoregressively generate text given the input image.
-The [`DonutFeatureExtractor`] class is responsible for preprocessing the input image and
+The [`DonutImageProcessor`] class is responsible for preprocessing the input image and
 [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`] decodes the generated target tokens to the target string. The
-[`DonutProcessor`] wraps [`DonutFeatureExtractor`] and [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]
+[`DonutProcessor`] wraps [`DonutImageProcessor`] and [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]
 into a single instance to both extract the input features and decode the predicted token ids.
 - Step-by-step Document Image Classification
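A hedged sketch of the Donut flow described above (document image classification; the checkpoint, task prompt, and test dataset follow the publicly documented RVL-CDIP fine-tune and are used here only for illustration):

```python
from datasets import load_dataset
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")

# DonutProcessor wraps DonutImageProcessor (pixel values) and an XLM-RoBERTa tokenizer (text).
dataset = load_dataset("hf-internal-testing/example-documents", split="test")
image = dataset[0]["image"]
pixel_values = processor(image, return_tensors="pt").pixel_values

task_prompt = "<s_rvlcdip>"
decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt").input_ids

outputs = model.generate(pixel_values, decoder_input_ids=decoder_input_ids, max_length=512)
print(processor.batch_decode(outputs)[0])  # generated tokens, e.g. the predicted document class
```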
@@ -150,23 +150,23 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 ## Usage: LayoutLMv2Processor
 The easiest way to prepare data for the model is to use [`LayoutLMv2Processor`], which internally
-combines a feature extractor ([`LayoutLMv2FeatureExtractor`]) and a tokenizer
-([`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]). The feature extractor
+combines an image processor ([`LayoutLMv2ImageProcessor`]) and a tokenizer
+([`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]). The image processor
 handles the image modality, while the tokenizer handles the text modality. A processor combines both, which is ideal
 for a multi-modal model like LayoutLMv2. Note that you can still use both separately, if you only want to handle one
 modality.
 ```python
-from transformers import LayoutLMv2FeatureExtractor, LayoutLMv2TokenizerFast, LayoutLMv2Processor
+from transformers import LayoutLMv2ImageProcessor, LayoutLMv2TokenizerFast, LayoutLMv2Processor
-feature_extractor = LayoutLMv2FeatureExtractor() # apply_ocr is set to True by default
+image_processor = LayoutLMv2ImageProcessor() # apply_ocr is set to True by default
 tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
-processor = LayoutLMv2Processor(feature_extractor, tokenizer)
+processor = LayoutLMv2Processor(image_processor, tokenizer)
 ```
 In short, one can provide a document image (and possibly additional data) to [`LayoutLMv2Processor`],
 and it will create the inputs expected by the model. Internally, the processor first uses
-[`LayoutLMv2FeatureExtractor`] to apply OCR on the image to get a list of words and normalized
+[`LayoutLMv2ImageProcessor`] to apply OCR on the image to get a list of words and normalized
 bounding boxes, as well to resize the image to a given size in order to get the `image` input. The words and
 normalized bounding boxes are then provided to [`LayoutLMv2Tokenizer`] or
 [`LayoutLMv2TokenizerFast`], which converts them to token-level `input_ids`,

@@ -176,7 +176,7 @@ which are turned into token-level `labels`.
 [`LayoutLMv2Processor`] uses [PyTesseract](https://pypi.org/project/pytesseract/), a Python
 wrapper around Google's Tesseract OCR engine, under the hood. Note that you can still use your own OCR engine of
 choice, and provide the words and normalized boxes yourself. This requires initializing
-[`LayoutLMv2FeatureExtractor`] with `apply_ocr` set to `False`.
+[`LayoutLMv2ImageProcessor`] with `apply_ocr` set to `False`.
 In total, there are 5 use cases that are supported by the processor. Below, we list them all. Note that each of these
 use cases work for both batched and non-batched inputs (we illustrate them for non-batched inputs).

@@ -184,7 +184,7 @@ use cases work for both batched and non-batched inputs (we illustrate them for n
 **Use case 1: document image classification (training, inference) + token classification (inference), apply_ocr =
 True**
-This is the simplest case, in which the processor (actually the feature extractor) will perform OCR on the image to get
+This is the simplest case, in which the processor (actually the image processor) will perform OCR on the image to get
 the words and normalized bounding boxes.
 ```python

@@ -205,7 +205,7 @@ print(encoding.keys())
 **Use case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False**
-In case one wants to do OCR themselves, one can initialize the feature extractor with `apply_ocr` set to
+In case one wants to do OCR themselves, one can initialize the image processor with `apply_ocr` set to
 `False`. In that case, one should provide the words and corresponding (normalized) bounding boxes themselves to
 the processor.
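A minimal sketch of use case 2 above (`apply_ocr=False`, words and boxes supplied by the caller); the image path, words, and box values are placeholders:

```python
from PIL import Image
from transformers import LayoutLMv2ImageProcessor, LayoutLMv2Processor, LayoutLMv2TokenizerFast

image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)  # OCR happens outside the processor
tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
processor = LayoutLMv2Processor(image_processor, tokenizer)

image = Image.open("document.png").convert("RGB")      # illustrative path
words = ["hello", "world"]                              # words from your own OCR engine
boxes = [[637, 773, 693, 782], [698, 773, 733, 782]]    # boxes normalized to a 0-1000 scale
encoding = processor(image, words, boxes=boxes, return_tensors="pt")
print(encoding.keys())  # input_ids, token_type_ids, attention_mask, bbox, image
```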
@@ -31,7 +31,7 @@ Tips:
 - In terms of data processing, LayoutLMv3 is identical to its predecessor [LayoutLMv2](layoutlmv2), except that:
   - images need to be resized and normalized with channels in regular RGB format. LayoutLMv2 on the other hand normalizes the images internally and expects the channels in BGR format.
   - text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.
-Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3FeatureExtractor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model.
+Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3ImageProcessor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model.
 - Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-layoutlmv2processor) of its predecessor.
 - Demo notebooks for LayoutLMv3 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv3).
 - Demo scripts can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/layoutlmv3).

@@ -52,7 +52,7 @@ tokenizer = LayoutXLMTokenizer.from_pretrained("microsoft/layoutxlm-base")
 ```
 Similar to LayoutLMv2, you can use [`LayoutXLMProcessor`] (which internally applies
-[`LayoutLMv2FeatureExtractor`] and
+[`LayoutLMv2ImageProcessor`] and
 [`LayoutXLMTokenizer`]/[`LayoutXLMTokenizerFast`] in sequence) to prepare all
 data for the model.

@@ -28,7 +28,7 @@ The abstract from the paper is the following:
 OWL-ViT is a zero-shot text-conditioned object detection model. OWL-ViT uses [CLIP](clip) as its multi-modal backbone, with a ViT-like Transformer to get visual features and a causal language model to get the text features. To use CLIP for detection, OWL-ViT removes the final token pooling layer of the vision model and attaches a lightweight classification and box head to each transformer output token. Open-vocabulary classification is enabled by replacing the fixed classification layer weights with the class-name embeddings obtained from the text model. The authors first train CLIP from scratch and fine-tune it end-to-end with the classification and box heads on standard detection datasets using a bipartite matching loss. One or multiple text queries per image can be used to perform zero-shot text-conditioned object detection.
-[`OwlViTFeatureExtractor`] can be used to resize (or rescale) and normalize images for the model and [`CLIPTokenizer`] is used to encode the text. [`OwlViTProcessor`] wraps [`OwlViTFeatureExtractor`] and [`CLIPTokenizer`] into a single instance to both encode the text and prepare the images. The following example shows how to perform object detection using [`OwlViTProcessor`] and [`OwlViTForObjectDetection`].
+[`OwlViTImageProcessor`] can be used to resize (or rescale) and normalize images for the model and [`CLIPTokenizer`] is used to encode the text. [`OwlViTProcessor`] wraps [`OwlViTImageProcessor`] and [`CLIPTokenizer`] into a single instance to both encode the text and prepare the images. The following example shows how to perform object detection using [`OwlViTProcessor`] and [`OwlViTForObjectDetection`].
 ```python
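A rough sketch of the detection example referenced in that excerpt (checkpoint and text queries are illustrative; `post_process_object_detection` on the processor is assumed to be available, as in current releases):

```python
import requests
import torch
from PIL import Image
from transformers import OwlViTForObjectDetection, OwlViTProcessor

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]  # one list of queries per image

inputs = processor(text=texts, images=image, return_tensors="pt")
outputs = model(**inputs)

# Convert raw outputs to boxes, scores and labels in the original image size.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs=outputs, target_sizes=target_sizes, threshold=0.1)
for score, label, box in zip(results[0]["scores"], results[0]["labels"], results[0]["boxes"]):
    print(f"{texts[0][label.item()]}: {score:.2f} at {box.tolist()}")
```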
@@ -39,7 +39,7 @@ Tips:
 - The quickest way to get started with ViLT is by checking the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ViLT)
   (which showcase both inference and fine-tuning on custom data).
 - ViLT is a model that takes both `pixel_values` and `input_ids` as input. One can use [`ViltProcessor`] to prepare data for the model.
-  This processor wraps a feature extractor (for the image modality) and a tokenizer (for the language modality) into one.
+  This processor wraps an image processor (for the image modality) and a tokenizer (for the language modality) into one.
 - ViLT is trained with images of various sizes: the authors resize the shorter edge of input images to 384 and limit the longer edge to
   under 640 while preserving the aspect ratio. To make batching of images possible, the authors use a `pixel_mask` that indicates
   which pixel values are real and which are padding. [`ViltProcessor`] automatically creates this for you.
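A short sketch of preparing `pixel_values` and `input_ids` with `ViltProcessor` as described above (the VQA checkpoint, image URL, and question are illustrative):

```python
import requests
from PIL import Image
from transformers import ViltForQuestionAnswering, ViltProcessor

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = "How many cats are there?"

# The processor produces pixel_values, pixel_mask and the tokenized text in one call.
encoding = processor(image, text, return_tensors="pt")
outputs = model(**encoding)
idx = outputs.logits.argmax(-1).item()
print("Predicted answer:", model.config.id2label[idx])
```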
@@ -462,9 +462,9 @@ Next, prepare an instance of a `CocoDetection` class that can be used with `coco
 >>> class CocoDetection(torchvision.datasets.CocoDetection):
-...     def __init__(self, img_folder, feature_extractor, ann_file):
+...     def __init__(self, img_folder, image_processor, ann_file):
 ...         super().__init__(img_folder, ann_file)
-...         self.feature_extractor = feature_extractor
+...         self.image_processor = image_processor
 ...     def __getitem__(self, idx):
 ...         # read in PIL image and target in COCO format

@@ -474,7 +474,7 @@ Next, prepare an instance of a `CocoDetection` class that can be used with `coco
 ...         # resizing + normalization of both image and target)
 ...         image_id = self.ids[idx]
 ...         target = {"image_id": image_id, "annotations": target}
-...         encoding = self.feature_extractor(images=img, annotations=target, return_tensors="pt")
+...         encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
 ...         pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension
 ...         target = encoding["labels"][0] # remove batch dimension

@@ -591,4 +591,3 @@ Let's plot the result:
 <div class="flex justify-center">
 <img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
 </div>
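To connect the renamed argument to actual usage, a sketch of instantiating the `CocoDetection` wrapper defined in the excerpt above (checkpoint and paths are placeholders):

```python
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50")

# Paths are placeholders; point them at your COCO-format images and annotation file.
train_dataset = CocoDetection(
    img_folder="path/to/train/images",
    image_processor=image_processor,
    ann_file="path/to/train/annotations.json",
)
pixel_values, target = train_dataset[0]
print(pixel_values.shape, target.keys())
```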
@@ -73,12 +73,12 @@ Cada clase de alimento - o label - corresponde a un número; `79` indica una cos
 ## Preprocesa
-Carga el feature extractor de ViT para procesar la imagen en un tensor:
+Carga el image processor de ViT para procesar la imagen en un tensor:
 ```py
->>> from transformers import AutoFeatureExtractor
+>>> from transformers import AutoImageProcessor
->>> feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")
+>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
 ```
 Aplica varias transformaciones de imagen al dataset para hacer el modelo más robusto contra el overfitting. En este caso se utilizará el módulo [`transforms`](https://pytorch.org/vision/stable/transforms.html) de torchvision. Recorta una parte aleatoria de la imagen, cambia su tamaño y normalízala con la media y la desviación estándar de la imagen:

@@ -86,8 +86,8 @@ Aplica varias transformaciones de imagen al dataset para hacer el modelo más ro
 ```py
 >>> from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor
->>> normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
->>> _transforms = Compose([RandomResizedCrop(feature_extractor.size), ToTensor(), normalize])
+>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
+>>> _transforms = Compose([RandomResizedCrop(image_processor.size["height"]), ToTensor(), normalize])
 ```
 Crea una función de preprocesamiento que aplique las transformaciones y devuelva los `pixel_values` - los inputs al modelo - de la imagen:
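A sketch of the preprocessing function the guide describes at this point (it assumes the `food` dataset and the `_transforms` pipeline from the surrounding excerpt):

```python
def transforms(examples):
    # Apply the torchvision pipeline and return pixel_values, the model inputs.
    examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]]
    del examples["image"]
    return examples

food = food.with_transform(transforms)
```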
@@ -160,7 +160,7 @@ Al llegar a este punto, solo quedan tres pasos:
 ... data_collator=data_collator,
 ... train_dataset=food["train"],
 ... eval_dataset=food["test"],
-... tokenizer=feature_extractor,
+... tokenizer=image_processor,
 ... )
 >>> trainer.train()
@@ -454,9 +454,9 @@ COCO 데이터 세트를 빌드하는 API는 데이터를 특정 형식으로
 >>> class CocoDetection(torchvision.datasets.CocoDetection):
-...     def __init__(self, img_folder, feature_extractor, ann_file):
+...     def __init__(self, img_folder, image_processor, ann_file):
 ...         super().__init__(img_folder, ann_file)
-...         self.feature_extractor = feature_extractor
+...         self.image_processor = image_processor
 ...     def __getitem__(self, idx):
 ...         # read in PIL image and target in COCO format

@@ -466,7 +466,7 @@ COCO 데이터 세트를 빌드하는 API는 데이터를 특정 형식으로
 ...         # resizing + normalization of both image and target)
 ...         image_id = self.ids[idx]
 ...         target = {"image_id": image_id, "annotations": target}
-...         encoding = self.feature_extractor(images=img, annotations=target, return_tensors="pt")
+...         encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
 ...         pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension
 ...         target = encoding["labels"][0] # remove batch dimension

@@ -586,4 +586,3 @@ Detected Mask with confidence 0.584 at location [2449.06, 823.19, 3256.43, 1413.
 <div class="flex justify-center">
 <img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
 </div>
|
@ -354,12 +354,12 @@ def convert_align_checkpoint(checkpoint_path, pytorch_dump_folder_path, save_mod
|
||||
# Create folder to save model
|
||||
if not os.path.isdir(pytorch_dump_folder_path):
|
||||
os.mkdir(pytorch_dump_folder_path)
|
||||
# Save converted model and feature extractor
|
||||
# Save converted model and image processor
|
||||
hf_model.save_pretrained(pytorch_dump_folder_path)
|
||||
processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
# Push model and feature extractor to hub
|
||||
# Push model and image processor to hub
|
||||
print("Pushing converted ALIGN to the hub...")
|
||||
processor.push_to_hub("align-base")
|
||||
hf_model.push_to_hub("align-base")
|
||||
@ -381,7 +381,7 @@ if __name__ == "__main__":
|
||||
help="Path to the output PyTorch model directory.",
|
||||
)
|
||||
parser.add_argument("--save_model", action="store_true", help="Save model to local")
|
||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
|
||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
|
||||
|
||||
args = parser.parse_args()
|
||||
convert_align_checkpoint(args.checkpoint_path, args.pytorch_dump_folder_path, args.save_model, args.push_to_hub)
|
||||
|
@@ -27,10 +27,10 @@ from PIL import Image
 from transformers import (
     BeitConfig,
-    BeitFeatureExtractor,
     BeitForImageClassification,
     BeitForMaskedImageModeling,
     BeitForSemanticSegmentation,
+    BeitImageProcessor,
 )
 from transformers.image_utils import PILImageResampling
 from transformers.utils import logging

@@ -266,16 +266,16 @@ def convert_beit_checkpoint(checkpoint_url, pytorch_dump_folder_path):
 # Check outputs on an image
 if is_semantic:
-    feature_extractor = BeitFeatureExtractor(size=config.image_size, do_center_crop=False)
+    image_processor = BeitImageProcessor(size=config.image_size, do_center_crop=False)
     ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
     image = Image.open(ds[0]["file"])
 else:
-    feature_extractor = BeitFeatureExtractor(
+    image_processor = BeitImageProcessor(
         size=config.image_size, resample=PILImageResampling.BILINEAR, do_center_crop=False
     )
     image = prepare_img()
-encoding = feature_extractor(images=image, return_tensors="pt")
+encoding = image_processor(images=image, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 outputs = model(pixel_values)

@@ -353,8 +353,8 @@ def convert_beit_checkpoint(checkpoint_url, pytorch_dump_folder_path):
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if __name__ == "__main__":
@@ -468,7 +468,7 @@ class ChineseCLIPOnnxConfig(OnnxConfig):
 processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
 )
 image_input_dict = super().generate_dummy_inputs(
-    processor.feature_extractor, batch_size=batch_size, framework=framework
+    processor.image_processor, batch_size=batch_size, framework=framework
 )
 return {**text_input_dict, **image_input_dict}

@@ -449,7 +449,7 @@ class CLIPOnnxConfig(OnnxConfig):
 processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
 )
 image_input_dict = super().generate_dummy_inputs(
-    processor.feature_extractor, batch_size=batch_size, framework=framework
+    processor.image_processor, batch_size=batch_size, framework=framework
 )
 return {**text_input_dict, **image_input_dict}
@@ -28,7 +28,7 @@ from transformers import (
     CLIPSegTextConfig,
     CLIPSegVisionConfig,
     CLIPTokenizer,
-    ViTFeatureExtractor,
+    ViTImageProcessor,
 )

@@ -185,9 +185,9 @@ def convert_clipseg_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_
 if unexpected_keys != ["decoder.reduce.weight", "decoder.reduce.bias"]:
     raise ValueError(f"Unexpected keys: {unexpected_keys}")
-feature_extractor = ViTFeatureExtractor(size=352)
+image_processor = ViTImageProcessor(size=352)
 tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
-processor = CLIPSegProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
+processor = CLIPSegProcessor(image_processor=image_processor, tokenizer=tokenizer)
 image = prepare_img()
 text = ["a glass", "something to fill", "wood", "a jar"]
@@ -27,9 +27,9 @@ from PIL import Image
 from transformers import (
     ConditionalDetrConfig,
-    ConditionalDetrFeatureExtractor,
     ConditionalDetrForObjectDetection,
     ConditionalDetrForSegmentation,
+    ConditionalDetrImageProcessor,
 )
 from transformers.utils import logging

@@ -244,13 +244,13 @@ def convert_conditional_detr_checkpoint(model_name, pytorch_dump_folder_path):
 config.id2label = id2label
 config.label2id = {v: k for k, v in id2label.items()}
-# load feature extractor
+# load image processor
 format = "coco_panoptic" if is_panoptic else "coco_detection"
-feature_extractor = ConditionalDetrFeatureExtractor(format=format)
+image_processor = ConditionalDetrImageProcessor(format=format)
 # prepare image
 img = prepare_img()
-encoding = feature_extractor(images=img, return_tensors="pt")
+encoding = image_processor(images=img, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 logger.info(f"Converting model {model_name}...")

@@ -302,11 +302,11 @@ def convert_conditional_detr_checkpoint(model_name, pytorch_dump_folder_path):
 if is_panoptic:
     assert torch.allclose(outputs.pred_masks, original_outputs["pred_masks"], atol=1e-4)
-# Save model and feature extractor
-logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
+# Save model and image processor
+logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 model.save_pretrained(pytorch_dump_folder_path)
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if __name__ == "__main__":
@@ -26,7 +26,7 @@ import torch
 from huggingface_hub import hf_hub_download
 from PIL import Image
-from transformers import ConvNextConfig, ConvNextFeatureExtractor, ConvNextForImageClassification
+from transformers import ConvNextConfig, ConvNextForImageClassification, ConvNextImageProcessor
 from transformers.utils import logging

@@ -144,10 +144,10 @@ def convert_convnext_checkpoint(checkpoint_url, pytorch_dump_folder_path):
 model.load_state_dict(state_dict)
 model.eval()
-# Check outputs on an image, prepared by ConvNextFeatureExtractor
+# Check outputs on an image, prepared by ConvNextImageProcessor
 size = 224 if "224" in checkpoint_url else 384
-feature_extractor = ConvNextFeatureExtractor(size=size)
-pixel_values = feature_extractor(images=prepare_img(), return_tensors="pt").pixel_values
+image_processor = ConvNextImageProcessor(size=size)
+pixel_values = image_processor(images=prepare_img(), return_tensors="pt").pixel_values
 logits = model(pixel_values).logits

@@ -191,8 +191,8 @@ def convert_convnext_checkpoint(checkpoint_url, pytorch_dump_folder_path):
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 print("Pushing model to the hub...")
 model_name = "convnext"
@@ -24,7 +24,7 @@ from collections import OrderedDict
 import torch
 from huggingface_hub import cached_download, hf_hub_url
-from transformers import AutoFeatureExtractor, CvtConfig, CvtForImageClassification
+from transformers import AutoImageProcessor, CvtConfig, CvtForImageClassification
 def embeddings(idx):

@@ -307,8 +307,8 @@ def convert_cvt_checkpoint(cvt_model, image_size, cvt_file_name, pytorch_dump_fo
 config.embed_dim = [192, 768, 1024]
 model = CvtForImageClassification(config)
-feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
-feature_extractor.size["shortest_edge"] = image_size
+image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
+image_processor.size["shortest_edge"] = image_size
 original_weights = torch.load(cvt_file_name, map_location=torch.device("cpu"))
 huggingface_weights = OrderedDict()

@@ -329,7 +329,7 @@ def convert_cvt_checkpoint(cvt_model, image_size, cvt_file_name, pytorch_dump_fo
 model.load_state_dict(huggingface_weights)
 model.save_pretrained(pytorch_dump_folder)
-feature_extractor.save_pretrained(pytorch_dump_folder)
+image_processor.save_pretrained(pytorch_dump_folder)
 # Download the weights from zoo: https://1drv.ms/u/s!AhIXJn_J-blW9RzF3rMW7SsLHa8h?e=blQ0Al
@@ -8,7 +8,7 @@ from PIL import Image
 from timm.models import create_model
 from transformers import (
-    BeitFeatureExtractor,
+    BeitImageProcessor,
     Data2VecVisionConfig,
     Data2VecVisionForImageClassification,
     Data2VecVisionModel,

@@ -304,9 +304,9 @@ def main():
 orig_model.eval()
 # 3. Forward Beit model
-feature_extractor = BeitFeatureExtractor(size=config.image_size, do_center_crop=False)
+image_processor = BeitImageProcessor(size=config.image_size, do_center_crop=False)
 image = Image.open("../../../../tests/fixtures/tests_samples/COCO/000000039769.png")
-encoding = feature_extractor(images=image, return_tensors="pt")
+encoding = image_processor(images=image, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 orig_args = (pixel_values,) if is_finetuned else (pixel_values, None)

@@ -354,7 +354,7 @@ def main():
 # 7. Save
 print(f"Saving to {args.hf_checkpoint_name}")
 hf_model.save_pretrained(args.hf_checkpoint_name)
-feature_extractor.save_pretrained(args.hf_checkpoint_name)
+image_processor.save_pretrained(args.hf_checkpoint_name)
 if __name__ == "__main__":
@@ -24,7 +24,7 @@ import torch
 from huggingface_hub import cached_download, hf_hub_url
 from PIL import Image
-from transformers import DeformableDetrConfig, DeformableDetrFeatureExtractor, DeformableDetrForObjectDetection
+from transformers import DeformableDetrConfig, DeformableDetrForObjectDetection, DeformableDetrImageProcessor
 from transformers.utils import logging

@@ -115,12 +115,12 @@ def convert_deformable_detr_checkpoint(
 config.id2label = id2label
 config.label2id = {v: k for k, v in id2label.items()}
-# load feature extractor
-feature_extractor = DeformableDetrFeatureExtractor(format="coco_detection")
+# load image processor
+image_processor = DeformableDetrImageProcessor(format="coco_detection")
 # prepare image
 img = prepare_img()
-encoding = feature_extractor(images=img, return_tensors="pt")
+encoding = image_processor(images=img, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 logger.info("Converting model...")

@@ -185,11 +185,11 @@ def convert_deformable_detr_checkpoint(
 print("Everything ok!")
-# Save model and feature extractor
-logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
+# Save model and image processor
+logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 model.save_pretrained(pytorch_dump_folder_path)
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+image_processor.save_pretrained(pytorch_dump_folder_path)
 # Push to hub
 if push_to_hub:
@@ -25,7 +25,7 @@ import torch
 from huggingface_hub import hf_hub_download
 from PIL import Image
-from transformers import DeiTConfig, DeiTFeatureExtractor, DeiTForImageClassificationWithTeacher
+from transformers import DeiTConfig, DeiTForImageClassificationWithTeacher, DeiTImageProcessor
 from transformers.utils import logging

@@ -182,12 +182,12 @@ def convert_deit_checkpoint(deit_name, pytorch_dump_folder_path):
 model = DeiTForImageClassificationWithTeacher(config).eval()
 model.load_state_dict(state_dict)
-# Check outputs on an image, prepared by DeiTFeatureExtractor
+# Check outputs on an image, prepared by DeiTImageProcessor
 size = int(
     (256 / 224) * config.image_size
 ) # to maintain same ratio w.r.t. 224 images, see https://github.com/facebookresearch/deit/blob/ab5715372db8c6cad5740714b2216d55aeae052e/datasets.py#L103
-feature_extractor = DeiTFeatureExtractor(size=size, crop_size=config.image_size)
-encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
+image_processor = DeiTImageProcessor(size=size, crop_size=config.image_size)
+encoding = image_processor(images=prepare_img(), return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 outputs = model(pixel_values)

@@ -198,8 +198,8 @@ def convert_deit_checkpoint(deit_name, pytorch_dump_folder_path):
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model {deit_name} to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if __name__ == "__main__":
@@ -25,7 +25,7 @@ import torch
 from huggingface_hub import hf_hub_download
 from PIL import Image
-from transformers import DetrConfig, DetrFeatureExtractor, DetrForObjectDetection, DetrForSegmentation
+from transformers import DetrConfig, DetrForObjectDetection, DetrForSegmentation, DetrImageProcessor
 from transformers.utils import logging

@@ -201,13 +201,13 @@ def convert_detr_checkpoint(model_name, pytorch_dump_folder_path):
 config.id2label = id2label
 config.label2id = {v: k for k, v in id2label.items()}
-# load feature extractor
+# load image processor
 format = "coco_panoptic" if is_panoptic else "coco_detection"
-feature_extractor = DetrFeatureExtractor(format=format)
+image_processor = DetrImageProcessor(format=format)
 # prepare image
 img = prepare_img()
-encoding = feature_extractor(images=img, return_tensors="pt")
+encoding = image_processor(images=img, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 logger.info(f"Converting model {model_name}...")

@@ -258,11 +258,11 @@ def convert_detr_checkpoint(model_name, pytorch_dump_folder_path):
 if is_panoptic:
     assert torch.allclose(outputs.pred_masks, original_outputs["pred_masks"], atol=1e-4)
-# Save model and feature extractor
-logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
+# Save model and image processor
+logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 model.save_pretrained(pytorch_dump_folder_path)
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if __name__ == "__main__":
@@ -1341,8 +1341,7 @@ class DetrImageProcessor(BaseImageProcessor):
 Args:
     results (`List[Dict]`):
-        Results list obtained by [`~DetrFeatureExtractor.post_process`], to which "masks" results will be
-        added.
+        Results list obtained by [`~DetrImageProcessor.post_process`], to which "masks" results will be added.
     outputs ([`DetrSegmentationOutput`]):
         Raw outputs of the model.
     orig_target_sizes (`torch.Tensor` of shape `(batch_size, 2)`):
@@ -24,7 +24,7 @@ import torch
 from huggingface_hub import hf_hub_download
 from PIL import Image
-from transformers import BeitConfig, BeitFeatureExtractor, BeitForImageClassification, BeitForMaskedImageModeling
+from transformers import BeitConfig, BeitForImageClassification, BeitForMaskedImageModeling, BeitImageProcessor
 from transformers.image_utils import PILImageResampling
 from transformers.utils import logging

@@ -171,12 +171,12 @@ def convert_dit_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 model.load_state_dict(state_dict)
 # Check outputs on an image
-feature_extractor = BeitFeatureExtractor(
+image_processor = BeitImageProcessor(
     size=config.image_size, resample=PILImageResampling.BILINEAR, do_center_crop=False
 )
 image = prepare_img()
-encoding = feature_extractor(images=image, return_tensors="pt")
+encoding = image_processor(images=image, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 outputs = model(pixel_values)

@@ -189,18 +189,18 @@ def convert_dit_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if push_to_hub:
     if has_lm_head:
         model_name = "dit-base" if "base" in checkpoint_url else "dit-large"
     else:
         model_name = "dit-base-finetuned-rvlcdip" if "dit-b" in checkpoint_url else "dit-large-finetuned-rvlcdip"
-    feature_extractor.push_to_hub(
+    image_processor.push_to_hub(
         repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
         organization="nielsr",
-        commit_message="Add feature extractor",
+        commit_message="Add image processor",
         use_temp_dir=True,
     )
     model.push_to_hub(
@@ -21,7 +21,7 @@ from datasets import load_dataset
 from donut import DonutModel
 from transformers import (
-    DonutFeatureExtractor,
+    DonutImageProcessor,
     DonutProcessor,
     DonutSwinConfig,
     DonutSwinModel,

@@ -152,10 +152,10 @@ def convert_donut_checkpoint(model_name, pytorch_dump_folder_path=None, push_to_
 image = dataset["test"][0]["image"].convert("RGB")
 tokenizer = XLMRobertaTokenizerFast.from_pretrained(model_name, from_slow=True)
-feature_extractor = DonutFeatureExtractor(
+image_processor = DonutImageProcessor(
     do_align_long_axis=original_model.config.align_long_axis, size=original_model.config.input_size[::-1]
 )
-processor = DonutProcessor(feature_extractor, tokenizer)
+processor = DonutProcessor(image_processor, tokenizer)
 pixel_values = processor(image, return_tensors="pt").pixel_values
 if model_name == "naver-clova-ix/donut-base-finetuned-docvqa":
@@ -24,7 +24,7 @@ import torch
 from huggingface_hub import cached_download, hf_hub_url
 from PIL import Image
-from transformers import DPTConfig, DPTFeatureExtractor, DPTForDepthEstimation, DPTForSemanticSegmentation
+from transformers import DPTConfig, DPTForDepthEstimation, DPTForSemanticSegmentation, DPTImageProcessor
 from transformers.utils import logging

@@ -244,10 +244,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 # Check outputs on an image
 size = 480 if "ade" in checkpoint_url else 384
-feature_extractor = DPTFeatureExtractor(size=size)
+image_processor = DPTImageProcessor(size=size)
 image = prepare_img()
-encoding = feature_extractor(image, return_tensors="pt")
+encoding = image_processor(image, return_tensors="pt")
 # forward pass
 outputs = model(**encoding).logits if "ade" in checkpoint_url else model(**encoding).predicted_depth

@@ -271,12 +271,12 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if push_to_hub:
     model.push_to_hub("ybelkada/dpt-hybrid-midas")
-    feature_extractor.push_to_hub("ybelkada/dpt-hybrid-midas")
+    image_processor.push_to_hub("ybelkada/dpt-hybrid-midas")
 if __name__ == "__main__":
@@ -24,7 +24,7 @@ import torch
 from huggingface_hub import cached_download, hf_hub_url
 from PIL import Image
-from transformers import DPTConfig, DPTFeatureExtractor, DPTForDepthEstimation, DPTForSemanticSegmentation
+from transformers import DPTConfig, DPTForDepthEstimation, DPTForSemanticSegmentation, DPTImageProcessor
 from transformers.utils import logging

@@ -211,10 +211,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 # Check outputs on an image
 size = 480 if "ade" in checkpoint_url else 384
-feature_extractor = DPTFeatureExtractor(size=size)
+image_processor = DPTImageProcessor(size=size)
 image = prepare_img()
-encoding = feature_extractor(image, return_tensors="pt")
+encoding = image_processor(image, return_tensors="pt")
 # forward pass
 outputs = model(**encoding).logits if "ade" in checkpoint_url else model(**encoding).predicted_depth

@@ -233,8 +233,8 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if push_to_hub:
     print("Pushing model to hub...")

@@ -244,10 +244,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 commit_message="Add model",
 use_temp_dir=True,
 )
-feature_extractor.push_to_hub(
+image_processor.push_to_hub(
     repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
     organization="nielsr",
-    commit_message="Add feature extractor",
+    commit_message="Add image processor",
     use_temp_dir=True,
 )
@@ -208,7 +208,7 @@ def convert_efficientformer_checkpoint(
 )
 processor.push_to_hub(
     repo_id=f"Bearnardd/{pytorch_dump_path}",
-    commit_message="Add feature extractor",
+    commit_message="Add image processor",
     use_temp_dir=True,
 )

@@ -234,12 +234,12 @@ if __name__ == "__main__":
 "--pytorch_dump_path", default=None, type=str, required=True, help="Path to the output PyTorch model."
 )
-parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
+parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
 parser.add_argument(
     "--no-push_to_hub",
     dest="push_to_hub",
     action="store_false",
-    help="Do not push model and feature extractor to the hub",
+    help="Do not push model and image processor to the hub",
 )
 parser.set_defaults(push_to_hub=True)
@@ -537,8 +537,8 @@ EFFICIENTFORMER_START_DOCSTRING = r"""
 EFFICIENTFORMER_INPUTS_DOCSTRING = r"""
     Args:
         pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
-            Pixel values. Pixel values can be obtained using [`ViTFeatureExtractor`]. See
-            [`ViTFeatureExtractor.__call__`] for details.
+            Pixel values. Pixel values can be obtained using [`ViTImageProcessor`]. See
+            [`ViTImageProcessor.preprocess`] for details.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
            tensors for more detail.
@@ -305,12 +305,12 @@ def convert_efficientnet_checkpoint(model_name, pytorch_dump_folder_path, save_m
 # Create folder to save model
 if not os.path.isdir(pytorch_dump_folder_path):
     os.mkdir(pytorch_dump_folder_path)
-# Save converted model and feature extractor
+# Save converted model and image processor
 hf_model.save_pretrained(pytorch_dump_folder_path)
 preprocessor.save_pretrained(pytorch_dump_folder_path)
 if push_to_hub:
-    # Push model and feature extractor to hub
+    # Push model and image processor to hub
     print(f"Pushing converted {model_name} to the hub...")
     model_name = f"efficientnet-{model_name}"
     preprocessor.push_to_hub(model_name)

@@ -333,7 +333,7 @@ if __name__ == "__main__":
 help="Path to the output PyTorch model directory.",
 )
 parser.add_argument("--save_model", action="store_true", help="Save model to local")
-parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
+parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
 args = parser.parse_args()
 convert_efficientnet_checkpoint(args.model_name, args.pytorch_dump_folder_path, args.save_model, args.push_to_hub)
@@ -23,7 +23,7 @@ import requests
 import torch
 from PIL import Image
-from transformers import GLPNConfig, GLPNFeatureExtractor, GLPNForDepthEstimation
+from transformers import GLPNConfig, GLPNForDepthEstimation, GLPNImageProcessor
 from transformers.utils import logging

@@ -131,12 +131,12 @@ def convert_glpn_checkpoint(checkpoint_path, pytorch_dump_folder_path, push_to_h
 # load GLPN configuration (Segformer-B4 size)
 config = GLPNConfig(hidden_sizes=[64, 128, 320, 512], decoder_hidden_size=64, depths=[3, 8, 27, 3])
-# load feature extractor (only resize + rescale)
-feature_extractor = GLPNFeatureExtractor()
+# load image processor (only resize + rescale)
+image_processor = GLPNImageProcessor()
 # prepare image
 image = prepare_img()
-pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
+pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
 logger.info("Converting model...")

@@ -179,17 +179,17 @@ def convert_glpn_checkpoint(checkpoint_path, pytorch_dump_folder_path, push_to_h
 # finally, push to hub if required
 if push_to_hub:
-    logger.info("Pushing model and feature extractor to the hub...")
+    logger.info("Pushing model and image processor to the hub...")
     model.push_to_hub(
         repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
         organization="nielsr",
         commit_message="Add model",
         use_temp_dir=True,
     )
-    feature_extractor.push_to_hub(
+    image_processor.push_to_hub(
         repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
         organization="nielsr",
-        commit_message="Add feature extractor",
+        commit_message="Add image processor",
         use_temp_dir=True,
     )
@@ -458,7 +458,7 @@ class GroupViTOnnxConfig(OnnxConfig):
 processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
 )
 image_input_dict = super().generate_dummy_inputs(
-    processor.feature_extractor, batch_size=batch_size, framework=framework
+    processor.image_processor, batch_size=batch_size, framework=framework
 )
 return {**text_input_dict, **image_input_dict}

@@ -81,7 +81,7 @@ class ImageGPTImageProcessor(BaseImageProcessor):
 def __init__(
     self,
-    # clusters is a first argument to maintain backwards compatibility with the old ImageGPTFeatureExtractor
+    # clusters is a first argument to maintain backwards compatibility with the old ImageGPTImageProcessor
     clusters: Optional[Union[List[List[int]], np.ndarray]] = None,
     do_resize: bool = True,
     size: Dict[str, int] = None,

@@ -260,7 +260,7 @@ class LayoutLMv3OnnxConfig(OnnxConfig):
 """
 # A dummy image is used so OCR should not be applied
-setattr(processor.feature_extractor, "apply_ocr", False)
+setattr(processor.image_processor, "apply_ocr", False)
 # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
 batch_size = compute_effective_axis_dimension(
@@ -15,6 +15,7 @@
 """
 Processor class for LayoutXLM.
 """
+import warnings
 from typing import List, Optional, Union
 from ...processing_utils import ProcessorMixin

@@ -24,26 +25,45 @@ from ...utils import TensorType
 class LayoutXLMProcessor(ProcessorMixin):
     r"""
-    Constructs a LayoutXLM processor which combines a LayoutXLM feature extractor and a LayoutXLM tokenizer into a
-    single processor.
+    Constructs a LayoutXLM processor which combines a LayoutXLM image processor and a LayoutXLM tokenizer into a single
+    processor.
     [`LayoutXLMProcessor`] offers all the functionalities you need to prepare data for the model.
-    It first uses [`LayoutLMv2FeatureExtractor`] to resize document images to a fixed size, and optionally applies OCR
-    to get words and normalized bounding boxes. These are then provided to [`LayoutXLMTokenizer`] or
+    It first uses [`LayoutLMv2ImageProcessor`] to resize document images to a fixed size, and optionally applies OCR to
+    get words and normalized bounding boxes. These are then provided to [`LayoutXLMTokenizer`] or
     [`LayoutXLMTokenizerFast`], which turns the words and bounding boxes into token-level `input_ids`,
     `attention_mask`, `token_type_ids`, `bbox`. Optionally, one can provide integer `word_labels`, which are turned
     into token-level `labels` for token classification tasks (such as FUNSD, CORD).
     Args:
-        feature_extractor (`LayoutLMv2FeatureExtractor`):
-            An instance of [`LayoutLMv2FeatureExtractor`]. The feature extractor is a required input.
+        image_processor (`LayoutLMv2ImageProcessor`):
+            An instance of [`LayoutLMv2ImageProcessor`]. The image processor is a required input.
         tokenizer (`LayoutXLMTokenizer` or `LayoutXLMTokenizerFast`):
             An instance of [`LayoutXLMTokenizer`] or [`LayoutXLMTokenizerFast`]. The tokenizer is a required input.
     """
-    feature_extractor_class = "LayoutLMv2FeatureExtractor"
+    attributes = ["image_processor", "tokenizer"]
+    image_processor_class = "LayoutLMv2ImageProcessor"
     tokenizer_class = ("LayoutXLMTokenizer", "LayoutXLMTokenizerFast")
+    def __init__(self, image_processor=None, tokenizer=None, **kwargs):
+        if "feature_extractor" in kwargs:
+            warnings.warn(
+                "The `feature_extractor` argument is deprecated and will be removed in v5, use `image_processor`"
+                " instead.",
+                FutureWarning,
+            )
+            feature_extractor = kwargs.pop("feature_extractor")
+        image_processor = image_processor if image_processor is not None else feature_extractor
+        if image_processor is None:
+            raise ValueError("You need to specify an `image_processor`.")
+        if tokenizer is None:
+            raise ValueError("You need to specify a `tokenizer`.")
+        super().__init__(image_processor, tokenizer)
     def __call__(
         self,
         images,

@@ -68,37 +88,37 @@ class LayoutXLMProcessor(ProcessorMixin):
         **kwargs,
     ) -> BatchEncoding:
         """
-        This method first forwards the `images` argument to [`~LayoutLMv2FeatureExtractor.__call__`]. In case
-        [`LayoutLMv2FeatureExtractor`] was initialized with `apply_ocr` set to `True`, it passes the obtained words and
+        This method first forwards the `images` argument to [`~LayoutLMv2ImageProcessor.__call__`]. In case
+        [`LayoutLMv2ImageProcessor`] was initialized with `apply_ocr` set to `True`, it passes the obtained words and
         bounding boxes along with the additional arguments to [`~LayoutXLMTokenizer.__call__`] and returns the output,
-        together with resized `images`. In case [`LayoutLMv2FeatureExtractor`] was initialized with `apply_ocr` set to
+        together with resized `images`. In case [`LayoutLMv2ImageProcessor`] was initialized with `apply_ocr` set to
         `False`, it passes the words (`text`/`text_pair`) and `boxes` specified by the user along with the additional
         arguments to [`~LayoutXLMTokenizer.__call__`] and returns the output, together with resized `images`.
         Please refer to the docstring of the above two methods for more information.
         """
         # verify input
-        if self.feature_extractor.apply_ocr and (boxes is not None):
+        if self.image_processor.apply_ocr and (boxes is not None):
             raise ValueError(
                 "You cannot provide bounding boxes "
-                "if you initialized the feature extractor with apply_ocr set to True."
+                "if you initialized the image processor with apply_ocr set to True."
             )
-        if self.feature_extractor.apply_ocr and (word_labels is not None):
+        if self.image_processor.apply_ocr and (word_labels is not None):
             raise ValueError(
-                "You cannot provide word labels if you initialized the feature extractor with apply_ocr set to True."
+                "You cannot provide word labels if you initialized the image processor with apply_ocr set to True."
             )
         if return_overflowing_tokens is True and return_offsets_mapping is False:
             raise ValueError("You cannot return overflowing tokens without returning the offsets mapping.")
-        # first, apply the feature extractor
-        features = self.feature_extractor(images=images, return_tensors=return_tensors)
+        # first, apply the image processor
+        features = self.image_processor(images=images, return_tensors=return_tensors)
         # second, apply the tokenizer
-        if text is not None and self.feature_extractor.apply_ocr and text_pair is None:
+        if text is not None and self.image_processor.apply_ocr and text_pair is None:
             if isinstance(text, str):
-                text = [text] # add batch dimension (as the feature extractor always adds a batch dimension)
+                text = [text] # add batch dimension (as the image processor always adds a batch dimension)
             text_pair = features["words"]
         encoded_inputs = self.tokenizer(

@@ -162,3 +182,19 @@ class LayoutXLMProcessor(ProcessorMixin):
     @property
     def model_input_names(self):
         return ["input_ids", "bbox", "attention_mask", "image"]
+    @property
+    def feature_extractor_class(self):
+        warnings.warn(
+            "`feature_extractor_class` is deprecated and will be removed in v5. Use `image_processor_class` instead.",
+            FutureWarning,
+        )
+        return self.image_processor_class
+    @property
+    def feature_extractor(self):
+        warnings.warn(
+            "`feature_extractor` is deprecated and will be removed in v5. Use `image_processor` instead.",
+            FutureWarning,
+        )
+        return self.image_processor
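A sketch of constructing the processor with the renamed argument, mirroring the new `__init__` above (checkpoint and image path are illustrative):

```python
from PIL import Image
from transformers import LayoutLMv2ImageProcessor, LayoutXLMProcessor, LayoutXLMTokenizerFast

image_processor = LayoutLMv2ImageProcessor()  # apply_ocr=True by default
tokenizer = LayoutXLMTokenizerFast.from_pretrained("microsoft/layoutxlm-base")

# New-style keyword; passing feature_extractor=... still works but raises a FutureWarning.
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)

image = Image.open("document.png").convert("RGB")  # illustrative path
encoding = processor(image, return_tensors="pt")
print(encoding.keys())
```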
@ -25,7 +25,7 @@ import timm
|
||||
import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
|
||||
from transformers import LevitConfig, LevitFeatureExtractor, LevitForImageClassificationWithTeacher
|
||||
from transformers import LevitConfig, LevitForImageClassificationWithTeacher, LevitImageProcessor
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -74,8 +74,8 @@ def convert_weight_and_push(
|
||||
|
||||
if push_to_hub:
|
||||
our_model.save_pretrained(save_directory / checkpoint_name)
|
||||
feature_extractor = LevitFeatureExtractor()
|
||||
feature_extractor.save_pretrained(save_directory / checkpoint_name)
|
||||
image_processor = LevitImageProcessor()
|
||||
image_processor.save_pretrained(save_directory / checkpoint_name)
|
||||
|
||||
print(f"Pushed {checkpoint_name}")
|
||||
|
||||
@ -167,12 +167,12 @@ if __name__ == "__main__":
|
||||
required=False,
|
||||
help="Path to the output PyTorch model directory.",
|
||||
)
|
||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
|
||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
|
||||
parser.add_argument(
|
||||
"--no-push_to_hub",
|
||||
dest="push_to_hub",
|
||||
action="store_false",
|
||||
help="Do not push model and feature extractor to the hub",
|
||||
help="Do not push model and image processor to the hub",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
@ -192,7 +192,7 @@ class OriginalMask2FormerConfigToOursConverter:
|
||||
return config
|
||||
|
||||
|
||||
class OriginalMask2FormerConfigToFeatureExtractorConverter:
|
||||
class OriginalMask2FormerConfigToImageProcessorConverter:
|
||||
def __call__(self, original_config: object) -> Mask2FormerImageProcessor:
|
||||
model = original_config.MODEL
|
||||
model_input = original_config.INPUT
|
||||
@ -846,7 +846,7 @@ class OriginalMask2FormerCheckpointToOursConverter:
|
||||
def test(
|
||||
original_model,
|
||||
our_model: Mask2FormerForUniversalSegmentation,
|
||||
feature_extractor: Mask2FormerImageProcessor,
|
||||
image_processor: Mask2FormerImageProcessor,
|
||||
tolerance: float,
|
||||
):
|
||||
with torch.no_grad():
|
||||
@ -854,7 +854,7 @@ def test(
|
||||
our_model = our_model.eval()
|
||||
|
||||
im = prepare_img()
|
||||
x = feature_extractor(images=im, return_tensors="pt")["pixel_values"]
|
||||
x = image_processor(images=im, return_tensors="pt")["pixel_values"]
|
||||
|
||||
original_model_backbone_features = original_model.backbone(x.clone())
|
||||
our_model_output: Mask2FormerModelOutput = our_model.model(x.clone(), output_hidden_states=True)
|
||||
@ -979,10 +979,10 @@ if __name__ == "__main__":
|
||||
checkpoints_dir, config_dir
|
||||
):
|
||||
model_name = get_model_name(checkpoint_file)
|
||||
feature_extractor = OriginalMask2FormerConfigToFeatureExtractorConverter()(
|
||||
image_processor = OriginalMask2FormerConfigToImageProcessorConverter()(
|
||||
setup_cfg(Args(config_file=config_file))
|
||||
)
|
||||
feature_extractor.size = {"height": 384, "width": 384}
|
||||
image_processor.size = {"height": 384, "width": 384}
|
||||
|
||||
original_config = setup_cfg(Args(config_file=config_file))
|
||||
mask2former_kwargs = OriginalMask2Former.from_config(original_config)
|
||||
@ -1012,8 +1012,8 @@ if __name__ == "__main__":
|
||||
tolerance = 3e-1
|
||||
|
||||
logger.info(f"🪄 Testing {model_name}...")
|
||||
test(original_model, mask2former_for_segmentation, feature_extractor, tolerance)
|
||||
test(original_model, mask2former_for_segmentation, image_processor, tolerance)
|
||||
logger.info(f"🪄 Pushing {model_name} to hub...")
|
||||
|
||||
feature_extractor.push_to_hub(model_name)
|
||||
image_processor.push_to_hub(model_name)
|
||||
mask2former_for_segmentation.push_to_hub(model_name)
|
||||
|
@ -2106,8 +2106,8 @@ MASK2FORMER_START_DOCSTRING = r"""
|
||||
MASK2FORMER_INPUTS_DOCSTRING = r"""
Args:
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
Pixel values. Pixel values can be obtained using [`AutoFeatureExtractor`]. See
[`AutoFeatureExtractor.__call__`] for details.
Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See
[`AutoImageProcessor.preprocess`] for details.
pixel_mask (`torch.LongTensor` of shape `(batch_size, height, width)`, *optional*):
Mask to avoid performing attention on padding pixel values. Mask values selected in `[0, 1]`:
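A short usage sketch matching the docstring above; the checkpoint and test image are examples only, not part of this change.

```py
import requests
from PIL import Image

from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

checkpoint = "facebook/mask2former-swin-tiny-coco-instance"  # example checkpoint
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The image processor produces `pixel_values` (and `pixel_mask` when padding is needed).
inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)
```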
|
||||
|
||||
|
@ -29,7 +29,7 @@ from detectron2.projects.deeplab import add_deeplab_config
|
||||
from PIL import Image
|
||||
from torch import Tensor, nn
|
||||
|
||||
from transformers.models.maskformer.feature_extraction_maskformer import MaskFormerFeatureExtractor
|
||||
from transformers.models.maskformer.feature_extraction_maskformer import MaskFormerImageProcessor
|
||||
from transformers.models.maskformer.modeling_maskformer import (
|
||||
MaskFormerConfig,
|
||||
MaskFormerForInstanceSegmentation,
|
||||
@ -164,13 +164,13 @@ class OriginalMaskFormerConfigToOursConverter:
|
||||
return config
|
||||
|
||||
|
||||
class OriginalMaskFormerConfigToFeatureExtractorConverter:
|
||||
def __call__(self, original_config: object) -> MaskFormerFeatureExtractor:
|
||||
class OriginalMaskFormerConfigToImageProcessorConverter:
|
||||
def __call__(self, original_config: object) -> MaskFormerImageProcessor:
|
||||
model = original_config.MODEL
|
||||
model_input = original_config.INPUT
|
||||
dataset_catalog = MetadataCatalog.get(original_config.DATASETS.TEST[0])
|
||||
|
||||
return MaskFormerFeatureExtractor(
|
||||
return MaskFormerImageProcessor(
|
||||
image_mean=(torch.tensor(model.PIXEL_MEAN) / 255).tolist(),
|
||||
image_std=(torch.tensor(model.PIXEL_STD) / 255).tolist(),
|
||||
size=model_input.MIN_SIZE_TEST,
|
||||
@ -554,7 +554,7 @@ class OriginalMaskFormerCheckpointToOursConverter:
|
||||
yield config, checkpoint
|
||||
|
||||
|
||||
def test(original_model, our_model: MaskFormerForInstanceSegmentation, feature_extractor: MaskFormerFeatureExtractor):
|
||||
def test(original_model, our_model: MaskFormerForInstanceSegmentation, image_processor: MaskFormerImageProcessor):
|
||||
with torch.no_grad():
|
||||
original_model = original_model.eval()
|
||||
our_model = our_model.eval()
|
||||
@ -600,7 +600,7 @@ def test(original_model, our_model: MaskFormerForInstanceSegmentation, feature_e
|
||||
|
||||
our_model_out: MaskFormerForInstanceSegmentationOutput = our_model(x)
|
||||
|
||||
our_segmentation = feature_extractor.post_process_segmentation(our_model_out, target_size=(384, 384))
|
||||
our_segmentation = image_processor.post_process_segmentation(our_model_out, target_size=(384, 384))
|
||||
|
||||
assert torch.allclose(
|
||||
original_segmentation, our_segmentation, atol=1e-3
|
||||
@ -686,9 +686,7 @@ if __name__ == "__main__":
|
||||
for config_file, checkpoint_file in OriginalMaskFormerCheckpointToOursConverter.using_dirs(
|
||||
checkpoints_dir, config_dir
|
||||
):
|
||||
feature_extractor = OriginalMaskFormerConfigToFeatureExtractorConverter()(
|
||||
setup_cfg(Args(config_file=config_file))
|
||||
)
|
||||
image_processor = OriginalMaskFormerConfigToImageProcessorConverter()(setup_cfg(Args(config_file=config_file)))
|
||||
|
||||
original_config = setup_cfg(Args(config_file=config_file))
|
||||
mask_former_kwargs = OriginalMaskFormer.from_config(original_config)
|
||||
@ -712,15 +710,15 @@ if __name__ == "__main__":
|
||||
mask_former_for_instance_segmentation
|
||||
)
|
||||
|
||||
test(original_model, mask_former_for_instance_segmentation, feature_extractor)
|
||||
test(original_model, mask_former_for_instance_segmentation, image_processor)
|
||||
|
||||
model_name = get_name(checkpoint_file)
|
||||
logger.info(f"🪄 Saving {model_name}")
|
||||
|
||||
feature_extractor.save_pretrained(save_directory / model_name)
|
||||
image_processor.save_pretrained(save_directory / model_name)
|
||||
mask_former_for_instance_segmentation.save_pretrained(save_directory / model_name)
|
||||
|
||||
feature_extractor.push_to_hub(
|
||||
image_processor.push_to_hub(
|
||||
repo_path_or_name=save_directory / model_name,
|
||||
commit_message="Add model",
|
||||
use_temp_dir=True,
|
||||
|
@ -26,7 +26,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import MaskFormerConfig, MaskFormerFeatureExtractor, MaskFormerForInstanceSegmentation, ResNetConfig
|
||||
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation, MaskFormerImageProcessor, ResNetConfig
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -297,9 +297,9 @@ def convert_maskformer_checkpoint(
|
||||
else:
|
||||
ignore_index = 255
|
||||
reduce_labels = True if "ade" in model_name else False
|
||||
feature_extractor = MaskFormerFeatureExtractor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
||||
image_processor = MaskFormerImageProcessor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
||||
|
||||
inputs = feature_extractor(image, return_tensors="pt")
|
||||
inputs = image_processor(image, return_tensors="pt")
|
||||
|
||||
outputs = model(**inputs)
|
||||
|
||||
@ -340,15 +340,15 @@ def convert_maskformer_checkpoint(
|
||||
print("Looks ok!")
|
||||
|
||||
if pytorch_dump_folder_path is not None:
|
||||
print(f"Saving model and feature extractor of {model_name} to {pytorch_dump_folder_path}")
|
||||
print(f"Saving model and image processor of {model_name} to {pytorch_dump_folder_path}")
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
print(f"Pushing model and feature extractor of {model_name} to the hub...")
|
||||
print(f"Pushing model and image processor of {model_name} to the hub...")
|
||||
model.push_to_hub(f"facebook/{model_name}")
|
||||
feature_extractor.push_to_hub(f"facebook/{model_name}")
|
||||
image_processor.push_to_hub(f"facebook/{model_name}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -26,7 +26,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import MaskFormerConfig, MaskFormerFeatureExtractor, MaskFormerForInstanceSegmentation, SwinConfig
|
||||
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation, MaskFormerImageProcessor, SwinConfig
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -278,9 +278,9 @@ def convert_maskformer_checkpoint(
|
||||
else:
|
||||
ignore_index = 255
|
||||
reduce_labels = True if "ade" in model_name else False
|
||||
feature_extractor = MaskFormerFeatureExtractor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
||||
image_processor = MaskFormerImageProcessor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
||||
|
||||
inputs = feature_extractor(image, return_tensors="pt")
|
||||
inputs = image_processor(image, return_tensors="pt")
|
||||
|
||||
outputs = model(**inputs)
|
||||
|
||||
@ -294,15 +294,15 @@ def convert_maskformer_checkpoint(
|
||||
print("Looks ok!")
|
||||
|
||||
if pytorch_dump_folder_path is not None:
|
||||
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
|
||||
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
print("Pushing model and feature extractor to the hub...")
|
||||
print("Pushing model and image processor to the hub...")
|
||||
model.push_to_hub(f"nielsr/{model_name}")
|
||||
feature_extractor.push_to_hub(f"nielsr/{model_name}")
|
||||
image_processor.push_to_hub(f"nielsr/{model_name}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -27,8 +27,8 @@ from PIL import Image
|
||||
|
||||
from transformers import (
|
||||
MobileNetV1Config,
|
||||
MobileNetV1FeatureExtractor,
|
||||
MobileNetV1ForImageClassification,
|
||||
MobileNetV1ImageProcessor,
|
||||
load_tf_weights_in_mobilenet_v1,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
@ -83,12 +83,12 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
# Load weights from TensorFlow checkpoint
|
||||
load_tf_weights_in_mobilenet_v1(model, config, checkpoint_path)
|
||||
|
||||
# Check outputs on an image, prepared by MobileNetV1FeatureExtractor
|
||||
feature_extractor = MobileNetV1FeatureExtractor(
|
||||
# Check outputs on an image, prepared by MobileNetV1ImageProcessor
|
||||
image_processor = MobileNetV1ImageProcessor(
|
||||
crop_size={"width": config.image_size, "height": config.image_size},
|
||||
size={"shortest_edge": config.image_size + 32},
|
||||
)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
outputs = model(**encoding)
|
||||
logits = outputs.logits
|
||||
|
||||
@ -107,13 +107,13 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
print("Pushing to the hub...")
|
||||
repo_id = "google/" + model_name
|
||||
feature_extractor.push_to_hub(repo_id)
|
||||
image_processor.push_to_hub(repo_id)
|
||||
model.push_to_hub(repo_id)
|
||||
|
||||
|
||||
|
@ -99,11 +99,11 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
load_tf_weights_in_mobilenet_v2(model, config, checkpoint_path)
|
||||
|
||||
# Check outputs on an image, prepared by MobileNetV2ImageProcessor
|
||||
feature_extractor = MobileNetV2ImageProcessor(
|
||||
image_processor = MobileNetV2ImageProcessor(
|
||||
crop_size={"width": config.image_size, "height": config.image_size},
|
||||
size={"shortest_edge": config.image_size + 32},
|
||||
)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
outputs = model(**encoding)
|
||||
logits = outputs.logits
|
||||
|
||||
@ -143,13 +143,13 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
print("Pushing to the hub...")
|
||||
repo_id = "google/" + model_name
|
||||
feature_extractor.push_to_hub(repo_id)
|
||||
image_processor.push_to_hub(repo_id)
|
||||
model.push_to_hub(repo_id)
|
||||
|
||||
|
||||
|
@ -26,9 +26,9 @@ from PIL import Image
|
||||
|
||||
from transformers import (
|
||||
MobileViTConfig,
|
||||
MobileViTFeatureExtractor,
|
||||
MobileViTForImageClassification,
|
||||
MobileViTForSemanticSegmentation,
|
||||
MobileViTImageProcessor,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
|
||||
@ -211,9 +211,9 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
|
||||
new_state_dict = convert_state_dict(state_dict, model)
|
||||
model.load_state_dict(new_state_dict)
|
||||
|
||||
# Check outputs on an image, prepared by MobileViTFeatureExtractor
|
||||
feature_extractor = MobileViTFeatureExtractor(crop_size=config.image_size, size=config.image_size + 32)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
# Check outputs on an image, prepared by MobileViTImageProcessor
|
||||
image_processor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
outputs = model(**encoding)
|
||||
logits = outputs.logits
|
||||
|
||||
@ -265,8 +265,8 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {mobilevit_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
model_mapping = {
|
||||
@ -280,7 +280,7 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
|
||||
|
||||
print("Pushing to the hub...")
|
||||
model_name = model_mapping[mobilevit_name]
|
||||
feature_extractor.push_to_hub(model_name, organization="apple")
|
||||
image_processor.push_to_hub(model_name, organization="apple")
|
||||
model.push_to_hub(model_name, organization="apple")
|
||||
|
||||
|
||||
|
@ -259,8 +259,8 @@ def convert_mobilevitv2_checkpoint(task_name, checkpoint_path, orig_config_path,
|
||||
model.load_state_dict(state_dict)
|
||||
|
||||
# Check outputs on an image, prepared by MobileViTImageProcessor
|
||||
feature_extractor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
image_processor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
outputs = model(**encoding)
|
||||
|
||||
# verify classification model
|
||||
@ -276,8 +276,8 @@ def convert_mobilevitv2_checkpoint(task_name, checkpoint_path, orig_config_path,
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {task_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -383,7 +383,7 @@ class OwlViTOnnxConfig(OnnxConfig):
|
||||
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
|
||||
)
|
||||
image_input_dict = super().generate_dummy_inputs(
|
||||
processor.feature_extractor, batch_size=batch_size, framework=framework
|
||||
processor.image_processor, batch_size=batch_size, framework=framework
|
||||
)
|
||||
return {**text_input_dict, **image_input_dict}
|
||||
|
||||
|
@ -29,8 +29,8 @@ from huggingface_hub import Repository
|
||||
from transformers import (
|
||||
CLIPTokenizer,
|
||||
OwlViTConfig,
|
||||
OwlViTFeatureExtractor,
|
||||
OwlViTForObjectDetection,
|
||||
OwlViTImageProcessor,
|
||||
OwlViTModel,
|
||||
OwlViTProcessor,
|
||||
)
|
||||
@ -350,16 +350,16 @@ def convert_owlvit_checkpoint(pt_backbone, flax_params, attn_params, pytorch_dum
|
||||
# Save HF model
|
||||
hf_model.save_pretrained(repo.local_dir)
|
||||
|
||||
# Initialize feature extractor
|
||||
feature_extractor = OwlViTFeatureExtractor(
|
||||
# Initialize image processor
|
||||
image_processor = OwlViTImageProcessor(
|
||||
size=config.vision_config.image_size, crop_size=config.vision_config.image_size
|
||||
)
|
||||
# Initialize tokenizer
|
||||
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32", pad_token="!", model_max_length=16)
|
||||
|
||||
# Initialize processor
|
||||
processor = OwlViTProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
feature_extractor.save_pretrained(repo.local_dir)
|
||||
processor = OwlViTProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
image_processor.save_pretrained(repo.local_dir)
|
||||
processor.save_pretrained(repo.local_dir)
|
||||
|
||||
repo.git_add()
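Once the converted image processor and tokenizer are saved together, they reload as a single processor. A sketch using an example repo id of a converted checkpoint:

```py
from transformers import OwlViTProcessor

# Example repo id; any converted OWL-ViT checkpoint works the same way.
processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")

# The processor now bundles an image processor and a tokenizer.
print(type(processor.image_processor).__name__)  # e.g. OwlViTImageProcessor
print(type(processor.tokenizer).__name__)
```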
|
||||
|
@ -29,13 +29,13 @@ from PIL import Image
|
||||
|
||||
from transformers import (
|
||||
PerceiverConfig,
|
||||
PerceiverFeatureExtractor,
|
||||
PerceiverForImageClassificationConvProcessing,
|
||||
PerceiverForImageClassificationFourier,
|
||||
PerceiverForImageClassificationLearned,
|
||||
PerceiverForMaskedLM,
|
||||
PerceiverForMultimodalAutoencoding,
|
||||
PerceiverForOpticalFlow,
|
||||
PerceiverImageProcessor,
|
||||
PerceiverTokenizer,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
@ -389,9 +389,9 @@ def convert_perceiver_checkpoint(pickle_file, pytorch_dump_folder_path, architec
|
||||
inputs = encoding.input_ids
|
||||
input_mask = encoding.attention_mask
|
||||
elif architecture in ["image_classification", "image_classification_fourier", "image_classification_conv"]:
|
||||
feature_extractor = PerceiverFeatureExtractor()
|
||||
image_processor = PerceiverImageProcessor()
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(image, return_tensors="pt")
|
||||
encoding = image_processor(image, return_tensors="pt")
|
||||
inputs = encoding.pixel_values
|
||||
elif architecture == "optical_flow":
|
||||
inputs = torch.randn(1, 2, 27, 368, 496)
|
||||
|
@ -24,7 +24,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import PoolFormerConfig, PoolFormerFeatureExtractor, PoolFormerForImageClassification
|
||||
from transformers import PoolFormerConfig, PoolFormerForImageClassification, PoolFormerImageProcessor
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -141,12 +141,12 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
|
||||
else:
|
||||
raise ValueError(f"Size {size} not supported")
|
||||
|
||||
# load feature extractor
|
||||
feature_extractor = PoolFormerFeatureExtractor(crop_pct=crop_pct)
|
||||
# load image processor
|
||||
image_processor = PoolFormerImageProcessor(crop_pct=crop_pct)
|
||||
|
||||
# Prepare image
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
|
||||
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
|
||||
|
||||
logger.info(f"Converting model {model_name}...")
|
||||
|
||||
@ -161,9 +161,9 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
|
||||
model.load_state_dict(state_dict)
|
||||
model.eval()
|
||||
|
||||
# Define feature extractor
|
||||
feature_extractor = PoolFormerFeatureExtractor(crop_pct=crop_pct)
|
||||
pixel_values = feature_extractor(images=prepare_img(), return_tensors="pt").pixel_values
|
||||
# Define image processor
|
||||
image_processor = PoolFormerImageProcessor(crop_pct=crop_pct)
|
||||
pixel_values = image_processor(images=prepare_img(), return_tensors="pt").pixel_values
|
||||
|
||||
# forward pass
|
||||
outputs = model(pixel_values)
|
||||
@ -187,12 +187,12 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
|
||||
assert logits.shape == expected_shape
|
||||
assert torch.allclose(logits[0, :3], expected_slice, atol=1e-2)
|
||||
|
||||
# finally, save model and feature extractor
|
||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
||||
# finally, save model and image processor
|
||||
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -34,7 +34,7 @@ from huggingface_hub import cached_download, hf_hub_url
|
||||
from torch import Tensor
|
||||
from vissl.models.model_helpers import get_trunk_forward_outputs
|
||||
|
||||
from transformers import AutoFeatureExtractor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
||||
from transformers import AutoImageProcessor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
||||
from transformers.modeling_utils import PreTrainedModel
|
||||
from transformers.utils import logging
|
||||
|
||||
@ -262,10 +262,10 @@ def convert_weights_and_push(save_directory: Path, model_name: str = None, push_
|
||||
)
|
||||
size = 384
|
||||
# we can use the convnext one
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
||||
feature_extractor.push_to_hub(
|
||||
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
||||
image_processor.push_to_hub(
|
||||
repo_path_or_name=save_directory / model_name,
|
||||
commit_message="Add feature extractor",
|
||||
commit_message="Add image processor",
|
||||
output_dir=save_directory / model_name,
|
||||
)
|
||||
|
||||
@ -294,7 +294,7 @@ if __name__ == "__main__":
|
||||
default=True,
|
||||
type=bool,
|
||||
required=False,
|
||||
help="If True, push model and feature extractor to the hub.",
|
||||
help="If True, push model and image processor to the hub.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
@ -30,7 +30,7 @@ from huggingface_hub import cached_download, hf_hub_url
|
||||
from torch import Tensor
|
||||
from vissl.models.model_helpers import get_trunk_forward_outputs
|
||||
|
||||
from transformers import AutoFeatureExtractor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
||||
from transformers import AutoImageProcessor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -209,10 +209,10 @@ def convert_weight_and_push(
|
||||
|
||||
size = 224 if "seer" not in name else 384
|
||||
# we can use the convnext one
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
||||
feature_extractor.push_to_hub(
|
||||
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
||||
image_processor.push_to_hub(
|
||||
repo_path_or_name=save_directory / name,
|
||||
commit_message="Add feature extractor",
|
||||
commit_message="Add image processor",
|
||||
use_temp_dir=True,
|
||||
)
|
||||
|
||||
@ -449,7 +449,7 @@ if __name__ == "__main__":
|
||||
default=True,
|
||||
type=bool,
|
||||
required=False,
|
||||
help="If True, push model and feature extractor to the hub.",
|
||||
help="If True, push model and image processor to the hub.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
@ -28,7 +28,7 @@ import torch.nn as nn
|
||||
from huggingface_hub import hf_hub_download
|
||||
from torch import Tensor
|
||||
|
||||
from transformers import AutoFeatureExtractor, ResNetConfig, ResNetForImageClassification
|
||||
from transformers import AutoImageProcessor, ResNetConfig, ResNetForImageClassification
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -113,10 +113,10 @@ def convert_weight_and_push(name: str, config: ResNetConfig, save_directory: Pat
|
||||
)
|
||||
|
||||
# we can use the convnext one
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
||||
feature_extractor.push_to_hub(
|
||||
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
||||
image_processor.push_to_hub(
|
||||
repo_path_or_name=save_directory / checkpoint_name,
|
||||
commit_message="Add feature extractor",
|
||||
commit_message="Add image processor",
|
||||
use_temp_dir=True,
|
||||
)
|
||||
|
||||
@ -191,7 +191,7 @@ if __name__ == "__main__":
|
||||
default=True,
|
||||
type=bool,
|
||||
required=False,
|
||||
help="If True, push model and feature extractor to the hub.",
|
||||
help="If True, push model and image processor to the hub.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
@ -27,9 +27,9 @@ from PIL import Image
|
||||
|
||||
from transformers import (
|
||||
SegformerConfig,
|
||||
SegformerFeatureExtractor,
|
||||
SegformerForImageClassification,
|
||||
SegformerForSemanticSegmentation,
|
||||
SegformerImageProcessor,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
|
||||
@ -179,14 +179,14 @@ def convert_segformer_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
else:
|
||||
raise ValueError(f"Size {size} not supported")
|
||||
|
||||
# load feature extractor (only resize + normalize)
|
||||
feature_extractor = SegformerFeatureExtractor(
|
||||
# load image processor (only resize + normalize)
|
||||
image_processor = SegformerImageProcessor(
|
||||
image_scale=(512, 512), keep_ratio=False, align=False, do_random_crop=False
|
||||
)
|
||||
|
||||
# prepare image
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
|
||||
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
|
||||
|
||||
logger.info(f"Converting model {model_name}...")
|
||||
|
||||
@ -362,11 +362,11 @@ def convert_segformer_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
assert logits.shape == expected_shape
|
||||
assert torch.allclose(logits[0, :3, :3, :3], expected_slice, atol=1e-2)
|
||||
|
||||
# finally, save model and feature extractor
|
||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
||||
# finally, save model and image processor
|
||||
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -22,7 +22,7 @@ import requests
|
||||
import torch
|
||||
from PIL import Image
|
||||
|
||||
from transformers import SwinConfig, SwinForMaskedImageModeling, ViTFeatureExtractor
|
||||
from transformers import SwinConfig, SwinForMaskedImageModeling, ViTImageProcessor
|
||||
|
||||
|
||||
def get_swin_config(model_name):
|
||||
@ -132,9 +132,9 @@ def convert_swin_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_pat
|
||||
|
||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
|
||||
feature_extractor = ViTFeatureExtractor(size={"height": 192, "width": 192})
|
||||
image_processor = ViTImageProcessor(size={"height": 192, "width": 192})
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
|
||||
with torch.no_grad():
|
||||
outputs = model(**inputs).logits
|
||||
@ -146,13 +146,13 @@ def convert_swin_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_pat
|
||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
print(f"Pushing model and feature extractor for {model_name} to hub")
|
||||
print(f"Pushing model and image processor for {model_name} to hub")
|
||||
model.push_to_hub(f"microsoft/{model_name}")
|
||||
feature_extractor.push_to_hub(f"microsoft/{model_name}")
|
||||
image_processor.push_to_hub(f"microsoft/{model_name}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -7,7 +7,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor, SwinConfig, SwinForImageClassification
|
||||
from transformers import AutoImageProcessor, SwinConfig, SwinForImageClassification
|
||||
|
||||
|
||||
def get_swin_config(swin_name):
|
||||
@ -140,9 +140,9 @@ def convert_swin_checkpoint(swin_name, pytorch_dump_folder_path):
|
||||
|
||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/{}".format(swin_name.replace("_", "-")))
|
||||
image_processor = AutoImageProcessor.from_pretrained("microsoft/{}".format(swin_name.replace("_", "-")))
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
|
||||
timm_outs = timm_model(inputs["pixel_values"])
|
||||
hf_outs = model(**inputs).logits
|
||||
@ -152,8 +152,8 @@ def convert_swin_checkpoint(swin_name, pytorch_dump_folder_path):
|
||||
print(f"Saving model {swin_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -24,7 +24,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor, Swinv2Config, Swinv2ForImageClassification
|
||||
from transformers import AutoImageProcessor, Swinv2Config, Swinv2ForImageClassification
|
||||
|
||||
|
||||
def get_swinv2_config(swinv2_name):
|
||||
@ -180,9 +180,9 @@ def convert_swinv2_checkpoint(swinv2_name, pytorch_dump_folder_path):
|
||||
|
||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/{}".format(swinv2_name.replace("_", "-")))
|
||||
image_processor = AutoImageProcessor.from_pretrained("microsoft/{}".format(swinv2_name.replace("_", "-")))
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
|
||||
timm_outs = timm_model(inputs["pixel_values"])
|
||||
hf_outs = model(**inputs).logits
|
||||
@ -192,8 +192,8 @@ def convert_swinv2_checkpoint(swinv2_name, pytorch_dump_folder_path):
|
||||
print(f"Saving model {swinv2_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
model.push_to_hub(
|
||||
repo_path_or_name=Path(pytorch_dump_folder_path, swinv2_name),
|
||||
|
@ -27,7 +27,7 @@ from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
from torchvision.transforms import functional as F
|
||||
|
||||
from transformers import DetrFeatureExtractor, TableTransformerConfig, TableTransformerForObjectDetection
|
||||
from transformers import DetrImageProcessor, TableTransformerConfig, TableTransformerForObjectDetection
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -242,7 +242,7 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
|
||||
config.id2label = id2label
|
||||
config.label2id = {v: k for k, v in id2label.items()}
|
||||
|
||||
feature_extractor = DetrFeatureExtractor(
|
||||
image_processor = DetrImageProcessor(
|
||||
format="coco_detection", max_size=800 if "detection" in checkpoint_url else 1000
|
||||
)
|
||||
model = TableTransformerForObjectDetection(config)
|
||||
@ -277,11 +277,11 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
|
||||
print("Looks ok!")
|
||||
|
||||
if pytorch_dump_folder_path is not None:
|
||||
# Save model and feature extractor
|
||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
||||
# Save model and image processor
|
||||
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
# Push model to HF hub
|
||||
@ -292,7 +292,7 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
|
||||
else "microsoft/table-transformer-structure-recognition"
|
||||
)
|
||||
model.push_to_hub(model_name)
|
||||
feature_extractor.push_to_hub(model_name)
|
||||
image_processor.push_to_hub(model_name)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -22,7 +22,7 @@ import numpy as np
|
||||
import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
|
||||
from transformers import TimesformerConfig, TimesformerForVideoClassification, VideoMAEFeatureExtractor
|
||||
from transformers import TimesformerConfig, TimesformerForVideoClassification, VideoMAEImageProcessor
|
||||
|
||||
|
||||
def get_timesformer_config(model_name):
|
||||
@ -156,9 +156,9 @@ def convert_timesformer_checkpoint(checkpoint_url, pytorch_dump_folder_path, mod
|
||||
model.eval()
|
||||
|
||||
# verify model on basic input
|
||||
feature_extractor = VideoMAEFeatureExtractor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
||||
image_processor = VideoMAEImageProcessor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
||||
video = prepare_video()
|
||||
inputs = feature_extractor(video[:8], return_tensors="pt")
|
||||
inputs = image_processor(video[:8], return_tensors="pt")
|
||||
|
||||
outputs = model(**inputs)
|
||||
logits = outputs.logits
|
||||
@ -215,8 +215,8 @@ def convert_timesformer_checkpoint(checkpoint_url, pytorch_dump_folder_path, mod
|
||||
print("Logits ok!")
|
||||
|
||||
if pytorch_dump_folder_path is not None:
|
||||
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
|
@ -513,8 +513,8 @@ TIMESFORMER_START_DOCSTRING = r"""
|
||||
TIMESFORMER_INPUTS_DOCSTRING = r"""
Args:
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_frames, num_channels, height, width)`):
Pixel values. Pixel values can be obtained using [`AutoFeatureExtractor`]. See
[`VideoMAEFeatureExtractor.__call__`] for details.
Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See
[`VideoMAEImageProcessor.preprocess`] for details.

output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
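A hedged sketch of the input pipeline described above. The checkpoint name is an example, and the video is random frames standing in for a real clip.

```py
import numpy as np

from transformers import AutoImageProcessor, TimesformerForVideoClassification

checkpoint = "facebook/timesformer-base-finetuned-k400"  # example checkpoint
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = TimesformerForVideoClassification.from_pretrained(checkpoint)

# 8 random RGB frames in (height, width, channels) layout as a stand-in for a real video.
video = list(np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8))

inputs = image_processor(video, return_tensors="pt")  # yields `pixel_values` as described above
outputs = model(**inputs)
```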
|
||||
|
@ -29,7 +29,7 @@ from transformers import (
|
||||
TrOCRProcessor,
|
||||
VisionEncoderDecoderModel,
|
||||
ViTConfig,
|
||||
ViTFeatureExtractor,
|
||||
ViTImageProcessor,
|
||||
ViTModel,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
@ -182,9 +182,9 @@ def convert_tr_ocr_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
model.load_state_dict(state_dict)
|
||||
|
||||
# Check outputs on an image
|
||||
feature_extractor = ViTFeatureExtractor(size=encoder_config.image_size)
|
||||
image_processor = ViTImageProcessor(size=encoder_config.image_size)
|
||||
tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
|
||||
processor = TrOCRProcessor(feature_extractor, tokenizer)
|
||||
processor = TrOCRProcessor(image_processor, tokenizer)
|
||||
|
||||
pixel_values = processor(images=prepare_img(checkpoint_url), return_tensors="pt").pixel_values
|
||||
|
||||
|
@ -30,7 +30,7 @@ import torch.nn as nn
|
||||
from huggingface_hub import cached_download, hf_hub_download
|
||||
from torch import Tensor
|
||||
|
||||
from transformers import AutoFeatureExtractor, VanConfig, VanForImageClassification
|
||||
from transformers import AutoImageProcessor, VanConfig, VanForImageClassification
|
||||
from transformers.models.van.modeling_van import VanLayerScaling
|
||||
from transformers.utils import logging
|
||||
|
||||
@ -154,10 +154,10 @@ def convert_weight_and_push(
|
||||
)
|
||||
|
||||
# we can use the convnext one
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
||||
feature_extractor.push_to_hub(
|
||||
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
||||
image_processor.push_to_hub(
|
||||
repo_path_or_name=save_directory / checkpoint_name,
|
||||
commit_message="Add feature extractor",
|
||||
commit_message="Add image processor",
|
||||
use_temp_dir=True,
|
||||
)
|
||||
|
||||
@ -277,7 +277,7 @@ if __name__ == "__main__":
|
||||
default=True,
|
||||
type=bool,
|
||||
required=False,
|
||||
help="If True, push model and feature extractor to the hub.",
|
||||
help="If True, push model and image processor to the hub.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
@ -24,9 +24,9 @@ from huggingface_hub import hf_hub_download
|
||||
|
||||
from transformers import (
|
||||
VideoMAEConfig,
|
||||
VideoMAEFeatureExtractor,
|
||||
VideoMAEForPreTraining,
|
||||
VideoMAEForVideoClassification,
|
||||
VideoMAEImageProcessor,
|
||||
)
|
||||
|
||||
|
||||
@ -198,9 +198,9 @@ def convert_videomae_checkpoint(checkpoint_url, pytorch_dump_folder_path, model_
|
||||
model.eval()
|
||||
|
||||
# verify model on basic input
|
||||
feature_extractor = VideoMAEFeatureExtractor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
||||
image_processor = VideoMAEImageProcessor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
||||
video = prepare_video()
|
||||
inputs = feature_extractor(video, return_tensors="pt")
|
||||
inputs = image_processor(video, return_tensors="pt")
|
||||
|
||||
if "finetuned" not in model_name:
|
||||
local_path = hf_hub_download(repo_id="hf-internal-testing/bool-masked-pos", filename="bool_masked_pos.pt")
|
||||
@ -288,8 +288,8 @@ def convert_videomae_checkpoint(checkpoint_url, pytorch_dump_folder_path, model_
|
||||
print("Loss ok!")
|
||||
|
||||
if pytorch_dump_folder_path is not None:
|
||||
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
|
@ -27,11 +27,11 @@ from PIL import Image
|
||||
from transformers import (
|
||||
BertTokenizer,
|
||||
ViltConfig,
|
||||
ViltFeatureExtractor,
|
||||
ViltForImageAndTextRetrieval,
|
||||
ViltForImagesAndTextClassification,
|
||||
ViltForMaskedLM,
|
||||
ViltForQuestionAnswering,
|
||||
ViltImageProcessor,
|
||||
ViltProcessor,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
@ -223,9 +223,9 @@ def convert_vilt_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
model.load_state_dict(state_dict)
|
||||
|
||||
# Define processor
|
||||
feature_extractor = ViltFeatureExtractor(size=384)
|
||||
image_processor = ViltImageProcessor(size=384)
|
||||
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
|
||||
processor = ViltProcessor(feature_extractor, tokenizer)
|
||||
processor = ViltProcessor(image_processor, tokenizer)
|
||||
|
||||
# Forward pass on example inputs (image + text)
|
||||
if nlvr_model:
|
||||
|
@ -24,7 +24,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ViTConfig, ViTFeatureExtractor, ViTForImageClassification, ViTModel
|
||||
from transformers import ViTConfig, ViTForImageClassification, ViTImageProcessor, ViTModel
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -175,9 +175,9 @@ def convert_vit_checkpoint(model_name, pytorch_dump_folder_path, base_model=True
|
||||
model = ViTForImageClassification(config).eval()
|
||||
model.load_state_dict(state_dict)
|
||||
|
||||
# Check outputs on an image, prepared by ViTFeatureExtractor
|
||||
feature_extractor = ViTFeatureExtractor()
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
# Check outputs on an image, prepared by ViTImageProcessor
|
||||
image_processor = ViTImageProcessor()
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
pixel_values = encoding["pixel_values"]
|
||||
outputs = model(pixel_values)
|
||||
|
||||
@ -192,8 +192,8 @@ def convert_vit_checkpoint(model_name, pytorch_dump_folder_path, base_model=True
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -25,7 +25,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DeiTFeatureExtractor, ViTConfig, ViTFeatureExtractor, ViTForImageClassification, ViTModel
|
||||
from transformers import DeiTImageProcessor, ViTConfig, ViTForImageClassification, ViTImageProcessor, ViTModel
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -208,12 +208,12 @@ def convert_vit_checkpoint(vit_name, pytorch_dump_folder_path):
|
||||
model = ViTForImageClassification(config).eval()
|
||||
model.load_state_dict(state_dict)
|
||||
|
||||
# Check outputs on an image, prepared by ViTFeatureExtractor/DeiTFeatureExtractor
|
||||
# Check outputs on an image, prepared by ViTImageProcessor/DeiTImageProcessor
|
||||
if "deit" in vit_name:
|
||||
feature_extractor = DeiTFeatureExtractor(size=config.image_size)
|
||||
image_processor = DeiTImageProcessor(size=config.image_size)
|
||||
else:
|
||||
feature_extractor = ViTFeatureExtractor(size=config.image_size)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
image_processor = ViTImageProcessor(size=config.image_size)
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
pixel_values = encoding["pixel_values"]
|
||||
outputs = model(pixel_values)
|
||||
|
||||
@ -229,8 +229,8 @@ def convert_vit_checkpoint(vit_name, pytorch_dump_folder_path):
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {vit_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -20,7 +20,7 @@ import requests
|
||||
import torch
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ViTMAEConfig, ViTMAEFeatureExtractor, ViTMAEForPreTraining
|
||||
from transformers import ViTMAEConfig, ViTMAEForPreTraining, ViTMAEImageProcessor
|
||||
|
||||
|
||||
def rename_key(name):
|
||||
@ -120,7 +120,7 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
|
||||
state_dict = torch.hub.load_state_dict_from_url(checkpoint_url, map_location="cpu")["model"]
|
||||
|
||||
feature_extractor = ViTMAEFeatureExtractor(size=config.image_size)
|
||||
image_processor = ViTMAEImageProcessor(size=config.image_size)
|
||||
|
||||
new_state_dict = convert_state_dict(state_dict, config)
|
||||
|
||||
@ -130,8 +130,8 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
url = "https://user-images.githubusercontent.com/11435359/147738734-196fd92f-9260-48d5-ba7e-bf103d29364d.jpg"
|
||||
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
feature_extractor = ViTMAEFeatureExtractor(size=config.image_size)
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
image_processor = ViTMAEImageProcessor(size=config.image_size)
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
|
||||
# forward pass
|
||||
torch.manual_seed(2)
|
||||
@ -157,8 +157,8 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -22,7 +22,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ViTFeatureExtractor, ViTMSNConfig, ViTMSNModel
|
||||
from transformers import ViTImageProcessor, ViTMSNConfig, ViTMSNModel
|
||||
from transformers.image_utils import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
|
||||
|
||||
|
||||
@ -180,7 +180,7 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
|
||||
state_dict = torch.hub.load_state_dict_from_url(checkpoint_url, map_location="cpu")["target_encoder"]
|
||||
|
||||
feature_extractor = ViTFeatureExtractor(size=config.image_size)
|
||||
image_processor = ViTImageProcessor(size=config.image_size)
|
||||
|
||||
remove_projection_head(state_dict)
|
||||
rename_keys = create_rename_keys(config, base_model=True)
|
||||
@ -195,10 +195,10 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
feature_extractor = ViTFeatureExtractor(
|
||||
image_processor = ViTImageProcessor(
|
||||
size=config.image_size, image_mean=IMAGENET_DEFAULT_MEAN, image_std=IMAGENET_DEFAULT_STD
|
||||
)
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
|
||||
# forward pass
|
||||
torch.manual_seed(2)
|
||||
@ -224,8 +224,8 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -23,7 +23,7 @@ from huggingface_hub import hf_hub_download
|
||||
from transformers import (
|
||||
CLIPTokenizer,
|
||||
CLIPTokenizerFast,
|
||||
VideoMAEFeatureExtractor,
|
||||
VideoMAEImageProcessor,
|
||||
XCLIPConfig,
|
||||
XCLIPModel,
|
||||
XCLIPProcessor,
|
||||
@ -291,10 +291,10 @@ def convert_xclip_checkpoint(model_name, pytorch_dump_folder_path=None, push_to_
|
||||
model.eval()
|
||||
|
||||
size = 336 if model_name == "xclip-large-patch14-16-frames" else 224
|
||||
feature_extractor = VideoMAEFeatureExtractor(size=size)
|
||||
image_processor = VideoMAEImageProcessor(size=size)
|
||||
slow_tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
|
||||
fast_tokenizer = CLIPTokenizerFast.from_pretrained("openai/clip-vit-base-patch32")
|
||||
processor = XCLIPProcessor(feature_extractor=feature_extractor, tokenizer=fast_tokenizer)
|
||||
processor = XCLIPProcessor(image_processor=image_processor, tokenizer=fast_tokenizer)
|
||||
|
||||
video = prepare_video(num_frames)
|
||||
inputs = processor(
|
||||
|
@ -24,7 +24,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import YolosConfig, YolosFeatureExtractor, YolosForObjectDetection
|
||||
from transformers import YolosConfig, YolosForObjectDetection, YolosImageProcessor
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -172,10 +172,10 @@ def convert_yolos_checkpoint(
|
||||
new_state_dict = convert_state_dict(state_dict, model)
|
||||
model.load_state_dict(new_state_dict)
|
||||
|
||||
# Check outputs on an image, prepared by YolosFeatureExtractor
|
||||
# Check outputs on an image, prepared by YolosImageProcessor
|
||||
size = 800 if yolos_name != "yolos_ti" else 512
|
||||
feature_extractor = YolosFeatureExtractor(format="coco_detection", size=size)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
image_processor = YolosImageProcessor(format="coco_detection", size=size)
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
outputs = model(**encoding)
|
||||
logits, pred_boxes = outputs.logits, outputs.pred_boxes
|
||||
|
||||
@ -224,8 +224,8 @@ def convert_yolos_checkpoint(
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {yolos_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
model_mapping = {
|
||||
@ -238,7 +238,7 @@ def convert_yolos_checkpoint(
|
||||
|
||||
print("Pushing to the hub...")
|
||||
model_name = model_mapping[yolos_name]
|
||||
feature_extractor.push_to_hub(model_name, organization="hustvl")
|
||||
image_processor.push_to_hub(model_name, organization="hustvl")
|
||||
model.push_to_hub(model_name, organization="hustvl")
|
||||
|
||||
|
||||
|
@ -19,7 +19,7 @@ from pathlib import Path
|
||||
|
||||
from packaging import version
|
||||
|
||||
from .. import AutoFeatureExtractor, AutoProcessor, AutoTokenizer
|
||||
from .. import AutoFeatureExtractor, AutoImageProcessor, AutoProcessor, AutoTokenizer
|
||||
from ..utils import logging
|
||||
from ..utils.import_utils import is_optimum_available
|
||||
from .convert import export, validate_model_outputs
|
||||
@ -145,6 +145,8 @@ def export_with_transformers(args):
|
||||
preprocessor = get_preprocessor(args.model)
|
||||
elif args.preprocessor == "tokenizer":
|
||||
preprocessor = AutoTokenizer.from_pretrained(args.model)
|
||||
elif args.preprocessor == "image_processor":
|
||||
preprocessor = AutoImageProcessor.from_pretrained(args.model)
|
||||
elif args.preprocessor == "feature_extractor":
|
||||
preprocessor = AutoFeatureExtractor.from_pretrained(args.model)
|
||||
elif args.preprocessor == "processor":
|
||||
@ -213,7 +215,7 @@ def main():
|
||||
parser.add_argument(
|
||||
"--preprocessor",
|
||||
type=str,
|
||||
choices=["auto", "tokenizer", "feature_extractor", "processor"],
|
||||
choices=["auto", "tokenizer", "feature_extractor", "image_processor", "processor"],
|
||||
default="auto",
|
||||
help="Which type of preprocessor to use. 'auto' tries to automatically detect it.",
|
||||
)
|
||||
|
@ -49,7 +49,7 @@ if is_vision_available():
|
||||
import PIL
|
||||
from PIL import Image
|
||||
|
||||
from transformers import BeitFeatureExtractor
|
||||
from transformers import BeitImageProcessor
|
||||
|
||||
|
||||
class BeitModelTester:
|
||||
@ -342,18 +342,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class BeitModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return (
|
||||
BeitFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
||||
)
|
||||
def default_image_processor(self):
|
||||
return BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_masked_image_modeling_head(self):
|
||||
model = BeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values.to(torch_device)
|
||||
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(torch_device)
|
||||
|
||||
# prepare bool_masked_pos
|
||||
bool_masked_pos = torch.ones((1, 196), dtype=torch.bool).to(torch_device)
|
||||
@ -377,9 +375,9 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_image_classification_head_imagenet_1k(self):
|
||||
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -403,9 +401,9 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -428,11 +426,11 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
||||
model = BeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-finetuned-ade-640-640")
|
||||
model = model.to(torch_device)
|
||||
|
||||
feature_extractor = BeitFeatureExtractor(do_resize=True, size=640, do_center_crop=False)
|
||||
image_processor = BeitImageProcessor(do_resize=True, size=640, do_center_crop=False)
|
||||
|
||||
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
|
||||
image = Image.open(ds[0]["file"])
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -471,11 +469,11 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
||||
model = BeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-finetuned-ade-640-640")
|
||||
model = model.to(torch_device)
|
||||
|
||||
feature_extractor = BeitFeatureExtractor(do_resize=True, size=640, do_center_crop=False)
|
||||
image_processor = BeitImageProcessor(do_resize=True, size=640, do_center_crop=False)
|
||||
|
||||
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
|
||||
image = Image.open(ds[0]["file"])
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -483,10 +481,10 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
||||
|
||||
outputs.logits = outputs.logits.detach().cpu()
|
||||
|
||||
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
|
||||
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
|
||||
expected_shape = torch.Size((500, 300))
|
||||
self.assertEqual(segmentation[0].shape, expected_shape)
|
||||
|
||||
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs)
|
||||
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs)
|
||||
expected_shape = torch.Size((160, 160))
|
||||
self.assertEqual(segmentation[0].shape, expected_shape)
|
||||
|
@ -33,7 +33,7 @@ if is_flax_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import BeitFeatureExtractor
|
||||
from transformers import BeitImageProcessor
|
||||
|
||||
|
||||
class FlaxBeitModelTester(unittest.TestCase):
|
||||
@ -219,18 +219,16 @@ def prepare_img():
|
||||
@require_flax
|
||||
class FlaxBeitModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return (
|
||||
BeitFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
||||
)
|
||||
def default_image_processor(self):
|
||||
return BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_masked_image_modeling_head(self):
|
||||
model = FlaxBeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="np").pixel_values
|
||||
pixel_values = image_processor(images=image, return_tensors="np").pixel_values
|
||||
|
||||
# prepare bool_masked_pos
|
||||
bool_masked_pos = np.ones((1, 196), dtype=bool)
|
||||
@ -253,9 +251,9 @@ class FlaxBeitModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_image_classification_head_imagenet_1k(self):
|
||||
model = FlaxBeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="np")
|
||||
inputs = image_processor(images=image, return_tensors="np")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
@ -276,9 +274,9 @@ class FlaxBeitModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_image_classification_head_imagenet_22k(self):
|
||||
model = FlaxBeitForImageClassification.from_pretrained("microsoft/beit-large-patch16-224-pt22k-ft22k")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="np")
|
||||
inputs = image_processor(images=image, return_tensors="np")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
|
@ -297,7 +297,7 @@ def prepare_img():
|
||||
@require_vision
|
||||
class BitModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
BitImageProcessor.from_pretrained(BIT_PRETRAINED_MODEL_ARCHIVE_LIST[0]) if is_vision_available() else None
|
||||
)
|
||||
@ -306,9 +306,9 @@ class BitModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_image_classification_head(self):
|
||||
model = BitForImageClassification.from_pretrained(BIT_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -145,7 +145,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
pass
|
||||
|
||||
def test_call_pil(self):
|
||||
# Initialize feature_extractor
|
||||
# Initialize image processor
|
||||
image_processing = self.image_processing_class(**self.image_processor_dict)
|
||||
# create random PIL images
|
||||
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False)
|
||||
@ -176,7 +176,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
)
|
||||
|
||||
def test_call_numpy(self):
|
||||
# Initialize feature_extractor
|
||||
# Initialize image processor
|
||||
image_processing = self.image_processing_class(**self.image_processor_dict)
|
||||
# create random numpy tensors
|
||||
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False, numpify=True)
|
||||
@ -207,7 +207,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
)
|
||||
|
||||
def test_call_pytorch(self):
|
||||
# Initialize feature_extractor
|
||||
# Initialize image processor
|
||||
image_processing = self.image_processing_class(**self.image_processor_dict)
|
||||
# create random PyTorch tensors
|
||||
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False, torchify=True)
|
||||
@ -238,7 +238,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
)
|
||||
|
||||
def test_equivalence_pad_and_create_pixel_mask(self):
|
||||
# Initialize feature_extractors
|
||||
# Initialize image processors
|
||||
image_processing_1 = self.image_processing_class(**self.image_processor_dict)
|
||||
image_processing_2 = self.image_processing_class(do_resize=False, do_normalize=False, do_rescale=False)
|
||||
# create random PyTorch tensors
|
||||
|
@ -43,7 +43,7 @@ if is_timm_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ConditionalDetrFeatureExtractor
|
||||
from transformers import ConditionalDetrImageProcessor
|
||||
|
||||
|
||||
class ConditionalDetrModelTester:
|
||||
@ -493,9 +493,9 @@ def prepare_img():
|
||||
@slow
|
||||
class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
ConditionalDetrFeatureExtractor.from_pretrained("microsoft/conditional-detr-resnet-50")
|
||||
ConditionalDetrImageProcessor.from_pretrained("microsoft/conditional-detr-resnet-50")
|
||||
if is_vision_available()
|
||||
else None
|
||||
)
|
||||
@ -503,9 +503,9 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
||||
def test_inference_no_head(self):
|
||||
model = ConditionalDetrModel.from_pretrained("microsoft/conditional-detr-resnet-50").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
with torch.no_grad():
|
||||
outputs = model(**encoding)
|
||||
@ -522,9 +522,9 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||
|
||||
@ -547,7 +547,7 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_slice_boxes, atol=1e-4))
|
||||
|
||||
# verify postprocessing
|
||||
results = feature_extractor.post_process_object_detection(
|
||||
results = image_processor.post_process_object_detection(
|
||||
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
||||
)[0]
|
||||
expected_scores = torch.tensor([0.8330, 0.8313, 0.8039, 0.6829, 0.5355]).to(torch_device)
|
||||
|
@ -38,7 +38,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor
|
||||
from transformers import AutoImageProcessor
|
||||
|
||||
|
||||
class ConvNextModelTester:
|
||||
@ -285,16 +285,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class ConvNextModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return AutoFeatureExtractor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -38,7 +38,7 @@ if is_tf_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ConvNextFeatureExtractor
|
||||
from transformers import ConvNextImageProcessor
|
||||
|
||||
|
||||
class TFConvNextModelTester:
|
||||
@ -279,18 +279,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class TFConvNextModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return (
|
||||
ConvNextFeatureExtractor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
||||
)
|
||||
def default_image_processor(self):
|
||||
return ConvNextImageProcessor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
model = TFConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="tf")
|
||||
inputs = image_processor(images=image, return_tensors="tf")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
|
@ -38,7 +38,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor
|
||||
from transformers import AutoImageProcessor
|
||||
|
||||
|
||||
class CvtConfigTester(ConfigTester):
|
||||
@ -264,16 +264,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class CvtModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return AutoFeatureExtractor.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
def default_image_processor(self):
|
||||
return AutoImageProcessor.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
model = CvtForImageClassification.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -28,7 +28,7 @@ if is_tf_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor
|
||||
from transformers import AutoImageProcessor
|
||||
|
||||
|
||||
class TFCvtConfigTester(ConfigTester):
|
||||
@ -265,16 +265,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class TFCvtModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return AutoFeatureExtractor.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
def default_image_processor(self):
|
||||
return AutoImageProcessor.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
model = TFCvtForImageClassification.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="tf")
|
||||
inputs = image_processor(images=image, return_tensors="tf")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
|
@ -44,7 +44,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import BeitFeatureExtractor
|
||||
from transformers import BeitImageProcessor
|
||||
|
||||
|
||||
class Data2VecVisionModelTester:
|
||||
@ -327,11 +327,9 @@ def prepare_img():
|
||||
@require_vision
|
||||
class Data2VecVisionModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
BeitFeatureExtractor.from_pretrained("facebook/data2vec-vision-base-ft1k")
|
||||
if is_vision_available()
|
||||
else None
|
||||
BeitImageProcessor.from_pretrained("facebook/data2vec-vision-base-ft1k") if is_vision_available() else None
|
||||
)
|
||||
|
||||
@slow
|
||||
@ -340,9 +338,9 @@ class Data2VecVisionModelIntegrationTest(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -46,7 +46,7 @@ if is_tf_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import BeitFeatureExtractor
|
||||
from transformers import BeitImageProcessor
|
||||
|
||||
|
||||
class TFData2VecVisionModelTester:
|
||||
@ -469,20 +469,18 @@ def prepare_img():
|
||||
@require_vision
|
||||
class TFData2VecVisionModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
BeitFeatureExtractor.from_pretrained("facebook/data2vec-vision-base-ft1k")
|
||||
if is_vision_available()
|
||||
else None
|
||||
BeitImageProcessor.from_pretrained("facebook/data2vec-vision-base-ft1k") if is_vision_available() else None
|
||||
)
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head_imagenet_1k(self):
|
||||
model = TFData2VecVisionForImageClassification.from_pretrained("facebook/data2vec-vision-base-ft1k")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="tf")
|
||||
inputs = image_processor(images=image, return_tensors="tf")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
|
@ -39,7 +39,7 @@ if is_timm_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor
|
||||
from transformers import AutoImageProcessor
|
||||
|
||||
|
||||
class DeformableDetrModelTester:
|
||||
@ -563,15 +563,15 @@ def prepare_img():
|
||||
@slow
|
||||
class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return AutoFeatureExtractor.from_pretrained("SenseTime/deformable-detr") if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return AutoImageProcessor.from_pretrained("SenseTime/deformable-detr") if is_vision_available() else None
|
||||
|
||||
def test_inference_object_detection_head(self):
|
||||
model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||
|
||||
@ -595,7 +595,7 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_boxes, atol=1e-4))
|
||||
|
||||
# verify postprocessing
|
||||
results = feature_extractor.post_process_object_detection(
|
||||
results = image_processor.post_process_object_detection(
|
||||
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
||||
)[0]
|
||||
expected_scores = torch.tensor([0.7999, 0.7894, 0.6331, 0.4720, 0.4382]).to(torch_device)
|
||||
@ -612,9 +612,9 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
||||
"SenseTime/deformable-detr-with-box-refine-two-stage"
|
||||
).to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||
|
||||
@ -639,9 +639,9 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
||||
|
||||
@require_torch_gpu
|
||||
def test_inference_object_detection_head_equivalence_cpu_gpu(self):
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt")
|
||||
encoding = image_processor(images=image, return_tensors="pt")
|
||||
pixel_values = encoding["pixel_values"]
|
||||
pixel_mask = encoding["pixel_mask"]
|
||||
|
||||
|
@ -55,7 +55,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DeiTFeatureExtractor
|
||||
from transformers import DeiTImageProcessor
|
||||
|
||||
|
||||
class DeiTModelTester:
|
||||
@ -381,9 +381,9 @@ def prepare_img():
|
||||
@require_vision
|
||||
class DeiTModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
DeiTFeatureExtractor.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
||||
DeiTImageProcessor.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
||||
if is_vision_available()
|
||||
else None
|
||||
)
|
||||
@ -394,9 +394,9 @@ class DeiTModelIntegrationTest(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -420,10 +420,10 @@ class DeiTModelIntegrationTest(unittest.TestCase):
|
||||
model = DeiTModel.from_pretrained(
|
||||
"facebook/deit-base-distilled-patch16-224", torch_dtype=torch.float16, device_map="auto"
|
||||
)
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
pixel_values = inputs.pixel_values.to(torch_device)
|
||||
|
||||
# forward pass to make sure inference works in fp16
|
||||
|
@ -46,7 +46,7 @@ if is_tf_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DeiTFeatureExtractor
|
||||
from transformers import DeiTImageProcessor
|
||||
|
||||
|
||||
class TFDeiTModelTester:
|
||||
@ -266,9 +266,9 @@ def prepare_img():
|
||||
@require_vision
|
||||
class DeiTModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
DeiTFeatureExtractor.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
||||
DeiTImageProcessor.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
||||
if is_vision_available()
|
||||
else None
|
||||
)
|
||||
@ -277,9 +277,9 @@ class DeiTModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_image_classification_head(self):
|
||||
model = TFDeiTForImageClassificationWithTeacher.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="tf")
|
||||
inputs = image_processor(images=image, return_tensors="tf")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
|
@ -38,7 +38,7 @@ if is_timm_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DetrFeatureExtractor
|
||||
from transformers import DetrImageProcessor
|
||||
|
||||
|
||||
class DetrModelTester:
|
||||
@ -512,15 +512,15 @@ def prepare_img():
|
||||
@slow
|
||||
class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50") if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return DetrImageProcessor.from_pretrained("facebook/detr-resnet-50") if is_vision_available() else None
|
||||
|
||||
def test_inference_no_head(self):
|
||||
model = DetrModel.from_pretrained("facebook/detr-resnet-50").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
with torch.no_grad():
|
||||
outputs = model(**encoding)
|
||||
@ -535,9 +535,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
def test_inference_object_detection_head(self):
|
||||
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||
|
||||
@ -560,7 +560,7 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_slice_boxes, atol=1e-4))
|
||||
|
||||
# verify postprocessing
|
||||
results = feature_extractor.post_process_object_detection(
|
||||
results = image_processor.post_process_object_detection(
|
||||
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
||||
)[0]
|
||||
expected_scores = torch.tensor([0.9982, 0.9960, 0.9955, 0.9988, 0.9987]).to(torch_device)
|
||||
@ -575,9 +575,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
def test_inference_panoptic_segmentation_head(self):
|
||||
model = DetrForSegmentation.from_pretrained("facebook/detr-resnet-50-panoptic").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||
|
||||
@ -607,7 +607,7 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.pred_masks[0, 0, :3, :3], expected_slice_masks, atol=1e-3))
|
||||
|
||||
# verify postprocessing
|
||||
results = feature_extractor.post_process_panoptic_segmentation(
|
||||
results = image_processor.post_process_panoptic_segmentation(
|
||||
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
||||
)[0]
|
||||
|
||||
@ -633,9 +633,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
@slow
|
||||
class DetrModelIntegrationTests(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
|
||||
DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
|
||||
if is_vision_available()
|
||||
else None
|
||||
)
|
||||
@ -643,9 +643,9 @@ class DetrModelIntegrationTests(unittest.TestCase):
|
||||
def test_inference_no_head(self):
|
||||
model = DetrModel.from_pretrained("facebook/detr-resnet-50", revision="no_timm").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
with torch.no_grad():
|
||||
outputs = model(**encoding)
|
||||
|
@ -367,16 +367,16 @@ class DinatModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
|
||||
@require_torch
|
||||
class DinatModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return AutoImageProcessor.from_pretrained("shi-labs/dinat-mini-in1k-224") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
model = DinatForImageClassification.from_pretrained("shi-labs/dinat-mini-in1k-224").to(torch_device)
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
|
||||
image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -25,7 +25,7 @@ if is_torch_available():
|
||||
from transformers import AutoModelForImageClassification
|
||||
|
||||
if is_vision_available():
|
||||
from transformers import AutoFeatureExtractor
|
||||
from transformers import AutoImageProcessor
|
||||
|
||||
|
||||
@require_torch
|
||||
@ -33,7 +33,7 @@ if is_vision_available():
|
||||
class DiTIntegrationTest(unittest.TestCase):
|
||||
@slow
|
||||
def test_for_image_classification(self):
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
|
||||
image_processor = AutoImageProcessor.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
|
||||
model = AutoModelForImageClassification.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
|
||||
model.to(torch_device)
|
||||
|
||||
@ -43,7 +43,7 @@ class DiTIntegrationTest(unittest.TestCase):
|
||||
|
||||
image = dataset["train"][0]["image"].convert("RGB")
|
||||
|
||||
inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -39,7 +39,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DPTFeatureExtractor
|
||||
from transformers import DPTImageProcessor
|
||||
|
||||
|
||||
class DPTModelTester:
|
||||
@ -293,11 +293,11 @@ def prepare_img():
|
||||
@slow
|
||||
class DPTModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_depth_estimation(self):
|
||||
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large")
|
||||
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
|
||||
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large").to(torch_device)
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -315,11 +315,11 @@ class DPTModelIntegrationTest(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.predicted_depth[0, :3, :3], expected_slice, atol=1e-4))
|
||||
|
||||
def test_inference_semantic_segmentation(self):
|
||||
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
|
||||
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large-ade")
|
||||
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade").to(torch_device)
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -336,11 +336,11 @@ class DPTModelIntegrationTest(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.logits[0, 0, :3, :3], expected_slice, atol=1e-4))
|
||||
|
||||
def test_post_processing_semantic_segmentation(self):
|
||||
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
|
||||
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large-ade")
|
||||
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade").to(torch_device)
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -348,10 +348,10 @@ class DPTModelIntegrationTest(unittest.TestCase):
|
||||
|
||||
outputs.logits = outputs.logits.detach().cpu()
|
||||
|
||||
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
|
||||
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
|
||||
expected_shape = torch.Size((500, 300))
|
||||
self.assertEqual(segmentation[0].shape, expected_shape)
|
||||
|
||||
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs)
|
||||
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs)
|
||||
expected_shape = torch.Size((480, 480))
|
||||
self.assertEqual(segmentation[0].shape, expected_shape)
|
||||
|
@ -39,7 +39,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DPTFeatureExtractor
|
||||
from transformers import DPTImageProcessor
|
||||
|
||||
|
||||
class DPTModelTester:
|
||||
@ -314,11 +314,11 @@ def prepare_img():
|
||||
@slow
|
||||
class DPTModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_depth_estimation(self):
|
||||
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-hybrid-midas")
|
||||
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
|
||||
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to(torch_device)
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -444,7 +444,7 @@ def prepare_img():
|
||||
@require_vision
|
||||
class EfficientFormerModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
EfficientFormerImageProcessor.from_pretrained("snap-research/efficientformer-l1-300")
|
||||
if is_vision_available()
|
||||
@ -457,9 +457,9 @@ class EfficientFormerModelIntegrationTest(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -478,9 +478,9 @@ class EfficientFormerModelIntegrationTest(unittest.TestCase):
|
||||
"snap-research/efficientformer-l1-300"
|
||||
).to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -37,7 +37,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import GLPNFeatureExtractor
|
||||
from transformers import GLPNImageProcessor
|
||||
|
||||
|
||||
class GLPNConfigTester(ConfigTester):
|
||||
@ -337,11 +337,11 @@ def prepare_img():
|
||||
class GLPNModelIntegrationTest(unittest.TestCase):
|
||||
@slow
|
||||
def test_inference_depth_estimation(self):
|
||||
feature_extractor = GLPNFeatureExtractor.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
image_processor = GLPNImageProcessor.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
model = GLPNForDepthEstimation.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -49,7 +49,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ImageGPTFeatureExtractor
|
||||
from transformers import ImageGPTImageProcessor
|
||||
|
||||
|
||||
class ImageGPTModelTester:
|
||||
@ -535,16 +535,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class ImageGPTModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return ImageGPTFeatureExtractor.from_pretrained("openai/imagegpt-small") if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return ImageGPTImageProcessor.from_pretrained("openai/imagegpt-small") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_causal_lm_head(self):
|
||||
model = ImageGPTForCausalImageModeling.from_pretrained("openai/imagegpt-small").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -45,7 +45,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import LayoutLMv3FeatureExtractor
|
||||
from transformers import LayoutLMv3ImageProcessor
|
||||
|
||||
|
||||
class LayoutLMv3ModelTester:
|
||||
@ -382,16 +382,16 @@ def prepare_img():
|
||||
@require_torch
|
||||
class LayoutLMv3ModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return LayoutLMv3FeatureExtractor(apply_ocr=False) if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return LayoutLMv3ImageProcessor(apply_ocr=False) if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_no_head(self):
|
||||
model = LayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values.to(torch_device)
|
||||
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(torch_device)
|
||||
|
||||
input_ids = torch.tensor([[1, 2]])
|
||||
bbox = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]]).unsqueeze(0)
|
||||
|
@ -51,7 +51,7 @@ if is_tf_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import LayoutLMv3FeatureExtractor
|
||||
from transformers import LayoutLMv3ImageProcessor
|
||||
|
||||
|
||||
class TFLayoutLMv3ModelTester:
|
||||
@ -482,16 +482,16 @@ def prepare_img():
|
||||
@require_tf
|
||||
class TFLayoutLMv3ModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return LayoutLMv3FeatureExtractor(apply_ocr=False) if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return LayoutLMv3ImageProcessor(apply_ocr=False) if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_no_head(self):
|
||||
model = TFLayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="tf").pixel_values
|
||||
pixel_values = image_processor(images=image, return_tensors="tf").pixel_values
|
||||
|
||||
input_ids = tf.constant([[1, 2]])
|
||||
bbox = tf.expand_dims(tf.constant([[1, 2, 3, 4], [5, 6, 7, 8]]), axis=0)
|
||||
|
@ -36,7 +36,7 @@ from transformers.utils import FEATURE_EXTRACTOR_NAME, cached_property, is_pytes
|
||||
if is_pytesseract_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import LayoutLMv2FeatureExtractor, LayoutXLMProcessor
|
||||
from transformers import LayoutLMv2ImageProcessor, LayoutXLMProcessor
|
||||
|
||||
|
||||
@require_pytesseract
|
||||
@ -47,7 +47,7 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
rust_tokenizer_class = LayoutXLMTokenizerFast
|
||||
|
||||
def setUp(self):
|
||||
feature_extractor_map = {
|
||||
image_processor_map = {
|
||||
"do_resize": True,
|
||||
"size": 224,
|
||||
"apply_ocr": True,
|
||||
@ -56,7 +56,7 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
self.tmpdirname = tempfile.mkdtemp()
|
||||
self.feature_extraction_file = os.path.join(self.tmpdirname, FEATURE_EXTRACTOR_NAME)
|
||||
with open(self.feature_extraction_file, "w", encoding="utf-8") as fp:
|
||||
fp.write(json.dumps(feature_extractor_map) + "\n")
|
||||
fp.write(json.dumps(image_processor_map) + "\n")
|
||||
|
||||
# taken from `test_tokenization_layoutxlm.LayoutXLMTokenizationTest.test_save_pretrained`
|
||||
self.tokenizer_pretrained_name = "hf-internal-testing/tiny-random-layoutxlm"
|
||||
@ -70,8 +70,8 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
def get_tokenizers(self, **kwargs) -> List[PreTrainedTokenizerBase]:
|
||||
return [self.get_tokenizer(**kwargs), self.get_rust_tokenizer(**kwargs)]
|
||||
|
||||
def get_feature_extractor(self, **kwargs):
|
||||
return LayoutLMv2FeatureExtractor.from_pretrained(self.tmpdirname, **kwargs)
|
||||
def get_image_processor(self, **kwargs):
|
||||
return LayoutLMv2ImageProcessor.from_pretrained(self.tmpdirname, **kwargs)
|
||||
|
||||
def tearDown(self):
|
||||
shutil.rmtree(self.tmpdirname)
|
||||
@ -88,10 +88,10 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
return image_inputs
|
||||
|
||||
def test_save_load_pretrained_default(self):
|
||||
feature_extractor = self.get_feature_extractor()
|
||||
image_processor = self.get_image_processor()
|
||||
tokenizers = self.get_tokenizers()
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
processor.save_pretrained(self.tmpdirname)
|
||||
processor = LayoutXLMProcessor.from_pretrained(self.tmpdirname)
|
||||
@ -99,16 +99,16 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer.get_vocab())
|
||||
self.assertIsInstance(processor.tokenizer, (LayoutXLMTokenizer, LayoutXLMTokenizerFast))
|
||||
|
||||
self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor.to_json_string())
|
||||
self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
|
||||
self.assertEqual(processor.image_processor.to_json_string(), image_processor.to_json_string())
|
||||
self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
|
||||
|
||||
def test_save_load_pretrained_additional_features(self):
|
||||
processor = LayoutXLMProcessor(feature_extractor=self.get_feature_extractor(), tokenizer=self.get_tokenizer())
|
||||
processor = LayoutXLMProcessor(image_processor=self.get_image_processor(), tokenizer=self.get_tokenizer())
|
||||
processor.save_pretrained(self.tmpdirname)
|
||||
|
||||
# slow tokenizer
|
||||
tokenizer_add_kwargs = self.get_tokenizer(bos_token="(BOS)", eos_token="(EOS)")
|
||||
feature_extractor_add_kwargs = self.get_feature_extractor(do_resize=False, size=30)
|
||||
image_processor_add_kwargs = self.get_image_processor(do_resize=False, size=30)
|
||||
|
||||
processor = LayoutXLMProcessor.from_pretrained(
|
||||
self.tmpdirname,
|
||||
@ -122,12 +122,12 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer_add_kwargs.get_vocab())
|
||||
self.assertIsInstance(processor.tokenizer, LayoutXLMTokenizer)
|
||||
|
||||
self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor_add_kwargs.to_json_string())
|
||||
self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
|
||||
self.assertEqual(processor.image_processor.to_json_string(), image_processor_add_kwargs.to_json_string())
|
||||
self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
|
||||
|
||||
# fast tokenizer
|
||||
tokenizer_add_kwargs = self.get_rust_tokenizer(bos_token="(BOS)", eos_token="(EOS)")
|
||||
feature_extractor_add_kwargs = self.get_feature_extractor(do_resize=False, size=30)
|
||||
image_processor_add_kwargs = self.get_image_processor(do_resize=False, size=30)
|
||||
|
||||
processor = LayoutXLMProcessor.from_pretrained(
|
||||
self.tmpdirname, use_xlm=True, bos_token="(BOS)", eos_token="(EOS)", do_resize=False, size=30
|
||||
@ -136,14 +136,14 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer_add_kwargs.get_vocab())
|
||||
self.assertIsInstance(processor.tokenizer, LayoutXLMTokenizerFast)
|
||||
|
||||
self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor_add_kwargs.to_json_string())
|
||||
self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
|
||||
self.assertEqual(processor.image_processor.to_json_string(), image_processor_add_kwargs.to_json_string())
|
||||
self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
|
||||
|
||||
def test_model_input_names(self):
|
||||
feature_extractor = self.get_feature_extractor()
|
||||
image_processor = self.get_image_processor()
|
||||
tokenizer = self.get_tokenizer()
|
||||
|
||||
processor = LayoutXLMProcessor(tokenizer=tokenizer, feature_extractor=feature_extractor)
|
||||
processor = LayoutXLMProcessor(tokenizer=tokenizer, image_processor=image_processor)
|
||||
|
||||
input_str = "lower newer"
|
||||
image_input = self.prepare_image_inputs()
|
||||
@ -215,15 +215,15 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
def test_processor_case_1(self):
|
||||
# case 1: document image classification (training, inference) + token classification (inference), apply_ocr = True
|
||||
|
||||
feature_extractor = LayoutLMv2FeatureExtractor()
|
||||
image_processor = LayoutLMv2ImageProcessor()
|
||||
tokenizers = self.get_tokenizers
|
||||
images = self.get_images
|
||||
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
# not batched
|
||||
input_feat_extract = feature_extractor(images[0], return_tensors="pt")
|
||||
input_feat_extract = image_processor(images[0], return_tensors="pt")
|
||||
input_processor = processor(images[0], return_tensors="pt")
|
||||
|
||||
# verify keys
|
||||
@ -245,7 +245,7 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
self.assertSequenceEqual(decoding, expected_decoding)
|
||||
|
||||
# batched
|
||||
input_feat_extract = feature_extractor(images, return_tensors="pt")
|
||||
input_feat_extract = image_processor(images, return_tensors="pt")
|
||||
input_processor = processor(images, padding=True, return_tensors="pt")
|
||||
|
||||
# verify keys
|
||||
@ -270,12 +270,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
def test_processor_case_2(self):
|
||||
# case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False
|
||||
|
||||
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
|
||||
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
|
||||
tokenizers = self.get_tokenizers
|
||||
images = self.get_images
|
||||
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
# not batched
|
||||
words = ["hello", "world"]
|
||||
@ -324,12 +324,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
def test_processor_case_3(self):
|
||||
# case 3: token classification (training), apply_ocr=False
|
||||
|
||||
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
|
||||
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
|
||||
tokenizers = self.get_tokenizers
|
||||
images = self.get_images
|
||||
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
# not batched
|
||||
words = ["weirdly", "world"]
|
||||
@ -389,12 +389,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
def test_processor_case_4(self):
|
||||
# case 4: visual question answering (inference), apply_ocr=True
|
||||
|
||||
feature_extractor = LayoutLMv2FeatureExtractor()
|
||||
image_processor = LayoutLMv2ImageProcessor()
|
||||
tokenizers = self.get_tokenizers
|
||||
images = self.get_images
|
||||
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
# not batched
|
||||
question = "What's his name?"
|
||||
@ -440,12 +440,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
def test_processor_case_5(self):
|
||||
# case 5: visual question answering (inference), apply_ocr=False
|
||||
|
||||
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
|
||||
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
|
||||
tokenizers = self.get_tokenizers
|
||||
images = self.get_images
|
||||
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
# not batched
|
||||
question = "What's his name?"
|
||||
|
@ -46,7 +46,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import LevitFeatureExtractor
|
||||
from transformers import LevitImageProcessor
|
||||
|
||||
|
||||
class LevitConfigTester(ConfigTester):
|
||||
@ -409,8 +409,8 @@ def prepare_img():
|
||||
@require_vision
|
||||
class LevitModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return LevitFeatureExtractor.from_pretrained(LEVIT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
def default_image_processor(self):
|
||||
return LevitImageProcessor.from_pretrained(LEVIT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
@ -418,9 +418,9 @@ class LevitModelIntegrationTest(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -545,9 +545,9 @@ class Mask2FormerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
self.assertEqual(segmentation[0].shape, target_sizes[0])
|
||||
|
||||
def test_post_process_instance_segmentation(self):
|
||||
feature_extractor = self.image_processing_class(num_labels=self.image_processor_tester.num_classes)
|
||||
image_processor = self.image_processing_class(num_labels=self.image_processor_tester.num_classes)
|
||||
outputs = self.image_processor_tester.get_fake_mask2former_outputs()
|
||||
segmentation = feature_extractor.post_process_instance_segmentation(outputs, threshold=0)
|
||||
segmentation = image_processor.post_process_instance_segmentation(outputs, threshold=0)
|
||||
|
||||
self.assertTrue(len(segmentation) == self.image_processor_tester.batch_size)
|
||||
for el in segmentation:
|
||||
@ -556,7 +556,7 @@ class Mask2FormerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
self.assertEqual(type(el["segments_info"]), list)
|
||||
self.assertEqual(el["segmentation"].shape, (384, 384))
|
||||
|
||||
segmentation = feature_extractor.post_process_instance_segmentation(
|
||||
segmentation = image_processor.post_process_instance_segmentation(
|
||||
outputs, threshold=0, return_binary_maps=True
|
||||
)
|
||||
|
||||
|
@ -325,14 +325,14 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
|
||||
return "facebook/mask2former-swin-small-coco-instance"
|
||||
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return Mask2FormerImageProcessor.from_pretrained(self.model_checkpoints) if is_vision_available() else None
|
||||
|
||||
def test_inference_no_head(self):
|
||||
model = Mask2FormerModel.from_pretrained(self.model_checkpoints).to(torch_device)
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(image, return_tensors="pt").to(torch_device)
|
||||
inputs_shape = inputs["pixel_values"].shape
|
||||
# check size is divisible by 32
|
||||
self.assertTrue((inputs_shape[-1] % 32) == 0 and (inputs_shape[-2] % 32) == 0)
|
||||
@ -371,9 +371,9 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
|
||||
|
||||
def test_inference_universal_segmentation_head(self):
|
||||
model = Mask2FormerForUniversalSegmentation.from_pretrained(self.model_checkpoints).to(torch_device).eval()
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(image, return_tensors="pt").to(torch_device)
|
||||
inputs_shape = inputs["pixel_values"].shape
|
||||
# check size is divisible by 32
|
||||
self.assertTrue((inputs_shape[-1] % 32) == 0 and (inputs_shape[-2] % 32) == 0)
|
||||
@ -408,9 +408,9 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
|
||||
|
||||
def test_with_segmentation_maps_and_loss(self):
|
||||
model = Mask2FormerForUniversalSegmentation.from_pretrained(self.model_checkpoints).to(torch_device).eval()
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
|
||||
inputs = feature_extractor(
|
||||
inputs = image_processor(
|
||||
[np.zeros((3, 800, 1333)), np.zeros((3, 800, 1333))],
|
||||
segmentation_maps=[np.zeros((384, 384)).astype(np.float32), np.zeros((384, 384)).astype(np.float32)],
|
||||
return_tensors="pt",
|
||||
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue
Block a user