Update old existing feature extractor references (#24552)

* Update old existing feature extractor references

* Typo

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Address comments from review - update 'feature extractor'
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
amyeroberts 2023-06-29 10:17:36 +01:00 committed by GitHub
parent 10c2ac7bc6
commit ae454f41d4
138 changed files with 762 additions and 743 deletions

View File

@ -354,12 +354,12 @@ Als Nächstes sehen Sie sich das Bild mit dem Merkmal 🤗 Datensätze [Bild] (h
### Merkmalsextraktor
Laden Sie den Merkmalsextraktor mit [`AutoFeatureExtractor.from_pretrained`]:
Laden Sie den Merkmalsextraktor mit [`AutoImageProcessor.from_pretrained`]:
```py
>>> from transformers import AutoFeatureExtractor
>>> from transformers import AutoImageProcessor
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
```
### Datenerweiterung
@ -371,9 +371,9 @@ Bei Bildverarbeitungsaufgaben ist es üblich, den Bildern als Teil der Vorverarb
```py
>>> from torchvision.transforms import Compose, Normalize, RandomResizedCrop, ColorJitter, ToTensor
>>> normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
>>> _transforms = Compose(
... [RandomResizedCrop(feature_extractor.size), ColorJitter(brightness=0.5, hue=0.5), ToTensor(), normalize]
... [RandomResizedCrop(image_processor.size["height"]), ColorJitter(brightness=0.5, hue=0.5), ToTensor(), normalize]
... )
```
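A hedged sketch of how `_transforms` is then typically applied lazily with 🤗 Datasets (the `dataset` variable and the `image` column name are assumptions, not shown in this hunk):
```py
>>> def transforms(examples):
...     # convert each PIL image into the normalized pixel values the model expects
...     examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]]
...     del examples["image"]
...     return examples

>>> dataset = dataset.with_transform(transforms)  # transforms run on the fly when examples are accessed
```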

View File

@ -263,7 +263,7 @@ To use, create an image processor associated with the model you're using. For ex
ViTImageProcessor {
"do_normalize": true,
"do_resize": true,
"feature_extractor_type": "ViTImageProcessor",
"image_processor_type": "ViTImageProcessor",
"image_mean": [
0.5,
0.5,
@ -295,7 +295,7 @@ Modify any of the [`ViTImageProcessor`] parameters to create your custom image p
ViTImageProcessor {
"do_normalize": false,
"do_resize": true,
"feature_extractor_type": "ViTImageProcessor",
"image_processor_type": "ViTImageProcessor",
"image_mean": [
0.3,
0.3,
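A hedged sketch of how the two configurations above might be produced (the checkpoint name is illustrative):
```py
>>> from transformers import ViTImageProcessor

>>> # default configuration shipped with a pretrained checkpoint
>>> image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")

>>> # custom configuration created by overriding parameters
>>> custom_image_processor = ViTImageProcessor(do_normalize=False, image_mean=[0.3, 0.3, 0.3], image_std=[0.3, 0.3, 0.3])
```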

View File

@ -50,10 +50,10 @@ product between the projected image and text features is then used as a similar
To feed images to the Transformer encoder, each image is split into a sequence of fixed-size non-overlapping patches,
which are then linearly embedded. A [CLS] token is added to serve as representation of an entire image. The authors
also add absolute position embeddings, and feed the resulting sequence of vectors to a standard Transformer encoder.
The [`CLIPFeatureExtractor`] can be used to resize (or rescale) and normalize images for the model.
The [`CLIPImageProcessor`] can be used to resize (or rescale) and normalize images for the model.
The [`CLIPTokenizer`] is used to encode the text. The [`CLIPProcessor`] wraps
[`CLIPFeatureExtractor`] and [`CLIPTokenizer`] into a single instance to both
[`CLIPImageProcessor`] and [`CLIPTokenizer`] into a single instance to both
encode the text and prepare the images. The following example shows how to get the image-text similarity scores using
[`CLIPProcessor`] and [`CLIPModel`].
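The example itself falls outside this hunk; a minimal sketch of it (checkpoint name and image URL are illustrative) could look like this:
```py
>>> import requests
>>> from PIL import Image
>>> from transformers import CLIPModel, CLIPProcessor

>>> model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
>>> processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> # the processor tokenizes the text and preprocesses the image in one call
>>> inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
>>> outputs = model(**inputs)
>>> probs = outputs.logits_per_image.softmax(dim=1)  # image-text similarity as probabilities
```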

View File

@ -46,9 +46,9 @@ Tips:
Donut's [`VisionEncoderDecoder`] model accepts images as input and makes use of
[`~generation.GenerationMixin.generate`] to autoregressively generate text given the input image.
The [`DonutFeatureExtractor`] class is responsible for preprocessing the input image and
The [`DonutImageProcessor`] class is responsible for preprocessing the input image and
[`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`] decodes the generated target tokens to the target string. The
[`DonutProcessor`] wraps [`DonutFeatureExtractor`] and [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]
[`DonutProcessor`] wraps [`DonutImageProcessor`] and [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]
into a single instance to both extract the input features and decode the predicted token ids.
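A hedged sketch of that flow (the checkpoint name and task prompt are illustrative, and `image` is assumed to be a PIL document image):
```py
>>> from transformers import DonutProcessor, VisionEncoderDecoderModel

>>> processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")
>>> model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")

>>> # DonutImageProcessor prepares the pixel values, the tokenizer encodes the task prompt
>>> pixel_values = processor(image, return_tensors="pt").pixel_values
>>> decoder_input_ids = processor.tokenizer("<s_rvlcdip>", add_special_tokens=False, return_tensors="pt").input_ids

>>> outputs = model.generate(pixel_values, decoder_input_ids=decoder_input_ids, max_length=512)
>>> print(processor.batch_decode(outputs)[0])  # decode the generated token ids to the target string
```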
- Step-by-step Document Image Classification

View File

@ -150,23 +150,23 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
## Usage: LayoutLMv2Processor
The easiest way to prepare data for the model is to use [`LayoutLMv2Processor`], which internally
combines a feature extractor ([`LayoutLMv2FeatureExtractor`]) and a tokenizer
([`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]). The feature extractor
combines an image processor ([`LayoutLMv2ImageProcessor`]) and a tokenizer
([`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]). The image processor
handles the image modality, while the tokenizer handles the text modality. A processor combines both, which is ideal
for a multi-modal model like LayoutLMv2. Note that you can still use both separately, if you only want to handle one
modality.
```python
from transformers import LayoutLMv2FeatureExtractor, LayoutLMv2TokenizerFast, LayoutLMv2Processor
from transformers import LayoutLMv2ImageProcessor, LayoutLMv2TokenizerFast, LayoutLMv2Processor
feature_extractor = LayoutLMv2FeatureExtractor() # apply_ocr is set to True by default
image_processor = LayoutLMv2ImageProcessor() # apply_ocr is set to True by default
tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
processor = LayoutLMv2Processor(feature_extractor, tokenizer)
processor = LayoutLMv2Processor(image_processor, tokenizer)
```
In short, one can provide a document image (and possibly additional data) to [`LayoutLMv2Processor`],
and it will create the inputs expected by the model. Internally, the processor first uses
[`LayoutLMv2FeatureExtractor`] to apply OCR on the image to get a list of words and normalized
[`LayoutLMv2ImageProcessor`] to apply OCR on the image to get a list of words and normalized
bounding boxes, as well as to resize the image to a given size in order to get the `image` input. The words and
normalized bounding boxes are then provided to [`LayoutLMv2Tokenizer`] or
[`LayoutLMv2TokenizerFast`], which converts them to token-level `input_ids`,
@ -176,7 +176,7 @@ which are turned into token-level `labels`.
[`LayoutLMv2Processor`] uses [PyTesseract](https://pypi.org/project/pytesseract/), a Python
wrapper around Google's Tesseract OCR engine, under the hood. Note that you can still use your own OCR engine of
choice, and provide the words and normalized boxes yourself. This requires initializing
[`LayoutLMv2FeatureExtractor`] with `apply_ocr` set to `False`.
[`LayoutLMv2ImageProcessor`] with `apply_ocr` set to `False`.
In total, there are 5 use cases that are supported by the processor. Below, we list them all. Note that each of these
use cases works for both batched and non-batched inputs (we illustrate them for non-batched inputs).
@ -184,7 +184,7 @@ use cases work for both batched and non-batched inputs (we illustrate them for n
**Use case 1: document image classification (training, inference) + token classification (inference), apply_ocr =
True**
This is the simplest case, in which the processor (actually the feature extractor) will perform OCR on the image to get
This is the simplest case, in which the processor (actually the image processor) will perform OCR on the image to get
the words and normalized bounding boxes.
```python
@ -205,7 +205,7 @@ print(encoding.keys())
**Use case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False**
In case one wants to do OCR themselves, one can initialize the feature extractor with `apply_ocr` set to
In case one wants to do OCR themselves, one can initialize the image processor with `apply_ocr` set to
`False`. In that case, one should provide the words and corresponding (normalized) bounding boxes themselves to
the processor.
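A minimal sketch of this use case (the file path, words and boxes are placeholders):
```python
from PIL import Image

from transformers import LayoutLMv2ImageProcessor, LayoutLMv2Processor, LayoutLMv2TokenizerFast

image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)  # OCR results are supplied by the user
tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
processor = LayoutLMv2Processor(image_processor, tokenizer)

image = Image.open("document.png").convert("RGB")  # placeholder path
words = ["hello", "world"]
boxes = [[1, 2, 3, 4], [5, 6, 7, 8]]  # bounding boxes normalized to a 0-1000 scale
encoding = processor(image, words, boxes=boxes, return_tensors="pt")
print(encoding.keys())
```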

View File

@ -31,7 +31,7 @@ Tips:
- In terms of data processing, LayoutLMv3 is identical to its predecessor [LayoutLMv2](layoutlmv2), except that:
- images need to be resized and normalized with channels in regular RGB format. LayoutLMv2 on the other hand normalizes the images internally and expects the channels in BGR format.
- text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.
Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3FeatureExtractor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model.
Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3ImageProcessor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model.
- Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-layoutlmv2processor) of its predecessor.
- Demo notebooks for LayoutLMv3 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv3).
- Demo scripts can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/layoutlmv3).

View File

@ -52,7 +52,7 @@ tokenizer = LayoutXLMTokenizer.from_pretrained("microsoft/layoutxlm-base")
```
Similar to LayoutLMv2, you can use [`LayoutXLMProcessor`] (which internally applies
[`LayoutLMv2FeatureExtractor`] and
[`LayoutLMv2ImageProcessor`] and
[`LayoutXLMTokenizer`]/[`LayoutXLMTokenizerFast`] in sequence) to prepare all
data for the model.
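A hedged construction sketch, mirroring the LayoutLMv2 example (`apply_ocr` left at its default of `True`):
```python
from transformers import LayoutLMv2ImageProcessor, LayoutXLMProcessor, LayoutXLMTokenizerFast

image_processor = LayoutLMv2ImageProcessor()  # applies OCR by default
tokenizer = LayoutXLMTokenizerFast.from_pretrained("microsoft/layoutxlm-base")
processor = LayoutXLMProcessor(image_processor, tokenizer)
```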

View File

@ -28,7 +28,7 @@ The abstract from the paper is the following:
OWL-ViT is a zero-shot text-conditioned object detection model. OWL-ViT uses [CLIP](clip) as its multi-modal backbone, with a ViT-like Transformer to get visual features and a causal language model to get the text features. To use CLIP for detection, OWL-ViT removes the final token pooling layer of the vision model and attaches a lightweight classification and box head to each transformer output token. Open-vocabulary classification is enabled by replacing the fixed classification layer weights with the class-name embeddings obtained from the text model. The authors first train CLIP from scratch and fine-tune it end-to-end with the classification and box heads on standard detection datasets using a bipartite matching loss. One or multiple text queries per image can be used to perform zero-shot text-conditioned object detection.
[`OwlViTFeatureExtractor`] can be used to resize (or rescale) and normalize images for the model and [`CLIPTokenizer`] is used to encode the text. [`OwlViTProcessor`] wraps [`OwlViTFeatureExtractor`] and [`CLIPTokenizer`] into a single instance to both encode the text and prepare the images. The following example shows how to perform object detection using [`OwlViTProcessor`] and [`OwlViTForObjectDetection`].
[`OwlViTImageProcessor`] can be used to resize (or rescale) and normalize images for the model and [`CLIPTokenizer`] is used to encode the text. [`OwlViTProcessor`] wraps [`OwlViTImageProcessor`] and [`CLIPTokenizer`] into a single instance to both encode the text and prepare the images. The following example shows how to perform object detection using [`OwlViTProcessor`] and [`OwlViTForObjectDetection`].
```python
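# The body of this example lies outside the hunk shown above; what follows is a
# hedged reconstruction (checkpoint name, image URL, queries and threshold are illustrative).
import requests
import torch
from PIL import Image

from transformers import OwlViTForObjectDetection, OwlViTProcessor

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]

inputs = processor(text=texts, images=image, return_tensors="pt")
outputs = model(**inputs)

# convert raw outputs (class logits and boxes) into thresholded detections
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs=outputs, target_sizes=target_sizes, threshold=0.1)
print(results[0]["scores"], results[0]["labels"], results[0]["boxes"])
```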

View File

@ -39,7 +39,7 @@ Tips:
- The quickest way to get started with ViLT is by checking the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ViLT)
(which showcase both inference and fine-tuning on custom data).
- ViLT is a model that takes both `pixel_values` and `input_ids` as input. One can use [`ViltProcessor`] to prepare data for the model.
This processor wraps a feature extractor (for the image modality) and a tokenizer (for the language modality) into one.
This processor wraps an image processor (for the image modality) and a tokenizer (for the language modality) into one (see the sketch below).
- ViLT is trained with images of various sizes: the authors resize the shorter edge of input images to 384 and limit the longer edge to
under 640 while preserving the aspect ratio. To make batching of images possible, the authors use a `pixel_mask` that indicates
which pixel values are real and which are padding. [`ViltProcessor`] automatically creates this for you.
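A hedged sketch of preparing inputs with [`ViltProcessor`] for visual question answering (checkpoint name, image URL and question are illustrative):
```python
import requests
from PIL import Image

from transformers import ViltForQuestionAnswering, ViltProcessor

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = "How many cats are there?"

# returns input_ids, attention_mask, pixel_values and the automatically created pixel_mask
inputs = processor(image, text, return_tensors="pt")
outputs = model(**inputs)
print(model.config.id2label[outputs.logits.argmax(-1).item()])
```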

View File

@ -462,9 +462,9 @@ Next, prepare an instance of a `CocoDetection` class that can be used with `coco
>>> class CocoDetection(torchvision.datasets.CocoDetection):
... def __init__(self, img_folder, feature_extractor, ann_file):
... def __init__(self, img_folder, image_processor, ann_file):
... super().__init__(img_folder, ann_file)
... self.feature_extractor = feature_extractor
... self.image_processor = image_processor
... def __getitem__(self, idx):
... # read in PIL image and target in COCO format
@ -474,7 +474,7 @@ Next, prepare an instance of a `CocoDetection` class that can be used with `coco
... # resizing + normalization of both image and target)
... image_id = self.ids[idx]
... target = {"image_id": image_id, "annotations": target}
... encoding = self.feature_extractor(images=img, annotations=target, return_tensors="pt")
... encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
... pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension
... target = encoding["labels"][0] # remove batch dimension
@ -591,4 +591,3 @@ Let's plot the result:
<div class="flex justify-center">
<img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
</div>

View File

@ -73,12 +73,12 @@ Cada clase de alimento - o label - corresponde a un número; `79` indica una cos
## Preprocesa
Carga el feature extractor de ViT para procesar la imagen en un tensor:
Carga el image processor de ViT para procesar la imagen en un tensor:
```py
>>> from transformers import AutoFeatureExtractor
>>> from transformers import AutoImageProcessor
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
```
Aplica varias transformaciones de imagen al dataset para hacer el modelo más robusto contra el overfitting. En este caso se utilizará el módulo [`transforms`](https://pytorch.org/vision/stable/transforms.html) de torchvision. Recorta una parte aleatoria de la imagen, cambia su tamaño y normalízala con la media y la desviación estándar de la imagen:
@ -86,8 +86,8 @@ Aplica varias transformaciones de imagen al dataset para hacer el modelo más ro
```py
>>> from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor
>>> normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
>>> _transforms = Compose([RandomResizedCrop(feature_extractor.size), ToTensor(), normalize])
>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
>>> _transforms = Compose([RandomResizedCrop(image_processor.size["height"]), ToTensor(), normalize])
```
Crea una función de preprocesamiento que aplique las transformaciones y devuelva los `pixel_values` - los inputs al modelo - de la imagen:
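A hedged sketch of that preprocessing function (the `image` column name comes from the food dataset used in this guide and is assumed here):
```py
>>> def preprocess(examples):
...     # apply the torchvision transforms and keep only the model inputs
...     examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]]
...     del examples["image"]
...     return examples

>>> food = food.with_transform(preprocess)
```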
@ -160,7 +160,7 @@ Al llegar a este punto, solo quedan tres pasos:
... data_collator=data_collator,
... train_dataset=food["train"],
... eval_dataset=food["test"],
... tokenizer=feature_extractor,
... tokenizer=image_processor,
... )
>>> trainer.train()

View File

@ -454,9 +454,9 @@ COCO 데이터 세트를 빌드하는 API는 데이터를 특정 형식으로
>>> class CocoDetection(torchvision.datasets.CocoDetection):
... def __init__(self, img_folder, feature_extractor, ann_file):
... def __init__(self, img_folder, image_processor, ann_file):
... super().__init__(img_folder, ann_file)
... self.feature_extractor = feature_extractor
... self.image_processor = image_processor
... def __getitem__(self, idx):
... # read in PIL image and target in COCO format
@ -466,7 +466,7 @@ COCO 데이터 세트를 빌드하는 API는 데이터를 특정 형식으로
... # resizing + normalization of both image and target)
... image_id = self.ids[idx]
... target = {"image_id": image_id, "annotations": target}
... encoding = self.feature_extractor(images=img, annotations=target, return_tensors="pt")
... encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
... pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension
... target = encoding["labels"][0] # remove batch dimension
@ -586,4 +586,3 @@ Detected Mask with confidence 0.584 at location [2449.06, 823.19, 3256.43, 1413.
<div class="flex justify-center">
<img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
</div>

View File

@ -354,12 +354,12 @@ def convert_align_checkpoint(checkpoint_path, pytorch_dump_folder_path, save_mod
# Create folder to save model
if not os.path.isdir(pytorch_dump_folder_path):
os.mkdir(pytorch_dump_folder_path)
# Save converted model and feature extractor
# Save converted model and image processor
hf_model.save_pretrained(pytorch_dump_folder_path)
processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
# Push model and feature extractor to hub
# Push model and image processor to hub
print("Pushing converted ALIGN to the hub...")
processor.push_to_hub("align-base")
hf_model.push_to_hub("align-base")
@ -381,7 +381,7 @@ if __name__ == "__main__":
help="Path to the output PyTorch model directory.",
)
parser.add_argument("--save_model", action="store_true", help="Save model to local")
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
args = parser.parse_args()
convert_align_checkpoint(args.checkpoint_path, args.pytorch_dump_folder_path, args.save_model, args.push_to_hub)

View File

@ -27,10 +27,10 @@ from PIL import Image
from transformers import (
BeitConfig,
BeitFeatureExtractor,
BeitForImageClassification,
BeitForMaskedImageModeling,
BeitForSemanticSegmentation,
BeitImageProcessor,
)
from transformers.image_utils import PILImageResampling
from transformers.utils import logging
@ -266,16 +266,16 @@ def convert_beit_checkpoint(checkpoint_url, pytorch_dump_folder_path):
# Check outputs on an image
if is_semantic:
feature_extractor = BeitFeatureExtractor(size=config.image_size, do_center_crop=False)
image_processor = BeitImageProcessor(size=config.image_size, do_center_crop=False)
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
image = Image.open(ds[0]["file"])
else:
feature_extractor = BeitFeatureExtractor(
image_processor = BeitImageProcessor(
size=config.image_size, resample=PILImageResampling.BILINEAR, do_center_crop=False
)
image = prepare_img()
encoding = feature_extractor(images=image, return_tensors="pt")
encoding = image_processor(images=image, return_tensors="pt")
pixel_values = encoding["pixel_values"]
outputs = model(pixel_values)
@ -353,8 +353,8 @@ def convert_beit_checkpoint(checkpoint_url, pytorch_dump_folder_path):
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":

View File

@ -468,7 +468,7 @@ class ChineseCLIPOnnxConfig(OnnxConfig):
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
)
image_input_dict = super().generate_dummy_inputs(
processor.feature_extractor, batch_size=batch_size, framework=framework
processor.image_processor, batch_size=batch_size, framework=framework
)
return {**text_input_dict, **image_input_dict}

View File

@ -449,7 +449,7 @@ class CLIPOnnxConfig(OnnxConfig):
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
)
image_input_dict = super().generate_dummy_inputs(
processor.feature_extractor, batch_size=batch_size, framework=framework
processor.image_processor, batch_size=batch_size, framework=framework
)
return {**text_input_dict, **image_input_dict}

View File

@ -28,7 +28,7 @@ from transformers import (
CLIPSegTextConfig,
CLIPSegVisionConfig,
CLIPTokenizer,
ViTFeatureExtractor,
ViTImageProcessor,
)
@ -185,9 +185,9 @@ def convert_clipseg_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_
if unexpected_keys != ["decoder.reduce.weight", "decoder.reduce.bias"]:
raise ValueError(f"Unexpected keys: {unexpected_keys}")
feature_extractor = ViTFeatureExtractor(size=352)
image_processor = ViTImageProcessor(size=352)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPSegProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
processor = CLIPSegProcessor(image_processor=image_processor, tokenizer=tokenizer)
image = prepare_img()
text = ["a glass", "something to fill", "wood", "a jar"]

View File

@ -27,9 +27,9 @@ from PIL import Image
from transformers import (
ConditionalDetrConfig,
ConditionalDetrFeatureExtractor,
ConditionalDetrForObjectDetection,
ConditionalDetrForSegmentation,
ConditionalDetrImageProcessor,
)
from transformers.utils import logging
@ -244,13 +244,13 @@ def convert_conditional_detr_checkpoint(model_name, pytorch_dump_folder_path):
config.id2label = id2label
config.label2id = {v: k for k, v in id2label.items()}
# load feature extractor
# load image processor
format = "coco_panoptic" if is_panoptic else "coco_detection"
feature_extractor = ConditionalDetrFeatureExtractor(format=format)
image_processor = ConditionalDetrImageProcessor(format=format)
# prepare image
img = prepare_img()
encoding = feature_extractor(images=img, return_tensors="pt")
encoding = image_processor(images=img, return_tensors="pt")
pixel_values = encoding["pixel_values"]
logger.info(f"Converting model {model_name}...")
@ -302,11 +302,11 @@ def convert_conditional_detr_checkpoint(model_name, pytorch_dump_folder_path):
if is_panoptic:
assert torch.allclose(outputs.pred_masks, original_outputs["pred_masks"], atol=1e-4)
# Save model and feature extractor
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
# Save model and image processor
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
model.save_pretrained(pytorch_dump_folder_path)
feature_extractor.save_pretrained(pytorch_dump_folder_path)
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":

View File

@ -26,7 +26,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import ConvNextConfig, ConvNextFeatureExtractor, ConvNextForImageClassification
from transformers import ConvNextConfig, ConvNextForImageClassification, ConvNextImageProcessor
from transformers.utils import logging
@ -144,10 +144,10 @@ def convert_convnext_checkpoint(checkpoint_url, pytorch_dump_folder_path):
model.load_state_dict(state_dict)
model.eval()
# Check outputs on an image, prepared by ConvNextFeatureExtractor
# Check outputs on an image, prepared by ConvNextImageProcessor
size = 224 if "224" in checkpoint_url else 384
feature_extractor = ConvNextFeatureExtractor(size=size)
pixel_values = feature_extractor(images=prepare_img(), return_tensors="pt").pixel_values
image_processor = ConvNextImageProcessor(size=size)
pixel_values = image_processor(images=prepare_img(), return_tensors="pt").pixel_values
logits = model(pixel_values).logits
@ -191,8 +191,8 @@ def convert_convnext_checkpoint(checkpoint_url, pytorch_dump_folder_path):
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
print("Pushing model to the hub...")
model_name = "convnext"

View File

@ -24,7 +24,7 @@ from collections import OrderedDict
import torch
from huggingface_hub import cached_download, hf_hub_url
from transformers import AutoFeatureExtractor, CvtConfig, CvtForImageClassification
from transformers import AutoImageProcessor, CvtConfig, CvtForImageClassification
def embeddings(idx):
@ -307,8 +307,8 @@ def convert_cvt_checkpoint(cvt_model, image_size, cvt_file_name, pytorch_dump_fo
config.embed_dim = [192, 768, 1024]
model = CvtForImageClassification(config)
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
feature_extractor.size["shortest_edge"] = image_size
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
image_processor.size["shortest_edge"] = image_size
original_weights = torch.load(cvt_file_name, map_location=torch.device("cpu"))
huggingface_weights = OrderedDict()
@ -329,7 +329,7 @@ def convert_cvt_checkpoint(cvt_model, image_size, cvt_file_name, pytorch_dump_fo
model.load_state_dict(huggingface_weights)
model.save_pretrained(pytorch_dump_folder)
feature_extractor.save_pretrained(pytorch_dump_folder)
image_processor.save_pretrained(pytorch_dump_folder)
# Download the weights from zoo: https://1drv.ms/u/s!AhIXJn_J-blW9RzF3rMW7SsLHa8h?e=blQ0Al

View File

@ -8,7 +8,7 @@ from PIL import Image
from timm.models import create_model
from transformers import (
BeitFeatureExtractor,
BeitImageProcessor,
Data2VecVisionConfig,
Data2VecVisionForImageClassification,
Data2VecVisionModel,
@ -304,9 +304,9 @@ def main():
orig_model.eval()
# 3. Forward Beit model
feature_extractor = BeitFeatureExtractor(size=config.image_size, do_center_crop=False)
image_processor = BeitImageProcessor(size=config.image_size, do_center_crop=False)
image = Image.open("../../../../tests/fixtures/tests_samples/COCO/000000039769.png")
encoding = feature_extractor(images=image, return_tensors="pt")
encoding = image_processor(images=image, return_tensors="pt")
pixel_values = encoding["pixel_values"]
orig_args = (pixel_values,) if is_finetuned else (pixel_values, None)
@ -354,7 +354,7 @@ def main():
# 7. Save
print(f"Saving to {args.hf_checkpoint_name}")
hf_model.save_pretrained(args.hf_checkpoint_name)
feature_extractor.save_pretrained(args.hf_checkpoint_name)
image_processor.save_pretrained(args.hf_checkpoint_name)
if __name__ == "__main__":

View File

@ -24,7 +24,7 @@ import torch
from huggingface_hub import cached_download, hf_hub_url
from PIL import Image
from transformers import DeformableDetrConfig, DeformableDetrFeatureExtractor, DeformableDetrForObjectDetection
from transformers import DeformableDetrConfig, DeformableDetrForObjectDetection, DeformableDetrImageProcessor
from transformers.utils import logging
@ -115,12 +115,12 @@ def convert_deformable_detr_checkpoint(
config.id2label = id2label
config.label2id = {v: k for k, v in id2label.items()}
# load feature extractor
feature_extractor = DeformableDetrFeatureExtractor(format="coco_detection")
# load image processor
image_processor = DeformableDetrImageProcessor(format="coco_detection")
# prepare image
img = prepare_img()
encoding = feature_extractor(images=img, return_tensors="pt")
encoding = image_processor(images=img, return_tensors="pt")
pixel_values = encoding["pixel_values"]
logger.info("Converting model...")
@ -185,11 +185,11 @@ def convert_deformable_detr_checkpoint(
print("Everything ok!")
# Save model and feature extractor
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
# Save model and image processor
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
model.save_pretrained(pytorch_dump_folder_path)
feature_extractor.save_pretrained(pytorch_dump_folder_path)
image_processor.save_pretrained(pytorch_dump_folder_path)
# Push to hub
if push_to_hub:

View File

@ -25,7 +25,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import DeiTConfig, DeiTFeatureExtractor, DeiTForImageClassificationWithTeacher
from transformers import DeiTConfig, DeiTForImageClassificationWithTeacher, DeiTImageProcessor
from transformers.utils import logging
@ -182,12 +182,12 @@ def convert_deit_checkpoint(deit_name, pytorch_dump_folder_path):
model = DeiTForImageClassificationWithTeacher(config).eval()
model.load_state_dict(state_dict)
# Check outputs on an image, prepared by DeiTFeatureExtractor
# Check outputs on an image, prepared by DeiTImageProcessor
size = int(
(256 / 224) * config.image_size
) # to maintain same ratio w.r.t. 224 images, see https://github.com/facebookresearch/deit/blob/ab5715372db8c6cad5740714b2216d55aeae052e/datasets.py#L103
feature_extractor = DeiTFeatureExtractor(size=size, crop_size=config.image_size)
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
image_processor = DeiTImageProcessor(size=size, crop_size=config.image_size)
encoding = image_processor(images=prepare_img(), return_tensors="pt")
pixel_values = encoding["pixel_values"]
outputs = model(pixel_values)
@ -198,8 +198,8 @@ def convert_deit_checkpoint(deit_name, pytorch_dump_folder_path):
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model {deit_name} to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":

View File

@ -25,7 +25,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import DetrConfig, DetrFeatureExtractor, DetrForObjectDetection, DetrForSegmentation
from transformers import DetrConfig, DetrForObjectDetection, DetrForSegmentation, DetrImageProcessor
from transformers.utils import logging
@ -201,13 +201,13 @@ def convert_detr_checkpoint(model_name, pytorch_dump_folder_path):
config.id2label = id2label
config.label2id = {v: k for k, v in id2label.items()}
# load feature extractor
# load image processor
format = "coco_panoptic" if is_panoptic else "coco_detection"
feature_extractor = DetrFeatureExtractor(format=format)
image_processor = DetrImageProcessor(format=format)
# prepare image
img = prepare_img()
encoding = feature_extractor(images=img, return_tensors="pt")
encoding = image_processor(images=img, return_tensors="pt")
pixel_values = encoding["pixel_values"]
logger.info(f"Converting model {model_name}...")
@ -258,11 +258,11 @@ def convert_detr_checkpoint(model_name, pytorch_dump_folder_path):
if is_panoptic:
assert torch.allclose(outputs.pred_masks, original_outputs["pred_masks"], atol=1e-4)
# Save model and feature extractor
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
# Save model and image processor
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
model.save_pretrained(pytorch_dump_folder_path)
feature_extractor.save_pretrained(pytorch_dump_folder_path)
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":

View File

@ -1341,8 +1341,7 @@ class DetrImageProcessor(BaseImageProcessor):
Args:
results (`List[Dict]`):
Results list obtained by [`~DetrFeatureExtractor.post_process`], to which "masks" results will be
added.
Results list obtained by [`~DetrImageProcessor.post_process`], to which "masks" results will be added.
outputs ([`DetrSegmentationOutput`]):
Raw outputs of the model.
orig_target_sizes (`torch.Tensor` of shape `(batch_size, 2)`):

View File

@ -24,7 +24,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import BeitConfig, BeitFeatureExtractor, BeitForImageClassification, BeitForMaskedImageModeling
from transformers import BeitConfig, BeitForImageClassification, BeitForMaskedImageModeling, BeitImageProcessor
from transformers.image_utils import PILImageResampling
from transformers.utils import logging
@ -171,12 +171,12 @@ def convert_dit_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
model.load_state_dict(state_dict)
# Check outputs on an image
feature_extractor = BeitFeatureExtractor(
image_processor = BeitImageProcessor(
size=config.image_size, resample=PILImageResampling.BILINEAR, do_center_crop=False
)
image = prepare_img()
encoding = feature_extractor(images=image, return_tensors="pt")
encoding = image_processor(images=image, return_tensors="pt")
pixel_values = encoding["pixel_values"]
outputs = model(pixel_values)
@ -189,18 +189,18 @@ def convert_dit_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
if has_lm_head:
model_name = "dit-base" if "base" in checkpoint_url else "dit-large"
else:
model_name = "dit-base-finetuned-rvlcdip" if "dit-b" in checkpoint_url else "dit-large-finetuned-rvlcdip"
feature_extractor.push_to_hub(
image_processor.push_to_hub(
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
organization="nielsr",
commit_message="Add feature extractor",
commit_message="Add image processor",
use_temp_dir=True,
)
model.push_to_hub(

View File

@ -21,7 +21,7 @@ from datasets import load_dataset
from donut import DonutModel
from transformers import (
DonutFeatureExtractor,
DonutImageProcessor,
DonutProcessor,
DonutSwinConfig,
DonutSwinModel,
@ -152,10 +152,10 @@ def convert_donut_checkpoint(model_name, pytorch_dump_folder_path=None, push_to_
image = dataset["test"][0]["image"].convert("RGB")
tokenizer = XLMRobertaTokenizerFast.from_pretrained(model_name, from_slow=True)
feature_extractor = DonutFeatureExtractor(
image_processor = DonutImageProcessor(
do_align_long_axis=original_model.config.align_long_axis, size=original_model.config.input_size[::-1]
)
processor = DonutProcessor(feature_extractor, tokenizer)
processor = DonutProcessor(image_processor, tokenizer)
pixel_values = processor(image, return_tensors="pt").pixel_values
if model_name == "naver-clova-ix/donut-base-finetuned-docvqa":

View File

@ -24,7 +24,7 @@ import torch
from huggingface_hub import cached_download, hf_hub_url
from PIL import Image
from transformers import DPTConfig, DPTFeatureExtractor, DPTForDepthEstimation, DPTForSemanticSegmentation
from transformers import DPTConfig, DPTForDepthEstimation, DPTForSemanticSegmentation, DPTImageProcessor
from transformers.utils import logging
@ -244,10 +244,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
# Check outputs on an image
size = 480 if "ade" in checkpoint_url else 384
feature_extractor = DPTFeatureExtractor(size=size)
image_processor = DPTImageProcessor(size=size)
image = prepare_img()
encoding = feature_extractor(image, return_tensors="pt")
encoding = image_processor(image, return_tensors="pt")
# forward pass
outputs = model(**encoding).logits if "ade" in checkpoint_url else model(**encoding).predicted_depth
@ -271,12 +271,12 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
model.push_to_hub("ybelkada/dpt-hybrid-midas")
feature_extractor.push_to_hub("ybelkada/dpt-hybrid-midas")
image_processor.push_to_hub("ybelkada/dpt-hybrid-midas")
if __name__ == "__main__":

View File

@ -24,7 +24,7 @@ import torch
from huggingface_hub import cached_download, hf_hub_url
from PIL import Image
from transformers import DPTConfig, DPTFeatureExtractor, DPTForDepthEstimation, DPTForSemanticSegmentation
from transformers import DPTConfig, DPTForDepthEstimation, DPTForSemanticSegmentation, DPTImageProcessor
from transformers.utils import logging
@ -211,10 +211,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
# Check outputs on an image
size = 480 if "ade" in checkpoint_url else 384
feature_extractor = DPTFeatureExtractor(size=size)
image_processor = DPTImageProcessor(size=size)
image = prepare_img()
encoding = feature_extractor(image, return_tensors="pt")
encoding = image_processor(image, return_tensors="pt")
# forward pass
outputs = model(**encoding).logits if "ade" in checkpoint_url else model(**encoding).predicted_depth
@ -233,8 +233,8 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
print("Pushing model to hub...")
@ -244,10 +244,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
commit_message="Add model",
use_temp_dir=True,
)
feature_extractor.push_to_hub(
image_processor.push_to_hub(
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
organization="nielsr",
commit_message="Add feature extractor",
commit_message="Add image processor",
use_temp_dir=True,
)

View File

@ -208,7 +208,7 @@ def convert_efficientformer_checkpoint(
)
processor.push_to_hub(
repo_id=f"Bearnardd/{pytorch_dump_path}",
commit_message="Add feature extractor",
commit_message="Add image processor",
use_temp_dir=True,
)
@ -234,12 +234,12 @@ if __name__ == "__main__":
"--pytorch_dump_path", default=None, type=str, required=True, help="Path to the output PyTorch model."
)
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
parser.add_argument(
"--no-push_to_hub",
dest="push_to_hub",
action="store_false",
help="Do not push model and feature extractor to the hub",
help="Do not push model and image processor to the hub",
)
parser.set_defaults(push_to_hub=True)

View File

@ -537,8 +537,8 @@ EFFICIENTFORMER_START_DOCSTRING = r"""
EFFICIENTFORMER_INPUTS_DOCSTRING = r"""
Args:
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
Pixel values. Pixel values can be obtained using [`ViTFeatureExtractor`]. See
[`ViTFeatureExtractor.__call__`] for details.
Pixel values. Pixel values can be obtained using [`ViTImageProcessor`]. See
[`ViTImageProcessor.preprocess`] for details.
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
tensors for more detail.

View File

@ -305,12 +305,12 @@ def convert_efficientnet_checkpoint(model_name, pytorch_dump_folder_path, save_m
# Create folder to save model
if not os.path.isdir(pytorch_dump_folder_path):
os.mkdir(pytorch_dump_folder_path)
# Save converted model and feature extractor
# Save converted model and image processor
hf_model.save_pretrained(pytorch_dump_folder_path)
preprocessor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
# Push model and feature extractor to hub
# Push model and image processor to hub
print(f"Pushing converted {model_name} to the hub...")
model_name = f"efficientnet-{model_name}"
preprocessor.push_to_hub(model_name)
@ -333,7 +333,7 @@ if __name__ == "__main__":
help="Path to the output PyTorch model directory.",
)
parser.add_argument("--save_model", action="store_true", help="Save model to local")
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
args = parser.parse_args()
convert_efficientnet_checkpoint(args.model_name, args.pytorch_dump_folder_path, args.save_model, args.push_to_hub)

View File

@ -23,7 +23,7 @@ import requests
import torch
from PIL import Image
from transformers import GLPNConfig, GLPNFeatureExtractor, GLPNForDepthEstimation
from transformers import GLPNConfig, GLPNForDepthEstimation, GLPNImageProcessor
from transformers.utils import logging
@ -131,12 +131,12 @@ def convert_glpn_checkpoint(checkpoint_path, pytorch_dump_folder_path, push_to_h
# load GLPN configuration (Segformer-B4 size)
config = GLPNConfig(hidden_sizes=[64, 128, 320, 512], decoder_hidden_size=64, depths=[3, 8, 27, 3])
# load feature extractor (only resize + rescale)
feature_extractor = GLPNFeatureExtractor()
# load image processor (only resize + rescale)
image_processor = GLPNImageProcessor()
# prepare image
image = prepare_img()
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
logger.info("Converting model...")
@ -179,17 +179,17 @@ def convert_glpn_checkpoint(checkpoint_path, pytorch_dump_folder_path, push_to_h
# finally, push to hub if required
if push_to_hub:
logger.info("Pushing model and feature extractor to the hub...")
logger.info("Pushing model and image processor to the hub...")
model.push_to_hub(
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
organization="nielsr",
commit_message="Add model",
use_temp_dir=True,
)
feature_extractor.push_to_hub(
image_processor.push_to_hub(
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
organization="nielsr",
commit_message="Add feature extractor",
commit_message="Add image processor",
use_temp_dir=True,
)

View File

@ -458,7 +458,7 @@ class GroupViTOnnxConfig(OnnxConfig):
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
)
image_input_dict = super().generate_dummy_inputs(
processor.feature_extractor, batch_size=batch_size, framework=framework
processor.image_processor, batch_size=batch_size, framework=framework
)
return {**text_input_dict, **image_input_dict}

View File

@ -81,7 +81,7 @@ class ImageGPTImageProcessor(BaseImageProcessor):
def __init__(
self,
# clusters is a first argument to maintain backwards compatibility with the old ImageGPTFeatureExtractor
# clusters is a first argument to maintain backwards compatibility with the old ImageGPTImageProcessor
clusters: Optional[Union[List[List[int]], np.ndarray]] = None,
do_resize: bool = True,
size: Dict[str, int] = None,

View File

@ -260,7 +260,7 @@ class LayoutLMv3OnnxConfig(OnnxConfig):
"""
# A dummy image is used so OCR should not be applied
setattr(processor.feature_extractor, "apply_ocr", False)
setattr(processor.image_processor, "apply_ocr", False)
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
batch_size = compute_effective_axis_dimension(

View File

@ -15,6 +15,7 @@
"""
Processor class for LayoutXLM.
"""
import warnings
from typing import List, Optional, Union
from ...processing_utils import ProcessorMixin
@ -24,26 +25,45 @@ from ...utils import TensorType
class LayoutXLMProcessor(ProcessorMixin):
r"""
Constructs a LayoutXLM processor which combines a LayoutXLM feature extractor and a LayoutXLM tokenizer into a
single processor.
Constructs a LayoutXLM processor which combines a LayoutXLM image processor and a LayoutXLM tokenizer into a single
processor.
[`LayoutXLMProcessor`] offers all the functionalities you need to prepare data for the model.
It first uses [`LayoutLMv2FeatureExtractor`] to resize document images to a fixed size, and optionally applies OCR
to get words and normalized bounding boxes. These are then provided to [`LayoutXLMTokenizer`] or
It first uses [`LayoutLMv2ImageProcessor`] to resize document images to a fixed size, and optionally applies OCR to
get words and normalized bounding boxes. These are then provided to [`LayoutXLMTokenizer`] or
[`LayoutXLMTokenizerFast`], which turns the words and bounding boxes into token-level `input_ids`,
`attention_mask`, `token_type_ids`, `bbox`. Optionally, one can provide integer `word_labels`, which are turned
into token-level `labels` for token classification tasks (such as FUNSD, CORD).
Args:
feature_extractor (`LayoutLMv2FeatureExtractor`):
An instance of [`LayoutLMv2FeatureExtractor`]. The feature extractor is a required input.
image_processor (`LayoutLMv2ImageProcessor`):
An instance of [`LayoutLMv2ImageProcessor`]. The image processor is a required input.
tokenizer (`LayoutXLMTokenizer` or `LayoutXLMTokenizerFast`):
An instance of [`LayoutXLMTokenizer`] or [`LayoutXLMTokenizerFast`]. The tokenizer is a required input.
"""
feature_extractor_class = "LayoutLMv2FeatureExtractor"
attributes = ["image_processor", "tokenizer"]
image_processor_class = "LayoutLMv2ImageProcessor"
tokenizer_class = ("LayoutXLMTokenizer", "LayoutXLMTokenizerFast")
def __init__(self, image_processor=None, tokenizer=None, **kwargs):
if "feature_extractor" in kwargs:
warnings.warn(
"The `feature_extractor` argument is deprecated and will be removed in v5, use `image_processor`"
" instead.",
FutureWarning,
)
feature_extractor = kwargs.pop("feature_extractor")
image_processor = image_processor if image_processor is not None else feature_extractor
if image_processor is None:
raise ValueError("You need to specify an `image_processor`.")
if tokenizer is None:
raise ValueError("You need to specify a `tokenizer`.")
super().__init__(image_processor, tokenizer)
def __call__(
self,
images,
@ -68,37 +88,37 @@ class LayoutXLMProcessor(ProcessorMixin):
**kwargs,
) -> BatchEncoding:
"""
This method first forwards the `images` argument to [`~LayoutLMv2FeatureExtractor.__call__`]. In case
[`LayoutLMv2FeatureExtractor`] was initialized with `apply_ocr` set to `True`, it passes the obtained words and
This method first forwards the `images` argument to [`~LayoutLMv2ImageProcessor.__call__`]. In case
[`LayoutLMv2ImageProcessor`] was initialized with `apply_ocr` set to `True`, it passes the obtained words and
bounding boxes along with the additional arguments to [`~LayoutXLMTokenizer.__call__`] and returns the output,
together with resized `images`. In case [`LayoutLMv2FeatureExtractor`] was initialized with `apply_ocr` set to
together with resized `images`. In case [`LayoutLMv2ImageProcessor`] was initialized with `apply_ocr` set to
`False`, it passes the words (`text`/`text_pair`) and `boxes` specified by the user along with the additional
arguments to [`~LayoutXLMTokenizer.__call__`] and returns the output, together with resized `images`.
Please refer to the docstring of the above two methods for more information.
"""
# verify input
if self.feature_extractor.apply_ocr and (boxes is not None):
if self.image_processor.apply_ocr and (boxes is not None):
raise ValueError(
"You cannot provide bounding boxes "
"if you initialized the feature extractor with apply_ocr set to True."
"if you initialized the image processor with apply_ocr set to True."
)
if self.feature_extractor.apply_ocr and (word_labels is not None):
if self.image_processor.apply_ocr and (word_labels is not None):
raise ValueError(
"You cannot provide word labels if you initialized the feature extractor with apply_ocr set to True."
"You cannot provide word labels if you initialized the image processor with apply_ocr set to True."
)
if return_overflowing_tokens is True and return_offsets_mapping is False:
raise ValueError("You cannot return overflowing tokens without returning the offsets mapping.")
# first, apply the feature extractor
features = self.feature_extractor(images=images, return_tensors=return_tensors)
# first, apply the image processor
features = self.image_processor(images=images, return_tensors=return_tensors)
# second, apply the tokenizer
if text is not None and self.feature_extractor.apply_ocr and text_pair is None:
if text is not None and self.image_processor.apply_ocr and text_pair is None:
if isinstance(text, str):
text = [text] # add batch dimension (as the feature extractor always adds a batch dimension)
text = [text] # add batch dimension (as the image processor always adds a batch dimension)
text_pair = features["words"]
encoded_inputs = self.tokenizer(
@ -162,3 +182,19 @@ class LayoutXLMProcessor(ProcessorMixin):
@property
def model_input_names(self):
return ["input_ids", "bbox", "attention_mask", "image"]
@property
def feature_extractor_class(self):
warnings.warn(
"`feature_extractor_class` is deprecated and will be removed in v5. Use `image_processor_class` instead.",
FutureWarning,
)
return self.image_processor_class
@property
def feature_extractor(self):
warnings.warn(
"`feature_extractor` is deprecated and will be removed in v5. Use `image_processor` instead.",
FutureWarning,
)
return self.image_processor
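A hedged sketch of the deprecation path this diff introduces (the checkpoint name is illustrative):
```python
from transformers import LayoutLMv2ImageProcessor, LayoutXLMProcessor, LayoutXLMTokenizerFast

image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
tokenizer = LayoutXLMTokenizerFast.from_pretrained("microsoft/layoutxlm-base")

# the old keyword still works, but emits a FutureWarning and is mapped onto `image_processor`
processor = LayoutXLMProcessor(feature_extractor=image_processor, tokenizer=tokenizer)

assert processor.image_processor is image_processor
# reading the deprecated attribute also warns before returning the image processor
assert processor.feature_extractor is image_processor
```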

View File

@ -25,7 +25,7 @@ import timm
import torch
from huggingface_hub import hf_hub_download
from transformers import LevitConfig, LevitFeatureExtractor, LevitForImageClassificationWithTeacher
from transformers import LevitConfig, LevitForImageClassificationWithTeacher, LevitImageProcessor
from transformers.utils import logging
@ -74,8 +74,8 @@ def convert_weight_and_push(
if push_to_hub:
our_model.save_pretrained(save_directory / checkpoint_name)
feature_extractor = LevitFeatureExtractor()
feature_extractor.save_pretrained(save_directory / checkpoint_name)
image_processor = LevitImageProcessor()
image_processor.save_pretrained(save_directory / checkpoint_name)
print(f"Pushed {checkpoint_name}")
@ -167,12 +167,12 @@ if __name__ == "__main__":
required=False,
help="Path to the output PyTorch model directory.",
)
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
parser.add_argument(
"--no-push_to_hub",
dest="push_to_hub",
action="store_false",
help="Do not push model and feature extractor to the hub",
help="Do not push model and image processor to the hub",
)
args = parser.parse_args()

View File

@ -192,7 +192,7 @@ class OriginalMask2FormerConfigToOursConverter:
return config
class OriginalMask2FormerConfigToFeatureExtractorConverter:
class OriginalMask2FormerConfigToImageProcessorConverter:
def __call__(self, original_config: object) -> Mask2FormerImageProcessor:
model = original_config.MODEL
model_input = original_config.INPUT
@ -846,7 +846,7 @@ class OriginalMask2FormerCheckpointToOursConverter:
def test(
original_model,
our_model: Mask2FormerForUniversalSegmentation,
feature_extractor: Mask2FormerImageProcessor,
image_processor: Mask2FormerImageProcessor,
tolerance: float,
):
with torch.no_grad():
@ -854,7 +854,7 @@ def test(
our_model = our_model.eval()
im = prepare_img()
x = feature_extractor(images=im, return_tensors="pt")["pixel_values"]
x = image_processor(images=im, return_tensors="pt")["pixel_values"]
original_model_backbone_features = original_model.backbone(x.clone())
our_model_output: Mask2FormerModelOutput = our_model.model(x.clone(), output_hidden_states=True)
@ -979,10 +979,10 @@ if __name__ == "__main__":
checkpoints_dir, config_dir
):
model_name = get_model_name(checkpoint_file)
feature_extractor = OriginalMask2FormerConfigToFeatureExtractorConverter()(
image_processor = OriginalMask2FormerConfigToImageProcessorConverter()(
setup_cfg(Args(config_file=config_file))
)
feature_extractor.size = {"height": 384, "width": 384}
image_processor.size = {"height": 384, "width": 384}
original_config = setup_cfg(Args(config_file=config_file))
mask2former_kwargs = OriginalMask2Former.from_config(original_config)
@ -1012,8 +1012,8 @@ if __name__ == "__main__":
tolerance = 3e-1
logger.info(f"🪄 Testing {model_name}...")
test(original_model, mask2former_for_segmentation, feature_extractor, tolerance)
test(original_model, mask2former_for_segmentation, image_processor, tolerance)
logger.info(f"🪄 Pushing {model_name} to hub...")
feature_extractor.push_to_hub(model_name)
image_processor.push_to_hub(model_name)
mask2former_for_segmentation.push_to_hub(model_name)

View File

@ -2106,8 +2106,8 @@ MASK2FORMER_START_DOCSTRING = r"""
MASK2FORMER_INPUTS_DOCSTRING = r"""
Args:
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
Pixel values. Pixel values can be obtained using [`AutoFeatureExtractor`]. See
[`AutoFeatureExtractor.__call__`] for details.
Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See
[`AutoImageProcessor.preprocess`] for details.
pixel_mask (`torch.LongTensor` of shape `(batch_size, height, width)`, *optional*):
Mask to avoid performing attention on padding pixel values. Mask values selected in `[0, 1]`:

View File

@ -29,7 +29,7 @@ from detectron2.projects.deeplab import add_deeplab_config
from PIL import Image
from torch import Tensor, nn
from transformers.models.maskformer.feature_extraction_maskformer import MaskFormerFeatureExtractor
from transformers.models.maskformer.feature_extraction_maskformer import MaskFormerImageProcessor
from transformers.models.maskformer.modeling_maskformer import (
MaskFormerConfig,
MaskFormerForInstanceSegmentation,
@ -164,13 +164,13 @@ class OriginalMaskFormerConfigToOursConverter:
return config
class OriginalMaskFormerConfigToFeatureExtractorConverter:
def __call__(self, original_config: object) -> MaskFormerFeatureExtractor:
class OriginalMaskFormerConfigToImageProcessorConverter:
def __call__(self, original_config: object) -> MaskFormerImageProcessor:
model = original_config.MODEL
model_input = original_config.INPUT
dataset_catalog = MetadataCatalog.get(original_config.DATASETS.TEST[0])
return MaskFormerFeatureExtractor(
return MaskFormerImageProcessor(
image_mean=(torch.tensor(model.PIXEL_MEAN) / 255).tolist(),
image_std=(torch.tensor(model.PIXEL_STD) / 255).tolist(),
size=model_input.MIN_SIZE_TEST,
@ -554,7 +554,7 @@ class OriginalMaskFormerCheckpointToOursConverter:
yield config, checkpoint
def test(original_model, our_model: MaskFormerForInstanceSegmentation, feature_extractor: MaskFormerFeatureExtractor):
def test(original_model, our_model: MaskFormerForInstanceSegmentation, image_processor: MaskFormerImageProcessor):
with torch.no_grad():
original_model = original_model.eval()
our_model = our_model.eval()
@ -600,7 +600,7 @@ def test(original_model, our_model: MaskFormerForInstanceSegmentation, feature_e
our_model_out: MaskFormerForInstanceSegmentationOutput = our_model(x)
our_segmentation = feature_extractor.post_process_segmentation(our_model_out, target_size=(384, 384))
our_segmentation = image_processor.post_process_segmentation(our_model_out, target_size=(384, 384))
assert torch.allclose(
original_segmentation, our_segmentation, atol=1e-3
@ -686,9 +686,7 @@ if __name__ == "__main__":
for config_file, checkpoint_file in OriginalMaskFormerCheckpointToOursConverter.using_dirs(
checkpoints_dir, config_dir
):
feature_extractor = OriginalMaskFormerConfigToFeatureExtractorConverter()(
setup_cfg(Args(config_file=config_file))
)
image_processor = OriginalMaskFormerConfigToImageProcessorConverter()(setup_cfg(Args(config_file=config_file)))
original_config = setup_cfg(Args(config_file=config_file))
mask_former_kwargs = OriginalMaskFormer.from_config(original_config)
@ -712,15 +710,15 @@ if __name__ == "__main__":
mask_former_for_instance_segmentation
)
test(original_model, mask_former_for_instance_segmentation, feature_extractor)
test(original_model, mask_former_for_instance_segmentation, image_processor)
model_name = get_name(checkpoint_file)
logger.info(f"🪄 Saving {model_name}")
feature_extractor.save_pretrained(save_directory / model_name)
image_processor.save_pretrained(save_directory / model_name)
mask_former_for_instance_segmentation.save_pretrained(save_directory / model_name)
feature_extractor.push_to_hub(
image_processor.push_to_hub(
repo_path_or_name=save_directory / model_name,
commit_message="Add model",
use_temp_dir=True,


@ -26,7 +26,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import MaskFormerConfig, MaskFormerFeatureExtractor, MaskFormerForInstanceSegmentation, ResNetConfig
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation, MaskFormerImageProcessor, ResNetConfig
from transformers.utils import logging
@ -297,9 +297,9 @@ def convert_maskformer_checkpoint(
else:
ignore_index = 255
reduce_labels = True if "ade" in model_name else False
feature_extractor = MaskFormerFeatureExtractor(ignore_index=ignore_index, reduce_labels=reduce_labels)
image_processor = MaskFormerImageProcessor(ignore_index=ignore_index, reduce_labels=reduce_labels)
inputs = feature_extractor(image, return_tensors="pt")
inputs = image_processor(image, return_tensors="pt")
outputs = model(**inputs)
@ -340,15 +340,15 @@ def convert_maskformer_checkpoint(
print("Looks ok!")
if pytorch_dump_folder_path is not None:
print(f"Saving model and feature extractor of {model_name} to {pytorch_dump_folder_path}")
print(f"Saving model and image processor of {model_name} to {pytorch_dump_folder_path}")
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
model.save_pretrained(pytorch_dump_folder_path)
feature_extractor.save_pretrained(pytorch_dump_folder_path)
image_processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
print(f"Pushing model and feature extractor of {model_name} to the hub...")
print(f"Pushing model and image processor of {model_name} to the hub...")
model.push_to_hub(f"facebook/{model_name}")
feature_extractor.push_to_hub(f"facebook/{model_name}")
image_processor.push_to_hub(f"facebook/{model_name}")
if __name__ == "__main__":


@ -26,7 +26,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import MaskFormerConfig, MaskFormerFeatureExtractor, MaskFormerForInstanceSegmentation, SwinConfig
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation, MaskFormerImageProcessor, SwinConfig
from transformers.utils import logging
@ -278,9 +278,9 @@ def convert_maskformer_checkpoint(
else:
ignore_index = 255
reduce_labels = True if "ade" in model_name else False
feature_extractor = MaskFormerFeatureExtractor(ignore_index=ignore_index, reduce_labels=reduce_labels)
image_processor = MaskFormerImageProcessor(ignore_index=ignore_index, reduce_labels=reduce_labels)
inputs = feature_extractor(image, return_tensors="pt")
inputs = image_processor(image, return_tensors="pt")
outputs = model(**inputs)
@ -294,15 +294,15 @@ def convert_maskformer_checkpoint(
print("Looks ok!")
if pytorch_dump_folder_path is not None:
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
model.save_pretrained(pytorch_dump_folder_path)
feature_extractor.save_pretrained(pytorch_dump_folder_path)
image_processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
print("Pushing model and feature extractor to the hub...")
print("Pushing model and image processor to the hub...")
model.push_to_hub(f"nielsr/{model_name}")
feature_extractor.push_to_hub(f"nielsr/{model_name}")
image_processor.push_to_hub(f"nielsr/{model_name}")
if __name__ == "__main__":


@ -27,8 +27,8 @@ from PIL import Image
from transformers import (
MobileNetV1Config,
MobileNetV1FeatureExtractor,
MobileNetV1ForImageClassification,
MobileNetV1ImageProcessor,
load_tf_weights_in_mobilenet_v1,
)
from transformers.utils import logging
@ -83,12 +83,12 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
# Load weights from TensorFlow checkpoint
load_tf_weights_in_mobilenet_v1(model, config, checkpoint_path)
# Check outputs on an image, prepared by MobileNetV1FeatureExtractor
feature_extractor = MobileNetV1FeatureExtractor(
# Check outputs on an image, prepared by MobileNetV1ImageProcessor
image_processor = MobileNetV1ImageProcessor(
crop_size={"width": config.image_size, "height": config.image_size},
size={"shortest_edge": config.image_size + 32},
)
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
encoding = image_processor(images=prepare_img(), return_tensors="pt")
outputs = model(**encoding)
logits = outputs.logits
@ -107,13 +107,13 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
print("Pushing to the hub...")
repo_id = "google/" + model_name
feature_extractor.push_to_hub(repo_id)
image_processor.push_to_hub(repo_id)
model.push_to_hub(repo_id)


@ -99,11 +99,11 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
load_tf_weights_in_mobilenet_v2(model, config, checkpoint_path)
# Check outputs on an image, prepared by MobileNetV2ImageProcessor
feature_extractor = MobileNetV2ImageProcessor(
image_processor = MobileNetV2ImageProcessor(
crop_size={"width": config.image_size, "height": config.image_size},
size={"shortest_edge": config.image_size + 32},
)
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
encoding = image_processor(images=prepare_img(), return_tensors="pt")
outputs = model(**encoding)
logits = outputs.logits
@ -143,13 +143,13 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
print("Pushing to the hub...")
repo_id = "google/" + model_name
feature_extractor.push_to_hub(repo_id)
image_processor.push_to_hub(repo_id)
model.push_to_hub(repo_id)


@ -26,9 +26,9 @@ from PIL import Image
from transformers import (
MobileViTConfig,
MobileViTFeatureExtractor,
MobileViTForImageClassification,
MobileViTForSemanticSegmentation,
MobileViTImageProcessor,
)
from transformers.utils import logging
@ -211,9 +211,9 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
new_state_dict = convert_state_dict(state_dict, model)
model.load_state_dict(new_state_dict)
# Check outputs on an image, prepared by MobileViTFeatureExtractor
feature_extractor = MobileViTFeatureExtractor(crop_size=config.image_size, size=config.image_size + 32)
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
# Check outputs on an image, prepared by MobileViTImageProcessor
image_processor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
encoding = image_processor(images=prepare_img(), return_tensors="pt")
outputs = model(**encoding)
logits = outputs.logits
@ -265,8 +265,8 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model {mobilevit_name} to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
model_mapping = {
@ -280,7 +280,7 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
print("Pushing to the hub...")
model_name = model_mapping[mobilevit_name]
feature_extractor.push_to_hub(model_name, organization="apple")
image_processor.push_to_hub(model_name, organization="apple")
model.push_to_hub(model_name, organization="apple")


@ -259,8 +259,8 @@ def convert_mobilevitv2_checkpoint(task_name, checkpoint_path, orig_config_path,
model.load_state_dict(state_dict)
# Check outputs on an image, prepared by MobileViTImageProcessor
feature_extractor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
image_processor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
encoding = image_processor(images=prepare_img(), return_tensors="pt")
outputs = model(**encoding)
# verify classification model
@ -276,8 +276,8 @@ def convert_mobilevitv2_checkpoint(task_name, checkpoint_path, orig_config_path,
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model {task_name} to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":


@ -383,7 +383,7 @@ class OwlViTOnnxConfig(OnnxConfig):
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
)
image_input_dict = super().generate_dummy_inputs(
processor.feature_extractor, batch_size=batch_size, framework=framework
processor.image_processor, batch_size=batch_size, framework=framework
)
return {**text_input_dict, **image_input_dict}


@ -29,8 +29,8 @@ from huggingface_hub import Repository
from transformers import (
CLIPTokenizer,
OwlViTConfig,
OwlViTFeatureExtractor,
OwlViTForObjectDetection,
OwlViTImageProcessor,
OwlViTModel,
OwlViTProcessor,
)
@ -350,16 +350,16 @@ def convert_owlvit_checkpoint(pt_backbone, flax_params, attn_params, pytorch_dum
# Save HF model
hf_model.save_pretrained(repo.local_dir)
# Initialize feature extractor
feature_extractor = OwlViTFeatureExtractor(
# Initialize image processor
image_processor = OwlViTImageProcessor(
size=config.vision_config.image_size, crop_size=config.vision_config.image_size
)
# Initialize tokenizer
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32", pad_token="!", model_max_length=16)
# Initialize processor
processor = OwlViTProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
feature_extractor.save_pretrained(repo.local_dir)
processor = OwlViTProcessor(image_processor=image_processor, tokenizer=tokenizer)
image_processor.save_pretrained(repo.local_dir)
processor.save_pretrained(repo.local_dir)
repo.git_add()


@ -29,13 +29,13 @@ from PIL import Image
from transformers import (
PerceiverConfig,
PerceiverFeatureExtractor,
PerceiverForImageClassificationConvProcessing,
PerceiverForImageClassificationFourier,
PerceiverForImageClassificationLearned,
PerceiverForMaskedLM,
PerceiverForMultimodalAutoencoding,
PerceiverForOpticalFlow,
PerceiverImageProcessor,
PerceiverTokenizer,
)
from transformers.utils import logging
@ -389,9 +389,9 @@ def convert_perceiver_checkpoint(pickle_file, pytorch_dump_folder_path, architec
inputs = encoding.input_ids
input_mask = encoding.attention_mask
elif architecture in ["image_classification", "image_classification_fourier", "image_classification_conv"]:
feature_extractor = PerceiverFeatureExtractor()
image_processor = PerceiverImageProcessor()
image = prepare_img()
encoding = feature_extractor(image, return_tensors="pt")
encoding = image_processor(image, return_tensors="pt")
inputs = encoding.pixel_values
elif architecture == "optical_flow":
inputs = torch.randn(1, 2, 27, 368, 496)


@ -24,7 +24,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import PoolFormerConfig, PoolFormerFeatureExtractor, PoolFormerForImageClassification
from transformers import PoolFormerConfig, PoolFormerForImageClassification, PoolFormerImageProcessor
from transformers.utils import logging
@ -141,12 +141,12 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
else:
raise ValueError(f"Size {size} not supported")
# load feature extractor
feature_extractor = PoolFormerFeatureExtractor(crop_pct=crop_pct)
# load image processor
image_processor = PoolFormerImageProcessor(crop_pct=crop_pct)
# Prepare image
image = prepare_img()
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
logger.info(f"Converting model {model_name}...")
@ -161,9 +161,9 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
model.load_state_dict(state_dict)
model.eval()
# Define feature extractor
feature_extractor = PoolFormerFeatureExtractor(crop_pct=crop_pct)
pixel_values = feature_extractor(images=prepare_img(), return_tensors="pt").pixel_values
# Define image processor
image_processor = PoolFormerImageProcessor(crop_pct=crop_pct)
pixel_values = image_processor(images=prepare_img(), return_tensors="pt").pixel_values
# forward pass
outputs = model(pixel_values)
@ -187,12 +187,12 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
assert logits.shape == expected_shape
assert torch.allclose(logits[0, :3], expected_slice, atol=1e-2)
# finally, save model and feature extractor
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
# finally, save model and image processor
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":


@ -34,7 +34,7 @@ from huggingface_hub import cached_download, hf_hub_url
from torch import Tensor
from vissl.models.model_helpers import get_trunk_forward_outputs
from transformers import AutoFeatureExtractor, RegNetConfig, RegNetForImageClassification, RegNetModel
from transformers import AutoImageProcessor, RegNetConfig, RegNetForImageClassification, RegNetModel
from transformers.modeling_utils import PreTrainedModel
from transformers.utils import logging
@ -262,10 +262,10 @@ def convert_weights_and_push(save_directory: Path, model_name: str = None, push_
)
size = 384
# we can use the convnext one
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
feature_extractor.push_to_hub(
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
image_processor.push_to_hub(
repo_path_or_name=save_directory / model_name,
commit_message="Add feature extractor",
commit_message="Add image processor",
output_dir=save_directory / model_name,
)
@ -294,7 +294,7 @@ if __name__ == "__main__":
default=True,
type=bool,
required=False,
help="If True, push model and feature extractor to the hub.",
help="If True, push model and image processor to the hub.",
)
args = parser.parse_args()


@ -30,7 +30,7 @@ from huggingface_hub import cached_download, hf_hub_url
from torch import Tensor
from vissl.models.model_helpers import get_trunk_forward_outputs
from transformers import AutoFeatureExtractor, RegNetConfig, RegNetForImageClassification, RegNetModel
from transformers import AutoImageProcessor, RegNetConfig, RegNetForImageClassification, RegNetModel
from transformers.utils import logging
@ -209,10 +209,10 @@ def convert_weight_and_push(
size = 224 if "seer" not in name else 384
# we can use the convnext one
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
feature_extractor.push_to_hub(
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
image_processor.push_to_hub(
repo_path_or_name=save_directory / name,
commit_message="Add feature extractor",
commit_message="Add image processor",
use_temp_dir=True,
)
@ -449,7 +449,7 @@ if __name__ == "__main__":
default=True,
type=bool,
required=False,
help="If True, push model and feature extractor to the hub.",
help="If True, push model and image processor to the hub.",
)
args = parser.parse_args()


@ -28,7 +28,7 @@ import torch.nn as nn
from huggingface_hub import hf_hub_download
from torch import Tensor
from transformers import AutoFeatureExtractor, ResNetConfig, ResNetForImageClassification
from transformers import AutoImageProcessor, ResNetConfig, ResNetForImageClassification
from transformers.utils import logging
@ -113,10 +113,10 @@ def convert_weight_and_push(name: str, config: ResNetConfig, save_directory: Pat
)
# we can use the convnext one
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
feature_extractor.push_to_hub(
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
image_processor.push_to_hub(
repo_path_or_name=save_directory / checkpoint_name,
commit_message="Add feature extractor",
commit_message="Add image processor",
use_temp_dir=True,
)
@ -191,7 +191,7 @@ if __name__ == "__main__":
default=True,
type=bool,
required=False,
help="If True, push model and feature extractor to the hub.",
help="If True, push model and image processor to the hub.",
)
args = parser.parse_args()


@ -27,9 +27,9 @@ from PIL import Image
from transformers import (
SegformerConfig,
SegformerFeatureExtractor,
SegformerForImageClassification,
SegformerForSemanticSegmentation,
SegformerImageProcessor,
)
from transformers.utils import logging
@ -179,14 +179,14 @@ def convert_segformer_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
else:
raise ValueError(f"Size {size} not supported")
# load feature extractor (only resize + normalize)
feature_extractor = SegformerFeatureExtractor(
# load image processor (only resize + normalize)
image_processor = SegformerImageProcessor(
image_scale=(512, 512), keep_ratio=False, align=False, do_random_crop=False
)
# prepare image
image = prepare_img()
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
logger.info(f"Converting model {model_name}...")
@ -362,11 +362,11 @@ def convert_segformer_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
assert logits.shape == expected_shape
assert torch.allclose(logits[0, :3, :3, :3], expected_slice, atol=1e-2)
# finally, save model and feature extractor
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
# finally, save model and image processor
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
model.save_pretrained(pytorch_dump_folder_path)
feature_extractor.save_pretrained(pytorch_dump_folder_path)
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":


@ -22,7 +22,7 @@ import requests
import torch
from PIL import Image
from transformers import SwinConfig, SwinForMaskedImageModeling, ViTFeatureExtractor
from transformers import SwinConfig, SwinForMaskedImageModeling, ViTImageProcessor
def get_swin_config(model_name):
@ -132,9 +132,9 @@ def convert_swin_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_pat
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
feature_extractor = ViTFeatureExtractor(size={"height": 192, "width": 192})
image_processor = ViTImageProcessor(size={"height": 192, "width": 192})
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs).logits
@ -146,13 +146,13 @@ def convert_swin_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_pat
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
print(f"Pushing model and feature extractor for {model_name} to hub")
print(f"Pushing model and image processor for {model_name} to hub")
model.push_to_hub(f"microsoft/{model_name}")
feature_extractor.push_to_hub(f"microsoft/{model_name}")
image_processor.push_to_hub(f"microsoft/{model_name}")
if __name__ == "__main__":


@ -7,7 +7,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import AutoFeatureExtractor, SwinConfig, SwinForImageClassification
from transformers import AutoImageProcessor, SwinConfig, SwinForImageClassification
def get_swin_config(swin_name):
@ -140,9 +140,9 @@ def convert_swin_checkpoint(swin_name, pytorch_dump_folder_path):
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/{}".format(swin_name.replace("_", "-")))
image_processor = AutoImageProcessor.from_pretrained("microsoft/{}".format(swin_name.replace("_", "-")))
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")
inputs = image_processor(images=image, return_tensors="pt")
timm_outs = timm_model(inputs["pixel_values"])
hf_outs = model(**inputs).logits
@ -152,8 +152,8 @@ def convert_swin_checkpoint(swin_name, pytorch_dump_folder_path):
print(f"Saving model {swin_name} to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":


@ -24,7 +24,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import AutoFeatureExtractor, Swinv2Config, Swinv2ForImageClassification
from transformers import AutoImageProcessor, Swinv2Config, Swinv2ForImageClassification
def get_swinv2_config(swinv2_name):
@ -180,9 +180,9 @@ def convert_swinv2_checkpoint(swinv2_name, pytorch_dump_folder_path):
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/{}".format(swinv2_name.replace("_", "-")))
image_processor = AutoImageProcessor.from_pretrained("microsoft/{}".format(swinv2_name.replace("_", "-")))
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")
inputs = image_processor(images=image, return_tensors="pt")
timm_outs = timm_model(inputs["pixel_values"])
hf_outs = model(**inputs).logits
@ -192,8 +192,8 @@ def convert_swinv2_checkpoint(swinv2_name, pytorch_dump_folder_path):
print(f"Saving model {swinv2_name} to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
model.push_to_hub(
repo_path_or_name=Path(pytorch_dump_folder_path, swinv2_name),


@ -27,7 +27,7 @@ from huggingface_hub import hf_hub_download
from PIL import Image
from torchvision.transforms import functional as F
from transformers import DetrFeatureExtractor, TableTransformerConfig, TableTransformerForObjectDetection
from transformers import DetrImageProcessor, TableTransformerConfig, TableTransformerForObjectDetection
from transformers.utils import logging
@ -242,7 +242,7 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
config.id2label = id2label
config.label2id = {v: k for k, v in id2label.items()}
feature_extractor = DetrFeatureExtractor(
image_processor = DetrImageProcessor(
format="coco_detection", max_size=800 if "detection" in checkpoint_url else 1000
)
model = TableTransformerForObjectDetection(config)
@ -277,11 +277,11 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
print("Looks ok!")
if pytorch_dump_folder_path is not None:
# Save model and feature extractor
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
# Save model and image processor
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
model.save_pretrained(pytorch_dump_folder_path)
feature_extractor.save_pretrained(pytorch_dump_folder_path)
image_processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
# Push model to HF hub
@ -292,7 +292,7 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
else "microsoft/table-transformer-structure-recognition"
)
model.push_to_hub(model_name)
feature_extractor.push_to_hub(model_name)
image_processor.push_to_hub(model_name)
if __name__ == "__main__":


@ -22,7 +22,7 @@ import numpy as np
import torch
from huggingface_hub import hf_hub_download
from transformers import TimesformerConfig, TimesformerForVideoClassification, VideoMAEFeatureExtractor
from transformers import TimesformerConfig, TimesformerForVideoClassification, VideoMAEImageProcessor
def get_timesformer_config(model_name):
@ -156,9 +156,9 @@ def convert_timesformer_checkpoint(checkpoint_url, pytorch_dump_folder_path, mod
model.eval()
# verify model on basic input
feature_extractor = VideoMAEFeatureExtractor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
image_processor = VideoMAEImageProcessor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
video = prepare_video()
inputs = feature_extractor(video[:8], return_tensors="pt")
inputs = image_processor(video[:8], return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
@ -215,8 +215,8 @@ def convert_timesformer_checkpoint(checkpoint_url, pytorch_dump_folder_path, mod
print("Logits ok!")
if pytorch_dump_folder_path is not None:
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
model.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:


@ -513,8 +513,8 @@ TIMESFORMER_START_DOCSTRING = r"""
TIMESFORMER_INPUTS_DOCSTRING = r"""
Args:
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_frames, num_channels, height, width)`):
Pixel values. Pixel values can be obtained using [`AutoFeatureExtractor`]. See
[`VideoMAEFeatureExtractor.__call__`] for details.
Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See
[`VideoMAEImageProcessor.preprocess`] for details.
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
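A minimal sketch of producing the five-dimensional `pixel_values` described above, mirroring the image processor settings used in the TimeSformer conversion script earlier in this diff (the dummy frames stand in for a real clip):

```py
import numpy as np

from transformers import VideoMAEImageProcessor

# Eight dummy 224x224 RGB frames standing in for a short video clip
video = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(8)]

image_processor = VideoMAEImageProcessor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
inputs = image_processor(video, return_tensors="pt")
print(inputs["pixel_values"].shape)  # (batch_size, num_frames, num_channels, height, width)
```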


@ -29,7 +29,7 @@ from transformers import (
TrOCRProcessor,
VisionEncoderDecoderModel,
ViTConfig,
ViTFeatureExtractor,
ViTImageProcessor,
ViTModel,
)
from transformers.utils import logging
@ -182,9 +182,9 @@ def convert_tr_ocr_checkpoint(checkpoint_url, pytorch_dump_folder_path):
model.load_state_dict(state_dict)
# Check outputs on an image
feature_extractor = ViTFeatureExtractor(size=encoder_config.image_size)
image_processor = ViTImageProcessor(size=encoder_config.image_size)
tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
processor = TrOCRProcessor(feature_extractor, tokenizer)
processor = TrOCRProcessor(image_processor, tokenizer)
pixel_values = processor(images=prepare_img(checkpoint_url), return_tensors="pt").pixel_values


@ -30,7 +30,7 @@ import torch.nn as nn
from huggingface_hub import cached_download, hf_hub_download
from torch import Tensor
from transformers import AutoFeatureExtractor, VanConfig, VanForImageClassification
from transformers import AutoImageProcessor, VanConfig, VanForImageClassification
from transformers.models.van.modeling_van import VanLayerScaling
from transformers.utils import logging
@ -154,10 +154,10 @@ def convert_weight_and_push(
)
# we can use the convnext one
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
feature_extractor.push_to_hub(
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
image_processor.push_to_hub(
repo_path_or_name=save_directory / checkpoint_name,
commit_message="Add feature extractor",
commit_message="Add image processor",
use_temp_dir=True,
)
@ -277,7 +277,7 @@ if __name__ == "__main__":
default=True,
type=bool,
required=False,
help="If True, push model and feature extractor to the hub.",
help="If True, push model and image processor to the hub.",
)
args = parser.parse_args()


@ -24,9 +24,9 @@ from huggingface_hub import hf_hub_download
from transformers import (
VideoMAEConfig,
VideoMAEFeatureExtractor,
VideoMAEForPreTraining,
VideoMAEForVideoClassification,
VideoMAEImageProcessor,
)
@ -198,9 +198,9 @@ def convert_videomae_checkpoint(checkpoint_url, pytorch_dump_folder_path, model_
model.eval()
# verify model on basic input
feature_extractor = VideoMAEFeatureExtractor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
image_processor = VideoMAEImageProcessor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
video = prepare_video()
inputs = feature_extractor(video, return_tensors="pt")
inputs = image_processor(video, return_tensors="pt")
if "finetuned" not in model_name:
local_path = hf_hub_download(repo_id="hf-internal-testing/bool-masked-pos", filename="bool_masked_pos.pt")
@ -288,8 +288,8 @@ def convert_videomae_checkpoint(checkpoint_url, pytorch_dump_folder_path, model_
print("Loss ok!")
if pytorch_dump_folder_path is not None:
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
model.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:


@ -27,11 +27,11 @@ from PIL import Image
from transformers import (
BertTokenizer,
ViltConfig,
ViltFeatureExtractor,
ViltForImageAndTextRetrieval,
ViltForImagesAndTextClassification,
ViltForMaskedLM,
ViltForQuestionAnswering,
ViltImageProcessor,
ViltProcessor,
)
from transformers.utils import logging
@ -223,9 +223,9 @@ def convert_vilt_checkpoint(checkpoint_url, pytorch_dump_folder_path):
model.load_state_dict(state_dict)
# Define processor
feature_extractor = ViltFeatureExtractor(size=384)
image_processor = ViltImageProcessor(size=384)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
processor = ViltProcessor(feature_extractor, tokenizer)
processor = ViltProcessor(image_processor, tokenizer)
# Forward pass on example inputs (image + text)
if nlvr_model:


@ -24,7 +24,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import ViTConfig, ViTFeatureExtractor, ViTForImageClassification, ViTModel
from transformers import ViTConfig, ViTForImageClassification, ViTImageProcessor, ViTModel
from transformers.utils import logging
@ -175,9 +175,9 @@ def convert_vit_checkpoint(model_name, pytorch_dump_folder_path, base_model=True
model = ViTForImageClassification(config).eval()
model.load_state_dict(state_dict)
# Check outputs on an image, prepared by ViTFeatureExtractor
feature_extractor = ViTFeatureExtractor()
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
# Check outputs on an image, prepared by ViTImageProcessor
image_processor = ViTImageProcessor()
encoding = image_processor(images=prepare_img(), return_tensors="pt")
pixel_values = encoding["pixel_values"]
outputs = model(pixel_values)
@ -192,8 +192,8 @@ def convert_vit_checkpoint(model_name, pytorch_dump_folder_path, base_model=True
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":


@ -25,7 +25,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import DeiTFeatureExtractor, ViTConfig, ViTFeatureExtractor, ViTForImageClassification, ViTModel
from transformers import DeiTImageProcessor, ViTConfig, ViTForImageClassification, ViTImageProcessor, ViTModel
from transformers.utils import logging
@ -208,12 +208,12 @@ def convert_vit_checkpoint(vit_name, pytorch_dump_folder_path):
model = ViTForImageClassification(config).eval()
model.load_state_dict(state_dict)
# Check outputs on an image, prepared by ViTFeatureExtractor/DeiTFeatureExtractor
# Check outputs on an image, prepared by ViTImageProcessor/DeiTImageProcessor
if "deit" in vit_name:
feature_extractor = DeiTFeatureExtractor(size=config.image_size)
image_processor = DeiTImageProcessor(size=config.image_size)
else:
feature_extractor = ViTFeatureExtractor(size=config.image_size)
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
image_processor = ViTImageProcessor(size=config.image_size)
encoding = image_processor(images=prepare_img(), return_tensors="pt")
pixel_values = encoding["pixel_values"]
outputs = model(pixel_values)
@ -229,8 +229,8 @@ def convert_vit_checkpoint(vit_name, pytorch_dump_folder_path):
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model {vit_name} to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":


@ -20,7 +20,7 @@ import requests
import torch
from PIL import Image
from transformers import ViTMAEConfig, ViTMAEFeatureExtractor, ViTMAEForPreTraining
from transformers import ViTMAEConfig, ViTMAEForPreTraining, ViTMAEImageProcessor
def rename_key(name):
@ -120,7 +120,7 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
state_dict = torch.hub.load_state_dict_from_url(checkpoint_url, map_location="cpu")["model"]
feature_extractor = ViTMAEFeatureExtractor(size=config.image_size)
image_processor = ViTMAEImageProcessor(size=config.image_size)
new_state_dict = convert_state_dict(state_dict, config)
@ -130,8 +130,8 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
url = "https://user-images.githubusercontent.com/11435359/147738734-196fd92f-9260-48d5-ba7e-bf103d29364d.jpg"
image = Image.open(requests.get(url, stream=True).raw)
feature_extractor = ViTMAEFeatureExtractor(size=config.image_size)
inputs = feature_extractor(images=image, return_tensors="pt")
image_processor = ViTMAEImageProcessor(size=config.image_size)
inputs = image_processor(images=image, return_tensors="pt")
# forward pass
torch.manual_seed(2)
@ -157,8 +157,8 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
print(f"Saving model to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":


@ -22,7 +22,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import ViTFeatureExtractor, ViTMSNConfig, ViTMSNModel
from transformers import ViTImageProcessor, ViTMSNConfig, ViTMSNModel
from transformers.image_utils import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
@ -180,7 +180,7 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
state_dict = torch.hub.load_state_dict_from_url(checkpoint_url, map_location="cpu")["target_encoder"]
feature_extractor = ViTFeatureExtractor(size=config.image_size)
image_processor = ViTImageProcessor(size=config.image_size)
remove_projection_head(state_dict)
rename_keys = create_rename_keys(config, base_model=True)
@ -195,10 +195,10 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
feature_extractor = ViTFeatureExtractor(
image_processor = ViTImageProcessor(
size=config.image_size, image_mean=IMAGENET_DEFAULT_MEAN, image_std=IMAGENET_DEFAULT_STD
)
inputs = feature_extractor(images=image, return_tensors="pt")
inputs = image_processor(images=image, return_tensors="pt")
# forward pass
torch.manual_seed(2)
@ -224,8 +224,8 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
print(f"Saving model to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if __name__ == "__main__":


@ -23,7 +23,7 @@ from huggingface_hub import hf_hub_download
from transformers import (
CLIPTokenizer,
CLIPTokenizerFast,
VideoMAEFeatureExtractor,
VideoMAEImageProcessor,
XCLIPConfig,
XCLIPModel,
XCLIPProcessor,
@ -291,10 +291,10 @@ def convert_xclip_checkpoint(model_name, pytorch_dump_folder_path=None, push_to_
model.eval()
size = 336 if model_name == "xclip-large-patch14-16-frames" else 224
feature_extractor = VideoMAEFeatureExtractor(size=size)
image_processor = VideoMAEImageProcessor(size=size)
slow_tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
fast_tokenizer = CLIPTokenizerFast.from_pretrained("openai/clip-vit-base-patch32")
processor = XCLIPProcessor(feature_extractor=feature_extractor, tokenizer=fast_tokenizer)
processor = XCLIPProcessor(image_processor=image_processor, tokenizer=fast_tokenizer)
video = prepare_video(num_frames)
inputs = processor(


@ -24,7 +24,7 @@ import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from transformers import YolosConfig, YolosFeatureExtractor, YolosForObjectDetection
from transformers import YolosConfig, YolosForObjectDetection, YolosImageProcessor
from transformers.utils import logging
@ -172,10 +172,10 @@ def convert_yolos_checkpoint(
new_state_dict = convert_state_dict(state_dict, model)
model.load_state_dict(new_state_dict)
# Check outputs on an image, prepared by YolosFeatureExtractor
# Check outputs on an image, prepared by YolosImageProcessor
size = 800 if yolos_name != "yolos_ti" else 512
feature_extractor = YolosFeatureExtractor(format="coco_detection", size=size)
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
image_processor = YolosImageProcessor(format="coco_detection", size=size)
encoding = image_processor(images=prepare_img(), return_tensors="pt")
outputs = model(**encoding)
logits, pred_boxes = outputs.logits, outputs.pred_boxes
@ -224,8 +224,8 @@ def convert_yolos_checkpoint(
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
print(f"Saving model {yolos_name} to {pytorch_dump_folder_path}")
model.save_pretrained(pytorch_dump_folder_path)
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
feature_extractor.save_pretrained(pytorch_dump_folder_path)
print(f"Saving image processor to {pytorch_dump_folder_path}")
image_processor.save_pretrained(pytorch_dump_folder_path)
if push_to_hub:
model_mapping = {
@ -238,7 +238,7 @@ def convert_yolos_checkpoint(
print("Pushing to the hub...")
model_name = model_mapping[yolos_name]
feature_extractor.push_to_hub(model_name, organization="hustvl")
image_processor.push_to_hub(model_name, organization="hustvl")
model.push_to_hub(model_name, organization="hustvl")


@ -19,7 +19,7 @@ from pathlib import Path
from packaging import version
from .. import AutoFeatureExtractor, AutoProcessor, AutoTokenizer
from .. import AutoFeatureExtractor, AutoImageProcessor, AutoProcessor, AutoTokenizer
from ..utils import logging
from ..utils.import_utils import is_optimum_available
from .convert import export, validate_model_outputs
@ -145,6 +145,8 @@ def export_with_transformers(args):
preprocessor = get_preprocessor(args.model)
elif args.preprocessor == "tokenizer":
preprocessor = AutoTokenizer.from_pretrained(args.model)
elif args.preprocessor == "image_processor":
preprocessor = AutoImageProcessor.from_pretrained(args.model)
elif args.preprocessor == "feature_extractor":
preprocessor = AutoFeatureExtractor.from_pretrained(args.model)
elif args.preprocessor == "processor":
@ -213,7 +215,7 @@ def main():
parser.add_argument(
"--preprocessor",
type=str,
choices=["auto", "tokenizer", "feature_extractor", "processor"],
choices=["auto", "tokenizer", "feature_extractor", "image_processor", "processor"],
default="auto",
help="Which type of preprocessor to use. 'auto' tries to automatically detect it.",
)
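Roughly, the new `image_processor` choice resolves the preprocessor as sketched below (the checkpoint is illustrative); from the command line this corresponds to an invocation along the lines of `python -m transformers.onnx --model=google/vit-base-patch16-224 --preprocessor=image_processor onnx/`:

```py
from transformers import AutoImageProcessor

# What the new `--preprocessor image_processor` branch loads; the resulting
# object is then handed to the ONNX export routine in place of a tokenizer
# or feature extractor.
preprocessor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
```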


@ -49,7 +49,7 @@ if is_vision_available():
import PIL
from PIL import Image
from transformers import BeitFeatureExtractor
from transformers import BeitImageProcessor
class BeitModelTester:
@ -342,18 +342,16 @@ def prepare_img():
@require_vision
class BeitModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return (
BeitFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
)
def default_image_processor(self):
return BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
@slow
def test_inference_masked_image_modeling_head(self):
model = BeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values.to(torch_device)
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(torch_device)
# prepare bool_masked_pos
bool_masked_pos = torch.ones((1, 196), dtype=torch.bool).to(torch_device)
@ -377,9 +375,9 @@ class BeitModelIntegrationTest(unittest.TestCase):
def test_inference_image_classification_head_imagenet_1k(self):
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():
@ -403,9 +401,9 @@ class BeitModelIntegrationTest(unittest.TestCase):
torch_device
)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():
@ -428,11 +426,11 @@ class BeitModelIntegrationTest(unittest.TestCase):
model = BeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-finetuned-ade-640-640")
model = model.to(torch_device)
feature_extractor = BeitFeatureExtractor(do_resize=True, size=640, do_center_crop=False)
image_processor = BeitImageProcessor(do_resize=True, size=640, do_center_crop=False)
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
image = Image.open(ds[0]["file"])
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():
@ -471,11 +469,11 @@ class BeitModelIntegrationTest(unittest.TestCase):
model = BeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-finetuned-ade-640-640")
model = model.to(torch_device)
feature_extractor = BeitFeatureExtractor(do_resize=True, size=640, do_center_crop=False)
image_processor = BeitImageProcessor(do_resize=True, size=640, do_center_crop=False)
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
image = Image.open(ds[0]["file"])
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():
@ -483,10 +481,10 @@ class BeitModelIntegrationTest(unittest.TestCase):
outputs.logits = outputs.logits.detach().cpu()
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
expected_shape = torch.Size((500, 300))
self.assertEqual(segmentation[0].shape, expected_shape)
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs)
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs)
expected_shape = torch.Size((160, 160))
self.assertEqual(segmentation[0].shape, expected_shape)


@ -33,7 +33,7 @@ if is_flax_available():
if is_vision_available():
from PIL import Image
from transformers import BeitFeatureExtractor
from transformers import BeitImageProcessor
class FlaxBeitModelTester(unittest.TestCase):
@ -219,18 +219,16 @@ def prepare_img():
@require_flax
class FlaxBeitModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return (
BeitFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
)
def default_image_processor(self):
return BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
@slow
def test_inference_masked_image_modeling_head(self):
model = FlaxBeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k")
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
pixel_values = feature_extractor(images=image, return_tensors="np").pixel_values
pixel_values = image_processor(images=image, return_tensors="np").pixel_values
# prepare bool_masked_pos
bool_masked_pos = np.ones((1, 196), dtype=bool)
@ -253,9 +251,9 @@ class FlaxBeitModelIntegrationTest(unittest.TestCase):
def test_inference_image_classification_head_imagenet_1k(self):
model = FlaxBeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="np")
inputs = image_processor(images=image, return_tensors="np")
# forward pass
outputs = model(**inputs)
@ -276,9 +274,9 @@ class FlaxBeitModelIntegrationTest(unittest.TestCase):
def test_inference_image_classification_head_imagenet_22k(self):
model = FlaxBeitForImageClassification.from_pretrained("microsoft/beit-large-patch16-224-pt22k-ft22k")
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="np")
inputs = image_processor(images=image, return_tensors="np")
# forward pass
outputs = model(**inputs)


@ -297,7 +297,7 @@ def prepare_img():
@require_vision
class BitModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
def default_image_processor(self):
return (
BitImageProcessor.from_pretrained(BIT_PRETRAINED_MODEL_ARCHIVE_LIST[0]) if is_vision_available() else None
)
@ -306,9 +306,9 @@ class BitModelIntegrationTest(unittest.TestCase):
def test_inference_image_classification_head(self):
model = BitForImageClassification.from_pretrained(BIT_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():


@ -145,7 +145,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
pass
def test_call_pil(self):
# Initialize feature_extractor
# Initialize image processor
image_processing = self.image_processing_class(**self.image_processor_dict)
# create random PIL images
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False)
@ -176,7 +176,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
)
def test_call_numpy(self):
# Initialize feature_extractor
# Initialize image processor
image_processing = self.image_processing_class(**self.image_processor_dict)
# create random numpy tensors
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False, numpify=True)
@ -207,7 +207,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
)
def test_call_pytorch(self):
# Initialize feature_extractor
# Initialize image processor
image_processing = self.image_processing_class(**self.image_processor_dict)
# create random PyTorch tensors
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False, torchify=True)
@ -238,7 +238,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
)
def test_equivalence_pad_and_create_pixel_mask(self):
# Initialize feature_extractors
# Initialize image processors
image_processing_1 = self.image_processing_class(**self.image_processor_dict)
image_processing_2 = self.image_processing_class(do_resize=False, do_normalize=False, do_rescale=False)
# create random PyTorch tensors


@ -43,7 +43,7 @@ if is_timm_available():
if is_vision_available():
from PIL import Image
from transformers import ConditionalDetrFeatureExtractor
from transformers import ConditionalDetrImageProcessor
class ConditionalDetrModelTester:
@ -493,9 +493,9 @@ def prepare_img():
@slow
class ConditionalDetrModelIntegrationTests(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
def default_image_processor(self):
return (
ConditionalDetrFeatureExtractor.from_pretrained("microsoft/conditional-detr-resnet-50")
ConditionalDetrImageProcessor.from_pretrained("microsoft/conditional-detr-resnet-50")
if is_vision_available()
else None
)
@ -503,9 +503,9 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
def test_inference_no_head(self):
model = ConditionalDetrModel.from_pretrained("microsoft/conditional-detr-resnet-50").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
with torch.no_grad():
outputs = model(**encoding)
@ -522,9 +522,9 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
torch_device
)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
pixel_values = encoding["pixel_values"].to(torch_device)
pixel_mask = encoding["pixel_mask"].to(torch_device)
@ -547,7 +547,7 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_slice_boxes, atol=1e-4))
# verify postprocessing
results = feature_extractor.post_process_object_detection(
results = image_processor.post_process_object_detection(
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
)[0]
expected_scores = torch.tensor([0.8330, 0.8313, 0.8039, 0.6829, 0.5355]).to(torch_device)


@ -38,7 +38,7 @@ if is_torch_available():
if is_vision_available():
from PIL import Image
from transformers import AutoFeatureExtractor
from transformers import AutoImageProcessor
class ConvNextModelTester:
@ -285,16 +285,16 @@ def prepare_img():
@require_vision
class ConvNextModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return AutoFeatureExtractor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
def default_image_processor(self):
return AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
@slow
def test_inference_image_classification_head(self):
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():


@ -38,7 +38,7 @@ if is_tf_available():
if is_vision_available():
from PIL import Image
from transformers import ConvNextFeatureExtractor
from transformers import ConvNextImageProcessor
class TFConvNextModelTester:
@ -279,18 +279,16 @@ def prepare_img():
@require_vision
class TFConvNextModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return (
ConvNextFeatureExtractor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
)
def default_image_processor(self):
return ConvNextImageProcessor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
@slow
def test_inference_image_classification_head(self):
model = TFConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224")
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="tf")
inputs = image_processor(images=image, return_tensors="tf")
# forward pass
outputs = model(**inputs)


@ -38,7 +38,7 @@ if is_torch_available():
if is_vision_available():
from PIL import Image
from transformers import AutoFeatureExtractor
from transformers import AutoImageProcessor
class CvtConfigTester(ConfigTester):
@ -264,16 +264,16 @@ def prepare_img():
@require_vision
class CvtModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return AutoFeatureExtractor.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
def default_image_processor(self):
return AutoImageProcessor.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
@slow
def test_inference_image_classification_head(self):
model = CvtForImageClassification.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():


@ -28,7 +28,7 @@ if is_tf_available():
if is_vision_available():
from PIL import Image
from transformers import AutoFeatureExtractor
from transformers import AutoImageProcessor
class TFCvtConfigTester(ConfigTester):
@ -265,16 +265,16 @@ def prepare_img():
@require_vision
class TFCvtModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return AutoFeatureExtractor.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
def default_image_processor(self):
return AutoImageProcessor.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
@slow
def test_inference_image_classification_head(self):
model = TFCvtForImageClassification.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="tf")
inputs = image_processor(images=image, return_tensors="tf")
# forward pass
outputs = model(**inputs)

View File

@ -44,7 +44,7 @@ if is_torch_available():
if is_vision_available():
from PIL import Image
from transformers import BeitFeatureExtractor
from transformers import BeitImageProcessor
class Data2VecVisionModelTester:
@ -327,11 +327,9 @@ def prepare_img():
@require_vision
class Data2VecVisionModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
def default_image_processor(self):
return (
BeitFeatureExtractor.from_pretrained("facebook/data2vec-vision-base-ft1k")
if is_vision_available()
else None
BeitImageProcessor.from_pretrained("facebook/data2vec-vision-base-ft1k") if is_vision_available() else None
)
@slow
@ -340,9 +338,9 @@ class Data2VecVisionModelIntegrationTest(unittest.TestCase):
torch_device
)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():

View File

@ -46,7 +46,7 @@ if is_tf_available():
if is_vision_available():
from PIL import Image
from transformers import BeitFeatureExtractor
from transformers import BeitImageProcessor
class TFData2VecVisionModelTester:
@ -469,20 +469,18 @@ def prepare_img():
@require_vision
class TFData2VecVisionModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
def default_image_processor(self):
return (
BeitFeatureExtractor.from_pretrained("facebook/data2vec-vision-base-ft1k")
if is_vision_available()
else None
BeitImageProcessor.from_pretrained("facebook/data2vec-vision-base-ft1k") if is_vision_available() else None
)
@slow
def test_inference_image_classification_head_imagenet_1k(self):
model = TFData2VecVisionForImageClassification.from_pretrained("facebook/data2vec-vision-base-ft1k")
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="tf")
inputs = image_processor(images=image, return_tensors="tf")
# forward pass
outputs = model(**inputs)

View File

@ -39,7 +39,7 @@ if is_timm_available():
if is_vision_available():
from PIL import Image
from transformers import AutoFeatureExtractor
from transformers import AutoImageProcessor
class DeformableDetrModelTester:
@ -563,15 +563,15 @@ def prepare_img():
@slow
class DeformableDetrModelIntegrationTests(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return AutoFeatureExtractor.from_pretrained("SenseTime/deformable-detr") if is_vision_available() else None
def default_image_processor(self):
return AutoImageProcessor.from_pretrained("SenseTime/deformable-detr") if is_vision_available() else None
def test_inference_object_detection_head(self):
model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
pixel_values = encoding["pixel_values"].to(torch_device)
pixel_mask = encoding["pixel_mask"].to(torch_device)
@ -595,7 +595,7 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_boxes, atol=1e-4))
# verify postprocessing
results = feature_extractor.post_process_object_detection(
results = image_processor.post_process_object_detection(
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
)[0]
expected_scores = torch.tensor([0.7999, 0.7894, 0.6331, 0.4720, 0.4382]).to(torch_device)
@ -612,9 +612,9 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
"SenseTime/deformable-detr-with-box-refine-two-stage"
).to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
pixel_values = encoding["pixel_values"].to(torch_device)
pixel_mask = encoding["pixel_mask"].to(torch_device)
@ -639,9 +639,9 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
@require_torch_gpu
def test_inference_object_detection_head_equivalence_cpu_gpu(self):
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
encoding = feature_extractor(images=image, return_tensors="pt")
encoding = image_processor(images=image, return_tensors="pt")
pixel_values = encoding["pixel_values"]
pixel_mask = encoding["pixel_mask"]

View File

@ -55,7 +55,7 @@ if is_torch_available():
if is_vision_available():
from PIL import Image
from transformers import DeiTFeatureExtractor
from transformers import DeiTImageProcessor
class DeiTModelTester:
@ -381,9 +381,9 @@ def prepare_img():
@require_vision
class DeiTModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
def default_image_processor(self):
return (
DeiTFeatureExtractor.from_pretrained("facebook/deit-base-distilled-patch16-224")
DeiTImageProcessor.from_pretrained("facebook/deit-base-distilled-patch16-224")
if is_vision_available()
else None
)
@ -394,9 +394,9 @@ class DeiTModelIntegrationTest(unittest.TestCase):
torch_device
)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():
@ -420,10 +420,10 @@ class DeiTModelIntegrationTest(unittest.TestCase):
model = DeiTModel.from_pretrained(
"facebook/deit-base-distilled-patch16-224", torch_dtype=torch.float16, device_map="auto"
)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt")
inputs = image_processor(images=image, return_tensors="pt")
pixel_values = inputs.pixel_values.to(torch_device)
# forward pass to make sure inference works in fp16

View File

@ -46,7 +46,7 @@ if is_tf_available():
if is_vision_available():
from PIL import Image
from transformers import DeiTFeatureExtractor
from transformers import DeiTImageProcessor
class TFDeiTModelTester:
@ -266,9 +266,9 @@ def prepare_img():
@require_vision
class DeiTModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
def default_image_processor(self):
return (
DeiTFeatureExtractor.from_pretrained("facebook/deit-base-distilled-patch16-224")
DeiTImageProcessor.from_pretrained("facebook/deit-base-distilled-patch16-224")
if is_vision_available()
else None
)
@ -277,9 +277,9 @@ class DeiTModelIntegrationTest(unittest.TestCase):
def test_inference_image_classification_head(self):
model = TFDeiTForImageClassificationWithTeacher.from_pretrained("facebook/deit-base-distilled-patch16-224")
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="tf")
inputs = image_processor(images=image, return_tensors="tf")
# forward pass
outputs = model(**inputs)

View File

@ -38,7 +38,7 @@ if is_timm_available():
if is_vision_available():
from PIL import Image
from transformers import DetrFeatureExtractor
from transformers import DetrImageProcessor
class DetrModelTester:
@ -512,15 +512,15 @@ def prepare_img():
@slow
class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50") if is_vision_available() else None
def default_image_processor(self):
return DetrImageProcessor.from_pretrained("facebook/detr-resnet-50") if is_vision_available() else None
def test_inference_no_head(self):
model = DetrModel.from_pretrained("facebook/detr-resnet-50").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
with torch.no_grad():
outputs = model(**encoding)
@ -535,9 +535,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
def test_inference_object_detection_head(self):
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
pixel_values = encoding["pixel_values"].to(torch_device)
pixel_mask = encoding["pixel_mask"].to(torch_device)
@ -560,7 +560,7 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_slice_boxes, atol=1e-4))
# verify postprocessing
results = feature_extractor.post_process_object_detection(
results = image_processor.post_process_object_detection(
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
)[0]
expected_scores = torch.tensor([0.9982, 0.9960, 0.9955, 0.9988, 0.9987]).to(torch_device)
@ -575,9 +575,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
def test_inference_panoptic_segmentation_head(self):
model = DetrForSegmentation.from_pretrained("facebook/detr-resnet-50-panoptic").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
pixel_values = encoding["pixel_values"].to(torch_device)
pixel_mask = encoding["pixel_mask"].to(torch_device)
@ -607,7 +607,7 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
self.assertTrue(torch.allclose(outputs.pred_masks[0, 0, :3, :3], expected_slice_masks, atol=1e-3))
# verify postprocessing
results = feature_extractor.post_process_panoptic_segmentation(
results = image_processor.post_process_panoptic_segmentation(
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
)[0]
@ -633,9 +633,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
@slow
class DetrModelIntegrationTests(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
def default_image_processor(self):
return (
DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
if is_vision_available()
else None
)
@ -643,9 +643,9 @@ class DetrModelIntegrationTests(unittest.TestCase):
def test_inference_no_head(self):
model = DetrModel.from_pretrained("facebook/detr-resnet-50", revision="no_timm").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
with torch.no_grad():
outputs = model(**encoding)

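The DETR and Deformable DETR hunks also move the post-processing call onto the image processor instance. A hedged sketch of that flow with the `facebook/detr-resnet-50` checkpoint and the 0.3 threshold used in the tests above (the loop at the end is illustrative, not part of the commit):

```py
import torch
from PIL import Image

from transformers import DetrForObjectDetection, DetrImageProcessor

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")

# The default facebook/detr-resnet-50 checkpoint uses a timm ResNet backbone,
# so `timm` needs to be installed for this to load.
image_processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

# The image processor returns both pixel_values and a pixel_mask for DETR.
encoding = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)

# Post-processing is invoked on the image processor, exactly as in the tests.
results = image_processor.post_process_object_detection(
    outputs, threshold=0.3, target_sizes=[image.size[::-1]]
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"{model.config.id2label[label.item()]}: {score.item():.2f} at {box.tolist()}")
```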
View File

@ -367,16 +367,16 @@ class DinatModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
@require_torch
class DinatModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
def default_image_processor(self):
return AutoImageProcessor.from_pretrained("shi-labs/dinat-mini-in1k-224") if is_vision_available() else None
@slow
def test_inference_image_classification_head(self):
model = DinatForImageClassification.from_pretrained("shi-labs/dinat-mini-in1k-224").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():

View File

@ -25,7 +25,7 @@ if is_torch_available():
from transformers import AutoModelForImageClassification
if is_vision_available():
from transformers import AutoFeatureExtractor
from transformers import AutoImageProcessor
@require_torch
@ -33,7 +33,7 @@ if is_vision_available():
class DiTIntegrationTest(unittest.TestCase):
@slow
def test_for_image_classification(self):
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
image_processor = AutoImageProcessor.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
model = AutoModelForImageClassification.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
model.to(torch_device)
@ -43,7 +43,7 @@ class DiTIntegrationTest(unittest.TestCase):
image = dataset["train"][0]["image"].convert("RGB")
inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
inputs = image_processor(image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():

View File

@ -39,7 +39,7 @@ if is_torch_available():
if is_vision_available():
from PIL import Image
from transformers import DPTFeatureExtractor
from transformers import DPTImageProcessor
class DPTModelTester:
@ -293,11 +293,11 @@ def prepare_img():
@slow
class DPTModelIntegrationTest(unittest.TestCase):
def test_inference_depth_estimation(self):
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large")
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large").to(torch_device)
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():
@ -315,11 +315,11 @@ class DPTModelIntegrationTest(unittest.TestCase):
self.assertTrue(torch.allclose(outputs.predicted_depth[0, :3, :3], expected_slice, atol=1e-4))
def test_inference_semantic_segmentation(self):
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large-ade")
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade").to(torch_device)
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():
@ -336,11 +336,11 @@ class DPTModelIntegrationTest(unittest.TestCase):
self.assertTrue(torch.allclose(outputs.logits[0, 0, :3, :3], expected_slice, atol=1e-4))
def test_post_processing_semantic_segmentation(self):
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large-ade")
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade").to(torch_device)
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():
@ -348,10 +348,10 @@ class DPTModelIntegrationTest(unittest.TestCase):
outputs.logits = outputs.logits.detach().cpu()
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
expected_shape = torch.Size((500, 300))
self.assertEqual(segmentation[0].shape, expected_shape)
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs)
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs)
expected_shape = torch.Size((480, 480))
self.assertEqual(segmentation[0].shape, expected_shape)

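For DPT, semantic-segmentation post-processing likewise moves to the image processor. A minimal sketch with the `Intel/dpt-large-ade` checkpoint used above (the COCO fixture image and the explicit target size are assumptions for illustration):

```py
import torch
from PIL import Image

from transformers import DPTForSemanticSegmentation, DPTImageProcessor

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")

image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large-ade")
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade")

inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Resize the predicted class map back to the original (height, width).
segmentation = image_processor.post_process_semantic_segmentation(
    outputs=outputs, target_sizes=[image.size[::-1]]
)[0]
print(segmentation.shape)  # (height, width) tensor of ADE20K class indices
```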
View File

@ -39,7 +39,7 @@ if is_torch_available():
if is_vision_available():
from PIL import Image
from transformers import DPTFeatureExtractor
from transformers import DPTImageProcessor
class DPTModelTester:
@ -314,11 +314,11 @@ def prepare_img():
@slow
class DPTModelIntegrationTest(unittest.TestCase):
def test_inference_depth_estimation(self):
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-hybrid-midas")
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to(torch_device)
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():

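The same rename applies to depth estimation with the hybrid DPT checkpoint. A short sketch (the bicubic upsampling step is a common post-processing choice, not something this commit introduces):

```py
import torch
from PIL import Image

from transformers import DPTForDepthEstimation, DPTImageProcessor

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")

image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas")

inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    predicted_depth = model(**inputs).predicted_depth

# Upsample the depth map back to the input resolution (assumed, for visualization).
depth = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
).squeeze()
print(depth.shape)
```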
View File

@ -444,7 +444,7 @@ def prepare_img():
@require_vision
class EfficientFormerModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
def default_image_processor(self):
return (
EfficientFormerImageProcessor.from_pretrained("snap-research/efficientformer-l1-300")
if is_vision_available()
@ -457,9 +457,9 @@ class EfficientFormerModelIntegrationTest(unittest.TestCase):
torch_device
)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():
@ -478,9 +478,9 @@ class EfficientFormerModelIntegrationTest(unittest.TestCase):
"snap-research/efficientformer-l1-300"
).to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():

View File

@ -37,7 +37,7 @@ if is_torch_available():
if is_vision_available():
from PIL import Image
from transformers import GLPNFeatureExtractor
from transformers import GLPNImageProcessor
class GLPNConfigTester(ConfigTester):
@ -337,11 +337,11 @@ def prepare_img():
class GLPNModelIntegrationTest(unittest.TestCase):
@slow
def test_inference_depth_estimation(self):
feature_extractor = GLPNFeatureExtractor.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0])
image_processor = GLPNImageProcessor.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0])
model = GLPNForDepthEstimation.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():

View File

@ -49,7 +49,7 @@ if is_torch_available():
if is_vision_available():
from PIL import Image
from transformers import ImageGPTFeatureExtractor
from transformers import ImageGPTImageProcessor
class ImageGPTModelTester:
@ -535,16 +535,16 @@ def prepare_img():
@require_vision
class ImageGPTModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return ImageGPTFeatureExtractor.from_pretrained("openai/imagegpt-small") if is_vision_available() else None
def default_image_processor(self):
return ImageGPTImageProcessor.from_pretrained("openai/imagegpt-small") if is_vision_available() else None
@slow
def test_inference_causal_lm_head(self):
model = ImageGPTForCausalImageModeling.from_pretrained("openai/imagegpt-small").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():

View File

@ -45,7 +45,7 @@ if is_torch_available():
if is_vision_available():
from PIL import Image
from transformers import LayoutLMv3FeatureExtractor
from transformers import LayoutLMv3ImageProcessor
class LayoutLMv3ModelTester:
@ -382,16 +382,16 @@ def prepare_img():
@require_torch
class LayoutLMv3ModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return LayoutLMv3FeatureExtractor(apply_ocr=False) if is_vision_available() else None
def default_image_processor(self):
return LayoutLMv3ImageProcessor(apply_ocr=False) if is_vision_available() else None
@slow
def test_inference_no_head(self):
model = LayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base").to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values.to(torch_device)
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(torch_device)
input_ids = torch.tensor([[1, 2]])
bbox = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]]).unsqueeze(0)

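LayoutLMv3 follows the same pattern, with pixel values now produced by `LayoutLMv3ImageProcessor`. A sketch mirroring the toy inputs of the integration test above (the sample image path is the COCO fixture used elsewhere in these tests; any RGB document image would do):

```py
import torch
from PIL import Image

from transformers import LayoutLMv3ImageProcessor, LayoutLMv3Model

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")

# apply_ocr=False: words and boxes are supplied manually instead of via pytesseract.
image_processor = LayoutLMv3ImageProcessor(apply_ocr=False)
model = LayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")

pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

# Toy text inputs with the same shapes as in the test.
input_ids = torch.tensor([[1, 2]])
bbox = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]]).unsqueeze(0)

with torch.no_grad():
    outputs = model(input_ids=input_ids, bbox=bbox, pixel_values=pixel_values)
print(outputs.last_hidden_state.shape)
```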
View File

@ -51,7 +51,7 @@ if is_tf_available():
if is_vision_available():
from PIL import Image
from transformers import LayoutLMv3FeatureExtractor
from transformers import LayoutLMv3ImageProcessor
class TFLayoutLMv3ModelTester:
@ -482,16 +482,16 @@ def prepare_img():
@require_tf
class TFLayoutLMv3ModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return LayoutLMv3FeatureExtractor(apply_ocr=False) if is_vision_available() else None
def default_image_processor(self):
return LayoutLMv3ImageProcessor(apply_ocr=False) if is_vision_available() else None
@slow
def test_inference_no_head(self):
model = TFLayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
pixel_values = feature_extractor(images=image, return_tensors="tf").pixel_values
pixel_values = image_processor(images=image, return_tensors="tf").pixel_values
input_ids = tf.constant([[1, 2]])
bbox = tf.expand_dims(tf.constant([[1, 2, 3, 4], [5, 6, 7, 8]]), axis=0)

View File

@ -36,7 +36,7 @@ from transformers.utils import FEATURE_EXTRACTOR_NAME, cached_property, is_pytes
if is_pytesseract_available():
from PIL import Image
from transformers import LayoutLMv2FeatureExtractor, LayoutXLMProcessor
from transformers import LayoutLMv2ImageProcessor, LayoutXLMProcessor
@require_pytesseract
@ -47,7 +47,7 @@ class LayoutXLMProcessorTest(unittest.TestCase):
rust_tokenizer_class = LayoutXLMTokenizerFast
def setUp(self):
feature_extractor_map = {
image_processor_map = {
"do_resize": True,
"size": 224,
"apply_ocr": True,
@ -56,7 +56,7 @@ class LayoutXLMProcessorTest(unittest.TestCase):
self.tmpdirname = tempfile.mkdtemp()
self.feature_extraction_file = os.path.join(self.tmpdirname, FEATURE_EXTRACTOR_NAME)
with open(self.feature_extraction_file, "w", encoding="utf-8") as fp:
fp.write(json.dumps(feature_extractor_map) + "\n")
fp.write(json.dumps(image_processor_map) + "\n")
# taken from `test_tokenization_layoutxlm.LayoutXLMTokenizationTest.test_save_pretrained`
self.tokenizer_pretrained_name = "hf-internal-testing/tiny-random-layoutxlm"
@ -70,8 +70,8 @@ class LayoutXLMProcessorTest(unittest.TestCase):
def get_tokenizers(self, **kwargs) -> List[PreTrainedTokenizerBase]:
return [self.get_tokenizer(**kwargs), self.get_rust_tokenizer(**kwargs)]
def get_feature_extractor(self, **kwargs):
return LayoutLMv2FeatureExtractor.from_pretrained(self.tmpdirname, **kwargs)
def get_image_processor(self, **kwargs):
return LayoutLMv2ImageProcessor.from_pretrained(self.tmpdirname, **kwargs)
def tearDown(self):
shutil.rmtree(self.tmpdirname)
@ -88,10 +88,10 @@ class LayoutXLMProcessorTest(unittest.TestCase):
return image_inputs
def test_save_load_pretrained_default(self):
feature_extractor = self.get_feature_extractor()
image_processor = self.get_image_processor()
tokenizers = self.get_tokenizers()
for tokenizer in tokenizers:
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
processor.save_pretrained(self.tmpdirname)
processor = LayoutXLMProcessor.from_pretrained(self.tmpdirname)
@ -99,16 +99,16 @@ class LayoutXLMProcessorTest(unittest.TestCase):
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer.get_vocab())
self.assertIsInstance(processor.tokenizer, (LayoutXLMTokenizer, LayoutXLMTokenizerFast))
self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor.to_json_string())
self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
self.assertEqual(processor.image_processor.to_json_string(), image_processor.to_json_string())
self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
def test_save_load_pretrained_additional_features(self):
processor = LayoutXLMProcessor(feature_extractor=self.get_feature_extractor(), tokenizer=self.get_tokenizer())
processor = LayoutXLMProcessor(image_processor=self.get_image_processor(), tokenizer=self.get_tokenizer())
processor.save_pretrained(self.tmpdirname)
# slow tokenizer
tokenizer_add_kwargs = self.get_tokenizer(bos_token="(BOS)", eos_token="(EOS)")
feature_extractor_add_kwargs = self.get_feature_extractor(do_resize=False, size=30)
image_processor_add_kwargs = self.get_image_processor(do_resize=False, size=30)
processor = LayoutXLMProcessor.from_pretrained(
self.tmpdirname,
@ -122,12 +122,12 @@ class LayoutXLMProcessorTest(unittest.TestCase):
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer_add_kwargs.get_vocab())
self.assertIsInstance(processor.tokenizer, LayoutXLMTokenizer)
self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor_add_kwargs.to_json_string())
self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
self.assertEqual(processor.image_processor.to_json_string(), image_processor_add_kwargs.to_json_string())
self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
# fast tokenizer
tokenizer_add_kwargs = self.get_rust_tokenizer(bos_token="(BOS)", eos_token="(EOS)")
feature_extractor_add_kwargs = self.get_feature_extractor(do_resize=False, size=30)
image_processor_add_kwargs = self.get_image_processor(do_resize=False, size=30)
processor = LayoutXLMProcessor.from_pretrained(
self.tmpdirname, use_xlm=True, bos_token="(BOS)", eos_token="(EOS)", do_resize=False, size=30
@ -136,14 +136,14 @@ class LayoutXLMProcessorTest(unittest.TestCase):
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer_add_kwargs.get_vocab())
self.assertIsInstance(processor.tokenizer, LayoutXLMTokenizerFast)
self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor_add_kwargs.to_json_string())
self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
self.assertEqual(processor.image_processor.to_json_string(), image_processor_add_kwargs.to_json_string())
self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
def test_model_input_names(self):
feature_extractor = self.get_feature_extractor()
image_processor = self.get_image_processor()
tokenizer = self.get_tokenizer()
processor = LayoutXLMProcessor(tokenizer=tokenizer, feature_extractor=feature_extractor)
processor = LayoutXLMProcessor(tokenizer=tokenizer, image_processor=image_processor)
input_str = "lower newer"
image_input = self.prepare_image_inputs()
@ -215,15 +215,15 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
def test_processor_case_1(self):
# case 1: document image classification (training, inference) + token classification (inference), apply_ocr = True
feature_extractor = LayoutLMv2FeatureExtractor()
image_processor = LayoutLMv2ImageProcessor()
tokenizers = self.get_tokenizers
images = self.get_images
for tokenizer in tokenizers:
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
# not batched
input_feat_extract = feature_extractor(images[0], return_tensors="pt")
input_feat_extract = image_processor(images[0], return_tensors="pt")
input_processor = processor(images[0], return_tensors="pt")
# verify keys
@ -245,7 +245,7 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
self.assertSequenceEqual(decoding, expected_decoding)
# batched
input_feat_extract = feature_extractor(images, return_tensors="pt")
input_feat_extract = image_processor(images, return_tensors="pt")
input_processor = processor(images, padding=True, return_tensors="pt")
# verify keys
@ -270,12 +270,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
def test_processor_case_2(self):
# case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
tokenizers = self.get_tokenizers
images = self.get_images
for tokenizer in tokenizers:
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
# not batched
words = ["hello", "world"]
@ -324,12 +324,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
def test_processor_case_3(self):
# case 3: token classification (training), apply_ocr=False
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
tokenizers = self.get_tokenizers
images = self.get_images
for tokenizer in tokenizers:
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
# not batched
words = ["weirdly", "world"]
@ -389,12 +389,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
def test_processor_case_4(self):
# case 4: visual question answering (inference), apply_ocr=True
feature_extractor = LayoutLMv2FeatureExtractor()
image_processor = LayoutLMv2ImageProcessor()
tokenizers = self.get_tokenizers
images = self.get_images
for tokenizer in tokenizers:
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
# not batched
question = "What's his name?"
@ -440,12 +440,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
def test_processor_case_5(self):
# case 5: visual question answering (inference), apply_ocr=False
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
tokenizers = self.get_tokenizers
images = self.get_images
for tokenizer in tokenizers:
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
# not batched
question = "What's his name?"

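On the processor side, `LayoutXLMProcessor` is now constructed with an `image_processor` argument instead of `feature_extractor`, and exposes the component as `processor.image_processor`. A minimal sketch of the construction and save/load round trip exercised by the tests above (the tokenizer checkpoint is the tiny test fixture referenced in this diff; a temporary directory stands in for the test's `tmpdirname`):

```py
import tempfile

from transformers import LayoutLMv2ImageProcessor, LayoutXLMProcessor, LayoutXLMTokenizerFast

# Build the processor from its two components using the renamed keyword argument.
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
tokenizer = LayoutXLMTokenizerFast.from_pretrained("hf-internal-testing/tiny-random-layoutxlm")
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)

# A save/load round trip preserves the image processor configuration.
with tempfile.TemporaryDirectory() as tmpdirname:
    processor.save_pretrained(tmpdirname)
    reloaded = LayoutXLMProcessor.from_pretrained(tmpdirname)

    # The component is reachable under the image_processor attribute.
    assert isinstance(reloaded.image_processor, LayoutLMv2ImageProcessor)
    print(reloaded.image_processor.to_json_string())
```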
View File

@ -46,7 +46,7 @@ if is_torch_available():
if is_vision_available():
from PIL import Image
from transformers import LevitFeatureExtractor
from transformers import LevitImageProcessor
class LevitConfigTester(ConfigTester):
@ -409,8 +409,8 @@ def prepare_img():
@require_vision
class LevitModelIntegrationTest(unittest.TestCase):
@cached_property
def default_feature_extractor(self):
return LevitFeatureExtractor.from_pretrained(LEVIT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
def default_image_processor(self):
return LevitImageProcessor.from_pretrained(LEVIT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
@slow
def test_inference_image_classification_head(self):
@ -418,9 +418,9 @@ class LevitModelIntegrationTest(unittest.TestCase):
torch_device
)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():

View File

@ -545,9 +545,9 @@ class Mask2FormerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
self.assertEqual(segmentation[0].shape, target_sizes[0])
def test_post_process_instance_segmentation(self):
feature_extractor = self.image_processing_class(num_labels=self.image_processor_tester.num_classes)
image_processor = self.image_processing_class(num_labels=self.image_processor_tester.num_classes)
outputs = self.image_processor_tester.get_fake_mask2former_outputs()
segmentation = feature_extractor.post_process_instance_segmentation(outputs, threshold=0)
segmentation = image_processor.post_process_instance_segmentation(outputs, threshold=0)
self.assertTrue(len(segmentation) == self.image_processor_tester.batch_size)
for el in segmentation:
@ -556,7 +556,7 @@ class Mask2FormerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
self.assertEqual(type(el["segments_info"]), list)
self.assertEqual(el["segmentation"].shape, (384, 384))
segmentation = feature_extractor.post_process_instance_segmentation(
segmentation = image_processor.post_process_instance_segmentation(
outputs, threshold=0, return_binary_maps=True
)

View File

@ -325,14 +325,14 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
return "facebook/mask2former-swin-small-coco-instance"
@cached_property
def default_feature_extractor(self):
def default_image_processor(self):
return Mask2FormerImageProcessor.from_pretrained(self.model_checkpoints) if is_vision_available() else None
def test_inference_no_head(self):
model = Mask2FormerModel.from_pretrained(self.model_checkpoints).to(torch_device)
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
inputs = image_processor(image, return_tensors="pt").to(torch_device)
inputs_shape = inputs["pixel_values"].shape
# check size is divisible by 32
self.assertTrue((inputs_shape[-1] % 32) == 0 and (inputs_shape[-2] % 32) == 0)
@ -371,9 +371,9 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
def test_inference_universal_segmentation_head(self):
model = Mask2FormerForUniversalSegmentation.from_pretrained(self.model_checkpoints).to(torch_device).eval()
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
image = prepare_img()
inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
inputs = image_processor(image, return_tensors="pt").to(torch_device)
inputs_shape = inputs["pixel_values"].shape
# check size is divisible by 32
self.assertTrue((inputs_shape[-1] % 32) == 0 and (inputs_shape[-2] % 32) == 0)
@ -408,9 +408,9 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
def test_with_segmentation_maps_and_loss(self):
model = Mask2FormerForUniversalSegmentation.from_pretrained(self.model_checkpoints).to(torch_device).eval()
feature_extractor = self.default_feature_extractor
image_processor = self.default_image_processor
inputs = feature_extractor(
inputs = image_processor(
[np.zeros((3, 800, 1333)), np.zeros((3, 800, 1333))],
segmentation_maps=[np.zeros((384, 384)).astype(np.float32), np.zeros((384, 384)).astype(np.float32)],
return_tensors="pt",

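Finally, the Mask2Former hunks rename the post-processing variable in the image-processing tests and the cached property in the modeling tests. A hedged end-to-end sketch with the `facebook/mask2former-swin-small-coco-instance` checkpoint referenced above (the 0.5 threshold and the label printout are illustrative; the image-processing test uses a threshold of 0):

```py
import torch
from PIL import Image

from transformers import Mask2FormerForUniversalSegmentation, Mask2FormerImageProcessor

image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")

checkpoint = "facebook/mask2former-swin-small-coco-instance"
image_processor = Mask2FormerImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

inputs = image_processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process into a per-pixel instance map plus per-segment metadata.
result = image_processor.post_process_instance_segmentation(
    outputs, threshold=0.5, target_sizes=[image.size[::-1]]
)[0]
print(result["segmentation"].shape)
for segment in result["segments_info"]:
    print(segment["id"], model.config.id2label[segment["label_id"]], segment["score"])
```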
Some files were not shown because too many files have changed in this diff.