Mirror of https://github.com/huggingface/transformers.git, synced 2025-07-31 02:02:21 +06:00
Update old existing feature extractor references (#24552)
* Update old existing feature extractor references
* Typo
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* Address comments from review - update 'feature extractor'

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
parent 10c2ac7bc6
commit ae454f41d4
@@ -354,12 +354,12 @@ Als Nächstes sehen Sie sich das Bild mit dem Merkmal 🤗 Datensätze [Bild] (h
 ### Merkmalsextraktor
-Laden Sie den Merkmalsextraktor mit [`AutoFeatureExtractor.from_pretrained`]:
+Laden Sie den Merkmalsextraktor mit [`AutoImageProcessor.from_pretrained`]:
 ```py
->>> from transformers import AutoFeatureExtractor
+>>> from transformers import AutoImageProcessor
->>> feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
+>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
 ```
 ### Datenerweiterung
@@ -371,9 +371,9 @@ Bei Bildverarbeitungsaufgaben ist es üblich, den Bildern als Teil der Vorverarb
 ```py
 >>> from torchvision.transforms import Compose, Normalize, RandomResizedCrop, ColorJitter, ToTensor
->>> normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
+>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
 >>> _transforms = Compose(
-... [RandomResizedCrop(feature_extractor.size), ColorJitter(brightness=0.5, hue=0.5), ToTensor(), normalize]
+... [RandomResizedCrop(image_processor.size["height"]), ColorJitter(brightness=0.5, hue=0.5), ToTensor(), normalize]
 ... )
 ```
@@ -263,7 +263,7 @@ To use, create an image processor associated with the model you're using. For ex
 ViTImageProcessor {
   "do_normalize": true,
   "do_resize": true,
-  "feature_extractor_type": "ViTImageProcessor",
+  "image_processor_type": "ViTImageProcessor",
   "image_mean": [
     0.5,
     0.5,

@@ -295,7 +295,7 @@ Modify any of the [`ViTImageProcessor`] parameters to create your custom image p
 ViTImageProcessor {
   "do_normalize": false,
   "do_resize": true,
-  "feature_extractor_type": "ViTImageProcessor",
+  "image_processor_type": "ViTImageProcessor",
   "image_mean": [
     0.3,
     0.3,
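For reference, the custom-parameter flow this doc excerpt refers to looks roughly like the sketch below (an illustration, not part of the diff; the parameter values simply mirror the config dumps above):

```python
from transformers import ViTImageProcessor

# Default processor, then one with overridden parameters (as in the modified
# config shown in the excerpt). Printing either displays the config, which now
# carries an `image_processor_type` entry instead of `feature_extractor_type`.
image_processor = ViTImageProcessor()
custom_processor = ViTImageProcessor(
    do_normalize=False,
    image_mean=[0.3, 0.3, 0.3],
    image_std=[0.3, 0.3, 0.3],
)
print(custom_processor)
```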
@@ -50,10 +50,10 @@ product between the projected image and text features is then used as a similar
 To feed images to the Transformer encoder, each image is split into a sequence of fixed-size non-overlapping patches,
 which are then linearly embedded. A [CLS] token is added to serve as representation of an entire image. The authors
 also add absolute position embeddings, and feed the resulting sequence of vectors to a standard Transformer encoder.
-The [`CLIPFeatureExtractor`] can be used to resize (or rescale) and normalize images for the model.
+The [`CLIPImageProcessor`] can be used to resize (or rescale) and normalize images for the model.
 The [`CLIPTokenizer`] is used to encode the text. The [`CLIPProcessor`] wraps
-[`CLIPFeatureExtractor`] and [`CLIPTokenizer`] into a single instance to both
+[`CLIPImageProcessor`] and [`CLIPTokenizer`] into a single instance to both
 encode the text and prepare the images. The following example shows how to get the image-text similarity scores using
 [`CLIPProcessor`] and [`CLIPModel`].
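The example referenced in that excerpt lives in the CLIP docs; a rough sketch of the usage (checkpoint name and image URL are illustrative) is:

```python
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The processor tokenizes the text and preprocesses the image in one call.
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True
)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)      # label probabilities
```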
@@ -46,9 +46,9 @@ Tips:
 Donut's [`VisionEncoderDecoder`] model accepts images as input and makes use of
 [`~generation.GenerationMixin.generate`] to autoregressively generate text given the input image.
-The [`DonutFeatureExtractor`] class is responsible for preprocessing the input image and
+The [`DonutImageProcessor`] class is responsible for preprocessing the input image and
 [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`] decodes the generated target tokens to the target string. The
-[`DonutProcessor`] wraps [`DonutFeatureExtractor`] and [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]
+[`DonutProcessor`] wraps [`DonutImageProcessor`] and [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]
 into a single instance to both extract the input features and decode the predicted token ids.
 - Step-by-step Document Image Classification
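A hedged sketch of the Donut flow described above (document image classification; the checkpoint, task prompt, and test dataset follow the publicly documented RVL-CDIP fine-tune and are used here only for illustration):

```python
from datasets import load_dataset
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")

# DonutProcessor wraps DonutImageProcessor (pixel values) and an XLM-RoBERTa tokenizer (text).
dataset = load_dataset("hf-internal-testing/example-documents", split="test")
image = dataset[0]["image"]
pixel_values = processor(image, return_tensors="pt").pixel_values

task_prompt = "<s_rvlcdip>"
decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt").input_ids

outputs = model.generate(pixel_values, decoder_input_ids=decoder_input_ids, max_length=512)
print(processor.batch_decode(outputs)[0])  # generated tokens, e.g. the predicted document class
```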
@@ -150,23 +150,23 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 ## Usage: LayoutLMv2Processor
 The easiest way to prepare data for the model is to use [`LayoutLMv2Processor`], which internally
-combines a feature extractor ([`LayoutLMv2FeatureExtractor`]) and a tokenizer
-([`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]). The feature extractor
+combines an image processor ([`LayoutLMv2ImageProcessor`]) and a tokenizer
+([`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]). The image processor
 handles the image modality, while the tokenizer handles the text modality. A processor combines both, which is ideal
 for a multi-modal model like LayoutLMv2. Note that you can still use both separately, if you only want to handle one
 modality.
 ```python
-from transformers import LayoutLMv2FeatureExtractor, LayoutLMv2TokenizerFast, LayoutLMv2Processor
+from transformers import LayoutLMv2ImageProcessor, LayoutLMv2TokenizerFast, LayoutLMv2Processor
-feature_extractor = LayoutLMv2FeatureExtractor() # apply_ocr is set to True by default
+image_processor = LayoutLMv2ImageProcessor() # apply_ocr is set to True by default
 tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
-processor = LayoutLMv2Processor(feature_extractor, tokenizer)
+processor = LayoutLMv2Processor(image_processor, tokenizer)
 ```
 In short, one can provide a document image (and possibly additional data) to [`LayoutLMv2Processor`],
 and it will create the inputs expected by the model. Internally, the processor first uses
-[`LayoutLMv2FeatureExtractor`] to apply OCR on the image to get a list of words and normalized
+[`LayoutLMv2ImageProcessor`] to apply OCR on the image to get a list of words and normalized
 bounding boxes, as well to resize the image to a given size in order to get the `image` input. The words and
 normalized bounding boxes are then provided to [`LayoutLMv2Tokenizer`] or
 [`LayoutLMv2TokenizerFast`], which converts them to token-level `input_ids`,

@@ -176,7 +176,7 @@ which are turned into token-level `labels`.
 [`LayoutLMv2Processor`] uses [PyTesseract](https://pypi.org/project/pytesseract/), a Python
 wrapper around Google's Tesseract OCR engine, under the hood. Note that you can still use your own OCR engine of
 choice, and provide the words and normalized boxes yourself. This requires initializing
-[`LayoutLMv2FeatureExtractor`] with `apply_ocr` set to `False`.
+[`LayoutLMv2ImageProcessor`] with `apply_ocr` set to `False`.
 In total, there are 5 use cases that are supported by the processor. Below, we list them all. Note that each of these
 use cases work for both batched and non-batched inputs (we illustrate them for non-batched inputs).

@@ -184,7 +184,7 @@ use cases work for both batched and non-batched inputs (we illustrate them for n
 **Use case 1: document image classification (training, inference) + token classification (inference), apply_ocr =
 True**
-This is the simplest case, in which the processor (actually the feature extractor) will perform OCR on the image to get
+This is the simplest case, in which the processor (actually the image processor) will perform OCR on the image to get
 the words and normalized bounding boxes.
 ```python

@@ -205,7 +205,7 @@ print(encoding.keys())
 **Use case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False**
-In case one wants to do OCR themselves, one can initialize the feature extractor with `apply_ocr` set to
+In case one wants to do OCR themselves, one can initialize the image processor with `apply_ocr` set to
 `False`. In that case, one should provide the words and corresponding (normalized) bounding boxes themselves to
 the processor.
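A minimal sketch of use case 2 above (`apply_ocr=False`, words and boxes supplied by the caller); the image path, words, and box values are placeholders:

```python
from PIL import Image
from transformers import LayoutLMv2ImageProcessor, LayoutLMv2Processor, LayoutLMv2TokenizerFast

image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)  # OCR happens outside the processor
tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
processor = LayoutLMv2Processor(image_processor, tokenizer)

image = Image.open("document.png").convert("RGB")      # illustrative path
words = ["hello", "world"]                              # words from your own OCR engine
boxes = [[637, 773, 693, 782], [698, 773, 733, 782]]    # boxes normalized to a 0-1000 scale
encoding = processor(image, words, boxes=boxes, return_tensors="pt")
print(encoding.keys())  # input_ids, token_type_ids, attention_mask, bbox, image
```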
@@ -31,7 +31,7 @@ Tips:
 - In terms of data processing, LayoutLMv3 is identical to its predecessor [LayoutLMv2](layoutlmv2), except that:
   - images need to be resized and normalized with channels in regular RGB format. LayoutLMv2 on the other hand normalizes the images internally and expects the channels in BGR format.
   - text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.
-Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3FeatureExtractor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model.
+Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3ImageProcessor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model.
 - Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-layoutlmv2processor) of its predecessor.
 - Demo notebooks for LayoutLMv3 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv3).
 - Demo scripts can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/layoutlmv3).

@@ -52,7 +52,7 @@ tokenizer = LayoutXLMTokenizer.from_pretrained("microsoft/layoutxlm-base")
 ```
 Similar to LayoutLMv2, you can use [`LayoutXLMProcessor`] (which internally applies
-[`LayoutLMv2FeatureExtractor`] and
+[`LayoutLMv2ImageProcessor`] and
 [`LayoutXLMTokenizer`]/[`LayoutXLMTokenizerFast`] in sequence) to prepare all
 data for the model.

@@ -28,7 +28,7 @@ The abstract from the paper is the following:
 OWL-ViT is a zero-shot text-conditioned object detection model. OWL-ViT uses [CLIP](clip) as its multi-modal backbone, with a ViT-like Transformer to get visual features and a causal language model to get the text features. To use CLIP for detection, OWL-ViT removes the final token pooling layer of the vision model and attaches a lightweight classification and box head to each transformer output token. Open-vocabulary classification is enabled by replacing the fixed classification layer weights with the class-name embeddings obtained from the text model. The authors first train CLIP from scratch and fine-tune it end-to-end with the classification and box heads on standard detection datasets using a bipartite matching loss. One or multiple text queries per image can be used to perform zero-shot text-conditioned object detection.
-[`OwlViTFeatureExtractor`] can be used to resize (or rescale) and normalize images for the model and [`CLIPTokenizer`] is used to encode the text. [`OwlViTProcessor`] wraps [`OwlViTFeatureExtractor`] and [`CLIPTokenizer`] into a single instance to both encode the text and prepare the images. The following example shows how to perform object detection using [`OwlViTProcessor`] and [`OwlViTForObjectDetection`].
+[`OwlViTImageProcessor`] can be used to resize (or rescale) and normalize images for the model and [`CLIPTokenizer`] is used to encode the text. [`OwlViTProcessor`] wraps [`OwlViTImageProcessor`] and [`CLIPTokenizer`] into a single instance to both encode the text and prepare the images. The following example shows how to perform object detection using [`OwlViTProcessor`] and [`OwlViTForObjectDetection`].
 ```python
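A rough sketch of the detection example referenced in that excerpt (checkpoint and text queries are illustrative; `post_process_object_detection` on the processor is assumed to be available, as in current releases):

```python
import requests
import torch
from PIL import Image
from transformers import OwlViTForObjectDetection, OwlViTProcessor

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = [["a photo of a cat", "a photo of a dog"]]  # one list of queries per image

inputs = processor(text=texts, images=image, return_tensors="pt")
outputs = model(**inputs)

# Convert raw outputs to boxes, scores and labels in the original image size.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs=outputs, target_sizes=target_sizes, threshold=0.1)
for score, label, box in zip(results[0]["scores"], results[0]["labels"], results[0]["boxes"]):
    print(f"{texts[0][label.item()]}: {score:.2f} at {box.tolist()}")
```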
@@ -39,7 +39,7 @@ Tips:
 - The quickest way to get started with ViLT is by checking the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ViLT)
   (which showcase both inference and fine-tuning on custom data).
 - ViLT is a model that takes both `pixel_values` and `input_ids` as input. One can use [`ViltProcessor`] to prepare data for the model.
-  This processor wraps a feature extractor (for the image modality) and a tokenizer (for the language modality) into one.
+  This processor wraps an image processor (for the image modality) and a tokenizer (for the language modality) into one.
 - ViLT is trained with images of various sizes: the authors resize the shorter edge of input images to 384 and limit the longer edge to
   under 640 while preserving the aspect ratio. To make batching of images possible, the authors use a `pixel_mask` that indicates
   which pixel values are real and which are padding. [`ViltProcessor`] automatically creates this for you.
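A short sketch of preparing `pixel_values` and `input_ids` with `ViltProcessor` as described above (the VQA checkpoint, image URL, and question are illustrative):

```python
import requests
from PIL import Image
from transformers import ViltForQuestionAnswering, ViltProcessor

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = "How many cats are there?"

# The processor produces pixel_values, pixel_mask and the tokenized text in one call.
encoding = processor(image, text, return_tensors="pt")
outputs = model(**encoding)
idx = outputs.logits.argmax(-1).item()
print("Predicted answer:", model.config.id2label[idx])
```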
@@ -462,9 +462,9 @@ Next, prepare an instance of a `CocoDetection` class that can be used with `coco
 >>> class CocoDetection(torchvision.datasets.CocoDetection):
-...     def __init__(self, img_folder, feature_extractor, ann_file):
+...     def __init__(self, img_folder, image_processor, ann_file):
 ...         super().__init__(img_folder, ann_file)
-...         self.feature_extractor = feature_extractor
+...         self.image_processor = image_processor
 ...     def __getitem__(self, idx):
 ...         # read in PIL image and target in COCO format

@@ -474,7 +474,7 @@ Next, prepare an instance of a `CocoDetection` class that can be used with `coco
 ...         # resizing + normalization of both image and target)
 ...         image_id = self.ids[idx]
 ...         target = {"image_id": image_id, "annotations": target}
-...         encoding = self.feature_extractor(images=img, annotations=target, return_tensors="pt")
+...         encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
 ...         pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension
 ...         target = encoding["labels"][0] # remove batch dimension

@@ -591,4 +591,3 @@ Let's plot the result:
 <div class="flex justify-center">
 <img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
 </div>
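To connect the renamed argument to actual usage, a sketch of instantiating the `CocoDetection` wrapper defined in the excerpt above (checkpoint and paths are placeholders):

```python
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50")

# Paths are placeholders; point them at your COCO-format images and annotation file.
train_dataset = CocoDetection(
    img_folder="path/to/train/images",
    image_processor=image_processor,
    ann_file="path/to/train/annotations.json",
)
pixel_values, target = train_dataset[0]
print(pixel_values.shape, target.keys())
```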
@@ -73,12 +73,12 @@ Cada clase de alimento - o label - corresponde a un número; `79` indica una cos
 ## Preprocesa
-Carga el feature extractor de ViT para procesar la imagen en un tensor:
+Carga el image processor de ViT para procesar la imagen en un tensor:
 ```py
->>> from transformers import AutoFeatureExtractor
+>>> from transformers import AutoImageProcessor
->>> feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")
+>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
 ```
 Aplica varias transformaciones de imagen al dataset para hacer el modelo más robusto contra el overfitting. En este caso se utilizará el módulo [`transforms`](https://pytorch.org/vision/stable/transforms.html) de torchvision. Recorta una parte aleatoria de la imagen, cambia su tamaño y normalízala con la media y la desviación estándar de la imagen:

@@ -86,8 +86,8 @@ Aplica varias transformaciones de imagen al dataset para hacer el modelo más ro
 ```py
 >>> from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor
->>> normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
->>> _transforms = Compose([RandomResizedCrop(feature_extractor.size), ToTensor(), normalize])
+>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
+>>> _transforms = Compose([RandomResizedCrop(image_processor.size["height"]), ToTensor(), normalize])
 ```
 Crea una función de preprocesamiento que aplique las transformaciones y devuelva los `pixel_values` - los inputs al modelo - de la imagen:
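A sketch of the preprocessing function the guide describes at this point (it assumes the `food` dataset and the `_transforms` pipeline from the surrounding excerpt):

```python
def transforms(examples):
    # Apply the torchvision pipeline and return pixel_values, the model inputs.
    examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]]
    del examples["image"]
    return examples

food = food.with_transform(transforms)
```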
@@ -160,7 +160,7 @@ Al llegar a este punto, solo quedan tres pasos:
 ... data_collator=data_collator,
 ... train_dataset=food["train"],
 ... eval_dataset=food["test"],
-... tokenizer=feature_extractor,
+... tokenizer=image_processor,
 ... )
 >>> trainer.train()
@@ -454,9 +454,9 @@ COCO 데이터 세트를 빌드하는 API는 데이터를 특정 형식으로
 >>> class CocoDetection(torchvision.datasets.CocoDetection):
-...     def __init__(self, img_folder, feature_extractor, ann_file):
+...     def __init__(self, img_folder, image_processor, ann_file):
 ...         super().__init__(img_folder, ann_file)
-...         self.feature_extractor = feature_extractor
+...         self.image_processor = image_processor
 ...     def __getitem__(self, idx):
 ...         # read in PIL image and target in COCO format

@@ -466,7 +466,7 @@ COCO 데이터 세트를 빌드하는 API는 데이터를 특정 형식으로
 ...         # resizing + normalization of both image and target)
 ...         image_id = self.ids[idx]
 ...         target = {"image_id": image_id, "annotations": target}
-...         encoding = self.feature_extractor(images=img, annotations=target, return_tensors="pt")
+...         encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
 ...         pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension
 ...         target = encoding["labels"][0] # remove batch dimension

@@ -586,4 +586,3 @@ Detected Mask with confidence 0.584 at location [2449.06, 823.19, 3256.43, 1413.
 <div class="flex justify-center">
 <img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
 </div>
|
@ -354,12 +354,12 @@ def convert_align_checkpoint(checkpoint_path, pytorch_dump_folder_path, save_mod
|
||||
# Create folder to save model
|
||||
if not os.path.isdir(pytorch_dump_folder_path):
|
||||
os.mkdir(pytorch_dump_folder_path)
|
||||
# Save converted model and feature extractor
|
||||
# Save converted model and image processor
|
||||
hf_model.save_pretrained(pytorch_dump_folder_path)
|
||||
processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
# Push model and feature extractor to hub
|
||||
# Push model and image processor to hub
|
||||
print("Pushing converted ALIGN to the hub...")
|
||||
processor.push_to_hub("align-base")
|
||||
hf_model.push_to_hub("align-base")
|
||||
@ -381,7 +381,7 @@ if __name__ == "__main__":
|
||||
help="Path to the output PyTorch model directory.",
|
||||
)
|
||||
parser.add_argument("--save_model", action="store_true", help="Save model to local")
|
||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
|
||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
|
||||
|
||||
args = parser.parse_args()
|
||||
convert_align_checkpoint(args.checkpoint_path, args.pytorch_dump_folder_path, args.save_model, args.push_to_hub)
|
||||
|
@@ -27,10 +27,10 @@ from PIL import Image
 from transformers import (
     BeitConfig,
-    BeitFeatureExtractor,
     BeitForImageClassification,
     BeitForMaskedImageModeling,
     BeitForSemanticSegmentation,
+    BeitImageProcessor,
 )
 from transformers.image_utils import PILImageResampling
 from transformers.utils import logging

@@ -266,16 +266,16 @@ def convert_beit_checkpoint(checkpoint_url, pytorch_dump_folder_path):
 # Check outputs on an image
 if is_semantic:
-    feature_extractor = BeitFeatureExtractor(size=config.image_size, do_center_crop=False)
+    image_processor = BeitImageProcessor(size=config.image_size, do_center_crop=False)
     ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
     image = Image.open(ds[0]["file"])
 else:
-    feature_extractor = BeitFeatureExtractor(
+    image_processor = BeitImageProcessor(
         size=config.image_size, resample=PILImageResampling.BILINEAR, do_center_crop=False
     )
     image = prepare_img()
-encoding = feature_extractor(images=image, return_tensors="pt")
+encoding = image_processor(images=image, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 outputs = model(pixel_values)

@@ -353,8 +353,8 @@ def convert_beit_checkpoint(checkpoint_url, pytorch_dump_folder_path):
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if __name__ == "__main__":
@@ -468,7 +468,7 @@ class ChineseCLIPOnnxConfig(OnnxConfig):
 processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
 )
 image_input_dict = super().generate_dummy_inputs(
-    processor.feature_extractor, batch_size=batch_size, framework=framework
+    processor.image_processor, batch_size=batch_size, framework=framework
 )
 return {**text_input_dict, **image_input_dict}

@@ -449,7 +449,7 @@ class CLIPOnnxConfig(OnnxConfig):
 processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
 )
 image_input_dict = super().generate_dummy_inputs(
-    processor.feature_extractor, batch_size=batch_size, framework=framework
+    processor.image_processor, batch_size=batch_size, framework=framework
 )
 return {**text_input_dict, **image_input_dict}
@@ -28,7 +28,7 @@ from transformers import (
     CLIPSegTextConfig,
     CLIPSegVisionConfig,
     CLIPTokenizer,
-    ViTFeatureExtractor,
+    ViTImageProcessor,
 )

@@ -185,9 +185,9 @@ def convert_clipseg_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_
 if unexpected_keys != ["decoder.reduce.weight", "decoder.reduce.bias"]:
     raise ValueError(f"Unexpected keys: {unexpected_keys}")
-feature_extractor = ViTFeatureExtractor(size=352)
+image_processor = ViTImageProcessor(size=352)
 tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
-processor = CLIPSegProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
+processor = CLIPSegProcessor(image_processor=image_processor, tokenizer=tokenizer)
 image = prepare_img()
 text = ["a glass", "something to fill", "wood", "a jar"]
@@ -27,9 +27,9 @@ from PIL import Image
 from transformers import (
     ConditionalDetrConfig,
-    ConditionalDetrFeatureExtractor,
     ConditionalDetrForObjectDetection,
     ConditionalDetrForSegmentation,
+    ConditionalDetrImageProcessor,
 )
 from transformers.utils import logging

@@ -244,13 +244,13 @@ def convert_conditional_detr_checkpoint(model_name, pytorch_dump_folder_path):
 config.id2label = id2label
 config.label2id = {v: k for k, v in id2label.items()}
-# load feature extractor
+# load image processor
 format = "coco_panoptic" if is_panoptic else "coco_detection"
-feature_extractor = ConditionalDetrFeatureExtractor(format=format)
+image_processor = ConditionalDetrImageProcessor(format=format)
 # prepare image
 img = prepare_img()
-encoding = feature_extractor(images=img, return_tensors="pt")
+encoding = image_processor(images=img, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 logger.info(f"Converting model {model_name}...")

@@ -302,11 +302,11 @@ def convert_conditional_detr_checkpoint(model_name, pytorch_dump_folder_path):
 if is_panoptic:
     assert torch.allclose(outputs.pred_masks, original_outputs["pred_masks"], atol=1e-4)
-# Save model and feature extractor
-logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
+# Save model and image processor
+logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 model.save_pretrained(pytorch_dump_folder_path)
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if __name__ == "__main__":
@@ -26,7 +26,7 @@ import torch
 from huggingface_hub import hf_hub_download
 from PIL import Image
-from transformers import ConvNextConfig, ConvNextFeatureExtractor, ConvNextForImageClassification
+from transformers import ConvNextConfig, ConvNextForImageClassification, ConvNextImageProcessor
 from transformers.utils import logging

@@ -144,10 +144,10 @@ def convert_convnext_checkpoint(checkpoint_url, pytorch_dump_folder_path):
 model.load_state_dict(state_dict)
 model.eval()
-# Check outputs on an image, prepared by ConvNextFeatureExtractor
+# Check outputs on an image, prepared by ConvNextImageProcessor
 size = 224 if "224" in checkpoint_url else 384
-feature_extractor = ConvNextFeatureExtractor(size=size)
-pixel_values = feature_extractor(images=prepare_img(), return_tensors="pt").pixel_values
+image_processor = ConvNextImageProcessor(size=size)
+pixel_values = image_processor(images=prepare_img(), return_tensors="pt").pixel_values
 logits = model(pixel_values).logits

@@ -191,8 +191,8 @@ def convert_convnext_checkpoint(checkpoint_url, pytorch_dump_folder_path):
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 print("Pushing model to the hub...")
 model_name = "convnext"
@@ -24,7 +24,7 @@ from collections import OrderedDict
 import torch
 from huggingface_hub import cached_download, hf_hub_url
-from transformers import AutoFeatureExtractor, CvtConfig, CvtForImageClassification
+from transformers import AutoImageProcessor, CvtConfig, CvtForImageClassification
 def embeddings(idx):

@@ -307,8 +307,8 @@ def convert_cvt_checkpoint(cvt_model, image_size, cvt_file_name, pytorch_dump_fo
 config.embed_dim = [192, 768, 1024]
 model = CvtForImageClassification(config)
-feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
-feature_extractor.size["shortest_edge"] = image_size
+image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
+image_processor.size["shortest_edge"] = image_size
 original_weights = torch.load(cvt_file_name, map_location=torch.device("cpu"))
 huggingface_weights = OrderedDict()

@@ -329,7 +329,7 @@ def convert_cvt_checkpoint(cvt_model, image_size, cvt_file_name, pytorch_dump_fo
 model.load_state_dict(huggingface_weights)
 model.save_pretrained(pytorch_dump_folder)
-feature_extractor.save_pretrained(pytorch_dump_folder)
+image_processor.save_pretrained(pytorch_dump_folder)
 # Download the weights from zoo: https://1drv.ms/u/s!AhIXJn_J-blW9RzF3rMW7SsLHa8h?e=blQ0Al
@@ -8,7 +8,7 @@ from PIL import Image
 from timm.models import create_model
 from transformers import (
-    BeitFeatureExtractor,
+    BeitImageProcessor,
     Data2VecVisionConfig,
     Data2VecVisionForImageClassification,
     Data2VecVisionModel,

@@ -304,9 +304,9 @@ def main():
 orig_model.eval()
 # 3. Forward Beit model
-feature_extractor = BeitFeatureExtractor(size=config.image_size, do_center_crop=False)
+image_processor = BeitImageProcessor(size=config.image_size, do_center_crop=False)
 image = Image.open("../../../../tests/fixtures/tests_samples/COCO/000000039769.png")
-encoding = feature_extractor(images=image, return_tensors="pt")
+encoding = image_processor(images=image, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 orig_args = (pixel_values,) if is_finetuned else (pixel_values, None)

@@ -354,7 +354,7 @@ def main():
 # 7. Save
 print(f"Saving to {args.hf_checkpoint_name}")
 hf_model.save_pretrained(args.hf_checkpoint_name)
-feature_extractor.save_pretrained(args.hf_checkpoint_name)
+image_processor.save_pretrained(args.hf_checkpoint_name)
 if __name__ == "__main__":
@@ -24,7 +24,7 @@ import torch
 from huggingface_hub import cached_download, hf_hub_url
 from PIL import Image
-from transformers import DeformableDetrConfig, DeformableDetrFeatureExtractor, DeformableDetrForObjectDetection
+from transformers import DeformableDetrConfig, DeformableDetrForObjectDetection, DeformableDetrImageProcessor
 from transformers.utils import logging

@@ -115,12 +115,12 @@ def convert_deformable_detr_checkpoint(
 config.id2label = id2label
 config.label2id = {v: k for k, v in id2label.items()}
-# load feature extractor
-feature_extractor = DeformableDetrFeatureExtractor(format="coco_detection")
+# load image processor
+image_processor = DeformableDetrImageProcessor(format="coco_detection")
 # prepare image
 img = prepare_img()
-encoding = feature_extractor(images=img, return_tensors="pt")
+encoding = image_processor(images=img, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 logger.info("Converting model...")

@@ -185,11 +185,11 @@ def convert_deformable_detr_checkpoint(
 print("Everything ok!")
-# Save model and feature extractor
-logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
+# Save model and image processor
+logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 model.save_pretrained(pytorch_dump_folder_path)
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+image_processor.save_pretrained(pytorch_dump_folder_path)
 # Push to hub
 if push_to_hub:
@@ -25,7 +25,7 @@ import torch
 from huggingface_hub import hf_hub_download
 from PIL import Image
-from transformers import DeiTConfig, DeiTFeatureExtractor, DeiTForImageClassificationWithTeacher
+from transformers import DeiTConfig, DeiTForImageClassificationWithTeacher, DeiTImageProcessor
 from transformers.utils import logging

@@ -182,12 +182,12 @@ def convert_deit_checkpoint(deit_name, pytorch_dump_folder_path):
 model = DeiTForImageClassificationWithTeacher(config).eval()
 model.load_state_dict(state_dict)
-# Check outputs on an image, prepared by DeiTFeatureExtractor
+# Check outputs on an image, prepared by DeiTImageProcessor
 size = int(
     (256 / 224) * config.image_size
 ) # to maintain same ratio w.r.t. 224 images, see https://github.com/facebookresearch/deit/blob/ab5715372db8c6cad5740714b2216d55aeae052e/datasets.py#L103
-feature_extractor = DeiTFeatureExtractor(size=size, crop_size=config.image_size)
-encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
+image_processor = DeiTImageProcessor(size=size, crop_size=config.image_size)
+encoding = image_processor(images=prepare_img(), return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 outputs = model(pixel_values)

@@ -198,8 +198,8 @@ def convert_deit_checkpoint(deit_name, pytorch_dump_folder_path):
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model {deit_name} to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if __name__ == "__main__":
@@ -25,7 +25,7 @@ import torch
 from huggingface_hub import hf_hub_download
 from PIL import Image
-from transformers import DetrConfig, DetrFeatureExtractor, DetrForObjectDetection, DetrForSegmentation
+from transformers import DetrConfig, DetrForObjectDetection, DetrForSegmentation, DetrImageProcessor
 from transformers.utils import logging

@@ -201,13 +201,13 @@ def convert_detr_checkpoint(model_name, pytorch_dump_folder_path):
 config.id2label = id2label
 config.label2id = {v: k for k, v in id2label.items()}
-# load feature extractor
+# load image processor
 format = "coco_panoptic" if is_panoptic else "coco_detection"
-feature_extractor = DetrFeatureExtractor(format=format)
+image_processor = DetrImageProcessor(format=format)
 # prepare image
 img = prepare_img()
-encoding = feature_extractor(images=img, return_tensors="pt")
+encoding = image_processor(images=img, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 logger.info(f"Converting model {model_name}...")

@@ -258,11 +258,11 @@ def convert_detr_checkpoint(model_name, pytorch_dump_folder_path):
 if is_panoptic:
     assert torch.allclose(outputs.pred_masks, original_outputs["pred_masks"], atol=1e-4)
-# Save model and feature extractor
-logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
+# Save model and image processor
+logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 model.save_pretrained(pytorch_dump_folder_path)
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if __name__ == "__main__":
@@ -1341,8 +1341,7 @@ class DetrImageProcessor(BaseImageProcessor):
 Args:
     results (`List[Dict]`):
-        Results list obtained by [`~DetrFeatureExtractor.post_process`], to which "masks" results will be
-        added.
+        Results list obtained by [`~DetrImageProcessor.post_process`], to which "masks" results will be added.
     outputs ([`DetrSegmentationOutput`]):
         Raw outputs of the model.
     orig_target_sizes (`torch.Tensor` of shape `(batch_size, 2)`):
@@ -24,7 +24,7 @@ import torch
 from huggingface_hub import hf_hub_download
 from PIL import Image
-from transformers import BeitConfig, BeitFeatureExtractor, BeitForImageClassification, BeitForMaskedImageModeling
+from transformers import BeitConfig, BeitForImageClassification, BeitForMaskedImageModeling, BeitImageProcessor
 from transformers.image_utils import PILImageResampling
 from transformers.utils import logging

@@ -171,12 +171,12 @@ def convert_dit_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 model.load_state_dict(state_dict)
 # Check outputs on an image
-feature_extractor = BeitFeatureExtractor(
+image_processor = BeitImageProcessor(
     size=config.image_size, resample=PILImageResampling.BILINEAR, do_center_crop=False
 )
 image = prepare_img()
-encoding = feature_extractor(images=image, return_tensors="pt")
+encoding = image_processor(images=image, return_tensors="pt")
 pixel_values = encoding["pixel_values"]
 outputs = model(pixel_values)

@@ -189,18 +189,18 @@ def convert_dit_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if push_to_hub:
     if has_lm_head:
         model_name = "dit-base" if "base" in checkpoint_url else "dit-large"
     else:
         model_name = "dit-base-finetuned-rvlcdip" if "dit-b" in checkpoint_url else "dit-large-finetuned-rvlcdip"
-    feature_extractor.push_to_hub(
+    image_processor.push_to_hub(
         repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
         organization="nielsr",
-        commit_message="Add feature extractor",
+        commit_message="Add image processor",
         use_temp_dir=True,
     )
     model.push_to_hub(
@@ -21,7 +21,7 @@ from datasets import load_dataset
 from donut import DonutModel
 from transformers import (
-    DonutFeatureExtractor,
+    DonutImageProcessor,
     DonutProcessor,
     DonutSwinConfig,
     DonutSwinModel,

@@ -152,10 +152,10 @@ def convert_donut_checkpoint(model_name, pytorch_dump_folder_path=None, push_to_
 image = dataset["test"][0]["image"].convert("RGB")
 tokenizer = XLMRobertaTokenizerFast.from_pretrained(model_name, from_slow=True)
-feature_extractor = DonutFeatureExtractor(
+image_processor = DonutImageProcessor(
     do_align_long_axis=original_model.config.align_long_axis, size=original_model.config.input_size[::-1]
 )
-processor = DonutProcessor(feature_extractor, tokenizer)
+processor = DonutProcessor(image_processor, tokenizer)
 pixel_values = processor(image, return_tensors="pt").pixel_values
 if model_name == "naver-clova-ix/donut-base-finetuned-docvqa":
@@ -24,7 +24,7 @@ import torch
 from huggingface_hub import cached_download, hf_hub_url
 from PIL import Image
-from transformers import DPTConfig, DPTFeatureExtractor, DPTForDepthEstimation, DPTForSemanticSegmentation
+from transformers import DPTConfig, DPTForDepthEstimation, DPTForSemanticSegmentation, DPTImageProcessor
 from transformers.utils import logging

@@ -244,10 +244,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 # Check outputs on an image
 size = 480 if "ade" in checkpoint_url else 384
-feature_extractor = DPTFeatureExtractor(size=size)
+image_processor = DPTImageProcessor(size=size)
 image = prepare_img()
-encoding = feature_extractor(image, return_tensors="pt")
+encoding = image_processor(image, return_tensors="pt")
 # forward pass
 outputs = model(**encoding).logits if "ade" in checkpoint_url else model(**encoding).predicted_depth

@@ -271,12 +271,12 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if push_to_hub:
     model.push_to_hub("ybelkada/dpt-hybrid-midas")
-    feature_extractor.push_to_hub("ybelkada/dpt-hybrid-midas")
+    image_processor.push_to_hub("ybelkada/dpt-hybrid-midas")
 if __name__ == "__main__":
@@ -24,7 +24,7 @@ import torch
 from huggingface_hub import cached_download, hf_hub_url
 from PIL import Image
-from transformers import DPTConfig, DPTFeatureExtractor, DPTForDepthEstimation, DPTForSemanticSegmentation
+from transformers import DPTConfig, DPTForDepthEstimation, DPTForSemanticSegmentation, DPTImageProcessor
 from transformers.utils import logging

@@ -211,10 +211,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 # Check outputs on an image
 size = 480 if "ade" in checkpoint_url else 384
-feature_extractor = DPTFeatureExtractor(size=size)
+image_processor = DPTImageProcessor(size=size)
 image = prepare_img()
-encoding = feature_extractor(image, return_tensors="pt")
+encoding = image_processor(image, return_tensors="pt")
 # forward pass
 outputs = model(**encoding).logits if "ade" in checkpoint_url else model(**encoding).predicted_depth

@@ -233,8 +233,8 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
 print(f"Saving model to {pytorch_dump_folder_path}")
 model.save_pretrained(pytorch_dump_folder_path)
-print(f"Saving feature extractor to {pytorch_dump_folder_path}")
-feature_extractor.save_pretrained(pytorch_dump_folder_path)
+print(f"Saving image processor to {pytorch_dump_folder_path}")
+image_processor.save_pretrained(pytorch_dump_folder_path)
 if push_to_hub:
     print("Pushing model to hub...")

@@ -244,10 +244,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
 commit_message="Add model",
 use_temp_dir=True,
 )
-feature_extractor.push_to_hub(
+image_processor.push_to_hub(
     repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
     organization="nielsr",
-    commit_message="Add feature extractor",
+    commit_message="Add image processor",
     use_temp_dir=True,
 )
@@ -208,7 +208,7 @@ def convert_efficientformer_checkpoint(
 )
 processor.push_to_hub(
     repo_id=f"Bearnardd/{pytorch_dump_path}",
-    commit_message="Add feature extractor",
+    commit_message="Add image processor",
     use_temp_dir=True,
 )

@@ -234,12 +234,12 @@ if __name__ == "__main__":
 "--pytorch_dump_path", default=None, type=str, required=True, help="Path to the output PyTorch model."
 )
-parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
+parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
 parser.add_argument(
     "--no-push_to_hub",
     dest="push_to_hub",
     action="store_false",
-    help="Do not push model and feature extractor to the hub",
+    help="Do not push model and image processor to the hub",
 )
 parser.set_defaults(push_to_hub=True)
@@ -537,8 +537,8 @@ EFFICIENTFORMER_START_DOCSTRING = r"""
 EFFICIENTFORMER_INPUTS_DOCSTRING = r"""
     Args:
         pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
-            Pixel values. Pixel values can be obtained using [`ViTFeatureExtractor`]. See
-            [`ViTFeatureExtractor.__call__`] for details.
+            Pixel values. Pixel values can be obtained using [`ViTImageProcessor`]. See
+            [`ViTImageProcessor.preprocess`] for details.
        output_attentions (`bool`, *optional*):
            Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
            tensors for more detail.
@@ -305,12 +305,12 @@ def convert_efficientnet_checkpoint(model_name, pytorch_dump_folder_path, save_m
 # Create folder to save model
 if not os.path.isdir(pytorch_dump_folder_path):
     os.mkdir(pytorch_dump_folder_path)
-# Save converted model and feature extractor
+# Save converted model and image processor
 hf_model.save_pretrained(pytorch_dump_folder_path)
 preprocessor.save_pretrained(pytorch_dump_folder_path)
 if push_to_hub:
-    # Push model and feature extractor to hub
+    # Push model and image processor to hub
     print(f"Pushing converted {model_name} to the hub...")
     model_name = f"efficientnet-{model_name}"
     preprocessor.push_to_hub(model_name)

@@ -333,7 +333,7 @@ if __name__ == "__main__":
 help="Path to the output PyTorch model directory.",
 )
 parser.add_argument("--save_model", action="store_true", help="Save model to local")
-parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
+parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
 args = parser.parse_args()
 convert_efficientnet_checkpoint(args.model_name, args.pytorch_dump_folder_path, args.save_model, args.push_to_hub)
@@ -23,7 +23,7 @@ import requests
 import torch
 from PIL import Image
-from transformers import GLPNConfig, GLPNFeatureExtractor, GLPNForDepthEstimation
+from transformers import GLPNConfig, GLPNForDepthEstimation, GLPNImageProcessor
 from transformers.utils import logging

@@ -131,12 +131,12 @@ def convert_glpn_checkpoint(checkpoint_path, pytorch_dump_folder_path, push_to_h
 # load GLPN configuration (Segformer-B4 size)
 config = GLPNConfig(hidden_sizes=[64, 128, 320, 512], decoder_hidden_size=64, depths=[3, 8, 27, 3])
-# load feature extractor (only resize + rescale)
-feature_extractor = GLPNFeatureExtractor()
+# load image processor (only resize + rescale)
+image_processor = GLPNImageProcessor()
 # prepare image
 image = prepare_img()
-pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
+pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
 logger.info("Converting model...")

@@ -179,17 +179,17 @@ def convert_glpn_checkpoint(checkpoint_path, pytorch_dump_folder_path, push_to_h
 # finally, push to hub if required
 if push_to_hub:
-    logger.info("Pushing model and feature extractor to the hub...")
+    logger.info("Pushing model and image processor to the hub...")
     model.push_to_hub(
         repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
         organization="nielsr",
         commit_message="Add model",
         use_temp_dir=True,
     )
-    feature_extractor.push_to_hub(
+    image_processor.push_to_hub(
         repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
         organization="nielsr",
-        commit_message="Add feature extractor",
+        commit_message="Add image processor",
         use_temp_dir=True,
     )
@@ -458,7 +458,7 @@ class GroupViTOnnxConfig(OnnxConfig):
 processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
 )
 image_input_dict = super().generate_dummy_inputs(
-    processor.feature_extractor, batch_size=batch_size, framework=framework
+    processor.image_processor, batch_size=batch_size, framework=framework
 )
 return {**text_input_dict, **image_input_dict}

@@ -81,7 +81,7 @@ class ImageGPTImageProcessor(BaseImageProcessor):
 def __init__(
     self,
-    # clusters is a first argument to maintain backwards compatibility with the old ImageGPTFeatureExtractor
+    # clusters is a first argument to maintain backwards compatibility with the old ImageGPTImageProcessor
     clusters: Optional[Union[List[List[int]], np.ndarray]] = None,
     do_resize: bool = True,
     size: Dict[str, int] = None,

@@ -260,7 +260,7 @@ class LayoutLMv3OnnxConfig(OnnxConfig):
 """
 # A dummy image is used so OCR should not be applied
-setattr(processor.feature_extractor, "apply_ocr", False)
+setattr(processor.image_processor, "apply_ocr", False)
 # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
 batch_size = compute_effective_axis_dimension(
@@ -15,6 +15,7 @@
 """
 Processor class for LayoutXLM.
 """
+import warnings
 from typing import List, Optional, Union
 from ...processing_utils import ProcessorMixin

@@ -24,26 +25,45 @@ from ...utils import TensorType
 class LayoutXLMProcessor(ProcessorMixin):
     r"""
-    Constructs a LayoutXLM processor which combines a LayoutXLM feature extractor and a LayoutXLM tokenizer into a
-    single processor.
+    Constructs a LayoutXLM processor which combines a LayoutXLM image processor and a LayoutXLM tokenizer into a single
+    processor.
     [`LayoutXLMProcessor`] offers all the functionalities you need to prepare data for the model.
-    It first uses [`LayoutLMv2FeatureExtractor`] to resize document images to a fixed size, and optionally applies OCR
-    to get words and normalized bounding boxes. These are then provided to [`LayoutXLMTokenizer`] or
+    It first uses [`LayoutLMv2ImageProcessor`] to resize document images to a fixed size, and optionally applies OCR to
+    get words and normalized bounding boxes. These are then provided to [`LayoutXLMTokenizer`] or
     [`LayoutXLMTokenizerFast`], which turns the words and bounding boxes into token-level `input_ids`,
     `attention_mask`, `token_type_ids`, `bbox`. Optionally, one can provide integer `word_labels`, which are turned
     into token-level `labels` for token classification tasks (such as FUNSD, CORD).
     Args:
-        feature_extractor (`LayoutLMv2FeatureExtractor`):
-            An instance of [`LayoutLMv2FeatureExtractor`]. The feature extractor is a required input.
+        image_processor (`LayoutLMv2ImageProcessor`):
+            An instance of [`LayoutLMv2ImageProcessor`]. The image processor is a required input.
         tokenizer (`LayoutXLMTokenizer` or `LayoutXLMTokenizerFast`):
             An instance of [`LayoutXLMTokenizer`] or [`LayoutXLMTokenizerFast`]. The tokenizer is a required input.
     """
-    feature_extractor_class = "LayoutLMv2FeatureExtractor"
+    attributes = ["image_processor", "tokenizer"]
+    image_processor_class = "LayoutLMv2ImageProcessor"
     tokenizer_class = ("LayoutXLMTokenizer", "LayoutXLMTokenizerFast")
+    def __init__(self, image_processor=None, tokenizer=None, **kwargs):
+        if "feature_extractor" in kwargs:
+            warnings.warn(
+                "The `feature_extractor` argument is deprecated and will be removed in v5, use `image_processor`"
+                " instead.",
+                FutureWarning,
+            )
+            feature_extractor = kwargs.pop("feature_extractor")
+        image_processor = image_processor if image_processor is not None else feature_extractor
+        if image_processor is None:
+            raise ValueError("You need to specify an `image_processor`.")
+        if tokenizer is None:
+            raise ValueError("You need to specify a `tokenizer`.")
+        super().__init__(image_processor, tokenizer)
     def __call__(
         self,
         images,

@@ -68,37 +88,37 @@ class LayoutXLMProcessor(ProcessorMixin):
         **kwargs,
     ) -> BatchEncoding:
         """
-        This method first forwards the `images` argument to [`~LayoutLMv2FeatureExtractor.__call__`]. In case
-        [`LayoutLMv2FeatureExtractor`] was initialized with `apply_ocr` set to `True`, it passes the obtained words and
+        This method first forwards the `images` argument to [`~LayoutLMv2ImageProcessor.__call__`]. In case
+        [`LayoutLMv2ImageProcessor`] was initialized with `apply_ocr` set to `True`, it passes the obtained words and
         bounding boxes along with the additional arguments to [`~LayoutXLMTokenizer.__call__`] and returns the output,
-        together with resized `images`. In case [`LayoutLMv2FeatureExtractor`] was initialized with `apply_ocr` set to
+        together with resized `images`. In case [`LayoutLMv2ImageProcessor`] was initialized with `apply_ocr` set to
         `False`, it passes the words (`text`/`text_pair`) and `boxes` specified by the user along with the additional
         arguments to [`~LayoutXLMTokenizer.__call__`] and returns the output, together with resized `images`.
         Please refer to the docstring of the above two methods for more information.
         """
         # verify input
-        if self.feature_extractor.apply_ocr and (boxes is not None):
+        if self.image_processor.apply_ocr and (boxes is not None):
             raise ValueError(
                 "You cannot provide bounding boxes "
-                "if you initialized the feature extractor with apply_ocr set to True."
+                "if you initialized the image processor with apply_ocr set to True."
             )
-        if self.feature_extractor.apply_ocr and (word_labels is not None):
+        if self.image_processor.apply_ocr and (word_labels is not None):
             raise ValueError(
-                "You cannot provide word labels if you initialized the feature extractor with apply_ocr set to True."
+                "You cannot provide word labels if you initialized the image processor with apply_ocr set to True."
             )
         if return_overflowing_tokens is True and return_offsets_mapping is False:
             raise ValueError("You cannot return overflowing tokens without returning the offsets mapping.")
-        # first, apply the feature extractor
-        features = self.feature_extractor(images=images, return_tensors=return_tensors)
+        # first, apply the image processor
+        features = self.image_processor(images=images, return_tensors=return_tensors)
         # second, apply the tokenizer
-        if text is not None and self.feature_extractor.apply_ocr and text_pair is None:
+        if text is not None and self.image_processor.apply_ocr and text_pair is None:
             if isinstance(text, str):
-                text = [text] # add batch dimension (as the feature extractor always adds a batch dimension)
+                text = [text] # add batch dimension (as the image processor always adds a batch dimension)
             text_pair = features["words"]
         encoded_inputs = self.tokenizer(

@@ -162,3 +182,19 @@ class LayoutXLMProcessor(ProcessorMixin):
     @property
     def model_input_names(self):
         return ["input_ids", "bbox", "attention_mask", "image"]
+    @property
+    def feature_extractor_class(self):
+        warnings.warn(
+            "`feature_extractor_class` is deprecated and will be removed in v5. Use `image_processor_class` instead.",
+            FutureWarning,
+        )
+        return self.image_processor_class
+    @property
+    def feature_extractor(self):
+        warnings.warn(
+            "`feature_extractor` is deprecated and will be removed in v5. Use `image_processor` instead.",
+            FutureWarning,
+        )
+        return self.image_processor
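A sketch of constructing the processor with the renamed argument, mirroring the new `__init__` above (checkpoint and image path are illustrative):

```python
from PIL import Image
from transformers import LayoutLMv2ImageProcessor, LayoutXLMProcessor, LayoutXLMTokenizerFast

image_processor = LayoutLMv2ImageProcessor()  # apply_ocr=True by default
tokenizer = LayoutXLMTokenizerFast.from_pretrained("microsoft/layoutxlm-base")

# New-style keyword; passing feature_extractor=... still works but raises a FutureWarning.
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)

image = Image.open("document.png").convert("RGB")  # illustrative path
encoding = processor(image, return_tensors="pt")
print(encoding.keys())
```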
@ -25,7 +25,7 @@ import timm
|
||||
import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
|
||||
from transformers import LevitConfig, LevitFeatureExtractor, LevitForImageClassificationWithTeacher
|
||||
from transformers import LevitConfig, LevitForImageClassificationWithTeacher, LevitImageProcessor
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -74,8 +74,8 @@ def convert_weight_and_push(
|
||||
|
||||
if push_to_hub:
|
||||
our_model.save_pretrained(save_directory / checkpoint_name)
|
||||
feature_extractor = LevitFeatureExtractor()
|
||||
feature_extractor.save_pretrained(save_directory / checkpoint_name)
|
||||
image_processor = LevitImageProcessor()
|
||||
image_processor.save_pretrained(save_directory / checkpoint_name)
|
||||
|
||||
print(f"Pushed {checkpoint_name}")
|
||||
|
||||
@ -167,12 +167,12 @@ if __name__ == "__main__":
|
||||
required=False,
|
||||
help="Path to the output PyTorch model directory.",
|
||||
)
|
||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
|
||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
|
||||
parser.add_argument(
|
||||
"--no-push_to_hub",
|
||||
dest="push_to_hub",
|
||||
action="store_false",
|
||||
help="Do not push model and feature extractor to the hub",
|
||||
help="Do not push model and image processor to the hub",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
@ -192,7 +192,7 @@ class OriginalMask2FormerConfigToOursConverter:
|
||||
return config
|
||||
|
||||
|
||||
class OriginalMask2FormerConfigToFeatureExtractorConverter:
|
||||
class OriginalMask2FormerConfigToImageProcessorConverter:
|
||||
def __call__(self, original_config: object) -> Mask2FormerImageProcessor:
|
||||
model = original_config.MODEL
|
||||
model_input = original_config.INPUT
|
||||
@ -846,7 +846,7 @@ class OriginalMask2FormerCheckpointToOursConverter:
|
||||
def test(
|
||||
original_model,
|
||||
our_model: Mask2FormerForUniversalSegmentation,
|
||||
feature_extractor: Mask2FormerImageProcessor,
|
||||
image_processor: Mask2FormerImageProcessor,
|
||||
tolerance: float,
|
||||
):
|
||||
with torch.no_grad():
|
||||
@ -854,7 +854,7 @@ def test(
|
||||
our_model = our_model.eval()
|
||||
|
||||
im = prepare_img()
|
||||
x = feature_extractor(images=im, return_tensors="pt")["pixel_values"]
|
||||
x = image_processor(images=im, return_tensors="pt")["pixel_values"]
|
||||
|
||||
original_model_backbone_features = original_model.backbone(x.clone())
|
||||
our_model_output: Mask2FormerModelOutput = our_model.model(x.clone(), output_hidden_states=True)
|
||||
@ -979,10 +979,10 @@ if __name__ == "__main__":
|
||||
checkpoints_dir, config_dir
|
||||
):
|
||||
model_name = get_model_name(checkpoint_file)
|
||||
feature_extractor = OriginalMask2FormerConfigToFeatureExtractorConverter()(
|
||||
image_processor = OriginalMask2FormerConfigToImageProcessorConverter()(
|
||||
setup_cfg(Args(config_file=config_file))
|
||||
)
|
||||
feature_extractor.size = {"height": 384, "width": 384}
|
||||
image_processor.size = {"height": 384, "width": 384}
|
||||
|
||||
original_config = setup_cfg(Args(config_file=config_file))
|
||||
mask2former_kwargs = OriginalMask2Former.from_config(original_config)
|
||||
@ -1012,8 +1012,8 @@ if __name__ == "__main__":
|
||||
tolerance = 3e-1
|
||||
|
||||
logger.info(f"🪄 Testing {model_name}...")
|
||||
test(original_model, mask2former_for_segmentation, feature_extractor, tolerance)
|
||||
test(original_model, mask2former_for_segmentation, image_processor, tolerance)
|
||||
logger.info(f"🪄 Pushing {model_name} to hub...")
|
||||
|
||||
feature_extractor.push_to_hub(model_name)
|
||||
image_processor.push_to_hub(model_name)
|
||||
mask2former_for_segmentation.push_to_hub(model_name)
|
||||
|
@ -2106,8 +2106,8 @@ MASK2FORMER_START_DOCSTRING = r"""
|
||||
MASK2FORMER_INPUTS_DOCSTRING = r"""
Args:
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
Pixel values. Pixel values can be obtained using [`AutoFeatureExtractor`]. See
[`AutoFeatureExtractor.__call__`] for details.
Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See
[`AutoImageProcessor.preprocess`] for details.
pixel_mask (`torch.LongTensor` of shape `(batch_size, height, width)`, *optional*):
Mask to avoid performing attention on padding pixel values. Mask values selected in `[0, 1]`:
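A short usage sketch matching the docstring above; the checkpoint and test image are examples only, not part of this change.

```py
import requests
from PIL import Image

from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

checkpoint = "facebook/mask2former-swin-tiny-coco-instance"  # example checkpoint
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The image processor produces `pixel_values` (and `pixel_mask` when padding is needed).
inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)
```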
|
||||
|
||||
|
@ -29,7 +29,7 @@ from detectron2.projects.deeplab import add_deeplab_config
|
||||
from PIL import Image
|
||||
from torch import Tensor, nn
|
||||
|
||||
from transformers.models.maskformer.feature_extraction_maskformer import MaskFormerFeatureExtractor
|
||||
from transformers.models.maskformer.feature_extraction_maskformer import MaskFormerImageProcessor
|
||||
from transformers.models.maskformer.modeling_maskformer import (
|
||||
MaskFormerConfig,
|
||||
MaskFormerForInstanceSegmentation,
|
||||
@ -164,13 +164,13 @@ class OriginalMaskFormerConfigToOursConverter:
|
||||
return config
|
||||
|
||||
|
||||
class OriginalMaskFormerConfigToFeatureExtractorConverter:
|
||||
def __call__(self, original_config: object) -> MaskFormerFeatureExtractor:
|
||||
class OriginalMaskFormerConfigToImageProcessorConverter:
|
||||
def __call__(self, original_config: object) -> MaskFormerImageProcessor:
|
||||
model = original_config.MODEL
|
||||
model_input = original_config.INPUT
|
||||
dataset_catalog = MetadataCatalog.get(original_config.DATASETS.TEST[0])
|
||||
|
||||
return MaskFormerFeatureExtractor(
|
||||
return MaskFormerImageProcessor(
|
||||
image_mean=(torch.tensor(model.PIXEL_MEAN) / 255).tolist(),
|
||||
image_std=(torch.tensor(model.PIXEL_STD) / 255).tolist(),
|
||||
size=model_input.MIN_SIZE_TEST,
|
||||
@ -554,7 +554,7 @@ class OriginalMaskFormerCheckpointToOursConverter:
|
||||
yield config, checkpoint
|
||||
|
||||
|
||||
def test(original_model, our_model: MaskFormerForInstanceSegmentation, feature_extractor: MaskFormerFeatureExtractor):
|
||||
def test(original_model, our_model: MaskFormerForInstanceSegmentation, image_processor: MaskFormerImageProcessor):
|
||||
with torch.no_grad():
|
||||
original_model = original_model.eval()
|
||||
our_model = our_model.eval()
|
||||
@ -600,7 +600,7 @@ def test(original_model, our_model: MaskFormerForInstanceSegmentation, feature_e
|
||||
|
||||
our_model_out: MaskFormerForInstanceSegmentationOutput = our_model(x)
|
||||
|
||||
our_segmentation = feature_extractor.post_process_segmentation(our_model_out, target_size=(384, 384))
|
||||
our_segmentation = image_processor.post_process_segmentation(our_model_out, target_size=(384, 384))
|
||||
|
||||
assert torch.allclose(
|
||||
original_segmentation, our_segmentation, atol=1e-3
|
||||
@ -686,9 +686,7 @@ if __name__ == "__main__":
|
||||
for config_file, checkpoint_file in OriginalMaskFormerCheckpointToOursConverter.using_dirs(
|
||||
checkpoints_dir, config_dir
|
||||
):
|
||||
feature_extractor = OriginalMaskFormerConfigToFeatureExtractorConverter()(
|
||||
setup_cfg(Args(config_file=config_file))
|
||||
)
|
||||
image_processor = OriginalMaskFormerConfigToImageProcessorConverter()(setup_cfg(Args(config_file=config_file)))
|
||||
|
||||
original_config = setup_cfg(Args(config_file=config_file))
|
||||
mask_former_kwargs = OriginalMaskFormer.from_config(original_config)
|
||||
@ -712,15 +710,15 @@ if __name__ == "__main__":
|
||||
mask_former_for_instance_segmentation
|
||||
)
|
||||
|
||||
test(original_model, mask_former_for_instance_segmentation, feature_extractor)
|
||||
test(original_model, mask_former_for_instance_segmentation, image_processor)
|
||||
|
||||
model_name = get_name(checkpoint_file)
|
||||
logger.info(f"🪄 Saving {model_name}")
|
||||
|
||||
feature_extractor.save_pretrained(save_directory / model_name)
|
||||
image_processor.save_pretrained(save_directory / model_name)
|
||||
mask_former_for_instance_segmentation.save_pretrained(save_directory / model_name)
|
||||
|
||||
feature_extractor.push_to_hub(
|
||||
image_processor.push_to_hub(
|
||||
repo_path_or_name=save_directory / model_name,
|
||||
commit_message="Add model",
|
||||
use_temp_dir=True,
|
||||
|
@ -26,7 +26,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import MaskFormerConfig, MaskFormerFeatureExtractor, MaskFormerForInstanceSegmentation, ResNetConfig
|
||||
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation, MaskFormerImageProcessor, ResNetConfig
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -297,9 +297,9 @@ def convert_maskformer_checkpoint(
|
||||
else:
|
||||
ignore_index = 255
|
||||
reduce_labels = True if "ade" in model_name else False
|
||||
feature_extractor = MaskFormerFeatureExtractor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
||||
image_processor = MaskFormerImageProcessor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
||||
|
||||
inputs = feature_extractor(image, return_tensors="pt")
|
||||
inputs = image_processor(image, return_tensors="pt")
|
||||
|
||||
outputs = model(**inputs)
|
||||
|
||||
@ -340,15 +340,15 @@ def convert_maskformer_checkpoint(
|
||||
print("Looks ok!")
|
||||
|
||||
if pytorch_dump_folder_path is not None:
|
||||
print(f"Saving model and feature extractor of {model_name} to {pytorch_dump_folder_path}")
|
||||
print(f"Saving model and image processor of {model_name} to {pytorch_dump_folder_path}")
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
print(f"Pushing model and feature extractor of {model_name} to the hub...")
|
||||
print(f"Pushing model and image processor of {model_name} to the hub...")
|
||||
model.push_to_hub(f"facebook/{model_name}")
|
||||
feature_extractor.push_to_hub(f"facebook/{model_name}")
|
||||
image_processor.push_to_hub(f"facebook/{model_name}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -26,7 +26,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import MaskFormerConfig, MaskFormerFeatureExtractor, MaskFormerForInstanceSegmentation, SwinConfig
|
||||
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation, MaskFormerImageProcessor, SwinConfig
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -278,9 +278,9 @@ def convert_maskformer_checkpoint(
|
||||
else:
|
||||
ignore_index = 255
|
||||
reduce_labels = True if "ade" in model_name else False
|
||||
feature_extractor = MaskFormerFeatureExtractor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
||||
image_processor = MaskFormerImageProcessor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
||||
|
||||
inputs = feature_extractor(image, return_tensors="pt")
|
||||
inputs = image_processor(image, return_tensors="pt")
|
||||
|
||||
outputs = model(**inputs)
|
||||
|
||||
@ -294,15 +294,15 @@ def convert_maskformer_checkpoint(
|
||||
print("Looks ok!")
|
||||
|
||||
if pytorch_dump_folder_path is not None:
|
||||
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
|
||||
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
print("Pushing model and feature extractor to the hub...")
|
||||
print("Pushing model and image processor to the hub...")
|
||||
model.push_to_hub(f"nielsr/{model_name}")
|
||||
feature_extractor.push_to_hub(f"nielsr/{model_name}")
|
||||
image_processor.push_to_hub(f"nielsr/{model_name}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -27,8 +27,8 @@ from PIL import Image
|
||||
|
||||
from transformers import (
|
||||
MobileNetV1Config,
|
||||
MobileNetV1FeatureExtractor,
|
||||
MobileNetV1ForImageClassification,
|
||||
MobileNetV1ImageProcessor,
|
||||
load_tf_weights_in_mobilenet_v1,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
@ -83,12 +83,12 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
# Load weights from TensorFlow checkpoint
|
||||
load_tf_weights_in_mobilenet_v1(model, config, checkpoint_path)
|
||||
|
||||
# Check outputs on an image, prepared by MobileNetV1FeatureExtractor
|
||||
feature_extractor = MobileNetV1FeatureExtractor(
|
||||
# Check outputs on an image, prepared by MobileNetV1ImageProcessor
|
||||
image_processor = MobileNetV1ImageProcessor(
|
||||
crop_size={"width": config.image_size, "height": config.image_size},
|
||||
size={"shortest_edge": config.image_size + 32},
|
||||
)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
outputs = model(**encoding)
|
||||
logits = outputs.logits
|
||||
|
||||
@ -107,13 +107,13 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
print("Pushing to the hub...")
|
||||
repo_id = "google/" + model_name
|
||||
feature_extractor.push_to_hub(repo_id)
|
||||
image_processor.push_to_hub(repo_id)
|
||||
model.push_to_hub(repo_id)
|
||||
|
||||
|
||||
|
@ -99,11 +99,11 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
load_tf_weights_in_mobilenet_v2(model, config, checkpoint_path)
|
||||
|
||||
# Check outputs on an image, prepared by MobileNetV2ImageProcessor
|
||||
feature_extractor = MobileNetV2ImageProcessor(
|
||||
image_processor = MobileNetV2ImageProcessor(
|
||||
crop_size={"width": config.image_size, "height": config.image_size},
|
||||
size={"shortest_edge": config.image_size + 32},
|
||||
)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
outputs = model(**encoding)
|
||||
logits = outputs.logits
|
||||
|
||||
@ -143,13 +143,13 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
print("Pushing to the hub...")
|
||||
repo_id = "google/" + model_name
|
||||
feature_extractor.push_to_hub(repo_id)
|
||||
image_processor.push_to_hub(repo_id)
|
||||
model.push_to_hub(repo_id)
|
||||
|
||||
|
||||
|
@ -26,9 +26,9 @@ from PIL import Image
|
||||
|
||||
from transformers import (
|
||||
MobileViTConfig,
|
||||
MobileViTFeatureExtractor,
|
||||
MobileViTForImageClassification,
|
||||
MobileViTForSemanticSegmentation,
|
||||
MobileViTImageProcessor,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
|
||||
@ -211,9 +211,9 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
|
||||
new_state_dict = convert_state_dict(state_dict, model)
|
||||
model.load_state_dict(new_state_dict)
|
||||
|
||||
# Check outputs on an image, prepared by MobileViTFeatureExtractor
|
||||
feature_extractor = MobileViTFeatureExtractor(crop_size=config.image_size, size=config.image_size + 32)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
# Check outputs on an image, prepared by MobileViTImageProcessor
|
||||
image_processor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
outputs = model(**encoding)
|
||||
logits = outputs.logits
|
||||
|
||||
@ -265,8 +265,8 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {mobilevit_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
model_mapping = {
|
||||
@ -280,7 +280,7 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
|
||||
|
||||
print("Pushing to the hub...")
|
||||
model_name = model_mapping[mobilevit_name]
|
||||
feature_extractor.push_to_hub(model_name, organization="apple")
|
||||
image_processor.push_to_hub(model_name, organization="apple")
|
||||
model.push_to_hub(model_name, organization="apple")
|
||||
|
||||
|
||||
|
@ -259,8 +259,8 @@ def convert_mobilevitv2_checkpoint(task_name, checkpoint_path, orig_config_path,
|
||||
model.load_state_dict(state_dict)
|
||||
|
||||
# Check outputs on an image, prepared by MobileViTImageProcessor
|
||||
feature_extractor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
image_processor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
outputs = model(**encoding)
|
||||
|
||||
# verify classification model
|
||||
@ -276,8 +276,8 @@ def convert_mobilevitv2_checkpoint(task_name, checkpoint_path, orig_config_path,
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {task_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -383,7 +383,7 @@ class OwlViTOnnxConfig(OnnxConfig):
|
||||
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
|
||||
)
|
||||
image_input_dict = super().generate_dummy_inputs(
|
||||
processor.feature_extractor, batch_size=batch_size, framework=framework
|
||||
processor.image_processor, batch_size=batch_size, framework=framework
|
||||
)
|
||||
return {**text_input_dict, **image_input_dict}
|
||||
|
||||
|
@ -29,8 +29,8 @@ from huggingface_hub import Repository
|
||||
from transformers import (
|
||||
CLIPTokenizer,
|
||||
OwlViTConfig,
|
||||
OwlViTFeatureExtractor,
|
||||
OwlViTForObjectDetection,
|
||||
OwlViTImageProcessor,
|
||||
OwlViTModel,
|
||||
OwlViTProcessor,
|
||||
)
|
||||
@ -350,16 +350,16 @@ def convert_owlvit_checkpoint(pt_backbone, flax_params, attn_params, pytorch_dum
|
||||
# Save HF model
|
||||
hf_model.save_pretrained(repo.local_dir)
|
||||
|
||||
# Initialize feature extractor
|
||||
feature_extractor = OwlViTFeatureExtractor(
|
||||
# Initialize image processor
|
||||
image_processor = OwlViTImageProcessor(
|
||||
size=config.vision_config.image_size, crop_size=config.vision_config.image_size
|
||||
)
|
||||
# Initialize tokenizer
|
||||
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32", pad_token="!", model_max_length=16)
|
||||
|
||||
# Initialize processor
|
||||
processor = OwlViTProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
feature_extractor.save_pretrained(repo.local_dir)
|
||||
processor = OwlViTProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
image_processor.save_pretrained(repo.local_dir)
|
||||
processor.save_pretrained(repo.local_dir)
|
||||
|
||||
repo.git_add()
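Once the converted image processor and tokenizer are saved together, they reload as a single processor. A sketch using an example repo id of a converted checkpoint:

```py
from transformers import OwlViTProcessor

# Example repo id; any converted OWL-ViT checkpoint works the same way.
processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")

# The processor now bundles an image processor and a tokenizer.
print(type(processor.image_processor).__name__)  # e.g. OwlViTImageProcessor
print(type(processor.tokenizer).__name__)
```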
|
||||
|
@ -29,13 +29,13 @@ from PIL import Image
|
||||
|
||||
from transformers import (
|
||||
PerceiverConfig,
|
||||
PerceiverFeatureExtractor,
|
||||
PerceiverForImageClassificationConvProcessing,
|
||||
PerceiverForImageClassificationFourier,
|
||||
PerceiverForImageClassificationLearned,
|
||||
PerceiverForMaskedLM,
|
||||
PerceiverForMultimodalAutoencoding,
|
||||
PerceiverForOpticalFlow,
|
||||
PerceiverImageProcessor,
|
||||
PerceiverTokenizer,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
@ -389,9 +389,9 @@ def convert_perceiver_checkpoint(pickle_file, pytorch_dump_folder_path, architec
|
||||
inputs = encoding.input_ids
|
||||
input_mask = encoding.attention_mask
|
||||
elif architecture in ["image_classification", "image_classification_fourier", "image_classification_conv"]:
|
||||
feature_extractor = PerceiverFeatureExtractor()
|
||||
image_processor = PerceiverImageProcessor()
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(image, return_tensors="pt")
|
||||
encoding = image_processor(image, return_tensors="pt")
|
||||
inputs = encoding.pixel_values
|
||||
elif architecture == "optical_flow":
|
||||
inputs = torch.randn(1, 2, 27, 368, 496)
|
||||
|
@ -24,7 +24,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import PoolFormerConfig, PoolFormerFeatureExtractor, PoolFormerForImageClassification
|
||||
from transformers import PoolFormerConfig, PoolFormerForImageClassification, PoolFormerImageProcessor
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -141,12 +141,12 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
|
||||
else:
|
||||
raise ValueError(f"Size {size} not supported")
|
||||
|
||||
# load feature extractor
|
||||
feature_extractor = PoolFormerFeatureExtractor(crop_pct=crop_pct)
|
||||
# load image processor
|
||||
image_processor = PoolFormerImageProcessor(crop_pct=crop_pct)
|
||||
|
||||
# Prepare image
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
|
||||
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
|
||||
|
||||
logger.info(f"Converting model {model_name}...")
|
||||
|
||||
@ -161,9 +161,9 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
|
||||
model.load_state_dict(state_dict)
|
||||
model.eval()
|
||||
|
||||
# Define feature extractor
|
||||
feature_extractor = PoolFormerFeatureExtractor(crop_pct=crop_pct)
|
||||
pixel_values = feature_extractor(images=prepare_img(), return_tensors="pt").pixel_values
|
||||
# Define image processor
|
||||
image_processor = PoolFormerImageProcessor(crop_pct=crop_pct)
|
||||
pixel_values = image_processor(images=prepare_img(), return_tensors="pt").pixel_values
|
||||
|
||||
# forward pass
|
||||
outputs = model(pixel_values)
|
||||
@ -187,12 +187,12 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
|
||||
assert logits.shape == expected_shape
|
||||
assert torch.allclose(logits[0, :3], expected_slice, atol=1e-2)
|
||||
|
||||
# finally, save model and feature extractor
|
||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
||||
# finally, save model and image processor
|
||||
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -34,7 +34,7 @@ from huggingface_hub import cached_download, hf_hub_url
|
||||
from torch import Tensor
|
||||
from vissl.models.model_helpers import get_trunk_forward_outputs
|
||||
|
||||
from transformers import AutoFeatureExtractor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
||||
from transformers import AutoImageProcessor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
||||
from transformers.modeling_utils import PreTrainedModel
|
||||
from transformers.utils import logging
|
||||
|
||||
@ -262,10 +262,10 @@ def convert_weights_and_push(save_directory: Path, model_name: str = None, push_
|
||||
)
|
||||
size = 384
|
||||
# we can use the convnext one
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
||||
feature_extractor.push_to_hub(
|
||||
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
||||
image_processor.push_to_hub(
|
||||
repo_path_or_name=save_directory / model_name,
|
||||
commit_message="Add feature extractor",
|
||||
commit_message="Add image processor",
|
||||
output_dir=save_directory / model_name,
|
||||
)
|
||||
|
||||
@ -294,7 +294,7 @@ if __name__ == "__main__":
|
||||
default=True,
|
||||
type=bool,
|
||||
required=False,
|
||||
help="If True, push model and feature extractor to the hub.",
|
||||
help="If True, push model and image processor to the hub.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
@ -30,7 +30,7 @@ from huggingface_hub import cached_download, hf_hub_url
|
||||
from torch import Tensor
|
||||
from vissl.models.model_helpers import get_trunk_forward_outputs
|
||||
|
||||
from transformers import AutoFeatureExtractor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
||||
from transformers import AutoImageProcessor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -209,10 +209,10 @@ def convert_weight_and_push(
|
||||
|
||||
size = 224 if "seer" not in name else 384
|
||||
# we can use the convnext one
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
||||
feature_extractor.push_to_hub(
|
||||
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
||||
image_processor.push_to_hub(
|
||||
repo_path_or_name=save_directory / name,
|
||||
commit_message="Add feature extractor",
|
||||
commit_message="Add image processor",
|
||||
use_temp_dir=True,
|
||||
)
|
||||
|
||||
@ -449,7 +449,7 @@ if __name__ == "__main__":
|
||||
default=True,
|
||||
type=bool,
|
||||
required=False,
|
||||
help="If True, push model and feature extractor to the hub.",
|
||||
help="If True, push model and image processor to the hub.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
@ -28,7 +28,7 @@ import torch.nn as nn
|
||||
from huggingface_hub import hf_hub_download
|
||||
from torch import Tensor
|
||||
|
||||
from transformers import AutoFeatureExtractor, ResNetConfig, ResNetForImageClassification
|
||||
from transformers import AutoImageProcessor, ResNetConfig, ResNetForImageClassification
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -113,10 +113,10 @@ def convert_weight_and_push(name: str, config: ResNetConfig, save_directory: Pat
|
||||
)
|
||||
|
||||
# we can use the convnext one
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
||||
feature_extractor.push_to_hub(
|
||||
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
||||
image_processor.push_to_hub(
|
||||
repo_path_or_name=save_directory / checkpoint_name,
|
||||
commit_message="Add feature extractor",
|
||||
commit_message="Add image processor",
|
||||
use_temp_dir=True,
|
||||
)
|
||||
|
||||
@ -191,7 +191,7 @@ if __name__ == "__main__":
|
||||
default=True,
|
||||
type=bool,
|
||||
required=False,
|
||||
help="If True, push model and feature extractor to the hub.",
|
||||
help="If True, push model and image processor to the hub.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
@ -27,9 +27,9 @@ from PIL import Image
|
||||
|
||||
from transformers import (
|
||||
SegformerConfig,
|
||||
SegformerFeatureExtractor,
|
||||
SegformerForImageClassification,
|
||||
SegformerForSemanticSegmentation,
|
||||
SegformerImageProcessor,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
|
||||
@ -179,14 +179,14 @@ def convert_segformer_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
else:
|
||||
raise ValueError(f"Size {size} not supported")
|
||||
|
||||
# load feature extractor (only resize + normalize)
|
||||
feature_extractor = SegformerFeatureExtractor(
|
||||
# load image processor (only resize + normalize)
|
||||
image_processor = SegformerImageProcessor(
|
||||
image_scale=(512, 512), keep_ratio=False, align=False, do_random_crop=False
|
||||
)
|
||||
|
||||
# prepare image
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
|
||||
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
|
||||
|
||||
logger.info(f"Converting model {model_name}...")
|
||||
|
||||
@ -362,11 +362,11 @@ def convert_segformer_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
||||
assert logits.shape == expected_shape
|
||||
assert torch.allclose(logits[0, :3, :3, :3], expected_slice, atol=1e-2)
|
||||
|
||||
# finally, save model and feature extractor
|
||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
||||
# finally, save model and image processor
|
||||
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -22,7 +22,7 @@ import requests
|
||||
import torch
|
||||
from PIL import Image
|
||||
|
||||
from transformers import SwinConfig, SwinForMaskedImageModeling, ViTFeatureExtractor
|
||||
from transformers import SwinConfig, SwinForMaskedImageModeling, ViTImageProcessor
|
||||
|
||||
|
||||
def get_swin_config(model_name):
|
||||
@ -132,9 +132,9 @@ def convert_swin_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_pat
|
||||
|
||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
|
||||
feature_extractor = ViTFeatureExtractor(size={"height": 192, "width": 192})
|
||||
image_processor = ViTImageProcessor(size={"height": 192, "width": 192})
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
|
||||
with torch.no_grad():
|
||||
outputs = model(**inputs).logits
|
||||
@ -146,13 +146,13 @@ def convert_swin_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_pat
|
||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
print(f"Pushing model and feature extractor for {model_name} to hub")
|
||||
print(f"Pushing model and image processor for {model_name} to hub")
|
||||
model.push_to_hub(f"microsoft/{model_name}")
|
||||
feature_extractor.push_to_hub(f"microsoft/{model_name}")
|
||||
image_processor.push_to_hub(f"microsoft/{model_name}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -7,7 +7,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor, SwinConfig, SwinForImageClassification
|
||||
from transformers import AutoImageProcessor, SwinConfig, SwinForImageClassification
|
||||
|
||||
|
||||
def get_swin_config(swin_name):
|
||||
@ -140,9 +140,9 @@ def convert_swin_checkpoint(swin_name, pytorch_dump_folder_path):
|
||||
|
||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/{}".format(swin_name.replace("_", "-")))
|
||||
image_processor = AutoImageProcessor.from_pretrained("microsoft/{}".format(swin_name.replace("_", "-")))
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
|
||||
timm_outs = timm_model(inputs["pixel_values"])
|
||||
hf_outs = model(**inputs).logits
|
||||
@ -152,8 +152,8 @@ def convert_swin_checkpoint(swin_name, pytorch_dump_folder_path):
|
||||
print(f"Saving model {swin_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -24,7 +24,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor, Swinv2Config, Swinv2ForImageClassification
|
||||
from transformers import AutoImageProcessor, Swinv2Config, Swinv2ForImageClassification
|
||||
|
||||
|
||||
def get_swinv2_config(swinv2_name):
|
||||
@ -180,9 +180,9 @@ def convert_swinv2_checkpoint(swinv2_name, pytorch_dump_folder_path):
|
||||
|
||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/{}".format(swinv2_name.replace("_", "-")))
|
||||
image_processor = AutoImageProcessor.from_pretrained("microsoft/{}".format(swinv2_name.replace("_", "-")))
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
|
||||
timm_outs = timm_model(inputs["pixel_values"])
|
||||
hf_outs = model(**inputs).logits
|
||||
@ -192,8 +192,8 @@ def convert_swinv2_checkpoint(swinv2_name, pytorch_dump_folder_path):
|
||||
print(f"Saving model {swinv2_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
model.push_to_hub(
|
||||
repo_path_or_name=Path(pytorch_dump_folder_path, swinv2_name),
|
||||
|
@ -27,7 +27,7 @@ from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
from torchvision.transforms import functional as F
|
||||
|
||||
from transformers import DetrFeatureExtractor, TableTransformerConfig, TableTransformerForObjectDetection
|
||||
from transformers import DetrImageProcessor, TableTransformerConfig, TableTransformerForObjectDetection
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -242,7 +242,7 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
|
||||
config.id2label = id2label
|
||||
config.label2id = {v: k for k, v in id2label.items()}
|
||||
|
||||
feature_extractor = DetrFeatureExtractor(
|
||||
image_processor = DetrImageProcessor(
|
||||
format="coco_detection", max_size=800 if "detection" in checkpoint_url else 1000
|
||||
)
|
||||
model = TableTransformerForObjectDetection(config)
|
||||
@ -277,11 +277,11 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
|
||||
print("Looks ok!")
|
||||
|
||||
if pytorch_dump_folder_path is not None:
|
||||
# Save model and feature extractor
|
||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
||||
# Save model and image processor
|
||||
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
# Push model to HF hub
|
||||
@ -292,7 +292,7 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
|
||||
else "microsoft/table-transformer-structure-recognition"
|
||||
)
|
||||
model.push_to_hub(model_name)
|
||||
feature_extractor.push_to_hub(model_name)
|
||||
image_processor.push_to_hub(model_name)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -22,7 +22,7 @@ import numpy as np
|
||||
import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
|
||||
from transformers import TimesformerConfig, TimesformerForVideoClassification, VideoMAEFeatureExtractor
|
||||
from transformers import TimesformerConfig, TimesformerForVideoClassification, VideoMAEImageProcessor
|
||||
|
||||
|
||||
def get_timesformer_config(model_name):
|
||||
@ -156,9 +156,9 @@ def convert_timesformer_checkpoint(checkpoint_url, pytorch_dump_folder_path, mod
|
||||
model.eval()
|
||||
|
||||
# verify model on basic input
|
||||
feature_extractor = VideoMAEFeatureExtractor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
||||
image_processor = VideoMAEImageProcessor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
||||
video = prepare_video()
|
||||
inputs = feature_extractor(video[:8], return_tensors="pt")
|
||||
inputs = image_processor(video[:8], return_tensors="pt")
|
||||
|
||||
outputs = model(**inputs)
|
||||
logits = outputs.logits
|
||||
@ -215,8 +215,8 @@ def convert_timesformer_checkpoint(checkpoint_url, pytorch_dump_folder_path, mod
|
||||
print("Logits ok!")
|
||||
|
||||
if pytorch_dump_folder_path is not None:
|
||||
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
|
@ -513,8 +513,8 @@ TIMESFORMER_START_DOCSTRING = r"""
|
||||
TIMESFORMER_INPUTS_DOCSTRING = r"""
Args:
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_frames, num_channels, height, width)`):
Pixel values. Pixel values can be obtained using [`AutoFeatureExtractor`]. See
[`VideoMAEFeatureExtractor.__call__`] for details.
Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See
[`VideoMAEImageProcessor.preprocess`] for details.

output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
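A hedged sketch of the input pipeline described above. The checkpoint name is an example, and the video is random frames standing in for a real clip.

```py
import numpy as np

from transformers import AutoImageProcessor, TimesformerForVideoClassification

checkpoint = "facebook/timesformer-base-finetuned-k400"  # example checkpoint
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = TimesformerForVideoClassification.from_pretrained(checkpoint)

# 8 random RGB frames in (height, width, channels) layout as a stand-in for a real video.
video = list(np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8))

inputs = image_processor(video, return_tensors="pt")  # yields `pixel_values` as described above
outputs = model(**inputs)
```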
|
||||
|
@ -29,7 +29,7 @@ from transformers import (
|
||||
TrOCRProcessor,
|
||||
VisionEncoderDecoderModel,
|
||||
ViTConfig,
|
||||
ViTFeatureExtractor,
|
||||
ViTImageProcessor,
|
||||
ViTModel,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
@ -182,9 +182,9 @@ def convert_tr_ocr_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
model.load_state_dict(state_dict)
|
||||
|
||||
# Check outputs on an image
|
||||
feature_extractor = ViTFeatureExtractor(size=encoder_config.image_size)
|
||||
image_processor = ViTImageProcessor(size=encoder_config.image_size)
|
||||
tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
|
||||
processor = TrOCRProcessor(feature_extractor, tokenizer)
|
||||
processor = TrOCRProcessor(image_processor, tokenizer)
|
||||
|
||||
pixel_values = processor(images=prepare_img(checkpoint_url), return_tensors="pt").pixel_values
|
||||
|
||||
|
@ -30,7 +30,7 @@ import torch.nn as nn
|
||||
from huggingface_hub import cached_download, hf_hub_download
|
||||
from torch import Tensor
|
||||
|
||||
from transformers import AutoFeatureExtractor, VanConfig, VanForImageClassification
|
||||
from transformers import AutoImageProcessor, VanConfig, VanForImageClassification
|
||||
from transformers.models.van.modeling_van import VanLayerScaling
|
||||
from transformers.utils import logging
|
||||
|
||||
@ -154,10 +154,10 @@ def convert_weight_and_push(
|
||||
)
|
||||
|
||||
# we can use the convnext one
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
||||
feature_extractor.push_to_hub(
|
||||
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
||||
image_processor.push_to_hub(
|
||||
repo_path_or_name=save_directory / checkpoint_name,
|
||||
commit_message="Add feature extractor",
|
||||
commit_message="Add image processor",
|
||||
use_temp_dir=True,
|
||||
)
|
||||
|
||||
@ -277,7 +277,7 @@ if __name__ == "__main__":
|
||||
default=True,
|
||||
type=bool,
|
||||
required=False,
|
||||
help="If True, push model and feature extractor to the hub.",
|
||||
help="If True, push model and image processor to the hub.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
@ -24,9 +24,9 @@ from huggingface_hub import hf_hub_download
|
||||
|
||||
from transformers import (
|
||||
VideoMAEConfig,
|
||||
VideoMAEFeatureExtractor,
|
||||
VideoMAEForPreTraining,
|
||||
VideoMAEForVideoClassification,
|
||||
VideoMAEImageProcessor,
|
||||
)
|
||||
|
||||
|
||||
@ -198,9 +198,9 @@ def convert_videomae_checkpoint(checkpoint_url, pytorch_dump_folder_path, model_
|
||||
model.eval()
|
||||
|
||||
# verify model on basic input
|
||||
feature_extractor = VideoMAEFeatureExtractor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
||||
image_processor = VideoMAEImageProcessor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
||||
video = prepare_video()
|
||||
inputs = feature_extractor(video, return_tensors="pt")
|
||||
inputs = image_processor(video, return_tensors="pt")
|
||||
|
||||
if "finetuned" not in model_name:
|
||||
local_path = hf_hub_download(repo_id="hf-internal-testing/bool-masked-pos", filename="bool_masked_pos.pt")
|
||||
@ -288,8 +288,8 @@ def convert_videomae_checkpoint(checkpoint_url, pytorch_dump_folder_path, model_
|
||||
print("Loss ok!")
|
||||
|
||||
if pytorch_dump_folder_path is not None:
|
||||
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
|
@ -27,11 +27,11 @@ from PIL import Image
|
||||
from transformers import (
|
||||
BertTokenizer,
|
||||
ViltConfig,
|
||||
ViltFeatureExtractor,
|
||||
ViltForImageAndTextRetrieval,
|
||||
ViltForImagesAndTextClassification,
|
||||
ViltForMaskedLM,
|
||||
ViltForQuestionAnswering,
|
||||
ViltImageProcessor,
|
||||
ViltProcessor,
|
||||
)
|
||||
from transformers.utils import logging
|
||||
@ -223,9 +223,9 @@ def convert_vilt_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
model.load_state_dict(state_dict)
|
||||
|
||||
# Define processor
|
||||
feature_extractor = ViltFeatureExtractor(size=384)
|
||||
image_processor = ViltImageProcessor(size=384)
|
||||
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
|
||||
processor = ViltProcessor(feature_extractor, tokenizer)
|
||||
processor = ViltProcessor(image_processor, tokenizer)
|
||||
|
||||
# Forward pass on example inputs (image + text)
|
||||
if nlvr_model:
|
||||
|
@ -24,7 +24,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ViTConfig, ViTFeatureExtractor, ViTForImageClassification, ViTModel
|
||||
from transformers import ViTConfig, ViTForImageClassification, ViTImageProcessor, ViTModel
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -175,9 +175,9 @@ def convert_vit_checkpoint(model_name, pytorch_dump_folder_path, base_model=True
|
||||
model = ViTForImageClassification(config).eval()
|
||||
model.load_state_dict(state_dict)
|
||||
|
||||
# Check outputs on an image, prepared by ViTFeatureExtractor
|
||||
feature_extractor = ViTFeatureExtractor()
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
# Check outputs on an image, prepared by ViTImageProcessor
|
||||
image_processor = ViTImageProcessor()
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
pixel_values = encoding["pixel_values"]
|
||||
outputs = model(pixel_values)
|
||||
|
||||
@ -192,8 +192,8 @@ def convert_vit_checkpoint(model_name, pytorch_dump_folder_path, base_model=True
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -25,7 +25,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DeiTFeatureExtractor, ViTConfig, ViTFeatureExtractor, ViTForImageClassification, ViTModel
|
||||
from transformers import DeiTImageProcessor, ViTConfig, ViTForImageClassification, ViTImageProcessor, ViTModel
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -208,12 +208,12 @@ def convert_vit_checkpoint(vit_name, pytorch_dump_folder_path):
|
||||
model = ViTForImageClassification(config).eval()
|
||||
model.load_state_dict(state_dict)
|
||||
|
||||
# Check outputs on an image, prepared by ViTFeatureExtractor/DeiTFeatureExtractor
|
||||
# Check outputs on an image, prepared by ViTImageProcessor/DeiTImageProcessor
|
||||
if "deit" in vit_name:
|
||||
feature_extractor = DeiTFeatureExtractor(size=config.image_size)
|
||||
image_processor = DeiTImageProcessor(size=config.image_size)
|
||||
else:
|
||||
feature_extractor = ViTFeatureExtractor(size=config.image_size)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
image_processor = ViTImageProcessor(size=config.image_size)
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
pixel_values = encoding["pixel_values"]
|
||||
outputs = model(pixel_values)
|
||||
|
||||
@ -229,8 +229,8 @@ def convert_vit_checkpoint(vit_name, pytorch_dump_folder_path):
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {vit_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -20,7 +20,7 @@ import requests
|
||||
import torch
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ViTMAEConfig, ViTMAEFeatureExtractor, ViTMAEForPreTraining
|
||||
from transformers import ViTMAEConfig, ViTMAEForPreTraining, ViTMAEImageProcessor
|
||||
|
||||
|
||||
def rename_key(name):
|
||||
@ -120,7 +120,7 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
|
||||
state_dict = torch.hub.load_state_dict_from_url(checkpoint_url, map_location="cpu")["model"]
|
||||
|
||||
feature_extractor = ViTMAEFeatureExtractor(size=config.image_size)
|
||||
image_processor = ViTMAEImageProcessor(size=config.image_size)
|
||||
|
||||
new_state_dict = convert_state_dict(state_dict, config)
|
||||
|
||||
@ -130,8 +130,8 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
url = "https://user-images.githubusercontent.com/11435359/147738734-196fd92f-9260-48d5-ba7e-bf103d29364d.jpg"
|
||||
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
feature_extractor = ViTMAEFeatureExtractor(size=config.image_size)
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
image_processor = ViTMAEImageProcessor(size=config.image_size)
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
|
||||
# forward pass
|
||||
torch.manual_seed(2)
|
||||
@ -157,8 +157,8 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -22,7 +22,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ViTFeatureExtractor, ViTMSNConfig, ViTMSNModel
|
||||
from transformers import ViTImageProcessor, ViTMSNConfig, ViTMSNModel
|
||||
from transformers.image_utils import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
|
||||
|
||||
|
||||
@ -180,7 +180,7 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
|
||||
state_dict = torch.hub.load_state_dict_from_url(checkpoint_url, map_location="cpu")["target_encoder"]
|
||||
|
||||
feature_extractor = ViTFeatureExtractor(size=config.image_size)
|
||||
image_processor = ViTImageProcessor(size=config.image_size)
|
||||
|
||||
remove_projection_head(state_dict)
|
||||
rename_keys = create_rename_keys(config, base_model=True)
|
||||
@ -195,10 +195,10 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
|
||||
image = Image.open(requests.get(url, stream=True).raw)
|
||||
feature_extractor = ViTFeatureExtractor(
|
||||
image_processor = ViTImageProcessor(
|
||||
size=config.image_size, image_mean=IMAGENET_DEFAULT_MEAN, image_std=IMAGENET_DEFAULT_STD
|
||||
)
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
|
||||
# forward pass
|
||||
torch.manual_seed(2)
|
||||
@ -224,8 +224,8 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
@ -23,7 +23,7 @@ from huggingface_hub import hf_hub_download
|
||||
from transformers import (
|
||||
CLIPTokenizer,
|
||||
CLIPTokenizerFast,
|
||||
VideoMAEFeatureExtractor,
|
||||
VideoMAEImageProcessor,
|
||||
XCLIPConfig,
|
||||
XCLIPModel,
|
||||
XCLIPProcessor,
|
||||
@ -291,10 +291,10 @@ def convert_xclip_checkpoint(model_name, pytorch_dump_folder_path=None, push_to_
|
||||
model.eval()
|
||||
|
||||
size = 336 if model_name == "xclip-large-patch14-16-frames" else 224
|
||||
feature_extractor = VideoMAEFeatureExtractor(size=size)
|
||||
image_processor = VideoMAEImageProcessor(size=size)
|
||||
slow_tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
|
||||
fast_tokenizer = CLIPTokenizerFast.from_pretrained("openai/clip-vit-base-patch32")
|
||||
processor = XCLIPProcessor(feature_extractor=feature_extractor, tokenizer=fast_tokenizer)
|
||||
processor = XCLIPProcessor(image_processor=image_processor, tokenizer=fast_tokenizer)
|
||||
|
||||
video = prepare_video(num_frames)
|
||||
inputs = processor(
|
||||
|
@ -24,7 +24,7 @@ import torch
|
||||
from huggingface_hub import hf_hub_download
|
||||
from PIL import Image
|
||||
|
||||
from transformers import YolosConfig, YolosFeatureExtractor, YolosForObjectDetection
|
||||
from transformers import YolosConfig, YolosForObjectDetection, YolosImageProcessor
|
||||
from transformers.utils import logging
|
||||
|
||||
|
||||
@ -172,10 +172,10 @@ def convert_yolos_checkpoint(
|
||||
new_state_dict = convert_state_dict(state_dict, model)
|
||||
model.load_state_dict(new_state_dict)
|
||||
|
||||
# Check outputs on an image, prepared by YolosFeatureExtractor
|
||||
# Check outputs on an image, prepared by YolosImageProcessor
|
||||
size = 800 if yolos_name != "yolos_ti" else 512
|
||||
feature_extractor = YolosFeatureExtractor(format="coco_detection", size=size)
|
||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
||||
image_processor = YolosImageProcessor(format="coco_detection", size=size)
|
||||
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||
outputs = model(**encoding)
|
||||
logits, pred_boxes = outputs.logits, outputs.pred_boxes
|
||||
|
||||
@ -224,8 +224,8 @@ def convert_yolos_checkpoint(
|
||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||
print(f"Saving model {yolos_name} to {pytorch_dump_folder_path}")
|
||||
model.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
||||
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||
|
||||
if push_to_hub:
|
||||
model_mapping = {
|
||||
@ -238,7 +238,7 @@ def convert_yolos_checkpoint(
|
||||
|
||||
print("Pushing to the hub...")
|
||||
model_name = model_mapping[yolos_name]
|
||||
feature_extractor.push_to_hub(model_name, organization="hustvl")
|
||||
image_processor.push_to_hub(model_name, organization="hustvl")
|
||||
model.push_to_hub(model_name, organization="hustvl")
|
||||
|
||||
|
||||
|
@ -19,7 +19,7 @@ from pathlib import Path
|
||||
|
||||
from packaging import version
|
||||
|
||||
from .. import AutoFeatureExtractor, AutoProcessor, AutoTokenizer
|
||||
from .. import AutoFeatureExtractor, AutoImageProcessor, AutoProcessor, AutoTokenizer
|
||||
from ..utils import logging
|
||||
from ..utils.import_utils import is_optimum_available
|
||||
from .convert import export, validate_model_outputs
|
||||
@ -145,6 +145,8 @@ def export_with_transformers(args):
|
||||
preprocessor = get_preprocessor(args.model)
|
||||
elif args.preprocessor == "tokenizer":
|
||||
preprocessor = AutoTokenizer.from_pretrained(args.model)
|
||||
elif args.preprocessor == "image_processor":
|
||||
preprocessor = AutoImageProcessor.from_pretrained(args.model)
|
||||
elif args.preprocessor == "feature_extractor":
|
||||
preprocessor = AutoFeatureExtractor.from_pretrained(args.model)
|
||||
elif args.preprocessor == "processor":
|
||||
@ -213,7 +215,7 @@ def main():
|
||||
parser.add_argument(
|
||||
"--preprocessor",
|
||||
type=str,
|
||||
choices=["auto", "tokenizer", "feature_extractor", "processor"],
|
||||
choices=["auto", "tokenizer", "feature_extractor", "image_processor", "processor"],
|
||||
default="auto",
|
||||
help="Which type of preprocessor to use. 'auto' tries to automatically detect it.",
|
||||
)
|
||||
|
@ -49,7 +49,7 @@ if is_vision_available():
|
||||
import PIL
|
||||
from PIL import Image
|
||||
|
||||
from transformers import BeitFeatureExtractor
|
||||
from transformers import BeitImageProcessor
|
||||
|
||||
|
||||
class BeitModelTester:
|
||||
@ -342,18 +342,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class BeitModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return (
|
||||
BeitFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
||||
)
|
||||
def default_image_processor(self):
|
||||
return BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_masked_image_modeling_head(self):
|
||||
model = BeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values.to(torch_device)
|
||||
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(torch_device)
|
||||
|
||||
# prepare bool_masked_pos
|
||||
bool_masked_pos = torch.ones((1, 196), dtype=torch.bool).to(torch_device)
|
||||
@ -377,9 +375,9 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_image_classification_head_imagenet_1k(self):
|
||||
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -403,9 +401,9 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -428,11 +426,11 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
||||
model = BeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-finetuned-ade-640-640")
|
||||
model = model.to(torch_device)
|
||||
|
||||
feature_extractor = BeitFeatureExtractor(do_resize=True, size=640, do_center_crop=False)
|
||||
image_processor = BeitImageProcessor(do_resize=True, size=640, do_center_crop=False)
|
||||
|
||||
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
|
||||
image = Image.open(ds[0]["file"])
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -471,11 +469,11 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
||||
model = BeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-finetuned-ade-640-640")
|
||||
model = model.to(torch_device)
|
||||
|
||||
feature_extractor = BeitFeatureExtractor(do_resize=True, size=640, do_center_crop=False)
|
||||
image_processor = BeitImageProcessor(do_resize=True, size=640, do_center_crop=False)
|
||||
|
||||
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
|
||||
image = Image.open(ds[0]["file"])
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -483,10 +481,10 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
||||
|
||||
outputs.logits = outputs.logits.detach().cpu()
|
||||
|
||||
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
|
||||
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
|
||||
expected_shape = torch.Size((500, 300))
|
||||
self.assertEqual(segmentation[0].shape, expected_shape)
|
||||
|
||||
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs)
|
||||
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs)
|
||||
expected_shape = torch.Size((160, 160))
|
||||
self.assertEqual(segmentation[0].shape, expected_shape)
|
||||
|
@ -33,7 +33,7 @@ if is_flax_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import BeitFeatureExtractor
|
||||
from transformers import BeitImageProcessor
|
||||
|
||||
|
||||
class FlaxBeitModelTester(unittest.TestCase):
|
||||
@ -219,18 +219,16 @@ def prepare_img():
|
||||
@require_flax
|
||||
class FlaxBeitModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return (
|
||||
BeitFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
||||
)
|
||||
def default_image_processor(self):
|
||||
return BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_masked_image_modeling_head(self):
|
||||
model = FlaxBeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="np").pixel_values
|
||||
pixel_values = image_processor(images=image, return_tensors="np").pixel_values
|
||||
|
||||
# prepare bool_masked_pos
|
||||
bool_masked_pos = np.ones((1, 196), dtype=bool)
|
||||
@ -253,9 +251,9 @@ class FlaxBeitModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_image_classification_head_imagenet_1k(self):
|
||||
model = FlaxBeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="np")
|
||||
inputs = image_processor(images=image, return_tensors="np")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
@ -276,9 +274,9 @@ class FlaxBeitModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_image_classification_head_imagenet_22k(self):
|
||||
model = FlaxBeitForImageClassification.from_pretrained("microsoft/beit-large-patch16-224-pt22k-ft22k")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="np")
|
||||
inputs = image_processor(images=image, return_tensors="np")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
|
@ -297,7 +297,7 @@ def prepare_img():
|
||||
@require_vision
|
||||
class BitModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
BitImageProcessor.from_pretrained(BIT_PRETRAINED_MODEL_ARCHIVE_LIST[0]) if is_vision_available() else None
|
||||
)
|
||||
@ -306,9 +306,9 @@ class BitModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_image_classification_head(self):
|
||||
model = BitForImageClassification.from_pretrained(BIT_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -145,7 +145,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
pass
|
||||
|
||||
def test_call_pil(self):
|
||||
# Initialize feature_extractor
|
||||
# Initialize image processor
|
||||
image_processing = self.image_processing_class(**self.image_processor_dict)
|
||||
# create random PIL images
|
||||
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False)
|
||||
@ -176,7 +176,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
)
|
||||
|
||||
def test_call_numpy(self):
|
||||
# Initialize feature_extractor
|
||||
# Initialize image processor
|
||||
image_processing = self.image_processing_class(**self.image_processor_dict)
|
||||
# create random numpy tensors
|
||||
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False, numpify=True)
|
||||
@ -207,7 +207,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
)
|
||||
|
||||
def test_call_pytorch(self):
|
||||
# Initialize feature_extractor
|
||||
# Initialize image processor
|
||||
image_processing = self.image_processing_class(**self.image_processor_dict)
|
||||
# create random PyTorch tensors
|
||||
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False, torchify=True)
|
||||
@ -238,7 +238,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
)
|
||||
|
||||
def test_equivalence_pad_and_create_pixel_mask(self):
|
||||
# Initialize feature_extractors
|
||||
# Initialize image processors
|
||||
image_processing_1 = self.image_processing_class(**self.image_processor_dict)
|
||||
image_processing_2 = self.image_processing_class(do_resize=False, do_normalize=False, do_rescale=False)
|
||||
# create random PyTorch tensors
|
||||
|
@ -43,7 +43,7 @@ if is_timm_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ConditionalDetrFeatureExtractor
|
||||
from transformers import ConditionalDetrImageProcessor
|
||||
|
||||
|
||||
class ConditionalDetrModelTester:
|
||||
@ -493,9 +493,9 @@ def prepare_img():
|
||||
@slow
|
||||
class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
ConditionalDetrFeatureExtractor.from_pretrained("microsoft/conditional-detr-resnet-50")
|
||||
ConditionalDetrImageProcessor.from_pretrained("microsoft/conditional-detr-resnet-50")
|
||||
if is_vision_available()
|
||||
else None
|
||||
)
|
||||
@ -503,9 +503,9 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
||||
def test_inference_no_head(self):
|
||||
model = ConditionalDetrModel.from_pretrained("microsoft/conditional-detr-resnet-50").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
with torch.no_grad():
|
||||
outputs = model(**encoding)
|
||||
@ -522,9 +522,9 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||
|
||||
@ -547,7 +547,7 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_slice_boxes, atol=1e-4))
|
||||
|
||||
# verify postprocessing
|
||||
results = feature_extractor.post_process_object_detection(
|
||||
results = image_processor.post_process_object_detection(
|
||||
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
||||
)[0]
|
||||
expected_scores = torch.tensor([0.8330, 0.8313, 0.8039, 0.6829, 0.5355]).to(torch_device)
|
||||
|
@ -38,7 +38,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor
|
||||
from transformers import AutoImageProcessor
|
||||
|
||||
|
||||
class ConvNextModelTester:
|
||||
@ -285,16 +285,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class ConvNextModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return AutoFeatureExtractor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -38,7 +38,7 @@ if is_tf_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ConvNextFeatureExtractor
|
||||
from transformers import ConvNextImageProcessor
|
||||
|
||||
|
||||
class TFConvNextModelTester:
|
||||
@ -279,18 +279,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class TFConvNextModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return (
|
||||
ConvNextFeatureExtractor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
||||
)
|
||||
def default_image_processor(self):
|
||||
return ConvNextImageProcessor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
model = TFConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="tf")
|
||||
inputs = image_processor(images=image, return_tensors="tf")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
|
@ -38,7 +38,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor
|
||||
from transformers import AutoImageProcessor
|
||||
|
||||
|
||||
class CvtConfigTester(ConfigTester):
|
||||
@ -264,16 +264,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class CvtModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return AutoFeatureExtractor.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
def default_image_processor(self):
|
||||
return AutoImageProcessor.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
model = CvtForImageClassification.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -28,7 +28,7 @@ if is_tf_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor
|
||||
from transformers import AutoImageProcessor
|
||||
|
||||
|
||||
class TFCvtConfigTester(ConfigTester):
|
||||
@ -265,16 +265,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class TFCvtModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return AutoFeatureExtractor.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
def default_image_processor(self):
|
||||
return AutoImageProcessor.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
model = TFCvtForImageClassification.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="tf")
|
||||
inputs = image_processor(images=image, return_tensors="tf")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
|
@ -44,7 +44,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import BeitFeatureExtractor
|
||||
from transformers import BeitImageProcessor
|
||||
|
||||
|
||||
class Data2VecVisionModelTester:
|
||||
@ -327,11 +327,9 @@ def prepare_img():
|
||||
@require_vision
|
||||
class Data2VecVisionModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
BeitFeatureExtractor.from_pretrained("facebook/data2vec-vision-base-ft1k")
|
||||
if is_vision_available()
|
||||
else None
|
||||
BeitImageProcessor.from_pretrained("facebook/data2vec-vision-base-ft1k") if is_vision_available() else None
|
||||
)
|
||||
|
||||
@slow
|
||||
@ -340,9 +338,9 @@ class Data2VecVisionModelIntegrationTest(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -46,7 +46,7 @@ if is_tf_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import BeitFeatureExtractor
|
||||
from transformers import BeitImageProcessor
|
||||
|
||||
|
||||
class TFData2VecVisionModelTester:
|
||||
@ -469,20 +469,18 @@ def prepare_img():
|
||||
@require_vision
|
||||
class TFData2VecVisionModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
BeitFeatureExtractor.from_pretrained("facebook/data2vec-vision-base-ft1k")
|
||||
if is_vision_available()
|
||||
else None
|
||||
BeitImageProcessor.from_pretrained("facebook/data2vec-vision-base-ft1k") if is_vision_available() else None
|
||||
)
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head_imagenet_1k(self):
|
||||
model = TFData2VecVisionForImageClassification.from_pretrained("facebook/data2vec-vision-base-ft1k")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="tf")
|
||||
inputs = image_processor(images=image, return_tensors="tf")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
|
@ -39,7 +39,7 @@ if is_timm_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import AutoFeatureExtractor
|
||||
from transformers import AutoImageProcessor
|
||||
|
||||
|
||||
class DeformableDetrModelTester:
|
||||
@ -563,15 +563,15 @@ def prepare_img():
|
||||
@slow
|
||||
class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return AutoFeatureExtractor.from_pretrained("SenseTime/deformable-detr") if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return AutoImageProcessor.from_pretrained("SenseTime/deformable-detr") if is_vision_available() else None
|
||||
|
||||
def test_inference_object_detection_head(self):
|
||||
model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||
|
||||
@ -595,7 +595,7 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_boxes, atol=1e-4))
|
||||
|
||||
# verify postprocessing
|
||||
results = feature_extractor.post_process_object_detection(
|
||||
results = image_processor.post_process_object_detection(
|
||||
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
||||
)[0]
|
||||
expected_scores = torch.tensor([0.7999, 0.7894, 0.6331, 0.4720, 0.4382]).to(torch_device)
|
||||
@ -612,9 +612,9 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
||||
"SenseTime/deformable-detr-with-box-refine-two-stage"
|
||||
).to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||
|
||||
@ -639,9 +639,9 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
||||
|
||||
@require_torch_gpu
|
||||
def test_inference_object_detection_head_equivalence_cpu_gpu(self):
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt")
|
||||
encoding = image_processor(images=image, return_tensors="pt")
|
||||
pixel_values = encoding["pixel_values"]
|
||||
pixel_mask = encoding["pixel_mask"]
|
||||
|
||||
|
@ -55,7 +55,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DeiTFeatureExtractor
|
||||
from transformers import DeiTImageProcessor
|
||||
|
||||
|
||||
class DeiTModelTester:
|
||||
@ -381,9 +381,9 @@ def prepare_img():
|
||||
@require_vision
|
||||
class DeiTModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
DeiTFeatureExtractor.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
||||
DeiTImageProcessor.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
||||
if is_vision_available()
|
||||
else None
|
||||
)
|
||||
@ -394,9 +394,9 @@ class DeiTModelIntegrationTest(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -420,10 +420,10 @@ class DeiTModelIntegrationTest(unittest.TestCase):
|
||||
model = DeiTModel.from_pretrained(
|
||||
"facebook/deit-base-distilled-patch16-224", torch_dtype=torch.float16, device_map="auto"
|
||||
)
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
||||
inputs = image_processor(images=image, return_tensors="pt")
|
||||
pixel_values = inputs.pixel_values.to(torch_device)
|
||||
|
||||
# forward pass to make sure inference works in fp16
|
||||
|
@ -46,7 +46,7 @@ if is_tf_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DeiTFeatureExtractor
|
||||
from transformers import DeiTImageProcessor
|
||||
|
||||
|
||||
class TFDeiTModelTester:
|
||||
@ -266,9 +266,9 @@ def prepare_img():
|
||||
@require_vision
|
||||
class DeiTModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
DeiTFeatureExtractor.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
||||
DeiTImageProcessor.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
||||
if is_vision_available()
|
||||
else None
|
||||
)
|
||||
@ -277,9 +277,9 @@ class DeiTModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_image_classification_head(self):
|
||||
model = TFDeiTForImageClassificationWithTeacher.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="tf")
|
||||
inputs = image_processor(images=image, return_tensors="tf")
|
||||
|
||||
# forward pass
|
||||
outputs = model(**inputs)
|
||||
|
@ -38,7 +38,7 @@ if is_timm_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DetrFeatureExtractor
|
||||
from transformers import DetrImageProcessor
|
||||
|
||||
|
||||
class DetrModelTester:
|
||||
@ -512,15 +512,15 @@ def prepare_img():
|
||||
@slow
|
||||
class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50") if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return DetrImageProcessor.from_pretrained("facebook/detr-resnet-50") if is_vision_available() else None
|
||||
|
||||
def test_inference_no_head(self):
|
||||
model = DetrModel.from_pretrained("facebook/detr-resnet-50").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
with torch.no_grad():
|
||||
outputs = model(**encoding)
|
||||
@ -535,9 +535,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
def test_inference_object_detection_head(self):
|
||||
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||
|
||||
@ -560,7 +560,7 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_slice_boxes, atol=1e-4))
|
||||
|
||||
# verify postprocessing
|
||||
results = feature_extractor.post_process_object_detection(
|
||||
results = image_processor.post_process_object_detection(
|
||||
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
||||
)[0]
|
||||
expected_scores = torch.tensor([0.9982, 0.9960, 0.9955, 0.9988, 0.9987]).to(torch_device)
|
||||
@ -575,9 +575,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
def test_inference_panoptic_segmentation_head(self):
|
||||
model = DetrForSegmentation.from_pretrained("facebook/detr-resnet-50-panoptic").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||
|
||||
@ -607,7 +607,7 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.pred_masks[0, 0, :3, :3], expected_slice_masks, atol=1e-3))
|
||||
|
||||
# verify postprocessing
|
||||
results = feature_extractor.post_process_panoptic_segmentation(
|
||||
results = image_processor.post_process_panoptic_segmentation(
|
||||
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
||||
)[0]
|
||||
|
||||
@ -633,9 +633,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
|
||||
@slow
|
||||
class DetrModelIntegrationTests(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
|
||||
DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
|
||||
if is_vision_available()
|
||||
else None
|
||||
)
|
||||
@ -643,9 +643,9 @@ class DetrModelIntegrationTests(unittest.TestCase):
|
||||
def test_inference_no_head(self):
|
||||
model = DetrModel.from_pretrained("facebook/detr-resnet-50", revision="no_timm").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
with torch.no_grad():
|
||||
outputs = model(**encoding)
|
||||
|
@ -367,16 +367,16 @@ class DinatModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
|
||||
@require_torch
|
||||
class DinatModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return AutoImageProcessor.from_pretrained("shi-labs/dinat-mini-in1k-224") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
model = DinatForImageClassification.from_pretrained("shi-labs/dinat-mini-in1k-224").to(torch_device)
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
|
||||
image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -25,7 +25,7 @@ if is_torch_available():
|
||||
from transformers import AutoModelForImageClassification
|
||||
|
||||
if is_vision_available():
|
||||
from transformers import AutoFeatureExtractor
|
||||
from transformers import AutoImageProcessor
|
||||
|
||||
|
||||
@require_torch
|
||||
@ -33,7 +33,7 @@ if is_vision_available():
|
||||
class DiTIntegrationTest(unittest.TestCase):
|
||||
@slow
|
||||
def test_for_image_classification(self):
|
||||
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
|
||||
image_processor = AutoImageProcessor.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
|
||||
model = AutoModelForImageClassification.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
|
||||
model.to(torch_device)
|
||||
|
||||
@ -43,7 +43,7 @@ class DiTIntegrationTest(unittest.TestCase):
|
||||
|
||||
image = dataset["train"][0]["image"].convert("RGB")
|
||||
|
||||
inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -39,7 +39,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DPTFeatureExtractor
|
||||
from transformers import DPTImageProcessor
|
||||
|
||||
|
||||
class DPTModelTester:
|
||||
@ -293,11 +293,11 @@ def prepare_img():
|
||||
@slow
|
||||
class DPTModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_depth_estimation(self):
|
||||
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large")
|
||||
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
|
||||
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large").to(torch_device)
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -315,11 +315,11 @@ class DPTModelIntegrationTest(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.predicted_depth[0, :3, :3], expected_slice, atol=1e-4))
|
||||
|
||||
def test_inference_semantic_segmentation(self):
|
||||
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
|
||||
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large-ade")
|
||||
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade").to(torch_device)
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -336,11 +336,11 @@ class DPTModelIntegrationTest(unittest.TestCase):
|
||||
self.assertTrue(torch.allclose(outputs.logits[0, 0, :3, :3], expected_slice, atol=1e-4))
|
||||
|
||||
def test_post_processing_semantic_segmentation(self):
|
||||
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
|
||||
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large-ade")
|
||||
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade").to(torch_device)
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -348,10 +348,10 @@ class DPTModelIntegrationTest(unittest.TestCase):
|
||||
|
||||
outputs.logits = outputs.logits.detach().cpu()
|
||||
|
||||
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
|
||||
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
|
||||
expected_shape = torch.Size((500, 300))
|
||||
self.assertEqual(segmentation[0].shape, expected_shape)
|
||||
|
||||
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs)
|
||||
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs)
|
||||
expected_shape = torch.Size((480, 480))
|
||||
self.assertEqual(segmentation[0].shape, expected_shape)
|
||||
|
@ -39,7 +39,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import DPTFeatureExtractor
|
||||
from transformers import DPTImageProcessor
|
||||
|
||||
|
||||
class DPTModelTester:
|
||||
@ -314,11 +314,11 @@ def prepare_img():
|
||||
@slow
|
||||
class DPTModelIntegrationTest(unittest.TestCase):
|
||||
def test_inference_depth_estimation(self):
|
||||
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-hybrid-midas")
|
||||
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
|
||||
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to(torch_device)
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -444,7 +444,7 @@ def prepare_img():
|
||||
@require_vision
|
||||
class EfficientFormerModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return (
|
||||
EfficientFormerImageProcessor.from_pretrained("snap-research/efficientformer-l1-300")
|
||||
if is_vision_available()
|
||||
@ -457,9 +457,9 @@ class EfficientFormerModelIntegrationTest(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
@ -478,9 +478,9 @@ class EfficientFormerModelIntegrationTest(unittest.TestCase):
|
||||
"snap-research/efficientformer-l1-300"
|
||||
).to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -37,7 +37,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import GLPNFeatureExtractor
|
||||
from transformers import GLPNImageProcessor
|
||||
|
||||
|
||||
class GLPNConfigTester(ConfigTester):
|
||||
@ -337,11 +337,11 @@ def prepare_img():
|
||||
class GLPNModelIntegrationTest(unittest.TestCase):
|
||||
@slow
|
||||
def test_inference_depth_estimation(self):
|
||||
feature_extractor = GLPNFeatureExtractor.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
image_processor = GLPNImageProcessor.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
model = GLPNForDepthEstimation.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
|
||||
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -49,7 +49,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import ImageGPTFeatureExtractor
|
||||
from transformers import ImageGPTImageProcessor
|
||||
|
||||
|
||||
class ImageGPTModelTester:
|
||||
@ -535,16 +535,16 @@ def prepare_img():
|
||||
@require_vision
|
||||
class ImageGPTModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return ImageGPTFeatureExtractor.from_pretrained("openai/imagegpt-small") if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return ImageGPTImageProcessor.from_pretrained("openai/imagegpt-small") if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_causal_lm_head(self):
|
||||
model = ImageGPTForCausalImageModeling.from_pretrained("openai/imagegpt-small").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -45,7 +45,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import LayoutLMv3FeatureExtractor
|
||||
from transformers import LayoutLMv3ImageProcessor
|
||||
|
||||
|
||||
class LayoutLMv3ModelTester:
|
||||
@ -382,16 +382,16 @@ def prepare_img():
|
||||
@require_torch
|
||||
class LayoutLMv3ModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return LayoutLMv3FeatureExtractor(apply_ocr=False) if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return LayoutLMv3ImageProcessor(apply_ocr=False) if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_no_head(self):
|
||||
model = LayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base").to(torch_device)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values.to(torch_device)
|
||||
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(torch_device)
|
||||
|
||||
input_ids = torch.tensor([[1, 2]])
|
||||
bbox = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]]).unsqueeze(0)
|
||||
|
@ -51,7 +51,7 @@ if is_tf_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import LayoutLMv3FeatureExtractor
|
||||
from transformers import LayoutLMv3ImageProcessor
|
||||
|
||||
|
||||
class TFLayoutLMv3ModelTester:
|
||||
@ -482,16 +482,16 @@ def prepare_img():
|
||||
@require_tf
|
||||
class TFLayoutLMv3ModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return LayoutLMv3FeatureExtractor(apply_ocr=False) if is_vision_available() else None
|
||||
def default_image_processor(self):
|
||||
return LayoutLMv3ImageProcessor(apply_ocr=False) if is_vision_available() else None
|
||||
|
||||
@slow
|
||||
def test_inference_no_head(self):
|
||||
model = TFLayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
pixel_values = feature_extractor(images=image, return_tensors="tf").pixel_values
|
||||
pixel_values = image_processor(images=image, return_tensors="tf").pixel_values
|
||||
|
||||
input_ids = tf.constant([[1, 2]])
|
||||
bbox = tf.expand_dims(tf.constant([[1, 2, 3, 4], [5, 6, 7, 8]]), axis=0)
|
||||
|
@ -36,7 +36,7 @@ from transformers.utils import FEATURE_EXTRACTOR_NAME, cached_property, is_pytes
|
||||
if is_pytesseract_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import LayoutLMv2FeatureExtractor, LayoutXLMProcessor
|
||||
from transformers import LayoutLMv2ImageProcessor, LayoutXLMProcessor
|
||||
|
||||
|
||||
@require_pytesseract
|
||||
@ -47,7 +47,7 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
rust_tokenizer_class = LayoutXLMTokenizerFast
|
||||
|
||||
def setUp(self):
|
||||
feature_extractor_map = {
|
||||
image_processor_map = {
|
||||
"do_resize": True,
|
||||
"size": 224,
|
||||
"apply_ocr": True,
|
||||
@ -56,7 +56,7 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
self.tmpdirname = tempfile.mkdtemp()
|
||||
self.feature_extraction_file = os.path.join(self.tmpdirname, FEATURE_EXTRACTOR_NAME)
|
||||
with open(self.feature_extraction_file, "w", encoding="utf-8") as fp:
|
||||
fp.write(json.dumps(feature_extractor_map) + "\n")
|
||||
fp.write(json.dumps(image_processor_map) + "\n")
|
||||
|
||||
# taken from `test_tokenization_layoutxlm.LayoutXLMTokenizationTest.test_save_pretrained`
|
||||
self.tokenizer_pretrained_name = "hf-internal-testing/tiny-random-layoutxlm"
|
||||
@ -70,8 +70,8 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
def get_tokenizers(self, **kwargs) -> List[PreTrainedTokenizerBase]:
|
||||
return [self.get_tokenizer(**kwargs), self.get_rust_tokenizer(**kwargs)]
|
||||
|
||||
def get_feature_extractor(self, **kwargs):
|
||||
return LayoutLMv2FeatureExtractor.from_pretrained(self.tmpdirname, **kwargs)
|
||||
def get_image_processor(self, **kwargs):
|
||||
return LayoutLMv2ImageProcessor.from_pretrained(self.tmpdirname, **kwargs)
|
||||
|
||||
def tearDown(self):
|
||||
shutil.rmtree(self.tmpdirname)
|
||||
@ -88,10 +88,10 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
return image_inputs
|
||||
|
||||
def test_save_load_pretrained_default(self):
|
||||
feature_extractor = self.get_feature_extractor()
|
||||
image_processor = self.get_image_processor()
|
||||
tokenizers = self.get_tokenizers()
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
processor.save_pretrained(self.tmpdirname)
|
||||
processor = LayoutXLMProcessor.from_pretrained(self.tmpdirname)
|
||||
@ -99,16 +99,16 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer.get_vocab())
|
||||
self.assertIsInstance(processor.tokenizer, (LayoutXLMTokenizer, LayoutXLMTokenizerFast))
|
||||
|
||||
self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor.to_json_string())
|
||||
self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
|
||||
self.assertEqual(processor.image_processor.to_json_string(), image_processor.to_json_string())
|
||||
self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
|
||||
|
||||
def test_save_load_pretrained_additional_features(self):
|
||||
processor = LayoutXLMProcessor(feature_extractor=self.get_feature_extractor(), tokenizer=self.get_tokenizer())
|
||||
processor = LayoutXLMProcessor(image_processor=self.get_image_processor(), tokenizer=self.get_tokenizer())
|
||||
processor.save_pretrained(self.tmpdirname)
|
||||
|
||||
# slow tokenizer
|
||||
tokenizer_add_kwargs = self.get_tokenizer(bos_token="(BOS)", eos_token="(EOS)")
|
||||
feature_extractor_add_kwargs = self.get_feature_extractor(do_resize=False, size=30)
|
||||
image_processor_add_kwargs = self.get_image_processor(do_resize=False, size=30)
|
||||
|
||||
processor = LayoutXLMProcessor.from_pretrained(
|
||||
self.tmpdirname,
|
||||
@ -122,12 +122,12 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer_add_kwargs.get_vocab())
|
||||
self.assertIsInstance(processor.tokenizer, LayoutXLMTokenizer)
|
||||
|
||||
self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor_add_kwargs.to_json_string())
|
||||
self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
|
||||
self.assertEqual(processor.image_processor.to_json_string(), image_processor_add_kwargs.to_json_string())
|
||||
self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
|
||||
|
||||
# fast tokenizer
|
||||
tokenizer_add_kwargs = self.get_rust_tokenizer(bos_token="(BOS)", eos_token="(EOS)")
|
||||
feature_extractor_add_kwargs = self.get_feature_extractor(do_resize=False, size=30)
|
||||
image_processor_add_kwargs = self.get_image_processor(do_resize=False, size=30)
|
||||
|
||||
processor = LayoutXLMProcessor.from_pretrained(
|
||||
self.tmpdirname, use_xlm=True, bos_token="(BOS)", eos_token="(EOS)", do_resize=False, size=30
|
||||
@ -136,14 +136,14 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
||||
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer_add_kwargs.get_vocab())
|
||||
self.assertIsInstance(processor.tokenizer, LayoutXLMTokenizerFast)
|
||||
|
||||
self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor_add_kwargs.to_json_string())
|
||||
self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
|
||||
self.assertEqual(processor.image_processor.to_json_string(), image_processor_add_kwargs.to_json_string())
|
||||
self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
|
||||
|
||||
def test_model_input_names(self):
|
||||
feature_extractor = self.get_feature_extractor()
|
||||
image_processor = self.get_image_processor()
|
||||
tokenizer = self.get_tokenizer()
|
||||
|
||||
processor = LayoutXLMProcessor(tokenizer=tokenizer, feature_extractor=feature_extractor)
|
||||
processor = LayoutXLMProcessor(tokenizer=tokenizer, image_processor=image_processor)
|
||||
|
||||
input_str = "lower newer"
|
||||
image_input = self.prepare_image_inputs()
|
||||
@ -215,15 +215,15 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
def test_processor_case_1(self):
|
||||
# case 1: document image classification (training, inference) + token classification (inference), apply_ocr = True
|
||||
|
||||
feature_extractor = LayoutLMv2FeatureExtractor()
|
||||
image_processor = LayoutLMv2ImageProcessor()
|
||||
tokenizers = self.get_tokenizers
|
||||
images = self.get_images
|
||||
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
# not batched
|
||||
input_feat_extract = feature_extractor(images[0], return_tensors="pt")
|
||||
input_feat_extract = image_processor(images[0], return_tensors="pt")
|
||||
input_processor = processor(images[0], return_tensors="pt")
|
||||
|
||||
# verify keys
|
||||
@ -245,7 +245,7 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
self.assertSequenceEqual(decoding, expected_decoding)
|
||||
|
||||
# batched
|
||||
input_feat_extract = feature_extractor(images, return_tensors="pt")
|
||||
input_feat_extract = image_processor(images, return_tensors="pt")
|
||||
input_processor = processor(images, padding=True, return_tensors="pt")
|
||||
|
||||
# verify keys
|
||||
@ -270,12 +270,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
def test_processor_case_2(self):
|
||||
# case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False
|
||||
|
||||
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
|
||||
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
|
||||
tokenizers = self.get_tokenizers
|
||||
images = self.get_images
|
||||
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
# not batched
|
||||
words = ["hello", "world"]
|
||||
@ -324,12 +324,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
def test_processor_case_3(self):
|
||||
# case 3: token classification (training), apply_ocr=False
|
||||
|
||||
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
|
||||
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
|
||||
tokenizers = self.get_tokenizers
|
||||
images = self.get_images
|
||||
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
# not batched
|
||||
words = ["weirdly", "world"]
|
||||
@ -389,12 +389,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
def test_processor_case_4(self):
|
||||
# case 4: visual question answering (inference), apply_ocr=True
|
||||
|
||||
feature_extractor = LayoutLMv2FeatureExtractor()
|
||||
image_processor = LayoutLMv2ImageProcessor()
|
||||
tokenizers = self.get_tokenizers
|
||||
images = self.get_images
|
||||
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
# not batched
|
||||
question = "What's his name?"
|
||||
@ -440,12 +440,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
||||
def test_processor_case_5(self):
|
||||
# case 5: visual question answering (inference), apply_ocr=False
|
||||
|
||||
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
|
||||
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
|
||||
tokenizers = self.get_tokenizers
|
||||
images = self.get_images
|
||||
|
||||
for tokenizer in tokenizers:
|
||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
||||
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||
|
||||
# not batched
|
||||
question = "What's his name?"
|
||||
|
@ -46,7 +46,7 @@ if is_torch_available():
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
from transformers import LevitFeatureExtractor
|
||||
from transformers import LevitImageProcessor
|
||||
|
||||
|
||||
class LevitConfigTester(ConfigTester):
|
||||
@ -409,8 +409,8 @@ def prepare_img():
|
||||
@require_vision
|
||||
class LevitModelIntegrationTest(unittest.TestCase):
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
return LevitFeatureExtractor.from_pretrained(LEVIT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
def default_image_processor(self):
|
||||
return LevitImageProcessor.from_pretrained(LEVIT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||
|
||||
@slow
|
||||
def test_inference_image_classification_head(self):
|
||||
@ -418,9 +418,9 @@ class LevitModelIntegrationTest(unittest.TestCase):
|
||||
torch_device
|
||||
)
|
||||
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||
|
||||
# forward pass
|
||||
with torch.no_grad():
|
||||
|
@ -545,9 +545,9 @@ class Mask2FormerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
self.assertEqual(segmentation[0].shape, target_sizes[0])
|
||||
|
||||
def test_post_process_instance_segmentation(self):
|
||||
feature_extractor = self.image_processing_class(num_labels=self.image_processor_tester.num_classes)
|
||||
image_processor = self.image_processing_class(num_labels=self.image_processor_tester.num_classes)
|
||||
outputs = self.image_processor_tester.get_fake_mask2former_outputs()
|
||||
segmentation = feature_extractor.post_process_instance_segmentation(outputs, threshold=0)
|
||||
segmentation = image_processor.post_process_instance_segmentation(outputs, threshold=0)
|
||||
|
||||
self.assertTrue(len(segmentation) == self.image_processor_tester.batch_size)
|
||||
for el in segmentation:
|
||||
@ -556,7 +556,7 @@ class Mask2FormerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
||||
self.assertEqual(type(el["segments_info"]), list)
|
||||
self.assertEqual(el["segmentation"].shape, (384, 384))
|
||||
|
||||
segmentation = feature_extractor.post_process_instance_segmentation(
|
||||
segmentation = image_processor.post_process_instance_segmentation(
|
||||
outputs, threshold=0, return_binary_maps=True
|
||||
)
|
||||
|
||||
|
@ -325,14 +325,14 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
|
||||
return "facebook/mask2former-swin-small-coco-instance"
|
||||
|
||||
@cached_property
|
||||
def default_feature_extractor(self):
|
||||
def default_image_processor(self):
|
||||
return Mask2FormerImageProcessor.from_pretrained(self.model_checkpoints) if is_vision_available() else None
|
||||
|
||||
def test_inference_no_head(self):
|
||||
model = Mask2FormerModel.from_pretrained(self.model_checkpoints).to(torch_device)
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(image, return_tensors="pt").to(torch_device)
|
||||
inputs_shape = inputs["pixel_values"].shape
|
||||
# check size is divisible by 32
|
||||
self.assertTrue((inputs_shape[-1] % 32) == 0 and (inputs_shape[-2] % 32) == 0)
|
||||
@ -371,9 +371,9 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
|
||||
|
||||
def test_inference_universal_segmentation_head(self):
|
||||
model = Mask2FormerForUniversalSegmentation.from_pretrained(self.model_checkpoints).to(torch_device).eval()
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
image = prepare_img()
|
||||
inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
|
||||
inputs = image_processor(image, return_tensors="pt").to(torch_device)
|
||||
inputs_shape = inputs["pixel_values"].shape
|
||||
# check size is divisible by 32
|
||||
self.assertTrue((inputs_shape[-1] % 32) == 0 and (inputs_shape[-2] % 32) == 0)
|
||||
@ -408,9 +408,9 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
|
||||
|
||||
def test_with_segmentation_maps_and_loss(self):
|
||||
model = Mask2FormerForUniversalSegmentation.from_pretrained(self.model_checkpoints).to(torch_device).eval()
|
||||
feature_extractor = self.default_feature_extractor
|
||||
image_processor = self.default_image_processor
|
||||
|
||||
inputs = feature_extractor(
|
||||
inputs = image_processor(
|
||||
[np.zeros((3, 800, 1333)), np.zeros((3, 800, 1333))],
|
||||
segmentation_maps=[np.zeros((384, 384)).astype(np.float32), np.zeros((384, 384)).astype(np.float32)],
|
||||
return_tensors="pt",
|
||||
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue
Block a user