<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# ZoeDepth

## Overview

The ZoeDepth model was proposed in [ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth](https://arxiv.org/abs/2302.12288) by Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller. ZoeDepth extends the [DPT](dpt) framework for metric (also called absolute) depth estimation. ZoeDepth is pre-trained on 12 datasets using relative depth and fine-tuned on two domains (NYU and KITTI) using metric depth. A lightweight head with a novel bin adjustment design, the metric bins module, is used for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier.

The abstract from the paper is the following:

*This paper tackles the problem of depth estimation from a single image. Existing work either focuses on generalization performance disregarding metric scale, i.e. relative depth estimation, or state-of-the-art results on specific datasets, i.e. metric depth estimation. We propose the first approach that combines both worlds, leading to a model with excellent generalization performance while maintaining metric scale. Our flagship model, ZoeD-M12-NK, is pre-trained on 12 datasets using relative depth and fine-tuned on two datasets using metric depth. We use a lightweight head with a novel bin adjustment design called metric bins module for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier. Our framework admits multiple configurations depending on the datasets used for relative depth pre-training and metric fine-tuning. Without pre-training, we can already significantly improve the state of the art (SOTA) on the NYU Depth v2 indoor dataset. Pre-training on twelve datasets and fine-tuning on the NYU Depth v2 indoor dataset, we can further improve SOTA for a total of 21% in terms of relative absolute error (REL). Finally, ZoeD-M12-NK is the first model that can jointly train on multiple datasets (NYU Depth v2 and KITTI) without a significant drop in performance and achieve unprecedented zero-shot generalization performance to eight unseen datasets from both indoor and outdoor domains.*

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/zoedepth_architecture_bis.png"
alt="drawing" width="600"/>

<small> ZoeDepth architecture. Taken from the <a href="https://arxiv.org/abs/2302.12288">original paper</a>. </small>

This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/isl-org/ZoeDepth).
## Usage tips

- ZoeDepth is an absolute (also called metric) depth estimation model, unlike DPT which is a relative depth estimation model. This means that ZoeDepth is able to estimate depth in metric units like meters.

The easiest way to perform inference with ZoeDepth is by leveraging the [pipeline API](../main_classes/pipelines.md):

```python
from transformers import pipeline
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

pipe = pipeline(task="depth-estimation", model="Intel/zoedepth-nyu-kitti")
result = pipe(image)
depth = result["depth"]
```
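
The pipeline returns a dictionary: `depth` holds a `PIL.Image` ready for display, and the raw depth tensor is available as well. A short follow-up, assuming the output keys of the depth-estimation pipeline in your installed version match the ones below:

```python
# "depth" is a PIL image that can be saved or displayed directly
depth.save("zoedepth_result.png")

# "predicted_depth" is the raw torch.Tensor of depth values (in meters for metric models)
predicted_depth = result["predicted_depth"]
print(predicted_depth.shape, predicted_depth.min().item(), predicted_depth.max().item())
```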

Alternatively, one can also perform inference using the model and image processor classes:

```python
from transformers import AutoImageProcessor, ZoeDepthForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti")
model = ZoeDepthForDepthEstimation.from_pretrained("Intel/zoedepth-nyu-kitti")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
```
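
The original ZoeDepth implementation additionally averages the prediction over the input image and its horizontal flip at inference time, which typically improves accuracy slightly. This is not built into the `transformers` classes; a minimal sketch of that test-time augmentation, reusing `model` and `inputs` from the example above:

```python
import torch

with torch.no_grad():
    pixel_values = inputs["pixel_values"]
    # predict on the original image
    depth = model(pixel_values=pixel_values).predicted_depth
    # predict on the horizontally flipped image, then flip the prediction back
    depth_flipped = model(pixel_values=torch.flip(pixel_values, dims=[-1])).predicted_depth
    # average the two predictions
    predicted_depth = 0.5 * (depth + torch.flip(depth_flipped, dims=[-1]))
```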

## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ZoeDepth.

- A demo notebook regarding inference with ZoeDepth models can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ZoeDepth). 🌎

## ZoeDepthConfig

[[autodoc]] ZoeDepthConfig

## ZoeDepthImageProcessor

[[autodoc]] ZoeDepthImageProcessor
    - preprocess

## ZoeDepthForDepthEstimation

[[autodoc]] ZoeDepthForDepthEstimation
    - forward