# Depth Anything
Depth Anything is designed to be a foundation model for monocular depth estimation (MDE). It is jointly trained on labeled images and ~62M unlabeled images, which significantly enlarges the training data. It uses a pretrained DINOv2 model as the image encoder to inherit its rich semantic priors, and DPT as the decoder. A teacher model is first trained on the labeled images and then used to generate pseudo-labels for the unlabeled images. The student model is trained on a combination of the labeled images and the pseudo-labeled images. To improve performance, strong perturbations are applied to the unlabeled images so the student is challenged to learn more robust visual representations; a minimal sketch of this scheme follows.
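The snippet below is an illustrative sketch of that pseudo-labeling step, not the authors' training code; `teacher`, `student`, `strong_aug`, and `criterion` are hypothetical placeholders.

```py
# Hypothetical sketch of the teacher-student step described above.
import torch

def unlabeled_step(teacher, student, images, strong_aug, criterion):
    # The teacher (already trained on labeled data) labels the clean images.
    with torch.no_grad():
        pseudo_depth = teacher(images)
    # The student only sees strongly perturbed versions (e.g. color jitter,
    # which preserves geometry), pushing it to learn more robust features.
    loss = criterion(student(strong_aug(images)), pseudo_depth)
    return loss
```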
You can find all the original Depth Anything checkpoints under the Depth Anything collection.
> [!TIP]
> Click on the Depth Anything models in the right sidebar for more examples of how to apply Depth Anything to different vision tasks.
The example below demonstrates how to obtain a depth map with [`Pipeline`] or the [`AutoModel`] class.
```py
import torch
from transformers import pipeline

# Load the depth estimation pipeline with the base Depth Anything checkpoint
pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-base-hf", torch_dtype=torch.bfloat16, device=0)
pipe("http://images.cocodataset.org/val2017/000000039769.jpg")["depth"]
```
```py
import torch
import requests
import numpy as np
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-base-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-base-hf", torch_dtype=torch.bfloat16)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Cast the pixel values to the model's dtype (bfloat16) to avoid a dtype mismatch
inputs = image_processor(images=image, return_tensors="pt").to(model.dtype)

with torch.no_grad():
    outputs = model(**inputs)

# Resize the raw prediction back to the original image size
post_processed_output = image_processor.post_process_depth_estimation(
    outputs,
    target_sizes=[(image.height, image.width)],
)

predicted_depth = post_processed_output[0]["predicted_depth"]
# Normalize to [0, 255]; convert to float32 first since NumPy has no bfloat16
depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
depth = depth.detach().cpu().float().numpy() * 255
Image.fromarray(depth.astype("uint8"))
```
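As an optional extra step (not part of the original example), the normalized depth can be rendered with a matplotlib colormap; the colormap and filename here are arbitrary choices.

```py
# Optional visualization (assumes matplotlib is installed)
from matplotlib import colormaps

cmap = colormaps["magma"]
# cmap expects values in [0, 1] and returns RGBA floats; keep RGB only
colored = (cmap(depth / 255.0)[..., :3] * 255).astype("uint8")
Image.fromarray(colored).save("depth_colored.png")
```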
## Notes
- DepthAnythingV2, released in June 2024, uses the same architecture as Depth Anything and is compatible with all code examples and existing workflows. It uses synthetic data and a larger-capacity teacher model to achieve finer and more robust depth predictions. Loading a V2 checkpoint only requires swapping the checkpoint name, as shown below.
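For example, assuming the `depth-anything/Depth-Anything-V2-Small-hf` checkpoint name on the Hub, the pipeline call is otherwise unchanged.

```py
from transformers import pipeline

# Same API as before; only the checkpoint changes (checkpoint name assumed)
pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")
pipe("http://images.cocodataset.org/val2017/000000039769.jpg")["depth"]
```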
## DepthAnythingConfig

[[autodoc]] DepthAnythingConfig
## DepthAnythingForDepthEstimation

[[autodoc]] DepthAnythingForDepthEstimation
    - forward