transformers/docs/source/en/model_doc
Armaghan Shakir 9a6be63fdb
Add Apple's Depth-Pro for depth estimation (#34583)
* implement config and model building blocks

* refactor model architecture

* update model outputs

* update init param to include use_fov_model

* update param name in config

* fix hidden_states and attentions outputs for fov

* sort config

* complete minor todos

* update patching

* update config for encoder

* fix config

* use correct defaults in config

* update merge for compatibility with different image size

* restructure encoder for custom configuration

* make fov model compatible with custom config

* replace word "decoder" with "fusion"

* weight conversion script

* fix fov squeeze

* update conversion script (without test)

* upload rough image processing

* create fast image processing

* use torch interpolation for image processing

* complete post_process_depth_estimation

* config: fix imports and sort args

* apply inference in weight conversion

* use mllama script instead for weight conversion

* clean weight conversion script

* add depth-pro status in other files

* fill docstring in config

* formatting

* more formatting

* formatting with ruff

* formatting with style

* fix copied classes

* add examples; update weight convert script

* fix using check_table.py and isort

* fix config docstring

* add depth pro to sdpa docs

* undo unintentional changes in configuration_gemma.py

* minor fixes

* test image processing

* fixes and tests

* more fixes

* use output states from image_encoder instead

* Revert "use output states from image_encoder instead"

This reverts commit 2408ec54e4.

* make embeddings dynamic

* reshape output hidden states and attentions as part of computation graph

* fix ruff formatting

* fix docstring failure

* use num_fov_head_layers in tests

* update doc

* check consistency with config

* ruff formatting

* update test case

* fix ruff formatting

* add tests for fov

* use interpolation in postprocess

* run and fix slow tests locally

* use scaled_images_features for image and fov encoder

* return fused_hidden_states in fusion stage

* fix example

* fix ruff

* fix copyright license for all files

* add __all__ for each file

* minor fixes
- fix download spell
- add push_to_hub option
- fix Optional type hinting
- apply single loop for DepthProImageProcessor.preprocess

* return list in post_process_depth_estimation

* minor fixes
- capitalize start of docstring
- use ignore copy
- fix examples
- move docstring templates and custom output classes to top
- remove "-> None" typehinting from __init__
- type hinting for forward passes
- fix docstrings for custom output classes

* fix "ruff check"

* update upsample and projection

* major changes: (image size and merge optimization)
- add support for images of any size
- optimize merge operation
- remove image_size from config
- use full names instead of B, C, H, W
- remove interpolation from fusion stage
- add interpolation after merge
- move validations to config
- update integration test
- add type hints for functions

* fix push_to_hub option in weights conversion

* remove image_size in weights conversion

* major changes in the architecture
- remove all DepthProViT modules and support different backbones using the AutoModel API
- set default use_fov_model to False
- validate parameters in configuration
- update interpolate function: use "nearest" for faster computation
- update reshape_feature function: remove all special tokens, possible from different backbones
- update merge function: use padding from config instead of merge_out_size
- remove patch_to_batch and batch_to_patch conversions for now
- calculate out_size dynamically in the encoder
- leave head_mask calculation to the backbone
- fix bugs with merge
- add more comments
- update tests

* placeholder for unused config attributes

* improve docs amid review

* minor change in docs

* further optimize merge

* fix formatting

* remove unused patch/batch conversion functions

* use original F.interpolate

* improve function naming

* minor changes
- use torch_int instead of int
- use proper for newly initialized tensors
- use user provided return_dict for patch_encoder
- use if-else block instead in self.use_fov_model

* rearchitect upsample block for improved modularity

* update upsample keys in weight conversion

* improve padding in merge_patches

* use double-loop for merge

* update comments

* create feature_extractor, reduce some forward code

* introduce config.use_mask_token in dinov2

* minor fixes

* minor fixes for onnx

* update __init__ to latest format

* remove DepthProConfig.to_dict()

* major changes in backbone

* update config in weight conversion

* formatting

* converted model is fp32

* improve naming and docs for feature_extractor->reconstruct_feature_maps

* minor fixes; amid review

* create intermediate vars in func call

* use torch.testing.assert_close

* use ModuleList instead of Sequential and ModuleDict

* update docs

* include fov in integration tests

* update docs

* improve initialization of convolution layers

* fix unused fov keys

* update tests

* ruff format

* fix test, amid kaiming initialization

* add depthpro to toctree

* add residual layer to _no_split_modules

* architecture rework

* Update src/transformers/models/depth_pro/image_processing_depth_pro.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/depth_pro/image_processing_depth_pro_fast.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* update docs

* improve merge_patches

* use flatten with fov_output

* ruff formatting

* update resources section in docs

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix typo "final_kernal_size"

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix output typehint for DepthProDepthEstimator

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* residual operation in 2 steps

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* use image_size instead of global patch_size in interpolation

* replace all Sequential with ModuleList

* update fov

* update heads

* fix and update conversion script for heads

* ruff formatting

* remove float32 conversion

* use "Fov" instead of "FOV" in class names

* use "Fov" instead of "FOV" in config docs

* remove prune_heads

* update fusion stage

* use device in examples

* update processor

* ruff fixes

* add do_rescale in image_processor_dict

* skip test: test_fast_is_faster_than_slow

* ruff formatting

* DepthProImageProcessorFast in other files

* revert antialias removal

* add antialias in BaseImageProcessorFast

* Revert "revert antialias removal"

This reverts commit 5caa0bd8f9.

* Revert "add antialias in BaseImageProcessorFast"

This reverts commit 3ae1134780.

* update processor for grouping and antialias

* try test_fast_is_faster_than_slow without "skip" or "flaky"

* update checkpoint

* update checkpoint

* use @is_flaky for processor test

* update checkpoint to "apple/DepthPro-hf"

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-02-10 11:32:45 +00:00
albert.md
align.md
altclip.md
aria.md
audio-spectrogram-transformer.md add sdpa to ViT [follow up of #29325] (#30555) 2024-05-16 10:56:11 +01:00
auto.md
autoformer.md
bamba.md
bark.md
bart.md
barthez.md
bartpho.md
beit.md
bert-generation.md
bert-japanese.md
bert.md
bertweet.md
big_bird.md
bigbird_pegasus.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
biogpt.md
bit.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
blenderbot-small.md
blenderbot.md
blip-2.md
blip.md
bloom.md
bort.md
bridgetower.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
bros.md
byt5.md
camembert.md
canine.md
chameleon.md
chinese_clip.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
clap.md
clip.md
clipseg.md
clvp.md
code_llama.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
codegen.md
cohere.md
cohere2.md
colpali.md
conditional_detr.md Add examples for detection models finetuning (#30422) 2024-05-08 11:42:07 +01:00
convbert.md
convnext.md
convnextv2.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
cpm.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
cpmant.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
ctrl.md
cvt.md
dab-detr.md
dac.md Add Descript-Audio-Codec model (#31494) 2024-08-19 10:21:51 +01:00
data2vec.md
dbrx.md
deberta-v2.md
deberta.md
decision_transformer.md
deformable_detr.md Add Image Processor Fast Deformable DETR (#34353) 2024-11-19 11:18:58 -05:00
deit.md
deplot.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
depth_anything_v2.md
depth_anything.md
depth_pro.md
deta.md
detr.md
dialogpt.md
diffllama.md
dinat.md
dinov2_with_registers.md Add DINOv2 with registers (#35348) 2024-12-24 13:21:59 +01:00
dinov2.md
distilbert.md
dit.md
donut.md
dpr.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
dpt.md [DPT, Dinov2] Add resources (#27655) 2023-11-23 17:44:08 +00:00
efficientformer.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
efficientnet.md
electra.md
emu3.md
encodec.md
encoder-decoder.md
ernie_m.md
ernie.md
esm.md
falcon_mamba.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
falcon.md
falcon3.md
fastspeech2_conformer.md Super tiny fix 12 typos about "with with" (#29926) 2024-03-29 14:31:31 +00:00
flan-t5.md
flan-ul2.md
flaubert.md
flava.md
fnet.md
focalnet.md
fsmt.md
funnel.md
fuyu.md
gemma.md
gemma2.md
git.md
glm.md
glpn.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
got_ocr2.md
gpt_bigcode.md
gpt_neo.md
gpt_neox_japanese.md
gpt_neox.md
gpt-sw3.md
gpt2.md
gptj.md
gptsan-japanese.md
granite.md
granitemoe.md
granitevision.md
graphormer.md
grounding-dino.md
groupvit.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
helium.md
herbert.md
hiera.md
hubert.md
ibert.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
idefics.md
idefics2.md
idefics3.md
ijepa.md
imagegpt.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
informer.md
instructblip.md
instructblipvideo.md
jamba.md
jetmoe.md Add JetMoE model (#30005) 2024-05-14 16:32:01 +02:00
jukebox.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
kosmos-2.md
layoutlm.md
layoutlmv2.md
layoutlmv3.md
layoutxlm.md
led.md
levit.md
lilt.md
llama.md
llama2.md
llama3.md
llava_next_video.md
llava_next.md
llava_onevision.md
llava.md
longformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
longt5.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
luke.md
lxmert.md
m2m_100.md
madlad-400.md Add madlad-400 MT models (#27471) 2023-11-28 13:19:50 +00:00
mamba.md
mamba2.md quickfix documentation (#32566) 2024-08-26 17:49:44 +02:00
marian.md
markuplm.md [Docs] Fix spelling and grammar mistakes (#28825) 2024-02-02 08:45:00 +01:00
mask2former.md
maskformer.md
matcha.md
mbart.md
mctct.md
mega.md Deprecate low use models (#30781) 2024-05-28 18:07:07 +01:00
megatron_gpt2.md
megatron-bert.md
mgp-str.md [Docs] Fix broken links and syntax issues (#28918) 2024-02-08 14:13:35 -08:00
mimi.md
mistral.md
mixtral.md
mllama.md Mllama: update docs (#34334) 2024-10-30 10:11:50 +01:00
mluke.md
mms.md
mobilebert.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
mobilenet_v1.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
mobilenet_v2.md
mobilevit.md
mobilevitv2.md
modernbert.md
moonshine.md
moshi.md
mpnet.md
mpt.md
mra.md
mt5.md Adding [T5/MT5/UMT5]ForTokenClassification (#28443) 2024-02-01 03:53:49 +01:00
musicgen_melody.md
musicgen.md
mvp.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
myt5.md
nat.md
nemotron.md
nezha.md
nllb-moe.md
nllb.md
nougat.md
nystromformer.md
olmo.md
olmo2.md
olmoe.md
omdet-turbo.md
oneformer.md
open-llama.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
openai-gpt.md
opt.md
owlv2.md
owlvit.md
paligemma.md
patchtsmixer.md [Docs] Add resources (#28705) 2024-02-19 15:22:29 +01:00
patchtst.md [Docs] Add resources (#28705) 2024-02-19 15:22:29 +01:00
pegasus_x.md
pegasus.md
perceiver.md
persimmon.md
phi.md
phi3.md [doctest] Fixes (#35863) 2025-01-26 15:26:38 -08:00
phimoe.md
phobert.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
pix2struct.md
pixtral.md Add optimized PixtralImageProcessorFast (#34836) 2024-11-28 16:04:05 +01:00
plbart.md
poolformer.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
pop2piano.md
prophetnet.md
pvt_v2.md Add PvT-v2 Model (#26812) 2024-03-13 19:05:20 +00:00
pvt.md
qdqbert.md
qwen2_5_vl.md
qwen2_audio.md
qwen2_moe.md
qwen2_vl.md
qwen2.md
rag.md
realm.md
recurrent_gemma.md
reformer.md
regnet.md
rembert.md
resnet.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
retribert.md
roberta-prelayernorm.md
roberta.md
roc_bert.md
roformer.md
rt_detr_v2.md
rt_detr.md
rwkv.md
sam.md
seamless_m4t_v2.md
seamless_m4t.md
segformer.md Decorators for deprecation and named arguments validation (#30799) 2024-06-10 12:35:10 +01:00
seggpt.md
sew-d.md
sew.md
siglip.md Refactoring of ImageProcessorFast (#35069) 2025-02-04 17:52:31 -05:00
speech_to_text_2.md
speech_to_text.md
speech-encoder-decoder.md
speecht5.md
splinter.md
squeezebert.md
stablelm.md
starcoder2.md Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878) 2024-05-20 10:06:57 +02:00
superglue.md
superpoint.md
swiftformer.md
swin.md
swin2sr.md
swinv2.md
switch_transformers.md
t5.md
t5v1.1.md
table-transformer.md
tapas.md
tapex.md
textnet.md
time_series_transformer.md
timesformer.md
timm_wrapper.md
trajectory_transformer.md
transfo-xl.md
trocr.md
tvlt.md
tvp.md
udop.md
ul2.md
umt5.md
unispeech-sat.md
unispeech.md
univnet.md
upernet.md
van.md
video_llava.md
videomae.md
vilt.md
vipllava.md
vision-encoder-decoder.md
vision-text-dual-encoder.md
visual_bert.md
vit_hybrid.md
vit_mae.md
vit_msn.md add sdpa to ViT [follow up of #29325] (#30555) 2024-05-16 10:56:11 +01:00
vit.md
vitdet.md
vitmatte.md
vitpose.md
vits.md
vivit.md Add sdpa for Vivit (#33757) 2024-10-15 11:27:54 +02:00
wav2vec2_phoneme.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
wav2vec2-bert.md
wav2vec2-conformer.md doc: add info about wav2vec2 bert in older wav2vec2 models. (#31120) 2024-06-05 11:56:11 +01:00
wav2vec2.md
wavlm.md
whisper.md [docs] add quick usage snippet to Whisper. (#31289) 2024-08-27 14:11:52 +02:00
xclip.md
xglm.md
xlm-prophetnet.md
xlm-roberta-xl.md
xlm-roberta.md
xlm-v.md
xlm.md
xlnet.md
xls_r.md [Docs] Model_doc structure/clarity improvements (#26876) 2023-11-03 10:57:03 -04:00
xlsr_wav2vec2.md
xmod.md
yolos.md
yoso.md
zamba.md
zamba2.md
zoedepth.md