transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-21 21:49:06 +06:00

Armaghan Shakir 9a6be63fdb Add Apple's Depth-Pro for depth estimation (#34583)		2025-02-10 11:32:45 +00:00

* implement config and model building blocks
* refactor model architechture
* update model outputs
* update init param to include use_fov_model
* update param name in config
* fix hidden_states and attentions outputs for fov
* sort config
* complete minor todos
* update patching
* update config for encoder
* fix config
* use correct defaults in config
* update merge for compatibility with different image size
* restructure encoder for custom configuration
* make fov model compatible with custom config
* replace word "decoder" with "fusion"
* weight conversion script
* fix fov squeeze
* update conversion script (without test)
* upload ruff image processing
* create fast image processing
* use torch interpolation for image processing
* complete post_process_depth_estimation
* config: fix imports and sort args
* apply inference in weight conversion
* use mllama script instead for weight conversion
* clean weight conversion script
* add depth-pro status in other files
* fill docstring in config
* formatting
* more formatting
* formatting with ruff
* formatting with style
* fix copied classes
* add examples; update weight convert script
* fix using check_table.py and isort
* fix config docstring
* add depth pro to sdpa docs
* undo unintentional changes in configuration_gemma.py
* minor fixes
* test image processing
* fixes and tests
* more fixes
* use output states from image_encoder instead
* Revert "use output states from image_encoder instead" This reverts commit `2408ec54e4`.
* make embeddings dynamic
* reshape output hidden states and attentions as part of computation graph
* fix ruff formating
* fix docstring failure
* use num_fov_head_layers in tests
* update doc
* check consistency with config
* ruff formatting
* update test case
* fix ruff formatting
* add tests for fov
* use interpolation in postprocess
* run and fix slow tests locally
* use scaled_images_features for image and fov encoder
* return fused_hidden_states in fusion stage
* fix example
* fix ruff
* fix copyright license for all files
* add __all__ for each file
* minor fixes
  - fix download spell
  - add push_to_hub option
  - fix Optional type hinting
  - apply single loop for DepthProImageProcessor.preprocess
* return list in post_process_depth_estimation
* minor fixes
  - capitalize start of docstring
  - use ignore copy
  - fix examples
  - move docstring templates and custom output classes to top
  - remove "-> None" typehinting from __init__
  - type hinting for forward passes
  - fix docstrings for custom output classes
* fix "ruff check"
* update upsample and projection
* major changes: (image size and merge optimization)
  - add support for images of any size
  - optimize merge operation
  - remove image_size from config
  - use full names instead of B, C, H, W
  - remove interpolation from fusion stage
  - add interpolation after merge
  - move validations to config
  - update integration test
  - add type hints for functions
* fix push_to_hub option in weights conversion
* remove image_size in weights conversion
* major changes in the architecture
  - remove all DepthProViT modules and support different backbones using the AutoModel API
  - set default use_fov_model to False
  - validate parameters in configuration
  - update interpolate function: use "nearest" for faster computation
  - update reshape_feature function: remove all special tokens, possible from different backbones
  - update merge function: use padding from config instead of merge_out_size
  - remove patch_to_batch and batch_to_patch conversions for now
  - calculate out_size dynamically in the encoder
  - leave head_mask calculation to the backbone
  - fix bugs with merge
  - add more comments
  - update tests
* placeholder for unused config attributes
* improve docs amid review
* minor change in docs
* further optimize merge
* fix formatting
* remove unused patch/batch convertion functions
* use original F.interpolate
* improve function naming
* minor chages
  - use torch_int instead of int
  - use proper for newly initialized tensors
  - use user provided return_dict for patch_encoder
  - use if-else block instead in self.use_fov_model
* rearchitect upsample block for improved modularity
* update upsample keys in weight conversion
* improve padding in merge_patches
* use double-loop for merge
* update comments
* create feature_extractor, reduce some forward code
* introduce config.use_mask_token in dinov2
* minor fixes
* minor fixes for onnx
* update __init__ to latest format
* remove DepthProConfig.to_dict()
* major changes in backbone
* update config in weight conversion
* formatting
* converted model is fp32
* improve naming and docs for feature_extractor->reconstruct_feature_maps
* minor fixes; amid review
* create intermediate vars in func call
* use torch.testing.assert_close
* use ModuleList instead of Sequential and ModuleDict
* update docs
* include fov in integraiton tests
* update docs
* improve initialization of convolution layers
* fix unused fov keys
* update tests
* ruff format
* fix test, amid kaimming initialization
* add depthpro to toctree
* add residual layer to _no_split_modules
* architecture rework
* Update src/transformers/models/depth_pro/image_processing_depth_pro.py
  Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/depth_pro/image_processing_depth_pro_fast.py
  Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* update docs
* improve merge_patches
* use flatten with fov_output
* ruff formatting
* update resources section in docs
  Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix typo "final_kernal_size"
  Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix output typehint for DepthProDepthEstimator
  Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* residual operation in 2 steps
  Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* use image_size instead of global patch_size in interpolation
* replace all Sequential with ModuleList
* update fov
* update heads
* fix and update conversion script for heads
* ruff formatting
* remove float32 conversion
* use "Fov" instead of "FOV" in class names
* use "Fov" instead of "FOV" in config docs
* remove prune_heads
* update fusion stage
* use device in examples
* update processor
* ruff fixes
* add do_rescale in image_processor_dict
* skip test: test_fast_is_faster_than_slow
* ruff formatting
* DepthProImageProcessorFast in other files
* revert antialias removal
* add antialias in BaseImageProcessorFast
* Revert "revert antialias removal" This reverts commit `5caa0bd8f9`.
* Revert "add antialias in BaseImageProcessorFast" This reverts commit `3ae1134780`.
* update processor for grouping and antialias
* try test_fast_is_faster_than_slow without "skip" or "flanky"
* update checkpoint
* update checkpoint
* use @is_flanky for processor test
* update checkpoint to "apple/DepthPro-hf"

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

..
albert.md
align.md
altclip.md
aria.md
audio-spectrogram-transformer.md	add sdpa to ViT [follow up of #29325] (#30555)	2024-05-16 10:56:11 +01:00
auto.md
autoformer.md
bamba.md
bark.md
bart.md
barthez.md
bartpho.md
beit.md
bert-generation.md
bert-japanese.md
bert.md
bertweet.md
big_bird.md
bigbird_pegasus.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
biogpt.md
bit.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
blenderbot-small.md
blenderbot.md
blip-2.md
blip.md
bloom.md
bort.md
bridgetower.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
bros.md
byt5.md
camembert.md
canine.md
chameleon.md
chinese_clip.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
clap.md
clip.md
clipseg.md
clvp.md
code_llama.md	Fixed Majority of the Typos in `transformers[en]` Documentation (#33350)	2024-09-09 10:47:24 +02:00
codegen.md
cohere.md
cohere2.md
colpali.md
conditional_detr.md	Add examples for detection models finetuning (#30422)	2024-05-08 11:42:07 +01:00
convbert.md
convnext.md
convnextv2.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
cpm.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
cpmant.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
ctrl.md
cvt.md
dab-detr.md
dac.md	Add Descript-Audio-Codec model (#31494)	2024-08-19 10:21:51 +01:00
data2vec.md
dbrx.md
deberta-v2.md
deberta.md
decision_transformer.md
deformable_detr.md	Add Image Processor Fast Deformable DETR (#34353)	2024-11-19 11:18:58 -05:00
deit.md
deplot.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
depth_anything_v2.md
depth_anything.md
depth_pro.md
deta.md
detr.md
dialogpt.md
diffllama.md
dinat.md
dinov2_with_registers.md	Add DINOv2 with registers (#35348)	2024-12-24 13:21:59 +01:00
dinov2.md
distilbert.md
dit.md
donut.md
dpr.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
dpt.md	[DPT, Dinov2] Add resources (#27655)	2023-11-23 17:44:08 +00:00
efficientformer.md	Deprecate low use models (#30781)	2024-05-28 18:07:07 +01:00
efficientnet.md
electra.md
emu3.md
encodec.md
encoder-decoder.md
ernie_m.md
ernie.md
esm.md
falcon_mamba.md	Fixed Majority of the Typos in `transformers[en]` Documentation (#33350)	2024-09-09 10:47:24 +02:00
falcon.md
falcon3.md
fastspeech2_conformer.md	Super tiny fix 12 typos about "with with" (#29926)	2024-03-29 14:31:31 +00:00
flan-t5.md
flan-ul2.md
flaubert.md
flava.md
fnet.md
focalnet.md
fsmt.md
funnel.md
fuyu.md
gemma.md
gemma2.md
git.md
glm.md
glpn.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
got_ocr2.md
gpt_bigcode.md
gpt_neo.md
gpt_neox_japanese.md
gpt_neox.md
gpt-sw3.md
gpt2.md
gptj.md
gptsan-japanese.md
granite.md
granitemoe.md
granitevision.md
graphormer.md
grounding-dino.md
groupvit.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
helium.md
herbert.md
hiera.md
hubert.md
ibert.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
idefics.md
idefics2.md
idefics3.md
ijepa.md
imagegpt.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
informer.md
instructblip.md
instructblipvideo.md
jamba.md
jetmoe.md	Add JetMoE model (#30005)	2024-05-14 16:32:01 +02:00
jukebox.md	Deprecate low use models (#30781)	2024-05-28 18:07:07 +01:00
kosmos-2.md
layoutlm.md
layoutlmv2.md
layoutlmv3.md
layoutxlm.md
led.md
levit.md
lilt.md
llama.md
llama2.md
llama3.md
llava_next_video.md
llava_next.md
llava_onevision.md
llava.md
longformer.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
longt5.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
luke.md
lxmert.md
m2m_100.md
madlad-400.md	Add madlad-400 MT models (#27471)	2023-11-28 13:19:50 +00:00
mamba.md
mamba2.md	quickfix documentation (#32566)	2024-08-26 17:49:44 +02:00
marian.md
markuplm.md	[Docs] Fix spelling and grammar mistakes (#28825)	2024-02-02 08:45:00 +01:00
mask2former.md
maskformer.md
matcha.md
mbart.md
mctct.md
mega.md	Deprecate low use models (#30781)	2024-05-28 18:07:07 +01:00
megatron_gpt2.md
megatron-bert.md
mgp-str.md	[Docs] Fix broken links and syntax issues (#28918)	2024-02-08 14:13:35 -08:00
mimi.md
mistral.md
mixtral.md
mllama.md	Mllama: update docs (#34334)	2024-10-30 10:11:50 +01:00
mluke.md
mms.md
mobilebert.md	[docs] fixed links with 404 (#27327)	2023-11-06 19:45:03 +00:00
mobilenet_v1.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
mobilenet_v2.md
mobilevit.md
mobilevitv2.md
modernbert.md
moonshine.md
moshi.md
mpnet.md
mpt.md
mra.md
mt5.md	Adding [T5/MT5/UMT5]ForTokenClassification (#28443)	2024-02-01 03:53:49 +01:00
musicgen_melody.md
musicgen.md
mvp.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
myt5.md
nat.md
nemotron.md
nezha.md
nllb-moe.md
nllb.md
nougat.md
nystromformer.md
olmo.md
olmo2.md
olmoe.md
omdet-turbo.md
oneformer.md
open-llama.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
openai-gpt.md
opt.md
owlv2.md
owlvit.md
paligemma.md
patchtsmixer.md	[Docs] Add resources (#28705)	2024-02-19 15:22:29 +01:00
patchtst.md	[Docs] Add resources (#28705)	2024-02-19 15:22:29 +01:00
pegasus_x.md
pegasus.md
perceiver.md
persimmon.md
phi.md
phi3.md	[doctest] Fixes (#35863)	2025-01-26 15:26:38 -08:00
phimoe.md
phobert.md	Fixed Majority of the Typos in `transformers[en]` Documentation (#33350)	2024-09-09 10:47:24 +02:00
pix2struct.md
pixtral.md	Add optimized `PixtralImageProcessorFast` (#34836)	2024-11-28 16:04:05 +01:00
plbart.md
poolformer.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
pop2piano.md
prophetnet.md
pvt_v2.md	Add PvT-v2 Model (#26812)	2024-03-13 19:05:20 +00:00
pvt.md
qdqbert.md
qwen2_5_vl.md
qwen2_audio.md
qwen2_moe.md
qwen2_vl.md
qwen2.md
rag.md
realm.md
recurrent_gemma.md
reformer.md
regnet.md
rembert.md
resnet.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
retribert.md
roberta-prelayernorm.md
roberta.md
roc_bert.md
roformer.md
rt_detr_v2.md
rt_detr.md
rwkv.md
sam.md
seamless_m4t_v2.md
seamless_m4t.md
segformer.md	Decorators for deprecation and named arguments validation (#30799)	2024-06-10 12:35:10 +01:00
seggpt.md
sew-d.md
sew.md
siglip.md	Refactoring of ImageProcessorFast (#35069)	2025-02-04 17:52:31 -05:00
speech_to_text_2.md
speech_to_text.md
speech-encoder-decoder.md
speecht5.md
splinter.md
squeezebert.md
stablelm.md
starcoder2.md	Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878)	2024-05-20 10:06:57 +02:00
superglue.md
superpoint.md
swiftformer.md
swin.md
swin2sr.md
swinv2.md
switch_transformers.md
t5.md
t5v1.1.md
table-transformer.md
tapas.md
tapex.md
textnet.md
time_series_transformer.md
timesformer.md
timm_wrapper.md
trajectory_transformer.md
transfo-xl.md
trocr.md
tvlt.md
tvp.md
udop.md
ul2.md
umt5.md
unispeech-sat.md
unispeech.md
univnet.md
upernet.md
van.md
video_llava.md
videomae.md
vilt.md
vipllava.md
vision-encoder-decoder.md
vision-text-dual-encoder.md
visual_bert.md
vit_hybrid.md
vit_mae.md
vit_msn.md	add sdpa to ViT [follow up of #29325] (#30555)	2024-05-16 10:56:11 +01:00
vit.md
vitdet.md
vitmatte.md
vitpose.md
vits.md
vivit.md	Add sdpa for Vivit (#33757)	2024-10-15 11:27:54 +02:00
wav2vec2_phoneme.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
wav2vec2-bert.md
wav2vec2-conformer.md	doc: add info about wav2vec2 bert in older wav2vec2 models. (#31120)	2024-06-05 11:56:11 +01:00
wav2vec2.md
wavlm.md
whisper.md	[docs] add quick usage snippet to Whisper. (#31289)	2024-08-27 14:11:52 +02:00
xclip.md
xglm.md
xlm-prophetnet.md
xlm-roberta-xl.md
xlm-roberta.md
xlm-v.md
xlm.md
xlnet.md
xls_r.md	[Docs] Model_doc structure/clarity improvements (#26876)	2023-11-03 10:57:03 -04:00
xlsr_wav2vec2.md
xmod.md
yolos.md
yoso.md
zamba.md
zamba2.md
zoedepth.md