mirror of https://github.com/huggingface/transformers.git synced 2025-07-03 21:00:08 +06:00

Refactoring of ImageProcessorFast (#35069)

* add init and base image processing functions

* add add_fast_image_processor to transformers-cli

* add working fast image processor clip

* add fast image processor to doc, working tests

* remove "to be implemented" SigLip

* fix unprotected import

* fix unprotected vision import

* update ViTImageProcessorFast

* increase threshold slow fast equivalence

* add fast img blip

* add fast class in tests with cli

* improve cli

* add fast image processor convnext

* add LlavaPatchingMixin and fast image processor for llava_next and llava_onevision

* add device kwarg to ImagesKwargs for fast processing on cuda

* cleanup

* fix unprotected import

* group images by sizes and add batch processing

* Add batch equivalence tests, skip when center_crop is used

* cleanup

* update init and cli

* fix-copies

* refactor convnext, cleanup base

* fix

* remove patching mixins, add piped torchvision transforms for ViT

* fix unbatched processing

* fix f strings

* protect imports

* change llava onevision to class transforms (test)

* fix convnext

* improve formatting (following Pavel review)

* fix handling device arg

* improve cli

* fix

* fix inits

* Add distinction between preprocess and _preprocess, and support for arbitrary kwargs through valid_extra_kwargs

* uniformize qwen2_vl fast

* fix docstrings

* add fast image processor llava

* remove min_pixels max_pixels from accepted size

* nit

* nit

* refactor fast image processors docstrings

* cleanup and remove fast class transforms

* update add fast image processor transformers cli

* cleanup docstring

* uniformize pixtral fast and make _process_image explicit

* fix prepare image structure llava next/onevision

* Use typed kwargs instead of explicit args

* nit fix import Unpack

* clearly separate pops and gets in base preprocess. Use explicit typed kwargs

* make qwen2_vl preprocess arguments hashable

2025-02-04 17:52:31 -05:00

5.3 KiB


ConvNeXT

Overview

The ConvNeXT model was proposed in A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell and Saining Xie. ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.

The abstract from the paper is the following:

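The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually "modernize" a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.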

ConvNeXT architecture. Taken from the original paper.

This model was contributed by nielsr. The TensorFlow version of the model was contributed by ariG23498, gante, and sayakpaul (equal contribution). The original code can be found here.

Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ConvNeXT.

[ConvNextForImageClassification] is supported by this example script and notebook.
See also: Image classification task guide

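If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it. The resource should ideally demonstrate something new instead of duplicating an existing resource.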

ConvNextConfig

autodoc ConvNextConfig

ConvNextFeatureExtractor

autodoc ConvNextFeatureExtractor

ConvNextImageProcessor

autodoc ConvNextImageProcessor - preprocess
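
As a rough illustration of what preprocessing produces, the sketch below runs a random image through ConvNextImageProcessor without downloading anything. It assumes NumPy and Pillow are installed. The image dimensions and the `shortest_edge` value of 224 are arbitrary choices for the example, not recommended settings.

```python
import numpy as np
from transformers import ConvNextImageProcessor

# Build a processor locally instead of loading one from a checkpoint.
# size={"shortest_edge": 224} is an illustrative choice, not a recommendation.
processor = ConvNextImageProcessor(size={"shortest_edge": 224})

# A random RGB image in height x width x channels layout (hypothetical input).
image = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)

# Calling the processor runs preprocess: resize, center crop, rescale, normalize.
inputs = processor(images=image, return_tensors="np")
pixel_values = inputs["pixel_values"]

print(pixel_values.shape)  # (1, 3, 224, 224): batch, channels, height, width
```

With a `shortest_edge` below 384, the image is first resized to a slightly larger shortest edge and then center-cropped to a square, which is why the output is 224 x 224 regardless of the input aspect ratio.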

ConvNextImageProcessorFast

autodoc ConvNextImageProcessorFast - preprocess

ConvNextModel

autodoc ConvNextModel - forward

ConvNextForImageClassification

autodoc ConvNextForImageClassification - forward
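
The following is a minimal sketch of a forward pass through ConvNextForImageClassification, assuming PyTorch is installed. To stay self-contained it builds a very small, randomly initialized model from a ConvNextConfig. The hidden sizes, depths, label count and input resolution are hypothetical values chosen only to keep the example fast, so the predictions are meaningless.

```python
import torch
from transformers import ConvNextConfig, ConvNextForImageClassification

# Deliberately tiny, hypothetical sizes so the sketch runs quickly on CPU.
config = ConvNextConfig(
    num_channels=3,
    hidden_sizes=[8, 16, 32, 64],  # channels per stage
    depths=[1, 1, 1, 1],           # blocks per stage
    num_labels=5,
)
model = ConvNextForImageClassification(config)  # random weights, not pretrained
model.eval()

# A batch of two random "images" standing in for preprocessed pixel values.
pixel_values = torch.randn(2, 3, 64, 64)

with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

logits = outputs.logits               # one score per label for each image
predicted = logits.argmax(dim=-1)     # index of the highest-scoring label
print(logits.shape)  # torch.Size([2, 5])
```

In practice one would load trained weights with `from_pretrained` and feed it the output of the image processor rather than random tensors.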

TFConvNextModel

autodoc TFConvNextModel - call

TFConvNextForImageClassification

autodoc TFConvNextForImageClassification - call
