transformers/docs/source/en/model_doc/upernet.mdx
NielsRogge 4ed89d48ab
Add UperNet (#20648)
* First draft

* More improvements

* Add convnext backbone

* Add conversion script

* Add more improvements

* Comment out to_dict

* Add to_dict method

* Add default config

* Fix config

* Fix backbone

* Fix backbone some more

* Add docs, auto mapping, tests

* Fix some tests

* Fix more tests

* Fix more tests

* Add conversion script

* Improve conversion script

* Add support for getting reshaped undownsampled hidden states

* Fix forward pass

* Add print statements

* Comment out set_shift_and_window_size

* More improvements

* Correct downsampling layers conversion

* Fix style

* First draft

* Fix conversion script

* Remove config attribute

* Fix more tests

* Update READMEs

* Update ConvNextBackbone

* Fix ConvNext tests

* Align ConvNext with Swin

* Remove files

* Fix index

* Improve docs

* Add output_attentions to model forward

* Add backbone mixin, improve tests

* More improvements

* Update init_weights

* Fix interpolation of logits

* Add UperNetImageProcessor

* Improve image processor

* Fix image processor

* Remove print statements

* Remove script

* Update import

* Add image processor tests

* Remove print statements

* Fix test

* Add integration test

* Add convnext integration test

* Update docstring

* Fix README

* Simplify config

* Apply suggestions

* Improve docs

* Rename class

* Fix test_initialization

* Fix import

* Address review

* Fix confg

* Convert all checkpoints

* Fix default backbone

* Usage same processor as segformer

* Apply suggestions

* Fix init_weights, update conversion scripts

* Improve config

* Use Auto API instead of creating a new image processor

* Fix docs

* Add doctests

* Remove ResNetConfig dependency

* Add always_partition argument

* Fix rebaseé

* Improve docs

* Convert checkpoints

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
2023-01-16 09:39:13 +01:00

65 lines
3.2 KiB
Plaintext

<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# UPerNet
## Overview
The UPerNet model was proposed in [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221)
by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun. UPerNet is a general framework to effectively segment
a wide range of concepts from images, leveraging any vision backbone like [ConvNeXt](convnext) or [Swin](swin).
The abstract from the paper is the following:
*Humans recognize the visual world at multiple levels: we effortlessly categorize scenes and detect objects inside, while also identifying the textures and surfaces of the objects along with their different compositional parts. In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image. A multi-task framework called UPerNet and a training strategy are developed to learn from heterogeneous image annotations. We benchmark our framework on Unified Perceptual Parsing and show that it is able to effectively segment a wide range of concepts from images. The trained networks are further applied to discover visual knowledge in natural scenes.*
<img src="https://huggingface.co/datasets/huggingface/documentation-images/blob/main/transformers/model_doc/upernet_architecture.jpg"
alt="drawing" width="600"/>
<small> UPerNet framework. Taken from the <a href="https://arxiv.org/abs/1807.10221">original paper</a>. </small>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code is based on OpenMMLab's mmsegmentation [here](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/decode_heads/uper_head.py).
## Usage
UPerNet is a general framework for semantic segmentation. It can be used with any vision backbone, like so:
```py
from transformers import SwinConfig, UperNetConfig, UperNetForSemanticSegmentation
backbone_config = SwinConfig(out_features=["stage1", "stage2", "stage3", "stage4"])
config = UperNetConfig(backbone_config=backbone_config)
model = UperNetForSemanticSegmentation(config)
```
To use another vision backbone, like [ConvNeXt](convnext), simply instantiate the model with the appropriate backbone:
```py
from transformers import ConvNextConfig, UperNetConfig, UperNetForSemanticSegmentation
backbone_config = ConvNextConfig(out_features=["stage1", "stage2", "stage3", "stage4"])
config = UperNetConfig(backbone_config=backbone_config)
model = UperNetForSemanticSegmentation(config)
```
Note that this will randomly initialize all the weights of the model.
## UperNetConfig
[[autodoc]] UperNetConfig
## UperNetForSemanticSegmentation
[[autodoc]] UperNetForSemanticSegmentation
- forward