transformers/docs/source/en/model_doc/auto.mdx
Ankur Goyal 2ef7742117
Add DocumentQuestionAnswering pipeline (#18414)
Co-authored-by: Ankur Goyal <ankur@impira.com>
2022-09-07 13:38:49 -04:00

<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Auto Classes
In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you
are supplying to the `from_pretrained()` method. AutoClasses do this job for you: they automatically retrieve the
relevant model given the name or path of the pretrained weights, config, or vocabulary.
Instantiating one of [`AutoConfig`], [`AutoModel`], and
[`AutoTokenizer`] will directly create an object of the class for the relevant architecture. For instance,
```python
from transformers import AutoModel
model = AutoModel.from_pretrained("bert-base-cased")
```
will create a model that is an instance of [`BertModel`].
There is one class of `AutoModel` for each task, and for each backend (PyTorch, TensorFlow, or Flax).
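The class names documented on this page follow a visible pattern: an optional backend prefix (`TF` or `Flax`, none for PyTorch), then `AutoModel`, then an optional `For<Task>` suffix. The helper below is a minimal sketch of that naming pattern only. `auto_class_name` and `BACKEND_PREFIX` are hypothetical names, not part of the library, and not every backend and task combination exists; the headings on this page list the ones that do.

```python
# Hypothetical helper: builds an auto class *name* from the naming pattern
# used by the headings on this page. It does not import or check the class.
BACKEND_PREFIX = {"pytorch": "", "tensorflow": "TF", "flax": "Flax"}


def auto_class_name(backend, task=None):
    """Return e.g. 'TFAutoModelForMaskedLM' for ('tensorflow', 'MaskedLM')."""
    prefix = BACKEND_PREFIX[backend]
    suffix = f"For{task}" if task else ""
    return f"{prefix}AutoModel{suffix}"


print(auto_class_name("pytorch"))                   # AutoModel
print(auto_class_name("tensorflow", "MaskedLM"))    # TFAutoModelForMaskedLM
print(auto_class_name("flax", "QuestionAnswering")) # FlaxAutoModelForQuestionAnswering
```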
## Extending the Auto Classes
Each of the auto classes has a method for registering your custom classes. For instance, if you have defined a
custom model class `NewModel`, make sure you also have a `NewModelConfig`; then you can add both to the auto
classes like this:
```python
from transformers import AutoConfig, AutoModel

# NewModelConfig and NewModel are your own classes, defined elsewhere.
AutoConfig.register("new-model", NewModelConfig)
AutoModel.register(NewModelConfig, NewModel)
```
You can then use the auto classes as you usually would!
<Tip warning={true}>
If your `NewModelConfig` is a subclass of [`PretrainedConfig`], make sure its
`model_type` attribute is set to the same key you use when registering the config (here `"new-model"`).
Likewise, if your `NewModel` is a subclass of [`PreTrainedModel`], make sure its
`config_class` attribute is set to the same class you use when registering the model (here
`NewModelConfig`).
</Tip>
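To see why the two attributes in the warning above need to agree with what you register, here is a library-independent toy sketch of a registry that dispatches on them. It is not the `transformers` implementation: `ToyConfig`, `ToyAutoConfig`, and `ToyAutoModel` are hypothetical stand-ins, and the consistency checks are an assumption made for illustration.

```python
class ToyConfig:
    """Hypothetical stand-in for a configuration base class."""
    model_type = ""


class ToyAutoConfig:
    """Toy registry mapping a model-type key to a config class."""
    _registry = {}

    @classmethod
    def register(cls, model_type, config_class):
        # Assumed check: the key must match the class's own model_type.
        if config_class.model_type != model_type:
            raise ValueError(
                f"config declares model_type={config_class.model_type!r} "
                f"but was registered as {model_type!r}"
            )
        cls._registry[model_type] = config_class

    @classmethod
    def for_model(cls, model_type):
        # Look up the config class by key and instantiate it.
        return cls._registry[model_type]()


class ToyAutoModel:
    """Toy registry mapping a config class to a model class."""
    _registry = {}

    @classmethod
    def register(cls, config_class, model_class):
        # Assumed check: the model must point back at the same config class.
        if model_class.config_class is not config_class:
            raise ValueError("model.config_class does not match the config")
        cls._registry[config_class] = model_class

    @classmethod
    def from_config(cls, config):
        # Dispatch on the type of the config object.
        return cls._registry[type(config)](config)


class NewModelConfig(ToyConfig):
    model_type = "new-model"


class NewModel:
    config_class = NewModelConfig

    def __init__(self, config):
        self.config = config


ToyAutoConfig.register("new-model", NewModelConfig)
ToyAutoModel.register(NewModelConfig, NewModel)

config = ToyAutoConfig.for_model("new-model")
model = ToyAutoModel.from_config(config)
print(type(model).__name__)  # NewModel
```

In this sketch, registering `NewModelConfig` under a different key such as `"other-key"` raises a `ValueError`, because the key would no longer match the class's `model_type`.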
## AutoConfig
[[autodoc]] AutoConfig
## AutoTokenizer
[[autodoc]] AutoTokenizer
## AutoFeatureExtractor
[[autodoc]] AutoFeatureExtractor
## AutoProcessor
[[autodoc]] AutoProcessor
## AutoModel
[[autodoc]] AutoModel
## AutoModelForPreTraining
[[autodoc]] AutoModelForPreTraining
## AutoModelForCausalLM
[[autodoc]] AutoModelForCausalLM
## AutoModelForMaskedLM
[[autodoc]] AutoModelForMaskedLM
## AutoModelForSeq2SeqLM
[[autodoc]] AutoModelForSeq2SeqLM
## AutoModelForSequenceClassification
[[autodoc]] AutoModelForSequenceClassification
## AutoModelForMultipleChoice
[[autodoc]] AutoModelForMultipleChoice
## AutoModelForNextSentencePrediction
[[autodoc]] AutoModelForNextSentencePrediction
## AutoModelForTokenClassification
[[autodoc]] AutoModelForTokenClassification
## AutoModelForQuestionAnswering
[[autodoc]] AutoModelForQuestionAnswering
## AutoModelForTableQuestionAnswering
[[autodoc]] AutoModelForTableQuestionAnswering
## AutoModelForDocumentQuestionAnswering
[[autodoc]] AutoModelForDocumentQuestionAnswering
## AutoModelForImageClassification
[[autodoc]] AutoModelForImageClassification
## AutoModelForVideoClassification
[[autodoc]] AutoModelForVideoClassification
## AutoModelForVision2Seq
[[autodoc]] AutoModelForVision2Seq
## AutoModelForVisualQuestionAnswering
[[autodoc]] AutoModelForVisualQuestionAnswering
## AutoModelForAudioClassification
[[autodoc]] AutoModelForAudioClassification
## AutoModelForAudioFrameClassification
[[autodoc]] AutoModelForAudioFrameClassification
## AutoModelForCTC
[[autodoc]] AutoModelForCTC
## AutoModelForSpeechSeq2Seq
[[autodoc]] AutoModelForSpeechSeq2Seq
## AutoModelForAudioXVector
[[autodoc]] AutoModelForAudioXVector
## AutoModelForMaskedImageModeling
[[autodoc]] AutoModelForMaskedImageModeling
## AutoModelForObjectDetection
[[autodoc]] AutoModelForObjectDetection
## AutoModelForImageSegmentation
[[autodoc]] AutoModelForImageSegmentation
## AutoModelForSemanticSegmentation
[[autodoc]] AutoModelForSemanticSegmentation
## AutoModelForInstanceSegmentation
[[autodoc]] AutoModelForInstanceSegmentation
## TFAutoModel
[[autodoc]] TFAutoModel
## TFAutoModelForPreTraining
[[autodoc]] TFAutoModelForPreTraining
## TFAutoModelForCausalLM
[[autodoc]] TFAutoModelForCausalLM
## TFAutoModelForImageClassification
[[autodoc]] TFAutoModelForImageClassification
## TFAutoModelForSemanticSegmentation
[[autodoc]] TFAutoModelForSemanticSegmentation
## TFAutoModelForMaskedLM
[[autodoc]] TFAutoModelForMaskedLM
## TFAutoModelForSeq2SeqLM
[[autodoc]] TFAutoModelForSeq2SeqLM
## TFAutoModelForSequenceClassification
[[autodoc]] TFAutoModelForSequenceClassification
## TFAutoModelForMultipleChoice
[[autodoc]] TFAutoModelForMultipleChoice
## TFAutoModelForNextSentencePrediction
[[autodoc]] TFAutoModelForNextSentencePrediction
## TFAutoModelForTableQuestionAnswering
[[autodoc]] TFAutoModelForTableQuestionAnswering
## TFAutoModelForDocumentQuestionAnswering
[[autodoc]] TFAutoModelForDocumentQuestionAnswering
## TFAutoModelForTokenClassification
[[autodoc]] TFAutoModelForTokenClassification
## TFAutoModelForQuestionAnswering
[[autodoc]] TFAutoModelForQuestionAnswering
## TFAutoModelForVision2Seq
[[autodoc]] TFAutoModelForVision2Seq
## TFAutoModelForSpeechSeq2Seq
[[autodoc]] TFAutoModelForSpeechSeq2Seq
## FlaxAutoModel
[[autodoc]] FlaxAutoModel
## FlaxAutoModelForCausalLM
[[autodoc]] FlaxAutoModelForCausalLM
## FlaxAutoModelForPreTraining
[[autodoc]] FlaxAutoModelForPreTraining
## FlaxAutoModelForMaskedLM
[[autodoc]] FlaxAutoModelForMaskedLM
## FlaxAutoModelForSeq2SeqLM
[[autodoc]] FlaxAutoModelForSeq2SeqLM
## FlaxAutoModelForSequenceClassification
[[autodoc]] FlaxAutoModelForSequenceClassification
## FlaxAutoModelForQuestionAnswering
[[autodoc]] FlaxAutoModelForQuestionAnswering
## FlaxAutoModelForTokenClassification
[[autodoc]] FlaxAutoModelForTokenClassification
## FlaxAutoModelForMultipleChoice
[[autodoc]] FlaxAutoModelForMultipleChoice
## FlaxAutoModelForNextSentencePrediction
[[autodoc]] FlaxAutoModelForNextSentencePrediction
## FlaxAutoModelForImageClassification
[[autodoc]] FlaxAutoModelForImageClassification
## FlaxAutoModelForVision2Seq
[[autodoc]] FlaxAutoModelForVision2Seq