transformers/docs/source/model_doc/funnel.mdx
Lysandre Debut ec3567fe20
Convert model files from rst to mdx (#14865)
* First pass

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-22 03:27:30 -05:00


<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Funnel Transformer
## Overview
The Funnel Transformer model was proposed in the paper [Funnel-Transformer: Filtering out Sequential Redundancy for
Efficient Language Processing](https://arxiv.org/abs/2006.03236). It is a bidirectional transformer model, like
BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks
(CNN) in computer vision.
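To make the pooling idea concrete, here is a minimal pure-Python sketch of stride-2 mean pooling over a sequence of token vectors. It is an illustration only: the function name `mean_pool` and the toy vectors are made up for this example, and the library's actual pooling differs in detail (for instance, it operates on batched tensors inside the attention layers).

```python
from typing import List

Vector = List[float]


def mean_pool(hidden_states: List[Vector], stride: int = 2) -> List[Vector]:
    """Average each run of `stride` consecutive token vectors into one vector.

    Hypothetical helper for illustration; not part of the transformers API.
    """
    pooled = []
    for start in range(0, len(hidden_states), stride):
        # The last window may be shorter than `stride` when the length is odd.
        window = hidden_states[start:start + stride]
        dim = len(window[0])
        pooled.append([sum(vec[i] for vec in window) / len(window) for i in range(dim)])
    return pooled


# Five toy token vectors of dimension 2.
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0], [9.0, 8.0]]
pooled = mean_pool(tokens)
print(len(tokens), "->", len(pooled))
```

Each pooling step roughly halves the number of positions, which is where the computational savings described in the paper come from.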
The abstract from the paper is the following:
*With the success of language pretraining, it is highly desirable to develop more efficient architectures of good
scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the
much-overlooked redundancy in maintaining a full-length token-level presentation, especially for tasks that only
require a single-vector presentation of the sequence. With this intuition, we propose Funnel-Transformer which
gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More
importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further
improve the model capacity. In addition, to perform token-level predictions as required by common pretraining
objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence
via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on
a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading
comprehension.*
Tips:
- Since Funnel Transformer uses pooling, the sequence length of the hidden states changes after each block of layers.
The base model therefore has a final sequence length that is a quarter of the original one. This model can be used
directly for tasks that just require a sentence summary (like sequence classification or multiple choice). For other
tasks, the full model is used; this full model has a decoder that upsamples the final hidden states to the same
sequence length as the input.
- Every Funnel Transformer checkpoint is available in a full version and a base version. The full version should be
used for [`FunnelModel`], [`FunnelForPreTraining`],
[`FunnelForMaskedLM`], [`FunnelForTokenClassification`] and
[`FunnelForQuestionAnswering`]. The base version should be used for
[`FunnelBaseModel`], [`FunnelForSequenceClassification`] and
[`FunnelForMultipleChoice`].
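The sequence-length bookkeeping behind these tips can be sketched in a few lines of plain Python. The sketch assumes three blocks with the length halved (rounding up) between consecutive blocks, which matches the "quarter of the original" figure above, and it assumes the decoder upsamples by repeating each pooled position. Both helpers, `lengths_per_block` and `upsample`, are hypothetical names used for illustration, and the library's exact lengths can differ slightly depending on how the configuration handles special tokens and truncation.

```python
import math
from typing import List


def lengths_per_block(seq_len: int, num_blocks: int = 3) -> List[int]:
    """Sequence length seen by each block, halving (rounding up) between blocks.

    Assumption: one stride-2 pooling step between consecutive blocks.
    """
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        lengths.append(math.ceil(lengths[-1] / 2))
    return lengths


def upsample(pooled: List[str], target_len: int) -> List[str]:
    """Repeat each pooled position so the output has `target_len` positions.

    Assumption: upsampling by simple repetition, truncated to the input length.
    """
    factor = math.ceil(target_len / len(pooled))
    repeated = [item for item in pooled for _ in range(factor)]
    return repeated[:target_len]


# Base model: the final block works on a quarter of the input length.
lengths = lengths_per_block(128)
print(lengths)

# Full model: the decoder restores one position per input token.
restored = upsample(["a", "b"], 8)
print(restored)
```

The base classes stop after the last block and work with the shortened sequence, while the full classes need the upsampled sequence because they make one prediction per input token.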
This model was contributed by [sgugger](https://huggingface.co/sgugger). The original code can be found [here](https://github.com/laiguokun/Funnel-Transformer).
## FunnelConfig
[[autodoc]] FunnelConfig
## FunnelTokenizer
[[autodoc]] FunnelTokenizer
- build_inputs_with_special_tokens
- get_special_tokens_mask
- create_token_type_ids_from_sequences
- save_vocabulary
## FunnelTokenizerFast
[[autodoc]] FunnelTokenizerFast
## Funnel specific outputs
[[autodoc]] models.funnel.modeling_funnel.FunnelForPreTrainingOutput
[[autodoc]] models.funnel.modeling_tf_funnel.TFFunnelForPreTrainingOutput
## FunnelBaseModel
[[autodoc]] FunnelBaseModel
- forward
## FunnelModel
[[autodoc]] FunnelModel
- forward
## FunnelForPreTraining
[[autodoc]] FunnelForPreTraining
- forward
## FunnelForMaskedLM
[[autodoc]] FunnelForMaskedLM
- forward
## FunnelForSequenceClassification
[[autodoc]] FunnelForSequenceClassification
- forward
## FunnelForMultipleChoice
[[autodoc]] FunnelForMultipleChoice
- forward
## FunnelForTokenClassification
[[autodoc]] FunnelForTokenClassification
- forward
## FunnelForQuestionAnswering
[[autodoc]] FunnelForQuestionAnswering
- forward
## TFFunnelBaseModel
[[autodoc]] TFFunnelBaseModel
- call
## TFFunnelModel
[[autodoc]] TFFunnelModel
- call
## TFFunnelForPreTraining
[[autodoc]] TFFunnelForPreTraining
- call
## TFFunnelForMaskedLM
[[autodoc]] TFFunnelForMaskedLM
- call
## TFFunnelForSequenceClassification
[[autodoc]] TFFunnelForSequenceClassification
- call
## TFFunnelForMultipleChoice
[[autodoc]] TFFunnelForMultipleChoice
- call
## TFFunnelForTokenClassification
[[autodoc]] TFFunnelForTokenClassification
- call
## TFFunnelForQuestionAnswering
[[autodoc]] TFFunnelForQuestionAnswering
- call