Convert model files from rst to mdx (#14865)

* First pass

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

parent d0422de563
commit ec3567fe20

docs/source/model_doc/albert.mdx (new file, +170 lines)

<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ALBERT

## Overview

The ALBERT model was proposed in [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942) by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma,
Radu Soricut. It presents two parameter-reduction techniques to lower memory consumption and increase the training
speed of BERT:

- Splitting the embedding matrix into two smaller matrices.
- Using repeating layers split among groups.

The abstract from the paper is the following:

*Increasing model size when pretraining natural language representations often results in improved performance on
downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations,
longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction
techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows
that our proposed methods lead to models that scale much better compared to the original BERT. We also use a
self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks
with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and
SQuAD benchmarks while having fewer parameters compared to BERT-large.*

Tips:

- ALBERT is a model with absolute position embeddings, so it's usually advised to pad the inputs on the right rather
  than the left (a short usage sketch follows below).
- ALBERT uses repeating layers, which results in a small memory footprint; however, the computational cost remains
  similar to a BERT-like architecture with the same number of hidden layers, as it has to iterate through the same
  number of (repeating) layers.

This model was contributed by [lysandre](https://huggingface.co/lysandre). The JAX version of this model was contributed by
[kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/google-research/ALBERT).

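
To make these tips concrete, here is a minimal masked-language-modeling sketch. The `albert-base-v2` checkpoint name is
an assumption (it is not referenced elsewhere on this page); any ALBERT checkpoint from the Hub should behave the same
way.

```python
import torch
from transformers import AlbertTokenizer, AlbertForMaskedLM

# "albert-base-v2" is an assumed checkpoint name, used only for illustration.
tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertForMaskedLM.from_pretrained("albert-base-v2")

# Batch two sentences; padding=True pads on the right, as recommended for absolute position embeddings.
inputs = tokenizer(
    ["Hello, my dog is cute.", "The capital of France is [MASK]."],
    padding=True,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] token and decode the highest-scoring prediction for it.
batch_idx, token_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0].tolist()
predicted_id = int(logits[batch_idx, token_idx].argmax(-1))
print(tokenizer.decode([predicted_id]))
```
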
## AlbertConfig

[[autodoc]] AlbertConfig

## AlbertTokenizer

[[autodoc]] AlbertTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

## AlbertTokenizerFast

[[autodoc]] AlbertTokenizerFast

## Albert specific outputs

[[autodoc]] models.albert.modeling_albert.AlbertForPreTrainingOutput

[[autodoc]] models.albert.modeling_tf_albert.TFAlbertForPreTrainingOutput

## AlbertModel

[[autodoc]] AlbertModel
    - forward

## AlbertForPreTraining

[[autodoc]] AlbertForPreTraining
    - forward

## AlbertForMaskedLM

[[autodoc]] AlbertForMaskedLM
    - forward

## AlbertForSequenceClassification

[[autodoc]] AlbertForSequenceClassification
    - forward

## AlbertForMultipleChoice

[[autodoc]] AlbertForMultipleChoice

## AlbertForTokenClassification

[[autodoc]] AlbertForTokenClassification
    - forward

## AlbertForQuestionAnswering

[[autodoc]] AlbertForQuestionAnswering
    - forward

## TFAlbertModel

[[autodoc]] TFAlbertModel
    - call

## TFAlbertForPreTraining

[[autodoc]] TFAlbertForPreTraining
    - call

## TFAlbertForMaskedLM

[[autodoc]] TFAlbertForMaskedLM
    - call

## TFAlbertForSequenceClassification

[[autodoc]] TFAlbertForSequenceClassification
    - call

## TFAlbertForMultipleChoice

[[autodoc]] TFAlbertForMultipleChoice
    - call

## TFAlbertForTokenClassification

[[autodoc]] TFAlbertForTokenClassification
    - call

## TFAlbertForQuestionAnswering

[[autodoc]] TFAlbertForQuestionAnswering
    - call

## FlaxAlbertModel

[[autodoc]] FlaxAlbertModel
    - __call__

## FlaxAlbertForPreTraining

[[autodoc]] FlaxAlbertForPreTraining
    - __call__

## FlaxAlbertForMaskedLM

[[autodoc]] FlaxAlbertForMaskedLM
    - __call__

## FlaxAlbertForSequenceClassification

[[autodoc]] FlaxAlbertForSequenceClassification
    - __call__

## FlaxAlbertForMultipleChoice

[[autodoc]] FlaxAlbertForMultipleChoice
    - __call__

## FlaxAlbertForTokenClassification

[[autodoc]] FlaxAlbertForTokenClassification
    - __call__

## FlaxAlbertForQuestionAnswering

[[autodoc]] FlaxAlbertForQuestionAnswering
    - __call__

*(The previous reStructuredText version of this page, 226 lines, is deleted by this commit; its content matches the MDX file above.)*

docs/source/model_doc/bart.mdx (new file, +151 lines)

<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BART

**DISCLAIMER:** If you see something strange, file a [Github Issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title) and assign
@patrickvonplaten.

## Overview

The Bart model was proposed in [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation,
Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan
Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.

According to the abstract,

- Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a
  left-to-right decoder (like GPT).
- The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme,
  where spans of text are replaced with a single mask token.
- BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It
  matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new
  state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
  of up to 6 ROUGE.

This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/bart).

### Examples

- Examples and scripts for fine-tuning BART and other models for sequence-to-sequence tasks can be found in
  [examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization/README.md).
- An example of how to train [`BartForConditionalGeneration`] with a Hugging Face `datasets`
  object can be found in this [forum discussion](https://discuss.huggingface.co/t/train-bart-for-conditional-generation-e-g-summarization/1904).
- [Distilled checkpoints](https://huggingface.co/models?search=distilbart) are described in this [paper](https://arxiv.org/abs/2010.13002).

## Implementation Notes

- Bart doesn't use `token_type_ids` for sequence classification. Use [`BartTokenizer`] or
  [`~BartTokenizer.encode`] to get the proper splitting.
- The forward pass of [`BartModel`] will create the `decoder_input_ids` if they are not passed.
  This is different from some other modeling APIs. A typical use case of this feature is mask filling.
- Model predictions are intended to be identical to the original implementation when
  `force_bos_token_to_be_generated=True`. This only works, however, if the string you pass to
  [`fairseq.encode`] starts with a space.
- [`~generation_utils.GenerationMixin.generate`] should be used for conditional generation tasks like
  summarization; see the example in that docstring and the sketch after the mask-filling example below.
- Models that load the *facebook/bart-large-cnn* weights will not have a `mask_token_id`, and so cannot perform
  mask-filling tasks.

## Mask Filling

The `facebook/bart-base` and `facebook/bart-large` checkpoints can be used to fill multi-token masks.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", forced_bos_token_id=0)
tok = BartTokenizer.from_pretrained("facebook/bart-large")
example_english_phrase = "UN Chief Says There Is No <mask> in Syria"
batch = tok(example_english_phrase, return_tensors="pt")
generated_ids = model.generate(batch["input_ids"])
assert tok.batch_decode(generated_ids, skip_special_tokens=True) == [
    "UN Chief Says There Is No Plan to Stop Chemical Weapons in Syria"
]
```

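
As noted in the Implementation Notes, [`~generation_utils.GenerationMixin.generate`] drives conditional generation
such as summarization. Below is a minimal sketch using the `facebook/bart-large-cnn` checkpoint mentioned above; the
input article and the generation parameters are illustrative choices, and exact outputs may vary between versions.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# facebook/bart-large-cnn is fine-tuned for summarization (see the Implementation Notes above).
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
tok = BartTokenizer.from_pretrained("facebook/bart-large-cnn")

article = (
    "PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. "
    "The aim is to reduce the risk of wildfires."
)
inputs = tok(article, return_tensors="pt", truncation=True, max_length=1024)

# generate() runs the encoder once and then decodes autoregressively; num_beams and max_length are example values.
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60, early_stopping=True)
print(tok.batch_decode(summary_ids, skip_special_tokens=True)[0])
```
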
## BartConfig

[[autodoc]] BartConfig
    - all

## BartTokenizer

[[autodoc]] BartTokenizer
    - all

## BartTokenizerFast

[[autodoc]] BartTokenizerFast
    - all

## BartModel

[[autodoc]] BartModel
    - forward

## BartForConditionalGeneration

[[autodoc]] BartForConditionalGeneration
    - forward

## BartForSequenceClassification

[[autodoc]] BartForSequenceClassification
    - forward

## BartForQuestionAnswering

[[autodoc]] BartForQuestionAnswering
    - forward

## BartForCausalLM

[[autodoc]] BartForCausalLM
    - forward

## TFBartModel

[[autodoc]] TFBartModel
    - call

## TFBartForConditionalGeneration

[[autodoc]] TFBartForConditionalGeneration
    - call

## FlaxBartModel

[[autodoc]] FlaxBartModel
    - __call__
    - encode
    - decode

## FlaxBartForConditionalGeneration

[[autodoc]] FlaxBartForConditionalGeneration
    - __call__
    - encode
    - decode

## FlaxBartForSequenceClassification

[[autodoc]] FlaxBartForSequenceClassification
    - __call__
    - encode
    - decode

## FlaxBartForQuestionAnswering

[[autodoc]] FlaxBartForQuestionAnswering
    - __call__
    - encode
    - decode

*(The previous reStructuredText version of this page, 182 lines, is deleted by this commit; its content matches the MDX file above.)*

docs/source/model_doc/barthez.mdx (new file, +50 lines)

<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BARThez

## Overview

The BARThez model was proposed in [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis on 23 Oct,
2020.

The abstract of the paper:

*Inductive transfer learning, enabled by self-supervised learning, have taken the entire Natural Language Processing
(NLP) field by storm, with models such as BERT and BART setting new state of the art on countless natural language
understanding tasks. While there are some notable exceptions, most of the available models and research have been
conducted for the English language. In this work, we introduce BARThez, the first BART model for the French language
(to the best of our knowledge). BARThez was pretrained on a very large monolingual French corpus from past research
that we adapted to suit BART's perturbation schemes. Unlike already existing BERT-based French language models such as
CamemBERT and FlauBERT, BARThez is particularly well-suited for generative tasks, since not only its encoder but also
its decoder is pretrained. In addition to discriminative tasks from the FLUE benchmark, we evaluate BARThez on a novel
summarization dataset, OrangeSum, that we release with this paper. We also continue the pretraining of an already
pretrained multilingual BART on BARThez's corpus, and we show that the resulting model, which we call mBARTHez,
provides a significant boost over vanilla BARThez, and is on par with or outperforms CamemBERT and FlauBERT.*

This model was contributed by [moussakam](https://huggingface.co/moussakam). The authors' code can be found [here](https://github.com/moussaKam/BARThez).

### Examples

- BARThez can be fine-tuned on sequence-to-sequence tasks in a similar way as BART; check
  [examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization/README.md). A short loading sketch follows below.

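
Since BARThez reuses BART's architecture, the regular seq2seq classes work with its checkpoints; only the tokenizer is
model-specific. The sketch below is a rough illustration: the `moussakam/barthez` checkpoint name and its file layout
are assumptions (it lives under the contributor's Hub namespace), so adapt it to the checkpoint you actually use.

```python
import torch
from transformers import BarthezTokenizer, AutoModelForSeq2SeqLM

# "moussakam/barthez" is an assumed checkpoint name, used only for illustration.
tokenizer = BarthezTokenizer.from_pretrained("moussakam/barthez")
model = AutoModelForSeq2SeqLM.from_pretrained("moussakam/barthez")

inputs = tokenizer("Les chercheurs publient un nouveau modèle pour le français.", return_tensors="pt")

with torch.no_grad():
    # Feed the same ids to the decoder just to run a forward pass; real use fine-tunes with labels, as linked above.
    outputs = model(**inputs, decoder_input_ids=inputs["input_ids"])

print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```
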
## BarthezTokenizer

[[autodoc]] BarthezTokenizer

## BarthezTokenizerFast

[[autodoc]] BarthezTokenizerFast

*(The previous reStructuredText version of this page, 60 lines, is deleted by this commit; its content matches the MDX file above.)*

docs/source/model_doc/bartpho.mdx (new file, +80 lines)

<!--Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BARTpho

## Overview

The BARTpho model was proposed in [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.

The abstract from the paper is the following:

*We present BARTpho with two versions -- BARTpho_word and BARTpho_syllable -- the first public large-scale monolingual
sequence-to-sequence models pre-trained for Vietnamese. Our BARTpho uses the "large" architecture and pre-training
scheme of the sequence-to-sequence denoising model BART, thus especially suitable for generative NLP tasks. Experiments
on a downstream task of Vietnamese text summarization show that in both automatic and human evaluations, our BARTpho
outperforms the strong baseline mBART and improves the state-of-the-art. We release BARTpho to facilitate future
research and applications of generative Vietnamese NLP tasks.*

Example of use:

```python
>>> import torch
>>> from transformers import AutoModel, AutoTokenizer

>>> bartpho = AutoModel.from_pretrained("vinai/bartpho-syllable")

>>> tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")

>>> line = "Chúng tôi là những nghiên cứu viên."

>>> input_ids = tokenizer(line, return_tensors="pt")

>>> with torch.no_grad():
...     features = bartpho(**input_ids)  # Model outputs are tuples

>>> # With TensorFlow 2.0+:
>>> from transformers import TFAutoModel
>>> bartpho = TFAutoModel.from_pretrained("vinai/bartpho-syllable")
>>> input_ids = tokenizer(line, return_tensors="tf")
>>> features = bartpho(**input_ids)
```

Tips:

- Following mBART, BARTpho uses the "large" architecture of BART with an additional layer-normalization layer on top of
  both the encoder and decoder. Thus, when adapting the usage examples in the [documentation of BART](bart) for
  BARTpho, replace the BART-specialized classes with their mBART-specialized counterparts.
  For example:

```python
>>> from transformers import MBartForConditionalGeneration
>>> bartpho = MBartForConditionalGeneration.from_pretrained("vinai/bartpho-syllable")
>>> TXT = 'Chúng tôi là <mask> nghiên cứu viên.'
>>> input_ids = tokenizer([TXT], return_tensors='pt')['input_ids']
>>> logits = bartpho(input_ids).logits
>>> masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
>>> probs = logits[0, masked_index].softmax(dim=0)
>>> values, predictions = probs.topk(5)
>>> print(tokenizer.decode(predictions).split())
```

- This implementation is only for tokenization: the "monolingual_vocab_file" consists of Vietnamese-specialized types
  extracted from the pre-trained SentencePiece model "vocab_file" that is available from the multilingual XLM-RoBERTa.
  Other languages that employ this pre-trained multilingual SentencePiece model "vocab_file" for subword
  segmentation can reuse BartphoTokenizer with their own language-specialized "monolingual_vocab_file".

This model was contributed by [dqnguyen](https://huggingface.co/dqnguyen). The original code can be found [here](https://github.com/VinAIResearch/BARTpho).

## BartphoTokenizer

[[autodoc]] BartphoTokenizer

*(The previous reStructuredText version of this page, 86 lines, is deleted by this commit; its content matches the MDX file above.)*

docs/source/model_doc/beit.mdx (new file, +114 lines)

<!--Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BEiT

## Overview

The BEiT model was proposed in [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by
Hangbo Bao, Li Dong and Furu Wei. Inspired by BERT, BEiT is the first paper that makes self-supervised pre-training of
Vision Transformers (ViTs) outperform supervised pre-training. Rather than pre-training the model to predict the class
of an image (as done in the [original ViT paper](https://arxiv.org/abs/2010.11929)), BEiT models are pre-trained to
predict visual tokens from the codebook of OpenAI's [DALL-E model](https://arxiv.org/abs/2102.12092) given masked
patches.

The abstract from the paper is the following:

*We introduce a self-supervised vision representation model BEiT, which stands for Bidirectional Encoder representation
from Image Transformers. Following BERT developed in the natural language processing area, we propose a masked image
modeling task to pretrain vision Transformers. Specifically, each image has two views in our pre-training, i.e, image
patches (such as 16x16 pixels), and visual tokens (i.e., discrete tokens). We first "tokenize" the original image into
visual tokens. Then we randomly mask some image patches and fed them into the backbone Transformer. The pre-training
objective is to recover the original visual tokens based on the corrupted image patches. After pre-training BEiT, we
directly fine-tune the model parameters on downstream tasks by appending task layers upon the pretrained encoder.
Experimental results on image classification and semantic segmentation show that our model achieves competitive results
with previous pre-training methods. For example, base-size BEiT achieves 83.2% top-1 accuracy on ImageNet-1K,
significantly outperforming from-scratch DeiT training (81.8%) with the same setup. Moreover, large-size BEiT obtains
86.3% only using ImageNet-1K, even outperforming ViT-L with supervised pre-training on ImageNet-22K (85.2%).*

Tips:

- BEiT models are regular Vision Transformers, but pre-trained in a self-supervised way rather than supervised. They
  outperform both the [original model (ViT)](vit) as well as [Data-efficient Image Transformers (DeiT)](deit) when fine-tuned on ImageNet-1K and CIFAR-100. You can check out demo notebooks regarding inference as well as
  fine-tuning on custom data [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/VisionTransformer) (you can just replace
  [`ViTFeatureExtractor`] by [`BeitFeatureExtractor`] and
  [`ViTForImageClassification`] by [`BeitForImageClassification`]).
- There's also a demo notebook available which showcases how to combine DALL-E's image tokenizer with BEiT for
  performing masked image modeling. You can find it [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/BEiT).
- As the BEiT models expect each image to be of the same size (resolution), one can use
  [`BeitFeatureExtractor`] to resize (or rescale) and normalize images for the model (see the short sketch below).
- Both the patch resolution and image resolution used during pre-training or fine-tuning are reflected in the name of
  each checkpoint. For example, `microsoft/beit-base-patch16-224` refers to a base-sized architecture with a patch
  resolution of 16x16 and a fine-tuning resolution of 224x224. All checkpoints can be found on the [hub](https://huggingface.co/models?search=microsoft/beit).
- The available checkpoints are either (1) pre-trained on [ImageNet-22k](http://www.image-net.org/) (a collection of
  14 million images and 22k classes) only, (2) also fine-tuned on ImageNet-22k or (3) also fine-tuned on [ImageNet-1k](http://www.image-net.org/challenges/LSVRC/2012/) (also referred to as ILSVRC 2012, a collection of 1.3 million
  images and 1,000 classes).
- BEiT uses relative position embeddings, inspired by the T5 model. During pre-training, the authors shared the
  relative position bias among the several self-attention layers. During fine-tuning, each layer's relative position
  bias is initialized with the shared relative position bias obtained after pre-training. Note that, if one wants to
  pre-train a model from scratch, one needs to set either the `use_relative_position_bias` or the
  `use_shared_relative_position_bias` attribute of [`BeitConfig`] to `True` in order to add relative position
  embeddings.

This model was contributed by [nielsr](https://huggingface.co/nielsr). The JAX/FLAX version of this model was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/beit).

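
To illustrate the feature-extractor tip above, here is a minimal image-classification sketch using the
`microsoft/beit-base-patch16-224` checkpoint named in the tips. The image URL is only an illustrative placeholder and
is not taken from this page; any RGB image works.

```python
import requests
from PIL import Image
from transformers import BeitFeatureExtractor, BeitForImageClassification

# Checkpoint named in the tips above: base-sized model, 16x16 patches, fine-tuned at 224x224.
feature_extractor = BeitFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224")
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")

# Placeholder image URL; replace with your own image.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The feature extractor resizes to 224x224 and normalizes, as described in the tips.
inputs = feature_extractor(images=image, return_tensors="pt")
logits = model(**inputs).logits

predicted_class_idx = logits.argmax(-1).item()
print(model.config.id2label[predicted_class_idx])
```
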
## BEiT specific outputs

[[autodoc]] models.beit.modeling_beit.BeitModelOutputWithPooling

[[autodoc]] models.beit.modeling_flax_beit.FlaxBeitModelOutputWithPooling

## BeitConfig

[[autodoc]] BeitConfig

## BeitFeatureExtractor

[[autodoc]] BeitFeatureExtractor
    - __call__

## BeitModel

[[autodoc]] BeitModel
    - forward

## BeitForMaskedImageModeling

[[autodoc]] BeitForMaskedImageModeling
    - forward

## BeitForImageClassification

[[autodoc]] BeitForImageClassification
    - forward

## BeitForSemanticSegmentation

[[autodoc]] BeitForSemanticSegmentation
    - forward

## FlaxBeitModel

[[autodoc]] FlaxBeitModel
    - __call__

## FlaxBeitForMaskedImageModeling

[[autodoc]] FlaxBeitForMaskedImageModeling
    - __call__

## FlaxBeitForImageClassification

[[autodoc]] FlaxBeitForImageClassification
    - __call__
|
@ -1,144 +0,0 @@
74
docs/source/model_doc/bert_japanese.mdx
Normal file
74
docs/source/model_doc/bert_japanese.mdx
Normal file
@ -0,0 +1,74 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BertJapanese

## Overview

The BERT models trained on Japanese text.

There are models with two different tokenization methods:

- Tokenize with MeCab and WordPiece. This requires some extra dependencies: [fugashi](https://github.com/polm/fugashi), which is a wrapper around [MeCab](https://taku910.github.io/mecab/).
- Tokenize into characters.

To use *MecabTokenizer*, you should `pip install transformers["ja"]` (or `pip install -e .["ja"]` if you install
from source) to install dependencies.

See [details on cl-tohoku repository](https://github.com/cl-tohoku/bert-japanese).

Example of using a model with MeCab and WordPiece tokenization:

```python
>>> import torch
>>> from transformers import AutoModel, AutoTokenizer

>>> bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese")
>>> tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")

>>> ## Input Japanese Text
>>> line = "吾輩は猫である。"

>>> inputs = tokenizer(line, return_tensors="pt")

>>> print(tokenizer.decode(inputs['input_ids'][0]))
[CLS] 吾輩 は 猫 で ある 。 [SEP]

>>> outputs = bertjapanese(**inputs)
```

Example of using a model with Character tokenization:

```python
>>> bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese-char")
>>> tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese-char")

>>> ## Input Japanese Text
>>> line = "吾輩は猫である。"

>>> inputs = tokenizer(line, return_tensors="pt")

>>> print(tokenizer.decode(inputs['input_ids'][0]))
[CLS] 吾 輩 は 猫 で あ る 。 [SEP]

>>> outputs = bertjapanese(**inputs)
```

Tips:

- This implementation is the same as BERT, except for the tokenization method. Refer to the [documentation of BERT](bert) for more usage examples.

This model was contributed by [cl-tohoku](https://huggingface.co/cl-tohoku).

## BertJapaneseTokenizer

[[autodoc]] BertJapaneseTokenizer
@ -1,80 +0,0 @@
98
docs/source/model_doc/bertgeneration.mdx
Normal file
98
docs/source/model_doc/bertgeneration.mdx
Normal file
@ -0,0 +1,98 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BertGeneration

## Overview

The BertGeneration model is a BERT model that can be leveraged for sequence-to-sequence tasks using
[`EncoderDecoderModel`] as proposed in [Leveraging Pre-trained Checkpoints for Sequence Generation
Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.

The abstract from the paper is the following:

*Unsupervised pretraining of large neural models has recently revolutionized Natural Language Processing. By
warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple
benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language
Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We
developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT,
GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both
encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation,
Text Summarization, Sentence Splitting, and Sentence Fusion.*

Usage:

- The model can be used in combination with the [`EncoderDecoderModel`] to leverage two pretrained
  BERT checkpoints for subsequent fine-tuning.

```python
>>> # leverage checkpoints for Bert2Bert model...
>>> # use BERT's cls token as BOS token and sep token as EOS token
>>> encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
>>> # add cross attention layers and use BERT's cls token as BOS token and sep token as EOS token
>>> decoder = BertGenerationDecoder.from_pretrained("bert-large-uncased", add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102)
>>> bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)

>>> # create tokenizer...
>>> tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")

>>> input_ids = tokenizer('This is a long article to summarize', add_special_tokens=False, return_tensors="pt").input_ids
>>> labels = tokenizer('This is a short summary', return_tensors="pt").input_ids

>>> # train...
>>> loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels).loss
>>> loss.backward()
```

- Pretrained [`EncoderDecoderModel`] checkpoints are also directly available in the model hub, e.g.,

```python
>>> # instantiate sentence fusion model
>>> sentence_fuser = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_discofuse")
>>> tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")

>>> input_ids = tokenizer('This is the first sentence. This is the second sentence.', add_special_tokens=False, return_tensors="pt").input_ids

>>> outputs = sentence_fuser.generate(input_ids)

>>> print(tokenizer.decode(outputs[0]))
```

Tips:

- [`BertGenerationEncoder`] and [`BertGenerationDecoder`] should be used in
  combination with [`EncoderDecoderModel`].
- For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
  Therefore, no EOS token should be added to the end of the input.

This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The original code can be
found [here](https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder).

## BertGenerationConfig

[[autodoc]] BertGenerationConfig

## BertGenerationTokenizer

[[autodoc]] BertGenerationTokenizer
    - save_vocabulary

## BertGenerationEncoder

[[autodoc]] BertGenerationEncoder
    - forward

## BertGenerationDecoder

[[autodoc]] BertGenerationDecoder
    - forward
@ -1,109 +0,0 @@
58
docs/source/model_doc/bertweet.mdx
Normal file
58
docs/source/model_doc/bertweet.mdx
Normal file
@ -0,0 +1,58 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BERTweet

## Overview

The BERTweet model was proposed in [BERTweet: A pre-trained language model for English Tweets](https://www.aclweb.org/anthology/2020.emnlp-demos.2.pdf) by Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen.

The abstract from the paper is the following:

*We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet, having
the same architecture as BERT-base (Devlin et al., 2019), is trained using the RoBERTa pre-training procedure (Liu et
al., 2019). Experiments show that BERTweet outperforms strong baselines RoBERTa-base and XLM-R-base (Conneau et al.,
2020), producing better performance results than the previous state-of-the-art models on three Tweet NLP tasks:
Part-of-speech tagging, Named-entity recognition and text classification.*

Example of use:

```python
>>> import torch
>>> from transformers import AutoModel, AutoTokenizer

>>> bertweet = AutoModel.from_pretrained("vinai/bertweet-base")

>>> # For transformers v4.x+:
>>> tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", use_fast=False)

>>> # For transformers v3.x:
>>> # tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")

>>> # INPUT TWEET IS ALREADY NORMALIZED!
>>> line = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER :cry:"

>>> input_ids = torch.tensor([tokenizer.encode(line)])

>>> with torch.no_grad():
...     features = bertweet(input_ids)  # Models outputs are now tuples

>>> # With TensorFlow 2.0+:
>>> # from transformers import TFAutoModel
>>> # bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base")
```

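If your tweets are *not* already normalized, the slow tokenizer can be asked to normalize them itself. This is a sketch based on the `normalization` flag from the original BERTweet implementation (an assumption here, together with the optional `emoji` dependency it uses for emoticon handling):

```python
>>> from transformers import BertweetTokenizer

>>> # normalization=True (assumed flag) translates user mentions and URLs to @USER / HTTPURL and,
>>> # if the optional `emoji` package is installed, converts emojis into text strings
>>> tokenizer = BertweetTokenizer.from_pretrained("vinai/bertweet-base", normalization=True)

>>> raw_line = "SC has first two presumptive cases of coronavirus, DHEC confirms via @CNN"
>>> input_ids = tokenizer(raw_line, return_tensors="pt").input_ids
```
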
This model was contributed by [dqnguyen](https://huggingface.co/dqnguyen). The original code can be found [here](https://github.com/VinAIResearch/BERTweet).

## BertweetTokenizer

[[autodoc]] BertweetTokenizer
@ -1,64 +0,0 @@
146
docs/source/model_doc/bigbird.mdx
Normal file
146
docs/source/model_doc/bigbird.mdx
Normal file
@ -0,0 +1,146 @@
<!--Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BigBird

## Overview

The BigBird model was proposed in [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by
Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon,
Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others. BigBird is a sparse-attention
based transformer which extends Transformer-based models, such as BERT, to much longer sequences. In addition to sparse
attention, BigBird also applies global attention as well as random attention to the input sequence. Theoretically, it
has been shown that applying sparse, global, and random attention approximates full attention, while being
computationally much more efficient for longer sequences. As a consequence of the capability to handle longer context,
BigBird has shown improved performance on various long document NLP tasks, such as question answering and
summarization, compared to BERT or RoBERTa.

The abstract from the paper is the following:

*Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP.
Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence
length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that
reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and
is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our
theoretical analysis reveals some of the benefits of having O(1) global tokens (such as CLS), that attend to the entire
sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to
8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context,
BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also
propose novel applications to genomics data.*

Tips:

- For an in-detail explanation on how BigBird's attention works, see [this blog post](https://huggingface.co/blog/big-bird).
- BigBird comes with 2 implementations: **original_full** & **block_sparse**. For sequence lengths < 1024, using
  **original_full** is advised as there is no benefit in using **block_sparse** attention (see the configuration sketch
  after these tips).
- The code currently uses a window size of 3 blocks and 2 global blocks.
- The sequence length must be divisible by the block size.
- The current implementation supports only **ITC**.
- The current implementation doesn't support **num_random_blocks = 0**.

|
||||
[here](https://github.com/google-research/bigbird).
|
||||
|
||||
## BigBirdConfig
|
||||
|
||||
[[autodoc]] BigBirdConfig
|
||||
|
||||
## BigBirdTokenizer
|
||||
|
||||
[[autodoc]] BigBirdTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## BigBirdTokenizerFast
|
||||
|
||||
[[autodoc]] BigBirdTokenizerFast
|
||||
|
||||
## BigBird specific outputs
|
||||
|
||||
[[autodoc]] models.big_bird.modeling_big_bird.BigBirdForPreTrainingOutput
|
||||
|
||||
## BigBirdModel
|
||||
|
||||
[[autodoc]] BigBirdModel
|
||||
- forward
|
||||
|
||||
## BigBirdForPreTraining
|
||||
|
||||
[[autodoc]] BigBirdForPreTraining
|
||||
- forward
|
||||
|
||||
## BigBirdForCausalLM
|
||||
|
||||
[[autodoc]] BigBirdForCausalLM
|
||||
- forward
|
||||
|
||||
## BigBirdForMaskedLM
|
||||
|
||||
[[autodoc]] BigBirdForMaskedLM
|
||||
- forward
|
||||
|
||||
## BigBirdForSequenceClassification
|
||||
|
||||
[[autodoc]] BigBirdForSequenceClassification
|
||||
- forward
|
||||
|
||||
## BigBirdForMultipleChoice
|
||||
|
||||
[[autodoc]] BigBirdForMultipleChoice
|
||||
- forward
|
||||
|
||||
## BigBirdForTokenClassification
|
||||
|
||||
[[autodoc]] BigBirdForTokenClassification
|
||||
- forward
|
||||
|
||||
## BigBirdForQuestionAnswering
|
||||
|
||||
[[autodoc]] BigBirdForQuestionAnswering
|
||||
- forward
|
||||
|
||||
## FlaxBigBirdModel
|
||||
|
||||
[[autodoc]] FlaxBigBirdModel
|
||||
- __call__
|
||||
|
||||
## FlaxBigBirdForPreTraining
|
||||
|
||||
[[autodoc]] FlaxBigBirdForPreTraining
|
||||
- __call__
|
||||
|
||||
## FlaxBigBirdForMaskedLM
|
||||
|
||||
[[autodoc]] FlaxBigBirdForMaskedLM
|
||||
- __call__
|
||||
|
||||
## FlaxBigBirdForSequenceClassification
|
||||
|
||||
[[autodoc]] FlaxBigBirdForSequenceClassification
|
||||
- __call__
|
||||
|
||||
## FlaxBigBirdForMultipleChoice
|
||||
|
||||
[[autodoc]] FlaxBigBirdForMultipleChoice
|
||||
- __call__
|
||||
|
||||
## FlaxBigBirdForTokenClassification
|
||||
|
||||
[[autodoc]] FlaxBigBirdForTokenClassification
|
||||
- __call__
|
||||
|
||||
## FlaxBigBirdForQuestionAnswering
|
||||
|
||||
[[autodoc]] FlaxBigBirdForQuestionAnswering
|
||||
- __call__
|
@ -1,185 +0,0 @@
81
docs/source/model_doc/bigbird_pegasus.mdx
Normal file
81
docs/source/model_doc/bigbird_pegasus.mdx
Normal file
@ -0,0 +1,81 @@
<!--Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BigBirdPegasus

## Overview

The BigBird model was proposed in [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by
Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon,
Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others. BigBird is a sparse-attention
based transformer which extends Transformer-based models, such as BERT, to much longer sequences. In addition to sparse
attention, BigBird also applies global attention as well as random attention to the input sequence. Theoretically, it
has been shown that applying sparse, global, and random attention approximates full attention, while being
computationally much more efficient for longer sequences. As a consequence of the capability to handle longer context,
BigBird has shown improved performance on various long document NLP tasks, such as question answering and
summarization, compared to BERT or RoBERTa.

The abstract from the paper is the following:

*Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP.
Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence
length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that
reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and
is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our
theoretical analysis reveals some of the benefits of having O(1) global tokens (such as CLS), that attend to the entire
sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to
8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context,
BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also
propose novel applications to genomics data.*

Tips:

- For an in-detail explanation on how BigBird's attention works, see [this blog post](https://huggingface.co/blog/big-bird).
- BigBird comes with 2 implementations: **original_full** & **block_sparse**. For sequence lengths < 1024, using
  **original_full** is advised as there is no benefit in using **block_sparse** attention.
- The code currently uses a window size of 3 blocks and 2 global blocks.
- The sequence length must be divisible by the block size.
- The current implementation supports only **ITC**.
- The current implementation doesn't support **num_random_blocks = 0**.
- BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/master/src/transformers/models/pegasus/tokenization_pegasus.py) (see the summarization sketch below).

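As a concrete illustration of the tips above, here is a hedged summarization sketch; the `google/bigbird-pegasus-large-arxiv` checkpoint, the placeholder article and the beam-search settings are assumptions made for the example, not requirements of the API.

```python
>>> from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

>>> # Assumed checkpoint fine-tuned for scientific-paper summarization; AutoTokenizer resolves to a Pegasus tokenizer
>>> model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")
>>> tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")

>>> article = "Replace this with the (potentially very long) document to summarize."
>>> inputs = tokenizer(article, return_tensors="pt")

>>> # Standard seq2seq generation; the beam-search values are illustrative
>>> summary_ids = model.generate(**inputs, num_beams=4, max_length=128)
>>> print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```
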
The original code can be found [here](https://github.com/google-research/bigbird).

## BigBirdPegasusConfig

[[autodoc]] BigBirdPegasusConfig
    - all

## BigBirdPegasusModel

[[autodoc]] BigBirdPegasusModel
    - forward

## BigBirdPegasusForConditionalGeneration

[[autodoc]] BigBirdPegasusForConditionalGeneration
    - forward

## BigBirdPegasusForSequenceClassification

[[autodoc]] BigBirdPegasusForSequenceClassification
    - forward

## BigBirdPegasusForQuestionAnswering

[[autodoc]] BigBirdPegasusForQuestionAnswering
    - forward

## BigBirdPegasusForCausalLM

[[autodoc]] BigBirdPegasusForCausalLM
    - forward
@ -1,98 +0,0 @@
118
docs/source/model_doc/blenderbot.mdx
Normal file
118
docs/source/model_doc/blenderbot.mdx
Normal file
@ -0,0 +1,118 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Blenderbot
|
||||
|
||||
**DISCLAIMER:** If you see something strange, file a [Github Issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title) .
|
||||
|
||||
## Overview
|
||||
|
||||
The Blender chatbot model was proposed in [Recipes for building an open-domain chatbot](https://arxiv.org/pdf/2004.13637.pdf) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu,
|
||||
Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.
|
||||
|
||||
The abstract of the paper is the following:
|
||||
|
||||
*Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that
|
||||
scaling neural models in the number of parameters and the size of the data they are trained on gives improved results,
|
||||
we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of
|
||||
skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to
|
||||
their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent
|
||||
persona. We show that large scale models can learn these skills when given appropriate training data and choice of
|
||||
generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models
|
||||
and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn
|
||||
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
|
||||
failure cases of our models.*
|
||||
|
||||
This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/facebookresearch/ParlAI).
|
||||
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
- Blenderbot uses a standard [seq2seq model transformer](https://arxiv.org/pdf/1706.03762.pdf) based architecture.
|
||||
- Available checkpoints can be found in the [model hub](https://huggingface.co/models?search=blenderbot).
|
||||
- This is the *default* Blenderbot model class. However, some smaller checkpoints, such as
|
||||
`facebook/blenderbot_small_90M`, have a different architecture and consequently should be used with
|
||||
[BlenderbotSmall](blenderbot_small).
|
||||
|
||||
|
||||
## Usage
|
||||
|
||||
Here is an example of model usage:
|
||||
|
||||
```python
|
||||
>>> from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
|
||||
>>> mname = 'facebook/blenderbot-400M-distill'
|
||||
>>> model = BlenderbotForConditionalGeneration.from_pretrained(mname)
|
||||
>>> tokenizer = BlenderbotTokenizer.from_pretrained(mname)
|
||||
>>> UTTERANCE = "My friends are cool but they eat too many carbs."
|
||||
>>> inputs = tokenizer([UTTERANCE], return_tensors='pt')
|
||||
>>> reply_ids = model.generate(**inputs)
|
||||
>>> print(tokenizer.batch_decode(reply_ids))
|
||||
["<s> That's unfortunate. Are they trying to lose weight or are they just trying to be healthier?</s>"]
|
||||
```
|
||||
|
||||
## BlenderbotConfig
|
||||
|
||||
[[autodoc]] BlenderbotConfig
|
||||
|
||||
## BlenderbotTokenizer
|
||||
|
||||
[[autodoc]] BlenderbotTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
|
||||
## BlenderbotTokenizerFast
|
||||
|
||||
[[autodoc]] BlenderbotTokenizerFast
|
||||
- build_inputs_with_special_tokens
|
||||
|
||||
## BlenderbotModel
|
||||
|
||||
See [`~transformers.BartModel`] for arguments to *forward* and *generate*
|
||||
|
||||
[[autodoc]] BlenderbotModel
|
||||
- forward
|
||||
|
||||
## BlenderbotForConditionalGeneration
|
||||
|
||||
See [`~transformers.BartForConditionalGeneration`] for arguments to *forward* and *generate*
|
||||
|
||||
[[autodoc]] BlenderbotForConditionalGeneration
|
||||
- forward
|
||||
|
||||
## BlenderbotForCausalLM
|
||||
|
||||
[[autodoc]] BlenderbotForCausalLM
|
||||
- forward
|
||||
|
||||
## TFBlenderbotModel
|
||||
|
||||
[[autodoc]] TFBlenderbotModel
|
||||
- call
|
||||
|
||||
## TFBlenderbotForConditionalGeneration
|
||||
|
||||
[[autodoc]] TFBlenderbotForConditionalGeneration
|
||||
- call
|
||||
|
||||
## FlaxBlenderbotModel
|
||||
|
||||
[[autodoc]] FlaxBlenderbotModel
|
||||
- __call__
|
||||
- encode
|
||||
- decode
|
||||
|
||||
## FlaxBlenderbotForConditionalGeneration
|
||||
|
||||
[[autodoc]] FlaxBlenderbotForConditionalGeneration
|
||||
- __call__
|
||||
- encode
|
||||
- decode
|
@ -1,141 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
Blenderbot
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
**DISCLAIMER:** If you see something strange, file a `Github Issue
|
||||
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ .
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The Blender chatbot model was proposed in `Recipes for building an open-domain chatbot
|
||||
<https://arxiv.org/pdf/2004.13637.pdf>`__ Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu,
|
||||
Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.
|
||||
|
||||
The abstract of the paper is the following:
|
||||
|
||||
*Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that
|
||||
scaling neural models in the number of parameters and the size of the data they are trained on gives improved results,
|
||||
we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of
|
||||
skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to
|
||||
their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent
|
||||
persona. We show that large scale models can learn these skills when given appropriate training data and choice of
|
||||
generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models
|
||||
and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn
|
||||
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
|
||||
failure cases of our models.*
|
||||
|
||||
This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The authors' code can be found `here
|
||||
<https://github.com/facebookresearch/ParlAI>`__ .
|
||||
|
||||
|
||||
Implementation Notes
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Blenderbot uses a standard `seq2seq model transformer <https://arxiv.org/pdf/1706.03762.pdf>`__ based architecture.
|
||||
- Available checkpoints can be found in the `model hub <https://huggingface.co/models?search=blenderbot>`__.
|
||||
- This is the `default` Blenderbot model class. However, some smaller checkpoints, such as
|
||||
``facebook/blenderbot_small_90M``, have a different architecture and consequently should be used with
|
||||
`BlenderbotSmall <blenderbot_small>`__.
|
||||
|
||||
|
||||
Usage
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Here is an example of model usage:
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
|
||||
>>> mname = 'facebook/blenderbot-400M-distill'
|
||||
>>> model = BlenderbotForConditionalGeneration.from_pretrained(mname)
|
||||
>>> tokenizer = BlenderbotTokenizer.from_pretrained(mname)
|
||||
>>> UTTERANCE = "My friends are cool but they eat too many carbs."
|
||||
>>> inputs = tokenizer([UTTERANCE], return_tensors='pt')
|
||||
>>> reply_ids = model.generate(**inputs)
|
||||
>>> print(tokenizer.batch_decode(reply_ids))
|
||||
["<s> That's unfortunate. Are they trying to lose weight or are they just trying to be healthier?</s>"]
|
||||
|
||||
|
||||
BlenderbotConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BlenderbotConfig
|
||||
:members:
|
||||
|
||||
BlenderbotTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BlenderbotTokenizer
|
||||
:members: build_inputs_with_special_tokens
|
||||
|
||||
|
||||
BlenderbotTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BlenderbotTokenizerFast
|
||||
:members: build_inputs_with_special_tokens
|
||||
|
||||
|
||||
BlenderbotModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
See :obj:`transformers.BartModel` for arguments to `forward` and `generate`
|
||||
|
||||
.. autoclass:: transformers.BlenderbotModel
|
||||
:members: forward
|
||||
|
||||
|
||||
BlenderbotForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
See :obj:`transformers.BartForConditionalGeneration` for arguments to `forward` and `generate`
|
||||
|
||||
.. autoclass:: transformers.BlenderbotForConditionalGeneration
|
||||
:members: forward
|
||||
|
||||
|
||||
BlenderbotForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BlenderbotForCausalLM
|
||||
:members: forward
|
||||
|
||||
|
||||
TFBlenderbotModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFBlenderbotModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFBlenderbotForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFBlenderbotForConditionalGeneration
|
||||
:members: call
|
||||
|
||||
|
||||
FlaxBlenderbotModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxBlenderbotModel
|
||||
:members: __call__, encode, decode
|
||||
|
||||
|
||||
FlaxBlenderbotForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxBlenderbotForConditionalGeneration
|
||||
:members: __call__, encode, decode
|
95
docs/source/model_doc/blenderbot_small.mdx
Normal file
95
docs/source/model_doc/blenderbot_small.mdx
Normal file
@ -0,0 +1,95 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Blenderbot Small
|
||||
|
||||
Note that [`BlenderbotSmallModel`] and
|
||||
[`BlenderbotSmallForConditionalGeneration`] are only used in combination with the checkpoint
|
||||
[facebook/blenderbot-90M](https://huggingface.co/facebook/blenderbot-90M). Larger Blenderbot checkpoints should
|
||||
instead be used with [`BlenderbotModel`] and
|
||||
[`BlenderbotForConditionalGeneration`] (see the usage sketch after the overview below).
|
||||
|
||||
## Overview
|
||||
|
||||
The Blender chatbot model was proposed in [Recipes for building an open-domain chatbot](https://arxiv.org/pdf/2004.13637.pdf) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu,
|
||||
Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.
|
||||
|
||||
The abstract of the paper is the following:
|
||||
|
||||
*Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that
|
||||
scaling neural models in the number of parameters and the size of the data they are trained on gives improved results,
|
||||
we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of
|
||||
skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to
|
||||
their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent
|
||||
persona. We show that large scale models can learn these skills when given appropriate training data and choice of
|
||||
generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models
|
||||
and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn
|
||||
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
|
||||
failure cases of our models.*
|
||||
|
||||
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The authors' code can be
|
||||
found [here](https://github.com/facebookresearch/ParlAI).
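As noted at the top of this page, the 90M checkpoint is paired with the dedicated small classes. A minimal usage sketch, mirroring the Blenderbot example and assuming the `facebook/blenderbot-90M` checkpoint name:

```python
>>> from transformers import BlenderbotSmallTokenizer, BlenderbotSmallForConditionalGeneration
>>> mname = 'facebook/blenderbot-90M'
>>> model = BlenderbotSmallForConditionalGeneration.from_pretrained(mname)
>>> tokenizer = BlenderbotSmallTokenizer.from_pretrained(mname)
>>> UTTERANCE = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([UTTERANCE], return_tensors='pt')
>>> reply_ids = model.generate(**inputs)
>>> print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True))
```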
|
||||
|
||||
## BlenderbotSmallConfig
|
||||
|
||||
[[autodoc]] BlenderbotSmallConfig
|
||||
|
||||
## BlenderbotSmallTokenizer
|
||||
|
||||
[[autodoc]] BlenderbotSmallTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## BlenderbotSmallTokenizerFast
|
||||
|
||||
[[autodoc]] BlenderbotSmallTokenizerFast
|
||||
|
||||
## BlenderbotSmallModel
|
||||
|
||||
[[autodoc]] BlenderbotSmallModel
|
||||
- forward
|
||||
|
||||
## BlenderbotSmallForConditionalGeneration
|
||||
|
||||
[[autodoc]] BlenderbotSmallForConditionalGeneration
|
||||
- forward
|
||||
|
||||
## BlenderbotSmallForCausalLM
|
||||
|
||||
[[autodoc]] BlenderbotSmallForCausalLM
|
||||
- forward
|
||||
|
||||
## TFBlenderbotSmallModel
|
||||
|
||||
[[autodoc]] TFBlenderbotSmallModel
|
||||
- call
|
||||
|
||||
## TFBlenderbotSmallForConditionalGeneration
|
||||
|
||||
[[autodoc]] TFBlenderbotSmallForConditionalGeneration
|
||||
- call
|
||||
|
||||
## FlaxBlenderbotSmallModel
|
||||
|
||||
[[autodoc]] FlaxBlenderbotSmallModel
|
||||
- __call__
|
||||
- encode
|
||||
- decode
|
||||
|
||||
## FlaxBlenderbotSmallForConditionalGeneration
|
||||
|
||||
[[autodoc]] FlaxBlenderbotSmallForConditionalGeneration
|
||||
- __call__
|
||||
- encode
|
||||
- decode
|
@ -1,113 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
Blenderbot Small
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Note that :class:`~transformers.BlenderbotSmallModel` and
|
||||
:class:`~transformers.BlenderbotSmallForConditionalGeneration` are only used in combination with the checkpoint
|
||||
`facebook/blenderbot-90M <https://huggingface.co/facebook/blenderbot-90M>`__. Larger Blenderbot checkpoints should
|
||||
instead be used with :class:`~transformers.BlenderbotModel` and
|
||||
:class:`~transformers.BlenderbotForConditionalGeneration`
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The Blender chatbot model was proposed in `Recipes for building an open-domain chatbot
|
||||
<https://arxiv.org/pdf/2004.13637.pdf>`__ Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu,
|
||||
Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.
|
||||
|
||||
The abstract of the paper is the following:
|
||||
|
||||
*Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that
|
||||
scaling neural models in the number of parameters and the size of the data they are trained on gives improved results,
|
||||
we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of
|
||||
skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to
|
||||
their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent
|
||||
persona. We show that large scale models can learn these skills when given appropriate training data and choice of
|
||||
generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models
|
||||
and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn
|
||||
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
|
||||
failure cases of our models.*
|
||||
|
||||
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The authors' code can be
|
||||
found `here <https://github.com/facebookresearch/ParlAI>`__ .
|
||||
|
||||
BlenderbotSmallConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BlenderbotSmallConfig
|
||||
:members:
|
||||
|
||||
|
||||
BlenderbotSmallTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BlenderbotSmallTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
BlenderbotSmallTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BlenderbotSmallTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
BlenderbotSmallModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BlenderbotSmallModel
|
||||
:members: forward
|
||||
|
||||
|
||||
BlenderbotSmallForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BlenderbotSmallForConditionalGeneration
|
||||
:members: forward
|
||||
|
||||
|
||||
BlenderbotSmallForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BlenderbotSmallForCausalLM
|
||||
:members: forward
|
||||
|
||||
|
||||
TFBlenderbotSmallModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFBlenderbotSmallModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFBlenderbotSmallForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFBlenderbotSmallForConditionalGeneration
|
||||
:members: call
|
||||
|
||||
|
||||
FlaxBlenderbotSmallModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxBlenderbotSmallModel
|
||||
:members: __call__, encode, decode
|
||||
|
||||
|
||||
FlaxBlenderbotForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxBlenderbotSmallForConditionalGeneration
|
||||
:members: __call__, encode, decode
|
@ -1,22 +1,20 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
BORT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
# BORT
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
## Overview
|
||||
|
||||
The BORT model was proposed in `Optimal Subarchitecture Extraction for BERT <https://arxiv.org/abs/2010.10499>`__ by
|
||||
The BORT model was proposed in [Optimal Subarchitecture Extraction for BERT](https://arxiv.org/abs/2010.10499) by
|
||||
Adrian de Wynter and Daniel J. Perry. It is an optimal subset of architectural parameters for BERT, which the
|
||||
authors refer to as "Bort".
|
||||
|
||||
@ -34,14 +32,11 @@ absolute, with respect to BERT-large, on multiple public natural language unders
|
||||
|
||||
Tips:
|
||||
|
||||
- BORT's model architecture is based on BERT, so one can refer to :doc:`BERT's documentation page <bert>` for the
|
||||
- BORT's model architecture is based on BERT, so one can refer to [BERT's documentation page](bert) for the
|
||||
model's API as well as usage examples.
|
||||
- BORT uses the RoBERTa tokenizer instead of the BERT tokenizer, so one can refer to :doc:`RoBERTa's documentation page
|
||||
<roberta>` for the tokenizer's API as well as usage examples.
|
||||
- BORT requires a specific fine-tuning algorithm, called `Agora
|
||||
<https://adewynter.github.io/notes/bort_algorithms_and_applications.html#fine-tuning-with-algebraic-topology>`__ ,
|
||||
- BORT uses the RoBERTa tokenizer instead of the BERT tokenizer, so one can refer to [RoBERTa's documentation page](roberta) for the tokenizer's API as well as usage examples.
|
||||
- BORT requires a specific fine-tuning algorithm, called [Agora](https://adewynter.github.io/notes/bort_algorithms_and_applications.html#fine-tuning-with-algebraic-topology),
|
||||
that is sadly not open-sourced yet. It would be very useful for the community if someone tried to implement the
|
||||
algorithm to make BORT fine-tuning work.
|
||||
|
||||
This model was contributed by `stefan-it <https://huggingface.co/stefan-it>`__. The original code can be found `here
|
||||
<https://github.com/alexa/bort/>`__.
|
||||
This model was contributed by [stefan-it](https://huggingface.co/stefan-it). The original code can be found [here](https://github.com/alexa/bort/).
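Putting the tips above together (BERT-style architecture, RoBERTa tokenizer), loading a Bort checkpoint could look like the sketch below; the `amazon/bort` checkpoint name is an assumption, so check the model hub for the identifier you actually want to use:

```python
from transformers import AutoModel, RobertaTokenizer

# hypothetical checkpoint name -- verify it on the model hub before use
checkpoint = "amazon/bort"

# per the tips above, Bort pairs a BERT-like encoder with the RoBERTa tokenizer
model = AutoModel.from_pretrained(checkpoint)
tokenizer = RobertaTokenizer.from_pretrained(checkpoint)

inputs = tokenizer("Hello, Bort!", return_tensors="pt")
outputs = model(**inputs)
last_hidden_state = outputs.last_hidden_state
```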
|
80
docs/source/model_doc/byt5.mdx
Normal file
80
docs/source/model_doc/byt5.mdx
Normal file
@ -0,0 +1,80 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# ByT5
|
||||
|
||||
## Overview
|
||||
|
||||
The ByT5 model was presented in [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir
|
||||
Kale, Adam Roberts, Colin Raffel.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Most widely-used pre-trained language models operate on sequences of tokens corresponding to word or subword units.
|
||||
Encoding text as a sequence of tokens requires a tokenizer, which is typically created as an independent artifact from
|
||||
the model. Token-free models that instead operate directly on raw text (bytes or characters) have many benefits: they
|
||||
can process text in any language out of the box, they are more robust to noise, and they minimize technical debt by
|
||||
removing complex and error-prone text preprocessing pipelines. Since byte or character sequences are longer than token
|
||||
sequences, past work on token-free models has often introduced new model architectures designed to amortize the cost of
|
||||
operating directly on raw text. In this paper, we show that a standard Transformer architecture can be used with
|
||||
minimal modifications to process byte sequences. We carefully characterize the trade-offs in terms of parameter count,
|
||||
training FLOPs, and inference speed, and show that byte-level models are competitive with their token-level
|
||||
counterparts. We also demonstrate that byte-level models are significantly more robust to noise and perform better on
|
||||
tasks that are sensitive to spelling and pronunciation. As part of our contribution, we release a new set of
|
||||
pre-trained byte-level Transformer models based on the T5 architecture, as well as all code and data used in our
|
||||
experiments.*
|
||||
|
||||
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The original code can be
|
||||
found [here](https://github.com/google-research/byt5).
|
||||
|
||||
ByT5's architecture is based on the T5v1.1 model, so one can refer to [T5v1.1's documentation page](t5v1.1). They
|
||||
only differ in how inputs should be prepared for the model; see the code examples below.
|
||||
|
||||
Since ByT5 was pre-trained in an unsupervised fashion, there's no real advantage to using a task prefix during single-task
|
||||
fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
|
||||
|
||||
|
||||
### Example
|
||||
|
||||
ByT5 works on raw UTF-8 bytes, so it can be used without a tokenizer:
|
||||
|
||||
```python
|
||||
from transformers import T5ForConditionalGeneration
|
||||
import torch
|
||||
|
||||
model = T5ForConditionalGeneration.from_pretrained('google/byt5-small')
|
||||
|
||||
input_ids = torch.tensor([list("Life is like a box of chocolates.".encode("utf-8"))]) + 3 # add 3 for special tokens
|
||||
labels = torch.tensor([list("La vie est comme une boîte de chocolat.".encode("utf-8"))]) + 3 # add 3 for special tokens
|
||||
|
||||
loss = model(input_ids, labels=labels).loss # forward pass
|
||||
```
|
||||
|
||||
For batched inference and training, however, it is recommended to make use of the tokenizer:
|
||||
|
||||
```python
|
||||
from transformers import T5ForConditionalGeneration, AutoTokenizer
|
||||
|
||||
model = T5ForConditionalGeneration.from_pretrained('google/byt5-small')
|
||||
tokenizer = AutoTokenizer.from_pretrained('google/byt5-small')
|
||||
|
||||
model_inputs = tokenizer(["Life is like a box of chocolates.", "Today is Monday."], padding="longest", return_tensors="pt")
|
||||
labels = tokenizer(["La vie est comme une boîte de chocolat.", "Aujourd'hui c'est lundi."], padding="longest", return_tensors="pt").input_ids
|
||||
|
||||
loss = model(**model_inputs, labels=labels).loss # forward pass
|
||||
```
|
||||
|
||||
## ByT5Tokenizer
|
||||
|
||||
[[autodoc]] ByT5Tokenizer
|
||||
|
||||
See [`ByT5Tokenizer`] for all details.
|
@ -1,86 +0,0 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
ByT5
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The ByT5 model was presented in `ByT5: Towards a token-free future with pre-trained byte-to-byte models
|
||||
<https://arxiv.org/abs/2105.13626>`_ by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir
|
||||
Kale, Adam Roberts, Colin Raffel.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Most widely-used pre-trained language models operate on sequences of tokens corresponding to word or subword units.
|
||||
Encoding text as a sequence of tokens requires a tokenizer, which is typically created as an independent artifact from
|
||||
the model. Token-free models that instead operate directly on raw text (bytes or characters) have many benefits: they
|
||||
can process text in any language out of the box, they are more robust to noise, and they minimize technical debt by
|
||||
removing complex and error-prone text preprocessing pipelines. Since byte or character sequences are longer than token
|
||||
sequences, past work on token-free models has often introduced new model architectures designed to amortize the cost of
|
||||
operating directly on raw text. In this paper, we show that a standard Transformer architecture can be used with
|
||||
minimal modifications to process byte sequences. We carefully characterize the trade-offs in terms of parameter count,
|
||||
training FLOPs, and inference speed, and show that byte-level models are competitive with their token-level
|
||||
counterparts. We also demonstrate that byte-level models are significantly more robust to noise and perform better on
|
||||
tasks that are sensitive to spelling and pronunciation. As part of our contribution, we release a new set of
|
||||
pre-trained byte-level Transformer models based on the T5 architecture, as well as all code and data used in our
|
||||
experiments.*
|
||||
|
||||
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
|
||||
found `here <https://github.com/google-research/byt5>`__.
|
||||
|
||||
ByT5's architecture is based on the T5v1.1 model, so one can refer to :doc:`T5v1.1's documentation page <t5v1.1>`. They
|
||||
only differ in how inputs should be prepared for the model, see the code examples below.
|
||||
|
||||
Since ByT5 was pre-trained unsupervisedly, there's no real advantage to using a task prefix during single-task
|
||||
fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
|
||||
|
||||
|
||||
Example
|
||||
_______________________________________________________________________________________________________________________
|
||||
|
||||
ByT5 works on raw UTF-8 bytes, so it can be used without a tokenizer:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import T5ForConditionalGeneration
|
||||
import torch
|
||||
|
||||
model = T5ForConditionalGeneration.from_pretrained('google/byt5-small')
|
||||
|
||||
input_ids = torch.tensor([list("Life is like a box of chocolates.".encode("utf-8"))]) + 3 # add 3 for special tokens
|
||||
labels = torch.tensor([list("La vie est comme une boîte de chocolat.".encode("utf-8"))]) + 3 # add 3 for special tokens
|
||||
|
||||
loss = model(input_ids, labels=labels).loss # forward pass
|
||||
|
||||
|
||||
For batched inference and training it is however recommended to make use of the tokenizer:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import T5ForConditionalGeneration, AutoTokenizer
|
||||
|
||||
model = T5ForConditionalGeneration.from_pretrained('google/byt5-small')
|
||||
tokenizer = AutoTokenizer.from_pretrained('google/byt5-small')
|
||||
|
||||
model_inputs = tokenizer(["Life is like a box of chocolates.", "Today is Monday."], padding="longest", return_tensors="pt")
|
||||
labels = tokenizer(["La vie est comme une boîte de chocolat.", "Aujourd'hui c'est lundi."], padding="longest", return_tensors="pt").input_ids
|
||||
|
||||
loss = model(**model_inputs, labels=labels).loss # forward pass
|
||||
|
||||
ByT5Tokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ByT5Tokenizer
|
||||
|
||||
See :class:`~transformers.ByT5Tokenizer` for all details.
|
106
docs/source/model_doc/camembert.mdx
Normal file
106
docs/source/model_doc/camembert.mdx
Normal file
@ -0,0 +1,106 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# CamemBERT
|
||||
|
||||
## Overview
|
||||
|
||||
The CamemBERT model was proposed in [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by
|
||||
Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la
|
||||
Clergerie, Djamé Seddah, and Benoît Sagot. It is based on Facebook's RoBERTa model released in 2019. It is a model
|
||||
trained on 138GB of French text.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available
|
||||
models have either been trained on English data or on the concatenation of data in multiple languages. This makes
|
||||
practical use of such models --in all languages except English-- very limited. Aiming to address this issue for French,
|
||||
we release CamemBERT, a French version of the Bi-directional Encoders for Transformers (BERT). We measure the
|
||||
performance of CamemBERT compared to multilingual models in multiple downstream tasks, namely part-of-speech tagging,
|
||||
dependency parsing, named-entity recognition, and natural language inference. CamemBERT improves the state of the art
|
||||
for most of the tasks considered. We release the pretrained model for CamemBERT hoping to foster research and
|
||||
downstream applications for French NLP.*
|
||||
|
||||
Tips:
|
||||
|
||||
- This implementation is the same as RoBERTa. Refer to the [documentation of RoBERTa](roberta) for usage examples
|
||||
as well as the information relative to the inputs and outputs.
|
||||
|
||||
This model was contributed by [camembert](https://huggingface.co/camembert). The original code can be found [here](https://camembert-model.fr/).
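Because the implementation matches RoBERTa, standard masked-language-modeling usage applies. A minimal sketch, assuming the `camembert-base` checkpoint:

```python
>>> from transformers import CamembertTokenizer, CamembertForMaskedLM
>>> import torch

>>> tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
>>> model = CamembertForMaskedLM.from_pretrained("camembert-base")

>>> inputs = tokenizer("Le camembert est <mask> :)", return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> # take the highest-scoring token at the masked position
>>> mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
>>> print(tokenizer.decode(logits[0, mask_index].argmax(-1)))
```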
|
||||
|
||||
## CamembertConfig
|
||||
|
||||
[[autodoc]] CamembertConfig
|
||||
|
||||
## CamembertTokenizer
|
||||
|
||||
[[autodoc]] CamembertTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## CamembertTokenizerFast
|
||||
|
||||
[[autodoc]] CamembertTokenizerFast
|
||||
|
||||
## CamembertModel
|
||||
|
||||
[[autodoc]] CamembertModel
|
||||
|
||||
## CamembertForCausalLM
|
||||
|
||||
[[autodoc]] CamembertForCausalLM
|
||||
|
||||
## CamembertForMaskedLM
|
||||
|
||||
[[autodoc]] CamembertForMaskedLM
|
||||
|
||||
## CamembertForSequenceClassification
|
||||
|
||||
[[autodoc]] CamembertForSequenceClassification
|
||||
|
||||
## CamembertForMultipleChoice
|
||||
|
||||
[[autodoc]] CamembertForMultipleChoice
|
||||
|
||||
## CamembertForTokenClassification
|
||||
|
||||
[[autodoc]] CamembertForTokenClassification
|
||||
|
||||
## CamembertForQuestionAnswering
|
||||
|
||||
[[autodoc]] CamembertForQuestionAnswering
|
||||
|
||||
## TFCamembertModel
|
||||
|
||||
[[autodoc]] TFCamembertModel
|
||||
|
||||
## TFCamembertForMaskedLM
|
||||
|
||||
[[autodoc]] TFCamembertForMaskedLM
|
||||
|
||||
## TFCamembertForSequenceClassification
|
||||
|
||||
[[autodoc]] TFCamembertForSequenceClassification
|
||||
|
||||
## TFCamembertForMultipleChoice
|
||||
|
||||
[[autodoc]] TFCamembertForMultipleChoice
|
||||
|
||||
## TFCamembertForTokenClassification
|
||||
|
||||
[[autodoc]] TFCamembertForTokenClassification
|
||||
|
||||
## TFCamembertForQuestionAnswering
|
||||
|
||||
[[autodoc]] TFCamembertForQuestionAnswering
|
@ -1,153 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
CamemBERT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The CamemBERT model was proposed in `CamemBERT: a Tasty French Language Model <https://arxiv.org/abs/1911.03894>`__ by
|
||||
Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la
|
||||
Clergerie, Djamé Seddah, and Benoît Sagot. It is based on Facebook's RoBERTa model released in 2019. It is a model
|
||||
trained on 138GB of French text.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available
|
||||
models have either been trained on English data or on the concatenation of data in multiple languages. This makes
|
||||
practical use of such models --in all languages except English-- very limited. Aiming to address this issue for French,
|
||||
we release CamemBERT, a French version of the Bi-directional Encoders for Transformers (BERT). We measure the
|
||||
performance of CamemBERT compared to multilingual models in multiple downstream tasks, namely part-of-speech tagging,
|
||||
dependency parsing, named-entity recognition, and natural language inference. CamemBERT improves the state of the art
|
||||
for most of the tasks considered. We release the pretrained model for CamemBERT hoping to foster research and
|
||||
downstream applications for French NLP.*
|
||||
|
||||
Tips:
|
||||
|
||||
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage examples
|
||||
as well as the information relative to the inputs and outputs.
|
||||
|
||||
This model was contributed by `camembert <https://huggingface.co/camembert>`__. The original code can be found `here
|
||||
<https://camembert-model.fr/>`__.
|
||||
|
||||
CamembertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CamembertConfig
|
||||
:members:
|
||||
|
||||
|
||||
CamembertTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CamembertTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
CamembertTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CamembertTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
CamembertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CamembertModel
|
||||
:members:
|
||||
|
||||
|
||||
CamembertForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CamembertForCausalLM
|
||||
:members:
|
||||
|
||||
|
||||
CamembertForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CamembertForMaskedLM
|
||||
:members:
|
||||
|
||||
|
||||
CamembertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CamembertForSequenceClassification
|
||||
:members:
|
||||
|
||||
|
||||
CamembertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CamembertForMultipleChoice
|
||||
:members:
|
||||
|
||||
|
||||
CamembertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CamembertForTokenClassification
|
||||
:members:
|
||||
|
||||
|
||||
CamembertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CamembertForQuestionAnswering
|
||||
:members:
|
||||
|
||||
|
||||
TFCamembertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFCamembertModel
|
||||
:members:
|
||||
|
||||
|
||||
TFCamembertForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFCamembertForMaskedLM
|
||||
:members:
|
||||
|
||||
|
||||
TFCamembertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFCamembertForSequenceClassification
|
||||
:members:
|
||||
|
||||
|
||||
TFCamembertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFCamembertForMultipleChoice
|
||||
:members:
|
||||
|
||||
|
||||
TFCamembertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFCamembertForTokenClassification
|
||||
:members:
|
||||
|
||||
|
||||
TFCamembertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFCamembertForQuestionAnswering
|
||||
:members:
|
133
docs/source/model_doc/canine.mdx
Normal file
133
docs/source/model_doc/canine.mdx
Normal file
@ -0,0 +1,133 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# CANINE
|
||||
|
||||
## Overview
|
||||
|
||||
The CANINE model was proposed in [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language
|
||||
Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting. It's
|
||||
among the first papers that train a Transformer without using an explicit tokenization step (such as Byte Pair
|
||||
Encoding (BPE), WordPiece or SentencePiece). Instead, the model is trained directly at a Unicode character-level.
|
||||
Training at a character-level inevitably comes with a longer sequence length, which CANINE solves with an efficient
|
||||
downsampling strategy, before applying a deep Transformer encoder.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly-used models
|
||||
still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword
|
||||
lexicons are less brittle than manually engineered tokenizers, these techniques are not equally suited to all
|
||||
languages, and the use of any fixed vocabulary may limit a model's ability to adapt. In this paper, we present CANINE,
|
||||
a neural encoder that operates directly on character sequences, without explicit tokenization or vocabulary, and a
|
||||
pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias.
|
||||
To use its finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input
|
||||
sequence length, with a deep transformer stack, which encodes context. CANINE outperforms a comparable mBERT model by
|
||||
2.8 F1 on TyDi QA, a challenging multilingual benchmark, despite having 28% fewer model parameters.*
|
||||
|
||||
Tips:
|
||||
|
||||
- CANINE uses no less than 3 Transformer encoders internally: 2 "shallow" encoders (which only consist of a single
|
||||
layer) and 1 "deep" encoder (which is a regular BERT encoder). First, a "shallow" encoder is used to contextualize
|
||||
the character embeddings, using local attention. Next, after downsampling, a "deep" encoder is applied. Finally,
|
||||
after upsampling, a "shallow" encoder is used to create the final character embeddings. Details regarding up- and
|
||||
downsampling can be found in the paper.
|
||||
- CANINE uses a max sequence length of 2048 characters by default. One can use [`CanineTokenizer`]
|
||||
to prepare text for the model.
|
||||
- Classification can be done by placing a linear layer on top of the final hidden state of the special [CLS] token
|
||||
(which has a predefined Unicode code point). For token classification tasks however, the downsampled sequence of
|
||||
tokens needs to be upsampled again to match the length of the original character sequence (which is 2048). The
|
||||
details for this can be found in the paper.
|
||||
- Models:
|
||||
|
||||
- [google/canine-c](https://huggingface.co/google/canine-c): Pre-trained with autoregressive character loss,
|
||||
12-layer, 768-hidden, 12-heads, 121M parameters (size ~500 MB).
|
||||
- [google/canine-s](https://huggingface.co/google/canine-s): Pre-trained with subword loss, 12-layer,
|
||||
768-hidden, 12-heads, 121M parameters (size ~500 MB).
|
||||
|
||||
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/google-research/language/tree/master/language/canine).
|
||||
|
||||
|
||||
### Example
|
||||
|
||||
CANINE works on raw characters, so it can be used without a tokenizer:
|
||||
|
||||
```python
|
||||
>>> from transformers import CanineModel
|
||||
>>> import torch
|
||||
|
||||
>>> model = CanineModel.from_pretrained('google/canine-c') # model pre-trained with autoregressive character loss
|
||||
|
||||
>>> text = "hello world"
|
||||
>>> # use Python's built-in ord() function to turn each character into its unicode code point id
|
||||
>>> input_ids = torch.tensor([[ord(char) for char in text]])
|
||||
|
||||
>>> outputs = model(input_ids) # forward pass
|
||||
>>> pooled_output = outputs.pooler_output
|
||||
>>> sequence_output = outputs.last_hidden_state
|
||||
```
|
||||
|
||||
For batched inference and training, however, it is recommended to make use of the tokenizer (to pad/truncate all
|
||||
sequences to the same length):
|
||||
|
||||
```python
|
||||
>>> from transformers import CanineTokenizer, CanineModel
|
||||
|
||||
>>> model = CanineModel.from_pretrained('google/canine-c')
|
||||
>>> tokenizer = CanineTokenizer.from_pretrained('google/canine-c')
|
||||
|
||||
>>> inputs = ["Life is like a box of chocolates.", "You never know what you gonna get."]
|
||||
>>> encoding = tokenizer(inputs, padding="longest", truncation=True, return_tensors="pt")
|
||||
|
||||
>>> outputs = model(**encoding) # forward pass
|
||||
>>> pooled_output = outputs.pooler_output
|
||||
>>> sequence_output = outputs.last_hidden_state
|
||||
```
|
||||
|
||||
## CANINE specific outputs
|
||||
|
||||
[[autodoc]] models.canine.modeling_canine.CanineModelOutputWithPooling
|
||||
|
||||
## CanineConfig
|
||||
|
||||
[[autodoc]] CanineConfig
|
||||
|
||||
## CanineTokenizer
|
||||
|
||||
[[autodoc]] CanineTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
|
||||
## CanineModel
|
||||
|
||||
[[autodoc]] CanineModel
|
||||
- forward
|
||||
|
||||
## CanineForSequenceClassification
|
||||
|
||||
[[autodoc]] CanineForSequenceClassification
|
||||
- forward
|
||||
|
||||
## CanineForMultipleChoice
|
||||
|
||||
[[autodoc]] CanineForMultipleChoice
|
||||
- forward
|
||||
|
||||
## CanineForTokenClassification
|
||||
|
||||
[[autodoc]] CanineForTokenClassification
|
||||
- forward
|
||||
|
||||
## CanineForQuestionAnswering
|
||||
|
||||
[[autodoc]] CanineForQuestionAnswering
|
||||
- forward
|
@ -1,155 +0,0 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
CANINE
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The CANINE model was proposed in `CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language
|
||||
Representation <https://arxiv.org/abs/2103.06874>`__ by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting. It's
|
||||
among the first papers that trains a Transformer without using an explicit tokenization step (such as Byte Pair
|
||||
Encoding (BPE), WordPiece or SentencePiece). Instead, the model is trained directly at a Unicode character-level.
|
||||
Training at a character-level inevitably comes with a longer sequence length, which CANINE solves with an efficient
|
||||
downsampling strategy, before applying a deep Transformer encoder.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly-used models
|
||||
still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword
|
||||
lexicons are less brittle than manually engineered tokenizers, these techniques are not equally suited to all
|
||||
languages, and the use of any fixed vocabulary may limit a model's ability to adapt. In this paper, we present CANINE,
|
||||
a neural encoder that operates directly on character sequences, without explicit tokenization or vocabulary, and a
|
||||
pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias.
|
||||
To use its finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input
|
||||
sequence length, with a deep transformer stack, which encodes context. CANINE outperforms a comparable mBERT model by
|
||||
2.8 F1 on TyDi QA, a challenging multilingual benchmark, despite having 28% fewer model parameters.*
|
||||
|
||||
Tips:
|
||||
|
||||
- CANINE uses no less than 3 Transformer encoders internally: 2 "shallow" encoders (which only consist of a single
|
||||
layer) and 1 "deep" encoder (which is a regular BERT encoder). First, a "shallow" encoder is used to contextualize
|
||||
the character embeddings, using local attention. Next, after downsampling, a "deep" encoder is applied. Finally,
|
||||
after upsampling, a "shallow" encoder is used to create the final character embeddings. Details regarding up- and
|
||||
downsampling can be found in the paper.
|
||||
- CANINE uses a max sequence length of 2048 characters by default. One can use :class:`~transformers.CanineTokenizer`
|
||||
to prepare text for the model.
|
||||
- Classification can be done by placing a linear layer on top of the final hidden state of the special [CLS] token
|
||||
(which has a predefined Unicode code point). For token classification tasks however, the downsampled sequence of
|
||||
tokens needs to be upsampled again to match the length of the original character sequence (which is 2048). The
|
||||
details for this can be found in the paper.
|
||||
- Models:
|
||||
|
||||
- `google/canine-c <https://huggingface.co/google/canine-c>`__: Pre-trained with autoregressive character loss,
|
||||
12-layer, 768-hidden, 12-heads, 121M parameters (size ~500 MB).
|
||||
- `google/canine-s <https://huggingface.co/google/canine-s>`__: Pre-trained with subword loss, 12-layer,
|
||||
768-hidden, 12-heads, 121M parameters (size ~500 MB).
|
||||
|
||||
This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The original code can be found `here
|
||||
<https://github.com/google-research/language/tree/master/language/canine>`__.
|
||||
|
||||
|
||||
Example
|
||||
_______________________________________________________________________________________________________________________
|
||||
|
||||
CANINE works on raw characters, so it can be used without a tokenizer:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import CanineModel
|
||||
import torch
|
||||
|
||||
model = CanineModel.from_pretrained('google/canine-c') # model pre-trained with autoregressive character loss
|
||||
|
||||
text = "hello world"
|
||||
# use Python's built-in ord() function to turn each character into its unicode code point id
|
||||
input_ids = torch.tensor([[ord(char) for char in text]])
|
||||
|
||||
outputs = model(input_ids) # forward pass
|
||||
pooled_output = outputs.pooler_output
|
||||
sequence_output = outputs.last_hidden_state
|
||||
|
||||
|
||||
For batched inference and training, it is however recommended to make use of the tokenizer (to pad/truncate all
|
||||
sequences to the same length):
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import CanineTokenizer, CanineModel
|
||||
|
||||
model = CanineModel.from_pretrained('google/canine-c')
|
||||
tokenizer = CanineTokenizer.from_pretrained('google/canine-c')
|
||||
|
||||
inputs = ["Life is like a box of chocolates.", "You never know what you gonna get."]
|
||||
encoding = tokenizer(inputs, padding="longest", truncation=True, return_tensors="pt")
|
||||
|
||||
outputs = model(**encoding) # forward pass
|
||||
pooled_output = outputs.pooler_output
|
||||
sequence_output = outputs.last_hidden_state
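
For sequence-level classification (see the tips above), a minimal, hedged sketch reusing the same google/canine-c
checkpoint looks as follows; note that the classification head placed on top of the pooled [CLS] character is newly
initialized and still needs fine-tuning:

.. code-block::

from transformers import CanineTokenizer, CanineForSequenceClassification

tokenizer = CanineTokenizer.from_pretrained('google/canine-c')
# the classification head on top of the pooled [CLS] character is randomly initialized
model = CanineForSequenceClassification.from_pretrained('google/canine-c', num_labels=2)

encoding = tokenizer(["Life is like a box of chocolates."], padding="longest", return_tensors="pt")
logits = model(**encoding).logits  # shape (batch_size, num_labels)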
|
||||
|
||||
|
||||
CANINE specific outputs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.models.canine.modeling_canine.CanineModelOutputWithPooling
|
||||
:members:
|
||||
|
||||
|
||||
CanineConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CanineConfig
|
||||
:members:
|
||||
|
||||
|
||||
CanineTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CanineTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences
|
||||
|
||||
|
||||
CanineModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CanineModel
|
||||
:members: forward
|
||||
|
||||
|
||||
CanineForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CanineForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
CanineForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CanineForMultipleChoice
|
||||
:members: forward
|
||||
|
||||
|
||||
CanineForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CanineForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
CanineForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CanineForQuestionAnswering
|
||||
:members: forward
|
143
docs/source/model_doc/clip.mdx
Normal file
@ -0,0 +1,143 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# CLIP
|
||||
|
||||
## Overview
|
||||
|
||||
The CLIP model was proposed in [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh,
|
||||
Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. CLIP
|
||||
(Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be
|
||||
instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing
|
||||
for the task, similarly to the zero-shot capabilities of GPT-2 and 3.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This
|
||||
restricted form of supervision limits their generality and usability since additional labeled data is needed to specify
|
||||
any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a
|
||||
much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes
|
||||
with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400
|
||||
million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference
|
||||
learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study
|
||||
the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks
|
||||
such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The
|
||||
model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need
|
||||
for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot
|
||||
without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained
|
||||
model weights at this https URL.*
|
||||
|
||||
## Usage
|
||||
|
||||
CLIP is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image
|
||||
classification. CLIP uses a ViT-like Transformer to get visual features and a causal language model to get the text
features. Both the text and visual features are then projected to a latent space with identical dimension. The dot
product between the projected image and text features is then used as a similarity score.
|
||||
|
||||
To feed images to the Transformer encoder, each image is split into a sequence of fixed-size non-overlapping patches,
|
||||
which are then linearly embedded. A [CLS] token is added to serve as representation of an entire image. The authors
|
||||
also add absolute position embeddings, and feed the resulting sequence of vectors to a standard Transformer encoder.
|
||||
The [`CLIPFeatureExtractor`] can be used to resize (or rescale) and normalize images for the model.
|
||||
|
||||
The [`CLIPTokenizer`] is used to encode the text. The [`CLIPProcessor`] wraps
|
||||
[`CLIPFeatureExtractor`] and [`CLIPTokenizer`] into a single instance to both
|
||||
encode the text and prepare the images. The following example shows how to get the image-text similarity scores using
|
||||
[`CLIPProcessor`] and [`CLIPModel`].
|
||||
|
||||
|
||||
```python
|
||||
>>> from PIL import Image
|
||||
>>> import requests
|
||||
|
||||
>>> from transformers import CLIPProcessor, CLIPModel
|
||||
|
||||
>>> model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
|
||||
>>> processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
|
||||
|
||||
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
>>> image = Image.open(requests.get(url, stream=True).raw)
|
||||
|
||||
>>> inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
|
||||
|
||||
>>> outputs = model(**inputs)
|
||||
>>> logits_per_image = outputs.logits_per_image # this is the image-text similarity score
|
||||
>>> probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
|
||||
```
|
||||
|
||||
This model was contributed by [valhalla](https://huggingface.co/valhalla). The original code can be found [here](https://github.com/openai/CLIP).
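
In addition to the similarity scores, the projected text and image embeddings can be extracted directly with
`get_text_features` and `get_image_features`. The following is only a short sketch (it reuses the same checkpoint as
above; the cosine similarity shown is illustrative and omits the learned temperature scaling the model applies when
computing `logits_per_image`):

```python
import requests
import torch
from PIL import Image

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

text_inputs = processor(text=["a photo of a cat", "a photo of a dog"], return_tensors="pt", padding=True)
image_inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    text_embeds = model.get_text_features(**text_inputs)  # shape (2, projection_dim)
    image_embeds = model.get_image_features(**image_inputs)  # shape (1, projection_dim)

# cosine similarity between the image embedding and each caption embedding
sims = torch.nn.functional.cosine_similarity(image_embeds, text_embeds)
```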
|
||||
|
||||
## CLIPConfig
|
||||
|
||||
[[autodoc]] CLIPConfig
|
||||
- from_text_vision_configs
|
||||
|
||||
## CLIPTextConfig
|
||||
|
||||
[[autodoc]] CLIPTextConfig
|
||||
|
||||
## CLIPVisionConfig
|
||||
|
||||
[[autodoc]] CLIPVisionConfig
|
||||
|
||||
## CLIPTokenizer
|
||||
|
||||
[[autodoc]] CLIPTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## CLIPTokenizerFast
|
||||
|
||||
[[autodoc]] CLIPTokenizerFast
|
||||
|
||||
## CLIPFeatureExtractor
|
||||
|
||||
[[autodoc]] CLIPFeatureExtractor
|
||||
|
||||
## CLIPProcessor
|
||||
|
||||
[[autodoc]] CLIPProcessor
|
||||
|
||||
## CLIPModel
|
||||
|
||||
[[autodoc]] CLIPModel
|
||||
- forward
|
||||
- get_text_features
|
||||
- get_image_features
|
||||
|
||||
## CLIPTextModel
|
||||
|
||||
[[autodoc]] CLIPTextModel
|
||||
- forward
|
||||
|
||||
## CLIPVisionModel
|
||||
|
||||
[[autodoc]] CLIPVisionModel
|
||||
- forward
|
||||
|
||||
## FlaxCLIPModel
|
||||
|
||||
[[autodoc]] FlaxCLIPModel
|
||||
- __call__
|
||||
- get_text_features
|
||||
- get_image_features
|
||||
|
||||
## FlaxCLIPTextModel
|
||||
|
||||
[[autodoc]] FlaxCLIPTextModel
|
||||
- __call__
|
||||
|
||||
## FlaxCLIPVisionModel
|
||||
|
||||
[[autodoc]] FlaxCLIPVisionModel
|
||||
- __call__
|
@ -1,174 +0,0 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
CLIP
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The CLIP model was proposed in `Learning Transferable Visual Models From Natural Language Supervision
|
||||
<https://arxiv.org/abs/2103.00020>`__ by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh,
|
||||
Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. CLIP
|
||||
(Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be
|
||||
instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing
|
||||
for the task, similarly to the zero-shot capabilities of GPT-2 and 3.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This
|
||||
restricted form of supervision limits their generality and usability since additional labeled data is needed to specify
|
||||
any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a
|
||||
much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes
|
||||
with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400
|
||||
million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference
|
||||
learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study
|
||||
the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks
|
||||
such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The
|
||||
model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need
|
||||
for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot
|
||||
without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained
|
||||
model weights at this https URL.*
|
||||
|
||||
Usage
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
CLIP is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image
|
||||
classification. CLIP uses a ViT-like Transformer to get visual features and a causal language model to get the text
features. Both the text and visual features are then projected to a latent space with identical dimension. The dot
product between the projected image and text features is then used as a similarity score.
|
||||
|
||||
To feed images to the Transformer encoder, each image is split into a sequence of fixed-size non-overlapping patches,
|
||||
which are then linearly embedded. A [CLS] token is added to serve as representation of an entire image. The authors
|
||||
also add absolute position embeddings, and feed the resulting sequence of vectors to a standard Transformer encoder.
|
||||
The :class:`~transformers.CLIPFeatureExtractor` can be used to resize (or rescale) and normalize images for the model.
|
||||
|
||||
The :class:`~transformers.CLIPTokenizer` is used to encode the text. The :class:`~transformers.CLIPProcessor` wraps
|
||||
:class:`~transformers.CLIPFeatureExtractor` and :class:`~transformers.CLIPTokenizer` into a single instance to both
|
||||
encode the text and prepare the images. The following example shows how to get the image-text similarity scores using
|
||||
:class:`~transformers.CLIPProcessor` and :class:`~transformers.CLIPModel`.
|
||||
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from PIL import Image
|
||||
>>> import requests
|
||||
|
||||
>>> from transformers import CLIPProcessor, CLIPModel
|
||||
|
||||
>>> model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
|
||||
>>> processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
|
||||
|
||||
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
>>> image = Image.open(requests.get(url, stream=True).raw)
|
||||
|
||||
>>> inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
|
||||
|
||||
>>> outputs = model(**inputs)
|
||||
>>> logits_per_image = outputs.logits_per_image # this is the image-text similarity score
|
||||
>>> probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
|
||||
|
||||
|
||||
This model was contributed by `valhalla <https://huggingface.co/valhalla>`__. The original code can be found `here
|
||||
<https://github.com/openai/CLIP>`__.
|
||||
|
||||
CLIPConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CLIPConfig
|
||||
:members: from_text_vision_configs
|
||||
|
||||
|
||||
CLIPTextConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CLIPTextConfig
|
||||
:members:
|
||||
|
||||
|
||||
CLIPVisionConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CLIPVisionConfig
|
||||
:members:
|
||||
|
||||
|
||||
|
||||
CLIPTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CLIPTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
CLIPTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CLIPTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
CLIPFeatureExtractor
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CLIPFeatureExtractor
|
||||
:members:
|
||||
|
||||
|
||||
CLIPProcessor
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CLIPProcessor
|
||||
:members:
|
||||
|
||||
|
||||
|
||||
CLIPModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CLIPModel
|
||||
:members: forward, get_text_features, get_image_features
|
||||
|
||||
|
||||
CLIPTextModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CLIPTextModel
|
||||
:members: forward
|
||||
|
||||
|
||||
CLIPVisionModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CLIPVisionModel
|
||||
:members: forward
|
||||
|
||||
|
||||
FlaxCLIPModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxCLIPModel
|
||||
:members: __call__, get_text_features, get_image_features
|
||||
|
||||
|
||||
FlaxCLIPTextModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxCLIPTextModel
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxCLIPVisionModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxCLIPVisionModel
|
||||
:members: __call__
|
113
docs/source/model_doc/convbert.mdx
Normal file
@ -0,0 +1,113 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# ConvBERT
|
||||
|
||||
## Overview
|
||||
|
||||
The ConvBERT model was proposed in [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng
|
||||
Yan.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Pre-trained language models like BERT and its variants have recently achieved impressive performance in various
|
||||
natural language understanding tasks. However, BERT heavily relies on the global self-attention block and thus suffers
|
||||
large memory footprint and computation cost. Although all its attention heads query on the whole input sequence for
|
||||
generating the attention map from a global perspective, we observe some heads only need to learn local dependencies,
|
||||
which means the existence of computation redundancy. We therefore propose a novel span-based dynamic convolution to
|
||||
replace these self-attention heads to directly model local dependencies. The novel convolution heads, together with the
|
||||
rest self-attention heads, form a new mixed attention block that is more efficient at both global and local context
|
||||
learning. We equip BERT with this mixed attention design and build a ConvBERT model. Experiments have shown that
|
||||
ConvBERT significantly outperforms BERT and its variants in various downstream tasks, with lower training cost and
|
||||
fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, while
|
||||
using less than 1/4 training cost. Code and pre-trained models will be released.*
|
||||
|
||||
ConvBERT training tips are similar to those of BERT.
|
||||
|
||||
This model was contributed by [abhishek](https://huggingface.co/abhishek). The original implementation can be found
|
||||
here: https://github.com/yitu-opensource/ConvBert
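
Because ConvBERT is used like BERT, extracting features follows the usual encoder pattern. The sketch below is only
illustrative; the `YituTech/conv-bert-base` checkpoint name is an assumption and any ConvBERT checkpoint can be
substituted:

```python
import torch

from transformers import ConvBertTokenizer, ConvBertModel

# "YituTech/conv-bert-base" is an assumed checkpoint name
tokenizer = ConvBertTokenizer.from_pretrained("YituTech/conv-bert-base")
model = ConvBertModel.from_pretrained("YituTech/conv-bert-base")

inputs = tokenizer("ConvBERT mixes self-attention with span-based dynamic convolution.", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

last_hidden_state = outputs.last_hidden_state  # shape (batch_size, sequence_length, hidden_size)
```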
|
||||
|
||||
## ConvBertConfig
|
||||
|
||||
[[autodoc]] ConvBertConfig
|
||||
|
||||
## ConvBertTokenizer
|
||||
|
||||
[[autodoc]] ConvBertTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## ConvBertTokenizerFast
|
||||
|
||||
[[autodoc]] ConvBertTokenizerFast
|
||||
|
||||
## ConvBertModel
|
||||
|
||||
[[autodoc]] ConvBertModel
|
||||
- forward
|
||||
|
||||
## ConvBertForMaskedLM
|
||||
|
||||
[[autodoc]] ConvBertForMaskedLM
|
||||
- forward
|
||||
|
||||
## ConvBertForSequenceClassification
|
||||
|
||||
[[autodoc]] ConvBertForSequenceClassification
|
||||
- forward
|
||||
|
||||
## ConvBertForMultipleChoice
|
||||
|
||||
[[autodoc]] ConvBertForMultipleChoice
|
||||
- forward
|
||||
|
||||
## ConvBertForTokenClassification
|
||||
|
||||
[[autodoc]] ConvBertForTokenClassification
|
||||
- forward
|
||||
|
||||
## ConvBertForQuestionAnswering
|
||||
|
||||
[[autodoc]] ConvBertForQuestionAnswering
|
||||
- forward
|
||||
|
||||
## TFConvBertModel
|
||||
|
||||
[[autodoc]] TFConvBertModel
|
||||
- call
|
||||
|
||||
## TFConvBertForMaskedLM
|
||||
|
||||
[[autodoc]] TFConvBertForMaskedLM
|
||||
- call
|
||||
|
||||
## TFConvBertForSequenceClassification
|
||||
|
||||
[[autodoc]] TFConvBertForSequenceClassification
|
||||
- call
|
||||
|
||||
## TFConvBertForMultipleChoice
|
||||
|
||||
[[autodoc]] TFConvBertForMultipleChoice
|
||||
- call
|
||||
|
||||
## TFConvBertForTokenClassification
|
||||
|
||||
[[autodoc]] TFConvBertForTokenClassification
|
||||
- call
|
||||
|
||||
## TFConvBertForQuestionAnswering
|
||||
|
||||
[[autodoc]] TFConvBertForQuestionAnswering
|
||||
- call
|
@ -1,145 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
ConvBERT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The ConvBERT model was proposed in `ConvBERT: Improving BERT with Span-based Dynamic Convolution
|
||||
<https://arxiv.org/abs/2008.02496>`__ by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng
|
||||
Yan.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Pre-trained language models like BERT and its variants have recently achieved impressive performance in various
|
||||
natural language understanding tasks. However, BERT heavily relies on the global self-attention block and thus suffers
|
||||
large memory footprint and computation cost. Although all its attention heads query on the whole input sequence for
|
||||
generating the attention map from a global perspective, we observe some heads only need to learn local dependencies,
|
||||
which means the existence of computation redundancy. We therefore propose a novel span-based dynamic convolution to
|
||||
replace these self-attention heads to directly model local dependencies. The novel convolution heads, together with the
|
||||
rest self-attention heads, form a new mixed attention block that is more efficient at both global and local context
|
||||
learning. We equip BERT with this mixed attention design and build a ConvBERT model. Experiments have shown that
|
||||
ConvBERT significantly outperforms BERT and its variants in various downstream tasks, with lower training cost and
|
||||
fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, while
|
||||
using less than 1/4 training cost. Code and pre-trained models will be released.*
|
||||
|
||||
ConvBERT training tips are similar to those of BERT.
|
||||
|
||||
This model was contributed by `abhishek <https://huggingface.co/abhishek>`__. The original implementation can be found
|
||||
here: https://github.com/yitu-opensource/ConvBert
|
||||
|
||||
ConvBertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ConvBertConfig
|
||||
:members:
|
||||
|
||||
|
||||
ConvBertTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ConvBertTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
ConvBertTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ConvBertTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
ConvBertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ConvBertModel
|
||||
:members: forward
|
||||
|
||||
|
||||
ConvBertForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ConvBertForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
ConvBertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ConvBertForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
ConvBertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ConvBertForMultipleChoice
|
||||
:members: forward
|
||||
|
||||
|
||||
ConvBertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ConvBertForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
ConvBertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ConvBertForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
|
||||
TFConvBertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFConvBertModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFConvBertForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFConvBertForMaskedLM
|
||||
:members: call
|
||||
|
||||
|
||||
TFConvBertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFConvBertForSequenceClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFConvBertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFConvBertForMultipleChoice
|
||||
:members: call
|
||||
|
||||
|
||||
TFConvBertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFConvBertForTokenClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFConvBertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFConvBertForQuestionAnswering
|
||||
:members: call
|
@ -1,23 +1,20 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
CPM
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
# CPM
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
## Overview
|
||||
|
||||
The CPM model was proposed in `CPM: A Large-scale Generative Chinese Pre-trained Language Model
|
||||
<https://arxiv.org/abs/2012.00413>`__ by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin,
|
||||
The CPM model was proposed in [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin,
|
||||
Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen,
|
||||
Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
|
||||
|
||||
@ -33,13 +30,11 @@ language model, which could facilitate several downstream Chinese NLP tasks, suc
|
||||
cloze test, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many
|
||||
NLP tasks in the settings of few-shot (even zero-shot) learning.*
|
||||
|
||||
This model was contributed by `canwenxu <https://huggingface.co/canwenxu>`__. The original implementation can be found
|
||||
This model was contributed by [canwenxu](https://huggingface.co/canwenxu). The original implementation can be found
|
||||
here: https://github.com/TsinghuaAI/CPM-Generate
|
||||
|
||||
Note: We only have a tokenizer here, since the model architecture is the same as GPT-2.
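
As an illustration of the note above, the sketch below pairs `CpmTokenizer` with the GPT-2 model classes. It is only a
sketch: the `TsinghuaAI/CPM-Generate` checkpoint name is an assumption, and `CpmTokenizer` additionally requires the
`jieba` and `sentencepiece` packages:

```python
from transformers import CpmTokenizer, GPT2LMHeadModel

# the CPM weights are assumed to be hosted under "TsinghuaAI/CPM-Generate"
tokenizer = CpmTokenizer.from_pretrained("TsinghuaAI/CPM-Generate")
model = GPT2LMHeadModel.from_pretrained("TsinghuaAI/CPM-Generate")

input_ids = tokenizer("清华大学", return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=30)
print(tokenizer.decode(outputs[0]))
```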
|
||||
|
||||
CpmTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
## CpmTokenizer
|
||||
|
||||
.. autoclass:: transformers.CpmTokenizer
|
||||
:members:
|
||||
[[autodoc]] CpmTokenizer
|
87
docs/source/model_doc/ctrl.mdx
Normal file
@ -0,0 +1,87 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# CTRL
|
||||
|
||||
## Overview
|
||||
|
||||
The CTRL model was proposed in [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and
|
||||
Richard Socher. It's a causal (unidirectional) transformer pre-trained using language modeling on a very large corpus
|
||||
of ~140 GB of text data with the first token reserved as a control code (such as Links, Books, Wikipedia etc.).
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Large-scale language models show promising text generation capabilities, but users cannot easily control particular
|
||||
aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model,
|
||||
trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were
|
||||
derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while
|
||||
providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the
|
||||
training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data
|
||||
via model-based source attribution.*
|
||||
|
||||
Tips:
|
||||
|
||||
- CTRL makes use of control codes to generate text: it requires generations to be started by certain words, sentences
|
||||
or links to generate coherent text. Refer to the [original implementation](https://github.com/salesforce/ctrl) for
|
||||
more information.
|
||||
- CTRL is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
- CTRL was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
|
||||
token in a sequence. Leveraging this feature allows CTRL to generate syntactically coherent text as it can be
|
||||
observed in the *run_generation.py* example script.
|
||||
- The PyTorch models can take the *past* as input, which is the previously computed key/value attention pairs. Using
|
||||
this *past* value prevents the model from re-computing pre-computed values in the context of text generation. See
|
||||
[reusing the past in generative models](../quickstart#using-the-past) for more information on the usage of
|
||||
this argument.
|
||||
|
||||
This model was contributed by [keskarnitishr](https://huggingface.co/keskarnitishr). The original code can be found
|
||||
[here](https://github.com/salesforce/ctrl).
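
Putting the tips above together, the following is a minimal generation sketch (hedged: the `ctrl` checkpoint identifier
is assumed, and the model is large, around 1.63 billion parameters). The control code, `Links` here, is simply placed
at the start of the prompt:

```python
from transformers import CTRLTokenizer, CTRLLMHeadModel

tokenizer = CTRLTokenizer.from_pretrained("ctrl")  # checkpoint name assumed
model = CTRLLMHeadModel.from_pretrained("ctrl")

# start the prompt with a control code so generation is conditioned on it
prompt = "Links Hello, my name is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# use_cache=True reuses the previously computed key/value pairs (the *past*) at each step
generated = model.generate(input_ids, max_length=30, do_sample=False, use_cache=True)
print(tokenizer.decode(generated[0]))
```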
|
||||
|
||||
|
||||
## CTRLConfig
|
||||
|
||||
[[autodoc]] CTRLConfig
|
||||
|
||||
## CTRLTokenizer
|
||||
|
||||
[[autodoc]] CTRLTokenizer
|
||||
- save_vocabulary
|
||||
|
||||
## CTRLModel
|
||||
|
||||
[[autodoc]] CTRLModel
|
||||
- forward
|
||||
|
||||
## CTRLLMHeadModel
|
||||
|
||||
[[autodoc]] CTRLLMHeadModel
|
||||
- forward
|
||||
|
||||
## CTRLForSequenceClassification
|
||||
|
||||
[[autodoc]] CTRLForSequenceClassification
|
||||
- forward
|
||||
|
||||
## TFCTRLModel
|
||||
|
||||
[[autodoc]] TFCTRLModel
|
||||
- call
|
||||
|
||||
## TFCTRLLMHeadModel
|
||||
|
||||
[[autodoc]] TFCTRLLMHeadModel
|
||||
- call
|
||||
|
||||
## TFCTRLForSequenceClassification
|
||||
|
||||
[[autodoc]] TFCTRLForSequenceClassification
|
||||
- call
|
@ -1,105 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
CTRL
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
CTRL model was proposed in `CTRL: A Conditional Transformer Language Model for Controllable Generation
|
||||
<https://arxiv.org/abs/1909.05858>`_ by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and
|
||||
Richard Socher. It's a causal (unidirectional) transformer pre-trained using language modeling on a very large corpus
|
||||
of ~140 GB of text data with the first token reserved as a control code (such as Links, Books, Wikipedia etc.).
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Large-scale language models show promising text generation capabilities, but users cannot easily control particular
|
||||
aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model,
|
||||
trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were
|
||||
derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while
|
||||
providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the
|
||||
training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data
|
||||
via model-based source attribution.*
|
||||
|
||||
Tips:
|
||||
|
||||
- CTRL makes use of control codes to generate text: it requires generations to be started by certain words, sentences
|
||||
or links to generate coherent text. Refer to the `original implementation <https://github.com/salesforce/ctrl>`__ for
|
||||
more information.
|
||||
- CTRL is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
- CTRL was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
|
||||
token in a sequence. Leveraging this feature allows CTRL to generate syntactically coherent text as it can be
|
||||
observed in the `run_generation.py` example script.
|
||||
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
|
||||
this `past` value prevents the model from re-computing pre-computed values in the context of text generation. See
|
||||
`reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
|
||||
this argument.
|
||||
|
||||
This model was contributed by `keskarnitishr <https://huggingface.co/keskarnitishr>`__. The original code can be found
|
||||
`here <https://github.com/salesforce/ctrl>`__.
|
||||
|
||||
|
||||
CTRLConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CTRLConfig
|
||||
:members:
|
||||
|
||||
|
||||
CTRLTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CTRLTokenizer
|
||||
:members: save_vocabulary
|
||||
|
||||
|
||||
CTRLModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CTRLModel
|
||||
:members: forward
|
||||
|
||||
|
||||
CTRLLMHeadModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CTRLLMHeadModel
|
||||
:members: forward
|
||||
|
||||
|
||||
CTRLForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.CTRLForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
TFCTRLModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFCTRLModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFCTRLLMHeadModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFCTRLLMHeadModel
|
||||
:members: call
|
||||
|
||||
TFCTRLForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFCTRLForSequenceClassification
|
||||
:members: call
|
117
docs/source/model_doc/deberta.mdx
Normal file
@ -0,0 +1,117 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# DeBERTa
|
||||
|
||||
## Overview
|
||||
|
||||
The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google's
|
||||
BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
|
||||
|
||||
It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in
|
||||
RoBERTa.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Recent progress in pre-trained neural language models has significantly improved the performance of many natural
|
||||
language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with
|
||||
disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the
|
||||
disentangled attention mechanism, where each word is represented using two vectors that encode its content and
|
||||
position, respectively, and the attention weights among words are computed using disentangled matrices on their
|
||||
contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to
|
||||
predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency
|
||||
of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of
|
||||
the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9%
|
||||
(90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and
|
||||
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
|
||||
|
||||
|
||||
This model was contributed by [DeBERTa](https://huggingface.co/DeBERTa). The TF 2.0 implementation of this model was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/DeBERTa).
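
A minimal usage sketch follows the usual encoder pattern; the `microsoft/deberta-base` checkpoint name is an
assumption here, and the sequence-classification head is newly initialized, so it still needs fine-tuning:

```python
import torch

from transformers import DebertaTokenizer, DebertaForSequenceClassification

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
# the classification head on top of the encoder is randomly initialized
model = DebertaForSequenceClassification.from_pretrained("microsoft/deberta-base", num_labels=2)

inputs = tokenizer("DeBERTa improves BERT with disentangled attention.", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape (batch_size, num_labels)
```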
|
||||
|
||||
|
||||
## DebertaConfig
|
||||
|
||||
[[autodoc]] DebertaConfig
|
||||
|
||||
## DebertaTokenizer
|
||||
|
||||
[[autodoc]] DebertaTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## DebertaTokenizerFast
|
||||
|
||||
[[autodoc]] DebertaTokenizerFast
|
||||
- build_inputs_with_special_tokens
|
||||
- create_token_type_ids_from_sequences
|
||||
|
||||
## DebertaModel
|
||||
|
||||
[[autodoc]] DebertaModel
|
||||
- forward
|
||||
|
||||
## DebertaPreTrainedModel
|
||||
|
||||
[[autodoc]] DebertaPreTrainedModel
|
||||
|
||||
## DebertaForMaskedLM
|
||||
|
||||
[[autodoc]] DebertaForMaskedLM
|
||||
- forward
|
||||
|
||||
## DebertaForSequenceClassification
|
||||
|
||||
[[autodoc]] DebertaForSequenceClassification
|
||||
- forward
|
||||
|
||||
## DebertaForTokenClassification
|
||||
|
||||
[[autodoc]] DebertaForTokenClassification
|
||||
- forward
|
||||
|
||||
## DebertaForQuestionAnswering
|
||||
|
||||
[[autodoc]] DebertaForQuestionAnswering
|
||||
- forward
|
||||
|
||||
## TFDebertaModel
|
||||
|
||||
[[autodoc]] TFDebertaModel
|
||||
- call
|
||||
|
||||
## TFDebertaPreTrainedModel
|
||||
|
||||
[[autodoc]] TFDebertaPreTrainedModel
|
||||
- call
|
||||
|
||||
## TFDebertaForMaskedLM
|
||||
|
||||
[[autodoc]] TFDebertaForMaskedLM
|
||||
- call
|
||||
|
||||
## TFDebertaForSequenceClassification
|
||||
|
||||
[[autodoc]] TFDebertaForSequenceClassification
|
||||
- call
|
||||
|
||||
## TFDebertaForTokenClassification
|
||||
|
||||
[[autodoc]] TFDebertaForTokenClassification
|
||||
- call
|
||||
|
||||
## TFDebertaForQuestionAnswering
|
||||
|
||||
[[autodoc]] TFDebertaForQuestionAnswering
|
||||
- call
|
@ -1,148 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
DeBERTa
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The DeBERTa model was proposed in `DeBERTa: Decoding-enhanced BERT with Disentangled Attention
|
||||
<https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google's
|
||||
BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
|
||||
|
||||
It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in
|
||||
RoBERTa.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Recent progress in pre-trained neural language models has significantly improved the performance of many natural
|
||||
language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with
|
||||
disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the
|
||||
disentangled attention mechanism, where each word is represented using two vectors that encode its content and
|
||||
position, respectively, and the attention weights among words are computed using disentangled matrices on their
|
||||
contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to
|
||||
predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency
|
||||
of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of
|
||||
the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9%
|
||||
(90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and
|
||||
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
|
||||
|
||||
|
||||
This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. This model TF 2.0 implementation was
|
||||
contributed by `kamalkraj <https://huggingface.co/kamalkraj>`__ . The original code can be found `here
|
||||
<https://github.com/microsoft/DeBERTa>`__.
|
||||
|
||||
|
||||
DebertaConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaConfig
|
||||
:members:
|
||||
|
||||
|
||||
DebertaTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
DebertaTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaTokenizerFast
|
||||
:members: build_inputs_with_special_tokens, create_token_type_ids_from_sequences
|
||||
|
||||
|
||||
DebertaModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaModel
|
||||
:members: forward
|
||||
|
||||
|
||||
DebertaPreTrainedModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaPreTrainedModel
|
||||
:members:
|
||||
|
||||
|
||||
DebertaForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
DebertaForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
DebertaForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
DebertaForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
|
||||
TFDebertaModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFDebertaPreTrainedModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaPreTrainedModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFDebertaForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaForMaskedLM
|
||||
:members: call
|
||||
|
||||
|
||||
TFDebertaForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaForSequenceClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFDebertaForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaForTokenClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFDebertaForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaForQuestionAnswering
|
||||
:members: call
|
132
docs/source/model_doc/deberta_v2.mdx
Normal file
@ -0,0 +1,132 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DeBERTa-v2

## Overview

The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google's
BERT model released in 2018 and Facebook's RoBERTa model released in 2019.

It builds on RoBERTa with disentangled attention and an enhanced mask decoder, and is trained with half of the data
used in RoBERTa.

The abstract from the paper is the following:

*Recent progress in pre-trained neural language models has significantly improved the performance of many natural
language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with
disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the
disentangled attention mechanism, where each word is represented using two vectors that encode its content and
position, respectively, and the attention weights among words are computed using disentangled matrices on their
contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to
predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency
of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of
the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9%
(90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*

The following information is visible directly on the [original implementation
repository](https://github.com/microsoft/DeBERTa). DeBERTa v2 is the second version of the DeBERTa model. It includes
the 1.5B model used for the SuperGLUE single-model submission, which achieved 89.9 versus the human baseline of 89.8. You can
find more details about this submission in the authors'
[blog](https://www.microsoft.com/en-us/research/blog/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark/).

New in v2:

- **Vocabulary** In v2 the tokenizer is changed to use a new vocabulary of size 128K built from the training data.
  Instead of a GPT2-based tokenizer, the tokenizer is now a
  [sentencepiece-based](https://github.com/google/sentencepiece) tokenizer (a minimal loading sketch is shown below).
- **nGiE (nGram Induced Input Encoding)** The DeBERTa-v2 model uses an additional convolution layer alongside the first
  transformer layer to better learn the local dependency of input tokens.
- **Sharing position projection matrix with content projection matrix in attention layer** Based on previous
  experiments, this can save parameters without affecting the performance.
- **Apply bucket to encode relative positions** The DeBERTa-v2 model uses a log bucket to encode relative positions,
  similar to T5.
- **900M model & 1.5B model** Two additional model sizes are available: 900M and 1.5B, which significantly improve the
  performance of downstream tasks.

This model was contributed by [DeBERTa](https://huggingface.co/DeBERTa). The TF 2.0 implementation of this model was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/DeBERTa).

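The snippet below is a minimal sketch of loading the v2 sentencepiece tokenizer and a pretrained checkpoint; the
`microsoft/deberta-v2-xlarge` checkpoint name is an assumption chosen for illustration.

```python
from transformers import DebertaV2Model, DebertaV2Tokenizer

# Assumed checkpoint name, used only for illustration.
checkpoint = "microsoft/deberta-v2-xlarge"

# The v2 tokenizer is sentencepiece-based and uses the 128K vocabulary described above.
tokenizer = DebertaV2Tokenizer.from_pretrained(checkpoint)
model = DebertaV2Model.from_pretrained(checkpoint)

inputs = tokenizer("DeBERTa-v2 replaces the GPT2-based tokenizer with a sentencepiece one.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```
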
## DebertaV2Config

[[autodoc]] DebertaV2Config

## DebertaV2Tokenizer

[[autodoc]] DebertaV2Tokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

## DebertaV2Model

[[autodoc]] DebertaV2Model
    - forward

## DebertaV2PreTrainedModel

[[autodoc]] DebertaV2PreTrainedModel
    - forward

## DebertaV2ForMaskedLM

[[autodoc]] DebertaV2ForMaskedLM
    - forward

## DebertaV2ForSequenceClassification

[[autodoc]] DebertaV2ForSequenceClassification
    - forward

## DebertaV2ForTokenClassification

[[autodoc]] DebertaV2ForTokenClassification
    - forward

## DebertaV2ForQuestionAnswering

[[autodoc]] DebertaV2ForQuestionAnswering
    - forward

## TFDebertaV2Model

[[autodoc]] TFDebertaV2Model
    - call

## TFDebertaV2PreTrainedModel

[[autodoc]] TFDebertaV2PreTrainedModel
    - call

## TFDebertaV2ForMaskedLM

[[autodoc]] TFDebertaV2ForMaskedLM
    - call

## TFDebertaV2ForSequenceClassification

[[autodoc]] TFDebertaV2ForSequenceClassification
    - call

## TFDebertaV2ForTokenClassification

[[autodoc]] TFDebertaV2ForTokenClassification
    - call

## TFDebertaV2ForQuestionAnswering

[[autodoc]] TFDebertaV2ForQuestionAnswering
    - call

@ -1,162 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
DeBERTa-v2
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The DeBERTa model was proposed in `DeBERTa: Decoding-enhanced BERT with Disentangled Attention
|
||||
<https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen It is based on Google's
|
||||
BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
|
||||
|
||||
It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in
|
||||
RoBERTa.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Recent progress in pre-trained neural language models has significantly improved the performance of many natural
|
||||
language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with
|
||||
disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the
|
||||
disentangled attention mechanism, where each word is represented using two vectors that encode its content and
|
||||
position, respectively, and the attention weights among words are computed using disentangled matrices on their
|
||||
contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to
|
||||
predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency
|
||||
of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of
|
||||
the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9%
|
||||
(90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and
|
||||
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
|
||||
|
||||
|
||||
The following information is visible directly on the [original implementation
|
||||
repository](https://github.com/microsoft/DeBERTa). DeBERTa v2 is the second version of the DeBERTa model. It includes
|
||||
the 1.5B model used for the SuperGLUE single-model submission and achieving 89.9, versus human baseline 89.8. You can
|
||||
find more details about this submission in the authors'
|
||||
[blog](https://www.microsoft.com/en-us/research/blog/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark/)
|
||||
|
||||
New in v2:
|
||||
|
||||
- **Vocabulary** In v2 the tokenizer is changed to use a new vocabulary of size 128K built from the training data.
|
||||
Instead of a GPT2-based tokenizer, the tokenizer is now
|
||||
[sentencepiece-based](https://github.com/google/sentencepiece) tokenizer.
|
||||
- **nGiE(nGram Induced Input Encoding)** The DeBERTa-v2 model uses an additional convolution layer aside with the first
|
||||
transformer layer to better learn the local dependency of input tokens.
|
||||
- **Sharing position projection matrix with content projection matrix in attention layer** Based on previous
|
||||
experiments, this can save parameters without affecting the performance.
|
||||
- **Apply bucket to encode relative positions** The DeBERTa-v2 model uses log bucket to encode relative positions
|
||||
similar to T5.
|
||||
- **900M model & 1.5B model** Two additional model sizes are available: 900M and 1.5B, which significantly improves the
|
||||
performance of downstream tasks.
|
||||
|
||||
This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. This model TF 2.0 implementation was
|
||||
contributed by `kamalkraj <https://huggingface.co/kamalkraj>`__. The original code can be found `here
|
||||
<https://github.com/microsoft/DeBERTa>`__.
|
||||
|
||||
|
||||
DebertaV2Config
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaV2Config
|
||||
:members:
|
||||
|
||||
|
||||
DebertaV2Tokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaV2Tokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
DebertaV2Model
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaV2Model
|
||||
:members: forward
|
||||
|
||||
|
||||
DebertaV2PreTrainedModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaV2PreTrainedModel
|
||||
:members: forward
|
||||
|
||||
|
||||
DebertaV2ForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaV2ForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
DebertaV2ForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaV2ForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
DebertaV2ForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaV2ForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
DebertaV2ForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DebertaV2ForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
|
||||
TFDebertaV2Model
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaV2Model
|
||||
:members: call
|
||||
|
||||
|
||||
TFDebertaV2PreTrainedModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaV2PreTrainedModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFDebertaV2ForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaV2ForMaskedLM
|
||||
:members: call
|
||||
|
||||
|
||||
TFDebertaV2ForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaV2ForSequenceClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFDebertaV2ForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaV2ForTokenClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFDebertaV2ForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDebertaV2ForQuestionAnswering
|
||||
:members: call
|
@ -1,32 +1,28 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
DeiT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
# DeiT
|
||||
|
||||
.. note::
|
||||
<Tip>
|
||||
|
||||
This is a recently introduced model so the API hasn't been tested extensively. There may be some bugs or slight
|
||||
breaking changes to fix it in the future. If you see something strange, file a `Github Issue
|
||||
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__.
|
||||
This is a recently introduced model so the API hasn't been tested extensively. There may be some bugs or slight
|
||||
breaking changes to fix it in the future. If you see something strange, file a [Github Issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title).
|
||||
|
||||
</Tip>
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
## Overview
|
||||
|
||||
The DeiT model was proposed in `Training data-efficient image transformers & distillation through attention
|
||||
<https://arxiv.org/abs/2012.12877>`__ by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre
|
||||
Sablayrolles, Hervé Jégou. The `Vision Transformer (ViT) <vit>`__ introduced in `Dosovitskiy et al., 2020
|
||||
<https://arxiv.org/abs/2010.11929>`__ has shown that one can match or even outperform existing convolutional neural
|
||||
The DeiT model was proposed in [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre
|
||||
Sablayrolles, Hervé Jégou. The [Vision Transformer (ViT)](vit) introduced in [Dosovitskiy et al., 2020](https://arxiv.org/abs/2010.11929) has shown that one can match or even outperform existing convolutional neural
|
||||
networks using a Transformer encoder (BERT-like). However, the ViT models introduced in that paper required training on
|
||||
expensive infrastructure for multiple weeks, using external data. DeiT (data-efficient image transformers) are more
|
||||
efficiently trained transformers for image classification, requiring far less data and far less computing resources
|
||||
@ -58,54 +54,44 @@ Tips:
distillation head and the label predicted by the teacher). At inference time, one takes the average prediction
between both heads as final prediction. (2) is also called "fine-tuning with distillation", because one relies on a
teacher that has already been fine-tuned on the downstream dataset. In terms of models, (1) corresponds to
:class:`~transformers.DeiTForImageClassification` and (2) corresponds to
:class:`~transformers.DeiTForImageClassificationWithTeacher`.
[`DeiTForImageClassification`] and (2) corresponds to
[`DeiTForImageClassificationWithTeacher`].
- Note that the authors also did try soft distillation for (2) (in which case the distillation prediction head is
trained using KL divergence to match the softmax output of the teacher), but hard distillation gave the best results.
- All released checkpoints were pre-trained and fine-tuned on ImageNet-1k only. No external data was used. This is in
contrast with the original ViT model, which used external data like the JFT-300M dataset/Imagenet-21k for
pre-training.
- The authors of DeiT also released more efficiently trained ViT models, which you can directly plug into
:class:`~transformers.ViTModel` or :class:`~transformers.ViTForImageClassification`. Techniques like data
[`ViTModel`] or [`ViTForImageClassification`]. Techniques like data
augmentation, optimization, and regularization were used in order to simulate training on a much larger dataset
(while only using ImageNet-1k for pre-training). There are 4 variants available (in 3 different sizes):
`facebook/deit-tiny-patch16-224`, `facebook/deit-small-patch16-224`, `facebook/deit-base-patch16-224` and
`facebook/deit-base-patch16-384`. Note that one should use :class:`~transformers.DeiTFeatureExtractor` in order to
*facebook/deit-tiny-patch16-224*, *facebook/deit-small-patch16-224*, *facebook/deit-base-patch16-224* and
*facebook/deit-base-patch16-384*. Note that one should use [`DeiTFeatureExtractor`] in order to
prepare images for the model.

This model was contributed by `nielsr <https://huggingface.co/nielsr>`__.
This model was contributed by [nielsr](https://huggingface.co/nielsr).

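As a rough illustration of the two-head setup described in the tips above, the sketch below runs inference with the
distilled classification model; the `facebook/deit-base-distilled-patch16-224` checkpoint name and the example image
URL are assumptions made for this example.

```python
import requests
import torch
from PIL import Image
from transformers import DeiTFeatureExtractor, DeiTForImageClassificationWithTeacher

# Assumed checkpoint name for illustration.
checkpoint = "facebook/deit-base-distilled-patch16-224"
feature_extractor = DeiTFeatureExtractor.from_pretrained(checkpoint)
model = DeiTForImageClassificationWithTeacher.from_pretrained(checkpoint)

# Example image (URL assumed); DeiTFeatureExtractor resizes and normalizes it into pixel_values.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    # The logits are the average of the class-token head and the distillation head predictions.
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```
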
DeiTConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
## DeiTConfig
|
||||
|
||||
.. autoclass:: transformers.DeiTConfig
|
||||
:members:
|
||||
[[autodoc]] DeiTConfig
|
||||
|
||||
## DeiTFeatureExtractor
|
||||
|
||||
DeiTFeatureExtractor
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
[[autodoc]] DeiTFeatureExtractor
|
||||
- __call__
|
||||
|
||||
.. autoclass:: transformers.DeiTFeatureExtractor
|
||||
:members: __call__
|
||||
## DeiTModel
|
||||
|
||||
[[autodoc]] DeiTModel
|
||||
- forward
|
||||
|
||||
DeiTModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
## DeiTForImageClassification
|
||||
|
||||
.. autoclass:: transformers.DeiTModel
|
||||
:members: forward
|
||||
[[autodoc]] DeiTForImageClassification
|
||||
- forward
|
||||
|
||||
## DeiTForImageClassificationWithTeacher
|
||||
|
||||
DeiTForImageClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DeiTForImageClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
DeiTForImageClassificationWithTeacher
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DeiTForImageClassificationWithTeacher
|
||||
:members: forward
|
||||
[[autodoc]] DeiTForImageClassificationWithTeacher
|
||||
- forward
|
@ -1,23 +1,20 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
DialoGPT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
# DialoGPT
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
## Overview
|
||||
|
||||
DialoGPT was proposed in `DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
|
||||
<https://arxiv.org/abs/1911.00536>`_ by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao,
|
||||
DialoGPT was proposed in [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao,
|
||||
Jianfeng Gao, Jingjing Liu, Bill Dolan. It's a GPT2 Model trained on 147M conversation-like exchanges extracted from
|
||||
Reddit.
|
||||
|
||||
@ -37,8 +34,7 @@ Tips:
than the left.
- DialoGPT was trained with a causal language modeling (CLM) objective on conversational data and is therefore powerful
at response generation in open-domain dialogue systems.
- DialoGPT enables the user to create a chat bot in just 10 lines of code as shown on `DialoGPT's model card
<https://huggingface.co/microsoft/DialoGPT-medium>`_.
- DialoGPT enables the user to create a chat bot in just 10 lines of code as shown on [DialoGPT's model card](https://huggingface.co/microsoft/DialoGPT-medium).

Training:

@ -48,6 +44,6 @@ modeling. We first concatenate all dialog turns within a dialogue session into a
sequence length), ended by the end-of-text token.* For more information please refer to the original paper.


DialoGPT's architecture is based on the GPT2 model, so one can refer to :doc:`GPT2's documentation page <gpt2>`.
DialoGPT's architecture is based on the GPT2 model, so one can refer to [GPT2's documentation page](gpt2).

The original code can be found `here <https://github.com/microsoft/DialoGPT>`_.
The original code can be found [here](https://github.com/microsoft/DialoGPT).

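As a rough sketch of the chat-bot recipe referenced in the tips above (closely following the example on the model
card), one might write:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

chat_history_ids = None
for step in range(3):
    # Encode the new user input and append the end-of-text token.
    new_input_ids = tokenizer.encode(input(">> User: ") + tokenizer.eos_token, return_tensors="pt")
    # Append the new user input to the chat history (all turns are concatenated into one long text).
    bot_input_ids = new_input_ids if chat_history_ids is None else torch.cat([chat_history_ids, new_input_ids], dim=-1)
    # Generate a response, limiting the total history to 1000 tokens.
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
    print("DialoGPT:", tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True))
```
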
149
docs/source/model_doc/distilbert.mdx
Normal file
@ -0,0 +1,149 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DistilBERT

## Overview

The DistilBERT model was proposed in the blog post [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a
distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5), and the paper [DistilBERT, a
distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108). DistilBERT is a
small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than
*bert-base-uncased*, runs 60% faster, and preserves over 95% of BERT's performance as measured on the GLUE language
understanding benchmark.

The abstract from the paper is the following:

*As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP),
operating these large models in on-the-edge and/or under constrained computational training or inference budgets
remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation
model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger
counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage
knowledge distillation during the pretraining phase and show that it is possible to reduce the size of a BERT model by
40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive
biases learned by larger models during pretraining, we introduce a triple loss combining language modeling,
distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we
demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device
study.*

Tips:

- DistilBERT doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment. Just
  separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`); see the sketch below.
- DistilBERT doesn't have options to select the input positions (`position_ids` input). This could be added if
  necessary though; just let us know if you need this option.

This model was contributed by [victorsanh](https://huggingface.co/victorsanh). The jax version of this model was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation).

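A minimal sketch of the first tip, showing that the tokenizer produces no `token_type_ids` and that two segments are
simply joined with `[SEP]`; the `distilbert-base-uncased` checkpoint is used for illustration.

```python
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")

# Two segments are separated by [SEP] only; no token_type_ids are returned.
encoded = tokenizer("What is DistilBERT?", "A distilled version of BERT.", return_tensors="pt")
print("token_type_ids" in encoded)  # False
print(tokenizer.decode(encoded["input_ids"][0]))

outputs = model(**encoded)
print(outputs.last_hidden_state.shape)
```
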
## DistilBertConfig
|
||||
|
||||
[[autodoc]] DistilBertConfig
|
||||
|
||||
## DistilBertTokenizer
|
||||
|
||||
[[autodoc]] DistilBertTokenizer
|
||||
|
||||
## DistilBertTokenizerFast
|
||||
|
||||
[[autodoc]] DistilBertTokenizerFast
|
||||
|
||||
## DistilBertModel
|
||||
|
||||
[[autodoc]] DistilBertModel
|
||||
- forward
|
||||
|
||||
## DistilBertForMaskedLM
|
||||
|
||||
[[autodoc]] DistilBertForMaskedLM
|
||||
- forward
|
||||
|
||||
## DistilBertForSequenceClassification
|
||||
|
||||
[[autodoc]] DistilBertForSequenceClassification
|
||||
- forward
|
||||
|
||||
## DistilBertForMultipleChoice
|
||||
|
||||
[[autodoc]] DistilBertForMultipleChoice
|
||||
- forward
|
||||
|
||||
## DistilBertForTokenClassification
|
||||
|
||||
[[autodoc]] DistilBertForTokenClassification
|
||||
- forward
|
||||
|
||||
## DistilBertForQuestionAnswering
|
||||
|
||||
[[autodoc]] DistilBertForQuestionAnswering
|
||||
- forward
|
||||
|
||||
## TFDistilBertModel
|
||||
|
||||
[[autodoc]] TFDistilBertModel
|
||||
- call
|
||||
|
||||
## TFDistilBertForMaskedLM
|
||||
|
||||
[[autodoc]] TFDistilBertForMaskedLM
|
||||
- call
|
||||
|
||||
## TFDistilBertForSequenceClassification
|
||||
|
||||
[[autodoc]] TFDistilBertForSequenceClassification
|
||||
- call
|
||||
|
||||
## TFDistilBertForMultipleChoice
|
||||
|
||||
[[autodoc]] TFDistilBertForMultipleChoice
|
||||
- call
|
||||
|
||||
## TFDistilBertForTokenClassification
|
||||
|
||||
[[autodoc]] TFDistilBertForTokenClassification
|
||||
- call
|
||||
|
||||
## TFDistilBertForQuestionAnswering
|
||||
|
||||
[[autodoc]] TFDistilBertForQuestionAnswering
|
||||
- call
|
||||
|
||||
## FlaxDistilBertModel
|
||||
|
||||
[[autodoc]] FlaxDistilBertModel
|
||||
- __call__
|
||||
|
||||
## FlaxDistilBertForMaskedLM
|
||||
|
||||
[[autodoc]] FlaxDistilBertForMaskedLM
|
||||
- __call__
|
||||
|
||||
## FlaxDistilBertForSequenceClassification
|
||||
|
||||
[[autodoc]] FlaxDistilBertForSequenceClassification
|
||||
- __call__
|
||||
|
||||
## FlaxDistilBertForMultipleChoice
|
||||
|
||||
[[autodoc]] FlaxDistilBertForMultipleChoice
|
||||
- __call__
|
||||
|
||||
## FlaxDistilBertForTokenClassification
|
||||
|
||||
[[autodoc]] FlaxDistilBertForTokenClassification
|
||||
- __call__
|
||||
|
||||
## FlaxDistilBertForQuestionAnswering
|
||||
|
||||
[[autodoc]] FlaxDistilBertForQuestionAnswering
|
||||
- __call__
|
@ -1,197 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
DistilBERT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The DistilBERT model was proposed in the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a
|
||||
distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`__, and the paper `DistilBERT, a
|
||||
distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__. DistilBERT is a
|
||||
small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% less parameters than
|
||||
`bert-base-uncased`, runs 60% faster while preserving over 95% of BERT's performances as measured on the GLUE language
|
||||
understanding benchmark.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP),
|
||||
operating these large models in on-the-edge and/or under constrained computational training or inference budgets
|
||||
remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation
|
||||
model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger
|
||||
counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage
|
||||
knowledge distillation during the pretraining phase and show that it is possible to reduce the size of a BERT model by
|
||||
40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive
|
||||
biases learned by larger models during pretraining, we introduce a triple loss combining language modeling,
|
||||
distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we
|
||||
demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device
|
||||
study.*
|
||||
|
||||
Tips:
|
||||
|
||||
- DistilBERT doesn't have :obj:`token_type_ids`, you don't need to indicate which token belongs to which segment. Just
|
||||
separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`[SEP]`).
|
||||
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
|
||||
necessary though, just let us know if you need this option.
|
||||
|
||||
This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. This model jax version was
|
||||
contributed by `kamalkraj <https://huggingface.co/kamalkraj>`__. The original code can be found :prefix_link:`here
|
||||
<examples/research_projects/distillation>`.
|
||||
|
||||
|
||||
DistilBertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DistilBertConfig
|
||||
:members:
|
||||
|
||||
|
||||
DistilBertTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DistilBertTokenizer
|
||||
:members:
|
||||
|
||||
|
||||
DistilBertTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DistilBertTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
DistilBertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DistilBertModel
|
||||
:members: forward
|
||||
|
||||
|
||||
DistilBertForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DistilBertForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
DistilBertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DistilBertForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
DistilBertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DistilBertForMultipleChoice
|
||||
:members: forward
|
||||
|
||||
|
||||
DistilBertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DistilBertForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
DistilBertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DistilBertForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
TFDistilBertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDistilBertModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFDistilBertForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDistilBertForMaskedLM
|
||||
:members: call
|
||||
|
||||
|
||||
TFDistilBertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDistilBertForSequenceClassification
|
||||
:members: call
|
||||
|
||||
|
||||
|
||||
TFDistilBertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDistilBertForMultipleChoice
|
||||
:members: call
|
||||
|
||||
|
||||
|
||||
TFDistilBertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDistilBertForTokenClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFDistilBertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDistilBertForQuestionAnswering
|
||||
:members: call
|
||||
|
||||
|
||||
FlaxDistilBertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxDistilBertModel
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxDistilBertForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxDistilBertForMaskedLM
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxDistilBertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxDistilBertForSequenceClassification
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxDistilBertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxDistilBertForMultipleChoice
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxDistilBertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxDistilBertForTokenClassification
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxDistilBertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxDistilBertForQuestionAnswering
|
||||
:members: __call__
|
98
docs/source/model_doc/dpr.mdx
Normal file
@ -0,0 +1,98 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DPR

## Overview

Dense Passage Retrieval (DPR) is a set of tools and models for state-of-the-art open-domain Q&A research. It was
introduced in [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by
Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih.

The abstract from the paper is the following:

*Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional
sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can
be practically implemented using dense representations alone, where embeddings are learned from a small number of
questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets,
our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage
retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.*

This model was contributed by [lhoestq](https://huggingface.co/lhoestq). The original code can be found [here](https://github.com/facebookresearch/DPR).

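The dual-encoder framework described above can be sketched as follows; the `facebook/dpr-ctx_encoder-single-nq-base`
and `facebook/dpr-question_encoder-single-nq-base` checkpoint names are assumptions chosen for illustration.

```python
import torch
from transformers import (
    DPRContextEncoder,
    DPRContextEncoderTokenizer,
    DPRQuestionEncoder,
    DPRQuestionEncoderTokenizer,
)

# Assumed checkpoint names for illustration.
ctx_name = "facebook/dpr-ctx_encoder-single-nq-base"
q_name = "facebook/dpr-question_encoder-single-nq-base"

ctx_tokenizer = DPRContextEncoderTokenizer.from_pretrained(ctx_name)
ctx_encoder = DPRContextEncoder.from_pretrained(ctx_name)
q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained(q_name)
q_encoder = DPRQuestionEncoder.from_pretrained(q_name)

passages = ["Paris is the capital of France.", "The Nile flows through northeastern Africa."]
ctx_inputs = ctx_tokenizer(passages, padding=True, return_tensors="pt")
q_inputs = q_tokenizer("What is the capital of France?", return_tensors="pt")

with torch.no_grad():
    ctx_emb = ctx_encoder(**ctx_inputs).pooler_output  # one dense vector per passage
    q_emb = q_encoder(**q_inputs).pooler_output        # one dense vector for the question

# Passages are ranked by the dot product between question and passage embeddings.
scores = q_emb @ ctx_emb.T
print(scores)
```
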
## DPRConfig
|
||||
|
||||
[[autodoc]] DPRConfig
|
||||
|
||||
## DPRContextEncoderTokenizer
|
||||
|
||||
[[autodoc]] DPRContextEncoderTokenizer
|
||||
|
||||
## DPRContextEncoderTokenizerFast
|
||||
|
||||
[[autodoc]] DPRContextEncoderTokenizerFast
|
||||
|
||||
## DPRQuestionEncoderTokenizer
|
||||
|
||||
[[autodoc]] DPRQuestionEncoderTokenizer
|
||||
|
||||
## DPRQuestionEncoderTokenizerFast
|
||||
|
||||
[[autodoc]] DPRQuestionEncoderTokenizerFast
|
||||
|
||||
## DPRReaderTokenizer
|
||||
|
||||
[[autodoc]] DPRReaderTokenizer
|
||||
|
||||
## DPRReaderTokenizerFast
|
||||
|
||||
[[autodoc]] DPRReaderTokenizerFast
|
||||
|
||||
## DPR specific outputs
|
||||
|
||||
[[autodoc]] models.dpr.modeling_dpr.DPRContextEncoderOutput
|
||||
|
||||
[[autodoc]] models.dpr.modeling_dpr.DPRQuestionEncoderOutput
|
||||
|
||||
[[autodoc]] models.dpr.modeling_dpr.DPRReaderOutput
|
||||
|
||||
## DPRContextEncoder
|
||||
|
||||
[[autodoc]] DPRContextEncoder
|
||||
- forward
|
||||
|
||||
## DPRQuestionEncoder
|
||||
|
||||
[[autodoc]] DPRQuestionEncoder
|
||||
- forward
|
||||
|
||||
## DPRReader
|
||||
|
||||
[[autodoc]] DPRReader
|
||||
- forward
|
||||
|
||||
## TFDPRContextEncoder
|
||||
|
||||
[[autodoc]] TFDPRContextEncoder
|
||||
- call
|
||||
|
||||
## TFDPRQuestionEncoder
|
||||
|
||||
[[autodoc]] TFDPRQuestionEncoder
|
||||
- call
|
||||
|
||||
## TFDPRReader
|
||||
|
||||
[[autodoc]] TFDPRReader
|
||||
- call
|
@ -1,133 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
DPR
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Dense Passage Retrieval (DPR) is a set of tools and models for state-of-the-art open-domain Q&A research. It was
|
||||
introduced in `Dense Passage Retrieval for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`__ by
|
||||
Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional
|
||||
sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can
|
||||
be practically implemented using dense representations alone, where embeddings are learned from a small number of
|
||||
questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets,
|
||||
our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage
|
||||
retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA
|
||||
benchmarks.*
|
||||
|
||||
This model was contributed by `lhoestq <https://huggingface.co/lhoestq>`__. The original code can be found `here
|
||||
<https://github.com/facebookresearch/DPR>`__.
|
||||
|
||||
|
||||
DPRConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DPRConfig
|
||||
:members:
|
||||
|
||||
|
||||
DPRContextEncoderTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DPRContextEncoderTokenizer
|
||||
:members:
|
||||
|
||||
|
||||
DPRContextEncoderTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DPRContextEncoderTokenizerFast
|
||||
:members:
|
||||
|
||||
DPRQuestionEncoderTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DPRQuestionEncoderTokenizer
|
||||
:members:
|
||||
|
||||
|
||||
DPRQuestionEncoderTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DPRQuestionEncoderTokenizerFast
|
||||
:members:
|
||||
|
||||
DPRReaderTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DPRReaderTokenizer
|
||||
:members:
|
||||
|
||||
|
||||
DPRReaderTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DPRReaderTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
DPR specific outputs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.models.dpr.modeling_dpr.DPRContextEncoderOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.dpr.modeling_dpr.DPRQuestionEncoderOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.dpr.modeling_dpr.DPRReaderOutput
|
||||
:members:
|
||||
|
||||
|
||||
DPRContextEncoder
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DPRContextEncoder
|
||||
:members: forward
|
||||
|
||||
DPRQuestionEncoder
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DPRQuestionEncoder
|
||||
:members: forward
|
||||
|
||||
|
||||
DPRReader
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DPRReader
|
||||
:members: forward
|
||||
|
||||
TFDPRContextEncoder
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDPRContextEncoder
|
||||
:members: call
|
||||
|
||||
TFDPRQuestionEncoder
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDPRQuestionEncoder
|
||||
:members: call
|
||||
|
||||
|
||||
TFDPRReader
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFDPRReader
|
||||
:members: call
|
179
docs/source/model_doc/electra.mdx
Normal file
@ -0,0 +1,179 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ELECTRA

## Overview

The ELECTRA model was proposed in the paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators](https://openreview.net/pdf?id=r1xMH1BtvB). ELECTRA is a new pretraining approach which trains two
transformer models: the generator and the discriminator. The generator's role is to replace tokens in a sequence, and
is therefore trained as a masked language model. The discriminator, which is the model we're interested in, tries to
identify which tokens were replaced by the generator in the sequence.

The abstract from the paper is the following:

*Masked language modeling (MLM) pretraining methods such as BERT corrupt the input by replacing some tokens with [MASK]
and then train a model to reconstruct the original tokens. While they produce good results when transferred to
downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a
more sample-efficient pretraining task called replaced token detection. Instead of masking the input, our approach
corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead
of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that
predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments
demonstrate this new pretraining task is more efficient than MLM because the task is defined over all input tokens
rather than just the small subset that was masked out. As a result, the contextual representations learned by our
approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are
particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained
using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale,
where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when
using the same amount of compute.*

Tips:

- ELECTRA is the pretraining approach, so almost no changes are made to the underlying model, which is BERT. The
  only change is the separation of the embedding size and the hidden size: the embedding size is generally smaller,
  while the hidden size is larger. An additional projection layer (linear) is used to project the embeddings from their
  embedding size to the hidden size. In the case where the embedding size is the same as the hidden size, no projection
  layer is used.
- The ELECTRA checkpoints saved using [Google Research's implementation](https://github.com/google-research/electra)
  contain both the generator and discriminator. The conversion script requires the user to name which model to export
  into the correct architecture. Once converted to the HuggingFace format, these checkpoints may be loaded into all
  available ELECTRA models, however. This means that the discriminator may be loaded in the
  [`ElectraForMaskedLM`] model, and the generator may be loaded in the
  [`ElectraForPreTraining`] model (the classification head will be randomly initialized as it
  doesn't exist in the generator).

This model was contributed by [lysandre](https://huggingface.co/lysandre). The original code can be found [here](https://github.com/google-research/electra).

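The replaced token detection objective described above can be sketched with the pretrained discriminator; the
`google/electra-small-discriminator` checkpoint name and the example sentence are assumptions chosen for illustration.

```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

checkpoint = "google/electra-small-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(checkpoint)
discriminator = ElectraForPreTraining.from_pretrained(checkpoint)

# "fake" stands in for a token a generator might have substituted into the original sentence.
sentence = "The quick brown fox fake over the lazy dog"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = discriminator(**inputs).logits

# A positive logit means the discriminator thinks the token was replaced.
predictions = (logits > 0).long()[0, 1:-1]  # drop [CLS] and [SEP]
print(list(zip(tokenizer.tokenize(sentence), predictions.tolist())))
```
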
## ElectraConfig
|
||||
|
||||
[[autodoc]] ElectraConfig
|
||||
|
||||
## ElectraTokenizer
|
||||
|
||||
[[autodoc]] ElectraTokenizer
|
||||
|
||||
## ElectraTokenizerFast
|
||||
|
||||
[[autodoc]] ElectraTokenizerFast
|
||||
|
||||
## Electra specific outputs
|
||||
|
||||
[[autodoc]] models.electra.modeling_electra.ElectraForPreTrainingOutput
|
||||
|
||||
[[autodoc]] models.electra.modeling_tf_electra.TFElectraForPreTrainingOutput
|
||||
|
||||
## ElectraModel
|
||||
|
||||
[[autodoc]] ElectraModel
|
||||
- forward
|
||||
|
||||
## ElectraForPreTraining
|
||||
|
||||
[[autodoc]] ElectraForPreTraining
|
||||
- forward
|
||||
|
||||
## ElectraForMaskedLM
|
||||
|
||||
[[autodoc]] ElectraForMaskedLM
|
||||
- forward
|
||||
|
||||
## ElectraForSequenceClassification
|
||||
|
||||
[[autodoc]] ElectraForSequenceClassification
|
||||
- forward
|
||||
|
||||
## ElectraForMultipleChoice
|
||||
|
||||
[[autodoc]] ElectraForMultipleChoice
|
||||
- forward
|
||||
|
||||
## ElectraForTokenClassification
|
||||
|
||||
[[autodoc]] ElectraForTokenClassification
|
||||
- forward
|
||||
|
||||
## ElectraForQuestionAnswering
|
||||
|
||||
[[autodoc]] ElectraForQuestionAnswering
|
||||
- forward
|
||||
|
||||
## TFElectraModel
|
||||
|
||||
[[autodoc]] TFElectraModel
|
||||
- call
|
||||
|
||||
## TFElectraForPreTraining
|
||||
|
||||
[[autodoc]] TFElectraForPreTraining
|
||||
- call
|
||||
|
||||
## TFElectraForMaskedLM
|
||||
|
||||
[[autodoc]] TFElectraForMaskedLM
|
||||
- call
|
||||
|
||||
## TFElectraForSequenceClassification
|
||||
|
||||
[[autodoc]] TFElectraForSequenceClassification
|
||||
- call
|
||||
|
||||
## TFElectraForMultipleChoice
|
||||
|
||||
[[autodoc]] TFElectraForMultipleChoice
|
||||
- call
|
||||
|
||||
## TFElectraForTokenClassification
|
||||
|
||||
[[autodoc]] TFElectraForTokenClassification
|
||||
- call
|
||||
|
||||
## TFElectraForQuestionAnswering
|
||||
|
||||
[[autodoc]] TFElectraForQuestionAnswering
|
||||
- call
|
||||
|
||||
## FlaxElectraModel
|
||||
|
||||
[[autodoc]] FlaxElectraModel
|
||||
- __call__
|
||||
|
||||
## FlaxElectraForPreTraining
|
||||
|
||||
[[autodoc]] FlaxElectraForPreTraining
|
||||
- __call__
|
||||
|
||||
## FlaxElectraForMaskedLM
|
||||
|
||||
[[autodoc]] FlaxElectraForMaskedLM
|
||||
- __call__
|
||||
|
||||
## FlaxElectraForSequenceClassification
|
||||
|
||||
[[autodoc]] FlaxElectraForSequenceClassification
|
||||
- __call__
|
||||
|
||||
## FlaxElectraForMultipleChoice
|
||||
|
||||
[[autodoc]] FlaxElectraForMultipleChoice
|
||||
- __call__
|
||||
|
||||
## FlaxElectraForTokenClassification
|
||||
|
||||
[[autodoc]] FlaxElectraForTokenClassification
|
||||
- __call__
|
||||
|
||||
## FlaxElectraForQuestionAnswering
|
||||
|
||||
[[autodoc]] FlaxElectraForQuestionAnswering
|
||||
- __call__
|
@ -1,236 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
ELECTRA
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The ELECTRA model was proposed in the paper `ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
|
||||
Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__. ELECTRA is a new pretraining approach which trains two
|
||||
transformer models: the generator and the discriminator. The generator's role is to replace tokens in a sequence, and
|
||||
is therefore trained as a masked language model. The discriminator, which is the model we're interested in, tries to
|
||||
identify which tokens were replaced by the generator in the sequence.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Masked language modeling (MLM) pretraining methods such as BERT corrupt the input by replacing some tokens with [MASK]
|
||||
and then train a model to reconstruct the original tokens. While they produce good results when transferred to
|
||||
downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a
|
||||
more sample-efficient pretraining task called replaced token detection. Instead of masking the input, our approach
|
||||
corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead
|
||||
of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that
|
||||
predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments
|
||||
demonstrate this new pretraining task is more efficient than MLM because the task is defined over all input tokens
|
||||
rather than just the small subset that was masked out. As a result, the contextual representations learned by our
|
||||
approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are
|
||||
particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained
|
||||
using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale,
|
||||
where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when
|
||||
using the same amount of compute.*
|
||||
|
||||
Tips:
|
||||
|
||||
- ELECTRA is the pretraining approach, therefore there is nearly no changes done to the underlying model: BERT. The
|
||||
only change is the separation of the embedding size and the hidden size: the embedding size is generally smaller,
|
||||
while the hidden size is larger. An additional projection layer (linear) is used to project the embeddings from their
|
||||
embedding size to the hidden size. In the case where the embedding size is the same as the hidden size, no projection
|
||||
layer is used.
|
||||
- The ELECTRA checkpoints saved using `Google Research's implementation <https://github.com/google-research/electra>`__
|
||||
contain both the generator and discriminator. The conversion script requires the user to name which model to export
|
||||
into the correct architecture. Once converted to the HuggingFace format, these checkpoints may be loaded into all
|
||||
available ELECTRA models, however. This means that the discriminator may be loaded in the
|
||||
:class:`~transformers.ElectraForMaskedLM` model, and the generator may be loaded in the
|
||||
:class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
|
||||
doesn't exist in the generator).
|
||||
|
||||
This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
|
||||
<https://github.com/google-research/electra>`__.
|
||||
|
||||
|
||||
ElectraConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ElectraConfig
|
||||
:members:
|
||||
|
||||
|
||||
ElectraTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ElectraTokenizer
|
||||
:members:
|
||||
|
||||
|
||||
ElectraTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ElectraTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
Electra specific outputs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.models.electra.modeling_electra.ElectraForPreTrainingOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.electra.modeling_tf_electra.TFElectraForPreTrainingOutput
|
||||
:members:
|
||||
|
||||
|
||||
ElectraModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ElectraModel
|
||||
:members: forward
|
||||
|
||||
|
||||
ElectraForPreTraining
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ElectraForPreTraining
|
||||
:members: forward
|
||||
|
||||
|
||||
ElectraForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ElectraForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
ElectraForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ElectraForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
ElectraForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ElectraForMultipleChoice
|
||||
:members: forward
|
||||
|
||||
|
||||
ElectraForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ElectraForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
ElectraForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ElectraForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
|
||||
TFElectraModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFElectraModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFElectraForPreTraining
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFElectraForPreTraining
|
||||
:members: call
|
||||
|
||||
|
||||
TFElectraForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFElectraForMaskedLM
|
||||
:members: call
|
||||
|
||||
|
||||
TFElectraForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFElectraForSequenceClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFElectraForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFElectraForMultipleChoice
|
||||
:members: call
|
||||
|
||||
|
||||
TFElectraForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFElectraForTokenClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFElectraForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFElectraForQuestionAnswering
|
||||
:members: call
|
||||
|
||||
|
||||
FlaxElectraModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxElectraModel
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxElectraForPreTraining
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxElectraForPreTraining
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxElectraForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxElectraForMaskedLM
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxElectraForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxElectraForSequenceClassification
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxElectraForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxElectraForMultipleChoice
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxElectraForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxElectraForTokenClassification
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxElectraForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxElectraForQuestionAnswering
|
||||
:members: __call__
|
68
docs/source/model_doc/encoderdecoder.mdx
Normal file
68
docs/source/model_doc/encoderdecoder.mdx
Normal file
@ -0,0 +1,68 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Encoder Decoder Models
|
||||
|
||||
The [`EncoderDecoderModel`] can be used to initialize a sequence-to-sequence model with any
|
||||
pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder.
|
||||
|
||||
The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks
|
||||
was shown in [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by
|
||||
Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
|
||||
After such an [`EncoderDecoderModel`] has been trained/fine-tuned, it can be saved/loaded just like
any other model (see the examples for more information).
|
||||
|
||||
An application of this architecture could be to leverage two pretrained [`BertModel`] instances as the encoder
and decoder for a summarization model, as was shown in [Text Summarization with Pretrained Encoders](https://arxiv.org/abs/1908.08345) by Yang Liu and Mirella Lapata.
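
For example, a BERT-to-BERT model can be warm-started from two pretrained checkpoints (a minimal sketch; the
checkpoint name is only an example):

```python
from transformers import EncoderDecoderModel

# Encoder and decoder weights are loaded from the BERT checkpoint; the
# cross-attention weights of the decoder are randomly initialized.
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")

# After training/fine-tuning, save and reload like any other model.
model.save_pretrained("bert2bert")
model = EncoderDecoderModel.from_pretrained("bert2bert")
```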
|
||||
|
||||
The [`~TFEncoderDecoderModel.from_pretrained`] method currently doesn't support initializing the model from a
PyTorch checkpoint. Passing `from_pt=True` to this method will throw an exception. If there are only PyTorch
checkpoints for a particular encoder-decoder model, a workaround is:
|
||||
|
||||
```python
|
||||
>>> # a workaround to load from pytorch checkpoint
|
||||
>>> _model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
|
||||
>>> _model.encoder.save_pretrained("./encoder")
|
||||
>>> _model.decoder.save_pretrained("./decoder")
|
||||
>>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
|
||||
... "./encoder", "./decoder", encoder_from_pt=True, decoder_from_pt=True
|
||||
... )
|
||||
>>> # This is only for copying some specific attributes of this particular model.
|
||||
>>> model.config = _model.config
|
||||
```
|
||||
|
||||
This model was contributed by [thomwolf](https://github.com/thomwolf). This model's TensorFlow and Flax versions
|
||||
were contributed by [ydshieh](https://github.com/ydshieh).
|
||||
|
||||
|
||||
## EncoderDecoderConfig
|
||||
|
||||
[[autodoc]] EncoderDecoderConfig
|
||||
|
||||
## EncoderDecoderModel
|
||||
|
||||
[[autodoc]] EncoderDecoderModel
|
||||
- forward
|
||||
- from_encoder_decoder_pretrained
|
||||
|
||||
## TFEncoderDecoderModel
|
||||
|
||||
[[autodoc]] TFEncoderDecoderModel
|
||||
- call
|
||||
- from_encoder_decoder_pretrained
|
||||
|
||||
## FlaxEncoderDecoderModel
|
||||
|
||||
[[autodoc]] FlaxEncoderDecoderModel
|
||||
- __call__
|
||||
- from_encoder_decoder_pretrained
|
@ -1,75 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
Encoder Decoder Models
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
The :class:`~transformers.EncoderDecoderModel` can be used to initialize a sequence-to-sequence model with any
|
||||
pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder.
|
||||
|
||||
The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks
|
||||
was shown in `Leveraging Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by
|
||||
Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
|
||||
After such an :class:`~transformers.EncoderDecoderModel` has been trained/fine-tuned, it can be saved/loaded just like
|
||||
any other models (see the examples for more information).
|
||||
|
||||
An application of this architecture could be to leverage two pretrained :class:`~transformers.BertModel` as the encoder
|
||||
and decoder for a summarization model as was shown in: `Text Summarization with Pretrained Encoders
|
||||
<https://arxiv.org/abs/1908.08345>`__ by Yang Liu and Mirella Lapata.
|
||||
|
||||
The :meth:`~transformers.TFEncoderDecoderModel.from_pretrained` currently doesn't support initializing the model from a
|
||||
pytorch checkpoint. Passing ``from_pt=True`` to this method will throw an exception. If there are only pytorch
|
||||
checkpoints for a particular encoder-decoder model, a workaround is:
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> # a workaround to load from pytorch checkpoint
|
||||
>>> _model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
|
||||
>>> _model.encoder.save_pretrained("./encoder")
|
||||
>>> _model.decoder.save_pretrained("./decoder")
|
||||
>>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
|
||||
... "./encoder", "./decoder", encoder_from_pt=True, decoder_from_pt=True
|
||||
... )
|
||||
>>> # This is only for copying some specific attributes of this particular model.
|
||||
>>> model.config = _model.config
|
||||
|
||||
This model was contributed by `thomwolf <https://github.com/thomwolf>`__. This model's TensorFlow and Flax versions
|
||||
were contributed by `ydshieh <https://github.com/ydshieh>`__.
|
||||
|
||||
|
||||
EncoderDecoderConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.EncoderDecoderConfig
|
||||
:members:
|
||||
|
||||
|
||||
EncoderDecoderModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.EncoderDecoderModel
|
||||
:members: forward, from_encoder_decoder_pretrained
|
||||
|
||||
|
||||
TFEncoderDecoderModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFEncoderDecoderModel
|
||||
:members: call, from_encoder_decoder_pretrained
|
||||
|
||||
|
||||
FlaxEncoderDecoderModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxEncoderDecoderModel
|
||||
:members: __call__, from_encoder_decoder_pretrained
|
109
docs/source/model_doc/flaubert.mdx
Normal file
109
docs/source/model_doc/flaubert.mdx
Normal file
@ -0,0 +1,109 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# FlauBERT
|
||||
|
||||
## Overview
|
||||
|
||||
The FlauBERT model was proposed in the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le et al. It's a transformer model pretrained using a masked language
|
||||
modeling (MLM) objective (like BERT).
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Language models have become a key step to achieve state-of-the art results in many different Natural Language
|
||||
Processing (NLP) tasks. Leveraging the huge amount of unlabeled texts nowadays available, they provide an efficient way
|
||||
to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their
|
||||
contextualization at the sentence level. This has been widely demonstrated for English using contextualized
|
||||
representations (Dai and Le, 2015; Peters et al., 2018; Howard and Ruder, 2018; Radford et al., 2018; Devlin et al.,
|
||||
2019; Yang et al., 2019b). In this paper, we introduce and share FlauBERT, a model learned on a very large and
|
||||
heterogeneous French corpus. Models of different sizes are trained using the new CNRS (French National Centre for
|
||||
Scientific Research) Jean Zay supercomputer. We apply our French language models to diverse NLP tasks (text
|
||||
classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that most of the
|
||||
time they outperform other pretraining approaches. Different versions of FlauBERT as well as a unified evaluation
|
||||
protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared to the research
|
||||
community for further reproducible experiments in French NLP.*
|
||||
|
||||
This model was contributed by [formiel](https://huggingface.co/formiel). The original code can be found [here](https://github.com/getalp/Flaubert).
|
||||
|
||||
|
||||
## FlaubertConfig
|
||||
|
||||
[[autodoc]] FlaubertConfig
|
||||
|
||||
## FlaubertTokenizer
|
||||
|
||||
[[autodoc]] FlaubertTokenizer
|
||||
|
||||
## FlaubertModel
|
||||
|
||||
[[autodoc]] FlaubertModel
|
||||
- forward
|
||||
|
||||
## FlaubertWithLMHeadModel
|
||||
|
||||
[[autodoc]] FlaubertWithLMHeadModel
|
||||
- forward
|
||||
|
||||
## FlaubertForSequenceClassification
|
||||
|
||||
[[autodoc]] FlaubertForSequenceClassification
|
||||
- forward
|
||||
|
||||
## FlaubertForMultipleChoice
|
||||
|
||||
[[autodoc]] FlaubertForMultipleChoice
|
||||
- forward
|
||||
|
||||
## FlaubertForTokenClassification
|
||||
|
||||
[[autodoc]] FlaubertForTokenClassification
|
||||
- forward
|
||||
|
||||
## FlaubertForQuestionAnsweringSimple
|
||||
|
||||
[[autodoc]] FlaubertForQuestionAnsweringSimple
|
||||
- forward
|
||||
|
||||
## FlaubertForQuestionAnswering
|
||||
|
||||
[[autodoc]] FlaubertForQuestionAnswering
|
||||
- forward
|
||||
|
||||
## TFFlaubertModel
|
||||
|
||||
[[autodoc]] TFFlaubertModel
|
||||
- call
|
||||
|
||||
## TFFlaubertWithLMHeadModel
|
||||
|
||||
[[autodoc]] TFFlaubertWithLMHeadModel
|
||||
- call
|
||||
|
||||
## TFFlaubertForSequenceClassification
|
||||
|
||||
[[autodoc]] TFFlaubertForSequenceClassification
|
||||
- call
|
||||
|
||||
## TFFlaubertForMultipleChoice
|
||||
|
||||
[[autodoc]] TFFlaubertForMultipleChoice
|
||||
- call
|
||||
|
||||
## TFFlaubertForTokenClassification
|
||||
|
||||
[[autodoc]] TFFlaubertForTokenClassification
|
||||
- call
|
||||
|
||||
## TFFlaubertForQuestionAnsweringSimple
|
||||
|
||||
[[autodoc]] TFFlaubertForQuestionAnsweringSimple
|
||||
- call
|
@ -1,144 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
FlauBERT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The FlauBERT model was proposed in the paper `FlauBERT: Unsupervised Language Model Pre-training for French
|
||||
<https://arxiv.org/abs/1912.05372>`__ by Hang Le et al. It's a transformer model pretrained using a masked language
|
||||
modeling (MLM) objective (like BERT).
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Language models have become a key step to achieve state-of-the art results in many different Natural Language
|
||||
Processing (NLP) tasks. Leveraging the huge amount of unlabeled texts nowadays available, they provide an efficient way
|
||||
to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their
|
||||
contextualization at the sentence level. This has been widely demonstrated for English using contextualized
|
||||
representations (Dai and Le, 2015; Peters et al., 2018; Howard and Ruder, 2018; Radford et al., 2018; Devlin et al.,
|
||||
2019; Yang et al., 2019b). In this paper, we introduce and share FlauBERT, a model learned on a very large and
|
||||
heterogeneous French corpus. Models of different sizes are trained using the new CNRS (French National Centre for
|
||||
Scientific Research) Jean Zay supercomputer. We apply our French language models to diverse NLP tasks (text
|
||||
classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that most of the
|
||||
time they outperform other pretraining approaches. Different versions of FlauBERT as well as a unified evaluation
|
||||
protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared to the research
|
||||
community for further reproducible experiments in French NLP.*
|
||||
|
||||
This model was contributed by `formiel <https://huggingface.co/formiel>`__. The original code can be found `here
|
||||
<https://github.com/getalp/Flaubert>`__.
|
||||
|
||||
|
||||
FlaubertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaubertConfig
|
||||
:members:
|
||||
|
||||
|
||||
FlaubertTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaubertTokenizer
|
||||
:members:
|
||||
|
||||
|
||||
FlaubertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaubertModel
|
||||
:members: forward
|
||||
|
||||
|
||||
FlaubertWithLMHeadModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaubertWithLMHeadModel
|
||||
:members: forward
|
||||
|
||||
|
||||
FlaubertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaubertForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
FlaubertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaubertForMultipleChoice
|
||||
:members: forward
|
||||
|
||||
|
||||
FlaubertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaubertForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
FlaubertForQuestionAnsweringSimple
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaubertForQuestionAnsweringSimple
|
||||
:members: forward
|
||||
|
||||
|
||||
FlaubertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaubertForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
|
||||
TFFlaubertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFlaubertModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFFlaubertWithLMHeadModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFlaubertWithLMHeadModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFFlaubertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFlaubertForSequenceClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFFlaubertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFlaubertForMultipleChoice
|
||||
:members: call
|
||||
|
||||
|
||||
TFFlaubertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFlaubertForTokenClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFFlaubertForQuestionAnsweringSimple
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFlaubertForQuestionAnsweringSimple
|
||||
:members: call
|
98
docs/source/model_doc/fnet.mdx
Normal file
98
docs/source/model_doc/fnet.mdx
Normal file
@ -0,0 +1,98 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# FNet
|
||||
|
||||
## Overview
|
||||
|
||||
The FNet model was proposed in [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. The model replaces the self-attention layer in a BERT
model with a Fourier transform which returns only the real parts of the transform. The model is significantly faster
than the BERT model because it has fewer parameters and is more memory efficient. The model achieves about 92-97% of
the accuracy of BERT counterparts on the GLUE benchmark, and trains much faster than the BERT model. The abstract from
the paper is the following:
|
||||
|
||||
*We show that Transformer encoder architectures can be sped up, with limited accuracy costs, by replacing the
|
||||
self-attention sublayers with simple linear transformations that "mix" input tokens. These linear mixers, along with
|
||||
standard nonlinearities in feed-forward layers, prove competent at modeling semantic relationships in several text
|
||||
classification tasks. Most surprisingly, we find that replacing the self-attention sublayer in a Transformer encoder
|
||||
with a standard, unparameterized Fourier Transform achieves 92-97% of the accuracy of BERT counterparts on the GLUE
|
||||
benchmark, but trains 80% faster on GPUs and 70% faster on TPUs at standard 512 input lengths. At longer input lengths,
|
||||
our FNet model is significantly faster: when compared to the "efficient" Transformers on the Long Range Arena
|
||||
benchmark, FNet matches the accuracy of the most accurate models, while outpacing the fastest models across all
|
||||
sequence lengths on GPUs (and across relatively shorter lengths on TPUs). Finally, FNet has a light memory footprint
|
||||
and is particularly efficient at smaller model sizes; for a fixed speed and accuracy budget, small FNet models
|
||||
outperform Transformer counterparts.*
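
The parameter-free mixing step that replaces self-attention can be sketched as follows (a simplified illustration of
the operation described in the paper, not the library's internal implementation):

```python
import torch


def fourier_mixing(hidden_states):
    # Apply a DFT along the hidden dimension, then along the sequence dimension,
    # and keep only the real part of the result -- no learned parameters involved.
    return torch.fft.fft(torch.fft.fft(hidden_states, dim=-1), dim=-2).real
```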
|
||||
|
||||
Tips on usage:
|
||||
|
||||
- The model was trained without an attention mask as it is based on the Fourier Transform. The model was trained with
  a maximum sequence length of 512, which includes pad tokens. Hence, it is highly recommended to use the same maximum
  sequence length for fine-tuning and inference (see the example below).
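
For instance (the checkpoint name is just an example of a hosted FNet checkpoint), padding to the pretraining length
rather than to the longest example in the batch looks like this:

```python
from transformers import FNetForSequenceClassification, FNetTokenizer

tokenizer = FNetTokenizer.from_pretrained("google/fnet-base")
model = FNetForSequenceClassification.from_pretrained("google/fnet-base", num_labels=2)

# Pad every example to the maximum sequence length (512) used during pretraining,
# which included pad tokens, instead of padding dynamically per batch.
inputs = tokenizer("Hello, my dog is cute", padding="max_length", max_length=512, return_tensors="pt")
outputs = model(**inputs)
```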
|
||||
|
||||
This model was contributed by [gchhablani](https://huggingface.co/gchhablani). The original code can be found [here](https://github.com/google-research/google-research/tree/master/f_net).
|
||||
|
||||
## FNetConfig
|
||||
|
||||
[[autodoc]] FNetConfig
|
||||
|
||||
## FNetTokenizer
|
||||
|
||||
[[autodoc]] FNetTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## FNetTokenizerFast
|
||||
|
||||
[[autodoc]] FNetTokenizerFast
|
||||
|
||||
## FNetModel
|
||||
|
||||
[[autodoc]] FNetModel
|
||||
- forward
|
||||
|
||||
## FNetForPreTraining
|
||||
|
||||
[[autodoc]] FNetForPreTraining
|
||||
- forward
|
||||
|
||||
## FNetForMaskedLM
|
||||
|
||||
[[autodoc]] FNetForMaskedLM
|
||||
- forward
|
||||
|
||||
## FNetForNextSentencePrediction
|
||||
|
||||
[[autodoc]] FNetForNextSentencePrediction
|
||||
- forward
|
||||
|
||||
## FNetForSequenceClassification
|
||||
|
||||
[[autodoc]] FNetForSequenceClassification
|
||||
- forward
|
||||
|
||||
## FNetForMultipleChoice
|
||||
|
||||
[[autodoc]] FNetForMultipleChoice
|
||||
- forward
|
||||
|
||||
## FNetForTokenClassification
|
||||
|
||||
[[autodoc]] FNetForTokenClassification
|
||||
- forward
|
||||
|
||||
## FNetForQuestionAnswering
|
||||
|
||||
[[autodoc]] FNetForQuestionAnswering
|
||||
- forward
|
@ -1,121 +0,0 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
FNet
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The FNet model was proposed in `FNet: Mixing Tokens with Fourier Transforms <https://arxiv.org/abs/2105.03824>`__ by
|
||||
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. The model replaces the self-attention layer in a BERT
|
||||
model with a fourier transform which returns only the real parts of the transform. The model is significantly faster
|
||||
than the BERT model because it has fewer parameters and is more memory efficient. The model achieves about 92-97%
|
||||
accuracy of BERT counterparts on GLUE benchmark, and trains much faster than the BERT model. The abstract from the
|
||||
paper is the following:
|
||||
|
||||
*We show that Transformer encoder architectures can be sped up, with limited accuracy costs, by replacing the
|
||||
self-attention sublayers with simple linear transformations that "mix" input tokens. These linear mixers, along with
|
||||
standard nonlinearities in feed-forward layers, prove competent at modeling semantic relationships in several text
|
||||
classification tasks. Most surprisingly, we find that replacing the self-attention sublayer in a Transformer encoder
|
||||
with a standard, unparameterized Fourier Transform achieves 92-97% of the accuracy of BERT counterparts on the GLUE
|
||||
benchmark, but trains 80% faster on GPUs and 70% faster on TPUs at standard 512 input lengths. At longer input lengths,
|
||||
our FNet model is significantly faster: when compared to the "efficient" Transformers on the Long Range Arena
|
||||
benchmark, FNet matches the accuracy of the most accurate models, while outpacing the fastest models across all
|
||||
sequence lengths on GPUs (and across relatively shorter lengths on TPUs). Finally, FNet has a light memory footprint
|
||||
and is particularly efficient at smaller model sizes; for a fixed speed and accuracy budget, small FNet models
|
||||
outperform Transformer counterparts.*
|
||||
|
||||
Tips on usage:
|
||||
|
||||
- The model was trained without an attention mask as it is based on Fourier Transform. The model was trained with
|
||||
maximum sequence length 512 which includes pad tokens. Hence, it is highly recommended to use the same maximum
|
||||
sequence length for fine-tuning and inference.
|
||||
|
||||
This model was contributed by `gchhablani <https://huggingface.co/gchhablani>`__. The original code can be found `here
|
||||
<https://github.com/google-research/google-research/tree/master/f_net>`__.
|
||||
|
||||
FNetConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FNetConfig
|
||||
:members:
|
||||
|
||||
|
||||
FNetTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FNetTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
FNetTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FNetTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
FNetModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FNetModel
|
||||
:members: forward
|
||||
|
||||
|
||||
FNetForPreTraining
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FNetForPreTraining
|
||||
:members: forward
|
||||
|
||||
|
||||
FNetForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FNetForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
FNetForNextSentencePrediction
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FNetForNextSentencePrediction
|
||||
:members: forward
|
||||
|
||||
FNetForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FNetForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
FNetForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FNetForMultipleChoice
|
||||
:members: forward
|
||||
|
||||
|
||||
FNetForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FNetForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
FNetForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FNetForQuestionAnswering
|
||||
:members: forward
|
63
docs/source/model_doc/fsmt.mdx
Normal file
63
docs/source/model_doc/fsmt.mdx
Normal file
@ -0,0 +1,63 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# FSMT
|
||||
|
||||
**DISCLAIMER:** If you see something strange, file a [Github Issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title) and assign
|
||||
@stas00.
|
||||
|
||||
## Overview
|
||||
|
||||
FSMT (FairSeq MachineTranslation) models were introduced in [Facebook FAIR's WMT19 News Translation Task Submission](https://arxiv.org/abs/1907.06616) by Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov.
|
||||
|
||||
The abstract of the paper is the following:
|
||||
|
||||
*This paper describes Facebook FAIR's submission to the WMT19 shared news translation task. We participate in two
|
||||
language pairs and four language directions, English <-> German and English <-> Russian. Following our submission from
|
||||
last year, our baseline systems are large BPE-based transformer models trained with the Fairseq sequence modeling
|
||||
toolkit which rely on sampled back-translations. This year we experiment with different bitext data filtering schemes,
|
||||
as well as with adding filtered back-translated data. We also ensemble and fine-tune our models on domain-specific
|
||||
data, then decode using noisy channel model reranking. Our submissions are ranked first in all four directions of the
|
||||
human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations.
|
||||
This system improves upon our WMT'18 submission by 4.5 BLEU points.*
|
||||
|
||||
This model was contributed by [stas](https://huggingface.co/stas). The original code can be found
|
||||
[here](https://github.com/pytorch/fairseq/tree/master/examples/wmt19).
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
- FSMT uses source and target vocabulary pairs that aren't combined into one. It doesn't share embedding tokens
  either. Its tokenizer is very similar to [`XLMTokenizer`] and the main model is derived from
  [`BartModel`]. A short translation example is sketched below.
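
A translation sketch using one of the converted WMT19 checkpoints (the checkpoint name is only an example):

```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

mname = "facebook/wmt19-en-ru"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

# The tokenizer encodes with the source (English) vocabulary and decodes the
# generated ids with the separate target (Russian) vocabulary.
input_ids = tokenizer("Machine learning is great, isn't it?", return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```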
|
||||
|
||||
|
||||
## FSMTConfig
|
||||
|
||||
[[autodoc]] FSMTConfig
|
||||
|
||||
## FSMTTokenizer
|
||||
|
||||
[[autodoc]] FSMTTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## FSMTModel
|
||||
|
||||
[[autodoc]] FSMTModel
|
||||
- forward
|
||||
|
||||
## FSMTForConditionalGeneration
|
||||
|
||||
[[autodoc]] FSMTForConditionalGeneration
|
||||
- forward
|
@ -1,74 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
FSMT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
**DISCLAIMER:** If you see something strange, file a `Github Issue
|
||||
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
|
||||
@stas00.
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
FSMT (FairSeq MachineTranslation) models were introduced in `Facebook FAIR's WMT19 News Translation Task Submission
|
||||
<https://arxiv.org/abs/1907.06616>`__ by Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov.
|
||||
|
||||
The abstract of the paper is the following:
|
||||
|
||||
*This paper describes Facebook FAIR's submission to the WMT19 shared news translation task. We participate in two
|
||||
language pairs and four language directions, English <-> German and English <-> Russian. Following our submission from
|
||||
last year, our baseline systems are large BPE-based transformer models trained with the Fairseq sequence modeling
|
||||
toolkit which rely on sampled back-translations. This year we experiment with different bitext data filtering schemes,
|
||||
as well as with adding filtered back-translated data. We also ensemble and fine-tune our models on domain-specific
|
||||
data, then decode using noisy channel model reranking. Our submissions are ranked first in all four directions of the
|
||||
human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations.
|
||||
This system improves upon our WMT'18 submission by 4.5 BLEU points.*
|
||||
|
||||
This model was contributed by `stas <https://huggingface.co/stas>`__. The original code can be found here
|
||||
<https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
|
||||
|
||||
Implementation Notes
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- FSMT uses source and target vocabulary pairs that aren't combined into one. It doesn't share embeddings tokens
|
||||
either. Its tokenizer is very similar to :class:`~transformers.XLMTokenizer` and the main model is derived from
|
||||
:class:`~transformers.BartModel`.
|
||||
|
||||
|
||||
FSMTConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FSMTConfig
|
||||
:members:
|
||||
|
||||
|
||||
FSMTTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FSMTTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
FSMTModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FSMTModel
|
||||
:members: forward
|
||||
|
||||
|
||||
FSMTForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FSMTForConditionalGeneration
|
||||
:members: forward
|
153
docs/source/model_doc/funnel.mdx
Normal file
153
docs/source/model_doc/funnel.mdx
Normal file
@ -0,0 +1,153 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Funnel Transformer
|
||||
|
||||
## Overview
|
||||
|
||||
The Funnel Transformer model was proposed in the paper [Funnel-Transformer: Filtering out Sequential Redundancy for
|
||||
Efficient Language Processing](https://arxiv.org/abs/2006.03236). It is a bidirectional transformer model, like
|
||||
BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks
|
||||
(CNN) in computer vision.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*With the success of language pretraining, it is highly desirable to develop more efficient architectures of good
|
||||
scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the
|
||||
much-overlooked redundancy in maintaining a full-length token-level presentation, especially for tasks that only
|
||||
require a single-vector presentation of the sequence. With this intuition, we propose Funnel-Transformer which
|
||||
gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More
|
||||
importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further
|
||||
improve the model capacity. In addition, to perform token-level predictions as required by common pretraining
|
||||
objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence
|
||||
via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on
|
||||
a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading
|
||||
comprehension.*
|
||||
|
||||
Tips:
|
||||
|
||||
- Since Funnel Transformer uses pooling, the sequence length of the hidden states changes after each block of layers.
|
||||
The base model therefore has a final sequence length that is a quarter of the original one. This model can be used
|
||||
directly for tasks that just require a sentence summary (like sequence classification or multiple choice). For other
|
||||
tasks, the full model is used; this full model has a decoder that upsamples the final hidden states to the same
|
||||
sequence length as the input.
|
||||
- The Funnel Transformer checkpoints are all available with a full version and a base version. The first ones should be
|
||||
used for [`FunnelModel`], [`FunnelForPreTraining`],
|
||||
[`FunnelForMaskedLM`], [`FunnelForTokenClassification`] and
|
||||
[`FunnelForQuestionAnswering`]. The second ones should be used for
|
||||
[`FunnelBaseModel`], [`FunnelForSequenceClassification`] and
|
||||
[`FunnelForMultipleChoice`].
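
A minimal sketch of the difference between the two variants (the checkpoint names are only examples of hosted Funnel
checkpoints):

```python
import torch
from transformers import FunnelBaseModel, FunnelModel, FunnelTokenizer

tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/small")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# The base variant stops after the pooling blocks, so its output sequence is
# roughly a quarter of the input length -- enough for sentence-level tasks.
base_model = FunnelBaseModel.from_pretrained("funnel-transformer/small-base")

# The full variant adds a decoder that upsamples back to the input length,
# which token-level tasks (masked LM, token classification, QA) require.
full_model = FunnelModel.from_pretrained("funnel-transformer/small")

with torch.no_grad():
    print(base_model(**inputs).last_hidden_state.shape)  # shortened sequence
    print(full_model(**inputs).last_hidden_state.shape)  # same length as the input
```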
|
||||
|
||||
This model was contributed by [sgugger](https://huggingface.co/sgugger). The original code can be found [here](https://github.com/laiguokun/Funnel-Transformer).
|
||||
|
||||
|
||||
## FunnelConfig
|
||||
|
||||
[[autodoc]] FunnelConfig
|
||||
|
||||
## FunnelTokenizer
|
||||
|
||||
[[autodoc]] FunnelTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## FunnelTokenizerFast
|
||||
|
||||
[[autodoc]] FunnelTokenizerFast
|
||||
|
||||
## Funnel specific outputs
|
||||
|
||||
[[autodoc]] models.funnel.modeling_funnel.FunnelForPreTrainingOutput
|
||||
|
||||
[[autodoc]] models.funnel.modeling_tf_funnel.TFFunnelForPreTrainingOutput
|
||||
|
||||
## FunnelBaseModel
|
||||
|
||||
[[autodoc]] FunnelBaseModel
|
||||
- forward
|
||||
|
||||
## FunnelModel
|
||||
|
||||
[[autodoc]] FunnelModel
|
||||
- forward
|
||||
|
||||
## FunnelForPreTraining
|
||||
|
||||
[[autodoc]] FunnelForPreTraining
|
||||
- forward
|
||||
|
||||
## FunnelForMaskedLM
|
||||
|
||||
[[autodoc]] FunnelForMaskedLM
|
||||
- forward
|
||||
|
||||
## FunnelForSequenceClassification
|
||||
|
||||
[[autodoc]] FunnelForSequenceClassification
|
||||
- forward
|
||||
|
||||
## FunnelForMultipleChoice
|
||||
|
||||
[[autodoc]] FunnelForMultipleChoice
|
||||
- forward
|
||||
|
||||
## FunnelForTokenClassification
|
||||
|
||||
[[autodoc]] FunnelForTokenClassification
|
||||
- forward
|
||||
|
||||
## FunnelForQuestionAnswering
|
||||
|
||||
[[autodoc]] FunnelForQuestionAnswering
|
||||
- forward
|
||||
|
||||
## TFFunnelBaseModel
|
||||
|
||||
[[autodoc]] TFFunnelBaseModel
|
||||
- call
|
||||
|
||||
## TFFunnelModel
|
||||
|
||||
[[autodoc]] TFFunnelModel
|
||||
- call
|
||||
|
||||
## TFFunnelForPreTraining
|
||||
|
||||
[[autodoc]] TFFunnelForPreTraining
|
||||
- call
|
||||
|
||||
## TFFunnelForMaskedLM
|
||||
|
||||
[[autodoc]] TFFunnelForMaskedLM
|
||||
- call
|
||||
|
||||
## TFFunnelForSequenceClassification
|
||||
|
||||
[[autodoc]] TFFunnelForSequenceClassification
|
||||
- call
|
||||
|
||||
## TFFunnelForMultipleChoice
|
||||
|
||||
[[autodoc]] TFFunnelForMultipleChoice
|
||||
- call
|
||||
|
||||
## TFFunnelForTokenClassification
|
||||
|
||||
[[autodoc]] TFFunnelForTokenClassification
|
||||
- call
|
||||
|
||||
## TFFunnelForQuestionAnswering
|
||||
|
||||
[[autodoc]] TFFunnelForQuestionAnswering
|
||||
- call
|
@ -1,197 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
Funnel Transformer
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The Funnel Transformer model was proposed in the paper `Funnel-Transformer: Filtering out Sequential Redundancy for
|
||||
Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__. It is a bidirectional transformer model, like
|
||||
BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks
|
||||
(CNN) in computer vision.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*With the success of language pretraining, it is highly desirable to develop more efficient architectures of good
|
||||
scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the
|
||||
much-overlooked redundancy in maintaining a full-length token-level presentation, especially for tasks that only
|
||||
require a single-vector presentation of the sequence. With this intuition, we propose Funnel-Transformer which
|
||||
gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More
|
||||
importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further
|
||||
improve the model capacity. In addition, to perform token-level predictions as required by common pretraining
|
||||
objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence
|
||||
via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on
|
||||
a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading
|
||||
comprehension.*
|
||||
|
||||
Tips:
|
||||
|
||||
- Since Funnel Transformer uses pooling, the sequence length of the hidden states changes after each block of layers.
|
||||
The base model therefore has a final sequence length that is a quarter of the original one. This model can be used
|
||||
directly for tasks that just require a sentence summary (like sequence classification or multiple choice). For other
|
||||
tasks, the full model is used; this full model has a decoder that upsamples the final hidden states to the same
|
||||
sequence length as the input.
|
||||
- The Funnel Transformer checkpoints are all available with a full version and a base version. The first ones should be
|
||||
used for :class:`~transformers.FunnelModel`, :class:`~transformers.FunnelForPreTraining`,
|
||||
:class:`~transformers.FunnelForMaskedLM`, :class:`~transformers.FunnelForTokenClassification` and
|
||||
class:`~transformers.FunnelForQuestionAnswering`. The second ones should be used for
|
||||
:class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and
|
||||
:class:`~transformers.FunnelForMultipleChoice`.
|
||||
|
||||
This model was contributed by `sgugger <https://huggingface.co/sgugger>`__. The original code can be found `here
|
||||
<https://github.com/laiguokun/Funnel-Transformer>`__.
|
||||
|
||||
|
||||
FunnelConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FunnelConfig
|
||||
:members:
|
||||
|
||||
|
||||
FunnelTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FunnelTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
FunnelTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FunnelTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
Funnel specific outputs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.models.funnel.modeling_funnel.FunnelForPreTrainingOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.funnel.modeling_tf_funnel.TFFunnelForPreTrainingOutput
|
||||
:members:
|
||||
|
||||
|
||||
FunnelBaseModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FunnelBaseModel
|
||||
:members: forward
|
||||
|
||||
|
||||
FunnelModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FunnelModel
|
||||
:members: forward
|
||||
|
||||
|
||||
FunnelModelForPreTraining
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FunnelForPreTraining
|
||||
:members: forward
|
||||
|
||||
|
||||
FunnelForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FunnelForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
FunnelForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FunnelForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
FunnelForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FunnelForMultipleChoice
|
||||
:members: forward
|
||||
|
||||
|
||||
FunnelForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FunnelForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
FunnelForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FunnelForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
|
||||
TFFunnelBaseModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFunnelBaseModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFFunnelModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFunnelModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFFunnelModelForPreTraining
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFunnelForPreTraining
|
||||
:members: call
|
||||
|
||||
|
||||
TFFunnelForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFunnelForMaskedLM
|
||||
:members: call
|
||||
|
||||
|
||||
TFFunnelForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFunnelForSequenceClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFFunnelForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFunnelForMultipleChoice
|
||||
:members: call
|
||||
|
||||
|
||||
TFFunnelForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFunnelForTokenClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFFunnelForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFFunnelForQuestionAnswering
|
||||
:members: call
|
117
docs/source/model_doc/gpt.mdx
Normal file
@ -0,0 +1,117 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# OpenAI GPT
|
||||
|
||||
## Overview
|
||||
|
||||
OpenAI GPT model was proposed in [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf)
by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional) transformer
pre-trained using language modeling on a large corpus with long range dependencies, the Toronto Book Corpus.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering,
|
||||
semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant,
|
||||
labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to
|
||||
perform adequately. We demonstrate that large gains on these tasks can be realized by generative pretraining of a
|
||||
language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In
|
||||
contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve
|
||||
effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our
|
||||
approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms
|
||||
discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon
|
||||
the state of the art in 9 out of the 12 tasks studied.*
|
||||
|
||||
Tips:
|
||||
|
||||
- GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
  the left.
- GPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
  token in a sequence. Leveraging this feature allows GPT to generate syntactically coherent text as it can be
  observed in the *run_generation.py* example script.
|
||||
|
||||
[Write With Transformer](https://transformer.huggingface.co/doc/gpt) is a webapp created and hosted by Hugging Face
|
||||
showcasing the generative capabilities of several models. GPT is one of them.
|
||||
|
||||
This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://github.com/openai/finetune-transformer-lm).
|
||||
|
||||
Note:
|
||||
|
||||
If you want to reproduce the original tokenization process of the *OpenAI GPT* paper, you will need to install `ftfy`
|
||||
and `SpaCy`:
|
||||
|
||||
```bash
|
||||
pip install spacy ftfy==4.4.3
|
||||
python -m spacy download en
|
||||
```
|
||||
|
||||
If you don't install `ftfy` and `SpaCy`, the [`OpenAIGPTTokenizer`] will default to tokenize
|
||||
using BERT's `BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
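
As a quick end-to-end check of the tokenizer and model, a minimal sketch (assuming the `openai-gpt` checkpoint on the
Hub; the prompt and sampling parameters are illustrative only):

```python
>>> from transformers import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer

>>> tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
>>> model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

>>> input_ids = tokenizer("Hugging Face is a company based in", return_tensors="pt").input_ids

>>> gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=40)
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
```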
|
||||
|
||||
## OpenAIGPTConfig
|
||||
|
||||
[[autodoc]] OpenAIGPTConfig
|
||||
|
||||
## OpenAIGPTTokenizer
|
||||
|
||||
[[autodoc]] OpenAIGPTTokenizer
|
||||
- save_vocabulary
|
||||
|
||||
## OpenAIGPTTokenizerFast
|
||||
|
||||
[[autodoc]] OpenAIGPTTokenizerFast
|
||||
|
||||
## OpenAI specific outputs
|
||||
|
||||
[[autodoc]] models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput
|
||||
|
||||
[[autodoc]] models.openai.modeling_tf_openai.TFOpenAIGPTDoubleHeadsModelOutput
|
||||
|
||||
## OpenAIGPTModel
|
||||
|
||||
[[autodoc]] OpenAIGPTModel
|
||||
- forward
|
||||
|
||||
## OpenAIGPTLMHeadModel
|
||||
|
||||
[[autodoc]] OpenAIGPTLMHeadModel
|
||||
- forward
|
||||
|
||||
## OpenAIGPTDoubleHeadsModel
|
||||
|
||||
[[autodoc]] OpenAIGPTDoubleHeadsModel
|
||||
- forward
|
||||
|
||||
## OpenAIGPTForSequenceClassification
|
||||
|
||||
[[autodoc]] OpenAIGPTForSequenceClassification
|
||||
- forward
|
||||
|
||||
## TFOpenAIGPTModel
|
||||
|
||||
[[autodoc]] TFOpenAIGPTModel
|
||||
- call
|
||||
|
||||
## TFOpenAIGPTLMHeadModel
|
||||
|
||||
[[autodoc]] TFOpenAIGPTLMHeadModel
|
||||
- call
|
||||
|
||||
## TFOpenAIGPTDoubleHeadsModel
|
||||
|
||||
[[autodoc]] TFOpenAIGPTDoubleHeadsModel
|
||||
- call
|
||||
|
||||
## TFOpenAIGPTForSequenceClassification
|
||||
|
||||
[[autodoc]] TFOpenAIGPTForSequenceClassification
|
||||
- call
|
@ -1,147 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
OpenAI GPT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training
|
||||
<https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf>`__
|
||||
by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional) transformer
|
||||
pre-trained using language modeling on a large corpus will long range dependencies, the Toronto Book Corpus.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering,
|
||||
semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant,
|
||||
labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to
|
||||
perform adequately. We demonstrate that large gains on these tasks can be realized by generative pretraining of a
|
||||
language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In
|
||||
contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve
|
||||
effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our
|
||||
approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms
|
||||
discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon
|
||||
the state of the art in 9 out of the 12 tasks studied.*
|
||||
|
||||
Tips:
|
||||
|
||||
- GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
- GPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
|
||||
token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text as it can be
|
||||
observed in the `run_generation.py` example script.
|
||||
|
||||
`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by Hugging Face
|
||||
showcasing the generative capabilities of several models. GPT is one of them.
|
||||
|
||||
This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
|
||||
<https://github.com/openai/finetune-transformer-lm>`__.
|
||||
|
||||
Note:
|
||||
|
||||
If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install ``ftfy``
|
||||
and ``SpaCy``:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pip install spacy ftfy==4.4.3
|
||||
python -m spacy download en
|
||||
|
||||
If you don't install ``ftfy`` and ``SpaCy``, the :class:`~transformers.OpenAIGPTTokenizer` will default to tokenize
|
||||
using BERT's :obj:`BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
|
||||
|
||||
OpenAIGPTConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.OpenAIGPTConfig
|
||||
:members:
|
||||
|
||||
|
||||
OpenAIGPTTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.OpenAIGPTTokenizer
|
||||
:members: save_vocabulary
|
||||
|
||||
|
||||
OpenAIGPTTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.OpenAIGPTTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
OpenAI specific outputs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.openai.modeling_tf_openai.TFOpenAIGPTDoubleHeadsModelOutput
|
||||
:members:
|
||||
|
||||
|
||||
OpenAIGPTModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.OpenAIGPTModel
|
||||
:members: forward
|
||||
|
||||
|
||||
OpenAIGPTLMHeadModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.OpenAIGPTLMHeadModel
|
||||
:members: forward
|
||||
|
||||
|
||||
OpenAIGPTDoubleHeadsModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.OpenAIGPTDoubleHeadsModel
|
||||
:members: forward
|
||||
|
||||
|
||||
OpenAIGPTForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.OpenAIGPTForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
TFOpenAIGPTModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFOpenAIGPTModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFOpenAIGPTLMHeadModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFOpenAIGPTLMHeadModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFOpenAIGPTDoubleHeadsModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFOpenAIGPTDoubleHeadsModel
|
||||
:members: call
|
||||
|
||||
TFOpenAIGPTForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFOpenAIGPTForSequenceClassification
|
||||
:members: call
|
131
docs/source/model_doc/gpt2.mdx
Normal file
@ -0,0 +1,131 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# OpenAI GPT2
|
||||
|
||||
## Overview
|
||||
|
||||
OpenAI GPT-2 model was proposed in [Language Models are Unsupervised Multitask Learners](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) by Alec
|
||||
Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. It's a causal (unidirectional)
|
||||
transformer pretrained using language modeling on a very large corpus of ~40 GB of text data.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset[1] of 8 million
|
||||
web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some
|
||||
text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks
|
||||
across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than
|
||||
10X the amount of data.*
|
||||
|
||||
Tips:
|
||||
|
||||
- GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
  the left.
- GPT-2 was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
  token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text as it can be
  observed in the *run_generation.py* example script.
- The model can take the *past_key_values* (for PyTorch) or *past* (for TF) as input, which is the previously computed
  key/value attention pairs. Using this (*past_key_values* or *past*) value prevents the model from re-computing
  pre-computed values in the context of text generation. For PyTorch, see the *past_key_values* argument of the
  [`GPT2Model.forward`] method, or for TF the *past* argument of the [`TFGPT2Model.call`] method for more information
  on its usage; a short sketch is shown after this list.
- Enabling the *scale_attn_by_inverse_layer_idx* and *reorder_and_upcast_attn* flags will apply the training stability
  improvements from [Mistral](https://github.com/stanford-crfm/mistral/) (for PyTorch only).
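
As an illustration of the *past_key_values* caching mentioned above, a minimal PyTorch sketch (using the public `gpt2`
checkpoint; the prompt and the greedy next-token choice are only for demonstration):

```python
>>> from transformers import GPT2LMHeadModel, GPT2Tokenizer

>>> tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
>>> model = GPT2LMHeadModel.from_pretrained("gpt2")

>>> input_ids = tokenizer("Hugging Face is based in", return_tensors="pt").input_ids
>>> outputs = model(input_ids, use_cache=True)
>>> past_key_values = outputs.past_key_values  # cached key/value pairs for every layer

>>> # On the next step, only the newly chosen token is fed together with the cache
>>> next_token = outputs.logits[:, -1:].argmax(dim=-1)
>>> outputs = model(next_token, past_key_values=past_key_values, use_cache=True)
```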
|
||||
|
||||
[Write With Transformer](https://transformer.huggingface.co/doc/gpt2-large) is a webapp created and hosted by
|
||||
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
|
||||
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: *distilgpt-2*.
|
||||
|
||||
This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://openai.com/blog/better-language-models/).
|
||||
|
||||
|
||||
## GPT2Config
|
||||
|
||||
[[autodoc]] GPT2Config
|
||||
|
||||
## GPT2Tokenizer
|
||||
|
||||
[[autodoc]] GPT2Tokenizer
|
||||
- save_vocabulary
|
||||
|
||||
## GPT2TokenizerFast
|
||||
|
||||
[[autodoc]] GPT2TokenizerFast
|
||||
|
||||
## GPT2 specific outputs
|
||||
|
||||
[[autodoc]] models.gpt2.modeling_gpt2.GPT2DoubleHeadsModelOutput
|
||||
|
||||
[[autodoc]] models.gpt2.modeling_tf_gpt2.TFGPT2DoubleHeadsModelOutput
|
||||
|
||||
## GPT2Model
|
||||
|
||||
[[autodoc]] GPT2Model
|
||||
- forward
|
||||
- parallelize
|
||||
- deparallelize
|
||||
|
||||
## GPT2LMHeadModel
|
||||
|
||||
[[autodoc]] GPT2LMHeadModel
|
||||
- forward
|
||||
- parallelize
|
||||
- deparallelize
|
||||
|
||||
## GPT2DoubleHeadsModel
|
||||
|
||||
[[autodoc]] GPT2DoubleHeadsModel
|
||||
- forward
|
||||
|
||||
## GPT2ForSequenceClassification
|
||||
|
||||
[[autodoc]] GPT2ForSequenceClassification
|
||||
- forward
|
||||
|
||||
## GPT2ForTokenClassification
|
||||
|
||||
[[autodoc]] GPT2ForTokenClassification
|
||||
- forward
|
||||
|
||||
## TFGPT2Model
|
||||
|
||||
[[autodoc]] TFGPT2Model
|
||||
- call
|
||||
|
||||
## TFGPT2LMHeadModel
|
||||
|
||||
[[autodoc]] TFGPT2LMHeadModel
|
||||
- call
|
||||
|
||||
## TFGPT2DoubleHeadsModel
|
||||
|
||||
[[autodoc]] TFGPT2DoubleHeadsModel
|
||||
- call
|
||||
|
||||
## TFGPT2ForSequenceClassification
|
||||
|
||||
[[autodoc]] TFGPT2ForSequenceClassification
|
||||
- call
|
||||
|
||||
## TFSequenceClassifierOutputWithPast
|
||||
|
||||
[[autodoc]] modeling_tf_outputs.TFSequenceClassifierOutputWithPast
|
||||
|
||||
## FlaxGPT2Model
|
||||
|
||||
[[autodoc]] FlaxGPT2Model
|
||||
- __call__
|
||||
|
||||
## FlaxGPT2LMHeadModel
|
||||
|
||||
[[autodoc]] FlaxGPT2LMHeadModel
|
||||
- __call__
|
@ -1,165 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
OpenAI GPT2
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
OpenAI GPT-2 model was proposed in `Language Models are Unsupervised Multitask Learners
|
||||
<https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_ by Alec
|
||||
Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. It's a causal (unidirectional)
|
||||
transformer pretrained using language modeling on a very large corpus of ~40 GB of text data.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset[1] of 8 million
|
||||
web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some
|
||||
text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks
|
||||
across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than
|
||||
10X the amount of data.*
|
||||
|
||||
Tips:
|
||||
|
||||
- GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
- GPT-2 was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
|
||||
token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text as it can be
|
||||
observed in the `run_generation.py` example script.
|
||||
- The model can take the `past_key_values` (for PyTorch) or `past` (for TF) as input, which is the previously computed
|
||||
key/value attention pairs. Using this (`past_key_values` or `past`) value prevents the model from re-computing
|
||||
pre-computed values in the context of text generation. For PyTorch, see `past_key_values` argument of the
|
||||
:meth:`~transformers.GPT2Model.forward` method, or for TF the `past` argument of the
|
||||
:meth:`~transformers.TFGPT2Model.call` method for more information on its usage.
|
||||
- Enabling the `scale_attn_by_inverse_layer_idx` and `reorder_and_upcast_attn` flags will apply the training stability
|
||||
improvements from `Mistral <https://github.com/stanford-crfm/mistral/>`__ (for PyTorch only).
|
||||
|
||||
`Write With Transformer <https://transformer.huggingface.co/doc/gpt2-large>`__ is a webapp created and hosted by
|
||||
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
|
||||
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.
|
||||
|
||||
This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
|
||||
<https://openai.com/blog/better-language-models/>`__.
|
||||
|
||||
|
||||
GPT2Config
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPT2Config
|
||||
:members:
|
||||
|
||||
|
||||
GPT2Tokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPT2Tokenizer
|
||||
:members: save_vocabulary
|
||||
|
||||
|
||||
GPT2TokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPT2TokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
GPT2 specific outputs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.models.gpt2.modeling_gpt2.GPT2DoubleHeadsModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.gpt2.modeling_tf_gpt2.TFGPT2DoubleHeadsModelOutput
|
||||
:members:
|
||||
|
||||
|
||||
GPT2Model
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPT2Model
|
||||
:members: forward, parallelize, deparallelize
|
||||
|
||||
|
||||
GPT2LMHeadModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPT2LMHeadModel
|
||||
:members: forward, parallelize, deparallelize
|
||||
|
||||
|
||||
GPT2DoubleHeadsModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPT2DoubleHeadsModel
|
||||
:members: forward
|
||||
|
||||
|
||||
GPT2ForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPT2ForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
GPT2ForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPT2ForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
TFGPT2Model
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFGPT2Model
|
||||
:members: call
|
||||
|
||||
|
||||
TFGPT2LMHeadModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFGPT2LMHeadModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFGPT2DoubleHeadsModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFGPT2DoubleHeadsModel
|
||||
:members: call
|
||||
|
||||
TFGPT2ForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFGPT2ForSequenceClassification
|
||||
:members: call
|
||||
|
||||
TFSequenceClassifierOutputWithPast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.modeling_tf_outputs.TFSequenceClassifierOutputWithPast
|
||||
:members:
|
||||
|
||||
|
||||
FlaxGPT2Model
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxGPT2Model
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxGPT2LMHeadModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxGPT2LMHeadModel
|
||||
:members: __call__
|
72
docs/source/model_doc/gpt_neo.mdx
Normal file
@ -0,0 +1,72 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# GPT Neo
|
||||
|
||||
## Overview
|
||||
|
||||
The GPTNeo model was released in the [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) repository by Sid
|
||||
Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT-2-like causal language model trained on the
|
||||
[Pile](https://pile.eleuther.ai/) dataset.
|
||||
|
||||
The architecture is similar to GPT2 except that GPT Neo uses local attention in every other layer with a window size of
|
||||
256 tokens.
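
The alternating global/local attention pattern and the window size are exposed through the configuration. A hedged
sketch (assuming the `attention_types` and `window_size` arguments of [`GPTNeoConfig`] behave as described in their
documentation):

```python
>>> from transformers import GPTNeoConfig, GPTNeoModel

>>> # 12 layers alternating between global and local attention, with a local window of 256 tokens
>>> config = GPTNeoConfig(
...     num_layers=12,
...     attention_types=[[["global", "local"], 6]],
...     window_size=256,
... )
>>> model = GPTNeoModel(config)
```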
|
||||
|
||||
This model was contributed by [valhalla](https://huggingface.co/valhalla).
|
||||
|
||||
### Generation
|
||||
|
||||
The `generate()` method can be used to generate text using the GPT Neo model.
|
||||
|
||||
```python
|
||||
>>> from transformers import GPTNeoForCausalLM, GPT2Tokenizer
|
||||
>>> model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
|
||||
>>> tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
|
||||
|
||||
>>> prompt = "In a shocking finding, scientists discovered a herd of unicorns living in a remote, " \
|
||||
... "previously unexplored valley, in the Andes Mountains. Even more surprising to the " \
|
||||
... "researchers was the fact that the unicorns spoke perfect English."
|
||||
|
||||
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
|
||||
|
||||
>>> gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100,)
|
||||
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
|
||||
```
|
||||
|
||||
## GPTNeoConfig
|
||||
|
||||
[[autodoc]] GPTNeoConfig
|
||||
|
||||
## GPTNeoModel
|
||||
|
||||
[[autodoc]] GPTNeoModel
|
||||
- forward
|
||||
|
||||
## GPTNeoForCausalLM
|
||||
|
||||
[[autodoc]] GPTNeoForCausalLM
|
||||
- forward
|
||||
|
||||
## GPTNeoForSequenceClassification
|
||||
|
||||
[[autodoc]] GPTNeoForSequenceClassification
|
||||
- forward
|
||||
|
||||
## FlaxGPTNeoModel
|
||||
|
||||
[[autodoc]] FlaxGPTNeoModel
|
||||
- __call__
|
||||
|
||||
## FlaxGPTNeoForCausalLM
|
||||
|
||||
[[autodoc]] FlaxGPTNeoForCausalLM
|
||||
- __call__
|
@ -1,86 +0,0 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
GPT Neo
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The GPTNeo model was released in the `EleutherAI/gpt-neo <https://github.com/EleutherAI/gpt-neo>`__ repository by Sid
|
||||
Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT2 like causal language model trained on the
|
||||
`Pile <https://pile.eleuther.ai/>`__ dataset.
|
||||
|
||||
The architecture is similar to GPT2 except that GPT Neo uses local attention in every other layer with a window size of
|
||||
256 tokens.
|
||||
|
||||
This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
|
||||
|
||||
Generation
|
||||
_______________________________________________________________________________________________________________________
|
||||
|
||||
The :obj:`generate()` method can be used to generate text using GPT Neo model.
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from transformers import GPTNeoForCausalLM, GPT2Tokenizer
|
||||
>>> model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
|
||||
>>> tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
|
||||
|
||||
>>> prompt = "In a shocking finding, scientists discovered a herd of unicorns living in a remote, " \
|
||||
... "previously unexplored valley, in the Andes Mountains. Even more surprising to the " \
|
||||
... "researchers was the fact that the unicorns spoke perfect English."
|
||||
|
||||
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
|
||||
|
||||
>>> gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100,)
|
||||
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
|
||||
|
||||
|
||||
GPTNeoConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPTNeoConfig
|
||||
:members:
|
||||
|
||||
|
||||
GPTNeoModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPTNeoModel
|
||||
:members: forward
|
||||
|
||||
|
||||
GPTNeoForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPTNeoForCausalLM
|
||||
:members: forward
|
||||
|
||||
GPTNeoForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPTNeoForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
FlaxGPTNeoModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxGPTNeoModel
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxGPTNeoForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxGPTNeoForCausalLM
|
||||
:members: __call__
|
124
docs/source/model_doc/gptj.mdx
Normal file
@ -0,0 +1,124 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# GPT-J
|
||||
|
||||
## Overview
|
||||
|
||||
The GPT-J model was released in the [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax) repository by Ben Wang and Aran Komatsuzaki. It is a GPT-2-like
|
||||
causal language model trained on [the Pile](https://pile.eleuther.ai/) dataset.
|
||||
|
||||
This model was contributed by [Stella Biderman](https://huggingface.co/stellaathena).
|
||||
|
||||
Tips:
|
||||
|
||||
- To load [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6B) in float32 one would need at least 2x model size CPU
|
||||
RAM: 1x for initial weights and another 1x to load the checkpoint. So for GPT-J it would take at least 48GB of CPU
|
||||
RAM to just load the model. To reduce the CPU RAM usage there are a few options. The `torch_dtype` argument can be
|
||||
used to initialize the model in half-precision. And the `low_cpu_mem_usage` argument can be used to keep the RAM
|
||||
usage to 1x. There is also a [fp16 branch](https://huggingface.co/EleutherAI/gpt-j-6B/tree/float16) which stores
|
||||
the fp16 weights, which could be used to further minimize the RAM usage. Combining all this it should take roughly
|
||||
12.1GB of CPU RAM to load the model.
|
||||
|
||||
```python
|
||||
>>> from transformers import GPTJForCausalLM
|
||||
>>> import torch
|
||||
|
||||
>>> model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16, low_cpu_mem_usage=True)
|
||||
```
|
||||
|
||||
- The model should fit on a 16GB GPU for inference. For training/fine-tuning it would take much more GPU RAM. The Adam
  optimizer for example makes four copies of the model: model, gradients, average and squared average of the gradients.
  So it would need at least 4x model size GPU memory, even with mixed precision as gradient updates are in fp32. This
  is not including the activations and data batches, which would again require some more GPU RAM. So one should explore
  solutions such as DeepSpeed to train/fine-tune the model. Another option is to use the original codebase to
  train/fine-tune the model on TPU and then convert the model to Transformers format for inference. Instructions for
  that can be found [here](https://github.com/kingoflolz/mesh-transformer-jax/blob/master/howto_finetune.md).
|
||||
|
||||
- Although the embedding matrix has a size of 50400, only 50257 entries are used by the GPT-2 tokenizer. These extra
  tokens are added for the sake of efficiency on TPUs. To avoid the mismatch between embedding matrix size and vocab
  size, the tokenizer for [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6B) contains 143 extra tokens
  `<|extratoken_1|>... <|extratoken_143|>`, so the `vocab_size` of the tokenizer also becomes 50400 (see the quick
  check after this list).
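
A quick way to verify the padded vocabulary described above (a minimal sketch; only the total count is stated by the
tip, the exact ids of the extra tokens are an implementation detail):

```python
>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
>>> num_tokens = len(tokenizer)  # 50257 base entries plus the 143 <|extratoken_*|> tokens, i.e. 50400
```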
|
||||
|
||||
### Generation
|
||||
|
||||
The [`~generation_utils.GenerationMixin.generate`] method can be used to generate text using GPT-J
|
||||
model.
|
||||
|
||||
```python
|
||||
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
>>> model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
|
||||
|
||||
>>> prompt = "In a shocking finding, scientists discovered a herd of unicorns living in a remote, " \
|
||||
... "previously unexplored valley, in the Andes Mountains. Even more surprising to the " \
|
||||
... "researchers was the fact that the unicorns spoke perfect English."
|
||||
|
||||
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
|
||||
|
||||
>>> gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100,)
|
||||
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
|
||||
```
|
||||
|
||||
...or in float16 precision:
|
||||
|
||||
```python
|
||||
>>> from transformers import GPTJForCausalLM, AutoTokenizer
|
||||
>>> import torch
|
||||
|
||||
>>> model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
|
||||
|
||||
>>> prompt = "In a shocking finding, scientists discovered a herd of unicorns living in a remote, " \
|
||||
... "previously unexplored valley, in the Andes Mountains. Even more surprising to the " \
|
||||
... "researchers was the fact that the unicorns spoke perfect English."
|
||||
|
||||
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
|
||||
|
||||
>>> gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100,)
|
||||
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
|
||||
```
|
||||
|
||||
## GPTJConfig
|
||||
|
||||
[[autodoc]] GPTJConfig
|
||||
- all
|
||||
|
||||
## GPTJModel
|
||||
|
||||
[[autodoc]] GPTJModel
|
||||
- forward
|
||||
|
||||
## GPTJForCausalLM
|
||||
|
||||
[[autodoc]] GPTJForCausalLM
|
||||
- forward
|
||||
|
||||
## GPTJForSequenceClassification
|
||||
|
||||
[[autodoc]] GPTJForSequenceClassification
|
||||
- forward
|
||||
|
||||
## GPTJForQuestionAnswering
|
||||
|
||||
[[autodoc]] GPTJForQuestionAnswering
|
||||
- forward
|
||||
|
||||
## FlaxGPTJModel
|
||||
|
||||
[[autodoc]] FlaxGPTJModel
|
||||
- __call__
|
||||
|
||||
## FlaxGPTJForCausalLM
|
||||
|
||||
[[autodoc]] FlaxGPTJForCausalLM
|
||||
- __call__
|
@ -1,142 +0,0 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
GPT-J
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The GPT-J model was released in the `kingoflolz/mesh-transformer-jax
|
||||
<https://github.com/kingoflolz/mesh-transformer-jax>`__ repository by Ben Wang and Aran Komatsuzaki. It is a GPT-2-like
|
||||
causal language model trained on `the Pile <https://pile.eleuther.ai/>`__ dataset.
|
||||
|
||||
This model was contributed by `Stella Biderman <https://huggingface.co/stellaathena>`__.
|
||||
|
||||
Tips:
|
||||
|
||||
- To load `GPT-J <https://huggingface.co/EleutherAI/gpt-j-6B>`__ in float32 one would need at least 2x model size CPU
|
||||
RAM: 1x for initial weights and another 1x to load the checkpoint. So for GPT-J it would take at least 48GB of CPU
|
||||
RAM to just load the model. To reduce the CPU RAM usage there are a few options. The ``torch_dtype`` argument can be
|
||||
used to initialize the model in half-precision. And the ``low_cpu_mem_usage`` argument can be used to keep the RAM
|
||||
usage to 1x. There is also a `fp16 branch <https://huggingface.co/EleutherAI/gpt-j-6B/tree/float16>`__ which stores
|
||||
the fp16 weights, which could be used to further minimize the RAM usage. Combining all this it should take roughly
|
||||
12.1GB of CPU RAM to load the model.
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from transformers import GPTJForCausalLM
|
||||
>>> import torch
|
||||
|
||||
>>> model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16, low_cpu_mem_usage=True)
|
||||
|
||||
|
||||
- The model should fit on 16GB GPU for inference. For training/fine-tuning it would take much more GPU RAM. Adam
|
||||
optimizer for example makes four copies of the model: model, gradients, average and squared average of the gradients.
|
||||
So it would need at least 4x model size GPU memory, even with mixed precision as gradient updates are in fp32. This
|
||||
is not including the activations and data batches, which would again require some more GPU RAM. So one should explore
|
||||
solutions such as DeepSpeed, to train/fine-tune the model. Another option is to use the original codebase to
|
||||
train/fine-tune the model on TPU and then convert the model to Transformers format for inference. Instructions for
|
||||
that could be found `here <https://github.com/kingoflolz/mesh-transformer-jax/blob/master/howto_finetune.md>`__
|
||||
|
||||
- Although the embedding matrix has a size of 50400, only 50257 entries are used by the GPT-2 tokenizer. These extra
|
||||
tokens are added for the sake of efficiency on TPUs. To avoid the mis-match between embedding matrix size and vocab
|
||||
size, the tokenizer for `GPT-J <https://huggingface.co/EleutherAI/gpt-j-6B>`__ contains 143 extra tokens
|
||||
``<|extratoken_1|>... <|extratoken_143|>``, so the ``vocab_size`` of tokenizer also becomes 50400.
|
||||
|
||||
Generation
|
||||
_______________________________________________________________________________________________________________________
|
||||
|
||||
The :meth:`~transformers.generation_utils.GenerationMixin.generate` method can be used to generate text using GPT-J
|
||||
model.
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
>>> model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
|
||||
|
||||
>>> prompt = "In a shocking finding, scientists discovered a herd of unicorns living in a remote, " \
|
||||
... "previously unexplored valley, in the Andes Mountains. Even more surprising to the " \
|
||||
... "researchers was the fact that the unicorns spoke perfect English."
|
||||
|
||||
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
|
||||
|
||||
>>> gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100,)
|
||||
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
|
||||
|
||||
...or in float16 precision:
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from transformers import GPTJForCausalLM, AutoTokenizer
|
||||
>>> import torch
|
||||
|
||||
>>> model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
|
||||
|
||||
>>> prompt = "In a shocking finding, scientists discovered a herd of unicorns living in a remote, " \
|
||||
... "previously unexplored valley, in the Andes Mountains. Even more surprising to the " \
|
||||
... "researchers was the fact that the unicorns spoke perfect English."
|
||||
|
||||
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
|
||||
|
||||
>>> gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100,)
|
||||
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
|
||||
|
||||
|
||||
GPTJConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPTJConfig
|
||||
:members:
|
||||
|
||||
GPTJModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPTJModel
|
||||
:members: forward
|
||||
|
||||
|
||||
GPTJForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPTJForCausalLM
|
||||
:members: forward
|
||||
|
||||
|
||||
GPTJForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPTJForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
GPTJForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPTJForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
|
||||
FlaxGPTJModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxGPTJModel
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxGPTJForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxGPTJForCausalLM
|
||||
:members: __call__
|
65
docs/source/model_doc/herbert.mdx
Normal file
@ -0,0 +1,65 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# HerBERT
|
||||
|
||||
## Overview
|
||||
|
||||
The HerBERT model was proposed in [KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://www.aclweb.org/anthology/2020.acl-main.111.pdf) by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, and
|
||||
Ireneusz Gawlik. It is a BERT-based language model trained on Polish corpora using only an MLM objective with
dynamic whole-word masking.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*In recent years, a series of Transformer-based models unlocked major improvements in general natural language
|
||||
understanding (NLU) tasks. Such a fast pace of research would not be possible without general NLU benchmarks, which
|
||||
allow for a fair comparison of the proposed methods. However, such benchmarks are available only for a handful of
|
||||
languages. To alleviate this issue, we introduce a comprehensive multi-task benchmark for the Polish language
|
||||
understanding, accompanied by an online leaderboard. It consists of a diverse set of tasks, adopted from existing
|
||||
datasets for named entity recognition, question-answering, textual entailment, and others. We also introduce a new
|
||||
sentiment analysis task for the e-commerce domain, named Allegro Reviews (AR). To ensure a common evaluation scheme and
|
||||
promote models that generalize to different NLU tasks, the benchmark includes datasets from varying domains and
|
||||
applications. Additionally, we release HerBERT, a Transformer-based model trained specifically for the Polish language,
|
||||
which has the best average performance and obtains the best results for three out of nine tasks. Finally, we provide an
|
||||
extensive evaluation, including several standard baselines and recently proposed, multilingual Transformer-based
|
||||
models.*
|
||||
|
||||
Examples of use:
|
||||
|
||||
```python
>>> from transformers import HerbertTokenizer, RobertaModel

>>> tokenizer = HerbertTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
>>> model = RobertaModel.from_pretrained("allegro/herbert-klej-cased-v1")

>>> encoded_input = tokenizer.encode("Kto ma lepszą sztukę, ma lepszy rząd – to jasne.", return_tensors="pt")
>>> outputs = model(encoded_input)

>>> # HerBERT can also be loaded using AutoTokenizer and AutoModel:
>>> import torch
>>> from transformers import AutoModel, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
>>> model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
```
|
||||
|
||||
This model was contributed by [rmroczkowski](https://huggingface.co/rmroczkowski). The original code can be found
|
||||
[here](https://github.com/allegro/HerBERT).
|
||||
|
||||
|
||||
## HerbertTokenizer
|
||||
|
||||
[[autodoc]] HerbertTokenizer
|
||||
|
||||
## HerbertTokenizerFast
|
||||
|
||||
[[autodoc]] HerbertTokenizerFast
|
@ -1,73 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
HerBERT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The HerBERT model was proposed in `KLEJ: Comprehensive Benchmark for Polish Language Understanding
|
||||
<https://www.aclweb.org/anthology/2020.acl-main.111.pdf>`__ by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, and
|
||||
Ireneusz Gawlik. It is a BERT-based Language Model trained on Polish Corpora using only MLM objective with dynamic
|
||||
masking of whole words.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*In recent years, a series of Transformer-based models unlocked major improvements in general natural language
|
||||
understanding (NLU) tasks. Such a fast pace of research would not be possible without general NLU benchmarks, which
|
||||
allow for a fair comparison of the proposed methods. However, such benchmarks are available only for a handful of
|
||||
languages. To alleviate this issue, we introduce a comprehensive multi-task benchmark for the Polish language
|
||||
understanding, accompanied by an online leaderboard. It consists of a diverse set of tasks, adopted from existing
|
||||
datasets for named entity recognition, question-answering, textual entailment, and others. We also introduce a new
|
||||
sentiment analysis task for the e-commerce domain, named Allegro Reviews (AR). To ensure a common evaluation scheme and
|
||||
promote models that generalize to different NLU tasks, the benchmark includes datasets from varying domains and
|
||||
applications. Additionally, we release HerBERT, a Transformer-based model trained specifically for the Polish language,
|
||||
which has the best average performance and obtains the best results for three out of nine tasks. Finally, we provide an
|
||||
extensive evaluation, including several standard baselines and recently proposed, multilingual Transformer-based
|
||||
models.*
|
||||
|
||||
Examples of use:
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from transformers import HerbertTokenizer, RobertaModel
|
||||
|
||||
>>> tokenizer = HerbertTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
|
||||
>>> model = RobertaModel.from_pretrained("allegro/herbert-klej-cased-v1")
|
||||
|
||||
>>> encoded_input = tokenizer.encode("Kto ma lepszą sztukę, ma lepszy rząd – to jasne.", return_tensors='pt')
|
||||
>>> outputs = model(encoded_input)
|
||||
|
||||
>>> # HerBERT can also be loaded using AutoTokenizer and AutoModel:
|
||||
>>> import torch
|
||||
>>> from transformers import AutoModel, AutoTokenizer
|
||||
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
|
||||
>>> model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
|
||||
|
||||
|
||||
This model was contributed by `rmroczkowski <https://huggingface.co/rmroczkowski>`__. The original code can be found
|
||||
`here <https://github.com/allegro/HerBERT>`__.
|
||||
|
||||
|
||||
HerbertTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.HerbertTokenizer
|
||||
:members:
|
||||
|
||||
HerbertTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.HerbertTokenizerFast
|
||||
:members:
|
71
docs/source/model_doc/hubert.mdx
Normal file
71
docs/source/model_doc/hubert.mdx
Normal file
@ -0,0 +1,71 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Hubert
|
||||
|
||||
## Overview
|
||||
|
||||
Hubert was proposed in [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan
|
||||
Salakhutdinov, Abdelrahman Mohamed.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are
|
||||
multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training
|
||||
phase, and (3) sound units have variable lengths with no explicit segmentation. To deal with these three problems, we
|
||||
propose the Hidden-Unit BERT (HuBERT) approach for self-supervised speech representation learning, which utilizes an
|
||||
offline clustering step to provide aligned target labels for a BERT-like prediction loss. A key ingredient of our
|
||||
approach is applying the prediction loss over the masked regions only, which forces the model to learn a combined
|
||||
acoustic and language model over the continuous inputs. HuBERT relies primarily on the consistency of the unsupervised
|
||||
clustering step rather than the intrinsic quality of the assigned cluster labels. Starting with a simple k-means
|
||||
teacher of 100 clusters, and using two iterations of clustering, the HuBERT model either matches or improves upon the
|
||||
state-of-the-art wav2vec 2.0 performance on the Librispeech (960h) and Libri-light (60,000h) benchmarks with 10min, 1h,
|
||||
10h, 100h, and 960h fine-tuning subsets. Using a 1B parameter model, HuBERT shows up to 19% and 13% relative WER
|
||||
reduction on the more challenging dev-other and test-other evaluation subsets.*
|
||||
|
||||
Tips:
|
||||
|
||||
- Hubert is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
|
||||
- The Hubert model was fine-tuned using connectionist temporal classification (CTC), so the model output has to be
  decoded using [`Wav2Vec2CTCTokenizer`] (a short decoding sketch follows below).
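
As a rough sketch of that decode step, assuming a CTC fine-tuned checkpoint such as `facebook/hubert-large-ls960-ft`
(substitute any Hubert CTC checkpoint) and 16 kHz mono audio:

```python
import torch
from transformers import Wav2Vec2Processor, HubertForCTC

# assumed CTC fine-tuned checkpoint; the processor wraps the feature extractor and the CTC tokenizer
processor = Wav2Vec2Processor.from_pretrained("facebook/hubert-large-ls960-ft")
model = HubertForCTC.from_pretrained("facebook/hubert-large-ls960-ft")

waveform = [0.0] * 16_000  # stand-in for one second of 16 kHz audio; replace with real speech samples
inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# greedy CTC decoding: argmax per frame, then the tokenizer collapses repeats and blanks
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
```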
|
||||
|
||||
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten).
|
||||
|
||||
|
||||
## HubertConfig
|
||||
|
||||
[[autodoc]] HubertConfig
|
||||
|
||||
## HubertModel
|
||||
|
||||
[[autodoc]] HubertModel
|
||||
- forward
|
||||
|
||||
## HubertForCTC
|
||||
|
||||
[[autodoc]] HubertForCTC
|
||||
- forward
|
||||
|
||||
## HubertForSequenceClassification
|
||||
|
||||
[[autodoc]] HubertForSequenceClassification
|
||||
- forward
|
||||
|
||||
## TFHubertModel
|
||||
|
||||
[[autodoc]] TFHubertModel
|
||||
- call
|
||||
|
||||
## TFHubertForCTC
|
||||
|
||||
[[autodoc]] TFHubertForCTC
|
||||
- call
|
@ -1,86 +0,0 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
Hubert
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Hubert was proposed in `HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
|
||||
<https://arxiv.org/abs/2106.07447>`__ by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan
|
||||
Salakhutdinov, Abdelrahman Mohamed.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are
|
||||
multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training
|
||||
phase, and (3) sound units have variable lengths with no explicit segmentation. To deal with these three problems, we
|
||||
propose the Hidden-Unit BERT (HuBERT) approach for self-supervised speech representation learning, which utilizes an
|
||||
offline clustering step to provide aligned target labels for a BERT-like prediction loss. A key ingredient of our
|
||||
approach is applying the prediction loss over the masked regions only, which forces the model to learn a combined
|
||||
acoustic and language model over the continuous inputs. HuBERT relies primarily on the consistency of the unsupervised
|
||||
clustering step rather than the intrinsic quality of the assigned cluster labels. Starting with a simple k-means
|
||||
teacher of 100 clusters, and using two iterations of clustering, the HuBERT model either matches or improves upon the
|
||||
state-of-the-art wav2vec 2.0 performance on the Librispeech (960h) and Libri-light (60,000h) benchmarks with 10min, 1h,
|
||||
10h, 100h, and 960h fine-tuning subsets. Using a 1B parameter model, HuBERT shows up to 19% and 13% relative WER
|
||||
reduction on the more challenging dev-other and test-other evaluation subsets.*
|
||||
|
||||
Tips:
|
||||
|
||||
- Hubert is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
|
||||
- Hubert model was fine-tuned using connectionist temporal classification (CTC) so the model output has to be decoded
|
||||
using :class:`~transformers.Wav2Vec2CTCTokenizer`.
|
||||
|
||||
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__.
|
||||
|
||||
|
||||
HubertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.HubertConfig
|
||||
:members:
|
||||
|
||||
|
||||
HubertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.HubertModel
|
||||
:members: forward
|
||||
|
||||
|
||||
HubertForCTC
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.HubertForCTC
|
||||
:members: forward
|
||||
|
||||
|
||||
HubertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.HubertForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
TFHubertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFHubertModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFHubertForCTC
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFHubertForCTC
|
||||
:members: call
|
72
docs/source/model_doc/ibert.mdx
Normal file
72
docs/source/model_doc/ibert.mdx
Normal file
@ -0,0 +1,72 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# I-BERT
|
||||
|
||||
## Overview
|
||||
|
||||
The I-BERT model was proposed in [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by
|
||||
Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer. It's a quantized version of RoBERTa running
|
||||
inference up to four times faster.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language
|
||||
Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for
|
||||
efficient inference at the edge, and even at the data center. While quantization can be a viable solution for this,
|
||||
previous work on quantizing Transformer based models use floating-point arithmetic during inference, which cannot
|
||||
efficiently utilize integer-only logical units such as the recent Turing Tensor Cores, or traditional integer-only ARM
|
||||
processors. In this work, we propose I-BERT, a novel quantization scheme for Transformer based models that quantizes
|
||||
the entire inference with integer-only arithmetic. Based on lightweight integer-only approximation methods for
|
||||
nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, I-BERT performs an end-to-end integer-only BERT
|
||||
inference without any floating point calculation. We evaluate our approach on GLUE downstream tasks using
|
||||
RoBERTa-Base/Large. We show that for both cases, I-BERT achieves similar (and slightly higher) accuracy as compared to
|
||||
the full-precision baseline. Furthermore, our preliminary implementation of I-BERT shows a speedup of 2.4 - 4.0x for
|
||||
INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
|
||||
been open-sourced.*
|
||||
|
||||
This model was contributed by [kssteven](https://huggingface.co/kssteven). The original code can be found [here](https://github.com/kssteven418/I-BERT).
|
||||
|
||||
|
||||
## IBertConfig
|
||||
|
||||
[[autodoc]] IBertConfig
|
||||
|
||||
## IBertModel
|
||||
|
||||
[[autodoc]] IBertModel
|
||||
- forward
|
||||
|
||||
## IBertForMaskedLM
|
||||
|
||||
[[autodoc]] IBertForMaskedLM
|
||||
- forward
|
||||
|
||||
## IBertForSequenceClassification
|
||||
|
||||
[[autodoc]] IBertForSequenceClassification
|
||||
- forward
|
||||
|
||||
## IBertForMultipleChoice
|
||||
|
||||
[[autodoc]] IBertForMultipleChoice
|
||||
- forward
|
||||
|
||||
## IBertForTokenClassification
|
||||
|
||||
[[autodoc]] IBertForTokenClassification
|
||||
- forward
|
||||
|
||||
## IBertForQuestionAnswering
|
||||
|
||||
[[autodoc]] IBertForQuestionAnswering
|
||||
- forward
|
@ -1,89 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
I-BERT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The I-BERT model was proposed in `I-BERT: Integer-only BERT Quantization <https://arxiv.org/abs/2101.01321>`__ by
|
||||
Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer. It's a quantized version of RoBERTa running
|
||||
inference up to four times faster.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language
|
||||
Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for
|
||||
efficient inference at the edge, and even at the data center. While quantization can be a viable solution for this,
|
||||
previous work on quantizing Transformer based models use floating-point arithmetic during inference, which cannot
|
||||
efficiently utilize integer-only logical units such as the recent Turing Tensor Cores, or traditional integer-only ARM
|
||||
processors. In this work, we propose I-BERT, a novel quantization scheme for Transformer based models that quantizes
|
||||
the entire inference with integer-only arithmetic. Based on lightweight integer-only approximation methods for
|
||||
nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, I-BERT performs an end-to-end integer-only BERT
|
||||
inference without any floating point calculation. We evaluate our approach on GLUE downstream tasks using
|
||||
RoBERTa-Base/Large. We show that for both cases, I-BERT achieves similar (and slightly higher) accuracy as compared to
|
||||
the full-precision baseline. Furthermore, our preliminary implementation of I-BERT shows a speedup of 2.4 - 4.0x for
|
||||
INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
|
||||
been open-sourced.*
|
||||
|
||||
This model was contributed by `kssteven <https://huggingface.co/kssteven>`__. The original code can be found `here
|
||||
<https://github.com/kssteven418/I-BERT>`__.
|
||||
|
||||
|
||||
IBertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.IBertConfig
|
||||
:members:
|
||||
|
||||
|
||||
IBertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.IBertModel
|
||||
:members: forward
|
||||
|
||||
|
||||
IBertForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.IBertForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
IBertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.IBertForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
IBertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.IBertForMultipleChoice
|
||||
:members: forward
|
||||
|
||||
|
||||
IBertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.IBertForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
IBertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.IBertForQuestionAnswering
|
||||
:members: forward
|
124
docs/source/model_doc/layoutlm.mdx
Normal file
124
docs/source/model_doc/layoutlm.mdx
Normal file
@ -0,0 +1,124 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# LayoutLM
|
||||
|
||||
<a id='Overview'></a>
|
||||
|
||||
## Overview
|
||||
|
||||
The LayoutLM model was proposed in the paper [LayoutLM: Pre-training of Text and Layout for Document Image
|
||||
Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and
|
||||
Ming Zhou. It's a simple but effective pretraining method of text and layout for document image understanding and
|
||||
information extraction tasks, such as form understanding and receipt understanding. It obtains state-of-the-art results
|
||||
on several downstream tasks:
|
||||
|
||||
- form understanding: the [FUNSD](https://guillaumejaume.github.io/FUNSD/) dataset (a collection of 199 annotated
|
||||
forms comprising more than 30,000 words).
|
||||
- receipt understanding: the [SROIE](https://rrc.cvc.uab.es/?ch=13) dataset (a collection of 626 receipts for
|
||||
training and 347 receipts for testing).
|
||||
- document image classification: the [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip/) dataset (a collection of
|
||||
400,000 images belonging to one of 16 classes).
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the
|
||||
widespread use of pretraining models for NLP applications, they almost exclusively focus on text-level manipulation,
|
||||
while neglecting layout and style information that is vital for document image understanding. In this paper, we propose
|
||||
the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is
|
||||
beneficial for a great number of real-world document image understanding tasks such as information extraction from
|
||||
scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM.
|
||||
To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for
|
||||
document-level pretraining. It achieves new state-of-the-art results in several downstream tasks, including form
|
||||
understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification
|
||||
(from 93.07 to 94.42).*
|
||||
|
||||
Tips:
|
||||
|
||||
- In addition to `input_ids`, [`~transformers.LayoutLMModel.forward`] also expects the input `bbox`, which contains
  the bounding boxes (i.e. 2D-positions) of the input tokens. These can be obtained using an external OCR engine such
  as Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) (there's a [Python wrapper](https://pypi.org/project/pytesseract/) available). Each bounding box should be in (x0, y0, x1, y1) format, where
  (x0, y0) corresponds to the position of the upper left corner of the bounding box, and (x1, y1) represents the
  position of the lower right corner. Note that one first needs to normalize the bounding boxes to be on a 0-1000
  scale. To normalize, you can use the following function:

```python
def normalize_bbox(bbox, width, height):
    return [
        int(1000 * (bbox[0] / width)),
        int(1000 * (bbox[1] / height)),
        int(1000 * (bbox[2] / width)),
        int(1000 * (bbox[3] / height)),
    ]
```

  Here, `width` and `height` correspond to the width and height of the original document in which the token
  occurs. Those can be obtained using the Python Imaging Library (PIL), for example, as follows:

```python
from PIL import Image

image = Image.open("name_of_your_document - can be a png file, pdf, etc.")

width, height = image.size
```

  A complete model-input sketch that puts these pieces together is shown right after these tips.
|
||||
|
||||
- For a demo which shows how to fine-tune [`LayoutLMForTokenClassification`] on the [FUNSD dataset](https://guillaumejaume.github.io/FUNSD/) (a collection of annotated forms), see [this notebook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb).
|
||||
It includes an inference part, which shows how to use Google's Tesseract on a new document.
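
Putting the tips above together, here is a minimal sketch of a forward pass. The words and boxes are made-up
illustrative values (already on the 0-1000 scale), and `microsoft/layoutlm-base-uncased` is assumed to be the
checkpoint you want:

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMModel

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Hello", "world"]
normalized_boxes = [[637, 773, 693, 782], [698, 773, 733, 782]]  # illustrative, already normalized

# repeat each word-level box for every sub-word token produced by the tokenizer
token_boxes = []
for word, box in zip(words, normalized_boxes):
    token_boxes.extend([box] * len(tokenizer.tokenize(word)))
# add boxes for the special [CLS] and [SEP] tokens
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

encoding = tokenizer(" ".join(words), return_tensors="pt")
bbox = torch.tensor([token_boxes])

outputs = model(
    input_ids=encoding["input_ids"],
    attention_mask=encoding["attention_mask"],
    token_type_ids=encoding["token_type_ids"],
    bbox=bbox,
)
last_hidden_state = outputs.last_hidden_state
```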
|
||||
|
||||
This model was contributed by [liminghao1630](https://huggingface.co/liminghao1630). The original code can be found
|
||||
[here](https://github.com/microsoft/unilm/tree/master/layoutlm).
|
||||
|
||||
|
||||
## LayoutLMConfig
|
||||
|
||||
[[autodoc]] LayoutLMConfig
|
||||
|
||||
## LayoutLMTokenizer
|
||||
|
||||
[[autodoc]] LayoutLMTokenizer
|
||||
|
||||
## LayoutLMTokenizerFast
|
||||
|
||||
[[autodoc]] LayoutLMTokenizerFast
|
||||
|
||||
## LayoutLMModel
|
||||
|
||||
[[autodoc]] LayoutLMModel
|
||||
|
||||
## LayoutLMForMaskedLM
|
||||
|
||||
[[autodoc]] LayoutLMForMaskedLM
|
||||
|
||||
## LayoutLMForSequenceClassification
|
||||
|
||||
[[autodoc]] LayoutLMForSequenceClassification
|
||||
|
||||
## LayoutLMForTokenClassification
|
||||
|
||||
[[autodoc]] LayoutLMForTokenClassification
|
||||
|
||||
## TFLayoutLMModel
|
||||
|
||||
[[autodoc]] TFLayoutLMModel
|
||||
|
||||
## TFLayoutLMForMaskedLM
|
||||
|
||||
[[autodoc]] TFLayoutLMForMaskedLM
|
||||
|
||||
## TFLayoutLMForSequenceClassification
|
||||
|
||||
[[autodoc]] TFLayoutLMForSequenceClassification
|
||||
|
||||
## TFLayoutLMForTokenClassification
|
||||
|
||||
[[autodoc]] TFLayoutLMForTokenClassification
|
@ -1,161 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
LayoutLM
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
.. _Overview:
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The LayoutLM model was proposed in the paper `LayoutLM: Pre-training of Text and Layout for Document Image
|
||||
Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and
|
||||
Ming Zhou. It's a simple but effective pretraining method of text and layout for document image understanding and
|
||||
information extraction tasks, such as form understanding and receipt understanding. It obtains state-of-the-art results
|
||||
on several downstream tasks:
|
||||
|
||||
- form understanding: the `FUNSD <https://guillaumejaume.github.io/FUNSD/>`__ dataset (a collection of 199 annotated
|
||||
forms comprising more than 30,000 words).
|
||||
- receipt understanding: the `SROIE <https://rrc.cvc.uab.es/?ch=13>`__ dataset (a collection of 626 receipts for
|
||||
training and 347 receipts for testing).
|
||||
- document image classification: the `RVL-CDIP <https://www.cs.cmu.edu/~aharley/rvl-cdip/>`__ dataset (a collection of
|
||||
400,000 images belonging to one of 16 classes).
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the
|
||||
widespread use of pretraining models for NLP applications, they almost exclusively focus on text-level manipulation,
|
||||
while neglecting layout and style information that is vital for document image understanding. In this paper, we propose
|
||||
the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is
|
||||
beneficial for a great number of real-world document image understanding tasks such as information extraction from
|
||||
scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM.
|
||||
To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for
|
||||
document-level pretraining. It achieves new state-of-the-art results in several downstream tasks, including form
|
||||
understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification
|
||||
(from 93.07 to 94.42).*
|
||||
|
||||
Tips:
|
||||
|
||||
- In addition to `input_ids`, :meth:`~transformer.LayoutLMModel.forward` also expects the input :obj:`bbox`, which are
|
||||
the bounding boxes (i.e. 2D-positions) of the input tokens. These can be obtained using an external OCR engine such
|
||||
as Google's `Tesseract <https://github.com/tesseract-ocr/tesseract>`__ (there's a `Python wrapper
|
||||
<https://pypi.org/project/pytesseract/>`__ available). Each bounding box should be in (x0, y0, x1, y1) format, where
|
||||
(x0, y0) corresponds to the position of the upper left corner in the bounding box, and (x1, y1) represents the
|
||||
position of the lower right corner. Note that one first needs to normalize the bounding boxes to be on a 0-1000
|
||||
scale. To normalize, you can use the following function:
|
||||
|
||||
.. code-block::
|
||||
|
||||
def normalize_bbox(bbox, width, height):
|
||||
return [
|
||||
int(1000 * (bbox[0] / width)),
|
||||
int(1000 * (bbox[1] / height)),
|
||||
int(1000 * (bbox[2] / width)),
|
||||
int(1000 * (bbox[3] / height)),
|
||||
]
|
||||
|
||||
Here, :obj:`width` and :obj:`height` correspond to the width and height of the original document in which the token
|
||||
occurs. Those can be obtained using the Python Image Library (PIL) library for example, as follows:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from PIL import Image
|
||||
|
||||
image = Image.open("name_of_your_document - can be a png file, pdf, etc.")
|
||||
|
||||
width, height = image.size
|
||||
|
||||
- For a demo which shows how to fine-tune :class:`LayoutLMForTokenClassification` on the `FUNSD dataset
|
||||
<https://guillaumejaume.github.io/FUNSD/>`__ (a collection of annotated forms), see `this notebook
|
||||
<https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb>`__.
|
||||
It includes an inference part, which shows how to use Google's Tesseract on a new document.
|
||||
|
||||
This model was contributed by `liminghao1630 <https://huggingface.co/liminghao1630>`__. The original code can be found
|
||||
`here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
|
||||
|
||||
|
||||
LayoutLMConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMConfig
|
||||
:members:
|
||||
|
||||
|
||||
LayoutLMTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMTokenizer
|
||||
:members:
|
||||
|
||||
|
||||
LayoutLMTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
LayoutLMModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMModel
|
||||
:members:
|
||||
|
||||
|
||||
LayoutLMForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMForMaskedLM
|
||||
:members:
|
||||
|
||||
|
||||
LayoutLMForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMForSequenceClassification
|
||||
:members:
|
||||
|
||||
|
||||
LayoutLMForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMForTokenClassification
|
||||
:members:
|
||||
|
||||
|
||||
TFLayoutLMModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLayoutLMModel
|
||||
:members:
|
||||
|
||||
|
||||
TFLayoutLMForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLayoutLMForMaskedLM
|
||||
:members:
|
||||
|
||||
|
||||
TFLayoutLMForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLayoutLMForSequenceClassification
|
||||
:members:
|
||||
|
||||
|
||||
TFLayoutLMForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLayoutLMForTokenClassification
|
||||
:members:
|
287
docs/source/model_doc/layoutlmv2.mdx
Normal file
287
docs/source/model_doc/layoutlmv2.mdx
Normal file
@ -0,0 +1,287 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# LayoutLMV2
|
||||
|
||||
## Overview
|
||||
|
||||
The LayoutLMV2 model was proposed in [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu,
|
||||
Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. LayoutLMV2 improves [LayoutLM](layoutlm) to obtain
|
||||
state-of-the-art results across several document image understanding benchmarks:
|
||||
|
||||
- information extraction from scanned documents: the [FUNSD](https://guillaumejaume.github.io/FUNSD/) dataset (a
|
||||
collection of 199 annotated forms comprising more than 30,000 words), the [CORD](https://github.com/clovaai/cord)
|
||||
dataset (a collection of 800 receipts for training, 100 for validation and 100 for testing), the [SROIE](https://rrc.cvc.uab.es/?ch=13) dataset (a collection of 626 receipts for training and 347 receipts for testing)
|
||||
and the [Kleister-NDA](https://github.com/applicaai/kleister-nda) dataset (a collection of non-disclosure
|
||||
agreements from the EDGAR database, including 254 documents for training, 83 documents for validation, and 203
|
||||
documents for testing).
|
||||
- document image classification: the [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip/) dataset (a collection of
|
||||
400,000 images belonging to one of 16 classes).
|
||||
- document visual question answering: the [DocVQA](https://arxiv.org/abs/2007.00398) dataset (a collection of 50,000
|
||||
questions defined on 12,000+ document images).
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to
|
||||
its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents. In this
|
||||
paper, we present LayoutLMv2 by pre-training text, layout and image in a multi-modal framework, where new model
|
||||
architectures and pre-training tasks are leveraged. Specifically, LayoutLMv2 not only uses the existing masked
|
||||
visual-language modeling task but also the new text-image alignment and text-image matching tasks in the pre-training
|
||||
stage, where cross-modality interaction is better learned. Meanwhile, it also integrates a spatial-aware self-attention
|
||||
mechanism into the Transformer architecture, so that the model can fully understand the relative positional
|
||||
relationship among different text blocks. Experiment results show that LayoutLMv2 outperforms strong baselines and
|
||||
achieves new state-of-the-art results on a wide variety of downstream visually-rich document understanding tasks,
|
||||
including FUNSD (0.7895 -> 0.8420), CORD (0.9493 -> 0.9601), SROIE (0.9524 -> 0.9781), Kleister-NDA (0.834 -> 0.852),
|
||||
RVL-CDIP (0.9443 -> 0.9564), and DocVQA (0.7295 -> 0.8672). The pre-trained LayoutLMv2 model is publicly available at
|
||||
this https URL.*
|
||||
|
||||
Tips:
|
||||
|
||||
- The main difference between LayoutLMv1 and LayoutLMv2 is that the latter incorporates visual embeddings during
|
||||
pre-training (while LayoutLMv1 only adds visual embeddings during fine-tuning).
|
||||
- LayoutLMv2 adds both a relative 1D attention bias as well as a spatial 2D attention bias to the attention scores in
|
||||
the self-attention layers. Details can be found on page 5 of the [paper](https://arxiv.org/abs/2012.14740).
|
||||
- Demo notebooks on how to use the LayoutLMv2 model on RVL-CDIP, FUNSD, DocVQA, CORD can be found [here](https://github.com/NielsRogge/Transformers-Tutorials).
|
||||
- LayoutLMv2 uses Facebook AI's [Detectron2](https://github.com/facebookresearch/detectron2/) package for its visual
|
||||
backbone. See [this link](https://detectron2.readthedocs.io/en/latest/tutorials/install.html) for installation
|
||||
instructions.
|
||||
- In addition to `input_ids`, [`~LayoutLMv2Model.forward`] expects 2 additional inputs, namely
  `image` and `bbox`. The `image` input corresponds to the original document image in which the text
  tokens occur. The model expects each document image to be of size 224x224. This means that if you have a batch of
  document images, `image` should be a tensor of shape (batch_size, 3, 224, 224). This can be either a
  `torch.Tensor` or a `Detectron2.structures.ImageList`. You don't need to normalize the channels, as this is
  done by the model. Note that the visual backbone expects BGR channels instead of RGB, as all models
  in Detectron2 are pre-trained using the BGR format. The `bbox` input contains the bounding boxes (i.e. 2D-positions)
  of the input text tokens. This is identical to [`LayoutLMModel`]. These can be obtained using an
  external OCR engine such as Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) (there's a [Python
  wrapper](https://pypi.org/project/pytesseract/) available). Each bounding box should be in (x0, y0, x1, y1)
  format, where (x0, y0) corresponds to the position of the upper left corner of the bounding box, and (x1, y1)
  represents the position of the lower right corner. Note that one first needs to normalize the bounding boxes to be on
  a 0-1000 scale. To normalize, you can use the following function:

```python
def normalize_bbox(bbox, width, height):
    return [
        int(1000 * (bbox[0] / width)),
        int(1000 * (bbox[1] / height)),
        int(1000 * (bbox[2] / width)),
        int(1000 * (bbox[3] / height)),
    ]
```

  Here, `width` and `height` correspond to the width and height of the original document in which the token
  occurs (before resizing the image). Those can be obtained using the Python Imaging Library (PIL), for example, as
  follows:

```python
from PIL import Image

image = Image.open("name_of_your_document - can be a png file, pdf, etc.")

width, height = image.size
```
|
||||
|
||||
However, this model includes a brand new [`~transformers.LayoutLMv2Processor`] which can be used to directly
|
||||
prepare data for the model (including applying OCR under the hood). More information can be found in the "Usage"
|
||||
section below.
|
||||
|
||||
- Internally, [`~transformers.LayoutLMv2Model`] will send the `image` input through its visual backbone to
  obtain a lower-resolution feature map, whose shape is equal to the `image_feature_pool_shape` attribute of
  [`~transformers.LayoutLMv2Config`]. This feature map is then flattened to obtain a sequence of image tokens. As
  the size of the feature map is 7x7 by default, one obtains 49 image tokens. These are then concatenated with the text
  tokens, and sent through the Transformer encoder. This means that the last hidden states of the model will have a
  length of 512 + 49 = 561, if you pad the text tokens up to the max length. More generally, the last hidden states
  will have a length of `seq_length` + `config.image_feature_pool_shape[0]` *
  `config.image_feature_pool_shape[1]`.
- When calling [`~transformers.LayoutLMv2Model.from_pretrained`], a warning will be printed with a long list of
  parameter names that are not initialized. This is not a problem, as these parameters are batch normalization
  statistics, which will be computed when fine-tuning on a custom dataset.
|
||||
- If you want to train the model in a distributed environment, make sure to call [`synchronize_batch_norm`] on the
|
||||
model in order to properly synchronize the batch normalization layers of the visual backbone.
|
||||
|
||||
In addition, there's LayoutXLM, which is a multilingual version of LayoutLMv2. More information can be found on
|
||||
[LayoutXLM's documentation page](layoutxlm).
|
||||
|
||||
## Usage: LayoutLMv2Processor
|
||||
|
||||
The easiest way to prepare data for the model is to use [`LayoutLMv2Processor`], which internally
|
||||
combines a feature extractor ([`LayoutLMv2FeatureExtractor`]) and a tokenizer
|
||||
([`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]). The feature extractor
|
||||
handles the image modality, while the tokenizer handles the text modality. A processor combines both, which is ideal
|
||||
for a multi-modal model like LayoutLMv2. Note that you can still use both separately, if you only want to handle one
|
||||
modality.
|
||||
|
||||
```python
from transformers import LayoutLMv2FeatureExtractor, LayoutLMv2TokenizerFast, LayoutLMv2Processor

feature_extractor = LayoutLMv2FeatureExtractor()  # apply_ocr is set to True by default
tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
processor = LayoutLMv2Processor(feature_extractor, tokenizer)
```
|
||||
|
||||
In short, one can provide a document image (and possibly additional data) to [`LayoutLMv2Processor`],
and it will create the inputs expected by the model. Internally, the processor first uses
[`LayoutLMv2FeatureExtractor`] to apply OCR on the image to get a list of words and normalized
bounding boxes, as well as to resize the image to a given size in order to get the `image` input. The words and
normalized bounding boxes are then provided to [`LayoutLMv2Tokenizer`] or
[`LayoutLMv2TokenizerFast`], which converts them to token-level `input_ids`,
`attention_mask`, `token_type_ids` and `bbox`. Optionally, one can provide word labels to the processor,
which are turned into token-level `labels`.
|
||||
|
||||
[`LayoutLMv2Processor`] uses [PyTesseract](https://pypi.org/project/pytesseract/), a Python
|
||||
wrapper around Google's Tesseract OCR engine, under the hood. Note that you can still use your own OCR engine of
|
||||
choice, and provide the words and normalized boxes yourself. This requires initializing
|
||||
[`LayoutLMv2FeatureExtractor`] with `apply_ocr` set to `False`.
|
||||
|
||||
In total, there are 5 use cases that are supported by the processor. Below, we list them all. Note that each of these
use cases works for both batched and non-batched inputs (we illustrate them for non-batched inputs).
|
||||
|
||||
**Use case 1: document image classification (training, inference) + token classification (inference), apply_ocr =
|
||||
True**
|
||||
|
||||
This is the simplest case, in which the processor (actually the feature extractor) will perform OCR on the image to get
|
||||
the words and normalized bounding boxes.
|
||||
|
||||
```python
from transformers import LayoutLMv2Processor
from PIL import Image

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")

image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
encoding = processor(image, return_tensors="pt")  # you can also add all tokenizer parameters here such as padding, truncation
print(encoding.keys())
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])
```
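
If helpful, the resulting `encoding` can be passed straight to a classification head. A minimal sketch for document
image classification, assuming the base checkpoint and a hypothetical two-class setup (the classification head is
randomly initialized until you fine-tune it, and Detectron2 must be installed):

```python
import torch
from transformers import LayoutLMv2ForSequenceClassification

model = LayoutLMv2ForSequenceClassification.from_pretrained(
    "microsoft/layoutlmv2-base-uncased", num_labels=2  # hypothetical number of document classes
)

outputs = model(**encoding)  # the processor output already contains input_ids, bbox and image
predicted_class = int(torch.argmax(outputs.logits, dim=-1))
```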
|
||||
|
||||
**Use case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False**
|
||||
|
||||
In case one wants to do OCR themselves, one can initialize the feature extractor with `apply_ocr` set to
|
||||
`False`. In that case, one should provide the words and corresponding (normalized) bounding boxes themselves to
|
||||
the processor.
|
||||
|
||||
```python
from transformers import LayoutLMv2Processor
from PIL import Image

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased", revision="no_ocr")

image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
words = ["hello", "world"]
boxes = [[1, 2, 3, 4], [5, 6, 7, 8]]  # make sure to normalize your bounding boxes
encoding = processor(image, words, boxes=boxes, return_tensors="pt")
print(encoding.keys())
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])
```
|
||||
|
||||
**Use case 3: token classification (training), apply_ocr=False**
|
||||
|
||||
For token classification tasks (such as FUNSD, CORD, SROIE, Kleister-NDA), one can also provide the corresponding word
|
||||
labels in order to train a model. The processor will then convert these into token-level `labels`. By default, it
|
||||
will only label the first wordpiece of a word, and label the remaining wordpieces with -100, which is the
|
||||
`ignore_index` of PyTorch's CrossEntropyLoss. In case you want all wordpieces of a word to be labeled, you can
|
||||
initialize the tokenizer with `only_label_first_subword` set to `False`.
|
||||
|
||||
```python
from transformers import LayoutLMv2Processor
from PIL import Image

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased", revision="no_ocr")

image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
words = ["hello", "world"]
boxes = [[1, 2, 3, 4], [5, 6, 7, 8]]  # make sure to normalize your bounding boxes
word_labels = [1, 2]
encoding = processor(image, words, boxes=boxes, word_labels=word_labels, return_tensors="pt")
print(encoding.keys())
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'labels', 'image'])
```
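
The `labels` in this encoding can be consumed directly by [`LayoutLMv2ForTokenClassification`] during training. A
minimal sketch, where the number of labels is a hypothetical value chosen for illustration:

```python
from transformers import LayoutLMv2ForTokenClassification

model = LayoutLMv2ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv2-base-uncased", num_labels=3  # hypothetical label-set size
)

outputs = model(**encoding)  # encoding already holds input_ids, bbox, image and labels
loss, logits = outputs.loss, outputs.logits
loss.backward()  # plug this into your usual PyTorch / Trainer training loop
```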
|
||||
|
||||
**Use case 4: visual question answering (inference), apply_ocr=True**
|
||||
|
||||
For visual question answering tasks (such as DocVQA), you can provide a question to the processor. By default, the
|
||||
processor will apply OCR on the image, and create [CLS] question tokens [SEP] word tokens [SEP].
|
||||
|
||||
```python
from transformers import LayoutLMv2Processor
from PIL import Image

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")

image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
question = "What's his name?"
encoding = processor(image, question, return_tensors="pt")
print(encoding.keys())
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])
```
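
For completeness, a hedged sketch of running this encoding through [`LayoutLMv2ForQuestionAnswering`] and decoding
the predicted span. Note that the base checkpoint has no fine-tuned question-answering head, so the prediction is
only meaningful after fine-tuning (e.g. on DocVQA):

```python
import torch
from transformers import LayoutLMv2ForQuestionAnswering

model = LayoutLMv2ForQuestionAnswering.from_pretrained("microsoft/layoutlmv2-base-uncased")

outputs = model(**encoding)
start_index = int(torch.argmax(outputs.start_logits, dim=-1))
end_index = int(torch.argmax(outputs.end_logits, dim=-1))
answer = processor.tokenizer.decode(encoding["input_ids"][0, start_index : end_index + 1])
```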
|
||||
|
||||
**Use case 5: visual question answering (inference), apply_ocr=False**
|
||||
|
||||
For visual question answering tasks (such as DocVQA), you can provide a question to the processor. If you want to
|
||||
perform OCR yourself, you can provide your own words and (normalized) bounding boxes to the processor.
|
||||
|
||||
```python
from transformers import LayoutLMv2Processor
from PIL import Image

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased", revision="no_ocr")

image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
question = "What's his name?"
words = ["hello", "world"]
boxes = [[1, 2, 3, 4], [5, 6, 7, 8]]  # make sure to normalize your bounding boxes
encoding = processor(image, question, words, boxes=boxes, return_tensors="pt")
print(encoding.keys())
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])
```
|
||||
|
||||
## LayoutLMv2Config
|
||||
|
||||
[[autodoc]] LayoutLMv2Config
|
||||
|
||||
## LayoutLMv2FeatureExtractor
|
||||
|
||||
[[autodoc]] LayoutLMv2FeatureExtractor
|
||||
- __call__
|
||||
|
||||
## LayoutLMv2Tokenizer
|
||||
|
||||
[[autodoc]] LayoutLMv2Tokenizer
|
||||
- __call__
|
||||
- save_vocabulary
|
||||
|
||||
## LayoutLMv2TokenizerFast
|
||||
|
||||
[[autodoc]] LayoutLMv2TokenizerFast
|
||||
- __call__
|
||||
|
||||
## LayoutLMv2Processor
|
||||
|
||||
[[autodoc]] LayoutLMv2Processor
|
||||
- __call__
|
||||
|
||||
## LayoutLMv2Model
|
||||
|
||||
[[autodoc]] LayoutLMv2Model
|
||||
- forward
|
||||
|
||||
## LayoutLMv2ForSequenceClassification
|
||||
|
||||
[[autodoc]] LayoutLMv2ForSequenceClassification
|
||||
|
||||
## LayoutLMv2ForTokenClassification
|
||||
|
||||
[[autodoc]] LayoutLMv2ForTokenClassification
|
||||
|
||||
## LayoutLMv2ForQuestionAnswering
|
||||
|
||||
[[autodoc]] LayoutLMv2ForQuestionAnswering
|
@ -1,313 +0,0 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
LayoutLMV2
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The LayoutLMV2 model was proposed in `LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
|
||||
<https://arxiv.org/abs/2012.14740>`__ by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu,
|
||||
Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. LayoutLMV2 improves `LayoutLM <layoutlm>`__ to obtain
|
||||
state-of-the-art results across several document image understanding benchmarks:
|
||||
|
||||
- information extraction from scanned documents: the `FUNSD <https://guillaumejaume.github.io/FUNSD/>`__ dataset (a
|
||||
collection of 199 annotated forms comprising more than 30,000 words), the `CORD <https://github.com/clovaai/cord>`__
|
||||
dataset (a collection of 800 receipts for training, 100 for validation and 100 for testing), the `SROIE
|
||||
<https://rrc.cvc.uab.es/?ch=13>`__ dataset (a collection of 626 receipts for training and 347 receipts for testing)
|
||||
and the `Kleister-NDA <https://github.com/applicaai/kleister-nda>`__ dataset (a collection of non-disclosure
|
||||
agreements from the EDGAR database, including 254 documents for training, 83 documents for validation, and 203
|
||||
documents for testing).
|
||||
- document image classification: the `RVL-CDIP <https://www.cs.cmu.edu/~aharley/rvl-cdip/>`__ dataset (a collection of
|
||||
400,000 images belonging to one of 16 classes).
|
||||
- document visual question answering: the `DocVQA <https://arxiv.org/abs/2007.00398>`__ dataset (a collection of 50,000
|
||||
questions defined on 12,000+ document images).
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to
|
||||
its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents. In this
|
||||
paper, we present LayoutLMv2 by pre-training text, layout and image in a multi-modal framework, where new model
|
||||
architectures and pre-training tasks are leveraged. Specifically, LayoutLMv2 not only uses the existing masked
|
||||
visual-language modeling task but also the new text-image alignment and text-image matching tasks in the pre-training
|
||||
stage, where cross-modality interaction is better learned. Meanwhile, it also integrates a spatial-aware self-attention
|
||||
mechanism into the Transformer architecture, so that the model can fully understand the relative positional
|
||||
relationship among different text blocks. Experiment results show that LayoutLMv2 outperforms strong baselines and
|
||||
achieves new state-of-the-art results on a wide variety of downstream visually-rich document understanding tasks,
|
||||
including FUNSD (0.7895 -> 0.8420), CORD (0.9493 -> 0.9601), SROIE (0.9524 -> 0.9781), Kleister-NDA (0.834 -> 0.852),
|
||||
RVL-CDIP (0.9443 -> 0.9564), and DocVQA (0.7295 -> 0.8672). The pre-trained LayoutLMv2 model is publicly available at
|
||||
this https URL.*
|
||||
|
||||
Tips:
|
||||
|
||||
- The main difference between LayoutLMv1 and LayoutLMv2 is that the latter incorporates visual embeddings during
|
||||
pre-training (while LayoutLMv1 only adds visual embeddings during fine-tuning).
|
||||
- LayoutLMv2 adds both a relative 1D attention bias as well as a spatial 2D attention bias to the attention scores in
|
||||
the self-attention layers. Details can be found on page 5 of the `paper <https://arxiv.org/abs/2012.14740>`__.
|
||||
- Demo notebooks on how to use the LayoutLMv2 model on RVL-CDIP, FUNSD, DocVQA, CORD can be found `here
|
||||
<https://github.com/NielsRogge/Transformers-Tutorials>`__.
|
||||
- LayoutLMv2 uses Facebook AI's `Detectron2 <https://github.com/facebookresearch/detectron2/>`__ package for its visual
|
||||
backbone. See `this link <https://detectron2.readthedocs.io/en/latest/tutorials/install.html>`__ for installation
|
||||
instructions.
|
||||
- In addition to :obj:`input_ids`, :meth:`~transformer.LayoutLMv2Model.forward` expects 2 additional inputs, namely
|
||||
:obj:`image` and :obj:`bbox`. The :obj:`image` input corresponds to the original document image in which the text
|
||||
tokens occur. The model expects each document image to be of size 224x224. This means that if you have a batch of
|
||||
document images, :obj:`image` should be a tensor of shape (batch_size, 3, 224, 224). This can be either a
|
||||
:obj:`torch.Tensor` or a :obj:`Detectron2.structures.ImageList`. You don't need to normalize the channels, as this is
|
||||
done by the model. Important to note is that the visual backbone expects BGR channels instead of RGB, as all models
|
||||
in Detectron2 are pre-trained using the BGR format. The :obj:`bbox` input contains the bounding boxes (i.e. 2D-positions)
|
||||
of the input text tokens. This is identical to :class:`~transformer.LayoutLMModel`. These can be obtained using an
|
||||
external OCR engine such as Google's `Tesseract <https://github.com/tesseract-ocr/tesseract>`__ (there's a `Python
|
||||
wrapper <https://pypi.org/project/pytesseract/>`__ available). Each bounding box should be in (x0, y0, x1, y1)
|
||||
format, where (x0, y0) corresponds to the position of the upper left corner in the bounding box, and (x1, y1)
|
||||
represents the position of the lower right corner. Note that one first needs to normalize the bounding boxes to be on
|
||||
a 0-1000 scale. To normalize, you can use the following function:
|
||||
|
||||
.. code-block::
|
||||
|
||||
def normalize_bbox(bbox, width, height):
|
||||
return [
|
||||
int(1000 * (bbox[0] / width)),
|
||||
int(1000 * (bbox[1] / height)),
|
||||
int(1000 * (bbox[2] / width)),
|
||||
int(1000 * (bbox[3] / height)),
|
||||
]
|
||||
|
||||
Here, :obj:`width` and :obj:`height` correspond to the width and height of the original document in which the token
|
||||
occurs (before resizing the image). Those can be obtained using the Python Image Library (PIL) library for example, as
|
||||
follows:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from PIL import Image
|
||||
|
||||
image = Image.open("name_of_your_document - can be a png file, pdf, etc.")
|
||||
|
||||
width, height = image.size
|
||||
|
||||
However, this model includes a brand new :class:`~transformer.LayoutLMv2Processor` which can be used to directly
|
||||
prepare data for the model (including applying OCR under the hood). More information can be found in the "Usage"
|
||||
section below.
|
||||
|
||||
- Internally, :class:`~transformer.LayoutLMv2Model` will send the :obj:`image` input through its visual backbone to
|
||||
obtain a lower-resolution feature map, whose shape is equal to the :obj:`image_feature_pool_shape` attribute of
|
||||
:class:`~transformer.LayoutLMv2Config`. This feature map is then flattened to obtain a sequence of image tokens. As
|
||||
the size of the feature map is 7x7 by default, one obtains 49 image tokens. These are then concatenated with the text
|
||||
tokens, and sent through the Transformer encoder. This means that the last hidden states of the model will have a
|
||||
length of 512 + 49 = 561, if you pad the text tokens up to the max length. More generally, the last hidden states
|
||||
will have a sequence length of :obj:`seq_length` + :obj:`config.image_feature_pool_shape[0]` *
|
||||
:obj:`config.image_feature_pool_shape[1]` (see the code example at the end of these tips).
|
||||
- When calling :meth:`~transformer.LayoutLMv2Model.from_pretrained`, a warning will be printed with a long list of
|
||||
parameter names that are not initialized. This is not a problem, as these parameters are batch normalization
|
||||
statistics, which will be populated when fine-tuning on a custom dataset.
|
||||
- If you want to train the model in a distributed environment, make sure to call :meth:`synchronize_batch_norm` on the
|
||||
model in order to properly synchronize the batch normalization layers of the visual backbone.
|
||||
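As a minimal sketch of the points above (assuming detectron2 and pytesseract are installed, and using a placeholder document file name), one can verify the 49 extra image tokens in the output sequence length:

.. code-block::

    from PIL import Image
    from transformers import LayoutLMv2Processor, LayoutLMv2Model

    processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
    model = LayoutLMv2Model.from_pretrained("microsoft/layoutlmv2-base-uncased")

    image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
    encoding = processor(image, padding="max_length", truncation=True, max_length=512, return_tensors="pt")

    outputs = model(**encoding)
    print(outputs.last_hidden_state.shape)
    # torch.Size([1, 561, 768]) -> 512 text tokens + 49 image tokens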
|
||||
In addition, there's LayoutXLM, which is a multilingual version of LayoutLMv2. More information can be found on
|
||||
:doc:`LayoutXLM's documentation page <layoutxlm>`.
|
||||
|
||||
Usage: LayoutLMv2Processor
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The easiest way to prepare data for the model is to use :class:`~transformer.LayoutLMv2Processor`, which internally
|
||||
combines a feature extractor (:class:`~transformer.LayoutLMv2FeatureExtractor`) and a tokenizer
|
||||
(:class:`~transformer.LayoutLMv2Tokenizer` or :class:`~transformer.LayoutLMv2TokenizerFast`). The feature extractor
|
||||
handles the image modality, while the tokenizer handles the text modality. A processor combines both, which is ideal
|
||||
for a multi-modal model like LayoutLMv2. Note that you can still use both separately, if you only want to handle one
|
||||
modality.
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import LayoutLMv2FeatureExtractor, LayoutLMv2TokenizerFast, LayoutLMv2Processor
|
||||
|
||||
feature_extractor = LayoutLMv2FeatureExtractor() # apply_ocr is set to True by default
|
||||
tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
|
||||
processor = LayoutLMv2Processor(feature_extractor, tokenizer)
|
||||
|
||||
In short, one can provide a document image (and possibly additional data) to :class:`~transformer.LayoutLMv2Processor`,
|
||||
and it will create the inputs expected by the model. Internally, the processor first uses
|
||||
:class:`~transformer.LayoutLMv2FeatureExtractor` to apply OCR on the image to get a list of words and normalized
|
||||
bounding boxes, as well as to resize the image to a given size in order to get the :obj:`image` input. The words and
|
||||
normalized bounding boxes are then provided to :class:`~transformer.LayoutLMv2Tokenizer` or
|
||||
:class:`~transformer.LayoutLMv2TokenizerFast`, which converts them to token-level :obj:`input_ids`,
|
||||
:obj:`attention_mask`, :obj:`token_type_ids`, :obj:`bbox`. Optionally, one can provide word labels to the processor,
|
||||
which are turned into token-level :obj:`labels`.
|
||||
|
||||
:class:`~transformer.LayoutLMv2Processor` uses `PyTesseract <https://pypi.org/project/pytesseract/>`__, a Python
|
||||
wrapper around Google's Tesseract OCR engine, under the hood. Note that you can still use your own OCR engine of
|
||||
choice, and provide the words and normalized boxes yourself. This requires initializing
|
||||
:class:`~transformer.LayoutLMv2FeatureExtractor` with :obj:`apply_ocr` set to :obj:`False`.
|
||||
|
||||
In total, there are 5 use cases that are supported by the processor. Below, we list them all. Note that each of these
|
||||
use cases works for both batched and non-batched inputs (we illustrate them for non-batched inputs).
|
||||
|
||||
**Use case 1: document image classification (training, inference) + token classification (inference), apply_ocr =
|
||||
True**
|
||||
|
||||
This is the simplest case, in which the processor (actually the feature extractor) will perform OCR on the image to get
|
||||
the words and normalized bounding boxes.
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import LayoutLMv2Processor
|
||||
from PIL import Image
|
||||
|
||||
processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
|
||||
|
||||
image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
|
||||
encoding = processor(image, return_tensors="pt") # you can also add all tokenizer parameters here such as padding, truncation
|
||||
print(encoding.keys())
|
||||
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])
|
||||
|
||||
**Use case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False**
|
||||
|
||||
In case one wants to do OCR themselves, one can initialize the feature extractor with :obj:`apply_ocr` set to
|
||||
:obj:`False`. In that case, one should provide the words and corresponding (normalized) bounding boxes themselves to
|
||||
the processor.
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import LayoutLMv2Processor
|
||||
from PIL import Image
|
||||
|
||||
processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased", revision="no_ocr")
|
||||
|
||||
image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
|
||||
words = ["hello", "world"]
|
||||
boxes = [[1, 2, 3, 4], [5, 6, 7, 8]] # make sure to normalize your bounding boxes
|
||||
encoding = processor(image, words, boxes=boxes, return_tensors="pt")
|
||||
print(encoding.keys())
|
||||
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])
|
||||
|
||||
**Use case 3: token classification (training), apply_ocr=False**
|
||||
|
||||
For token classification tasks (such as FUNSD, CORD, SROIE, Kleister-NDA), one can also provide the corresponding word
|
||||
labels in order to train a model. The processor will then convert these into token-level :obj:`labels`. By default, it
|
||||
will only label the first wordpiece of a word, and label the remaining wordpieces with -100, which is the
|
||||
:obj:`ignore_index` of PyTorch's CrossEntropyLoss. In case you want all wordpieces of a word to be labeled, you can
|
||||
initialize the tokenizer with :obj:`only_label_first_subword` set to :obj:`False`.
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import LayoutLMv2Processor
|
||||
from PIL import Image
|
||||
|
||||
processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased", revision="no_ocr")
|
||||
|
||||
image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
|
||||
words = ["hello", "world"]
|
||||
boxes = [[1, 2, 3, 4], [5, 6, 7, 8]] # make sure to normalize your bounding boxes
|
||||
word_labels = [1, 2]
|
||||
encoding = processor(image, words, boxes=boxes, word_labels=word_labels, return_tensors="pt")
|
||||
print(encoding.keys())
|
||||
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'labels', 'image'])
|
||||
|
||||
**Use case 4: visual question answering (inference), apply_ocr=True**
|
||||
|
||||
For visual question answering tasks (such as DocVQA), you can provide a question to the processor. By default, the
|
||||
processor will apply OCR on the image, and create [CLS] question tokens [SEP] word tokens [SEP].
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import LayoutLMv2Processor
|
||||
from PIL import Image
|
||||
|
||||
processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
|
||||
|
||||
image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
|
||||
question = "What's his name?"
|
||||
encoding = processor(image, question, return_tensors="pt")
|
||||
print(encoding.keys())
|
||||
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])
|
||||
|
||||
**Use case 5: visual question answering (inference), apply_ocr=False**
|
||||
|
||||
For visual question answering tasks (such as DocVQA), you can provide a question to the processor. If you want to
|
||||
perform OCR yourself, you can provide your own words and (normalized) bounding boxes to the processor.
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import LayoutLMv2Processor
|
||||
from PIL import Image
|
||||
|
||||
processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased", revision="no_ocr")
|
||||
|
||||
image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
|
||||
question = "What's his name?"
|
||||
words = ["hello", "world"]
|
||||
boxes = [[1, 2, 3, 4], [5, 6, 7, 8]] # make sure to normalize your bounding boxes
|
||||
encoding = processor(image, question, words, boxes=boxes, return_tensors="pt")
|
||||
print(encoding.keys())
|
||||
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])
|
||||
|
||||
LayoutLMv2Config
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2Config
|
||||
:members:
|
||||
|
||||
|
||||
LayoutLMv2FeatureExtractor
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2FeatureExtractor
|
||||
:members: __call__
|
||||
|
||||
|
||||
LayoutLMv2Tokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2Tokenizer
|
||||
:members: __call__, save_vocabulary
|
||||
|
||||
|
||||
LayoutLMv2TokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2TokenizerFast
|
||||
:members: __call__
|
||||
|
||||
|
||||
LayoutLMv2Processor
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2Processor
|
||||
:members: __call__
|
||||
|
||||
|
||||
LayoutLMv2Model
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2Model
|
||||
:members: forward
|
||||
|
||||
|
||||
LayoutLMv2ForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2ForSequenceClassification
|
||||
:members:
|
||||
|
||||
|
||||
LayoutLMv2ForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2ForTokenClassification
|
||||
:members:
|
||||
|
||||
|
||||
LayoutLMv2ForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2ForQuestionAnswering
|
||||
:members:
|
77
docs/source/model_doc/layoutxlm.mdx
Normal file
@ -0,0 +1,77 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# LayoutXLM
|
||||
|
||||
## Overview
|
||||
|
||||
LayoutXLM was proposed in [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha
|
||||
Zhang, Furu Wei. It's a multilingual extension of the [LayoutLMv2 model](https://arxiv.org/abs/2012.14740) trained
|
||||
on 53 languages.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually-rich document
|
||||
understanding tasks recently, which demonstrates the great potential for joint learning across different modalities. In
|
||||
this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to
|
||||
bridge the language barriers for visually-rich document understanding. To accurately evaluate LayoutXLM, we also
|
||||
introduce a multilingual form understanding benchmark dataset named XFUN, which includes form understanding samples in
|
||||
7 languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese), and key-value pairs are manually labeled
|
||||
for each language. Experiment results show that the LayoutXLM model has significantly outperformed the existing SOTA
|
||||
cross-lingual pre-trained models on the XFUN dataset.*
|
||||
|
||||
One can directly plug in the weights of LayoutXLM into a LayoutLMv2 model, like so:
|
||||
|
||||
```python
|
||||
from transformers import LayoutLMv2Model
|
||||
|
||||
model = LayoutLMv2Model.from_pretrained('microsoft/layoutxlm-base')
|
||||
```
|
||||
|
||||
Note that LayoutXLM has its own tokenizer, based on
|
||||
[`LayoutXLMTokenizer`]/[`LayoutXLMTokenizerFast`]. You can initialize it as
|
||||
follows:
|
||||
|
||||
```python
|
||||
from transformers import LayoutXLMTokenizer
|
||||
|
||||
tokenizer = LayoutXLMTokenizer.from_pretrained('microsoft/layoutxlm-base')
|
||||
```
|
||||
|
||||
Similar to LayoutLMv2, you can use [`LayoutXLMProcessor`] (which internally applies
|
||||
[`LayoutLMv2FeatureExtractor`] and
|
||||
[`LayoutXLMTokenizer`]/[`LayoutXLMTokenizerFast`] in sequence) to prepare all
|
||||
data for the model.
|
||||
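For example, mirroring the LayoutLMv2 usage, a minimal sketch could look as follows (the document file name is a placeholder, and OCR is applied under the hood as with [`LayoutLMv2Processor`]):

```python
from PIL import Image
from transformers import LayoutLMv2FeatureExtractor, LayoutXLMTokenizer, LayoutXLMProcessor

feature_extractor = LayoutLMv2FeatureExtractor()  # apply_ocr is set to True by default
tokenizer = LayoutXLMTokenizer.from_pretrained('microsoft/layoutxlm-base')
processor = LayoutXLMProcessor(feature_extractor, tokenizer)

image = Image.open('name_of_your_document - can be a png file, pdf, etc.').convert('RGB')
encoding = processor(image, return_tensors='pt')  # input_ids, attention_mask, bbox and image
```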
|
||||
As LayoutXLM's architecture is equivalent to that of LayoutLMv2, one can refer to [LayoutLMv2's documentation page](layoutlmv2) for all tips, code examples and notebooks.
|
||||
|
||||
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/microsoft/unilm).
|
||||
|
||||
|
||||
## LayoutXLMTokenizer
|
||||
|
||||
[[autodoc]] LayoutXLMTokenizer
|
||||
- __call__
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## LayoutXLMTokenizerFast
|
||||
|
||||
[[autodoc]] LayoutXLMTokenizerFast
|
||||
- __call__
|
||||
|
||||
## LayoutXLMProcessor
|
||||
|
||||
[[autodoc]] LayoutXLMProcessor
|
||||
- __call__
|
@ -1,84 +0,0 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
LayoutXLM
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
LayoutXLM was proposed in `LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
|
||||
<https://arxiv.org/abs/2104.08836>`__ by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha
|
||||
Zhang, Furu Wei. It's a multilingual extension of the `LayoutLMv2 model <https://arxiv.org/abs/2012.14740>`__ trained
|
||||
on 53 languages.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually-rich document
|
||||
understanding tasks recently, which demonstrates the great potential for joint learning across different modalities. In
|
||||
this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to
|
||||
bridge the language barriers for visually-rich document understanding. To accurately evaluate LayoutXLM, we also
|
||||
introduce a multilingual form understanding benchmark dataset named XFUN, which includes form understanding samples in
|
||||
7 languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese), and key-value pairs are manually labeled
|
||||
for each language. Experiment results show that the LayoutXLM model has significantly outperformed the existing SOTA
|
||||
cross-lingual pre-trained models on the XFUN dataset.*
|
||||
|
||||
One can directly plug in the weights of LayoutXLM into a LayoutLMv2 model, like so:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import LayoutLMv2Model
|
||||
|
||||
model = LayoutLMv2Model.from_pretrained('microsoft/layoutxlm-base')
|
||||
|
||||
Note that LayoutXLM has its own tokenizer, based on
|
||||
:class:`~transformers.LayoutXLMTokenizer`/:class:`~transformers.LayoutXLMTokenizerFast`. You can initialize it as
|
||||
follows:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import LayoutXLMTokenizer
|
||||
|
||||
tokenizer = LayoutXLMTokenizer.from_pretrained('microsoft/layoutxlm-base')
|
||||
|
||||
Similar to LayoutLMv2, you can use :class:`~transformers.LayoutXLMProcessor` (which internally applies
|
||||
:class:`~transformers.LayoutLMv2FeatureExtractor` and
|
||||
:class:`~transformers.LayoutXLMTokenizer`/:class:`~transformers.LayoutXLMTokenizerFast` in sequence) to prepare all
|
||||
data for the model.
|
||||
|
||||
As LayoutXLM's architecture is equivalent to that of LayoutLMv2, one can refer to :doc:`LayoutLMv2's documentation page
|
||||
<layoutlmv2>` for all tips, code examples and notebooks.
|
||||
|
||||
This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The original code can be found `here
|
||||
<https://github.com/microsoft/unilm>`__.
|
||||
|
||||
|
||||
LayoutXLMTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutXLMTokenizer
|
||||
:members: __call__, build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
LayoutXLMTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutXLMTokenizerFast
|
||||
:members: __call__
|
||||
|
||||
|
||||
LayoutXLMProcessor
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutXLMProcessor
|
||||
:members: __call__
|
117
docs/source/model_doc/led.mdx
Normal file
@ -0,0 +1,117 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# LED
|
||||
|
||||
## Overview
|
||||
|
||||
The LED model was proposed in [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz
|
||||
Beltagy, Matthew E. Peters, Arman Cohan.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales
|
||||
quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention
|
||||
mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or
|
||||
longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local
|
||||
windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we
|
||||
evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In
|
||||
contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our
|
||||
pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on
|
||||
WikiHop and TriviaQA. We finally introduce the Longformer-Encoder-Decoder (LED), a Longformer variant for supporting
|
||||
long document generative sequence-to-sequence tasks, and demonstrate its effectiveness on the arXiv summarization
|
||||
dataset.*
|
||||
|
||||
Tips:
|
||||
|
||||
- [`LEDForConditionalGeneration`] is an extension of
|
||||
[`BartForConditionalGeneration`] exchanging the traditional *self-attention* layer with
|
||||
*Longformer*'s *chunked self-attention* layer. [`LEDTokenizer`] is an alias of
|
||||
[`BartTokenizer`].
|
||||
- LED works very well on long-range *sequence-to-sequence* tasks where the `input_ids` largely exceed a length of
|
||||
1024 tokens.
|
||||
- LED pads the `input_ids` to be a multiple of `config.attention_window` if required. Therefore a small speed-up is
|
||||
gained when [`LEDTokenizer`] is used with the `pad_to_multiple_of` argument.
|
||||
- LED makes use of *global attention* by means of the `global_attention_mask` (see
|
||||
[`LongformerModel`]). For summarization, it is advised to put *global attention* only on the first
|
||||
`<s>` token. For question answering, it is advised to put *global attention* on all tokens of the question.
|
||||
- To fine-tune LED on all 16384 tokens, it is necessary to enable *gradient checkpointing* by executing
|
||||
`model.gradient_checkpointing_enable()` (see the sketch after this list).
|
||||
- A notebook showing how to evaluate LED can be accessed [here](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing).
|
||||
- A notebook showing how to fine-tune LED can be accessed [here](https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing).
|
||||
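Putting the tips on *global attention* and *gradient checkpointing* together, here is a minimal, hypothetical summarization sketch. It assumes the `allenai/led-base-16384` checkpoint (which is not fine-tuned for summarization, so the output is only illustrative) and a placeholder input text.

```python
import torch
from transformers import LEDTokenizer, LEDForConditionalGeneration

tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")
model.gradient_checkpointing_enable()  # only needed when fine-tuning on very long inputs

inputs = tokenizer("Replace this with a (very) long article ...", return_tensors="pt")
global_attention_mask = torch.zeros_like(inputs["attention_mask"])
global_attention_mask[:, 0] = 1  # global attention on the first <s> token, as advised for summarization

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=64,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```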
|
||||
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten).
|
||||
|
||||
|
||||
## LEDConfig
|
||||
|
||||
[[autodoc]] LEDConfig
|
||||
|
||||
## LEDTokenizer
|
||||
|
||||
[[autodoc]] LEDTokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## LEDTokenizerFast
|
||||
|
||||
[[autodoc]] LEDTokenizerFast
|
||||
|
||||
## LED specific outputs
|
||||
|
||||
[[autodoc]] models.led.modeling_led.LEDEncoderBaseModelOutput
|
||||
|
||||
[[autodoc]] models.led.modeling_led.LEDSeq2SeqModelOutput
|
||||
|
||||
[[autodoc]] models.led.modeling_led.LEDSeq2SeqLMOutput
|
||||
|
||||
[[autodoc]] models.led.modeling_led.LEDSeq2SeqSequenceClassifierOutput
|
||||
|
||||
[[autodoc]] models.led.modeling_led.LEDSeq2SeqQuestionAnsweringModelOutput
|
||||
|
||||
[[autodoc]] models.led.modeling_tf_led.TFLEDEncoderBaseModelOutput
|
||||
|
||||
[[autodoc]] models.led.modeling_tf_led.TFLEDSeq2SeqModelOutput
|
||||
|
||||
[[autodoc]] models.led.modeling_tf_led.TFLEDSeq2SeqLMOutput
|
||||
|
||||
## LEDModel
|
||||
|
||||
[[autodoc]] LEDModel
|
||||
- forward
|
||||
|
||||
## LEDForConditionalGeneration
|
||||
|
||||
[[autodoc]] LEDForConditionalGeneration
|
||||
- forward
|
||||
|
||||
## LEDForSequenceClassification
|
||||
|
||||
[[autodoc]] LEDForSequenceClassification
|
||||
- forward
|
||||
|
||||
## LEDForQuestionAnswering
|
||||
|
||||
[[autodoc]] LEDForQuestionAnswering
|
||||
- forward
|
||||
|
||||
## TFLEDModel
|
||||
|
||||
[[autodoc]] TFLEDModel
|
||||
- call
|
||||
|
||||
## TFLEDForConditionalGeneration
|
||||
|
||||
[[autodoc]] TFLEDForConditionalGeneration
|
||||
- call
|
@ -1,150 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
LED
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The LED model was proposed in `Longformer: The Long-Document Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz
|
||||
Beltagy, Matthew E. Peters, Arman Cohan.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales
|
||||
quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention
|
||||
mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or
|
||||
longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local
|
||||
windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we
|
||||
evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In
|
||||
contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our
|
||||
pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on
|
||||
WikiHop and TriviaQA. We finally introduce the Longformer-Encoder-Decoder (LED), a Longformer variant for supporting
|
||||
long document generative sequence-to-sequence tasks, and demonstrate its effectiveness on the arXiv summarization
|
||||
dataset.*
|
||||
|
||||
Tips:
|
||||
|
||||
- :class:`~transformers.LEDForConditionalGeneration` is an extension of
|
||||
:class:`~transformers.BartForConditionalGeneration` exchanging the traditional *self-attention* layer with
|
||||
*Longformer*'s *chunked self-attention* layer. :class:`~transformers.LEDTokenizer` is an alias of
|
||||
:class:`~transformers.BartTokenizer`.
|
||||
- LED works very well on long-range *sequence-to-sequence* tasks where the ``input_ids`` largely exceed a length of
|
||||
1024 tokens.
|
||||
- LED pads the ``input_ids`` to be a multiple of ``config.attention_window`` if required. Therefore a small speed-up is
|
||||
gained when :class:`~transformers.LEDTokenizer` is used with the ``pad_to_multiple_of`` argument.
|
||||
- LED makes use of *global attention* by means of the ``global_attention_mask`` (see
|
||||
:class:`~transformers.LongformerModel`). For summarization, it is advised to put *global attention* only on the first
|
||||
``<s>`` token. For question answering, it is advised to put *global attention* on all tokens of the question.
|
||||
- To fine-tune LED on all 16384 tokens, it is necessary to enable *gradient checkpointing* by executing
|
||||
``model.gradient_checkpointing_enable()``.
|
||||
- A notebook showing how to evaluate LED can be accessed `here
|
||||
<https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing>`__.
|
||||
- A notebook showing how to fine-tune LED can be accessed `here
|
||||
<https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing>`__.
|
||||
|
||||
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__.
|
||||
|
||||
|
||||
LEDConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LEDConfig
|
||||
:members:
|
||||
|
||||
|
||||
LEDTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LEDTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
LEDTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LEDTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
LED specific outputs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.models.led.modeling_led.LEDEncoderBaseModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.led.modeling_led.LEDSeq2SeqModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.led.modeling_led.LEDSeq2SeqLMOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.led.modeling_led.LEDSeq2SeqSequenceClassifierOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.led.modeling_led.LEDSeq2SeqQuestionAnsweringModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.led.modeling_tf_led.TFLEDEncoderBaseModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.led.modeling_tf_led.TFLEDSeq2SeqModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.led.modeling_tf_led.TFLEDSeq2SeqLMOutput
|
||||
:members:
|
||||
|
||||
|
||||
|
||||
|
||||
LEDModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LEDModel
|
||||
:members: forward
|
||||
|
||||
|
||||
LEDForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LEDForConditionalGeneration
|
||||
:members: forward
|
||||
|
||||
|
||||
LEDForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LEDForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
LEDForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LEDForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
|
||||
TFLEDModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLEDModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFLEDForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLEDForConditionalGeneration
|
||||
:members: call
|
184
docs/source/model_doc/longformer.mdx
Normal file
@ -0,0 +1,184 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Longformer
|
||||
|
||||
**DISCLAIMER:** This model is still a work in progress, if you see something strange, file a [Github Issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title).
|
||||
|
||||
## Overview
|
||||
|
||||
The Longformer model was presented in [Longformer: The Long-Document Transformer](https://arxiv.org/pdf/2004.05150.pdf) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales
|
||||
quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention
|
||||
mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or
|
||||
longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local
|
||||
windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we
|
||||
evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In
|
||||
contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our
|
||||
pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on
|
||||
WikiHop and TriviaQA.*
|
||||
|
||||
Tips:
|
||||
|
||||
- Since the Longformer is based on RoBERTa, it doesn't have `token_type_ids`. You don't need to indicate which
|
||||
token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or
|
||||
`</s>`).
|
||||
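For instance, a small sketch (assuming the `allenai/longformer-base-4096` checkpoint):

```python
from transformers import LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
encoding = tokenizer("first segment", "second segment")
print(tokenizer.decode(encoding["input_ids"]))
# <s>first segment</s></s>second segment</s> - the segments are separated by sep tokens, no token_type_ids required
```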
|
||||
This model was contributed by [beltagy](https://huggingface.co/beltagy). The Authors' code can be found [here](https://github.com/allenai/longformer).
|
||||
|
||||
## Longformer Self Attention
|
||||
|
||||
Longformer self attention employs self attention on both a "local" context and a "global" context. Most tokens only
|
||||
attend "locally" to each other meaning that each token attends to its \\(\frac{1}{2} w\\) previous tokens and
|
||||
\\(\frac{1}{2} w\\) succeeding tokens, with \\(w\\) being the window length as defined in
|
||||
`config.attention_window`. Note that `config.attention_window` can be of type `List` to define a
|
||||
different \\(w\\) for each layer. A selected few tokens attend "globally" to all other tokens, as it is
|
||||
conventionally done for all tokens in `BertSelfAttention`.
|
||||
|
||||
Note that "locally" and "globally" attending tokens are projected by different query, key and value matrices. Also note
|
||||
that every "locally" attending token not only attends to tokens within its window \\(w\\), but also to all "globally"
|
||||
attending tokens so that global attention is *symmetric*.
|
||||
|
||||
The user can define which tokens attend "locally" and which tokens attend "globally" by setting the tensor
|
||||
`global_attention_mask` at run-time appropriately. All Longformer models employ the following logic for
|
||||
`global_attention_mask`:
|
||||
|
||||
- 0: the token attends "locally",
|
||||
- 1: the token attends "globally".
|
||||
|
||||
For more information, please also refer to the [`~LongformerModel.forward`] method.
|
||||
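For example, a minimal sketch of setting `global_attention_mask` by hand (the checkpoint and input text are placeholders):

```python
import torch
from transformers import LongformerTokenizer, LongformerModel

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("Replace this with a (very) long document ...", return_tensors="pt")
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1  # the first (<s>) token attends globally, all other tokens attend locally

outputs = model(**inputs, global_attention_mask=global_attention_mask)
```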
|
||||
Using Longformer self attention, the memory and time complexity of the query-key matmul operation, which usually
|
||||
represents the memory and time bottleneck, can be reduced from \\(\mathcal{O}(n_s \times n_s)\\) to
|
||||
\\(\mathcal{O}(n_s \times w)\\), with \\(n_s\\) being the sequence length and \\(w\\) being the average window
|
||||
size. It is assumed that the number of "globally" attending tokens is insignificant as compared to the number of
|
||||
"locally" attending tokens.
|
||||
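As a rough, hypothetical illustration: for a sequence length of \\(n_s = 4096\\) and a window size of \\(w = 512\\), the attention score matrix shrinks from \\(4096 \times 4096 \approx 16.8\\) million entries to about \\(4096 \times 512 \approx 2.1\\) million entries, an 8x reduction.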
|
||||
For more information, please refer to the official [paper](https://arxiv.org/pdf/2004.05150.pdf).
|
||||
|
||||
|
||||
## Training
|
||||
|
||||
[`LongformerForMaskedLM`] is trained the exact same way [`RobertaForMaskedLM`] is
|
||||
trained and should be used as follows:
|
||||
|
||||
```python
from transformers import LongformerForMaskedLM, LongformerTokenizer

model = LongformerForMaskedLM.from_pretrained('allenai/longformer-base-4096')
tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')

# the mask token of the (RoBERTa-based) Longformer tokenizer is <mask>, not [MASK]
input_ids = tokenizer.encode('This is a sentence from <mask> training data', return_tensors='pt')
mlm_labels = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')

loss = model(input_ids, labels=mlm_labels)[0]
```
|
||||
|
||||
## LongformerConfig
|
||||
|
||||
[[autodoc]] LongformerConfig
|
||||
|
||||
## LongformerTokenizer
|
||||
|
||||
[[autodoc]] LongformerTokenizer
|
||||
|
||||
## LongformerTokenizerFast
|
||||
|
||||
[[autodoc]] LongformerTokenizerFast
|
||||
|
||||
## Longformer specific outputs
|
||||
|
||||
[[autodoc]] models.longformer.modeling_longformer.LongformerBaseModelOutput
|
||||
|
||||
[[autodoc]] models.longformer.modeling_longformer.LongformerBaseModelOutputWithPooling
|
||||
|
||||
[[autodoc]] models.longformer.modeling_longformer.LongformerMaskedLMOutput
|
||||
|
||||
[[autodoc]] models.longformer.modeling_longformer.LongformerQuestionAnsweringModelOutput
|
||||
|
||||
[[autodoc]] models.longformer.modeling_longformer.LongformerSequenceClassifierOutput
|
||||
|
||||
[[autodoc]] models.longformer.modeling_longformer.LongformerMultipleChoiceModelOutput
|
||||
|
||||
[[autodoc]] models.longformer.modeling_longformer.LongformerTokenClassifierOutput
|
||||
|
||||
[[autodoc]] models.longformer.modeling_tf_longformer.TFLongformerBaseModelOutput
|
||||
|
||||
[[autodoc]] models.longformer.modeling_tf_longformer.TFLongformerBaseModelOutputWithPooling
|
||||
|
||||
[[autodoc]] models.longformer.modeling_tf_longformer.TFLongformerMaskedLMOutput
|
||||
|
||||
[[autodoc]] models.longformer.modeling_tf_longformer.TFLongformerQuestionAnsweringModelOutput
|
||||
|
||||
[[autodoc]] models.longformer.modeling_tf_longformer.TFLongformerSequenceClassifierOutput
|
||||
|
||||
[[autodoc]] models.longformer.modeling_tf_longformer.TFLongformerMultipleChoiceModelOutput
|
||||
|
||||
[[autodoc]] models.longformer.modeling_tf_longformer.TFLongformerTokenClassifierOutput
|
||||
|
||||
## LongformerModel
|
||||
|
||||
[[autodoc]] LongformerModel
|
||||
- forward
|
||||
|
||||
## LongformerForMaskedLM
|
||||
|
||||
[[autodoc]] LongformerForMaskedLM
|
||||
- forward
|
||||
|
||||
## LongformerForSequenceClassification
|
||||
|
||||
[[autodoc]] LongformerForSequenceClassification
|
||||
- forward
|
||||
|
||||
## LongformerForMultipleChoice
|
||||
|
||||
[[autodoc]] LongformerForMultipleChoice
|
||||
- forward
|
||||
|
||||
## LongformerForTokenClassification
|
||||
|
||||
[[autodoc]] LongformerForTokenClassification
|
||||
- forward
|
||||
|
||||
## LongformerForQuestionAnswering
|
||||
|
||||
[[autodoc]] LongformerForQuestionAnswering
|
||||
- forward
|
||||
|
||||
## TFLongformerModel
|
||||
|
||||
[[autodoc]] TFLongformerModel
|
||||
- call
|
||||
|
||||
## TFLongformerForMaskedLM
|
||||
|
||||
[[autodoc]] TFLongformerForMaskedLM
|
||||
- call
|
||||
|
||||
## TFLongformerForQuestionAnswering
|
||||
|
||||
[[autodoc]] TFLongformerForQuestionAnswering
|
||||
- call
|
||||
|
||||
## TFLongformerForSequenceClassification
|
||||
|
||||
[[autodoc]] TFLongformerForSequenceClassification
|
||||
- call
|
||||
|
||||
## TFLongformerForTokenClassification
|
||||
|
||||
[[autodoc]] TFLongformerForTokenClassification
|
||||
- call
|
||||
|
||||
## TFLongformerForMultipleChoice
|
||||
|
||||
[[autodoc]] TFLongformerForMultipleChoice
|
||||
- call
|
@ -1,239 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
Longformer
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
**DISCLAIMER:** This model is still a work in progress, if you see something strange, file a `Github Issue
|
||||
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__.
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The Longformer model was presented in `Longformer: The Long-Document Transformer
|
||||
<https://arxiv.org/pdf/2004.05150.pdf>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales
|
||||
quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention
|
||||
mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or
|
||||
longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local
|
||||
windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we
|
||||
evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In
|
||||
contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our
|
||||
pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on
|
||||
WikiHop and TriviaQA.*
|
||||
|
||||
Tips:
|
||||
|
||||
- Since the Longformer is based on RoBERTa, it doesn't have :obj:`token_type_ids`. You don't need to indicate which
|
||||
token belongs to which segment. Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or
|
||||
:obj:`</s>`).
|
||||
|
||||
This model was contributed by `beltagy <https://huggingface.co/beltagy>`__. The Authors' code can be found `here
|
||||
<https://github.com/allenai/longformer>`__.
|
||||
|
||||
Longformer Self Attention
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Longformer self attention employs self attention on both a "local" context and a "global" context. Most tokens only
|
||||
attend "locally" to each other meaning that each token attends to its :math:`\frac{1}{2} w` previous tokens and
|
||||
:math:`\frac{1}{2} w` succeeding tokens with :math:`w` being the window length as defined in
|
||||
:obj:`config.attention_window`. Note that :obj:`config.attention_window` can be of type :obj:`List` to define a
|
||||
different :math:`w` for each layer. A selected few tokens attend "globally" to all other tokens, as it is
|
||||
conventionally done for all tokens in :obj:`BertSelfAttention`.
|
||||
|
||||
Note that "locally" and "globally" attending tokens are projected by different query, key and value matrices. Also note
|
||||
that every "locally" attending token not only attends to tokens within its window :math:`w`, but also to all "globally"
|
||||
attending tokens so that global attention is *symmetric*.
|
||||
|
||||
The user can define which tokens attend "locally" and which tokens attend "globally" by setting the tensor
|
||||
:obj:`global_attention_mask` at run-time appropriately. All Longformer models employ the following logic for
|
||||
:obj:`global_attention_mask`:
|
||||
|
||||
- 0: the token attends "locally",
|
||||
- 1: the token attends "globally".
|
||||
|
||||
For more information, please also refer to the :meth:`~transformers.LongformerModel.forward` method.
|
||||
|
||||
Using Longformer self attention, the memory and time complexity of the query-key matmul operation, which usually
|
||||
represents the memory and time bottleneck, can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to
|
||||
:math:`\mathcal{O}(n_s \times w)`, with :math:`n_s` being the sequence length and :math:`w` being the average window
|
||||
size. It is assumed that the number of "globally" attending tokens is insignificant as compared to the number of
|
||||
"locally" attending tokens.
|
||||
|
||||
For more information, please refer to the official `paper <https://arxiv.org/pdf/2004.05150.pdf>`__.
|
||||
|
||||
|
||||
Training
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
:class:`~transformers.LongformerForMaskedLM` is trained the exact same way :class:`~transformers.RobertaForMaskedLM` is
|
||||
trained and should be used as follows:
|
||||
|
||||
.. code-block::
|
||||
|
||||
input_ids = tokenizer.encode('This is a sentence from [MASK] training data', return_tensors='pt')
|
||||
mlm_labels = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')
|
||||
|
||||
loss = model(input_ids, labels=input_ids, masked_lm_labels=mlm_labels)[0]
|
||||
|
||||
|
||||
LongformerConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LongformerConfig
|
||||
:members:
|
||||
|
||||
|
||||
LongformerTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LongformerTokenizer
|
||||
:members:
|
||||
|
||||
|
||||
LongformerTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LongformerTokenizerFast
|
||||
:members:
|
||||
|
||||
Longformer specific outputs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerBaseModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerBaseModelOutputWithPooling
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerMaskedLMOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerQuestionAnsweringModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerSequenceClassifierOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerMultipleChoiceModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerTokenClassifierOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerBaseModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerBaseModelOutputWithPooling
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerMaskedLMOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerQuestionAnsweringModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerSequenceClassifierOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerMultipleChoiceModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerTokenClassifierOutput
|
||||
:members:
|
||||
|
||||
LongformerModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LongformerModel
|
||||
:members: forward
|
||||
|
||||
|
||||
LongformerForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LongformerForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
LongformerForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LongformerForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
LongformerForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LongformerForMultipleChoice
|
||||
:members: forward
|
||||
|
||||
|
||||
LongformerForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LongformerForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
LongformerForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LongformerForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
|
||||
TFLongformerModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLongformerModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFLongformerForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLongformerForMaskedLM
|
||||
:members: call
|
||||
|
||||
|
||||
TFLongformerForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLongformerForQuestionAnswering
|
||||
:members: call
|
||||
|
||||
|
||||
TFLongformerForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLongformerForSequenceClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFLongformerForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLongformerForTokenClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFLongformerForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLongformerForMultipleChoice
|
||||
:members: call
|
||||
|
151
docs/source/model_doc/luke.mdx
Normal file
@ -0,0 +1,151 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# LUKE
|
||||
|
||||
## Overview
|
||||
|
||||
The LUKE model was proposed in [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda and Yuji Matsumoto.
|
||||
It is based on RoBERTa and adds entity embeddings as well as an entity-aware self-attention mechanism, which helps
|
||||
improve performance on various downstream tasks involving reasoning about entities such as named entity recognition,
|
||||
extractive and cloze-style question answering, entity typing, and relation classification.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Entity representations are useful in natural language tasks involving entities. In this paper, we propose new
|
||||
pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed
|
||||
model treats words and entities in a given text as independent tokens, and outputs contextualized representations of
|
||||
them. Our model is trained using a new pretraining task based on the masked language model of BERT. The task involves
|
||||
predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. We also
|
||||
propose an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the
|
||||
transformer, and considers the types of tokens (words or entities) when computing attention scores. The proposed model
|
||||
achieves impressive empirical performance on a wide range of entity-related tasks. In particular, it obtains
|
||||
state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification),
|
||||
CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question
|
||||
answering).*
|
||||
|
||||
Tips:
|
||||
|
||||
- This implementation is the same as [`RobertaModel`] with the addition of entity embeddings as well
|
||||
as an entity-aware self-attention mechanism, which improves performance on tasks involving reasoning about entities.
|
||||
- LUKE treats entities as input tokens; therefore, it takes `entity_ids`, `entity_attention_mask`,
|
||||
`entity_token_type_ids` and `entity_position_ids` as extra input. You can obtain those using
|
||||
[`LukeTokenizer`].
|
||||
- [`LukeTokenizer`] takes `entities` and `entity_spans` (character-based start and end
|
||||
positions of the entities in the input text) as extra input. `entities` typically consist of [MASK] entities or
|
||||
Wikipedia entities. Briefly, these two kinds of entity input work as follows:
|
||||
|
||||
- *Inputting [MASK] entities to compute entity representations*: The [MASK] entity is used to mask entities to be
|
||||
predicted during pretraining. When LUKE receives the [MASK] entity, it tries to predict the original entity by
|
||||
gathering the information about the entity from the input text. Therefore, the [MASK] entity can be used to address
|
||||
downstream tasks requiring the information of entities in text such as entity typing, relation classification, and
|
||||
named entity recognition.
|
||||
- *Inputting Wikipedia entities to compute knowledge-enhanced token representations*: LUKE learns rich information
|
||||
(or knowledge) about Wikipedia entities during pretraining and stores the information in its entity embedding. By
|
||||
using Wikipedia entities as input tokens, LUKE outputs token representations enriched by the information stored in
|
||||
the embeddings of these entities. This is particularly effective for tasks requiring real-world knowledge, such as
|
||||
question answering.
|
||||
|
||||
- There are three head models for the former use case (inputting [MASK] entities):
|
||||
|
||||
- [`LukeForEntityClassification`], for tasks to classify a single entity in an input text such as
|
||||
entity typing, e.g. the [Open Entity dataset](https://www.cs.utexas.edu/~eunsol/html_pages/open_entity.html).
|
||||
This model places a linear head on top of the output entity representation.
|
||||
- [`LukeForEntityPairClassification`], for tasks to classify the relationship between two entities
|
||||
such as relation classification, e.g. the [TACRED dataset](https://nlp.stanford.edu/projects/tacred/). This
|
||||
model places a linear head on top of the concatenated output representation of the pair of given entities.
|
||||
- [`LukeForEntitySpanClassification`], for tasks to classify the sequence of entity spans, such as
|
||||
named entity recognition (NER). This model places a linear head on top of the output entity representations. You
|
||||
can address NER using this model by inputting all possible entity spans in the text to the model.
|
||||
|
||||
[`LukeTokenizer`] has a `task` argument, which enables you to easily create an input to these
|
||||
head models by specifying `task="entity_classification"`, `task="entity_pair_classification"`, or
|
||||
`task="entity_span_classification"`. Please refer to the example code of each head models.
|
||||
|
||||
A demo notebook on how to fine-tune [`LukeForEntityPairClassification`] for relation
|
||||
classification can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LUKE).
|
||||
|
||||
There are also 3 notebooks available, which showcase how you can reproduce the results as reported in the paper with
|
||||
the HuggingFace implementation of LUKE. They can be found [here](https://github.com/studio-ousia/luke/tree/master/notebooks).
|
||||
|
||||
Example:
|
||||
|
||||
```python
|
||||
>>> from transformers import LukeTokenizer, LukeModel, LukeForEntityPairClassification
|
||||
|
||||
>>> model = LukeModel.from_pretrained("studio-ousia/luke-base")
|
||||
>>> tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
|
||||
|
||||
# Example 1: Computing the contextualized entity representation corresponding to the entity mention "Beyoncé"
|
||||
>>> text = "Beyoncé lives in Los Angeles."
|
||||
>>> entity_spans = [(0, 7)] # character-based entity span corresponding to "Beyoncé"
|
||||
>>> inputs = tokenizer(text, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
|
||||
>>> outputs = model(**inputs)
|
||||
>>> word_last_hidden_state = outputs.last_hidden_state
|
||||
>>> entity_last_hidden_state = outputs.entity_last_hidden_state
|
||||
|
||||
# Example 2: Inputting Wikipedia entities to obtain enriched contextualized representations
|
||||
>>> entities = ["Beyoncé", "Los Angeles"] # Wikipedia entity titles corresponding to the entity mentions "Beyoncé" and "Los Angeles"
|
||||
>>> entity_spans = [(0, 7), (17, 28)] # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
|
||||
>>> inputs = tokenizer(text, entities=entities, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
|
||||
>>> outputs = model(**inputs)
|
||||
>>> word_last_hidden_state = outputs.last_hidden_state
|
||||
>>> entity_last_hidden_state = outputs.entity_last_hidden_state
|
||||
|
||||
# Example 3: Classifying the relationship between two entities using LukeForEntityPairClassification head model
|
||||
>>> model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
|
||||
>>> tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
|
||||
>>> entity_spans = [(0, 7), (17, 28)] # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
|
||||
>>> inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
|
||||
>>> outputs = model(**inputs)
|
||||
>>> logits = outputs.logits
|
||||
>>> predicted_class_idx = int(logits[0].argmax())
|
||||
>>> print("Predicted class:", model.config.id2label[predicted_class_idx])
|
||||
```
|
||||
|
||||
This model was contributed by [ikuyamada](https://huggingface.co/ikuyamada) and [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/studio-ousia/luke).
|
||||
|
||||
|
||||
## LukeConfig
|
||||
|
||||
[[autodoc]] LukeConfig
|
||||
|
||||
## LukeTokenizer
|
||||
|
||||
[[autodoc]] LukeTokenizer
|
||||
- __call__
|
||||
- save_vocabulary
|
||||
|
||||
## LukeModel
|
||||
|
||||
[[autodoc]] LukeModel
|
||||
- forward
|
||||
|
||||
## LukeForMaskedLM
|
||||
|
||||
[[autodoc]] LukeForMaskedLM
|
||||
- forward
|
||||
|
||||
## LukeForEntityClassification
|
||||
|
||||
[[autodoc]] LukeForEntityClassification
|
||||
- forward
|
||||
|
||||
## LukeForEntityPairClassification
|
||||
|
||||
[[autodoc]] LukeForEntityPairClassification
|
||||
- forward
|
||||
|
||||
## LukeForEntitySpanClassification
|
||||
|
||||
[[autodoc]] LukeForEntitySpanClassification
|
||||
- forward
|
@ -1,168 +0,0 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
LUKE
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The LUKE model was proposed in `LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention
|
||||
<https://arxiv.org/abs/2010.01057>`_ by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda and Yuji Matsumoto.
|
||||
It is based on RoBERTa and adds entity embeddings as well as an entity-aware self-attention mechanism, which helps
|
||||
improve performance on various downstream tasks involving reasoning about entities such as named entity recognition,
|
||||
extractive and cloze-style question answering, entity typing, and relation classification.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Entity representations are useful in natural language tasks involving entities. In this paper, we propose new
|
||||
pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed
|
||||
model treats words and entities in a given text as independent tokens, and outputs contextualized representations of
|
||||
them. Our model is trained using a new pretraining task based on the masked language model of BERT. The task involves
|
||||
predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. We also
|
||||
propose an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the
|
||||
transformer, and considers the types of tokens (words or entities) when computing attention scores. The proposed model
|
||||
achieves impressive empirical performance on a wide range of entity-related tasks. In particular, it obtains
|
||||
state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification),
|
||||
CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question
|
||||
answering).*
|
||||
|
||||
Tips:
|
||||
|
||||
- This implementation is the same as :class:`~transformers.RobertaModel` with the addition of entity embeddings as well
|
||||
as an entity-aware self-attention mechanism, which improves performance on tasks involving reasoning about entities.
|
||||
- LUKE treats entities as input tokens; therefore, it takes :obj:`entity_ids`, :obj:`entity_attention_mask`,
|
||||
:obj:`entity_token_type_ids` and :obj:`entity_position_ids` as extra input. You can obtain those using
|
||||
:class:`~transformers.LukeTokenizer`.
|
||||
- :class:`~transformers.LukeTokenizer` takes :obj:`entities` and :obj:`entity_spans` (character-based start and end
|
||||
positions of the entities in the input text) as extra input. :obj:`entities` typically consist of [MASK] entities or
|
||||
Wikipedia entities. The brief description when inputting these entities are as follows:
|
||||
|
||||
- *Inputting [MASK] entities to compute entity representations*: The [MASK] entity is used to mask entities to be
|
||||
predicted during pretraining. When LUKE receives the [MASK] entity, it tries to predict the original entity by
|
||||
gathering the information about the entity from the input text. Therefore, the [MASK] entity can be used to address
|
||||
downstream tasks requiring the information of entities in text such as entity typing, relation classification, and
|
||||
named entity recognition.
|
||||
- *Inputting Wikipedia entities to compute knowledge-enhanced token representations*: LUKE learns rich information
|
||||
(or knowledge) about Wikipedia entities during pretraining and stores the information in its entity embedding. By
|
||||
using Wikipedia entities as input tokens, LUKE outputs token representations enriched by the information stored in
|
||||
the embeddings of these entities. This is particularly effective for tasks requiring real-world knowledge, such as
|
||||
question answering.
|
||||
|
||||
- There are three head models for the former use case:
|
||||
|
||||
- :class:`~transformers.LukeForEntityClassification`, for tasks to classify a single entity in an input text such as
|
||||
entity typing, e.g. the `Open Entity dataset <https://www.cs.utexas.edu/~eunsol/html_pages/open_entity.html>`__.
|
||||
This model places a linear head on top of the output entity representation.
|
||||
- :class:`~transformers.LukeForEntityPairClassification`, for tasks to classify the relationship between two entities
|
||||
such as relation classification, e.g. the `TACRED dataset <https://nlp.stanford.edu/projects/tacred/>`__. This
|
||||
model places a linear head on top of the concatenated output representation of the pair of given entities.
|
||||
- :class:`~transformers.LukeForEntitySpanClassification`, for tasks to classify the sequence of entity spans, such as
|
||||
named entity recognition (NER). This model places a linear head on top of the output entity representations. You
|
||||
can address NER using this model by inputting all possible entity spans in the text to the model.
|
||||
|
||||
:class:`~transformers.LukeTokenizer` has a ``task`` argument, which enables you to easily create an input to these
|
||||
head models by specifying ``task="entity_classification"``, ``task="entity_pair_classification"``, or
|
||||
``task="entity_span_classification"``. Please refer to the example code of each head models.
|
||||
|
||||
A demo notebook on how to fine-tune :class:`~transformers.LukeForEntityPairClassification` for relation
|
||||
classification can be found `here <https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LUKE>`__.
|
||||
|
||||
There are also 3 notebooks available, which showcase how you can reproduce the results as reported in the paper with
|
||||
the HuggingFace implementation of LUKE. They can be found `here
|
||||
<https://github.com/studio-ousia/luke/tree/master/notebooks>`__.
|
||||
|
||||
Example:
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from transformers import LukeTokenizer, LukeModel, LukeForEntityPairClassification
|
||||
|
||||
>>> model = LukeModel.from_pretrained("studio-ousia/luke-base")
|
||||
>>> tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
|
||||
|
||||
# Example 1: Computing the contextualized entity representation corresponding to the entity mention "Beyoncé"
|
||||
>>> text = "Beyoncé lives in Los Angeles."
|
||||
>>> entity_spans = [(0, 7)] # character-based entity span corresponding to "Beyoncé"
|
||||
>>> inputs = tokenizer(text, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
|
||||
>>> outputs = model(**inputs)
|
||||
>>> word_last_hidden_state = outputs.last_hidden_state
|
||||
>>> entity_last_hidden_state = outputs.entity_last_hidden_state
|
||||
|
||||
# Example 2: Inputting Wikipedia entities to obtain enriched contextualized representations
|
||||
>>> entities = ["Beyoncé", "Los Angeles"] # Wikipedia entity titles corresponding to the entity mentions "Beyoncé" and "Los Angeles"
|
||||
>>> entity_spans = [(0, 7), (17, 28)] # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
|
||||
>>> inputs = tokenizer(text, entities=entities, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
|
||||
>>> outputs = model(**inputs)
|
||||
>>> word_last_hidden_state = outputs.last_hidden_state
|
||||
>>> entity_last_hidden_state = outputs.entity_last_hidden_state
|
||||
|
||||
# Example 3: Classifying the relationship between two entities using LukeForEntityPairClassification head model
|
||||
>>> model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
|
||||
>>> tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
|
||||
>>> entity_spans = [(0, 7), (17, 28)] # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
|
||||
>>> inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
|
||||
>>> outputs = model(**inputs)
|
||||
>>> logits = outputs.logits
|
||||
>>> predicted_class_idx = int(logits[0].argmax())
|
||||
>>> print("Predicted class:", model.config.id2label[predicted_class_idx])
|
||||
|
||||
This model was contributed by `ikuyamada <https://huggingface.co/ikuyamada>`__ and `nielsr
|
||||
<https://huggingface.co/nielsr>`__. The original code can be found `here <https://github.com/studio-ousia/luke>`__.
|
||||
|
||||
|
||||
LukeConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LukeConfig
|
||||
:members:
|
||||
|
||||
|
||||
LukeTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LukeTokenizer
|
||||
:members: __call__, save_vocabulary
|
||||
|
||||
|
||||
LukeModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LukeModel
|
||||
:members: forward
|
||||
|
||||
LukeForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LukeForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
LukeForEntityClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LukeForEntityClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
LukeForEntityPairClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LukeForEntityPairClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
LukeForEntitySpanClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LukeForEntitySpanClassification
|
||||
:members: forward
|
102
docs/source/model_doc/lxmert.mdx
Normal file
@ -0,0 +1,102 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# LXMERT
|
||||
|
||||
## Overview
|
||||
|
||||
The LXMERT model was proposed in [LXMERT: Learning Cross-Modality Encoder Representations from Transformers](https://arxiv.org/abs/1908.07490) by Hao Tan & Mohit Bansal. It is a series of bidirectional transformer encoders
|
||||
(one for the vision modality, one for the language modality, and then one to fuse both modalities) pretrained using a
|
||||
combination of masked language modeling, visual-language text alignment, ROI-feature regression, masked
|
||||
visual-attribute modeling, masked visual-object modeling, and visual-question answering objectives. The pretraining
|
||||
consists of multiple multi-modal datasets: MSCOCO, Visual-Genome + Visual-Genome Question Answering, VQA 2.0, and GQA.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Vision-and-language reasoning requires an understanding of visual concepts, language semantics, and, most importantly,
|
||||
the alignment and relationships between these two modalities. We thus propose the LXMERT (Learning Cross-Modality
|
||||
Encoder Representations from Transformers) framework to learn these vision-and-language connections. In LXMERT, we
|
||||
build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language
|
||||
encoder, and a cross-modality encoder. Next, to endow our model with the capability of connecting vision and language
|
||||
semantics, we pre-train the model with large amounts of image-and-sentence pairs, via five diverse representative
|
||||
pretraining tasks: masked language modeling, masked object prediction (feature regression and label classification),
|
||||
cross-modality matching, and image question answering. These tasks help in learning both intra-modality and
|
||||
cross-modality relationships. After fine-tuning from our pretrained parameters, our model achieves the state-of-the-art
|
||||
results on two visual question answering datasets (i.e., VQA and GQA). We also show the generalizability of our
|
||||
pretrained cross-modality model by adapting it to a challenging visual-reasoning task, NLVR, and improve the previous
|
||||
best result by 22% absolute (54% to 76%). Lastly, we demonstrate detailed ablation studies to prove that both our novel
|
||||
model components and pretraining strategies significantly contribute to our strong results; and also present several
|
||||
attention visualizations for the different encoders*
|
||||
|
||||
Tips:
|
||||
|
||||
- Bounding boxes are not required for the visual feature embeddings; any kind of visual-spatial features will work.
- Both the language hidden states and the visual hidden states that LXMERT outputs are passed through the
cross-modality layer, so they contain information from both modalities. To access a modality that only attends to
itself, select the vision/language hidden states from the first input in the tuple (see the sketch after these tips).
- The bidirectional cross-modality encoder attention only returns attention values when the language modality is used
as the input and the vision modality is used as the context vector. Further, while the cross-modality encoder
contains self-attention for each respective modality and cross-attention, only the cross-attention is returned and
both self-attention outputs are disregarded.
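
As a rough sketch of how the model is called (the visual inputs below are random placeholders standing in for
ROI-pooled features from an object detector such as Faster R-CNN, and the `unc-nlp/lxmert-base-uncased` checkpoint
name is an assumption):

```python
import torch
from transformers import LxmertTokenizer, LxmertModel

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

inputs = tokenizer("What color is the cat?", return_tensors="pt")
# 36 placeholder region features of size 2048 with their box coordinates (x1, y1, x2, y2)
visual_feats = torch.rand(1, 36, 2048)
visual_pos = torch.rand(1, 36, 4)

outputs = model(**inputs, visual_feats=visual_feats, visual_pos=visual_pos)
language_hidden_states = outputs.language_output  # language stream after the cross-modality layers
vision_hidden_states = outputs.vision_output  # vision stream after the cross-modality layers
pooled_output = outputs.pooled_output  # [CLS]-style summary vector
```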
|
||||
|
||||
This model was contributed by [eltoto1219](https://huggingface.co/eltoto1219). The original code can be found [here](https://github.com/airsplay/lxmert).
|
||||
|
||||
|
||||
## LxmertConfig
|
||||
|
||||
[[autodoc]] LxmertConfig
|
||||
|
||||
## LxmertTokenizer
|
||||
|
||||
[[autodoc]] LxmertTokenizer
|
||||
|
||||
## LxmertTokenizerFast
|
||||
|
||||
[[autodoc]] LxmertTokenizerFast
|
||||
|
||||
## Lxmert specific outputs
|
||||
|
||||
[[autodoc]] models.lxmert.modeling_lxmert.LxmertModelOutput
|
||||
|
||||
[[autodoc]] models.lxmert.modeling_lxmert.LxmertForPreTrainingOutput
|
||||
|
||||
[[autodoc]] models.lxmert.modeling_lxmert.LxmertForQuestionAnsweringOutput
|
||||
|
||||
[[autodoc]] models.lxmert.modeling_tf_lxmert.TFLxmertModelOutput
|
||||
|
||||
[[autodoc]] models.lxmert.modeling_tf_lxmert.TFLxmertForPreTrainingOutput
|
||||
|
||||
## LxmertModel
|
||||
|
||||
[[autodoc]] LxmertModel
|
||||
- forward
|
||||
|
||||
## LxmertForPreTraining
|
||||
|
||||
[[autodoc]] LxmertForPreTraining
|
||||
- forward
|
||||
|
||||
## LxmertForQuestionAnswering
|
||||
|
||||
[[autodoc]] LxmertForQuestionAnswering
|
||||
- forward
|
||||
|
||||
## TFLxmertModel
|
||||
|
||||
[[autodoc]] TFLxmertModel
|
||||
- call
|
||||
|
||||
## TFLxmertForPreTraining
|
||||
|
||||
[[autodoc]] TFLxmertForPreTraining
|
||||
- call
|
@ -1,128 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
LXMERT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The LXMERT model was proposed in `LXMERT: Learning Cross-Modality Encoder Representations from Transformers
|
||||
<https://arxiv.org/abs/1908.07490>`__ by Hao Tan & Mohit Bansal. It is a series of bidirectional transformer encoders
|
||||
(one for the vision modality, one for the language modality, and then one to fuse both modalities) pretrained using a
|
||||
combination of masked language modeling, visual-language text alignment, ROI-feature regression, masked
|
||||
visual-attribute modeling, masked visual-object modeling, and visual-question answering objectives. The pretraining
|
||||
consists of multiple multi-modal datasets: MSCOCO, Visual-Genome + Visual-Genome Question Answering, VQA 2.0, and GQA.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Vision-and-language reasoning requires an understanding of visual concepts, language semantics, and, most importantly,
|
||||
the alignment and relationships between these two modalities. We thus propose the LXMERT (Learning Cross-Modality
|
||||
Encoder Representations from Transformers) framework to learn these vision-and-language connections. In LXMERT, we
|
||||
build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language
|
||||
encoder, and a cross-modality encoder. Next, to endow our model with the capability of connecting vision and language
|
||||
semantics, we pre-train the model with large amounts of image-and-sentence pairs, via five diverse representative
|
||||
pretraining tasks: masked language modeling, masked object prediction (feature regression and label classification),
|
||||
cross-modality matching, and image question answering. These tasks help in learning both intra-modality and
|
||||
cross-modality relationships. After fine-tuning from our pretrained parameters, our model achieves the state-of-the-art
|
||||
results on two visual question answering datasets (i.e., VQA and GQA). We also show the generalizability of our
|
||||
pretrained cross-modality model by adapting it to a challenging visual-reasoning task, NLVR, and improve the previous
|
||||
best result by 22% absolute (54% to 76%). Lastly, we demonstrate detailed ablation studies to prove that both our novel
|
||||
model components and pretraining strategies significantly contribute to our strong results; and also present several
|
||||
attention visualizations for the different encoders*
|
||||
|
||||
Tips:
|
||||
|
||||
- Bounding boxes are not necessary to be used in the visual feature embeddings, any kind of visual-spacial features
|
||||
will work.
|
||||
- Both the language hidden states and the visual hidden states that LXMERT outputs are passed through the
|
||||
cross-modality layer, so they contain information from both modalities. To access a modality that only attends to
|
||||
itself, select the vision/language hidden states from the first input in the tuple.
|
||||
- The bidirectional cross-modality encoder attention only returns attention values when the language modality is used
|
||||
as the input and the vision modality is used as the context vector. Further, while the cross-modality encoder
|
||||
contains self-attention for each respective modality and cross-attention, only the cross attention is returned and
|
||||
both self attention outputs are disregarded.
|
||||
|
||||
This model was contributed by `eltoto1219 <https://huggingface.co/eltoto1219>`__. The original code can be found `here
|
||||
<https://github.com/airsplay/lxmert>`__.
|
||||
|
||||
|
||||
LxmertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LxmertConfig
|
||||
:members:
|
||||
|
||||
|
||||
LxmertTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LxmertTokenizer
|
||||
:members:
|
||||
|
||||
|
||||
LxmertTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LxmertTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
Lxmert specific outputs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.models.lxmert.modeling_lxmert.LxmertModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.lxmert.modeling_lxmert.LxmertForPreTrainingOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.lxmert.modeling_lxmert.LxmertForQuestionAnsweringOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.lxmert.modeling_tf_lxmert.TFLxmertModelOutput
|
||||
:members:
|
||||
|
||||
.. autoclass:: transformers.models.lxmert.modeling_tf_lxmert.TFLxmertForPreTrainingOutput
|
||||
:members:
|
||||
|
||||
|
||||
LxmertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LxmertModel
|
||||
:members: forward
|
||||
|
||||
LxmertForPreTraining
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LxmertForPreTraining
|
||||
:members: forward
|
||||
|
||||
LxmertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LxmertForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
|
||||
TFLxmertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLxmertModel
|
||||
:members: call
|
||||
|
||||
TFLxmertForPreTraining
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFLxmertForPreTraining
|
||||
:members: call
|
116
docs/source/model_doc/m2m_100.mdx
Normal file
@ -0,0 +1,116 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# M2M100
|
||||
|
||||
## Overview
|
||||
|
||||
The M2M100 model was proposed in [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky,
|
||||
Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy
|
||||
Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Existing work in translation demonstrated the potential of massively multilingual machine translation by training a
|
||||
single model able to translate between any pair of languages. However, much of this work is English-Centric by training
|
||||
only on data which was translated from or to English. While this is supported by large sources of training data, it
|
||||
does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation
|
||||
model that can translate directly between any pair of 100 languages. We build and open source a training dataset that
|
||||
covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how
|
||||
to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters
|
||||
to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly
|
||||
translating between non-English directions while performing competitively to the best single systems of WMT. We
|
||||
open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.*
|
||||
|
||||
This model was contributed by [valhalla](https://huggingface.co/valhalla).
|
||||
|
||||
|
||||
### Training and Generation
|
||||
|
||||
M2M100 is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation tasks. As the model is
multilingual, it expects the sequences in a certain format: a special language id token is used as a prefix in both the
source and target text. The text format is `[lang_code] X [eos]`, where `lang_code` is the source language id for the
source text and the target language id for the target text, and `X` is the source or target text.
|
||||
|
||||
The [`M2M100Tokenizer`] depends on `sentencepiece` so be sure to install it before running the
|
||||
examples. To install `sentencepiece` run `pip install sentencepiece`.
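
To see this format in practice, you can inspect what the tokenizer produces for a source sentence (a small sketch
assuming the `facebook/m2m100_418M` checkpoint; the exact subword pieces may differ):

```python
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M", src_lang="en")
ids = tokenizer("Hello world")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# roughly ['__en__', '▁Hello', '▁world', '</s>']: the language id token comes first, eos last
```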
|
||||
|
||||
- Supervised Training
|
||||
|
||||
```python
|
||||
from transformers import M2M100Config, M2M100ForConditionalGeneration, M2M100Tokenizer
|
||||
|
||||
model = M2M100ForConditionalGeneration.from_pretrained('facebook/m2m100_418M')
|
||||
tokenizer = M2M100Tokenizer.from_pretrained('facebook/m2m100_418M', src_lang="en", tgt_lang="fr")
|
||||
|
||||
src_text = "Life is like a box of chocolates."
|
||||
tgt_text = "La vie est comme une boîte de chocolat."
|
||||
|
||||
model_inputs = tokenizer(src_text, return_tensors="pt")
|
||||
with tokenizer.as_target_tokenizer():
|
||||
labels = tokenizer(tgt_text, return_tensors="pt").input_ids
|
||||
|
||||
loss = model(**model_inputs, labels=labels).loss  # forward pass
|
||||
```
|
||||
|
||||
- Generation
|
||||
|
||||
M2M100 uses the `eos_token_id` as the `decoder_start_token_id` for generation with the target language id
|
||||
being forced as the first generated token. To force the target language id as the first generated token, pass the
|
||||
*forced_bos_token_id* parameter to the *generate* method. The following example shows how to translate from
Hindi to French and from Chinese to English using the *facebook/m2m100_418M* checkpoint.
|
||||
|
||||
```python
|
||||
>>> from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
|
||||
|
||||
>>> hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"
|
||||
>>> chinese_text = "生活就像一盒巧克力。"
|
||||
|
||||
>>> model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
|
||||
>>> tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
|
||||
|
||||
>>> # translate Hindi to French
|
||||
>>> tokenizer.src_lang = "hi"
|
||||
>>> encoded_hi = tokenizer(hi_text, return_tensors="pt")
|
||||
>>> generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("fr"))
|
||||
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
|
||||
"La vie est comme une boîte de chocolat."
|
||||
|
||||
>>> # translate Chinese to English
|
||||
>>> tokenizer.src_lang = "zh"
|
||||
>>> encoded_zh = tokenizer(chinese_text, return_tensors="pt")
|
||||
>>> generated_tokens = model.generate(**encoded_zh, forced_bos_token_id=tokenizer.get_lang_id("en"))
|
||||
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
|
||||
"Life is like a box of chocolate."
|
||||
```
|
||||
|
||||
## M2M100Config
|
||||
|
||||
[[autodoc]] M2M100Config
|
||||
|
||||
## M2M100Tokenizer
|
||||
|
||||
[[autodoc]] M2M100Tokenizer
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## M2M100Model
|
||||
|
||||
[[autodoc]] M2M100Model
|
||||
- forward
|
||||
|
||||
## M2M100ForConditionalGeneration
|
||||
|
||||
[[autodoc]] M2M100ForConditionalGeneration
|
||||
- forward
|
@ -1,130 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
M2M100
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The M2M100 model was proposed in `Beyond English-Centric Multilingual Machine Translation
|
||||
<https://arxiv.org/abs/2010.11125>`__ by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky,
|
||||
Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy
|
||||
Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Existing work in translation demonstrated the potential of massively multilingual machine translation by training a
|
||||
single model able to translate between any pair of languages. However, much of this work is English-Centric by training
|
||||
only on data which was translated from or to English. While this is supported by large sources of training data, it
|
||||
does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation
|
||||
model that can translate directly between any pair of 100 languages. We build and open source a training dataset that
|
||||
covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how
|
||||
to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters
|
||||
to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly
|
||||
translating between non-English directions while performing competitively to the best single systems of WMT. We
|
||||
open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.*
|
||||
|
||||
This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
|
||||
|
||||
|
||||
Training and Generation
|
||||
_______________________________________________________________________________________________________________________
|
||||
|
||||
M2M100 is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation tasks. As the model is
|
||||
multilingual it expects the sequences in a certain format: A special language id token is used as prefix in both the
|
||||
source and target text. The source text format is :obj:`[lang_code] X [eos]`, where :obj:`lang_code` is source language
|
||||
id for source text and target language id for target text, with :obj:`X` being the source or target text.
|
||||
|
||||
The :class:`~transformers.M2M100Tokenizer` depends on :obj:`sentencepiece` so be sure to install it before running the
|
||||
examples. To install :obj:`sentencepiece` run ``pip install sentencepiece``.
|
||||
|
||||
- Supervised Training
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import M2M100Config, M2M100ForConditionalGeneration, M2M100Tokenizer
|
||||
|
||||
model = M2M100ForConditionalGeneration.from_pretrained('facebook/m2m100_418M')
|
||||
tokenizer = M2M100Tokenizer.from_pretrained('facebook/m2m100_418M', src_lang="en", tgt_lang="fr")
|
||||
|
||||
src_text = "Life is like a box of chocolates."
|
||||
tgt_text = "La vie est comme une boîte de chocolat."
|
||||
|
||||
model_inputs = tokenizer(src_text, return_tensors="pt")
|
||||
with tokenizer.as_target_tokenizer():
|
||||
labels = tokenizer(tgt_text, return_tensors="pt").input_ids
|
||||
|
||||
loss = model(**model_inputs, labels=labels) # forward pass
|
||||
|
||||
|
||||
- Generation
|
||||
|
||||
M2M100 uses the :obj:`eos_token_id` as the :obj:`decoder_start_token_id` for generation with the target language id
|
||||
being forced as the first generated token. To force the target language id as the first generated token, pass the
|
||||
`forced_bos_token_id` parameter to the `generate` method. The following example shows how to translate between
|
||||
Hindi to French and Chinese to English using the `facebook/m2m100_418M` checkpoint.
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
|
||||
|
||||
>>> hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"
|
||||
>>> chinese_text = "生活就像一盒巧克力。"
|
||||
|
||||
>>> model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
|
||||
>>> tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
|
||||
|
||||
>>> # translate Hindi to French
|
||||
>>> tokenizer.src_lang = "hi"
|
||||
>>> encoded_hi = tokenizer(hi_text, return_tensors="pt")
|
||||
>>> generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("fr"))
|
||||
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
|
||||
"La vie est comme une boîte de chocolat."
|
||||
|
||||
>>> # translate Chinese to English
|
||||
>>> tokenizer.src_lang = "zh"
|
||||
>>> encoded_zh = tokenizer(chinese_text, return_tensors="pt")
|
||||
>>> generated_tokens = model.generate(**encoded_zh, forced_bos_token_id=tokenizer.get_lang_id("en"))
|
||||
>>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
|
||||
"Life is like a box of chocolate."
|
||||
|
||||
|
||||
M2M100Config
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.M2M100Config
|
||||
:members:
|
||||
|
||||
|
||||
M2M100Tokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.M2M100Tokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
M2M100Model
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.M2M100Model
|
||||
:members: forward
|
||||
|
||||
|
||||
M2M100ForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.M2M100ForConditionalGeneration
|
||||
:members: forward
|
||||
|
||||
|
191
docs/source/model_doc/marian.mdx
Normal file
@ -0,0 +1,191 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# MarianMT
|
||||
|
||||
**Bugs:** If you see something strange, file a [Github Issue](https://github.com/huggingface/transformers/issues/new?assignees=sshleifer&labels=&template=bug-report.md&title)
|
||||
and assign @patrickvonplaten.
|
||||
|
||||
Translations should be similar to, but not identical to, the output in the test set linked to in each model card.
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
- Each model is about 298 MB on disk; there are more than 1,000 models.
|
||||
- The list of supported language pairs can be found [here](https://huggingface.co/Helsinki-NLP).
|
||||
- Models were originally trained by [Jörg Tiedemann](https://researchportal.helsinki.fi/en/persons/j%C3%B6rg-tiedemann) using the [Marian](https://marian-nmt.github.io/) C++ library, which supports fast training and translation.
|
||||
- All models are transformer encoder-decoders with 6 layers in each component. Each model's performance is documented
|
||||
in a model card.
|
||||
- The 80 opus models that require BPE preprocessing are not supported.
|
||||
- The modeling code is the same as [`BartForConditionalGeneration`] with a few minor modifications:
|
||||
|
||||
- static (sinusoid) positional embeddings (`MarianConfig.static_position_embeddings=True`)
|
||||
- no layernorm_embedding (`MarianConfig.normalize_embedding=False`)
|
||||
- the model starts generating with `pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses
|
||||
`</s>`)
|
||||
- Code to bulk convert models can be found in `convert_marian_to_pytorch.py`.
|
||||
- This model was contributed by [sshleifer](https://huggingface.co/sshleifer).
|
||||
|
||||
## Naming
|
||||
|
||||
- All model names use the following format: `Helsinki-NLP/opus-mt-{src}-{tgt}`
|
||||
- The language codes used to name models are inconsistent. Two-digit codes can usually be found [here](https://developers.google.com/admin-sdk/directory/v1/languages); three-digit codes require searching for "language code {code}".
|
||||
- Codes formatted like `es_AR` are usually `code_{region}`. That one is Spanish from Argentina.
|
||||
- The models were converted in two stages. The first 1000 models use ISO-639-2 codes to identify languages; the second
group uses a combination of ISO-639-5 and ISO-639-2 codes.
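
For example, loading a model purely by its `Helsinki-NLP/opus-mt-{src}-{tgt}` name and translating with it might look
like this (a minimal sketch, assuming the bilingual `Helsinki-NLP/opus-mt-en-de` checkpoint):

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = 'Helsinki-NLP/opus-mt-en-de'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["Marian models are small and fast."], return_tensors="pt", padding=True)
translated = model.generate(**batch)
print(tokenizer.decode(translated[0], skip_special_tokens=True))
```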
|
||||
|
||||
|
||||
## Examples
|
||||
|
||||
- Since Marian models are smaller than many other translation models available in the library, they can be useful for
|
||||
fine-tuning experiments and integration tests.
|
||||
- [Fine-tune on GPU](https://github.com/huggingface/transformers/blob/master/examples/research_projects/seq2seq-distillation/train_distil_marian_enro_teacher.sh)
|
||||
- [Fine-tune on GPU with pytorch-lightning](https://github.com/huggingface/transformers/blob/master/examples/research_projects/seq2seq-distillation/train_distil_marian_no_teacher.sh)
|
||||
|
||||
## Multilingual Models
|
||||
|
||||
- All model names use the following format: `Helsinki-NLP/opus-mt-{src}-{tgt}`:
- If a model can output multiple languages, you should specify a language code by prepending the desired output
language to the `src_text`.
- You can see a model's supported language codes in its model card, under target constituents, like in [opus-mt-en-roa](https://huggingface.co/Helsinki-NLP/opus-mt-en-roa).
- Note that if a model is only multilingual on the source side, like `Helsinki-NLP/opus-mt-roa-en`, no language
codes are required.
|
||||
|
||||
New multi-lingual models from the [Tatoeba-Challenge repo](https://github.com/Helsinki-NLP/Tatoeba-Challenge)
|
||||
require 3 character language codes:
|
||||
|
||||
```python
|
||||
>>> from transformers import MarianMTModel, MarianTokenizer
|
||||
>>> src_text = [
|
||||
... '>>fra<< this is a sentence in english that we want to translate to french',
|
||||
... '>>por<< This should go to portuguese',
|
||||
... '>>esp<< And this to Spanish'
|
||||
>>> ]
|
||||
|
||||
>>> model_name = 'Helsinki-NLP/opus-mt-en-roa'
|
||||
>>> tokenizer = MarianTokenizer.from_pretrained(model_name)
|
||||
>>> print(tokenizer.supported_language_codes)
|
||||
['>>zlm_Latn<<', '>>mfe<<', '>>hat<<', '>>pap<<', '>>ast<<', '>>cat<<', '>>ind<<', '>>glg<<', '>>wln<<', '>>spa<<', '>>fra<<', '>>ron<<', '>>por<<', '>>ita<<', '>>oci<<', '>>arg<<', '>>min<<']
|
||||
|
||||
>>> model = MarianMTModel.from_pretrained(model_name)
|
||||
>>> translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
|
||||
>>> [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
|
||||
["c'est une phrase en anglais que nous voulons traduire en français",
|
||||
'Isto deve ir para o português.',
|
||||
'Y esto al español']
|
||||
```
|
||||
|
||||
Here is the code to see all available pretrained models on the hub:
|
||||
|
||||
```python
|
||||
from huggingface_hub import list_models
|
||||
model_list = list_models()
|
||||
org = "Helsinki-NLP"
|
||||
model_ids = [x.modelId for x in model_list if x.modelId.startswith(org)]
|
||||
suffix = [x.split('/')[1] for x in model_ids]
|
||||
old_style_multi_models = [f'{org}/{s}' for s in suffix if s != s.lower()]
|
||||
```
|
||||
|
||||
## Old Style Multi-Lingual Models
|
||||
|
||||
These are the old style multi-lingual models ported from the OPUS-MT-Train repo, along with the members of each
language group:
|
||||
|
||||
```python
|
||||
['Helsinki-NLP/opus-mt-NORTH_EU-NORTH_EU',
|
||||
'Helsinki-NLP/opus-mt-ROMANCE-en',
|
||||
'Helsinki-NLP/opus-mt-SCANDINAVIA-SCANDINAVIA',
|
||||
'Helsinki-NLP/opus-mt-de-ZH',
|
||||
'Helsinki-NLP/opus-mt-en-CELTIC',
|
||||
'Helsinki-NLP/opus-mt-en-ROMANCE',
|
||||
'Helsinki-NLP/opus-mt-es-NORWAY',
|
||||
'Helsinki-NLP/opus-mt-fi-NORWAY',
|
||||
'Helsinki-NLP/opus-mt-fi-ZH',
|
||||
'Helsinki-NLP/opus-mt-fi_nb_no_nn_ru_sv_en-SAMI',
|
||||
'Helsinki-NLP/opus-mt-sv-NORWAY',
|
||||
'Helsinki-NLP/opus-mt-sv-ZH']
|
||||
GROUP_MEMBERS = {
|
||||
'ZH': ['cmn', 'cn', 'yue', 'ze_zh', 'zh_cn', 'zh_CN', 'zh_HK', 'zh_tw', 'zh_TW', 'zh_yue', 'zhs', 'zht', 'zh'],
|
||||
'ROMANCE': ['fr', 'fr_BE', 'fr_CA', 'fr_FR', 'wa', 'frp', 'oc', 'ca', 'rm', 'lld', 'fur', 'lij', 'lmo', 'es', 'es_AR', 'es_CL', 'es_CO', 'es_CR', 'es_DO', 'es_EC', 'es_ES', 'es_GT', 'es_HN', 'es_MX', 'es_NI', 'es_PA', 'es_PE', 'es_PR', 'es_SV', 'es_UY', 'es_VE', 'pt', 'pt_br', 'pt_BR', 'pt_PT', 'gl', 'lad', 'an', 'mwl', 'it', 'it_IT', 'co', 'nap', 'scn', 'vec', 'sc', 'ro', 'la'],
|
||||
'NORTH_EU': ['de', 'nl', 'fy', 'af', 'da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
|
||||
'SCANDINAVIA': ['da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
|
||||
'SAMI': ['se', 'sma', 'smj', 'smn', 'sms'],
|
||||
'NORWAY': ['nb_NO', 'nb', 'nn_NO', 'nn', 'nog', 'no_nb', 'no'],
|
||||
'CELTIC': ['ga', 'cy', 'br', 'gd', 'kw', 'gv']
|
||||
}
|
||||
```
|
||||
|
||||

Example of translating English to many Romance languages, using old-style 2 character language codes:

```python
>>> from transformers import MarianMTModel, MarianTokenizer
>>> src_text = [
...     '>>fr<< this is a sentence in english that we want to translate to french',
...     '>>pt<< This should go to portuguese',
...     '>>es<< And this to Spanish'
... ]

>>> model_name = 'Helsinki-NLP/opus-mt-en-ROMANCE'
>>> tokenizer = MarianTokenizer.from_pretrained(model_name)

>>> model = MarianMTModel.from_pretrained(model_name)
>>> translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
>>> tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
>>> tgt_text
["c'est une phrase en anglais que nous voulons traduire en français",
 'Isto deve ir para o português.',
 'Y esto al español']
```

## MarianConfig

[[autodoc]] MarianConfig

## MarianTokenizer

[[autodoc]] MarianTokenizer
    - as_target_tokenizer

## MarianModel

[[autodoc]] MarianModel
    - forward

## MarianMTModel

[[autodoc]] MarianMTModel
    - forward

## MarianForCausalLM

[[autodoc]] MarianForCausalLM
    - forward

## TFMarianModel

[[autodoc]] TFMarianModel
    - call

## TFMarianMTModel

[[autodoc]] TFMarianMTModel
    - call

## FlaxMarianModel

[[autodoc]] FlaxMarianModel
    - __call__

## FlaxMarianMTModel

[[autodoc]] FlaxMarianMTModel
    - __call__

@ -1,232 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
MarianMT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
**Bugs:** If you see something strange, file a `Github Issue
|
||||
<https://github.com/huggingface/transformers/issues/new?assignees=sshleifer&labels=&template=bug-report.md&title>`__
|
||||
and assign @patrickvonplaten.
|
||||
|
||||
Translations should be similar, but not identical to output in the test set linked to in each model card.
|
||||
|
||||
Implementation Notes
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Each model is about 298 MB on disk, there are more than 1,000 models.
|
||||
- The list of supported language pairs can be found `here <https://huggingface.co/Helsinki-NLP>`__.
|
||||
- Models were originally trained by `Jörg Tiedemann
|
||||
<https://researchportal.helsinki.fi/en/persons/j%C3%B6rg-tiedemann>`__ using the `Marian
|
||||
<https://marian-nmt.github.io/>`__ C++ library, which supports fast training and translation.
|
||||
- All models are transformer encoder-decoders with 6 layers in each component. Each model's performance is documented
|
||||
in a model card.
|
||||
- The 80 opus models that require BPE preprocessing are not supported.
|
||||
- The modeling code is the same as :class:`~transformers.BartForConditionalGeneration` with a few minor modifications:
|
||||
|
||||
- static (sinusoid) positional embeddings (:obj:`MarianConfig.static_position_embeddings=True`)
|
||||
- no layernorm_embedding (:obj:`MarianConfig.normalize_embedding=False`)
|
||||
- the model starts generating with :obj:`pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses
|
||||
:obj:`<s/>`),
|
||||
- Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``.
|
||||
- This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__.
|
||||
|
||||
Naming
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- All model names use the following format: :obj:`Helsinki-NLP/opus-mt-{src}-{tgt}`
|
||||
- The language codes used to name models are inconsistent. Two digit codes can usually be found `here
|
||||
<https://developers.google.com/admin-sdk/directory/v1/languages>`__, three digit codes require googling "language
|
||||
code {code}".
|
||||
- Codes formatted like :obj:`es_AR` are usually :obj:`code_{region}`. That one is Spanish from Argentina.
|
||||
- The models were converted in two stages. The first 1000 models use ISO-639-2 codes to identify languages, the second
|
||||
group use a combination of ISO-639-5 codes and ISO-639-2 codes.
|
||||
|
||||
|
||||
Examples
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Since Marian models are smaller than many other translation models available in the library, they can be useful for
|
||||
fine-tuning experiments and integration tests.
|
||||
- `Fine-tune on GPU
|
||||
<https://github.com/huggingface/transformers/blob/master/examples/research_projects/seq2seq-distillation/train_distil_marian_enro_teacher.sh>`__
|
||||
- `Fine-tune on GPU with pytorch-lightning
|
||||
<https://github.com/huggingface/transformers/blob/master/examples/research_projects/seq2seq-distillation/train_distil_marian_no_teacher.sh>`__
|
||||
|
||||
Multilingual Models
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- All model names use the following format: :obj:`Helsinki-NLP/opus-mt-{src}-{tgt}`:
|
||||
- If a model can output multiple languages, and you should specify a language code by prepending the desired output
|
||||
language to the :obj:`src_text`.
|
||||
- You can see a models's supported language codes in its model card, under target constituents, like in `opus-mt-en-roa
|
||||
<https://huggingface.co/Helsinki-NLP/opus-mt-en-roa>`__.
|
||||
- Note that if a model is only multilingual on the source side, like :obj:`Helsinki-NLP/opus-mt-roa-en`, no language
|
||||
codes are required.
|
||||
|
||||
New multi-lingual models from the `Tatoeba-Challenge repo <https://github.com/Helsinki-NLP/Tatoeba-Challenge>`__
|
||||
require 3 character language codes:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
>>> from transformers import MarianMTModel, MarianTokenizer
|
||||
>>> src_text = [
|
||||
... '>>fra<< this is a sentence in english that we want to translate to french',
|
||||
... '>>por<< This should go to portuguese',
|
||||
... '>>esp<< And this to Spanish'
|
||||
>>> ]
|
||||
|
||||
>>> model_name = 'Helsinki-NLP/opus-mt-en-roa'
|
||||
>>> tokenizer = MarianTokenizer.from_pretrained(model_name)
|
||||
>>> print(tokenizer.supported_language_codes)
|
||||
['>>zlm_Latn<<', '>>mfe<<', '>>hat<<', '>>pap<<', '>>ast<<', '>>cat<<', '>>ind<<', '>>glg<<', '>>wln<<', '>>spa<<', '>>fra<<', '>>ron<<', '>>por<<', '>>ita<<', '>>oci<<', '>>arg<<', '>>min<<']
|
||||
|
||||
>>> model = MarianMTModel.from_pretrained(model_name)
|
||||
>>> translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
|
||||
>>> [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
|
||||
["c'est une phrase en anglais que nous voulons traduire en français",
|
||||
'Isto deve ir para o português.',
|
||||
'Y esto al español']
|
||||
|
||||
|
||||
|
||||
|
||||
Here is the code to see all available pretrained models on the hub:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from huggingface_hub import list_models
|
||||
model_list = list_models()
|
||||
org = "Helsinki-NLP"
|
||||
model_ids = [x.modelId for x in model_list if x.modelId.startswith(org)]
|
||||
suffix = [x.split('/')[1] for x in model_ids]
|
||||
old_style_multi_models = [f'{org}/{s}' for s in suffix if s != s.lower()]
|
||||
|
||||
|
||||
|
||||
Old Style Multi-Lingual Models
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
These are the old style multi-lingual models ported from the OPUS-MT-Train repo: and the members of each language
|
||||
group:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
['Helsinki-NLP/opus-mt-NORTH_EU-NORTH_EU',
|
||||
'Helsinki-NLP/opus-mt-ROMANCE-en',
|
||||
'Helsinki-NLP/opus-mt-SCANDINAVIA-SCANDINAVIA',
|
||||
'Helsinki-NLP/opus-mt-de-ZH',
|
||||
'Helsinki-NLP/opus-mt-en-CELTIC',
|
||||
'Helsinki-NLP/opus-mt-en-ROMANCE',
|
||||
'Helsinki-NLP/opus-mt-es-NORWAY',
|
||||
'Helsinki-NLP/opus-mt-fi-NORWAY',
|
||||
'Helsinki-NLP/opus-mt-fi-ZH',
|
||||
'Helsinki-NLP/opus-mt-fi_nb_no_nn_ru_sv_en-SAMI',
|
||||
'Helsinki-NLP/opus-mt-sv-NORWAY',
|
||||
'Helsinki-NLP/opus-mt-sv-ZH']
|
||||
GROUP_MEMBERS = {
|
||||
'ZH': ['cmn', 'cn', 'yue', 'ze_zh', 'zh_cn', 'zh_CN', 'zh_HK', 'zh_tw', 'zh_TW', 'zh_yue', 'zhs', 'zht', 'zh'],
|
||||
'ROMANCE': ['fr', 'fr_BE', 'fr_CA', 'fr_FR', 'wa', 'frp', 'oc', 'ca', 'rm', 'lld', 'fur', 'lij', 'lmo', 'es', 'es_AR', 'es_CL', 'es_CO', 'es_CR', 'es_DO', 'es_EC', 'es_ES', 'es_GT', 'es_HN', 'es_MX', 'es_NI', 'es_PA', 'es_PE', 'es_PR', 'es_SV', 'es_UY', 'es_VE', 'pt', 'pt_br', 'pt_BR', 'pt_PT', 'gl', 'lad', 'an', 'mwl', 'it', 'it_IT', 'co', 'nap', 'scn', 'vec', 'sc', 'ro', 'la'],
|
||||
'NORTH_EU': ['de', 'nl', 'fy', 'af', 'da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
|
||||
'SCANDINAVIA': ['da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
|
||||
'SAMI': ['se', 'sma', 'smj', 'smn', 'sms'],
|
||||
'NORWAY': ['nb_NO', 'nb', 'nn_NO', 'nn', 'nog', 'no_nb', 'no'],
|
||||
'CELTIC': ['ga', 'cy', 'br', 'gd', 'kw', 'gv']
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
Example of translating english to many romance languages, using old-style 2 character language codes
|
||||
|
||||
|
||||
.. code-block::python
|
||||
|
||||
>>> from transformers import MarianMTModel, MarianTokenizer
|
||||
>>> src_text = [
|
||||
... '>>fr<< this is a sentence in english that we want to translate to french',
|
||||
... '>>pt<< This should go to portuguese',
|
||||
... '>>es<< And this to Spanish'
|
||||
>>> ]
|
||||
|
||||
>>> model_name = 'Helsinki-NLP/opus-mt-en-ROMANCE'
|
||||
>>> tokenizer = MarianTokenizer.from_pretrained(model_name)
|
||||
|
||||
>>> model = MarianMTModel.from_pretrained(model_name)
|
||||
>>> translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
|
||||
>>> tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
|
||||
["c'est une phrase en anglais que nous voulons traduire en français",
|
||||
'Isto deve ir para o português.',
|
||||
'Y esto al español']
|
||||
|
||||
|
||||
|
||||
MarianConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MarianConfig
|
||||
:members:
|
||||
|
||||
|
||||
MarianTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MarianTokenizer
|
||||
:members: as_target_tokenizer
|
||||
|
||||
|
||||
MarianModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MarianModel
|
||||
:members: forward
|
||||
|
||||
|
||||
MarianMTModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MarianMTModel
|
||||
:members: forward
|
||||
|
||||
|
||||
MarianForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MarianForCausalLM
|
||||
:members: forward
|
||||
|
||||
|
||||
TFMarianModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFMarianModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFMarianMTModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFMarianMTModel
|
||||
:members: call
|
||||
|
||||
|
||||
FlaxMarianModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxMarianModel
|
||||
:members: __call__
|
||||
|
||||
|
||||
FlaxMarianMTModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxMarianMTModel
|
||||
:members: __call__
|
230
docs/source/model_doc/mbart.mdx
Normal file
@ -0,0 +1,230 @@

<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# MBart and MBart-50

**DISCLAIMER:** If you see something strange, file a [Github Issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title) and assign
@patrickvonplaten.

## Overview of MBart

The MBart model was presented in [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan
Ghazvininejad, Mike Lewis, Luke Zettlemoyer.

According to the abstract, MBART is a sequence-to-sequence denoising auto-encoder pretrained on large-scale monolingual
corpora in many languages using the BART objective. mBART is one of the first methods for pretraining a complete
sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
on the encoder, decoder, or reconstructing parts of the text.

This model was contributed by [valhalla](https://huggingface.co/valhalla). The authors' code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/mbart).

### Training of MBart

MBart is a multilingual encoder-decoder (sequence-to-sequence) model primarily intended for translation tasks. As the
model is multilingual, it expects the sequences in a different format. A special language id token is added in both the
source and target text. The source text format is `X [eos, src_lang_code]` where `X` is the source text. The
target text format is `[tgt_lang_code] X [eos]`. `bos` is never used.

The regular [`~MBartTokenizer.__call__`] will encode the source text format, and it should be wrapped
inside the context manager [`~MBartTokenizer.as_target_tokenizer`] to encode the target text format.

- Supervised training

```python
>>> from transformers import MBartForConditionalGeneration, MBartTokenizer

>>> tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX", tgt_lang="ro_RO")
>>> example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
>>> expected_translation_romanian = "Şeful ONU declară că nu există o soluţie militară în Siria"

>>> inputs = tokenizer(example_english_phrase, return_tensors="pt")
>>> with tokenizer.as_target_tokenizer():
...     labels = tokenizer(expected_translation_romanian, return_tensors="pt")

>>> model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
>>> # forward pass
>>> model(**inputs, labels=labels["input_ids"])
```
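
To check the source-side layout described above (`X [eos, src_lang_code]`), you can decode the encoded ids back into
tokens. A minimal sketch, not part of the original example, using the same checkpoint:

```python
from transformers import MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX", tgt_lang="ro_RO")

# the encoded source sentence ends with the eos token followed by the source language code
ids = tokenizer("UN Chief Says There Is No Military Solution in Syria")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids)[-2:])  # ['</s>', 'en_XX']
```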

- Generation

While generating the target text, set the `decoder_start_token_id` to the target language id. The following
example shows how to translate English to Romanian using the *facebook/mbart-large-en-ro* model.

```python
>>> from transformers import MBartForConditionalGeneration, MBartTokenizer

>>> tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX")
>>> article = "UN Chief Says There Is No Military Solution in Syria"
>>> inputs = tokenizer(article, return_tensors="pt")

>>> model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
>>> translated_tokens = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
>>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
"Şeful ONU declară că nu există o soluţie militară în Siria"
```

## Overview of MBart-50

MBart-50 was introduced in the [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) paper by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav
Chaudhary, Jiatao Gu, Angela Fan. MBart-50 is created using the original *mbart-large-cc25* checkpoint by extending
its embedding layers with randomly initialized vectors for an extra set of 25 language tokens, and then pretraining it
on 50 languages.

According to the abstract:

*Multilingual translation models can be created through multilingual finetuning. Instead of finetuning on one
direction, a pretrained model is finetuned on many directions at the same time. It demonstrates that pretrained models
can be extended to incorporate additional languages without loss of performance. Multilingual finetuning improves on
average 1 BLEU over the strongest baselines (being either multilingual from scratch or bilingual finetuning) while
improving 9.3 BLEU on average over bilingual baselines from scratch.*

### Training of MBart-50

The text format for MBart-50 is slightly different from mBART. For MBart-50 the language id token is used as a prefix
for both source and target text, i.e. the text format is `[lang_code] X [eos]`, where `lang_code` is the source
language id for the source text and the target language id for the target text, with `X` being the source or target
text respectively.

MBart-50 has its own tokenizer [`MBart50Tokenizer`].

- Supervised training

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50", src_lang="en_XX", tgt_lang="ro_RO")

src_text = " UN Chief Says There Is No Military Solution in Syria"
tgt_text = "Şeful ONU declară că nu există o soluţie militară în Siria"

model_inputs = tokenizer(src_text, return_tensors="pt")
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_text, return_tensors="pt").input_ids

model(**model_inputs, labels=labels)  # forward pass
```

- Generation

To generate using the mBART-50 multilingual translation models, `eos_token_id` is used as the
`decoder_start_token_id` and the target language id is forced as the first generated token. To force the
target language id as the first generated token, pass the *forced_bos_token_id* parameter to the *generate* method.
The following example shows how to translate Hindi to French and Arabic to English using the
*facebook/mbart-large-50-many-to-many-mmt* checkpoint.

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

article_hi = "संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है"
article_ar = "الأمين العام للأمم المتحدة يقول إنه لا يوجد حل عسكري في سوريا."

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

# translate Hindi to French
tokenizer.src_lang = "hi_IN"
encoded_hi = tokenizer(article_hi, return_tensors="pt")
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"])
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
# => "Le chef de l 'ONU affirme qu 'il n 'y a pas de solution militaire en Syria."

# translate Arabic to English
tokenizer.src_lang = "ar_AR"
encoded_ar = tokenizer(article_ar, return_tensors="pt")
generated_tokens = model.generate(**encoded_ar, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
# => "The Secretary-General of the United Nations says there is no military solution in Syria."
```

## MBartConfig

[[autodoc]] MBartConfig

## MBartTokenizer

[[autodoc]] MBartTokenizer
    - as_target_tokenizer
    - build_inputs_with_special_tokens

## MBartTokenizerFast

[[autodoc]] MBartTokenizerFast

## MBart50Tokenizer

[[autodoc]] MBart50Tokenizer

## MBart50TokenizerFast

[[autodoc]] MBart50TokenizerFast

## MBartModel

[[autodoc]] MBartModel

## MBartForConditionalGeneration

[[autodoc]] MBartForConditionalGeneration

## MBartForQuestionAnswering

[[autodoc]] MBartForQuestionAnswering

## MBartForSequenceClassification

[[autodoc]] MBartForSequenceClassification

## MBartForCausalLM

[[autodoc]] MBartForCausalLM
    - forward

## TFMBartModel

[[autodoc]] TFMBartModel
    - call

## TFMBartForConditionalGeneration

[[autodoc]] TFMBartForConditionalGeneration
    - call

## FlaxMBartModel

[[autodoc]] FlaxMBartModel
    - __call__
    - encode
    - decode

## FlaxMBartForConditionalGeneration

[[autodoc]] FlaxMBartForConditionalGeneration
    - __call__
    - encode
    - decode

## FlaxMBartForSequenceClassification

[[autodoc]] FlaxMBartForSequenceClassification
    - __call__
    - encode
    - decode

## FlaxMBartForQuestionAnswering

[[autodoc]] FlaxMBartForQuestionAnswering
    - __call__
    - encode
    - decode

@ -1,270 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
MBart and MBart-50
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
**DISCLAIMER:** If you see something strange, file a `Github Issue
|
||||
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
|
||||
@patrickvonplaten
|
||||
|
||||
Overview of MBart
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The MBart model was presented in `Multilingual Denoising Pre-training for Neural Machine Translation
|
||||
<https://arxiv.org/abs/2001.08210>`_ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov Marjan
|
||||
Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
|
||||
|
||||
According to the abstract, MBART is a sequence-to-sequence denoising auto-encoder pretrained on large-scale monolingual
|
||||
corpora in many languages using the BART objective. mBART is one of the first methods for pretraining a complete
|
||||
sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
|
||||
on the encoder, decoder, or reconstructing parts of the text.
|
||||
|
||||
This model was contributed by `valhalla <https://huggingface.co/valhalla>`__. The Authors' code can be found `here
|
||||
<https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
|
||||
|
||||
Training of MBart
|
||||
_______________________________________________________________________________________________________________________
|
||||
|
||||
MBart is a multilingual encoder-decoder (sequence-to-sequence) model primarily intended for translation task. As the
|
||||
model is multilingual it expects the sequences in a different format. A special language id token is added in both the
|
||||
source and target text. The source text format is :obj:`X [eos, src_lang_code]` where :obj:`X` is the source text. The
|
||||
target text format is :obj:`[tgt_lang_code] X [eos]`. :obj:`bos` is never used.
|
||||
|
||||
The regular :meth:`~transformers.MBartTokenizer.__call__` will encode source text format, and it should be wrapped
|
||||
inside the context manager :meth:`~transformers.MBartTokenizer.as_target_tokenizer` to encode target text format.
|
||||
|
||||
- Supervised training
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from transformers import MBartForConditionalGeneration, MBartTokenizer
|
||||
|
||||
>>> tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX", tgt_lang="ro_RO")
|
||||
>>> example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
|
||||
>>> expected_translation_romanian = "Şeful ONU declară că nu există o soluţie militară în Siria"
|
||||
|
||||
>>> inputs = tokenizer(example_english_phrase, return_tensors="pt")
|
||||
>>> with tokenizer.as_target_tokenizer():
|
||||
... labels = tokenizer(expected_translation_romanian, return_tensors="pt")
|
||||
|
||||
>>> model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
|
||||
>>> # forward pass
|
||||
>>> model(**inputs, labels=batch['labels'])
|
||||
|
||||
- Generation
|
||||
|
||||
While generating the target text set the :obj:`decoder_start_token_id` to the target language id. The following
|
||||
example shows how to translate English to Romanian using the `facebook/mbart-large-en-ro` model.
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from transformers import MBartForConditionalGeneration, MBartTokenizer
|
||||
|
||||
>>> tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX")
|
||||
>>> article = "UN Chief Says There Is No Military Solution in Syria"
|
||||
>>> inputs = tokenizer(article, return_tensors="pt")
|
||||
>>> translated_tokens = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
|
||||
>>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
|
||||
"Şeful ONU declară că nu există o soluţie militară în Siria"
|
||||
|
||||
|
||||
Overview of MBart-50
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
MBart-50 was introduced in the `Multilingual Translation with Extensible Multilingual Pretraining and Finetuning
|
||||
<https://arxiv.org/abs/2008.00401>` paper by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav
|
||||
Chaudhary, Jiatao Gu, Angela Fan. MBart-50 is created using the original `mbart-large-cc25` checkpoint by extendeding
|
||||
its embedding layers with randomly initialized vectors for an extra set of 25 language tokens and then pretrained on 50
|
||||
languages.
|
||||
|
||||
According to the abstract
|
||||
|
||||
*Multilingual translation models can be created through multilingual finetuning. Instead of finetuning on one
|
||||
direction, a pretrained model is finetuned on many directions at the same time. It demonstrates that pretrained models
|
||||
can be extended to incorporate additional languages without loss of performance. Multilingual finetuning improves on
|
||||
average 1 BLEU over the strongest baselines (being either multilingual from scratch or bilingual finetuning) while
|
||||
improving 9.3 BLEU on average over bilingual baselines from scratch.*
|
||||
|
||||
|
||||
Training of MBart-50
|
||||
_______________________________________________________________________________________________________________________
|
||||
|
||||
The text format for MBart-50 is slightly different from mBART. For MBart-50 the language id token is used as a prefix
|
||||
for both source and target text i.e the text format is :obj:`[lang_code] X [eos]`, where :obj:`lang_code` is source
|
||||
language id for source text and target language id for target text, with :obj:`X` being the source or target text
|
||||
respectively.
|
||||
|
||||
|
||||
MBart-50 has its own tokenizer :class:`~transformers.MBart50Tokenizer`.
|
||||
|
||||
- Supervised training
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
|
||||
|
||||
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
|
||||
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50", src_lang="en_XX", tgt_lang="ro_RO")
|
||||
|
||||
src_text = " UN Chief Says There Is No Military Solution in Syria"
|
||||
tgt_text = "Şeful ONU declară că nu există o soluţie militară în Siria"
|
||||
|
||||
model_inputs = tokenizer(src_text, return_tensors="pt")
|
||||
with tokenizer.as_target_tokenizer():
|
||||
labels = tokenizer(tgt_text, return_tensors="pt").input_ids
|
||||
|
||||
model(**model_inputs, labels=labels) # forward pass
|
||||
|
||||
|
||||
- Generation
|
||||
|
||||
To generate using the mBART-50 multilingual translation models, :obj:`eos_token_id` is used as the
|
||||
:obj:`decoder_start_token_id` and the target language id is forced as the first generated token. To force the
|
||||
target language id as the first generated token, pass the `forced_bos_token_id` parameter to the `generate` method.
|
||||
The following example shows how to translate between Hindi to French and Arabic to English using the
|
||||
`facebook/mbart-50-large-many-to-many` checkpoint.
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
|
||||
|
||||
article_hi = "संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है"
|
||||
article_ar = "الأمين العام للأمم المتحدة يقول إنه لا يوجد حل عسكري في سوريا."
|
||||
|
||||
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
|
||||
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
|
||||
|
||||
# translate Hindi to French
|
||||
tokenizer.src_lang = "hi_IN"
|
||||
encoded_hi = tokenizer(article_hi, return_tensors="pt")
|
||||
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"])
|
||||
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
|
||||
# => "Le chef de l 'ONU affirme qu 'il n 'y a pas de solution militaire en Syria."
|
||||
|
||||
# translate Arabic to English
|
||||
tokenizer.src_lang = "ar_AR"
|
||||
encoded_ar = tokenizer(article_ar, return_tensors="pt")
|
||||
generated_tokens = model.generate(**encoded_ar, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
|
||||
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
|
||||
# => "The Secretary-General of the United Nations says there is no military solution in Syria."
|
||||
|
||||
|
||||
MBartConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MBartConfig
|
||||
:members:
|
||||
|
||||
|
||||
MBartTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MBartTokenizer
|
||||
:members: as_target_tokenizer, build_inputs_with_special_tokens
|
||||
|
||||
|
||||
MBartTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MBartTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
MBart50Tokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MBart50Tokenizer
|
||||
:members:
|
||||
|
||||
|
||||
MBart50TokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MBart50TokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
MBartModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MBartModel
|
||||
:members:
|
||||
|
||||
|
||||
MBartForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MBartForConditionalGeneration
|
||||
:members:
|
||||
|
||||
|
||||
MBartForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MBartForQuestionAnswering
|
||||
:members:
|
||||
|
||||
|
||||
MBartForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MBartForSequenceClassification
|
||||
|
||||
|
||||
MBartForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MBartForCausalLM
|
||||
:members: forward
|
||||
|
||||
|
||||
TFMBartModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFMBartModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFMBartForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFMBartForConditionalGeneration
|
||||
:members: call
|
||||
|
||||
|
||||
FlaxMBartModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxMBartModel
|
||||
:members: __call__, encode, decode
|
||||
|
||||
|
||||
FlaxMBartForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxMBartForConditionalGeneration
|
||||
:members: __call__, encode, decode
|
||||
|
||||
|
||||
FlaxMBartForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxMBartForSequenceClassification
|
||||
:members: __call__, encode, decode
|
||||
|
||||
|
||||
FlaxMBartForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxMBartForQuestionAnswering
|
||||
:members: __call__, encode, decode
|