..
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

ELECTRA
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ELECTRA model was proposed in the paper `ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__. ELECTRA is a new pretraining approach which trains two
transformer models: the generator and the discriminator. The generator's role is to replace tokens in a sequence, and
is therefore trained as a masked language model. The discriminator, which is the model we're interested in, tries to
identify which tokens were replaced by the generator in the sequence.
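
A minimal sketch of this replaced token detection task using the discriminator, assuming the
``google/electra-small-discriminator`` checkpoint:

.. code-block:: python

    import torch
    from transformers import ElectraForPreTraining, ElectraTokenizerFast

    tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-discriminator")
    model = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

    # "rocks" stands in for a token the generator might have substituted.
    inputs = tokenizer("The quick brown fox rocks over the lazy dog", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # A positive logit means the discriminator predicts the token was replaced.
    is_replaced = (logits > 0).long()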

The abstract from the paper is the following:

*Masked language modeling (MLM) pretraining methods such as BERT corrupt the input by replacing some tokens with [MASK]
and then train a model to reconstruct the original tokens. While they produce good results when transferred to
downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a
more sample-efficient pretraining task called replaced token detection. Instead of masking the input, our approach
corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead
of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that
predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments
demonstrate this new pretraining task is more efficient than MLM because the task is defined over all input tokens
rather than just the small subset that was masked out. As a result, the contextual representations learned by our
approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are
particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained
using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale,
where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when
using the same amount of compute.*

Tips:

- ELECTRA is the pretraining approach, therefore almost no changes were made to the underlying model: BERT. The
  only change is the separation of the embedding size and the hidden size: the embedding size is generally smaller,
  while the hidden size is larger. An additional projection layer (linear) is used to project the embeddings from their
  embedding size to the hidden size. In the case where the embedding size is the same as the hidden size, no projection
  layer is used (see the configuration sketch after this list).
- The ELECTRA checkpoints saved using `Google Research's implementation <https://github.com/google-research/electra>`__
  contain both the generator and discriminator. The conversion script requires the user to name which model to export
  into the correct architecture. Once converted to the HuggingFace format, these checkpoints may be loaded into all
  available ELECTRA models, however. This means that the discriminator may be loaded into the
  :class:`~transformers.ElectraForMaskedLM` model, and the generator may be loaded into the
  :class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
  doesn't exist in the generator), as shown in the loading sketch below.
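
A minimal configuration sketch for the first tip; the embedding size of 128 and hidden size of 256 are illustrative
values in the spirit of ELECTRA-small, not a prescription:

.. code-block:: python

    from transformers import ElectraConfig, ElectraModel

    # embedding_size != hidden_size, so an extra linear projection maps the
    # 128-dimensional embeddings up to the 256-dimensional hidden states.
    config = ElectraConfig(embedding_size=128, hidden_size=256)
    model = ElectraModel(config)

    # When the two sizes match, no projection layer is created.
    config_without_projection = ElectraConfig(embedding_size=256, hidden_size=256)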
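
A minimal loading sketch for the second tip, assuming the ``google/electra-small-generator`` and
``google/electra-small-discriminator`` checkpoints from the Hub:

.. code-block:: python

    from transformers import ElectraForMaskedLM, ElectraForPreTraining

    # The usual pairing: the generator is a masked language model, the
    # discriminator predicts which tokens were replaced.
    generator = ElectraForMaskedLM.from_pretrained("google/electra-small-generator")
    discriminator = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

    # Cross-loading also works; heads missing from the checkpoint (here the
    # discriminator's classification head) are randomly initialized.
    cross_loaded = ElectraForPreTraining.from_pretrained("google/electra-small-generator")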

This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
<https://github.com/google-research/electra>`__.

ElectraConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraConfig
    :members:


ElectraTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraTokenizer
    :members:


ElectraTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraTokenizerFast
    :members:


Electra specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.electra.modeling_electra.ElectraForPreTrainingOutput
    :members:

.. autoclass:: transformers.models.electra.modeling_tf_electra.TFElectraForPreTrainingOutput
    :members:


ElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraModel
    :members: forward


ElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForPreTraining
    :members: forward


ElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForMaskedLM
    :members: forward


ElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForSequenceClassification
    :members: forward


ElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForMultipleChoice
    :members: forward


ElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForTokenClassification
    :members: forward


ElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForQuestionAnswering
    :members: forward


TFElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraModel
    :members: call


TFElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForPreTraining
    :members: call


TFElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForMaskedLM
    :members: call


TFElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForSequenceClassification
    :members: call


TFElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForMultipleChoice
    :members: call


TFElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForTokenClassification
    :members: call


TFElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForQuestionAnswering
    :members: call


FlaxElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraModel
    :members: __call__


FlaxElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForPreTraining
    :members: __call__


FlaxElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForMaskedLM
    :members: __call__


FlaxElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForSequenceClassification
    :members: __call__


FlaxElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForMultipleChoice
    :members: __call__


FlaxElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForTokenClassification
    :members: __call__


FlaxElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForQuestionAnswering
    :members: __call__