<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# MobileBERT
## Overview
The MobileBERT model was proposed in [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. It's a bidirectional transformer based on the BERT model, which is compressed and accelerated using several approaches.
The abstract from the paper is the following:
*Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied to various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks. To train MobileBERT, we first train a specially designed teacher model, an inverted-bottleneck incorporated BERT_LARGE model. Then, we conduct knowledge transfer from this teacher to MobileBERT. Empirical studies show that MobileBERT is 4.3x smaller and 5.5x faster than BERT_BASE while achieving competitive results on well-known benchmarks. On the natural language inference tasks of GLUE, MobileBERT achieves a GLUE score of 77.7 (0.6 lower than BERT_BASE), and 62 ms latency on a Pixel 4 phone. On the SQuAD v1.1/v2.0 question answering task, MobileBERT achieves a dev F1 score of 90.0/79.2 (1.5/2.1 higher than BERT_BASE).*
Tips:
- MobileBERT is a model with absolute position embeddings, so it's usually advised to pad the inputs on the right rather than the left (see the padding sketch below).
- MobileBERT is similar to BERT and therefore relies on the masked language modeling (MLM) objective. It is therefore efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation; models trained with a causal language modeling (CLM) objective are better in that regard (see the fill-mask sketch below).
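Because of the absolute position embeddings, batches should be padded on the right. A minimal sketch, assuming the released `google/mobilebert-uncased` checkpoint is available on the Hub:

```python
from transformers import MobileBertTokenizer

tokenizer = MobileBertTokenizer.from_pretrained("google/mobilebert-uncased")

# padding_side defaults to "right" for this tokenizer; it is set explicitly
# here only to make the recommendation visible.
tokenizer.padding_side = "right"

batch = tokenizer(
    ["A short sentence.", "A somewhat longer sentence that needs no padding."],
    padding=True,  # pads the shorter sequence on the right
    return_tensors="pt",
)
print(batch["input_ids"].shape)
print(batch["attention_mask"])  # trailing zeros mark right-side padding
```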
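A minimal sketch of masked token prediction with the `fill-mask` pipeline, again assuming the `google/mobilebert-uncased` checkpoint:

```python
from transformers import pipeline

# Loads the MLM head on top of the pretrained MobileBERT encoder.
unmasker = pipeline("fill-mask", model="google/mobilebert-uncased")

# The tokenizer uses [MASK] as its mask token.
for prediction in unmasker("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```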
This model was contributed by [vshampor](https://huggingface.co/vshampor). The original code can be found [here](https://github.com/google-research/mobilebert).
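A minimal sketch of loading the pretrained encoder and extracting hidden states, assuming the same `google/mobilebert-uncased` checkpoint:

```python
import torch
from transformers import MobileBertModel, MobileBertTokenizer

tokenizer = MobileBertTokenizer.from_pretrained("google/mobilebert-uncased")
model = MobileBertModel.from_pretrained("google/mobilebert-uncased")

inputs = tokenizer("MobileBERT runs on resource-limited devices.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Final-layer hidden states: (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```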
## MobileBertConfig

[[autodoc]] MobileBertConfig

## MobileBertTokenizer

[[autodoc]] MobileBertTokenizer

## MobileBertTokenizerFast

[[autodoc]] MobileBertTokenizerFast

## MobileBert specific outputs

[[autodoc]] models.mobilebert.modeling_mobilebert.MobileBertForPreTrainingOutput

[[autodoc]] models.mobilebert.modeling_tf_mobilebert.TFMobileBertForPreTrainingOutput

## MobileBertModel

[[autodoc]] MobileBertModel
    - forward

## MobileBertForPreTraining

[[autodoc]] MobileBertForPreTraining
    - forward

## MobileBertForMaskedLM

[[autodoc]] MobileBertForMaskedLM
    - forward

## MobileBertForNextSentencePrediction

[[autodoc]] MobileBertForNextSentencePrediction
    - forward

## MobileBertForSequenceClassification

[[autodoc]] MobileBertForSequenceClassification
    - forward

## MobileBertForMultipleChoice

[[autodoc]] MobileBertForMultipleChoice
    - forward

## MobileBertForTokenClassification

[[autodoc]] MobileBertForTokenClassification
    - forward

## MobileBertForQuestionAnswering

[[autodoc]] MobileBertForQuestionAnswering
    - forward

## TFMobileBertModel

[[autodoc]] TFMobileBertModel
    - call

## TFMobileBertForPreTraining

[[autodoc]] TFMobileBertForPreTraining
    - call

## TFMobileBertForMaskedLM

[[autodoc]] TFMobileBertForMaskedLM
    - call

## TFMobileBertForNextSentencePrediction

[[autodoc]] TFMobileBertForNextSentencePrediction
    - call

## TFMobileBertForSequenceClassification

[[autodoc]] TFMobileBertForSequenceClassification
    - call

## TFMobileBertForMultipleChoice

[[autodoc]] TFMobileBertForMultipleChoice
    - call

## TFMobileBertForTokenClassification

[[autodoc]] TFMobileBertForTokenClassification
    - call

## TFMobileBertForQuestionAnswering

[[autodoc]] TFMobileBertForQuestionAnswering
    - call