transformers/docs/source/model_doc/squeezebert.rst

SqueezeBERT
----------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~

The SqueezeBERT model was proposed in
`SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
<https://arxiv.org/abs/2006.11316>`__
by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
It's a bidirectional transformer similar to the BERT model.
The key difference between the BERT architecture and the SqueezeBERT architecture
is that SqueezeBERT uses `grouped convolutions <https://blog.yani.io/filter-group-tutorial>`__
instead of fully-connected layers for the Q, K, V and FFN layers.
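
To make that architectural change concrete, here is a minimal PyTorch sketch (not the library's internal
implementation) showing how a position-wise fully-connected projection can be swapped for a grouped
1D convolution; the hidden size and group count below are illustrative only.

.. code-block:: python

    import torch
    from torch import nn

    hidden_size, groups, batch, seq_len = 768, 4, 2, 128

    # Standard BERT-style projection: one dense layer applied at every position.
    dense = nn.Linear(hidden_size, hidden_size)

    # Grouped-convolution variant: a kernel-size-1 Conv1d with `groups` independent
    # channel blocks, the kind of layer SqueezeBERT uses for the Q, K, V and FFN projections.
    grouped = nn.Conv1d(hidden_size, hidden_size, kernel_size=1, groups=groups)

    x = torch.randn(batch, seq_len, hidden_size)
    y_dense = dense(x)                                        # (batch, seq_len, hidden_size)
    y_grouped = grouped(x.permute(0, 2, 1)).permute(0, 2, 1)  # same shape, fewer parameters

    # The grouped layer holds 1/groups as many weights as the dense layer.
    print(dense.weight.numel(), grouped.weight.numel())
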
The abstract from the paper is the following:

*Humans read and write hundreds of billions of messages every day. Further, due to the availability of
large datasets, large computing systems, and better neural network models, natural language processing (NLP)
technology has made significant strides in understanding, proofreading, and organizing these messages.
Thus, there is a significant opportunity to deploy NLP in myriad applications to help web users,
social networks, and businesses. In particular, we consider smartphones and other mobile devices as
crucial platforms for deploying NLP models at scale. However, today's highly-accurate NLP neural network
models such as BERT and RoBERTa are extremely computationally expensive, with BERT-base taking 1.7 seconds
to classify a text snippet on a Pixel 3 smartphone. In this work, we observe that methods such as grouped
convolutions have yielded significant speedups for computer vision networks, but many of these techniques
have not been adopted by NLP neural network designers. We demonstrate how to replace several operations in
self-attention layers with grouped convolutions, and we use this technique in a novel network architecture
called SqueezeBERT, which runs 4.3x faster than BERT-base on the Pixel 3 while achieving competitive
accuracy on the GLUE test set. The SqueezeBERT code will be released.*

Tips:

- SqueezeBERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
  the right rather than the left.
- SqueezeBERT is similar to BERT and therefore relies on the masked language modeling (MLM) objective.
  It is therefore efficient at predicting masked tokens and at NLU in general, but is not optimal for
  text generation. Models trained with a causal language modeling (CLM) objective are better in that regard.
- For best results when finetuning on sequence classification tasks, it is recommended to start with the
  ``squeezebert/squeezebert-mnli-headless`` checkpoint, as in the example below.
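
The following is a minimal sketch of that finetuning setup (the ``num_labels`` value and the example
sentence are placeholders, and the classification head will typically be freshly initialized, since the
headless checkpoint ships without one):

.. code-block:: python

    from transformers import SqueezeBertForSequenceClassification, SqueezeBertTokenizer

    tokenizer = SqueezeBertTokenizer.from_pretrained("squeezebert/squeezebert-mnli-headless")
    model = SqueezeBertForSequenceClassification.from_pretrained(
        "squeezebert/squeezebert-mnli-headless",
        num_labels=2,          # set to the number of classes in your task
        return_dict=True,      # so the forward pass returns an output object with a .logits field
    )

    # The tokenizer pads on the right by default, matching the tip above.
    inputs = tokenizer("SqueezeBERT runs well on mobile devices.", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.logits.shape)  # (1, num_labels); finetune before relying on these scores
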

SqueezeBertConfig
~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SqueezeBertConfig
    :members:


SqueezeBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SqueezeBertTokenizer
    :members: build_inputs_with_special_tokens, get_special_tokens_mask,
        create_token_type_ids_from_sequences, save_vocabulary


SqueezeBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SqueezeBertTokenizerFast
    :members:


SqueezeBertModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SqueezeBertModel
    :members:


SqueezeBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SqueezeBertForMaskedLM
    :members:


SqueezeBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SqueezeBertForSequenceClassification
    :members:


SqueezeBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SqueezeBertForMultipleChoice
    :members:


SqueezeBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SqueezeBertForTokenClassification
    :members:


SqueezeBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SqueezeBertForQuestionAnswering
    :members: