<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# RoBERTa

## Overview

The RoBERTa model was proposed in [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google's BERT model released in 2018.

It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with
much larger mini-batches and learning rates.

The abstract from the paper is the following:

*Language model pretraining has led to significant performance gains but careful comparison between different
approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes,
and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication
study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and
training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every
model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results
highlight the importance of previously overlooked design choices, and raise questions about the source of recently
reported improvements. We release our models and code.*

Tips:

- This implementation is the same as [`BertModel`] with a tiny embeddings tweak as well as a setup
  for RoBERTa pretrained models.
- RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a
  different pretraining scheme.
- RoBERTa doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment. Just
  separate your segments with the separation token `tokenizer.sep_token` (or `</s>`), as shown in the sketch after this list.
- [CamemBERT](camembert) is a wrapper around RoBERTa. Refer to its page for usage examples.
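
A minimal sketch of the segment handling described above, assuming the publicly available `roberta-base` checkpoint:

```python
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# RoBERTa separates a pair of segments with the sep token (`</s>`) instead of
# token type ids, so a pair is laid out as `<s> A </s></s> B </s>`.
encoding = tokenizer("Hello world", "How are you?")
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
```
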
This model was contributed by [julien-c](https://huggingface.co/julien-c). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta).

## RobertaConfig

[[autodoc]] RobertaConfig

## RobertaTokenizer

[[autodoc]] RobertaTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

## RobertaTokenizerFast

[[autodoc]] RobertaTokenizerFast
    - build_inputs_with_special_tokens

## RobertaModel

[[autodoc]] RobertaModel
    - forward
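
A minimal feature-extraction sketch, assuming the `roberta-base` checkpoint; the model returns the final hidden states for each input token:

```python
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One hidden state per input token, shape (batch_size, sequence_length, hidden_size)
last_hidden_state = outputs.last_hidden_state
```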

## RobertaForCausalLM

[[autodoc]] RobertaForCausalLM
    - forward
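
[`RobertaForCausalLM`] is intended for use as a decoder, so the sketch below (again assuming the `roberta-base` checkpoint) sets `is_decoder=True` in the config before loading:

```python
import torch
from transformers import RobertaConfig, RobertaForCausalLM, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")
config.is_decoder = True  # required when using the model standalone for causal LM
model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Next-token prediction scores, shape (batch_size, sequence_length, vocab_size)
prediction_logits = outputs.logits
```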

## RobertaForMaskedLM

[[autodoc]] RobertaForMaskedLM
    - forward
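
A quick way to try the masked-LM head is the `fill-mask` pipeline; a sketch assuming the `roberta-base` checkpoint:

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="roberta-base")

# RoBERTa's mask token is `<mask>` (not `[MASK]` as in BERT)
unmasker("The goal of life is <mask>.")
```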

## RobertaForSequenceClassification

[[autodoc]] RobertaForSequenceClassification
    - forward
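
A hedged sketch of sequence classification with the base checkpoint; since `roberta-base` ships without a classification head, the head below is freshly initialized and needs fine-tuning before its predictions mean anything:

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
# The classification head on top of "roberta-base" is randomly initialized here;
# fine-tune it (or load an already fine-tuned checkpoint) before relying on it.
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

inputs = tokenizer("This movie was great!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax(dim=-1).item()
```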

## RobertaForMultipleChoice

[[autodoc]] RobertaForMultipleChoice
    - forward

## RobertaForTokenClassification

[[autodoc]] RobertaForTokenClassification
    - forward

## RobertaForQuestionAnswering

[[autodoc]] RobertaForQuestionAnswering
    - forward
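
A sketch of extractive question answering through the `question-answering` pipeline; `deepset/roberta-base-squad2` is used here only as an example of a publicly available RoBERTa checkpoint fine-tuned on SQuAD-style data, so substitute any checkpoint you prefer:

```python
from transformers import pipeline

# Any RoBERTa checkpoint fine-tuned for extractive QA works here; this one is an example choice.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

qa(
    question="Who proposed RoBERTa?",
    context="RoBERTa was proposed by researchers at Facebook AI as a replication study of BERT pretraining.",
)
```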

## TFRobertaModel

[[autodoc]] TFRobertaModel
    - call
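
The TensorFlow classes mirror the PyTorch API; a minimal sketch, again assuming the `roberta-base` checkpoint:

```python
from transformers import RobertaTokenizer, TFRobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = TFRobertaModel.from_pretrained("roberta-base")

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
outputs = model(inputs)

# Final hidden states, shape (batch_size, sequence_length, hidden_size)
last_hidden_state = outputs.last_hidden_state
```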

## TFRobertaForCausalLM

[[autodoc]] TFRobertaForCausalLM
    - call

## TFRobertaForMaskedLM

[[autodoc]] TFRobertaForMaskedLM
    - call

## TFRobertaForSequenceClassification

[[autodoc]] TFRobertaForSequenceClassification
    - call

## TFRobertaForMultipleChoice

[[autodoc]] TFRobertaForMultipleChoice
    - call

## TFRobertaForTokenClassification

[[autodoc]] TFRobertaForTokenClassification
    - call

## TFRobertaForQuestionAnswering

[[autodoc]] TFRobertaForQuestionAnswering
    - call

## FlaxRobertaModel

[[autodoc]] FlaxRobertaModel
    - __call__
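
The Flax classes follow the same pattern; a minimal sketch assuming the `roberta-base` checkpoint, with inputs passed as NumPy arrays:

```python
from transformers import FlaxRobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = FlaxRobertaModel.from_pretrained("roberta-base")

inputs = tokenizer("Hello, my dog is cute", return_tensors="np")
outputs = model(**inputs)

# Final hidden states, shape (batch_size, sequence_length, hidden_size)
last_hidden_state = outputs.last_hidden_state
```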

## FlaxRobertaForMaskedLM

[[autodoc]] FlaxRobertaForMaskedLM
    - __call__

## FlaxRobertaForSequenceClassification

[[autodoc]] FlaxRobertaForSequenceClassification
    - __call__

## FlaxRobertaForMultipleChoice

[[autodoc]] FlaxRobertaForMultipleChoice
    - __call__

## FlaxRobertaForTokenClassification

[[autodoc]] FlaxRobertaForTokenClassification
    - __call__

## FlaxRobertaForQuestionAnswering

[[autodoc]] FlaxRobertaForQuestionAnswering
    - __call__