RoBERTa
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RoBERTa model was proposed in `RoBERTa: A Robustly Optimized BERT Pretraining Approach
<https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google's BERT model released in 2018.

It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with
much larger mini-batches and learning rates.

The abstract from the paper is the following:

*Language model pretraining has led to significant performance gains but careful comparison between different
approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes,
and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication
study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and
training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every
model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results
highlight the importance of previously overlooked design choices, and raise questions about the source of recently
reported improvements. We release our models and code.*

Tips:

- This implementation is the same as :class:`~transformers.BertModel` with a tiny embeddings tweak as well as a setup
  for Roberta pretrained models.
- RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a
  different pretraining scheme.
- RoBERTa doesn't have :obj:`token_type_ids`, so you don't need to indicate which token belongs to which segment. Just
  separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`</s>`), as shown in the
  example below.
- :doc:`CamemBERT <camembert>` is a wrapper around RoBERTa. Refer to this page for usage examples.

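The snippet below is a minimal sketch of this, assuming the :obj:`roberta-base` checkpoint (any other RoBERTa
checkpoint behaves the same way):

.. code-block:: python

    from transformers import RobertaTokenizer, RobertaModel

    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
    model = RobertaModel.from_pretrained("roberta-base")

    # Two segments, separated with the separation token (</s>) instead of token_type_ids.
    text = "A first segment." + tokenizer.sep_token + "A second segment."
    inputs = tokenizer(text, return_tensors="pt")

    outputs = model(**inputs)
    last_hidden_state = outputs[0]  # (batch_size, sequence_length, hidden_size)
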
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_.


RobertaConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RobertaConfig
    :members:


RobertaTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RobertaTokenizer
    :members: build_inputs_with_special_tokens, get_special_tokens_mask,
        create_token_type_ids_from_sequences, save_vocabulary

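As a quick illustration of the sequence format these methods implement (a minimal sketch, again assuming the
:obj:`roberta-base` vocabulary), a single sequence is wrapped as ``<s> A </s>`` and a pair of sequences as
``<s> A </s></s> B </s>``:

.. code-block:: python

    from transformers import RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

    first = tokenizer.encode("A first sequence", add_special_tokens=False)
    second = tokenizer.encode("A second sequence", add_special_tokens=False)

    # Single sequence: <s> A </s>
    print(tokenizer.convert_ids_to_tokens(tokenizer.build_inputs_with_special_tokens(first)))

    # Pair of sequences: <s> A </s></s> B </s>
    print(tokenizer.convert_ids_to_tokens(tokenizer.build_inputs_with_special_tokens(first, second)))
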
RobertaTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RobertaTokenizerFast
    :members: build_inputs_with_special_tokens


RobertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RobertaModel
    :members: forward


RobertaForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RobertaForCausalLM
    :members: forward


RobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RobertaForMaskedLM
    :members: forward


RobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RobertaForSequenceClassification
    :members: forward


RobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RobertaForMultipleChoice
    :members: forward


RobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RobertaForTokenClassification
    :members: forward


RobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RobertaForQuestionAnswering
    :members: forward

TFRobertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaModel
    :members: call


TFRobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaForMaskedLM
    :members: call


TFRobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaForSequenceClassification
    :members: call


TFRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaForMultipleChoice
    :members: call


TFRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaForTokenClassification
    :members: call


TFRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaForQuestionAnswering
    :members: call

FlaxRobertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxRobertaModel
    :members: __call__