From 443aafd3d6e73efb8f6e8fc0933ea966e621fd9c Mon Sep 17 00:00:00 2001
From: Sunil Reddy <166301619+allmight05@users.noreply.github.com>
Date: Sat, 14 Jun 2025 00:32:44 +0530
Subject: [PATCH] [docs] updated roberta model card (#38777)

* updated roberta model card

* fixes suggested after reviewing

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/roberta.md | 129 +++++++++++++---------------
 1 file changed, 59 insertions(+), 70 deletions(-)

diff --git a/docs/source/en/model_doc/roberta.md b/docs/source/en/model_doc/roberta.md
index cfbf31fceed..058bebad5bf 100644
--- a/docs/source/en/model_doc/roberta.md
+++ b/docs/source/en/model_doc/roberta.md
@@ -14,97 +14,86 @@ rendered properly in your Markdown viewer.
 -->
 
-# RoBERTa
-
-
-PyTorch
-TensorFlow
-Flax
-SDPA
+
+
+    PyTorch
+    TensorFlow
+    Flax
+    SDPA
+
-## Overview
+# RoBERTa

-The RoBERTa model was proposed in [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://huggingface.co/papers/1907.11692) by Yinhan Liu, [Myle Ott](https://huggingface.co/myleott), Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
-Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google's BERT model released in 2018.
+[RoBERTa](https://huggingface.co/papers/1907.11692) improves on [BERT](./bert) with a revised pretraining recipe, demonstrating that BERT was significantly undertrained and that training design choices matter. The changes include dynamic masking, sentence packing, larger batches, and a byte-level BPE tokenizer.

-It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with
-much larger mini-batches and learning rates.
+You can find all the original RoBERTa checkpoints under the [Facebook AI](https://huggingface.co/FacebookAI) organization.

-The abstract from the paper is the following:
-*Language model pretraining has led to significant performance gains but careful comparison between different
-approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes,
-and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication
-study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and
-training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every
-model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results
-highlight the importance of previously overlooked design choices, and raise questions about the source of recently
-reported improvements. We release our models and code.*
+> [!TIP]
+> Click on the RoBERTa models in the right sidebar for more examples of how to apply RoBERTa to different language tasks.

-This model was contributed by [julien-c](https://huggingface.co/julien-c). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta).
+The example below demonstrates how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.

-## Usage tips
+<hfoptions id="usage">
+<hfoption id="Pipeline">

-- This implementation is the same as [`BertModel`] with a minor tweak to the embeddings, as well as a setup
-  for RoBERTa pretrained models.
-- RoBERTa has the same architecture as BERT but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a
-  different pretraining scheme.
-- RoBERTa doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment. Just
-  separate your segments with the separation token `tokenizer.sep_token` (or `</s>`).
-- RoBERTa is similar to BERT but with better pretraining techniques:
+```py
+import torch
+from transformers import pipeline

-  * Dynamic masking: tokens are masked differently at each epoch, whereas BERT does it once and for all.
-  * Sentence packing: Sentences are packed together to reach 512 tokens (so the sentences are in an order that may span several documents).
-  * Larger batches: Training uses larger batches.
-  * Byte-level BPE vocabulary: Uses BPE with bytes as a subunit instead of characters, accommodating Unicode characters.
-- [CamemBERT](camembert) is a wrapper around RoBERTa. Refer to its model page for usage examples.
+pipeline = pipeline(
+    task="fill-mask",
+    model="FacebookAI/roberta-base",
+    torch_dtype=torch.float16,
+    device=0
+)
+pipeline("Plants create <mask> through a process known as photosynthesis.")
+```

-## Resources
+</hfoption>
+<hfoption id="AutoModel">

-A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with RoBERTa. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
+```py
+import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer

-
+tokenizer = AutoTokenizer.from_pretrained(
+    "FacebookAI/roberta-base",
+)
+model = AutoModelForMaskedLM.from_pretrained(
+    "FacebookAI/roberta-base",
+    torch_dtype=torch.float16,
+    device_map="auto",
+    attn_implementation="sdpa"
+)
+inputs = tokenizer("Plants create <mask> through a process known as photosynthesis.", return_tensors="pt").to("cuda")

-- A blog on [Getting Started with Sentiment Analysis on Twitter](https://huggingface.co/blog/sentiment-analysis-twitter) using RoBERTa and the [Inference API](https://huggingface.co/inference-api).
-- A blog on [Opinion Classification with Kili and Hugging Face AutoTrain](https://huggingface.co/blog/opinion-classification-with-kili) using RoBERTa.
-- A notebook on how to [finetune RoBERTa for sentiment analysis](https://colab.research.google.com/github/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb). 🌎
-- [`RobertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb).
-- [`TFRobertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
-- [`FlaxRobertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification_flax.ipynb).
-- [Text classification task guide](../tasks/sequence_classification)

+with torch.no_grad():
+    outputs = model(**inputs)
+    predictions = outputs.logits

-
+masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
+predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
+predicted_token = tokenizer.decode(predicted_token_id)

-- [`RobertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb).
-- [`TFRobertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
-- [`FlaxRobertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/token-classification).
-- [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Token classification task guide](../tasks/token_classification)

+print(f"The predicted token is: {predicted_token}")
+```

-
+</hfoption>
+<hfoption id="transformers CLI">

-- A blog on [How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train) with RoBERTa.
-- [`RobertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb).
-- [`TFRobertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
-- [`FlaxRobertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling#masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/masked_language_modeling_flax.ipynb).
-- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Masked language modeling task guide](../tasks/masked_language_modeling)

+```bash
+echo -e "Plants create <mask> through a process known as photosynthesis." | transformers-cli run --task fill-mask --model FacebookAI/roberta-base --device 0
+```

-
+</hfoption>
+</hfoptions>

-- A blog on [Accelerated Inference with Optimum and Transformers Pipelines](https://huggingface.co/blog/optimum-inference) with RoBERTa for question answering.
-- [`RobertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb).
-- [`TFRobertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
-- [`FlaxRobertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/question-answering).
-- [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the 🤗 Hugging Face Course.
-- [Question answering task guide](../tasks/question_answering)

+## Notes

-**Multiple choice**
-- [`RobertaForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb).
-- [`TFRobertaForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb).
-- [Multiple choice task guide](../tasks/multiple_choice)

+- RoBERTa doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment. Separate your segments with the separation token `tokenizer.sep_token` or `</s>`.
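Below is a minimal, editor-added sketch (not part of the patch above) of what this note means in practice. It assumes the `FacebookAI/roberta-base` checkpoint used in the examples above and two arbitrary example sentences: the tokenizer returns no `token_type_ids`, and the two segments are joined with the separator token instead.

```py
# Editor sketch (not from the PR): encode a sentence pair with the RoBERTa tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")

# Pass the two segments as separate arguments.
encoded = tokenizer("The weather is nice.", "Let's go for a walk.")

print(encoded.keys())       # no 'token_type_ids', only input_ids and attention_mask
print(tokenizer.sep_token)  # '</s>'

# Decoding shows the segments separated by the sep token rather than by segment ids.
print(tokenizer.decode(encoded["input_ids"]))
```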

 ## RobertaConfig