From e594e75f1bc8a01349284e0d86b49b5278f127ce Mon Sep 17 00:00:00 2001
From: Aashish Anand <84689683+AshAnand34@users.noreply.github.com>
Date: Mon, 9 Jun 2025 12:26:31 -0700
Subject: [PATCH] Update XLM-RoBERTa model documentation with enhanced usage
examples and improved layout (#38596)
* Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout
* Added CLI command example and quantization example for XLM RoBERTa model card.
* Minor change to transformers CLI and quantization example for XLM roberta model card
---
docs/source/en/model_doc/xlm-roberta.md | 130 ++++++++++++++++++------
1 file changed, 99 insertions(+), 31 deletions(-)
diff --git a/docs/source/en/model_doc/xlm-roberta.md b/docs/source/en/model_doc/xlm-roberta.md
index 2bc890257a6..80465da245e 100644
--- a/docs/source/en/model_doc/xlm-roberta.md
+++ b/docs/source/en/model_doc/xlm-roberta.md
@@ -14,45 +14,113 @@ rendered properly in your Markdown viewer.
-->
-# XLM-RoBERTa
-## Overview
+# XLM-RoBERTa
-The XLM-RoBERTa model was proposed in [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume
-Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's
-RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl
-data.
+[XLM-RoBERTa](https://huggingface.co/papers/1911.02116) is a large multilingual masked language model trained on 2.5TB of filtered CommonCrawl data across 100 languages. It shows that scaling the model provides strong performance gains on high-resource and low-resource languages. The model uses the [RoBERTa](./roberta) pretraining objectives on the [XLM](./xlm) model.
-The abstract from the paper is the following:
+You can find all the original XLM-RoBERTa checkpoints under the [Facebook AI community](https://huggingface.co/FacebookAI) organization.
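+
+The sketch below is only a minimal illustration of that masked language modeling objective: one arbitrary token is replaced with `<mask>` and the cross-entropy loss is computed at that position (the sentence and masked position are illustrative, not from the original pretraining setup).
+
+```python
+import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-base")
+model = AutoModelForMaskedLM.from_pretrained("FacebookAI/xlm-roberta-base")
+
+# arbitrary example sentence, not from the paper
+inputs = tokenizer("XLM-RoBERTa is pretrained on text in one hundred languages.", return_tensors="pt")
+labels = inputs["input_ids"].clone()
+
+# mask one position and only score that position (-100 is ignored by the loss)
+masked_position = 5
+inputs["input_ids"][0, masked_position] = tokenizer.mask_token_id
+labels[0, torch.arange(labels.shape[1]) != masked_position] = -100
+
+outputs = model(**inputs, labels=labels)
+print(f"MLM loss at the masked position: {outputs.loss.item():.2f}")
+```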
-*This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a
-wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred
-languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly
-outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +13.8% average accuracy on
-XNLI, +12.3% average F1 score on MLQA, and +2.1% average F1 score on NER. XLM-R performs particularly well on
-low-resource languages, improving 11.8% in XNLI accuracy for Swahili and 9.2% for Urdu over the previous XLM model. We
-also present a detailed empirical evaluation of the key factors that are required to achieve these gains, including the
-trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource
-languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing
-per-language performance; XLM-R is very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We
-will make XLM-R code, data, and models publicly available.*
+> [!TIP]
+> Click on the XLM-RoBERTa models in the right sidebar for more examples of how to apply XLM-RoBERTa to different cross-lingual tasks like classification, translation, and question answering.
-This model was contributed by [stefan-it](https://huggingface.co/stefan-it). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/xlmr).
+The example below demonstrates how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.
-## Usage tips
+<hfoptions id="usage">
+<hfoption id="Pipeline">
-- XLM-RoBERTa is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does
- not require `lang` tensors to understand which language is used, and should be able to determine the correct
- language from the input ids.
-- Uses RoBERTa tricks on the XLM approach, but does not use the translation language modeling objective. It only uses masked language modeling on sentences coming from one language.
+```python
+import torch
+from transformers import pipeline
+
+pipeline = pipeline(
+    task="fill-mask",
+    model="FacebookAI/xlm-roberta-base",
+    torch_dtype=torch.float16,
+    device=0
+)
+# Example in French
+pipeline("Bonjour, je suis un modèle <mask>.")
+```
+
+</hfoption>
+<hfoption id="AutoModel">
+
+```python
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+import torch
+
+tokenizer = AutoTokenizer.from_pretrained(
+    "FacebookAI/xlm-roberta-base"
+)
+model = AutoModelForMaskedLM.from_pretrained(
+    "FacebookAI/xlm-roberta-base",
+    torch_dtype=torch.float16,
+    device_map="auto",
+    attn_implementation="sdpa"
+)
+
+# Prepare input
+inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")
+
+with torch.no_grad():
+ outputs = model(**inputs)
+ predictions = outputs.logits
+
+masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
+predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
+predicted_token = tokenizer.decode(predicted_token_id)
+
+print(f"The predicted token is: {predicted_token}")
+```
+</hfoption>
+<hfoption id="transformers CLI">
+
+```bash
+echo -e "Plants create <mask> through a process known as photosynthesis." | transformers-cli run --task fill-mask --model FacebookAI/xlm-roberta-base --device 0
+```
+</hfoption>
+</hfoptions>
+
+Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [quantization guide](../quantization) for an overview of the available quantization backends.
+
+The example below uses [bitsandbytes](../quantization/bitsandbytes) to quantize the weights to 4 bits.
+
+```python
+import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer, BitsAndBytesConfig
+
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.bfloat16,
+    bnb_4bit_quant_type="nf4",  # or "fp4" for float 4-bit quantization
+    bnb_4bit_use_double_quant=True,  # nested quantization for additional memory savings
+)
+tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-large")
+model = AutoModelForMaskedLM.from_pretrained(
+    "FacebookAI/xlm-roberta-large",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    attn_implementation="flash_attention_2",
+    quantization_config=quantization_config
+)
+
+inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")
+
+with torch.no_grad():
+    outputs = model(**inputs)
+
+# decode the most likely token at the masked position
+masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
+predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
+print(f"The predicted token is: {tokenizer.decode(predicted_token_id)}")
+```
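+
+To sanity-check the memory savings from 4-bit quantization, you can optionally print the model's footprint with [`~PreTrainedModel.get_memory_footprint`].
+
+```python
+# the 4-bit weights should take roughly a quarter of the half-precision footprint
+print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
+```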
+
+## Notes
+
+- Unlike some XLM models, XLM-RoBERTa doesn't require `lang` tensors to understand which language is used. It automatically determines the language from the input ids, as shown in the sketch below.
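+
+    A minimal sketch of this behavior (the sentences are arbitrary examples): the same checkpoint fills the `<mask>` token for French and German input without any language id.
+
+    ```python
+    from transformers import pipeline
+
+    fill_mask = pipeline("fill-mask", model="FacebookAI/xlm-roberta-base")
+
+    # French and German inputs, no `lang` tensor needed
+    print(fill_mask("Bonjour, je suis un modèle <mask>.")[0]["token_str"])
+    print(fill_mask("Hallo, ich bin ein <mask> Modell.")[0]["token_str"])
+    ```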
## Resources