Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout (#38596)

* Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout

* Added CLI command example and quantization example for XLM RoBERTa model card.

* Minor change to transformers CLI and quantization example for XLM roberta model card
Aashish Anand 2025-06-09 12:26:31 -07:00 committed by GitHub
parent 29ca043856
commit e594e75f1b
-->

<div style="float: right;">
    <div class="flex flex-wrap space-x-1">
        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
        <img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
        <img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=">
        <img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
    </div>
</div>

# XLM-RoBERTa
[XLM-RoBERTa](https://huggingface.co/papers/1911.02116) is a large multilingual masked language model trained on 2.5TB of filtered CommonCrawl data across 100 languages. The paper shows that scaling the model provides strong performance gains on both high-resource and low-resource languages. The model applies the [RoBERTa](./roberta) pretraining approach to the [XLM](./xlm) model, but it does not use XLM's translation language modeling objective; it only uses masked language modeling on sentences from a single language.

XLM-RoBERTa was proposed in [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. The abstract from the paper is the following:

*This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +13.8% average accuracy on XNLI, +12.3% average F1 score on MLQA, and +2.1% average F1 score on NER. XLM-R performs particularly well on low-resource languages, improving 11.8% in XNLI accuracy for Swahili and 9.2% for Urdu over the previous XLM model. We also present a detailed empirical evaluation of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing per-language performance; XLM-R is very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We will make XLM-R code, data, and models publicly available.*

This model was contributed by [stefan-it](https://huggingface.co/stefan-it). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/xlmr).

You can find all the original XLM-RoBERTa checkpoints under the [Facebook AI community](https://huggingface.co/FacebookAI) organization.

> [!TIP]
> Click on the XLM-RoBERTa models in the right sidebar for more examples of how to apply XLM-RoBERTa to different cross-lingual tasks like classification, translation, and question answering.

The example below demonstrates how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.

<hfoptions id="usage">
<hfoption id="Pipeline">

```python
import torch
from transformers import pipeline

pipeline = pipeline(
    task="fill-mask",
    model="FacebookAI/xlm-roberta-base",
    torch_dtype=torch.float16,
    device=0
)
# Example in French
pipeline("Bonjour, je suis un modèle <mask>.")
```

</hfoption>
<hfoption id="AutoModel">

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "FacebookAI/xlm-roberta-base"
)
model = AutoModelForMaskedLM.from_pretrained(
    "FacebookAI/xlm-roberta-base",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa"
)

# Tokenize the input and move it to the same device as the model
inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

# Locate the <mask> token and pick the highest-scoring vocabulary entry at that position
masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print(f"The predicted token is: {predicted_token}")
```

</hfoption>
<hfoption id="transformers CLI">

```bash
echo -e "Plants create <mask> through a process known as photosynthesis." | transformers-cli run --task fill-mask --model FacebookAI/xlm-roberta-base --device 0
```

</hfoption>
</hfoptions>
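In both the Pipeline and AutoModel examples, the prediction comes from the scores the model assigns to every vocabulary entry at the `<mask>` position. The AutoModel example keeps only the argmax, while the fill-mask pipeline returns several candidates with probabilities. The stdlib-only sketch below illustrates that ranking step (a softmax followed by top-k) using a hypothetical toy vocabulary and made-up logits; `top_k_mask_predictions` is an illustrative helper, not part of the Transformers API.

```python
import math


def top_k_mask_predictions(logits, vocab, k=3):
    """Return the k most likely (token, probability) pairs for one <mask> position.

    `logits` holds one raw score per vocabulary entry, as a masked language
    model would produce at the masked position.
    """
    # Numerically stable softmax: subtract the largest logit before exponentiating
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank vocabulary indices by probability, highest first
    ranked = sorted(range(len(vocab)), key=lambda i: probs[i], reverse=True)
    return [(vocab[i], probs[i]) for i in ranked[:k]]


# Hypothetical toy vocabulary and made-up logits for the <mask> position
vocab = ["chat", "linguistique", "français", "multilingue", "<pad>"]
logits = [0.5, 2.0, 1.0, 3.0, -5.0]

for token, prob in top_k_mask_predictions(logits, vocab, k=3):
    print(f"{token}: {prob:.3f}")
```

Taking `k=1` reproduces the argmax choice made in the AutoModel example; a larger `k` mirrors the ranked list of candidates a fill-mask pipeline returns.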
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [quantization guide](../quantization) overview for more available quantization backends.

The example below uses [bitsandbytes](../quantization/bitsandbytes) to quantize the weights to 4 bits.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for computation with the 4-bit weights
    bnb_4bit_quant_type="nf4",              # or "fp4" for 4-bit floating point quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants to save memory
)

tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-large")
model = AutoModelForMaskedLM.from_pretrained(
    "FacebookAI/xlm-roberta-large",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa",
    quantization_config=quantization_config
)

inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to(model.device)

# XLM-RoBERTa is an encoder-only model, so predict the masked token with a forward pass, not generate()
with torch.no_grad():
    logits = model(**inputs).logits

masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = logits[0, masked_index].argmax(dim=-1)
print(f"The predicted token is: {tokenizer.decode(predicted_token_id)}")
```
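To see where the savings come from, compare the bytes needed to store the weights at 16 bits versus 4 bits. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes, as a simplification, that only part of the parameters (for example, linear layer weights) is quantized while the rest (for example, embeddings) stays in 16-bit precision, and it ignores quantization constants, activations, and framework overhead. `estimate_weight_memory_gb` and the parameter split are hypothetical.

```python
def estimate_weight_memory_gb(quantized_params, unquantized_params,
                              quant_bits=4, unquantized_bits=16):
    """Rough weight-memory estimate in gigabytes (1 GB = 1e9 bytes).

    Assumes `quantized_params` are stored with `quant_bits` bits each and the
    remaining parameters stay at `unquantized_bits`. Quantization constants,
    activations, and framework overhead are ignored.
    """
    total_bits = quantized_params * quant_bits + unquantized_params * unquantized_bits
    return total_bits / 8 / 1e9


# Illustrative parameter split (hypothetical numbers, not measured from a checkpoint)
linear_params = 300_000_000
other_params = 260_000_000

fp16_gb = estimate_weight_memory_gb(0, linear_params + other_params)
nf4_gb = estimate_weight_memory_gb(linear_params, other_params)
print(f"16-bit weights: {fp16_gb:.2f} GB")
print(f"4-bit linear weights: {nf4_gb:.2f} GB")
```

Under this assumption, a model whose unquantized parameters (such as a large embedding matrix) make up a big share of the total shrinks by noticeably less than the 4x ratio of the bit widths.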
## Notes
- Unlike some XLM models, XLM-RoBERTa doesn't require `lang` tensors to understand what language is being used. It automatically determines the language from the input IDs.
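XLM-RoBERTa uses RoBERTa's pretraining objective, masked language modeling, so each training example is simply a sequence of token IDs with some positions hidden and no language ID attached. The sketch below shows BERT/RoBERTa-style masking on plain Python lists: about 15% of non-special positions are selected, and of those, 80% become the mask token, 10% a random token, and 10% stay unchanged. It illustrates those standard BERT settings and is not the fairseq code that trained XLM-RoBERTa; `mask_tokens` and all token IDs are hypothetical.

```python
import random

MASK_RATE = 0.15      # fraction of non-special positions selected for prediction
IGNORE_INDEX = -100   # label value for positions that do not contribute to the loss


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=frozenset(), rng=random):
    """Apply BERT/RoBERTa-style masking to one sequence of token IDs.

    Each non-special position is selected with probability MASK_RATE. A selected
    position becomes `mask_id` 80% of the time, a random token ID 10% of the time,
    and keeps its original ID the remaining 10%. Returns (inputs, labels), where
    labels hold the original ID at selected positions and IGNORE_INDEX elsewhere.
    """
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= MASK_RATE:
            continue
        labels[i] = tok  # the model is trained to recover this original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # otherwise the original token is left in place
    return inputs, labels


# Hypothetical IDs: 0 and 2 stand in for <s> and </s>, 4 for <mask>, the rest are arbitrary
rng = random.Random(0)
sequence = [0] + [rng.randrange(5, 1000) for _ in range(12)] + [2]
masked, labels = mask_tokens(sequence, mask_id=4, vocab_size=1000, special_ids={0, 2}, rng=rng)
print("inputs:", masked)
print("labels:", labels)
```

RoBERTa's dynamic masking corresponds to calling a function like this afresh every time a sequence is sampled, so the same sentence is masked differently across epochs instead of once during preprocessing.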
## Resources