mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-04 05:10:06 +06:00
Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout (#38596)
* Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout
* Added CLI command example and quantization example for XLM RoBERTa model card.
* Minor change to transformers CLI and quantization example for XLM roberta model card
This commit is contained in:
parent
29ca043856
commit
e594e75f1b
@@ -14,45 +14,113 @@ rendered properly in your Markdown viewer.
-->

# XLM-RoBERTa

<div style="float: right;">
    <div class="flex flex-wrap space-x-1">
        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
        <img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
        <img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC0AAAAtCAMAAAANxBKoAAAC7lBMVEUAAADg5vYHPVgAoJH+/v76+v39/f9JbLP///9+AIgAnY3///+mcqzt8fXy9fgkXa3Ax9709fr+///9/f8qXq49qp5AaLGMwrv8/P0eW60VWawxYq8yqJzG2dytt9Wyu9elzci519Lf3O3S2efY3OrY0+Xp7PT///////+dqNCexMc6Z7AGpJeGvbenstPZ5ejQ1OfJzOLa7ejh4+/r8fT29vpccbklWK8PVa0AS6ghW63O498vYa+lsdKz1NDRt9Kw1c672tbD3tnAxt7R6OHp5vDe7OrDyuDn6vLl6/EAQKak0MgATakkppo3ZK/Bz9y8w9yzu9jey97axdvHzeG21NHH4trTwthKZrVGZLSUSpuPQJiGAI+GAI8SWKydycLL4d7f2OTi1+S9xNzL0ePT6OLGzeEAo5U0qJw/aLEAo5JFa7JBabEAp5Y4qZ2QxLyKmsm3kL2xoMOehrRNb7RIbbOZgrGre68AUqwAqZqNN5aKJ5N/lMq+qsd8kMa4pcWzh7muhLMEV69juq2kbKqgUaOTR5uMMZWLLZSGAI5VAIdEAH+ovNDHuNCnxcy3qcaYx8K8msGplrx+wLahjbYdXrV6vbMvYK9DrZ8QrZ8tqJuFms+Sos6sw8ecy8RffsNVeMCvmb43aLltv7Q4Y7EZWK4QWa1gt6meZKUdr6GOAZVeA4xPAISyveLUwtivxtKTpNJ2jcqfvcltiMiwwcfAoMVxhL+Kx7xjdrqTe60tsaNQs6KaRKACrJ6UTZwkqpqTL5pkHY4AloSgsd2ptNXPvNOOncuxxsqFl8lmg8apt8FJcr9EbryGxLqlkrkrY7dRa7ZGZLQ5t6iXUZ6PPpgVpZeJCJFKAIGareTa0+KJod3H0deY2M+esM25usmYu8d2zsJOdcBVvrCLbqcAOaaHaKQAMaScWqKBXqCXMJ2RHpiLF5NmJZAdAHN2kta11dKu1M+DkcZLdb+Mcql3TppyRJdzQ5ZtNZNlIY+DF4+voCOQAAAAZ3RSTlMABAT+MEEJ/RH+/TP+Zlv+pUo6Ifz8+fco/fz6+evr39S9nJmOilQaF/7+/f38+smmoYp6b1T+/v7++vj189zU0tDJxsGzsrKSfv34+Pf27dDOysG9t6+n/vv6+vr59uzr1tG+tZ6Qg9Ym3QAABR5JREFUSMeNlVVUG1EQhpcuxEspXqS0SKEtxQp1d3d332STTRpIQhIISQgJhODu7lAoDoUCpe7u7u7+1puGpqnCPOyZvffbOXPm/PsP9JfQgyCC+tmTABTOcbxDz/heENS7/1F+9nhvkHePG0wNDLbGWwdXL+rbLWvpmZHXD8+gMfBjTh+aSe6Gnn7lwQIOTR0c8wfX3PWgv7avbdKwf/ZoBp1Gp/PvuvXW3vw5ib7emnTW4OR+3D4jB9vjNJ/7gNvfWWeH/TO/JyYrsiKCRjVEZA3UB+96kON+DxOQ/NLE8PE5iUYgIXjFnCOlxEQMaSGVxjg4gxOnEycGz8bptuNjVx08LscIgrzH3umcn+KKtiBIyvzOO2O99aAdR8cF19oZalnCtvREUw79tCd5sow1g1UKM6kXqUx4T8wsi3sTjJ3yzDmmhenLXLpo8u45eG5y4Vvbk6kkC4LLtJMowkSQxmk4ggVJEG+7c6QpHT8vvW9X7/o7+3ELmiJi2mEzZJiz8cT6TBlanBk70cB5GGIGC1gRDdZ00yADLW1FL6gqhtvNXNG5S9gdSrk4M1qu7JAsmYshzDS4peoMrU/gT7qQdqYGZaYhxZmVbGJAm/CS/HloWyhRUlknQ9KYcExTwS80d3VNOxUZJpITYyspl0LbhArhpZCD9cRWEQuhYkNGMHToQ/2Cs6swJlb39CsllxdXX6IUKh/H5jbnSsPKjgmoaFQ1f8wRLR0UnGE/RcDEjj2jXG1WVTwUs8+zxfcrVO+vSsuOpVKxCfYZiQ0/aPKuxQbQ8lIz+DClxC8u+snlcJ7Yr1z1JPqUH0V+GDXbOwAib931Y4Imaq0NTIXPXY+N5L18GJ37SVWu+hwXff8l72Ds9XuwYIBaXPq6Shm4l+Vl/5QiOlV+uTk6YR9PxKsI9xNJny31ygK1e+nIRC1N97EGkFPI+jCpiHe5PCEy7oWqWSwRrpOvhFzcbTWMbm3ZJAOn1rUKpYIt/lDhW/5RHHteeWFN60qo98YJuoq1nK3uW5AabyspC1BcIEpOhft+SZAShYoLSvnmSfnYADUERP5jJn2h5XtsgCRuhYQqAvwTwn33+YWEKUI72HX5AtfSAZDe8F2DtPPm77afhl0EkthzuCQU0BWApgQIH9+KB0JhopMM7bJrdTRoleM2JAVNMyPF+wdoaz+XJpGoVAQ7WXUkcV7gT3oUZyi/ISIJAVKhgNp+4b4veCFhYVJw4locdSjZCp9cPUhLF9EZ3KKzURepMEtCDPP3VcWFx4UIiZIklIpFNfHpdEafIF2aRmOcrUmjohbT2WUllbmRvgfbythbQO3222fpDJoufaQPncYYuqoGtUEsCJZL6/3PR5b4syeSjZMQG/T2maGANlXT2v8S4AULWaUkCxfLyW8iW4kdka+nEMjxpL2NCwsYNBp+Q61PF43zyDg9Bm9+3NNySn78jMZUUkumqE4Gp7JmFOdP1vc8PpRrzj9+wPinCy8K1PiJ4aYbnTYpCCbDkBSbzhu2QJ1Gd82t8jI8TH51+OzvXoWbnXUOBkNW+0mWFwGcGOUVpU81/n3TOHb5oMt2FgYGjzau0Nif0Ss7Q3XB33hjjQHjHA5E5aOyIQc8CBrLdQSs3j92VG+3nNEjbkbdbBr9zm04ruvw37vh0QKOdeGIkckc80fX3KH/h7PT4BOjgCty8VZ5ux1MoO5Cf5naca2LAsEgehI+drX8o/0Nu+W0m6K/I9gGPd/dfx/EN/wN62AhsBWuAAAAAElFTkSuQmCC">
        <img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
    </div>
</div>

[XLM-RoBERTa](https://huggingface.co/papers/1911.02116) is a large multilingual masked language model trained on 2.5TB of filtered CommonCrawl data across 100 languages. It shows that scaling the model provides strong performance gains on both high-resource and low-resource languages. The model applies the [RoBERTa](./roberta) pretraining objectives to the [XLM](./xlm) model.

You can find all the original XLM-RoBERTa checkpoints under the [Facebook AI community](https://huggingface.co/FacebookAI) organization.

> [!TIP]
> Click on the XLM-RoBERTa models in the right sidebar for more examples of how to apply XLM-RoBERTa to different cross-lingual tasks like classification, translation, and question answering.

The example below demonstrates how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.

<hfoptions id="usage">
<hfoption id="Pipeline">

```python
import torch
from transformers import pipeline

pipeline = pipeline(
    task="fill-mask",
    model="FacebookAI/xlm-roberta-base",
    torch_dtype=torch.float16,
    device=0
)
# Example in French
pipeline("Bonjour, je suis un modèle <mask>.")
```
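
[`Pipeline`] returns only the single best fill by default. Passing `top_k`, a standard fill-mask pipeline argument, surfaces the runner-up predictions as well; a minimal sketch reusing the `pipeline` object created above:

```python
# Print the three highest-scoring candidates for the masked position.
for prediction in pipeline("Bonjour, je suis un modèle <mask>.", top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```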
</hfoption>

<hfoption id="AutoModel">

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained(
    "FacebookAI/xlm-roberta-base",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa"
)

# Prepare input
inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

# Locate the <mask> position and decode the highest-scoring token
masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print(f"The predicted token is: {predicted_token}")
```
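
The same forward pass extends to a batch of masked sentences in different languages; a short sketch assuming the `tokenizer` and `model` loaded above (the sentences are illustrative inputs, not from the original card), with padding enabled so the inputs share one tensor:

```python
sentences = [
    "The capital of France is <mask>.",
    "Die Hauptstadt von Deutschland ist <mask>.",
]
batch = tokenizer(sentences, return_tensors="pt", padding=True).to("cuda")
with torch.no_grad():
    logits = model(**batch).logits

# One <mask> per sentence; torch.where returns its (row, column) coordinates
rows, cols = torch.where(batch["input_ids"] == tokenizer.mask_token_id)
for row, col in zip(rows.tolist(), cols.tolist()):
    print(tokenizer.decode(logits[row, col].argmax(dim=-1)))
```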
</hfoption>
|
||||||
|
<hfoption id="transformers CLI">
|
||||||
|
|
||||||
|
```bash
|
||||||
|
echo -e "Plants create <mask> through a process known as photosynthesis." | transformers-cli run --task fill-mask --model FacebookAI/xlm-roberta-base --device 0
|
||||||
|
```
|
||||||
|
</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [quantization guide](../quantization) for an overview of the supported backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes) to quantize the weights to 4 bits.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",  # or "fp4" for float 4-bit quantization
    bnb_4bit_use_double_quant=True,  # nested quantization for extra memory savings
)
tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-large")
model = AutoModelForMaskedLM.from_pretrained(
    "FacebookAI/xlm-roberta-large",
    device_map="auto",
    attn_implementation="flash_attention_2",
    quantization_config=quantization_config
)

inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")

# XLM-RoBERTa is a masked language model, not a generative one, so read the
# prediction for the <mask> position from the logits instead of calling generate()
with torch.no_grad():
    logits = model(**inputs).logits
masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
print(tokenizer.decode(logits[0, masked_index].argmax(dim=-1)))
```
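
To verify the savings, [`~PreTrainedModel.get_memory_footprint`] reports how much memory the loaded weights occupy; a quick check, assuming the quantized `model` from the example above:

```python
# Footprint of the 4-bit weights in GB; compare against a full-precision
# load of the same checkpoint to see the reduction.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```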

## Notes

- Unlike some XLM models, XLM-RoBERTa doesn't require `lang` tensors to understand which language is being used. It automatically determines the language from the input IDs, as the sketch below shows.
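A minimal sketch of that behavior: the same checkpoint fills masks in French and German prompts (illustrative inputs, not from the original card) with no language flag anywhere.

```python
from transformers import pipeline

# No `lang` tensor is passed; the input IDs alone tell the model
# which language it is looking at.
fill_mask = pipeline(task="fill-mask", model="FacebookAI/xlm-roberta-base")
for prompt in ["Paris est la <mask> de la France.", "Berlin ist die <mask> von Deutschland."]:
    print(fill_mask(prompt)[0]["token_str"])
```
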
## Resources