diff --git a/docs/source/en/model_doc/xlm-roberta.md b/docs/source/en/model_doc/xlm-roberta.md
index 2bc890257a6..80465da245e 100644
--- a/docs/source/en/model_doc/xlm-roberta.md
+++ b/docs/source/en/model_doc/xlm-roberta.md
@@ -14,45 +14,113 @@ rendered properly in your Markdown viewer.
 
 -->
 
-# XLM-RoBERTa
-
-<div class="flex flex-wrap space-x-1">
-PyTorch
-TensorFlow
-Flax
-SDPA
-</div>
+<div style="float: right;">
+    <div class="flex flex-wrap space-x-1">
+        PyTorch
+        TensorFlow
+        Flax
+        SDPA
+    </div>
+</div>
 
-## Overview
+# XLM-RoBERTa
 
-The XLM-RoBERTa model was proposed in [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume
-Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's
-RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl
-data.
+[XLM-RoBERTa](https://huggingface.co/papers/1911.02116) is a large multilingual masked language model trained on 2.5TB of filtered CommonCrawl data across 100 languages. It shows that scaling the model provides strong performance gains on both high-resource and low-resource languages. The model uses the [RoBERTa](./roberta) pretraining objectives on the [XLM](./xlm) model.
 
-The abstract from the paper is the following:
+You can find all the original XLM-RoBERTa checkpoints under the [Facebook AI community](https://huggingface.co/FacebookAI) organization.
 
-*This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a
-wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred
-languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly
-outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +13.8% average accuracy on
-XNLI, +12.3% average F1 score on MLQA, and +2.1% average F1 score on NER. XLM-R performs particularly well on
-low-resource languages, improving 11.8% in XNLI accuracy for Swahili and 9.2% for Urdu over the previous XLM model. We
-also present a detailed empirical evaluation of the key factors that are required to achieve these gains, including the
-trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource
-languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing
-per-language performance; XLM-R is very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We
-will make XLM-R code, data, and models publicly available.*
+> [!TIP]
+> Click on the XLM-RoBERTa models in the right sidebar for more examples of how to apply XLM-RoBERTa to different cross-lingual tasks like classification, translation, and question answering.
 
-This model was contributed by [stefan-it](https://huggingface.co/stefan-it). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/xlmr).
+The example below demonstrates how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.
 
-## Usage tips
+<hfoptions id="usage">
+<hfoption id="Pipeline">
 
-- XLM-RoBERTa is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does
-  not require `lang` tensors to understand which language is used, and should be able to determine the correct
-  language from the input ids.
-- Uses RoBERTa tricks on the XLM approach, but does not use the translation language modeling objective. It only uses masked language modeling on sentences coming from one language.
+```python
+import torch
+from transformers import pipeline
+
+pipeline = pipeline(
+    task="fill-mask",
+    model="FacebookAI/xlm-roberta-base",
+    torch_dtype=torch.float16,
+    device=0
+)
+# Example in French
+pipeline("Bonjour, je suis un modèle <mask>.")
+```
+
+</hfoption>
+<hfoption id="AutoModel">
+
+```python
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+import torch
+
+tokenizer = AutoTokenizer.from_pretrained(
+    "FacebookAI/xlm-roberta-base"
+)
+model = AutoModelForMaskedLM.from_pretrained(
+    "FacebookAI/xlm-roberta-base",
+    torch_dtype=torch.float16,
+    device_map="auto",
+    attn_implementation="sdpa"
+)
+
+# Prepare input
+inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")
+
+with torch.no_grad():
+    outputs = model(**inputs)
+    predictions = outputs.logits
+
+masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
+predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
+predicted_token = tokenizer.decode(predicted_token_id)
+
+print(f"The predicted token is: {predicted_token}")
+```
+
+</hfoption>
+<hfoption id="transformers-cli">
+
+```bash
+echo -e "Plants create <mask> through a process known as photosynthesis." | transformers-cli run --task fill-mask --model FacebookAI/xlm-roberta-base --device 0
+```
+
+</hfoption>
+</hfoptions>
+
+Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [quantization guide](../quantization) for more available quantization backends.
+
+The example below uses [bitsandbytes](../quantization/bitsandbytes) to quantize the weights to 4 bits.
+
+```python
+import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer, BitsAndBytesConfig
+
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.bfloat16,
+    bnb_4bit_quant_type="nf4",  # or "fp4" for float 4-bit quantization
+    bnb_4bit_use_double_quant=True,  # use double quantization for better performance
+)
+tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-large")
+model = AutoModelForMaskedLM.from_pretrained(
+    "FacebookAI/xlm-roberta-large",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    attn_implementation="flash_attention_2",
+    quantization_config=quantization_config
+)
+
+inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")
+with torch.no_grad():
+    logits = model(**inputs).logits
+
+masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
+predicted_token_id = logits[0, masked_index].argmax(dim=-1)
+print(f"The predicted token is: {tokenizer.decode(predicted_token_id)}")
+```
+
+## Notes
+
+- Unlike some XLM models, XLM-RoBERTa doesn't require `lang` tensors to understand which language is being used. It automatically determines the language from the input IDs.
 
 ## Resources
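A minimal sketch (not part of the diff) of the behavior described in the Notes bullet: the same checkpoint fills `<mask>` tokens in several languages without any `lang` tensor or language flag. It assumes the `FacebookAI/xlm-roberta-base` checkpoint used throughout the examples above.

```python
# Minimal sketch, not part of the diff: the language is inferred from the input ids
# alone, so no `lang` tensor or flag is passed for any of the examples below.
from transformers import pipeline

fill_mask = pipeline(task="fill-mask", model="FacebookAI/xlm-roberta-base")

print(fill_mask("Bonjour, je suis un modèle <mask>.")[0]["token_str"])   # French
print(fill_mask("Hallo, ich bin ein <mask> Modell.")[0]["token_str"])    # German
print(fill_mask("Plants create <mask> through a process known as photosynthesis.")[0]["token_str"])  # English
```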