# XLM-RoBERTa-XL
[XLM-RoBERTa-XL](https://huggingface.co/papers/2105.00572) is a 3.5B parameter multilingual masked language model pretrained on 100 languages. It shows that scaling model capacity allows multilingual models to achieve strong performance on high-resource languages and even handle low-resource languages in a zero-shot setting.
You can find all the original XLM-RoBERTa-XL checkpoints under the [AI at Meta](https://huggingface.co/facebook?search_models=xlm) organization.
> [!TIP]
> Click on the XLM-RoBERTa-XL models in the right sidebar for more examples of how to apply XLM-RoBERTa-XL to different cross-lingual tasks like classification, translation, and question answering.
The examples below demonstrate how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.
```python
import torch
from transformers import pipeline
# load the fill-mask pipeline in half precision on GPU 0
pipeline = pipeline(
    task="fill-mask",
    model="facebook/xlm-roberta-xl",
    torch_dtype=torch.float16,
    device=0
)
pipeline("Bonjour, je suis un modèle <mask>.")
```
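By default, the fill-mask pipeline returns several candidate tokens ranked by score. To keep only the best few, pass `top_k` when calling the pipeline created above (an optional usage sketch).

```python
# optional: continuing from the pipeline above, return only the 3 highest-scoring fills
pipeline("Bonjour, je suis un modèle <mask>.", top_k=3)
```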
```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/xlm-roberta-xl")
model = AutoModelForMaskedLM.from_pretrained(
    "facebook/xlm-roberta-xl",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa"
)

inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

# locate the <mask> position and decode the highest-scoring token
masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print(f"The predicted token is: {predicted_token}")
```
```bash
echo -e "Plants create through a process known as photosynthesis." | transformers-cli run --task fill-mask --model facebook/xlm-roberta-xl --device 0
```
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for the available quantization backends.
The example below uses [torchao](../quantization/torchao) to quantize only the weights to int4.
```py
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, TorchAoConfig
quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
tokenizer = AutoTokenizer.from_pretrained("facebook/xlm-roberta-xl")
model = AutoModelForMaskedLM.from_pretrained(
    "facebook/xlm-roberta-xl",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa",
    quantization_config=quantization_config
)

inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

# locate the <mask> position and decode the highest-scoring token
masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print(f"The predicted token is: {predicted_token}")
```
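As an optional check, you can report how much memory the quantized model occupies. The sketch below continues from the example above and uses `get_memory_footprint`, which returns the size of the loaded parameters and buffers in bytes.

```py
# optional: report the memory used by the quantized model (continues from the example above)
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```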
## Notes
- Unlike some XLM models, XLM-RoBERTa-XL doesn't require `lang` tensors to understand which language is used. It automatically determines the language from the input IDs, as shown in the sketch below.
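The minimal sketch below illustrates this: the same tokenizer handles text in different languages and only produces `input_ids` and an `attention_mask`, with no language identifier (the example sentences are arbitrary).

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/xlm-roberta-xl")

for text in ["Hello, how are you?", "Bonjour, comment ça va ?", "Hallo, wie geht es dir?"]:
    inputs = tokenizer(text, return_tensors="pt")
    # only input_ids and attention_mask are produced; no `lang` tensor is needed
    print(text, "->", list(inputs.keys()))
```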
## XLMRobertaXLConfig
[[autodoc]] XLMRobertaXLConfig
## XLMRobertaXLModel
[[autodoc]] XLMRobertaXLModel
- forward
## XLMRobertaXLForCausalLM
[[autodoc]] XLMRobertaXLForCausalLM
- forward
## XLMRobertaXLForMaskedLM
[[autodoc]] XLMRobertaXLForMaskedLM
- forward
## XLMRobertaXLForSequenceClassification
[[autodoc]] XLMRobertaXLForSequenceClassification
- forward
## XLMRobertaXLForMultipleChoice
[[autodoc]] XLMRobertaXLForMultipleChoice
- forward
## XLMRobertaXLForTokenClassification
[[autodoc]] XLMRobertaXLForTokenClassification
- forward
## XLMRobertaXLForQuestionAnswering
[[autodoc]] XLMRobertaXLForQuestionAnswering
- forward