# XLM-RoBERTa-XL

[XLM-RoBERTa-XL](https://huggingface.co/papers/2105.00572) is a 3.5B parameter multilingual masked language model pretrained on 100 languages. It shows that, by scaling model capacity, multilingual models demonstrate strong performance on high-resource languages and can even handle low-resource languages in a zero-shot setting.

You can find all the original XLM-RoBERTa-XL checkpoints under the [AI at Meta](https://huggingface.co/facebook?search_models=xlm) organization.

> [!TIP]
> Click on the XLM-RoBERTa-XL models in the right sidebar for more examples of how to apply XLM-RoBERTa-XL to different cross-lingual tasks like classification, translation, and question answering.

The example below demonstrates how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.

<hfoptions id="usage">
<hfoption id="Pipeline">

```python
import torch
from transformers import pipeline

pipeline = pipeline(
    task="fill-mask",
    model="facebook/xlm-roberta-xl",
    torch_dtype=torch.float16,
    device=0
)
pipeline("Bonjour, je suis un modèle <mask>.")
```

</hfoption>
<hfoption id="AutoModel">

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "facebook/xlm-roberta-xl",
)
model = AutoModelForMaskedLM.from_pretrained(
    "facebook/xlm-roberta-xl",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa"
)
inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print(f"The predicted token is: {predicted_token}")
```

</hfoption>
<hfoption id="transformers-cli">

```bash
echo -e "Plants create <mask> through a process known as photosynthesis." | transformers-cli run --task fill-mask --model facebook/xlm-roberta-xl --device 0
```

</hfoption>
</hfoptions>

Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.

The example below uses [torchao](../quantization/torchao) to quantize only the weights to int4.

```py
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, TorchAoConfig

quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
tokenizer = AutoTokenizer.from_pretrained(
    "facebook/xlm-roberta-xl",
)
model = AutoModelForMaskedLM.from_pretrained(
    "facebook/xlm-roberta-xl",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa",
    quantization_config=quantization_config
)
inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print(f"The predicted token is: {predicted_token}")
```

## Notes

- Unlike some XLM models, XLM-RoBERTa-XL doesn't require `lang` tensors to understand which language is used. It automatically determines the language from the input ids, as shown in the sketch below.
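As a minimal sketch of this behavior, the snippet below runs the same fill-mask pipeline on sentences in two different languages without passing any language identifier. The example sentences are illustrative assumptions, not from the original documentation.

```python
import torch
from transformers import pipeline

# No `lang` tensor or language code is passed anywhere; the model infers
# the language from the input ids alone.
fill_mask = pipeline(
    task="fill-mask",
    model="facebook/xlm-roberta-xl",
    torch_dtype=torch.float16,
    device=0
)

# Illustrative (assumed) example sentences in English and German.
print(fill_mask("Paris is the <mask> of France.")[0]["token_str"])
print(fill_mask("Berlin ist die <mask> von Deutschland.")[0]["token_str"])
```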
## XLMRobertaXLConfig

[[autodoc]] XLMRobertaXLConfig

## XLMRobertaXLModel

[[autodoc]] XLMRobertaXLModel
    - forward

## XLMRobertaXLForCausalLM

[[autodoc]] XLMRobertaXLForCausalLM
    - forward

## XLMRobertaXLForMaskedLM

[[autodoc]] XLMRobertaXLForMaskedLM
    - forward

## XLMRobertaXLForSequenceClassification

[[autodoc]] XLMRobertaXLForSequenceClassification
    - forward

## XLMRobertaXLForMultipleChoice

[[autodoc]] XLMRobertaXLForMultipleChoice
    - forward

## XLMRobertaXLForTokenClassification

[[autodoc]] XLMRobertaXLForTokenClassification
    - forward

## XLMRobertaXLForQuestionAnswering

[[autodoc]] XLMRobertaXLForQuestionAnswering
    - forward