# XLM-RoBERTa-XL
PyTorch SDPA
## Overview The XLM-RoBERTa-XL model was proposed in [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau. The abstract from the paper is the following: *Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models dubbed XLM-R XL and XLM-R XXL outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests pretrained models with larger capacity may obtain both strong performance on high-resource languages while greatly improving low-resource languages. We make our code and models publicly available.* This model was contributed by [Soonhwan-Kwon](https://github.com/Soonhwan-Kwon) and [stefan-it](https://huggingface.co/stefan-it). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/xlmr). ## Usage tips XLM-RoBERTa-XL is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does not require `lang` tensors to understand which language is used, and should be able to determine the correct language from the input ids. ## Resources - [Text classification task guide](../tasks/sequence_classification) - [Token classification task guide](../tasks/token_classification) - [Question answering task guide](../tasks/question_answering) - [Causal language modeling task guide](../tasks/language_modeling) - [Masked language modeling task guide](../tasks/masked_language_modeling) - [Multiple choice task guide](../tasks/multiple_choice) ## XLMRobertaXLConfig [[autodoc]] XLMRobertaXLConfig ## XLMRobertaXLModel [[autodoc]] XLMRobertaXLModel - forward ## XLMRobertaXLForCausalLM [[autodoc]] XLMRobertaXLForCausalLM - forward ## XLMRobertaXLForMaskedLM [[autodoc]] XLMRobertaXLForMaskedLM - forward ## XLMRobertaXLForSequenceClassification [[autodoc]] XLMRobertaXLForSequenceClassification - forward ## XLMRobertaXLForMultipleChoice [[autodoc]] XLMRobertaXLForMultipleChoice - forward ## XLMRobertaXLForTokenClassification [[autodoc]] XLMRobertaXLForTokenClassification - forward ## XLMRobertaXLForQuestionAnswering [[autodoc]] XLMRobertaXLForQuestionAnswering - forward