From b61c47f5a555219bc6e29789263ea39d754109f4 Mon Sep 17 00:00:00 2001
From: Aashish Anand <84689683+AshAnand34@users.noreply.github.com>
Date: Mon, 9 Jun 2025 13:00:38 -0700
Subject: [PATCH] Created model card for xlm-roberta-xl (#38597)

* Created model card for xlm-roberta-xl

* Update XLM-RoBERTa-XL model card with improved descriptions and usage examples

* Minor option labeling fix

* Added MaskedLM version of XLM RoBERTa XL to model card

* Added quantization example for XLM RoBERTa XL model card

* minor fixes to xlm roberta xl model card

* Minor fixes to mask format in xlm roberta xl model card
---
 docs/source/en/model_doc/xlm-roberta-xl.md | 118 +++++++++++++++++----
 1 file changed, 97 insertions(+), 21 deletions(-)

diff --git a/docs/source/en/model_doc/xlm-roberta-xl.md b/docs/source/en/model_doc/xlm-roberta-xl.md
index 355869ad6e0..56306bcb4a6 100644
--- a/docs/source/en/model_doc/xlm-roberta-xl.md
+++ b/docs/source/en/model_doc/xlm-roberta-xl.md
@@ -14,37 +14,113 @@ rendered properly in your Markdown viewer.
 
 -->
 
-# XLM-RoBERTa-XL
-
-<div class="flex flex-wrap space-x-1">
-<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
-<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
+<div style="float: right;">
+    <div class="flex flex-wrap space-x-1">
+        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
+        <img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
+    </div>
 </div>
 
-## Overview
+# XLM-RoBERTa-XL
 
-The XLM-RoBERTa-XL model was proposed in [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
+[XLM-RoBERTa-XL](https://huggingface.co/papers/2105.00572) is a 3.5B parameter multilingual masked language model pretrained on 100 languages. It shows that by scaling model capacity, multilingual models can deliver strong performance on high-resource languages and even handle low-resource languages in a zero-shot setting.
 
-The abstract from the paper is the following:
+You can find all the original XLM-RoBERTa-XL checkpoints under the [AI at Meta](https://huggingface.co/facebook?search_models=xlm) organization.
 
-*Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models dubbed XLM-R XL and XLM-R XXL outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests pretrained models with larger capacity may obtain both strong performance on high-resource languages while greatly improving low-resource languages. We make our code and models publicly available.*
+> [!TIP]
+> Click on the XLM-RoBERTa-XL models in the right sidebar for more examples of how to apply XLM-RoBERTa-XL to different cross-lingual tasks like classification, translation, and question answering.
 
-This model was contributed by [Soonhwan-Kwon](https://github.com/Soonhwan-Kwon) and [stefan-it](https://huggingface.co/stefan-it). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/xlmr).
+The example below demonstrates how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.
 
-## Usage tips
+<hfoptions id="usage">
+<hfoption id="Pipeline">
 
-XLM-RoBERTa-XL is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does
-not require `lang` tensors to understand which language is used, and should be able to determine the correct
-language from the input ids.
+```python
+import torch
+from transformers import pipeline
 
-## Resources
+pipeline = pipeline(
+    task="fill-mask",
+    model="facebook/xlm-roberta-xl",
+    torch_dtype=torch.float16,
+    device=0
+)
+pipeline("Bonjour, je suis un modèle <mask>.")
+```
 
-- [Text classification task guide](../tasks/sequence_classification)
-- [Token classification task guide](../tasks/token_classification)
-- [Question answering task guide](../tasks/question_answering)
-- [Causal language modeling task guide](../tasks/language_modeling)
-- [Masked language modeling task guide](../tasks/masked_language_modeling)
-- [Multiple choice task guide](../tasks/multiple_choice)
+</hfoption>
+<hfoption id="AutoModel">
+
+```python
+import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained(
+    "facebook/xlm-roberta-xl",
+)
+model = AutoModelForMaskedLM.from_pretrained(
+    "facebook/xlm-roberta-xl",
+    torch_dtype=torch.float16,
+    device_map="auto",
+    attn_implementation="sdpa"
+)
+inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")
+
+with torch.no_grad():
+    outputs = model(**inputs)
+    predictions = outputs.logits
+
+masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
+predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
+predicted_token = tokenizer.decode(predicted_token_id)
+
+print(f"The predicted token is: {predicted_token}")
+```
+
+</hfoption>
+<hfoption id="transformers CLI">
+
+```bash
+echo -e "Plants create <mask> through a process known as photosynthesis." | transformers-cli run --task fill-mask --model facebook/xlm-roberta-xl --device 0
+```
+
+</hfoption>
+</hfoptions>
+
+Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
+
+The example below uses [torchao](../quantization/torchao) to only quantize the weights to int4.
+
+```py
+import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer, TorchAoConfig
+
+quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
+tokenizer = AutoTokenizer.from_pretrained(
+    "facebook/xlm-roberta-xl",
+)
+model = AutoModelForMaskedLM.from_pretrained(
+    "facebook/xlm-roberta-xl",
+    torch_dtype=torch.float16,
+    device_map="auto",
+    attn_implementation="sdpa",
+    quantization_config=quantization_config
+)
+inputs = tokenizer("Bonjour, je suis un modèle <mask>.", return_tensors="pt").to("cuda")
+
+with torch.no_grad():
+    outputs = model(**inputs)
+    predictions = outputs.logits
+
+masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
+predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
+predicted_token = tokenizer.decode(predicted_token_id)
+
+print(f"The predicted token is: {predicted_token}")
+```
+
+## Notes
+
+- Unlike some XLM models, XLM-RoBERTa-XL doesn't require `lang` tensors to understand which language is used. It automatically determines the language from the input ids.
 
 ## XLMRobertaXLConfig
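The note in the patch about `lang` tensors can be illustrated with a minimal sketch. It reuses the `facebook/xlm-roberta-xl` checkpoint and the same fill-mask logic as the card's examples; the example sentences are illustrative and it assumes enough GPU memory for the 3.5B parameter model.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# One tokenizer and one model handle every language; no `lang` tensor is passed.
tokenizer = AutoTokenizer.from_pretrained("facebook/xlm-roberta-xl")
model = AutoModelForMaskedLM.from_pretrained(
    "facebook/xlm-roberta-xl",
    torch_dtype=torch.float16,
    device_map="auto",
)

sentences = [
    "The capital of France is <mask>.",           # English
    "La capitale de la France est <mask>.",       # French
    "Die Hauptstadt von Frankreich ist <mask>.",  # German
]

for text in sentences:
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the <mask> position and decode the highest-scoring prediction.
    mask_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
    predicted = tokenizer.decode(logits[0, mask_index].argmax(dim=-1))
    print(f"{text} -> {predicted}")
```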