[docs]: update roformer.md model card (#37946)

* Update roformer model card

* fix example purpose description

* fix model description according to the comments

* revert changes for autodoc

* remove unneeded tags

* fix review issues

* fix hfoption

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Kseniya Parkhamchuk 2025-05-23 18:27:56 -05:00 committed by GitHub
parent 36f97ae15b
commit 31f8a0fe8a


@@ -14,46 +14,78 @@ rendered properly in your Markdown viewer.
 -->

-# RoFormer
-<div class="flex flex-wrap space-x-1">
-<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
+<div style="float: right;">
+<div class="flex flex-wrap space-x-1">
+<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
 <img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
 <img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=
 ">
+</div>
 </div>

-## Overview
+# RoFormer

-The RoFormer model was proposed in [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
+[RoFormer](https://huggingface.co/papers/2104.09864) introduces Rotary Position Embedding (RoPE) to encode token positions by rotating the inputs in 2D space. This allows a model to track absolute positions and model relative relationships. RoPE can scale to longer sequences, account for the natural decay of token dependencies, and works with the more efficient linear self-attention.

-The abstract from the paper is the following:
+You can find all the RoFormer checkpoints on the [Hub](https://huggingface.co/models?search=roformer).

-*Position encoding in transformer architecture provides supervision for dependency modeling between elements at
-different positions in the sequence. We investigate various methods to encode positional information in
-transformer-based language models and propose a novel implementation named Rotary Position Embedding(RoPE). The
-proposed RoPE encodes absolute positional information with rotation matrix and naturally incorporates explicit relative
-position dependency in self-attention formulation. Notably, RoPE comes with valuable properties such as flexibility of
-being expand to any sequence lengths, decaying inter-token dependency with increasing relative distances, and
-capability of equipping the linear self-attention with relative position encoding. As a result, the enhanced
-transformer with rotary position embedding, or RoFormer, achieves superior performance in tasks with long texts. We
-release the theoretical analysis along with some preliminary experiment results on Chinese data. The undergoing
-experiment for English benchmark will soon be updated.*
+> [!TIP]
+> Click on the RoFormer models in the right sidebar for more examples of how to apply RoFormer to different language tasks.

-This model was contributed by [junnyu](https://huggingface.co/junnyu). The original code can be found [here](https://github.com/ZhuiyiTechnology/roformer).
+The example below demonstrates how to predict the `[MASK]` token with [`Pipeline`], [`AutoModel`], and from the command line.

-## Usage tips
-RoFormer is a BERT-like autoencoding model with rotary position embeddings. Rotary position embeddings have shown
-improved performance on classification tasks with long texts.
-## Resources
-- [Text classification task guide](../tasks/sequence_classification)
-- [Token classification task guide](../tasks/token_classification)
-- [Question answering task guide](../tasks/question_answering)
-- [Causal language modeling task guide](../tasks/language_modeling)
-- [Masked language modeling task guide](../tasks/masked_language_modeling)
-- [Multiple choice task guide](../tasks/multiple_choice)
+<hfoptions id="usage">
+<hfoption id="Pipeline">
+```py
+# uncomment to install rjieba which is needed for the tokenizer
+# !pip install rjieba
+import torch
+from transformers import pipeline
+pipe = pipeline(
+task="fill-mask",
+model="junnyu/roformer_chinese_base",
+torch_dtype=torch.float16,
+device=0
+)
+output = pipe("水在零度时会[MASK]")
+print(output)
+```
+</hfoption>
+<hfoption id="AutoModel">
+```py
+# uncomment to install rjieba which is needed for the tokenizer
+# !pip install rjieba
+import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+model = AutoModelForMaskedLM.from_pretrained(
+"junnyu/roformer_chinese_base", torch_dtype=torch.float16
+)
+tokenizer = AutoTokenizer.from_pretrained("junnyu/roformer_chinese_base")
+input_ids = tokenizer("水在零度时会[MASK]", return_tensors="pt").to(model.device)
+outputs = model(**input_ids)
+decoded = tokenizer.batch_decode(outputs.logits.argmax(-1), skip_special_tokens=True)
+print(decoded)
+```
+</hfoption>
+<hfoption id="transformers CLI">
+```bash
+echo -e "水在零度时会[MASK]" | transformers-cli run --task fill-mask --model junnyu/roformer_chinese_base --device 0
+```
+</hfoption>
+</hfoptions>
+## Notes
+- The current RoFormer implementation is an encoder-only model. The original code can be found in the [ZhuiyiTechnology/roformer](https://github.com/ZhuiyiTechnology/roformer) repository.

 ## RoFormerConfig
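
The new model-card text in this commit says RoPE encodes positions "by rotating the inputs in 2D space" so that a model can track absolute positions and model relative relationships. A minimal sketch of that idea follows, in plain Python. It is not the library's implementation: the helper names `rotate` and `dot`, the sample vectors, and the interleaved pairing of dimensions are assumptions made for illustration, and the frequency base of 10000 follows the RoFormer paper's formulation.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to one even-length vector.

    Hypothetical helper: each consecutive pair (vec[i], vec[i + 1]) is
    treated as a point in a 2D plane and rotated by the angle
    position * base ** (-i / d), where d is the vector length.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))


# Illustrative query/key vectors (made-up values).
query = [0.3, -1.2, 0.8, 0.5]
key = [1.0, 0.4, -0.7, 0.2]

# Same relative offset (3) at two different absolute positions.
score_near = dot(rotate(query, 5), rotate(key, 2))
score_far = dot(rotate(query, 105), rotate(key, 102))
print(score_near, score_far)
```

The two printed scores agree up to floating-point error: each vector is rotated according to its absolute position, but the dot product between a rotated query and a rotated key depends only on the difference between their positions.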