XLM-ProphetNet

DISCLAIMER: If you see something strange, file a GitHub issue and assign @patrickvonplaten.

Overview

The XLM-ProphetNet model was proposed in ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou on 13 Jan, 2020.

XLM-ProphetNet is an encoder-decoder model that can predict n future tokens for "ngram" language modeling instead of just the next token. Its architecture is identical to ProphetNet, but the model was trained on the multi-lingual "wiki100" Wikipedia dump.

The abstract from the paper is the following:

In this paper, we present a new sequence-to-sequence pretraining model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism. Instead of the optimization of one-step ahead prediction in traditional sequence-to-sequence model, the ProphetNet is optimized by n-step ahead prediction which predicts the next n tokens simultaneously based on previous context tokens at each time step. The future n-gram prediction explicitly encourages the model to plan for the future tokens and prevent overfitting on strong local correlations. We pre-train ProphetNet using a base scale dataset (16GB) and a large scale dataset (160GB) respectively. Then we conduct experiments on CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new state-of-the-art results on all these datasets compared to the models using the same scale pretraining corpus.
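To make the future n-gram objective from the abstract more concrete, the following is a minimal sketch in plain Python of which tokens a model would be asked to predict at each time step. It is not the authors' implementation and does not use the Transformers API: the function name `future_ngram_targets` is hypothetical, and padding past the end of the sequence with `None` is an assumption made only for illustration.

```python
from typing import List, Optional


def future_ngram_targets(tokens: List[str], n: int) -> List[List[Optional[str]]]:
    """For each position t, return the next n tokens (t+1 .. t+n).

    Positions that run past the end of the sequence are padded with None
    (an illustrative choice, not taken from the paper).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    targets = []
    for t in range(len(tokens)):
        # Tokens that follow position t, at most n of them.
        window = tokens[t + 1 : t + 1 + n]
        # Pad so that every position has exactly n targets.
        window = window + [None] * (n - len(window))
        targets.append(window)
    return targets


sentence = ["the", "cat", "sat", "down"]
for token, target in zip(sentence, future_ngram_targets(sentence, n=2)):
    print(token, "->", target)
```

With `n=1` the targets reduce to ordinary next-token prediction; larger values of `n` add further-ahead tokens as extra targets at every step, which is the idea the abstract describes as n-step ahead prediction.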

The authors' code can be found here.

Tips:

XLM-ProphetNet's model architecture and pretraining objective are the same as ProphetNet's, but XLM-ProphetNet was pre-trained on the cross-lingual dataset XGLUE.

Documentation resources

XLMProphetNetConfig

autodoc XLMProphetNetConfig

XLMProphetNetTokenizer

autodoc XLMProphetNetTokenizer

XLMProphetNetModel

autodoc XLMProphetNetModel

XLMProphetNetEncoder

autodoc XLMProphetNetEncoder

XLMProphetNetDecoder

autodoc XLMProphetNetDecoder

XLMProphetNetForConditionalGeneration

autodoc XLMProphetNetForConditionalGeneration

XLMProphetNetForCausalLM

autodoc XLMProphetNetForCausalLM
