This script evaluates the multitask pre-trained checkpoint for `t5-base` (see the [T5 paper](https://arxiv.org/abs/1910.10683)) on the CNN/Daily Mail test dataset. Note that the results in the paper were attained with a model fine-tuned on summarization, so the results here will be worse by approximately 0.5 ROUGE points.

### Get the CNN Data

First, you need to download the CNN data. It's about 400 MB and can be downloaded by running

```bash
python download_cnn_daily_mail.py cnn_articles_input_data.txt cnn_articles_reference_summaries.txt
```
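
For reference, here is a minimal sketch of what a download script like this one can do, assuming it pulls the test split via `tensorflow_datasets` (the actual implementation may differ):

```python
# Hypothetical sketch: write the CNN/Daily Mail test articles and reference
# summaries to the two output files, one example per line.
import sys

import tensorflow_datasets as tfds

articles_path, summaries_path = sys.argv[1], sys.argv[2]
dataset = tfds.load("cnn_dailymail", split="test")  # downloads on first use

with open(articles_path, "w") as art_f, open(summaries_path, "w") as sum_f:
    for example in tfds.as_numpy(dataset):
        # Strip newlines so each example occupies exactly one line.
        article = example["article"].decode("utf-8").replace("\n", " ")
        highlights = example["highlights"].decode("utf-8").replace("\n", " ")
        art_f.write(article + "\n")
        sum_f.write(highlights + "\n")
```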

You should confirm that each file has 11490 lines:

```bash
wc -l cnn_articles_input_data.txt # should print 11490
wc -l cnn_articles_reference_summaries.txt # should print 11490
```

### Usage

To create summaries for each article in the dataset, run:

```bash
python evaluate_cnn.py cnn_articles_input_data.txt cnn_generated_articles_summaries.txt cnn_articles_reference_summaries.txt rouge_score.txt
```
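
Under the hood, each summary is generated with the `summarize:` task prefix that T5's multitask pre-training expects. Here is a minimal sketch of that generation step, with illustrative generation parameters (the script's actual settings may differ):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = T5ForConditionalGeneration.from_pretrained("t5-base").to(device)
tokenizer = T5Tokenizer.from_pretrained("t5-base")

article = "..."  # one line from cnn_articles_input_data.txt
# T5 selects the task via a text prefix prepended to the input.
input_ids = tokenizer.encode(
    "summarize: " + article, return_tensors="pt", max_length=512, truncation=True
).to(device)
summary_ids = model.generate(input_ids, num_beams=4, max_length=150, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```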

The default batch size of 8 fits in 16 GB of GPU memory but may need to be adjusted for your hardware. The ROUGE scores (rouge1, rouge2, rougeL) are computed automatically and saved to rouge_score.txt.
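
Here is a minimal sketch of how such scores can be computed, assuming the `rouge_score` package (`pip install rouge-score`), a common choice for this metric:

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

# score(target, prediction): reference summary first, generated summary second.
scores = scorer.score(
    "police say the suspect fled on foot",    # reference (toy example)
    "the suspect fled on foot, police said",  # generated (toy example)
)
for name, result in scores.items():
    print(f"{name}: f-measure {result.fmeasure:.4f}")
```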