mirror of https://github.com/huggingface/transformers.git
synced 2025-08-02 03:01:07 +06:00
[s2s] dont document packing because it hurts performance (#6077)
parent 9d0d3a6645
commit 1e00ef681d
@@ -27,17 +27,7 @@ this should make a directory called `cnn_dm/` with files like `test.source`.
 ```
 
 WMT16 English-Romanian Translation Data:
-
-This dataset comes in two formats. The "packed" version merges short training examples into examples of <200 tokens to increase GPU utilization (and also improves validation performance).
-
-```bash
-cd examples/seq2seq
-wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro_packed_train_200.tgz
-tar -xzvf wmt_en_ro_packed_200.tgz
-export ENRO_DIR=wmt_en_ro_packed_train_200
-```
-
-The original data can also be downloaded with this command:
+download with this command:
 ```bash
 wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro.tar.gz
 tar -xzvf wmt_en_ro.tar.gz
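For context on what the removed README text meant by "packing": the idea is to merge short consecutive training pairs into longer ones that stay under a token budget (<200 in the removed text). The sketch below is only an illustration of that idea, not the script used to build `wmt_en_ro_packed_train_200`. The function names `n_tokens` and `pack_examples` are hypothetical, and counting tokens by whitespace splitting is an assumption; a real pipeline would presumably count with the model's tokenizer.

```python
from typing import List, Tuple


def n_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def pack_examples(
    sources: List[str], targets: List[str], max_tokens: int = 200
) -> List[Tuple[str, str]]:
    """Greedily merge consecutive (source, target) pairs while both
    merged sides stay strictly under max_tokens."""
    packed: List[Tuple[str, str]] = []
    cur_src: List[str] = []
    cur_tgt: List[str] = []
    for src, tgt in zip(sources, targets):
        merged_src = " ".join(cur_src + [src])
        merged_tgt = " ".join(cur_tgt + [tgt])
        too_long = (
            n_tokens(merged_src) >= max_tokens
            or n_tokens(merged_tgt) >= max_tokens
        )
        if cur_src and too_long:
            # Emit the current pack and start a new one with this pair.
            packed.append((" ".join(cur_src), " ".join(cur_tgt)))
            cur_src, cur_tgt = [src], [tgt]
        else:
            cur_src.append(src)
            cur_tgt.append(tgt)
    if cur_src:
        packed.append((" ".join(cur_src), " ".join(cur_tgt)))
    return packed


if __name__ == "__main__":
    pairs = pack_examples(["a b", "c d", "e f g"], ["x y", "z", "w"], max_tokens=5)
    for src, tgt in pairs:
        print(repr(src), "->", repr(tgt))
```

A single pair that already exceeds the budget is kept as its own example rather than dropped; whether the original packed dataset handled that case the same way is not stated in the source.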