transformers/examples/research_projects/codeparrot/scripts
Loubna Ben Allal 05a90579a8
CodeParrot data pretokenization (#16932)
* add pretokenization arguments

* add pretokenization script

* add support for pretokenized data

* reformat code

* fix run command for training

* fix model call from config

* remove a package

* add comments on pretokenization in the readme

* remove explicit parallelization

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* update readme

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* update readme -remove username

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* update readme -remove username

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* keep data parallelization

* reformat code

* reformat code

* update readme

* reformat code

* Update examples/research_projects/codeparrot/README.md

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
2022-05-16 15:32:16 +02:00
arguments.py            CodeParrot data pretokenization (#16932)                   2022-05-16 15:32:16 +02:00
bpe_training.py         fix: switch from slow to generic tokenizer class (#15122)  2022-01-12 09:12:43 -05:00
codeparrot_training.py  CodeParrot data pretokenization (#16932)                   2022-05-16 15:32:16 +02:00
human_eval.py           Black preview (#17217)                                     2022-05-12 16:25:55 -04:00
initialize_model.py     CodeParrot data pretokenization (#16932)                   2022-05-16 15:32:16 +02:00
preprocessing.py        Update codeparrot data preprocessing (#16944)              2022-05-16 14:43:25 +02:00
pretokenizing.py        CodeParrot data pretokenization (#16932)                   2022-05-16 15:32:16 +02:00
validation_loss.py      Add CodeParrot 🦜 codebase (#14536)                        2021-12-02 10:41:35 +01:00