Loubna Ben Allal
05a90579a8
CodeParrot data pretokenization ( #16932 )
* add pretokenization arguments
* add pretokenization script
* add support for pretokenized data
* reformat code
* fix run command for training
* fix model call from config
* remove a package
* add comments on pretokenization in the readme
* remove explicit parallelization
* update readme
* update readme - remove username
* update readme - remove username
* keep data parallelization
* reformat code
* reformat code
* update readme
* reformat code
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
2022-05-16 15:32:16 +02:00