Loubna Ben Allal | 05a90579a8
CodeParrot data pretokenization (#16932)
* add pretokenization arguments
* add pretokenization script
* add support for pretokenized data
* reformat code
* fix run command for training
* fix model call from config
* remove a package
* add comments on pretokenization in the readme
* remove explicit parallelization
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* update readme
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* update readme - remove username
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* update readme - remove username
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* keep data parallelization
* reformat code
* reformat code
* update readme
* reformat code
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
2022-05-16 15:32:16 +02:00