Loubna Ben Allal
e730e12567
Update codeparrot data preprocessing (#16944)
* add new preprocessing arguments
* add new filters
* add new filters to readme
* fix config and test count, update function names and docstrings
* reformat code
* update readme
* Update readme
* rename config_test filter
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* rename few_assignments filter
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* rename tokenizer in arguments
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* rename functions and add limit_line argument for config_test filter
* update threshold for config_test filter
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
2022-05-16 14:43:25 +02:00