transformers/examples/research_projects/codeparrot/scripts
Loubna Ben Allal 1d71ad8905
Update CodeParrot readme to include training in Megatron (#17798)
* add info about megatron training

* upload models and datasets from CodeParrot organization

* Update examples/research_projects/codeparrot/README.md

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* fix typo and add comment about codeparrot vs megatron

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2022-07-27 11:59:08 +02:00
tests [CodeParrot] Near-deduplication with jaccard similarity (#17054) 2022-06-21 14:23:36 +02:00
arguments.py Update CodeParrot readme to include training in Megatron (#17798) 2022-07-27 11:59:08 +02:00
bpe_training.py fix: switch from slow to generic tokenizer class (#15122) 2022-01-12 09:12:43 -05:00
codeparrot_training.py Fix CodeParrot training script (#17291) 2022-05-23 12:55:35 +02:00
human_eval.py Black preview (#17217) 2022-05-12 16:25:55 -04:00
initialize_model.py Fix CodeParrot training script (#17291) 2022-05-23 12:55:35 +02:00
minhash_deduplication.py [CodeParrot] Near-deduplication with jaccard similarity (#17054) 2022-06-21 14:23:36 +02:00
preprocessing.py [CodeParrot] Near-deduplication with jaccard similarity (#17054) 2022-06-21 14:23:36 +02:00
pretokenizing.py CodeParrot data pretokenization (#16932) 2022-05-16 15:32:16 +02:00
validation_loss.py Add CodeParrot 🦜 codebase (#14536) 2021-12-02 10:41:35 +01:00