transformers/examples/research_projects/codeparrot/scripts
Loubna Ben Allal 286a18fa00
Fix codeparrot deduplication - ignore whitespaces (#18023)
* ignore whitespaces for hash

* reformat code

* Update README.md
2022-07-28 15:58:26 +02:00
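The whitespace-insensitive hashing named in this commit can be sketched as follows. This is a minimal illustration, assuming that "ignore whitespaces" means deleting every whitespace character before hashing and that exact deduplication keeps the first file seen for each hash. The names `content_hash` and `deduplicate` are hypothetical and not taken from `preprocessing.py`.

```python
import hashlib
import re

# Any run of whitespace (spaces, tabs, newlines) is removed before hashing.
WHITESPACE = re.compile(r"\s+")


def content_hash(content: str) -> str:
    """Return an MD5 hex digest of `content` with all whitespace removed."""
    normalized = WHITESPACE.sub("", content)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def deduplicate(files: list[str]) -> list[str]:
    """Keep only the first file for each whitespace-insensitive hash (hypothetical helper)."""
    seen: set[str] = set()
    unique: list[str] = []
    for content in files:
        h = content_hash(content)
        if h not in seen:
            seen.add(h)
            unique.append(content)
    return unique


files = [
    "def f():\n    return 1\n",  # spaces for indentation
    "def f():\n\treturn 1",      # same code, tab indentation: treated as a duplicate
    "def g(): pass",
]
print(len(deduplicate(files)))
```

With whitespace stripped, the first two files reduce to the same string and share a hash, so only two files survive. Deleting all whitespace is a deliberate trade-off: it also merges strings such as `return x` and `returnx`, which is usually acceptable for catching reformatted copies of the same source file.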
tests [CodeParrot] Near-deduplication with jaccard similarity (#17054) 2022-06-21 14:23:36 +02:00
arguments.py Update CodeParrot readme to include training in Megatron (#17798) 2022-07-27 11:59:08 +02:00
bpe_training.py fix: switch from slow to generic tokenizer class (#15122) 2022-01-12 09:12:43 -05:00
codeparrot_training.py Fix CodeParrot training script (#17291) 2022-05-23 12:55:35 +02:00
human_eval.py Black preview (#17217) 2022-05-12 16:25:55 -04:00
initialize_model.py Fix CodeParrot training script (#17291) 2022-05-23 12:55:35 +02:00
minhash_deduplication.py [CodeParrot] Near-deduplication with jaccard similarity (#17054) 2022-06-21 14:23:36 +02:00
preprocessing.py Fix codeparrot deduplication - ignore whitespaces (#18023) 2022-07-28 15:58:26 +02:00
pretokenizing.py CodeParrot data pretokenization (#16932) 2022-05-16 15:32:16 +02:00
validation_loss.py Add CodeParrot 🦜 codebase (#14536) 2021-12-02 10:41:35 +01:00
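The near-deduplication with Jaccard similarity referenced by `tests` and `minhash_deduplication.py` compares files by the overlap of their token sets, J(A, B) = |A ∩ B| / |A ∪ B|. The file name suggests MinHash is used to approximate this at scale; the sketch below instead computes the exact similarity pairwise, which is only practical for small collections. The tokenization rule (splitting on non-identifier characters), the 0.85 threshold, and the helper names are assumptions for illustration, not the script's actual settings.

```python
import re
from itertools import combinations

# Hypothetical tokenizer: split on anything that is not a letter, digit, or underscore.
NON_IDENTIFIER = re.compile(r"[^A-Za-z_0-9]+")


def tokens(code: str) -> set[str]:
    """Return the set of identifier-like tokens in `code`."""
    return set(NON_IDENTIFIER.split(code)) - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A & B| / |A | B| (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(files: list[str], threshold: float = 0.85) -> list[tuple[int, int]]:
    """Return index pairs whose token-set Jaccard similarity meets `threshold` (assumed value)."""
    token_sets = [tokens(f) for f in files]
    return [
        (i, j)
        for i, j in combinations(range(len(files)), 2)
        if jaccard(token_sets[i], token_sets[j]) >= threshold
    ]


files = [
    "def add(a, b): return a + b",
    "def add(a,b):\n    return a+b",
    "print('hello')",
]
print(near_duplicate_pairs(files))
```

The pairwise loop is quadratic in the number of files, which is why MinHash signatures combined with locality-sensitive hashing are the usual choice for deduplicating a large code corpus: they estimate the same Jaccard similarity without comparing every pair.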