transformers/examples/research_projects/codeparrot/scripts
Loubna Ben Allal b48ac1a094
Fix CodeParrot training script (#17291)
* average loss over batches and accumulated steps for tracking

* fix layernorm weight decay

* use AdamW from PyTorch instead of Transformers

* add shuffling of sequences inside the batches

* add logging dir and reformat code

* fix lr tracking

* remove Mistral scaling

* keep Mistral scaling

* reformat code

* fix error

* use shuffling function from PyTorch

* remove argument for shuffling batch sequences as it isn't optional

* update package versions and install accelerate from source

* remove unused package

* Update loss average over accumulated steps

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* use one shuffle buffer argument

* compute avg_loss in one line

Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2022-05-23 12:55:35 +02:00
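The "fix layernorm weight decay" change above refers to the standard trick of splitting model parameters into two optimizer groups so that biases and LayerNorm weights are excluded from weight decay, with the optimizer taken from `torch.optim.AdamW` rather than the deprecated `transformers` implementation. A minimal sketch (the actual script matches parameter names; this version classifies by module type instead, and the toy model and `get_grouped_params` helper are illustrative):

```python
import torch
from torch import nn
from torch.optim import AdamW  # AdamW straight from PyTorch, not from transformers

def get_grouped_params(model, weight_decay=0.1):
    """Split parameters into two groups so that biases and LayerNorm
    weights receive no weight decay."""
    decay, no_decay = [], []
    for module in model.modules():
        for name, param in module.named_parameters(recurse=False):
            if name.endswith("bias") or isinstance(module, nn.LayerNorm):
                no_decay.append(param)
            else:
                decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# Toy model: the Linear weight decays; its bias and all LayerNorm params do not.
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))
groups = get_grouped_params(model)
optimizer = AdamW(groups, lr=5e-4)
```

Decaying LayerNorm gains or biases toward zero tends to hurt training, which is why virtually all GPT-style training loops use this grouping.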
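The shuffling bullets ("use shuffling function from Pytorch", "use one shuffle buffer argument") describe buffer-based shuffling of a streaming dataset: fill a fixed-size buffer, emit it in permuted order, repeat. A generic sketch assuming `torch.randperm` as the PyTorch shuffling function; the `buffered_shuffle` name and single `shuffle_buffer` argument mirror, but are not copied from, the actual script:

```python
import torch

def buffered_shuffle(stream, shuffle_buffer=1000, seed=0):
    """Yield items from an iterator in locally shuffled order using a
    fixed-size buffer, so a streaming dataset never sits fully in memory."""
    generator = torch.Generator().manual_seed(seed)
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) == shuffle_buffer:
            # emit the full buffer in a random permutation
            for i in torch.randperm(len(buffer), generator=generator).tolist():
                yield buffer[i]
            buffer = []
    # flush whatever remains at the end of the stream
    for i in torch.randperm(len(buffer), generator=generator).tolist():
        yield buffer[i]

items = list(buffered_shuffle(range(10), shuffle_buffer=4))
```

Every input item is yielded exactly once; only the order within each buffer changes, which is why the buffer size is the one knob worth exposing.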
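The loss-tracking bullets ("average loss over batches and accumulated steps for tracking", "compute avg_loss in one line") amount to logging the mean loss across the micro-batches of one optimizer step instead of the raw loss of the last micro-batch. A toy sketch; the loss values stand in for `loss.item()` per micro-batch and are purely illustrative:

```python
gradient_accumulation_steps = 4
micro_batch_losses = [2.0, 1.0, 3.0, 2.0]  # stand-ins for loss.item() per micro-batch

avg_loss = 0.0
for loss in micro_batch_losses:
    # accumulate each micro-batch's mean contribution in one line
    avg_loss += loss / gradient_accumulation_steps

print(avg_loss)  # 2.0
```

Logging this average instead of a single micro-batch loss removes most of the step-to-step noise from the training curves without any extra bookkeeping.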
arguments.py Fix CodeParrot training script (#17291) 2022-05-23 12:55:35 +02:00
bpe_training.py fix: switch from slow to generic tokenizer class (#15122) 2022-01-12 09:12:43 -05:00
codeparrot_training.py Fix CodeParrot training script (#17291) 2022-05-23 12:55:35 +02:00
human_eval.py Black preview (#17217) 2022-05-12 16:25:55 -04:00
initialize_model.py Fix CodeParrot training script (#17291) 2022-05-23 12:55:35 +02:00
preprocessing.py Update codeparrot data preprocessing (#16944) 2022-05-16 14:43:25 +02:00
pretokenizing.py CodeParrot data pretokenization (#16932) 2022-05-16 15:32:16 +02:00
validation_loss.py Add CodeParrot 🦜 codebase (#14536) 2021-12-02 10:41:35 +01:00