Mirror of https://github.com/huggingface/transformers.git (synced 2025-07-05).
Latest commit (squashed): improvements to the CodeParrot training script.

* average loss over batches and accumulated steps for tracking
* fix LayerNorm weight decay
* use AdamW from PyTorch instead of Transformers
* add shuffling of sequences inside the batches
* add logging dir and reformat code
* fix lr tracking
* remove Mistral scaling, then keep Mistral scaling
* reformat code and fix errors
* use shuffling function from PyTorch
* remove argument for shuffling batch sequences as it isn't optional
* update package versions and install accelerate from source
* remove unused package
* update loss average over accumulated steps
* use one shuffle buffer argument
* compute avg_loss in one line

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
Files in this directory:

- arguments.py
- bpe_training.py
- codeparrot_training.py
- human_eval.py
- initialize_model.py
- preprocessing.py
- pretokenizing.py
- validation_loss.py