Stas Bekman
97e688bc22
[Trainer] memory tracker metrics ( #10225 )
* memory tracker metrics
* go back to eval for some consistency
* handle no-gpu case
* deal with stackable eval calls
* restore callback order
* style
* simplify the API
* add test
* docs
* consistently use eval_ prefix
* improve docs
* Update src/transformers/trainer_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rename method
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 09:27:32 -08:00
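Editor's note: the new metrics land in the dict returned by Trainer.evaluate() under the eval_ prefix. A minimal sketch of inspecting them, assuming `model` and `eval_ds` are defined elsewhere; the exact key names shown in the comment are illustrative:

```python
from transformers import Trainer, TrainingArguments

# A minimal sketch: memory tracking is on by default in this version of
# Trainer; `model` and `eval_ds` are assumed to exist.
args = TrainingArguments(output_dir="out")
trainer = Trainer(model=model, args=args, eval_dataset=eval_ds)

metrics = trainer.evaluate()
for key, value in metrics.items():
    if "_mem_" in key:  # e.g. eval_mem_cpu_alloc_delta, eval_mem_gpu_peaked_delta
        print(key, value)
```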
Zhang Cheng
df1b0fb54d
set tgt_lang of MBart Tokenizer for summarization ( #10205 )
2021-02-16 09:39:37 -05:00
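Editor's note: for summarization the source and target language coincide, so tgt_lang has to be set explicitly rather than left at its translation default. A hedged sketch (checkpoint and language code chosen for illustration):

```python
from transformers import MBartTokenizer

# For summarization, decode into the same language as the input.
tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25", src_lang="en_XX", tgt_lang="en_XX"
)
```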
Suraj Patil
1c8c2d9ab3
[WIP][examples/seq2seq] move old s2s scripts to legacy ( #10136 )
* move old s2s scripts to legacy
* add the tests back
* proper rename
* restore
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-15 10:48:02 -08:00
Stas Bekman
0b1f552a24
fix run_seq2seq.py; porting trainer tests to it ( #10162 )
* fix run_seq2seq.py; porting DeepSpeed tests to it
* unrefactor
* defensive programming
* defensive programming 2
* port the rest of the trainer tests
* style
* a cleaner scripts dir finder
* cleanup
2021-02-15 09:12:17 -08:00
Suraj Patil
f51188cbe7
[examples/run_s2s] remove task_specific_params and update rouge computation ( #10133 )
* fix rouge metrics and task specific params
* fix typo
* round metrics
* typo
* remove task_specific_params
2021-02-12 17:18:21 +05:30
Suraj Patil
63fddcf69c
[examples/s2s] add test set predictions ( #10085 )
* add do_predict, pass eval_beams during eval
* update help
* apply suggestions from code review
2021-02-09 20:41:41 +05:30
Stas Bekman
781220acab
transition to new tests dir ( #10080 )
2021-02-08 12:41:52 -08:00
Stas Bekman
322037e842
[trainer] deepspeed bug fixes and tests ( #10039 )
* deepspeed bug fixes and tests
* manual wrap?
2021-02-08 09:44:02 -08:00
Olivier
ece6c51458
[s2s examples] Replace -100 token ids with the tokenizer pad_id for compute_metrics ( #10046 )
* replace -100 token ids with the tokenizer pad_id for compute_metrics
* fixed typo for label_ids
2021-02-08 10:08:16 -05:00
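Editor's note: the underlying pattern is that the loss uses -100 as its ignore index, which is not a valid vocabulary id, so labels must be mapped back to the pad token before decoding. A sketch of the idea (the helper name is ours, not the PR's):

```python
import numpy as np

def decode_labels(label_ids, tokenizer):
    # -100 is the ignore index for the loss, not a valid vocabulary id,
    # so restore the pad token before decoding.
    label_ids = np.where(label_ids != -100, label_ids, tokenizer.pad_token_id)
    return tokenizer.batch_decode(label_ids, skip_special_tokens=True)
```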
Stas Bekman
24db8cc329
Can't mix --fp16 and --device cpu ( #10041 )
2021-02-07 17:54:20 -08:00
Stas Bekman
769948fad2
json to jsonlines, plus doc and typo fixes ( #10043 )
2021-02-07 17:51:34 -08:00
Stas Bekman
8ea412a86f
[examples] make run scripts executable ( #10037 )
* make executable
* make executable
* same for the template
* cleanup
2021-02-05 15:51:18 -08:00
Suraj Patil
1cd16512dc
[examples/seq2seq] support label smoothing ( #9844 )
* add prepare_decoder_input_ids_from_labels in s2s models
* support label smoothing and encoder/embedding freezing
* fix freezing
* use pad_token_id from config
* remove embed freezing and add warning
* prepare decoder_input_ids inside DataCollatorForSeq2Seq
2021-02-05 23:21:57 +05:30
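Editor's note: the collator-side change matters because padded label positions hold -100; passing the model in lets the collator build the decoder inputs from the labels, which the smoothed loss path needs. A hedged sketch, with `tokenizer` and `model` assumed defined:

```python
from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainingArguments

# label_smoothing_factor turns on the smoothed loss inside the Trainer;
# model= lets the collator call model.prepare_decoder_input_ids_from_labels()
# so decoder inputs are shifted labels with the pad token restored.
args = Seq2SeqTrainingArguments(output_dir="out", label_smoothing_factor=0.1)
collator = DataCollatorForSeq2Seq(tokenizer, model=model)
```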
Sylvain Gugger
115d97dd2f
Remove subclass for sortish sampler ( #9907 )
* Remove subclass for sortish sampler
* Use old Seq2SeqTrainer in script
* Styling
2021-02-01 08:06:32 -05:00
Stas Bekman
6bab83683b
fix logger format for non-main process ( #9911 )
2021-02-01 03:08:12 -05:00
Stas Bekman
6bf94bc0b6
correctly handle mt5 ( #9879 )
2021-01-29 08:11:22 -08:00
Sylvain Gugger
b4e559cfa1
Deprecate model_path in Trainer.train ( #9854 )
2021-01-28 08:32:46 -05:00
Sylvain Gugger
f2fabedbab
Setup logging with a stdout handler ( #9816 )
2021-01-27 03:39:11 -05:00
Magdalena Biesialska
8f6c12d306
Fix fine-tuning translation scripts ( #9809 )
2021-01-26 11:30:31 -05:00
Andrea Cappelli
10e5f28212
Improve pytorch examples for fp16 ( #9796 )
* Pad to 8x for fp16 multiple choice example (#9752 )
* Pad to 8x for fp16 squad trainer example (#9752 )
* Pad to 8x for fp16 ner example (#9752 )
* Pad to 8x for fp16 swag example (#9752 )
* Pad to 8x for fp16 qa beam search example (#9752 )
* Pad to 8x for fp16 qa example (#9752 )
* Pad to 8x for fp16 seq2seq example (#9752 )
* Pad to 8x for fp16 glue example (#9752 )
* Pad to 8x for fp16 new ner example (#9752 )
* update script template #9752
* Update examples/multiple-choice/run_swag.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update examples/question-answering/run_qa.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update examples/question-answering/run_qa_beam_search.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve code quality #9752
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-26 04:47:07 -05:00
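Editor's note: the recurring change across all of these examples is a single collator argument: under --fp16, pad every batch to a multiple of 8 so the matmul shapes stay tensor-core friendly. A hedged sketch of the pattern, with `tokenizer` and `training_args` assumed in scope:

```python
from transformers import DataCollatorWithPadding

# Pad to a multiple of 8 under fp16 for tensor-core efficiency;
# otherwise fall back to plain dynamic padding.
data_collator = DataCollatorWithPadding(
    tokenizer, pad_to_multiple_of=8 if training_args.fp16 else None
)
```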
Sylvain Gugger
caf4abf768
Auto-resume training from checkpoint ( #9776 )
* Auto-resume training from checkpoint
* Update examples/text-classification/run_glue.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Roll out to other examples
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-25 12:03:51 -05:00
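Editor's note: this PR introduced get_last_checkpoint() in trainer_utils; the examples call it on output_dir and hand the result to train(). A sketch with `trainer` and `training_args` assumed defined, using the resume_from_checkpoint name that the later #9854 rename settles on:

```python
from transformers.trainer_utils import get_last_checkpoint

# Returns the newest checkpoint-* directory under output_dir, or None.
last_checkpoint = get_last_checkpoint(training_args.output_dir)
trainer.train(resume_from_checkpoint=last_checkpoint)
```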
Sylvain Gugger
411c582109
Fixes to run_seq2seq and instructions ( #9734 )
* Fixes to run_seq2seq and instructions
* Add more defaults for summarization
2021-01-22 10:03:57 -05:00
Sylvain Gugger
5f80c15ef5
Fix memory regression in Seq2Seq example ( #9713 )
* Fix memory regression in Seq2Seq example
* Fix test and properly deal with -100
* Easier condition with device safety
* Patch for MBartTokenizerFast
2021-01-21 12:05:46 -05:00
Sylvain Gugger
e4c06ed664
New run_seq2seq script ( #9605 )
* New run_seq2seq script
* Add tests
* Mark as slow
* Update examples/seq2seq/run_seq2seq.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-01-19 15:22:17 -05:00
Sylvain Gugger
97b787fb4e
Fix old Seq2SeqTrainer ( #9675 )
2021-01-19 09:56:25 -05:00
Stas Bekman
c60e0e1ee4
deepspeed + grad accum ( #9622 )
2021-01-15 10:12:26 -08:00
Sylvain Gugger
329fe2746a
Upstream (and rename) sortish sampler ( #9574 )
* Upstream (and rename) sortish sampler
* Use proper sampler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-14 10:38:14 -05:00
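Editor's note: after the upstreaming, sortish sampling is no longer a Seq2SeqTrainer subclass concern; it becomes LengthGroupedSampler in trainer_pt_utils.py and is switched on from TrainingArguments. A minimal hedged sketch:

```python
from transformers import TrainingArguments

# group_by_length batches samples of similar length together (the renamed
# "sortish" behavior), cutting padding waste without a full sort.
args = TrainingArguments(output_dir="out", group_by_length=True)
```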
Stas Bekman
2df34f4aba
[trainer] deepspeed integration ( #9211 )
* deepspeed integration
* style
* add test
* ds wants to do its own backward
* fp16 assert
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* for clarity extract what args are being passed to deepspeed
* introduce the concept of self.wrapped_model
* s/self.wrapped_model/self.model_wrapped/
* complete transition to self.model_wrapped / self.model
* fix
* doc
* give ds its own init
* add custom overrides, handle bs correctly
* fix test
* clean up model_init logic, fix small bug
* complete fix
* collapse --deepspeed_config into --deepspeed
* style
* start adding doc notes
* style
* implement hf2ds optimizer and scheduler configuration remapping
* oops
* call get_num_training_steps only when needed
* workaround broken auto-formatter
* deepspeed_config arg is no longer needed - fixed in deepspeed master
* use hf's fp16 args in config
* clean
* start on the docs
* rebase cleanup
* finish up --fp16
* clarify the supported stages
* big refactor thanks to discovering deepspeed.init_distributed
* cleanup
* revert fp16 part
* add checkpoint-support
* more init ds into integrations
* extend docs
* cleanup
* unfix docs
* clean up old code
* imports
* move docs
* fix logic
* make it clear which file it's referring to
* document nodes/gpus
* style
* wrong format
* style
* deepspeed handles gradient clipping
* easier to read
* major doc rewrite
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs
* switch to AdamW optimizer
* style
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-12 19:05:18 -08:00
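Editor's note: as the bullets say, --deepspeed_config was collapsed into a single --deepspeed flag taking the JSON config path. A hedged sketch of wiring it up from Python (ds_config.json is an assumed local file):

```python
from transformers import TrainingArguments

# Pointing `deepspeed` at a JSON config hands the optimizer, scheduler,
# fp16 handling, and gradient clipping over to the DeepSpeed engine.
args = TrainingArguments(
    output_dir="out",
    deepspeed="ds_config.json",
    fp16=True,
)
```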
Patrick von Platen
eef66035a2
[PyTorch Bart] Split Bart into different models ( #9343 )
* first try
* remove old template
* finish bart
* finish mbart
* delete unnecessary line
* init pegasus
* save intermediate
* correct pegasus
* finish pegasus
* remove cookie cutter leftover
* add marian
* finish blenderbot
* replace in file
* correctly split blenderbot
* delete "old" folder
* correct "add statement"
* adapt config for tf comp
* correct configs for tf
* remove ipdb
* fix more stuff
* fix mbart
* push pegasus fix
* fix mbart
* more fixes
* fix research projects code
* finish docs for bart, mbart, and marian
* delete unnecessary file
* correct attn typo
* correct configs
* remove pegasus for seq class
* correct peg docs
* correct peg docs
* finish configs
* further improve docs
* add copied from statements to mbart
* fix copied from in mbart
* add copy statements to marian
* add copied from to marian
* add pegasus copied from
* finish pegasus
* finish copied from
* Apply suggestions from code review
* make style
* backward comp blenderbot
* apply Lysandre's and Sylvain's suggestions
* apply suggestions
* push last fixes
* fix docs
* fix tok tests
* fix imports code style
* fix doc
2021-01-05 22:00:05 +01:00
Sylvain Gugger
a1cb6e9866
Adapt to new name of label_smoothing_factor training arg ( #9282 )
2020-12-23 11:05:21 -05:00
Sylvain Gugger
e6c1f1cad8
Revert renaming in finetune_trainer ( #9262 )
2020-12-22 15:42:34 -05:00
Manuel Romero
37d6fb5d04
Fix link to bertabs/README.md ( #9255 )
2020-12-22 11:41:23 -05:00
Sylvain Gugger
490b39e614
Seq2seq trainer ( #9241 )
* Add label smoothing in Trainer
* Add options for scheduler and Adafactor in Trainer
* Put Seq2SeqTrainer in the main lib
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments and adapt scripts
* Documentation
* Move test not using script to tests folder
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-22 11:33:44 -05:00
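Editor's note: with this PR the seq2seq pieces import straight from the main library rather than from examples/. A hedged sketch, with `model`, `train_ds`, and `eval_ds` assumed defined:

```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# predict_with_generate makes eval/predict run generate() for metrics.
args = Seq2SeqTrainingArguments(output_dir="out", predict_with_generate=True)
trainer = Seq2SeqTrainer(
    model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds
)
```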
Stas Bekman
f38c4ad302
better logging and help ( #9203 )
2020-12-20 10:28:28 -08:00
Sylvain Gugger
1198ba8fba
Add timing inside Trainer ( #9196 )
* Add timing inside Trainer
* Fix tests
* Add n_objs for train
* Sort logs
2020-12-18 15:10:39 -05:00
Stas Bekman
f06d0fadc9
[trainer] apex fixes and tests ( #9180 )
2020-12-17 16:49:11 -08:00
Stas Bekman
63841c559b
add tests for the new sharded ddp fairscale integration ( #9177 )
2020-12-17 14:24:03 -08:00
Sylvain Gugger
9a67185344
Experimental support for fairscale ShardedDDP ( #9139 )
* Experimental support for fairscale ShardedDDP
* Add import error if fairscale not available
* Address review comments
* Fix seq2seq trainer
2020-12-16 13:47:48 -05:00
Stas Bekman
14c79c3e31
native amp leak fix landed in 1.7.1 ( #9115 )
update README with good news that the leak fix has been applied to pytorch-1.7.1.
2020-12-15 09:10:41 -05:00
Stas Bekman
c19d04623e
[finetune_trainer] enhancements and fixes ( #9042 )
* trainer and finetune_trainer enhancements and fixes
* add fallback default
* move the fixing of incorrect keys back into finetune trainer
* s/eval/val/ to match the split
* trainer can now use a different prefix than eval_ for metrics
* document new arg
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* use 'eval' as the default for metric_key_prefix
* complete adjust var names + disambiguate
* fix logger
* add clarifying comment
* add clarifying comment
* style
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/trainer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* complete removal of optional for metric_key_prefix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-14 17:45:33 -08:00
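Editor's note: the prefix bullet above is the API point worth showing: evaluate() (and predict()) now accept metric_key_prefix, with "eval" kept as the default. A minimal sketch, with `trainer` assumed defined:

```python
# Metrics come back as val_loss, val_bleu, ... instead of eval_*,
# matching the name of the split being evaluated.
metrics = trainer.evaluate(metric_key_prefix="val")
```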
Sylvain Gugger
783d7d2629
Reorganize examples ( #9010 )
* Reorganize example folder
* Continue reorganization
* Change requirements for tests
* Final cleanup
* Finish regroup with tests all passing
* Copyright
* Requirements and readme
* Make a full link for the documentation
* Address review comments
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add symlink
* Reorg again
* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Adapt title
* Update to new structure
* Remove test
* Update READMEs
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-12-11 10:07:02 -05:00
Stas Bekman
df311a5ccf
[seq2seq] document the caveat of leaky native amp ( #8930 )
* document the caveat of leaky native amp
* Update examples/seq2seq/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-12-04 15:43:35 -08:00
Stas Bekman
4c3d98dddc
[s2s finetune_trainer] add instructions for distributed training ( #8884 )
2020-12-03 16:05:55 -08:00
Stas Bekman
379005c9d2
start using training_args.parallel_mode ( #8882 )
2020-12-01 11:40:36 -08:00
Stas Bekman
7f34d75780
[s2s trainer] fix DP mode ( #8823 )
* fix DP case on multi-gpu
* make executable
* test all 3 modes
* use the correct check for distributed
* dp doesn't need a special case
* restore original name
* cleanup
2020-11-30 12:55:56 -08:00
Sylvain Gugger
5530299096
Remove deprecated evaluate_during_training ( #8852 )
* Remove deprecated `evaluate_during_training`
* Update src/transformers/training_args_tf.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-30 11:12:15 -05:00
Stas Bekman
ddf3c64654
potpourri of small fixes ( #8807 )
2020-11-26 14:06:27 -08:00
Patrick von Platen
8f07f5c44b
Revert "finetune.py: specifying generation min_length ( #8478 )" ( #8805 )
This reverts commit 5aa361f3e5.
2020-11-26 20:12:01 +01:00
Daniel Khashabi
5aa361f3e5
finetune.py: specifying generation min_length ( #8478 )
2020-11-26 12:33:02 +05:30
Stas Bekman
1e45bef0a7
[trainer] make generate work with multigpu ( #8716 )
* make generate work with multigpu
* better fix - thanks @sgugger
2020-11-23 10:57:27 -08:00