samuelbroscheit
94247ad6cb
Make num_train_optimization_steps int
2019-05-13 12:38:22 +02:00
samuel.broscheit
49a77ac16f
Clean up a little bit
2019-05-12 00:31:10 +02:00
samuel.broscheit
3bf3f9596f
Fixing the issues reported in https://github.com/huggingface/pytorch-pretrained-BERT/issues/556
Reason for the issue: the number of optimization steps was computed from the number of examples, which differs from the actual length of the dataloader when an example is chunked into multiple instances.
The solution in this pull request is to compute num_optimization_steps directly from len(data_loader).
2019-05-12 00:13:45 +02:00
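A minimal sketch of the fix described in this commit; the function and argument names are illustrative, not the exact code from the example scripts:

```python
import math

def num_train_optimization_steps(train_dataloader, num_train_epochs,
                                 gradient_accumulation_steps):
    # Count optimizer steps from the dataloader itself rather than from the
    # raw example count: chunking one example into several instances changes
    # len(train_dataloader) but not the number of examples.
    steps_per_epoch = math.ceil(len(train_dataloader) / gradient_accumulation_steps)
    return int(steps_per_epoch * num_train_epochs)  # int, per the follow-up commit
```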
Thomas Wolf
3fc63f126d
Merge pull request #598 from burcturkoglu/master
Updating learning rate with special warm up in examples
2019-05-10 13:48:12 +02:00
burcturkoglu
00c7fd2b79
The division of global_step by num_train_optimizer in lr_this_step is removed.
2019-05-09 10:57:03 +03:00
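For context, a self-contained sketch of the linear warmup these examples apply manually; warmup_linear_lr is a stand-in for the library schedule, and the point of the change is that the call site passes the raw global_step instead of pre-dividing it by the total step count:

```python
def warmup_linear_lr(base_lr, global_step, t_total, warmup_proportion):
    # Linear warmup to base_lr over the first warmup_proportion * t_total
    # steps, then linear decay to zero; the schedule owns t_total, so the
    # caller no longer divides global_step by the total step count itself.
    progress = global_step / max(1, t_total)
    if progress < warmup_proportion:
        return base_lr * progress / warmup_proportion
    return base_lr * max(0.0, (1.0 - progress) / (1.0 - warmup_proportion))
```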
burcturkoglu
fa37b4da77
Merge branch 'master' of https://github.com/huggingface/pytorch-pretrained-BERT
2019-05-09 10:55:24 +03:00
burcturkoglu
5289b4b9e0
The division of global_step by num_train_optimizer in lr_this_step is removed.
2019-05-09 10:51:38 +03:00
thomwolf
275179a003
output attentions in GPT-2
2019-05-08 22:24:42 +02:00
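The pattern behind exposing attention weights, as a hedged sketch (not the actual GPT-2 module code): the attention function returns the softmaxed weights alongside the attended values, so each layer can surface its attention maps to the caller.

```python
import torch
import torch.nn.functional as F

def attention_with_weights(q, k, v):
    # Scaled dot-product attention that also returns the attention
    # probabilities, so callers can collect one map per layer.
    w = torch.matmul(q, k.transpose(-1, -2)) / (k.size(-1) ** 0.5)
    w = F.softmax(w, dim=-1)
    return torch.matmul(w, v), w
```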
thomwolf
366a3b0285
clean up in tokenization
2019-05-08 21:43:51 +02:00
Thomas Wolf
701bd59b8b
Merge pull request #585 from huntzhan/master
Make the epsilon of LayerNorm configurable.
2019-05-08 16:56:38 +02:00
Thomas Wolf
303b5e2b92
Merge pull request #545 from ailzhang/cache_dir
move pytorch_pretrained_bert cache folder under same path as torch
2019-05-08 16:55:27 +02:00
Thomas Wolf
0198399d84
Merge pull request #570 from MottoX/fix-1
Create optimizer only when args.do_train is True
2019-05-08 16:07:50 +02:00
Thomas Wolf
50fa92c026
Merge pull request #571 from MottoX/patch-1
Fix documentation typo
2019-05-08 16:06:13 +02:00
thomwolf
0efc4ab632
adding dropout to GPT-2 and embedding dropout to GPT
2019-05-08 10:41:35 +02:00
thomwolf
ea9dbea9d5
update GPT2 loss computation for more flexibility
2019-05-07 23:27:18 +02:00
thomwolf
ce86336545
add predict_special_tokens option to GPT also
2019-05-07 16:47:22 +02:00
thomwolf
d1b6979aa5
GPT-2 option to avoid predicting special tokens
2019-05-07 16:25:53 +02:00
huntzhan
101ab4dd8e
Make the epsilon of LayerNorm configurable.
2019-05-06 00:26:21 +08:00
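A sketch close to the repository's own LayerNorm, showing what "configurable epsilon" means here: eps comes from the constructor (and hence the config) instead of being hard-coded, so checkpoints trained with a different eps (e.g. 1e-12 vs 1e-5) can be reproduced exactly.

```python
import torch
from torch import nn

class LayerNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-12):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size))
        self.variance_epsilon = eps  # configurable, no longer a literal

    def forward(self, x):
        u = x.mean(-1, keepdim=True)
        s = (x - u).pow(2).mean(-1, keepdim=True)
        x = (x - u) / torch.sqrt(s + self.variance_epsilon)
        return self.weight * x + self.bias
```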
Chris
41089bc7d3
added file to convert pytorch->tf
2019-05-02 13:26:22 -04:00
Chris
0a8b4d65be
added file to convert pytorch->tf
2019-05-02 13:20:59 -04:00
Chris
968c1b44cb
added file to convert pytorch->tf
2019-05-02 13:19:56 -04:00
Chris
96c2b77f0f
added file to convert pytorch->tf
2019-05-02 13:14:25 -04:00
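A rough sketch of what such a conversion file does, with a hypothetical name-mapping rule; the real script defines its own mapping and handles more cases.

```python
import numpy as np

def to_tf_var_name(name):
    # Hypothetical mapping: PyTorch dotted keys become TF variable scopes.
    return 'bert/' + name.replace('.', '/')

def convert_state_dict(state_dict):
    tf_weights = {}
    for name, tensor in state_dict.items():
        array = tensor.cpu().numpy()
        # nn.Linear stores (out, in) while TF dense kernels are (in, out);
        # embedding matrices are also 2-D and would need to be excluded
        # from this transpose (omitted here for brevity).
        if name.endswith('.weight') and array.ndim == 2:
            array = np.ascontiguousarray(array.T)
        tf_weights[to_tf_var_name(name)] = array
    return tf_weights
```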
thomwolf
e211785ada
extract attention weights from GPT
2019-05-02 18:31:26 +02:00
MottoX
18c8aef9d3
Fix documentation typo
2019-05-02 19:23:36 +08:00
MottoX
74dbba64bc
Prepare optimizer only when args.do_train is True
2019-05-02 19:09:29 +08:00
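The shape of the guard, sketched with the grouped-parameter setup the BERT examples use; args, model, and num_train_optimization_steps are assumed to come from the surrounding script.

```python
from pytorch_pretrained_bert.optimization import BertAdam

optimizer = None
if args.do_train:
    # Build the optimizer (and its weight-decay groups) only for training;
    # evaluation-only runs skip this entirely.
    no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
    grouped = [
        {'params': [p for n, p in model.named_parameters()
                    if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
        {'params': [p for n, p in model.named_parameters()
                    if any(nd in n for nd in no_decay)], 'weight_decay': 0.0},
    ]
    optimizer = BertAdam(grouped, lr=args.learning_rate,
                         warmup=args.warmup_proportion,
                         t_total=num_train_optimization_steps)
```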
thomwolf
db98a4a48b
gpt-2 tokenizer
2019-05-01 11:40:48 +02:00
Thomas Wolf
3ae8c8be1e
Merge pull request #562 from apappu97/roc_stories_lmlabels_fix
Small fix to remove shifting of lm labels during preprocessing of RocStories.
2019-05-01 11:20:17 +02:00
Thomas Wolf
e89520175d
Merge pull request #564 from 8enmann/patch-2
Fix #537
2019-05-01 11:18:46 +02:00
Ben Mann
74f7906db4
Fix #537
2019-04-30 19:48:22 -07:00
Aneesh Pappu
365fb34c6c
small fix to remove shifting of lm labels during preprocessing of RocStories, as this shifting happens internally in the model
2019-04-30 13:53:04 -07:00
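Why the preprocessing shift was redundant, sketched: the LM head aligns logits and labels itself, scoring position i against token i+1, so labels must arrive unshifted.

```python
import torch
from torch import nn

def lm_loss(lm_logits, lm_labels, ignore_index=-1):
    # The shift happens here, inside the model: drop the last logit and the
    # first label so each position predicts the next token.
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = lm_labels[..., 1:].contiguous()
    loss_fct = nn.CrossEntropyLoss(ignore_index=ignore_index)
    return loss_fct(shift_logits.view(-1, shift_logits.size(-1)),
                    shift_labels.view(-1))
```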
thomwolf
cd110835a0
coverage in CircleCI
2019-04-30 11:35:40 +02:00
Thomas Wolf
2dee86319d
Merge pull request #527 from Mathieu-Prouveur/fix_value_training_loss
Update example files so that tr_loss is not affected by args.gradient…
2019-04-30 11:12:55 +02:00
thomwolf
80f53f7380
gpt-2 from_pretrained can use special tokens
2019-04-30 11:10:22 +02:00
thomwolf
e79ceb1533
gpt-2 special tokens
2019-04-30 11:05:54 +02:00
thomwolf
1f5fc95b68
add code coverage
2019-04-30 11:05:26 +02:00
thomwolf
c30139a013
add special tokens to gpt-2
2019-04-30 10:45:26 +02:00
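What adding special tokens entails mechanically, as a sketch: the token embedding gets extra rows, with the pretrained rows copied over; these commits wire that through GPT-2 and its from_pretrained.

```python
from torch import nn

def add_special_token_rows(old_embeddings, num_special_tokens):
    # Grow the embedding matrix by num_special_tokens rows; pretrained rows
    # are copied, so only the new special-token rows start untrained.
    old_num, dim = old_embeddings.weight.shape
    new_embeddings = nn.Embedding(old_num + num_special_tokens, dim)
    new_embeddings.weight.data[:old_num] = old_embeddings.weight.data
    return new_embeddings
```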
Mathieu Prouveur
87b9ec3843
Fix tr_loss rescaling factor using global_step
2019-04-29 12:58:29 +02:00
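A sketch of the accounting, with illustrative names: each micro-batch loss is scaled down by gradient_accumulation_steps before backward(), so the running sum is per optimizer step and the mean must divide by global_step, not by the micro-batch count.

```python
tr_loss, global_step = 0.0, 0
for step, batch in enumerate(train_dataloader):
    loss = model(*batch)  # assumed to return the training loss
    if args.gradient_accumulation_steps > 1:
        loss = loss / args.gradient_accumulation_steps
    loss.backward()
    tr_loss += loss.item()
    if (step + 1) % args.gradient_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        global_step += 1

mean_loss = tr_loss / max(1, global_step)  # unaffected by the accumulation factor
```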
Ailing Zhang
3963d57c89
move pytorch_pretrained_bert cache folder under same path as torch
2019-04-27 11:09:11 -07:00
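The resolution order this commit introduces, sketched close to the repository's file_utils of that era: reuse torch's cache root, keeping the env-var override.

```python
import os

# Cache lands under the same root torch uses (~/.cache/torch by default),
# still overridable via PYTORCH_PRETRAINED_BERT_CACHE.
torch_cache_home = os.path.expanduser(
    os.getenv('TORCH_HOME',
              os.path.join(os.getenv('XDG_CACHE_HOME', '~/.cache'), 'torch')))
default_cache_path = os.path.join(torch_cache_home, 'pytorch_pretrained_bert')
PYTORCH_PRETRAINED_BERT_CACHE = os.getenv('PYTORCH_PRETRAINED_BERT_CACHE',
                                          default_cache_path)
```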
thomwolf
b832d5bb8a
Release: 0.6.2
2019-04-25 21:37:47 +02:00
Thomas Wolf
e6cf62d499
Merge pull request #488 from dhpollack/fix_multichoice
fixed BertForMultipleChoice model init and forward pass
2019-04-25 21:04:16 +02:00
Thomas Wolf
1cc1c3c344
Merge pull request #533 from lukovnikov/master
Docs for new learning rate code
2019-04-25 21:02:35 +02:00
Thomas Wolf
dee8af4e46
Merge pull request #518 from huggingface/schedules_in_examples
Fix training schedules in examples to match new API
2019-04-25 21:01:04 +02:00
lukovnikov
56a47ce2b7
- replaced OpenAIGPTAdam with OpenAIAdam in docs
2019-04-25 16:05:28 +02:00
lukovnikov
331a46ff04
- replaced OpenAIGPTAdam with OpenAIAdam in docs
2019-04-25 16:04:37 +02:00
lukovnikov
704037ad51
- updated docs for new LR API
- added some images for illustration
- updated comments in optimization
2019-04-25 15:59:39 +02:00
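The gist of the new LR API those docs cover, sketched under the assumption that the 0.6.2 optimization module exports the schedule classes by these names; parameter values are illustrative.

```python
from torch import nn
from pytorch_pretrained_bert.optimization import BertAdam, WarmupLinearSchedule

model = nn.Linear(10, 2)  # placeholder model

# Old-style: the schedule is named by string and parametrized on the optimizer.
optimizer = BertAdam(model.parameters(), lr=5e-5,
                     warmup=0.1, t_total=1000, schedule='warmup_linear')

# New-style: a schedule object carries warmup/t_total itself.
schedule = WarmupLinearSchedule(warmup=0.1, t_total=1000)
optimizer = BertAdam(model.parameters(), lr=5e-5, schedule=schedule)
```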
Thomas Wolf
d76a57b0ba
Merge pull request #506 from ailzhang/hubconf
Hubconf
2019-04-24 20:59:21 +02:00
thomwolf
80f995a141
revert BertForMultipleChoice linear classifier
2019-04-24 16:51:54 +02:00
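The multiple-choice head pattern that the earlier init/forward fix and this revert concern, as a generic sketch: choices are flattened into the batch dimension, encoded, scored by a single Linear(hidden, 1), then reshaped back to per-choice scores.

```python
from torch import nn

class MultipleChoiceHead(nn.Module):
    def __init__(self, encoder, hidden_size):
        super().__init__()
        self.encoder = encoder          # assumed to return a pooled (N, hidden) vector
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, input_ids):
        batch, num_choices, seq_len = input_ids.shape
        flat = input_ids.view(-1, seq_len)        # (batch*num_choices, seq)
        pooled = self.encoder(flat)               # (batch*num_choices, hidden)
        logits = self.classifier(pooled)          # (batch*num_choices, 1)
        return logits.view(batch, num_choices)    # one score per choice
```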
Mathieu Prouveur
ed8fad7390
Update example files so that tr_loss is not affected by args.gradient_accumulation_steps
2019-04-24 14:07:00 +02:00
thomwolf
d94c6b0144
fix training schedules in examples to match new API
2019-04-23 11:17:06 +02:00
Thomas Wolf
c36cca075a
Merge pull request #515 from Rocketknight1/master
Fix --reduce_memory in finetune_on_pregenerated
2019-04-23 10:30:23 +02:00
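The idea behind --reduce_memory, sketched with numpy; shapes and filenames here are illustrative: pregenerated instances live in memory-mapped arrays instead of Python lists, so epochs larger than RAM remain indexable like ordinary arrays.

```python
import numpy as np

num_samples, seq_len = 100000, 128  # illustrative sizes
input_ids = np.memmap('input_ids.memmap', dtype=np.int32, mode='w+',
                      shape=(num_samples, seq_len))
input_ids[0] = np.zeros(seq_len, dtype=np.int32)  # written through to disk
```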