Commit Graph

836 Commits

Author SHA1 Message Date
Thomas Wolf
277c77f1c5
Merge pull request #630 from tguens/master
Update run_squad.py
2019-06-14 16:56:26 +02:00
Thomas Wolf
659af2cbd0
Merge pull request #604 from samuelbroscheit/master
Fixing issue "Training beyond specified 't_total' steps with schedule 'warmup_linear'" reported in #556
2019-06-14 16:49:24 +02:00
Thomas Wolf
2d6a53490d
Merge pull request #597 from huggingface/attention
GPT-2 (medium size model, special_tokens, fine-tuning, attention) + repo code coverage metric
2019-06-14 16:47:32 +02:00
Thomas Wolf
35e6baab37
Merge branch 'master' into attention 2019-06-14 16:41:56 +02:00
thomwolf
5e1207b8ad add attention to all bert models and add test 2019-06-14 16:28:25 +02:00
thomwolf
bcc9e93e6f fix test 2019-06-14 15:38:20 +02:00
Thomas Wolf
f9cde97b31
Merge pull request #675 from meetshah1995/patch-1
[hotfix] Fix frozen pooler parameters in SWAG example.
2019-06-12 10:01:21 +02:00
Meet Pragnesh Shah
e02ce4dc79
[hotfix] Fix frozen pooler parameters in SWAG example. 2019-06-11 15:13:53 -07:00
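The pooler freeze referenced here is worth a sketch: BertForMultipleChoice (used by the SWAG example) classifies from the pooled output, so excluding pooler parameters from the optimizer silently stops them from training. Below is a minimal illustration of the pattern with a stand-in model and the grouped-parameter setup the repo's examples typically use; names are illustrative, not the commit's exact code.

```python
import torch
from torch import nn

# Stand-in for BertForMultipleChoice (hypothetical structure): the SWAG head
# classifies from the pooled output, so the pooler must stay trainable.
model = nn.ModuleDict({'encoder': nn.Linear(8, 8), 'pooler': nn.Linear(8, 8)})
param_optimizer = list(model.named_parameters())

# The bug (illustrative): a filter like the one below, copied from examples
# where the pooler is unused, silently froze the pooler here.
# param_optimizer = [np for np in param_optimizer if 'pooler' not in np[0]]

no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer
                if not any(nd in n for nd in no_decay)],
     'weight_decay': 0.01},
    {'params': [p for n, p in param_optimizer
                if any(nd in n for nd in no_decay)],
     'weight_decay': 0.0},
]
# The example uses BertAdam; plain SGD keeps this sketch self-contained.
optimizer = torch.optim.SGD(optimizer_grouped_parameters, lr=5e-5)
```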
Thomas Wolf
784c0ed89a
Merge pull request #668 from jeonsworld/patch-2
apply the Whole Word Masking technique
2019-06-11 11:29:10 +02:00
jeonsworld
a3a604cefb
Update pregenerate_training_data.py
apply the Whole Word Masking technique.
Referenced [create_pretraining_data.py](https://github.com/google-research/bert/blob/master/create_pretraining_data.py).
2019-06-10 12:17:23 +09:00
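Whole Word Masking masks all WordPiece pieces of a word together rather than masking pieces independently. A minimal sketch of the idea, loosely following the referenced create_pretraining_data.py; helper names are hypothetical.

```python
import random

def wwm_candidates(tokens):
    """Group WordPiece tokens into whole words: a '##'-prefixed piece
    belongs to the same word as the piece before it."""
    cand_indexes = []
    for i, token in enumerate(tokens):
        if token in ("[CLS]", "[SEP]"):
            continue
        if cand_indexes and token.startswith("##"):
            cand_indexes[-1].append(i)
        else:
            cand_indexes.append([i])
    return cand_indexes

def whole_word_mask(tokens, num_to_mask):
    """Mask whole words (all pieces of a word at once) up to num_to_mask tokens."""
    groups = wwm_candidates(tokens)
    random.shuffle(groups)
    masked, covered = list(tokens), 0
    for word in groups:
        if covered + len(word) > num_to_mask:
            continue  # skip words that would exceed the masking budget
        for i in word:
            masked[i] = "[MASK]"
        covered += len(word)
    return masked

tokens = ["[CLS]", "un", "##fold", "the", "map", "[SEP]"]
# Masks whole words together: either both pieces of 'unfold',
# or two of the single-piece words.
print(whole_word_mask(tokens, 2))
```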
VictorSanh
ee0308f79d fix typo 2019-06-06 17:30:49 +02:00
VictorSanh
2d07f945ad fix error with torch.no_grad and loss computation 2019-06-06 17:10:24 +02:00
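The torch.no_grad pitfall this commit addresses is a common one: a tensor produced under torch.no_grad() carries no autograd graph, so calling backward() on it raises an error. A hedged sketch of the distinction, not the commit's exact code:

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
loss_fn = nn.CrossEntropyLoss()

# Evaluation: no autograd graph is built, which saves memory, but a loss
# computed here cannot be backpropagated.
with torch.no_grad():
    eval_loss = loss_fn(model(x), y)

# Training: compute the loss outside torch.no_grad() so backward() works.
train_loss = loss_fn(model(x), y)
train_loss.backward()
```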
VictorSanh
6b8d227092 some cleaning 2019-06-06 17:07:03 +02:00
VictorSanh
122d5c52ac distinguish what is not trained 2019-06-06 17:02:51 +02:00
VictorSanh
2647ac3294 forgot bertForPreTraining 2019-06-06 16:57:40 +02:00
VictorSanh
cf44d98392 Add more examples to BERT models for torchhub 2019-06-06 16:36:02 +02:00
thomwolf
a3274ac40b adding attention outputs in bert 2019-06-03 16:11:45 -05:00
VictorSanh
826496580b Revert "add output_attentions for BertModel"
This reverts commit de5e5682a1.
2019-06-03 17:10:25 -04:00
VictorSanh
de5e5682a1 add output_attentions for BertModel 2019-06-03 17:05:24 -04:00
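These output_attentions commits let BERT-family models return per-layer attention probabilities alongside the hidden states. A toy sketch of the pattern (not the repo's exact API):

```python
import torch
from torch import nn

class TinySelfAttention(nn.Module):
    """Illustration of the output_attentions pattern: optionally return
    the attention probabilities together with the layer output."""
    def __init__(self, hidden, heads, output_attentions=False):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.output_attentions = output_attentions

    def forward(self, x):
        out, attn_probs = self.attn(x, x, x, need_weights=True)
        if self.output_attentions:
            return attn_probs, out
        return out

layer = TinySelfAttention(16, 4, output_attentions=True)
attn, hidden = layer(torch.randn(2, 5, 16))
print(attn.shape)  # (batch, query_len, key_len), averaged over heads
```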
Thomas Wolf
2a329c6186
Merge pull request #651 from huggingface/gpt_torchhub
Add GPT* compatibility to torchhub
2019-05-31 14:44:52 +02:00
VictorSanh
45d21502f0 update doc 2019-05-31 01:04:16 -04:00
VictorSanh
98f5c7864f decorrelate dependencies + fix bug 2019-05-31 01:00:29 -04:00
VictorSanh
c8bd026ef6 move dependencies list to hubconf 2019-05-31 00:36:58 -04:00
VictorSanh
19ef2b0a66 Fix typo in hubconf 2019-05-31 00:33:33 -04:00
VictorSanh
d0f591051c gpt_hubconf 2019-05-31 00:28:10 -04:00
VictorSanh
4a210c9fc6 Move bert_hubconf to hubconfs 2019-05-31 00:28:00 -04:00
VictorSanh
0c5a4fe9c9 modify from_pretrained for OpenAIGPT 2019-05-31 00:27:18 -04:00
VictorSanh
372a5c1cee Hubconf doc - Special case loading 2019-05-30 16:06:21 -04:00
Victor SANH
96592b544b
defaults in __init__s for classification BERT models (#650) 2019-05-30 15:53:13 -04:00
VictorSanh
4cda86b08f Update hubconf for torchhub: paths+examples+doc 2019-05-30 18:38:00 +00:00
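Taken together, the hubconf commits above expose the pretrained models through torch.hub. A hedged usage example; the entry-point names ('bertTokenizer', 'bertModel') reflect the hubconf of this era and may differ in later versions:

```python
import torch

# Load the tokenizer and base model via torch.hub (entry-point names assumed
# from the repo's hubconf at the time).
tokenizer = torch.hub.load('huggingface/pytorch-pretrained-BERT',
                           'bertTokenizer', 'bert-base-cased',
                           do_basic_tokenize=False)
model = torch.hub.load('huggingface/pytorch-pretrained-BERT',
                       'bertModel', 'bert-base-cased')
model.eval()  # inference mode: disables dropout
```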
tguens
9e7bc51b95
Update run_squad.py
Indentation change so that the output "nbest_predictions.json" is not empty.
2019-05-22 17:27:59 +08:00
samuelbroscheit
94247ad6cb Make num_train_optimization_steps int 2019-05-13 12:38:22 +02:00
samuel.broscheit
49a77ac16f Clean up a little bit 2019-05-12 00:31:10 +02:00
samuel.broscheit
3bf3f9596f Fixing the issues reported in https://github.com/huggingface/pytorch-pretrained-BERT/issues/556
Reason for the issue was that optimization steps were computed from the number of examples, which differs from the actual length of the dataloader when an example is chunked into multiple instances.

The solution in this pull request is to compute num_optimization_steps directly from len(data_loader).
2019-05-12 00:13:45 +02:00
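A small sketch of that fix (variable names illustrative): the step budget is derived from len(data_loader), i.e. the number of batches actually produced, rather than from the raw example count.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.zeros(1000, 4))  # stand-in training instances
train_dataloader = DataLoader(dataset, batch_size=32)
gradient_accumulation_steps, num_train_epochs = 2, 3

# Fix: derive t_total from the dataloader length (number of batches), so
# examples chunked into multiple instances are counted correctly.
num_train_optimization_steps = (
    len(train_dataloader) // gradient_accumulation_steps * num_train_epochs
)
print(num_train_optimization_steps)  # 1000 examples -> 32 batches // 2 * 3 = 48
```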
Thomas Wolf
3fc63f126d
Merge pull request #598 from burcturkoglu/master
Updating learning rate with special warm up in examples
2019-05-10 13:48:12 +02:00
burcturkoglu
00c7fd2b79 Division of global_step by num_train_optimization_steps in lr_this_step is removed. 2019-05-09 10:57:03 +03:00
burcturkoglu
fa37b4da77 Merge branch 'master' of https://github.com/huggingface/pytorch-pretrained-BERT 2019-05-09 10:55:24 +03:00
burcturkoglu
5289b4b9e0 Division of global_step by num_train_optimization_steps in lr_this_step is removed. 2019-05-09 10:51:38 +03:00
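The two identical messages refer to the examples' manual per-step learning-rate update. After the optimization refactor, the schedule object owns t_total, so callers pass the raw global_step instead of dividing it by num_train_optimization_steps themselves. A stand-in sketch (not the library class):

```python
class WarmupLinearSchedule:
    """Minimal stand-in for the schedule object: it owns t_total, so callers
    pass the raw step and no longer do the division themselves."""
    def __init__(self, warmup, t_total):
        self.warmup, self.t_total = warmup, t_total

    def get_lr(self, step):
        progress = step / self.t_total
        if progress < self.warmup:
            return progress / self.warmup  # linear warmup
        return max((1.0 - progress) / (1.0 - self.warmup), 0.0)  # linear decay

schedule = WarmupLinearSchedule(warmup=0.1, t_total=1000)
learning_rate = 5e-5
for global_step in range(1000):
    # No manual global_step / num_train_optimization_steps division here.
    lr_this_step = learning_rate * schedule.get_lr(global_step)
```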
thomwolf
275179a003 output attentions in GPT-2 2019-05-08 22:24:42 +02:00
thomwolf
366a3b0285 clean up in tokenization 2019-05-08 21:43:51 +02:00
Thomas Wolf
701bd59b8b
Merge pull request #585 from huntzhan/master
Make the epsilon of LayerNorm configurable.
2019-05-08 16:56:38 +02:00
Thomas Wolf
303b5e2b92
Merge pull request #545 from ailzhang/cache_dir
move pytorch_pretrained_bert cache folder under the same path as torch
2019-05-08 16:55:27 +02:00
Thomas Wolf
0198399d84
Merge pull request #570 from MottoX/fix-1
Create optimizer only when args.do_train is True
2019-05-08 16:07:50 +02:00
Thomas Wolf
50fa92c026
Merge pull request #571 from MottoX/patch-1
Fix documentation typo
2019-05-08 16:06:13 +02:00
thomwolf
0efc4ab632 adding dropout to GPT-2 and embedding dropout to GPT 2019-05-08 10:41:35 +02:00
thomwolf
ea9dbea9d5 update GPT2 loss computation for more flexibility 2019-05-07 23:27:18 +02:00
thomwolf
ce86336545 add predict_special_tokens option to GPT also 2019-05-07 16:47:22 +02:00
thomwolf
d1b6979aa5 GPT-2 option to avoid predicting special tokens 2019-05-07 16:25:53 +02:00
huntzhan
101ab4dd8e Make the epsilon of LayerNorm configurable. 2019-05-06 00:26:21 +08:00
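For context on #585: a hard-coded LayerNorm epsilon cannot match checkpoints trained with a different value (the repo's BertLayerNorm defaults to 1e-12 for TensorFlow compatibility). A sketch in the style of the repo's TF-compatible LayerNorm, with eps promoted to a constructor argument:

```python
import torch
from torch import nn

class BertLayerNorm(nn.Module):
    """LayerNorm in the TF style (epsilon inside the square root),
    with a configurable epsilon instead of a hard-coded constant."""
    def __init__(self, hidden_size, eps=1e-12):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size))
        self.variance_epsilon = eps  # now chosen by the caller

    def forward(self, x):
        u = x.mean(-1, keepdim=True)
        s = (x - u).pow(2).mean(-1, keepdim=True)
        x = (x - u) / torch.sqrt(s + self.variance_epsilon)
        return self.weight * x + self.bias

ln = BertLayerNorm(768, eps=1e-5)  # epsilon configurable per model/config
```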
thomwolf
e211785ada extract attention weights from GPT 2019-05-02 18:31:26 +02:00