Commit Graph

284 Commits

Author SHA1 Message Date
Thomas Wolf
303b5e2b92
Merge pull request #545 from ailzhang/cache_dir
move pytorch_pretrained_bert cache folder under same path as torch
2019-05-08 16:55:27 +02:00
thomwolf
0efc4ab632 adding dropout to GPT-2 and embedding dropout to GPT 2019-05-08 10:41:35 +02:00
thomwolf
ea9dbea9d5 update GPT2 loss computation for more flexibility 2019-05-07 23:27:18 +02:00
thomwolf
ce86336545 add predict_special_tokens option to GPT also 2019-05-07 16:47:22 +02:00
thomwolf
d1b6979aa5 GPT-2 option to avoid predicting special tokens 2019-05-07 16:25:53 +02:00
huntzhan
101ab4dd8e Make the epsilon of LayerNorm configurable. 2019-05-06 00:26:21 +08:00
thomwolf
e211785ada extract attention weights from GPT 2019-05-02 18:31:26 +02:00
thomwolf
db98a4a48b gpt-2 tokenizer 2019-05-01 11:40:48 +02:00
Ben Mann
74f7906db4
Fix #537 2019-04-30 19:48:22 -07:00
thomwolf
80f53f7380 gpt-2 from_pretrained can use special tokens 2019-04-30 11:10:22 +02:00
thomwolf
e79ceb1533 gpt-2 special tokens 2019-04-30 11:05:54 +02:00
thomwolf
c30139a013 add special tokens to gpt-2 2019-04-30 10:45:26 +02:00
Ailing Zhang
3963d57c89 move pytorch_pretrained_bert cache folder under same path as torch 2019-04-27 11:09:11 -07:00
thomwolf
b832d5bb8a Release: 0.6.2 2019-04-25 21:37:47 +02:00
Thomas Wolf
e6cf62d499
Merge pull request #488 from dhpollack/fix_multichoice
fixed BertForMultipleChoice model init and forward pass
2019-04-25 21:04:16 +02:00
lukovnikov
704037ad51 - updated docs for new LR API
- added some images for illustration
- updated comments in optimization
2019-04-25 15:59:39 +02:00
Thomas Wolf
d76a57b0ba
Merge pull request #506 from ailzhang/hubconf
Hubconf
2019-04-24 20:59:21 +02:00
thomwolf
80f995a141 revert BertForMultipleChoice linear classifier 2019-04-24 16:51:54 +02:00
lukovnikov
69850b4011 python 2 compat 2019-04-21 14:02:38 +02:00
lukovnikov
bb7557d3ab - removed __all__ in optimization
- removed unused plotting code
- using ABC for LRSchedule
- added some schedule object init tests
2019-04-21 13:48:33 +02:00
lukovnikov
34ccc8ebf4 Merge remote-tracking branch 'upstream/master' 2019-04-21 13:16:15 +02:00
Ailing Zhang
bfd6f6b257 fix from_pretrained positional args 2019-04-17 16:31:40 -07:00
thomwolf
23d4554ec0 is python 2 happy now 2019-04-17 14:48:34 +02:00
thomwolf
265550ec34 relax network connection requirements 2019-04-17 14:22:35 +02:00
thomwolf
fa76520240 fix file_utils on python 2 2019-04-17 13:32:22 +02:00
thomwolf
bcde2c61cb fix #497 2019-04-17 12:35:38 +02:00
Thomas Wolf
2e153930cf
Merge pull request #495 from SudoSharma/patch-2
Fix gradient overflow issue during attention masking
2019-04-17 11:10:36 +02:00
thomwolf
5afa497cbf fix GPT-2 tokenization to also work on python 3... 2019-04-17 11:04:41 +02:00
thomwolf
bc70779bf0 fixed GPT-2 tokenization on python 2 2019-04-17 10:56:15 +02:00
Abhi Sharma
9e666aaa29
Fix gradient overflow issue during attention masking
This fix is in reference to issue #382. GPT2 can now be trained in mixed precision, which I've confirmed with testing. I also tested unconditional generation on multiple seeds before and after changing 1e10 to 1e4 and there was no difference. Please let me know if there is anything else I can do to make this pull request better. Thanks for all your work!
2019-04-16 11:42:34 -07:00
thomwolf
bdaba1897c updating GPT tokenization 2019-04-16 17:44:06 +02:00
thomwolf
18a8a15f78 improving GPT2 tokenization and adding tests 2019-04-16 17:00:55 +02:00
Thomas Wolf
3d78e226e6
Merge pull request #489 from huggingface/tokenization_serialization
Better serialization for Tokenizers and Configuration classes - Also fix #466
2019-04-16 08:49:54 +02:00
Thomas Wolf
64b6ef4db0
Merge pull request #490 from huggingface/better_finetuning_GPT_GPT-2
Clean up GPT and GPT-2 losses computation
2019-04-15 16:14:50 +02:00
thomwolf
d616022455 fix openai special tokens loading 2019-04-15 16:07:45 +02:00
thomwolf
df5d9c3551 load all models on cpu 2019-04-15 15:43:01 +02:00
thomwolf
60ea6c59d2 added best practices for serialization in README and examples 2019-04-15 15:00:33 +02:00
thomwolf
b3c6ee0ac1 tokenization updates 2019-04-15 14:24:52 +02:00
thomwolf
9761aa4845 add to_json_file method to configuration classes 2019-04-15 14:12:08 +02:00
thomwolf
e8568a3b17 fixing tests 2019-04-15 12:55:38 +02:00
thomwolf
870b734bfd added tokenizers serialization tests 2019-04-15 12:03:56 +02:00
thomwolf
3e65f255dc add serialization semantics to tokenizers - fix transfo-xl tokenizer 2019-04-15 11:47:25 +02:00
David Pollack
38ba7b439b fixed BertForMultipleChoice model init and forward pass 2019-04-15 10:38:01 +02:00
thomwolf
fe2756ff41 update double head model 2019-04-15 10:04:05 +02:00
Martin Boyanov
34cf67fd6c Extend the BertForSequenceClassification docs to mention the special CLS token. 2019-04-12 21:30:28 +03:00
thomwolf
b509bf7655 updating loss computation 2019-04-12 12:12:33 +02:00
thomwolf
1d203a34c0 back to simple indexing 2019-04-11 23:51:03 +02:00
thomwolf
074c869bbe fix OpenAIGPTMultipleChoiceHead 2019-04-11 20:53:50 +02:00
thomwolf
a05fad8dce fix typo 2019-04-11 13:16:17 +02:00
thomwolf
4a82f4f856 update special token addition 2019-04-11 13:11:22 +02:00
thomwolf
991b8e65f4 Merge branch 'master' of https://github.com/huggingface/pytorch-pretrained-BERT 2019-04-11 11:43:15 +02:00
thomwolf
e99b2014cc fixes #471 2019-04-11 11:43:13 +02:00
lukovnikov
fc7693adc3 schedule fix 2019-04-03 18:16:47 +02:00
lukovnikov
20686b78fc schedule fix 2019-04-03 18:13:52 +02:00
lukovnikov
5fed5bb3d6 schedule fix 2019-04-03 17:20:29 +02:00
lukovnikov
91a073f804 schedule fix 2019-04-03 17:10:08 +02:00
lukovnikov
1758c8fc72 - updated docs for optimization 2019-04-03 16:08:34 +02:00
lukovnikov
725a56329d Merge remote-tracking branch 'upstream/master' into optim
# Conflicts:
#	pytorch_pretrained_bert/optimization.py

- updated docs for optimization
2019-04-03 16:07:50 +02:00
Thomas Wolf
94980b529f
Merge pull request #404 from CatalinVoss/fix_lm_loss
Fix Language Modeling Loss
2019-04-03 11:35:30 +02:00
Thomas Wolf
db4dccd1b5
Merge pull request #389 from lukovnikov/master
Fix cosine schedule
2019-04-03 11:21:43 +02:00
thomwolf
19666dcb3b Should fix #438 2019-04-03 11:01:01 +02:00
thomwolf
1d8c232324 Fix #436 2019-04-03 10:51:03 +02:00
Mike Arpaia
8b5c63e4de Fixes to the TensorFlow conversion tool 2019-04-01 13:17:54 -06:00
Catalin Voss
01520d5412 Remove my unhelpful comments :) 2019-03-27 10:45:28 -07:00
Ikuya Yamada
0401317b23 Remove padding_idx from position_embeddings and token_type_embeddings 2019-03-26 21:56:35 +09:00
Catalin Voss
fda2f62395 Fix test failures due to old torch issue with non-contiguous view 2019-03-24 14:37:13 -07:00
Catalin Voss
0dd796e359 Also fix loss function issue with the double head models 2019-03-24 14:35:55 -07:00
Catalin Voss
472857c47f Fix typo syntax err (sorry, c/p from my repo) 2019-03-24 14:14:49 -07:00
Catalin Voss
2e6f5ffb96 Fix GPT language model loss here as well 2019-03-24 14:14:44 -07:00
Catalin Voss
5938f31fa7 Fix c/p typo from my experiment code 2019-03-24 14:14:40 -07:00
Catalin Voss
7797d21b8d Fix GPT2 language modeling loss computation 2019-03-24 14:14:35 -07:00
lukovnikov
262a9992d7 class weights 2019-03-18 18:29:12 +01:00
lukovnikov
19cc2c084e same 2019-03-18 15:13:35 +01:00
lukovnikov
2283dcca5e import revert 2019-03-18 13:40:12 +01:00
lukovnikov
b6c1cae67b branches, optim cosine fix 2019-03-18 13:32:04 +01:00
lukovnikov
ef28b2c747 branches, optim cosine fix 2019-03-18 13:18:07 +01:00
lukovnikov
90430ae7ec Merge remote-tracking branch 'origin/master'
# Conflicts:
#	pytorch_pretrained_bert/optimization.py
2019-03-18 13:15:29 +01:00
lukovnikov
bed6408dcc branches, optim cosine fix 2019-03-18 13:09:55 +01:00
thomwolf
e5f2d9122c adding absolute imports to gpt2, openai and transfo-xl 2019-03-14 09:55:01 +01:00
lukovnikov
20e652209c relation classification: replacing entity mention with mask token 2019-03-13 16:13:37 +01:00
lukovnikov
eac039d21f changing docker 2019-03-12 13:45:12 +01:00
lukovnikov
471daf1b6c changing docker 2019-03-12 13:32:42 +01:00
lukovnikov
9024613337 changing docker 2019-03-12 13:23:58 +01:00
lukovnikov
baf66d1419 restart cosine lr schedule 2019-03-12 13:22:23 +01:00
Thomas Wolf
9b03d67b83
Merge pull request #362 from Bharat123rox/patch-1
Make the hyperlink of NVIDIA Apex clickable
2019-03-11 09:08:51 +01:00
Thomas Wolf
13aa13dbc0
Merge pull request #358 from cdjhz/patch-1
add 'padding_idx=0' for BertEmbeddings
2019-03-11 09:06:55 +01:00
Bharat Raghunathan
f91ce0b803
Make the hyperlink of NVIDIA Apex clickable 2019-03-09 20:05:39 +05:30
lukovnikov
51efde54a9 cos fix 2019-03-09 02:45:25 +01:00
lukovnikov
f113a2dfdc readme de 2019-03-09 02:29:57 +01:00
lukovnikov
90a41dbe14 BertAdam schedule objects 2019-03-09 02:23:20 +01:00
lukovnikov
88874f6cf0 BertAdam schedule objects 2019-03-08 19:08:30 +01:00
Haozhe Ji
72fa8d03a7
add 'padding_idx=0' for BertEmbeddings 2019-03-07 20:02:55 +08:00
Philipp Glock
6190e8ce4c Fix: use dropout layer 2019-03-07 10:12:45 +01:00
thomwolf
5c85fc3977 fix typo - logger info 2019-03-06 10:05:21 +01:00
Thomas Wolf
21c88a07b7
Merge pull request #341 from potatochip/patch-1
catch exception if pathlib not installed
2019-03-06 09:48:01 +01:00
Thomas Wolf
477ec4b6cc
Merge pull request #337 from CatalinVoss/patch-2
Allow tokenization of sequences > 512 for caching
2019-03-06 09:45:49 +01:00
Thomas Wolf
7b9e5a54b5
Merge pull request #327 from lukovnikov/master
Issue #324: warmup linear fixes
2019-03-06 09:44:56 +01:00
Catalin Voss
4a49c22584 Warn instead of raising in BERT and GPT-2 tokenizers as well, to allow for pre-caching of tokens 2019-03-05 12:31:45 -08:00
Aaron Mangum
0c970caa4a
catch exception if pathlib not installed 2019-03-04 14:30:19 -08:00
Catalin Voss
9775b2eb27
Allow tokenization of sequences > 512 for caching
For many applications requiring randomized data access, it's easier to cache the tokenized representations than the words. So why not turn this into a warning?
2019-03-02 16:30:21 -08:00