Commit Graph

15053 Commits

Author SHA1 Message Date
LysandreJik
ab984a8b72 Python 2 compatibility 2019-09-19 15:01:33 +02:00
LysandreJik
3df208c93a Tokenizer accepts token list as well as string 2019-09-19 14:47:52 +02:00
LysandreJik
66ea76b8a9 prepare_for_model and prepare_pair_for_model methods. Added an option to select which sequence will be truncated. 2019-09-19 13:50:51 +02:00
LysandreJik
60414f31a9 GLUE updated with new methods 2019-09-19 10:55:06 +02:00
LysandreJik
baa74326ab Stride + tests + small fixes 2019-09-19 10:55:06 +02:00
LysandreJik
c10c7d59e7 Mask computing in standalone method. Tests. 2019-09-19 10:55:06 +02:00
LysandreJik
bf503158c5 Sentence -> Sequence. Removed output_mask from the special token addition methods. 2019-09-19 10:55:06 +02:00
LysandreJik
8cba057260 Doc + remove artefacts 2019-09-19 10:55:06 +02:00
LysandreJik
6393261e41 encode + encode_plus tests modified 2019-09-19 10:55:06 +02:00
LysandreJik
dcc9bb3252 Modified encode to return only lists. Added a more complete encode_plus method 2019-09-19 10:55:06 +02:00
LysandreJik
af23b626c8 Max encoding length + corresponding tests 2019-09-19 10:55:06 +02:00
LysandreJik
c4d4f3ec8c Updated DistilBERT test to reflect the sequence encoding 2019-09-19 10:55:06 +02:00
LysandreJik
d572d7027b Number of added tokens calculator 2019-09-19 10:55:06 +02:00
LysandreJik
de8e14b6c0 Added DistilBERT to run_squad script 2019-09-19 10:55:06 +02:00
LysandreJik
88368c2a16 Added DistilBERT to run_lm_finetuning 2019-09-19 10:55:06 +02:00
LysandreJik
2d8ec5a684 Changed warning to be more explicit
Co-authored by: julien_c <chaumond@gmail.com>
2019-09-19 10:55:06 +02:00
LysandreJik
75635072e1 Updated GLUE script to add DistilBERT. Cleaned up unused args in the utils file. 2019-09-19 10:55:06 +02:00
LysandreJik
92a9976e91 Distilbert sequence builder w/ mask 2019-09-19 10:55:06 +02:00
LysandreJik
59057abe52 typo 2019-09-19 10:55:06 +02:00
LysandreJik
bac332fec0 Updated the GLUE data processor. Corrections to RoBERTa and XLNet. 2019-09-19 10:55:06 +02:00
LysandreJik
c3df2136e1 Added binary masking tests 2019-09-19 10:55:06 +02:00
LysandreJik
e391d4735e Tokenizers' encode function can output binary masks 2019-09-19 10:55:06 +02:00
sshleifer
119610b5c5 Merge branch 'master' into delete-n-special-doc 2019-09-19 01:35:01 -07:00
sshleifer
08e4ad5eea Remove documentation for unused kwarg 2019-09-18 16:35:01 -07:00
Erik Chan
f0340eccf9
Typo
2019-09-18 13:42:11 -07:00
Thomas Wolf
0d1dad6d53
Merge pull request #1004 from erenup/master
Refactoring old run_swag.py
2019-09-18 21:42:51 +02:00
erenup
8960988f35 fixed to find best dev acc 2019-09-19 01:10:05 +08:00
erenup
b57bfb5fa0
Merge pull request #3 from erenup/run_multiple_choice_merge
Run multiple choice merge
2019-09-18 21:45:04 +08:00
erenup
46ffc28329 Merge branch 'master' into run_multiple_choice_merge
2019-09-18 21:43:46 +08:00
Simon Layton
ec94f4e0f8
Fix fp16 masking in PoolerEndLogits
Necessary to run xlnet (at least in squad) with `--fp16 --fp16_opt_level="O2"`, otherwise loss is immediately `NaN` and fine-tuning cannot proceed.
2019-09-18 09:30:58 -04:00
erenup
15143fbad6 move run_multiple_choice.py and utils_multiple_choice.py to examples 2019-09-18 21:18:46 +08:00
erenup
3cd6289758 Merge remote-tracking branch 'huggingface/master' into run_multiple_choice_merge
# Conflicts:
#	examples/contrib/run_swag.py
2019-09-18 21:16:59 +08:00
erenup
36362cf086 move schedule.step after optimizer.step 2019-09-18 21:13:40 +08:00
thomwolf
3a527fa820 OpenAI GPT tests ok 2019-09-18 14:15:48 +02:00
thomwolf
556442afb3 hot fix 2019-09-18 14:12:41 +02:00
thomwolf
160b5d6080 fix xlm lang_embeddings loading 2019-09-18 14:10:20 +02:00
thomwolf
26497d1199 fix tests 2019-09-18 12:17:21 +02:00
thomwolf
6a083fd447 update pt-tf conversion script 2019-09-18 12:11:32 +02:00
thomwolf
f6969cc12b upgrade max model difference to 2e-2 (for transfo-xl adaptive softmax + inputs) 2019-09-18 11:12:02 +02:00
thomwolf
e768f2322a update run_openai_gpt to fix #1264 2019-09-18 10:07:47 +02:00
thomwolf
8334993915 clean up examples - updated to new keyword inputs - #1246 2019-09-18 10:01:27 +02:00
Julien Chaumond
62760baf46 tiny fixes 2019-09-17 18:29:15 -04:00
thomwolf
45de034bf8 fix #1223 2019-09-17 10:25:06 +02:00
erenup
5a81e79e25
Merge pull request #2 from erenup/run_multiple_choice_add_doc
Run multiple choice add doc
2019-09-16 22:39:54 +08:00
erenup
5882c442e5 add example usage 2019-09-16 22:38:08 +08:00
erenup
a9debaca3d fixed init_weight 2019-09-16 19:55:24 +08:00
thomwolf
c88f05163d fix typo in XLM models 2019-09-16 13:42:20 +02:00
erenup
982f181aa7 Merge remote-tracking branch 'origin/master' into run_multiple_choice_add_doc 2019-09-16 19:12:00 +08:00
erenup
84b9d1c423 Merge remote-tracking branch 'huggingface/master'
# Conflicts:
#	pytorch_transformers/__init__.py
2019-09-16 19:06:12 +08:00
erenup
603b470a3d add warnning info 2019-09-16 18:53:37 +08:00