Commit Graph

481 Commits

Author SHA1 Message Date
erenup
3c7e676f8b add test related code: test the best dev acc model when model is training 2019-08-28 15:57:29 +08:00
VictorSanh
93e82ab424 Write README for DilBERT 2019-08-28 06:26:09 +00:00
VictorSanh
fea921d382 add licensing 2019-08-28 04:45:39 +00:00
VictorSanh
da1e4e53fc some fixes in train.py for loading previous checkpoint 2019-08-28 04:01:03 +00:00
VictorSanh
0d8f8848d5 add scripts/extract_for_distil.py 2019-08-28 04:00:19 +00:00
VictorSanh
7f2c384c80 add scripts/token_counts.py 2019-08-28 04:00:03 +00:00
VictorSanh
4d16b279e5 add scripts/binarized_data.py 2019-08-28 03:59:48 +00:00
VictorSanh
b247b0d880 add train.py for distillation 2019-08-28 02:12:47 +00:00
VictorSanh
780f183e55 add requirements 2019-08-28 01:39:52 +00:00
VictorSanh
e424d2e45d add README 2019-08-28 01:10:10 +00:00
VictorSanh
1ae81e4aa1 add dataset. distiller, utils 2019-08-28 01:10:05 +00:00
thomwolf
06510ccb53 typo 2019-08-23 22:08:10 +02:00
thomwolf
ab7bd5ef98 fixing tokenization and training 2019-08-23 17:31:21 +02:00
Thomas Wolf
90dcd8c05d Merge branch 'master' into generative-finetuning 2019-08-22 10:43:30 +02:00
VictorSanh
57272d5ddf fix for glue 2019-08-22 00:25:49 -04:00
VictorSanh
b006a7a12f fix for squad 2019-08-22 00:25:42 -04:00
Thomas Wolf
9beaa85b07 Merge pull request #1055 from qipeng/run_squad_fix
Fix #1015 (tokenizer defaults to use_lower_case=True when loading from trained models)
2019-08-21 01:20:46 +02:00
Lysandre
2d042274ac Sequence special token handling for BERT and RoBERTa 2019-08-20 14:15:28 -04:00
Peng Qi
3bffd2e8e5 more fixes 2019-08-20 10:59:28 -07:00
Thomas Wolf
3b56427a1e Merge pull request #1040 from FeiWang96/multi_gpu
Fix bug of multi-gpu training in lm finetuning
2019-08-20 17:13:44 +02:00
thomwolf
a690edab17 various fix and clean up on run_lm_finetuning 2019-08-20 15:52:12 +02:00
erenup
fc74132598 add best steps to train 2019-08-20 19:06:41 +08:00
Duzeyao
d86b49ac86 swap optimizer.step and scheduler.step 2019-08-20 16:46:34 +08:00
Duzeyao
45ab8bf60e Revert "Update finetune_on_pregenerated.py"
This reverts commit a1359b970c.
2019-08-20 16:40:39 +08:00
erenup
97c30b73d5 add test related code 2019-08-20 16:31:04 +08:00
erenup
d5e60e5b7a add test related code 2019-08-20 16:25:50 +08:00
Zeyao Du
a1359b970c Update finetune_on_pregenerated.py 2019-08-20 16:00:07 +08:00
Zeyao Du
28f7ca1f80 swap optimizer.step and scheduler.step 2019-08-20 15:58:42 +08:00
Peng Qi
a368b87791 Fix #1015 2019-08-19 13:07:00 -07:00
Lysandre
f94f1c6016 Distributed training + tokenizer agnostic mask token 2019-08-19 14:58:50 -04:00
Thomas Wolf
5a49b793d9 Merge pull request #1023 from tuvuumass/patch-1
fix issue #824
2019-08-19 15:31:46 +02:00
erenup
4270d3da1b fix a bug of evaluating 2019-08-19 16:38:52 +08:00
Chi-Liang Liu
40acf6b52a don't save model without training 2019-08-18 05:02:25 -04:00
erenup
47e9aea0fe add args info to evaluate_result.txt 2019-08-18 17:00:53 +08:00
erenup
5582bc4b23 add multiple choice to robreta and xlnet, test on swag, roberta=0.82.28, xlnet=0.80 2019-08-18 16:01:48 +08:00
wangfei
856a63da4d Fix: save model/model.module 2019-08-18 11:03:47 +08:00
wangfei
1ef41b8337 Revert "Fix: save model/model.module"
This reverts commit 00e9c4cc96.
2019-08-18 11:03:12 +08:00
wangfei
00e9c4cc96 Fix: save model/model.module 2019-08-18 11:02:02 +08:00
erenup
e384ae2b9d Merge remote-tracking branch 'huggingface/master'
merge huggingface/master to update
2019-08-17 12:05:57 +08:00
Jason Phang
d8923270e6 Correct truncation for RoBERTa in 2-input GLUE 2019-08-16 16:30:38 -04:00
Lysandre
5652f54ac2 Simplified data generator + better perplexity calculator
GPT-2 now obtains ~20 perplexity on WikiText-2
2019-08-16 13:49:56 -04:00
LysandreJik
7e7fc53da5 Fixing run_glue example with RoBERTa 2019-08-16 11:53:10 -04:00
LysandreJik
715534800a BERT + RoBERTa masking tokens handling + GPU device update. 2019-08-16 10:10:21 -04:00
LysandreJik
339e556feb CLM for BERT, beginning of CLM fot RoBERTa; still needs a better masking token mechanism. 2019-08-16 10:10:20 -04:00
LysandreJik
5c18825a18 Removed dataset limit 2019-08-16 10:10:20 -04:00
LysandreJik
3e3e145497 Added GPT to the generative fine-tuning. 2019-08-16 10:10:20 -04:00
LysandreJik
47975ed53e Language Modeling fine-tuning using GPT-2. 2019-08-16 10:10:20 -04:00
wangfei
b8ff56896c Fix bug of multi-gpu training in lm finetuning 2019-08-16 12:11:05 +08:00
Rabeeh KARIMI
3d47a7f8ab loads the tokenizer for each checkpoint, to solve the reproducability issue 2019-08-14 10:58:26 +02:00
LysandreJik
39f426be65 Added special tokens <pad> and <mask> to RoBERTa. 2019-08-13 15:19:50 -04:00
Julien Chaumond
baf08ca1d4 [RoBERTa] run_glue: correct pad_token + reorder labels 2019-08-13 12:51:15 -04:00
tuvuumass
ba4bce2581 fix issue #824 2019-08-13 11:26:27 -04:00
Julien Chaumond
912fdff899 [RoBERTa] Update run_glue for RoBERTa 2019-08-12 13:49:50 -04:00
erenup
b219029c45 refactoring old run_swag. This script is mainly refatored from run_squad in pytorch_transformers 2019-08-11 15:20:37 +08:00
Thomas Wolf
b4f9464f90 Merge pull request #960 from ethanjperez/patch-1
Fixing unused weight_decay argument
2019-08-07 10:09:55 +02:00
Thomas Wolf
d43dc48b34 Merge branch 'master' into auto_models 2019-08-05 19:17:35 +02:00
thomwolf
70c10caa06 add option mentioned in #940 2019-08-05 17:09:37 +02:00
thomwolf
b90e29d52c working on automodels 2019-08-05 16:06:34 +02:00
Ethan Perez
28ba345ecc Fixing unused weight_decay argument
Currently the L2 regularization is hard-coded to "0.01", even though there is a --weight_decay flag implemented (that is unused). I'm making this flag control the weight decay used for fine-tuning in this script.
2019-08-04 12:31:46 -04:00
Thomas Wolf
c054b5ee64 Merge pull request #896 from zijunsun/master
fix multi-gpu training bug when using fp16
2019-07-26 19:31:02 +02:00
zijunsun
f0aeb7a814 multi-gpu training also should be after apex fp16(squad) 2019-07-26 15:23:29 +08:00
zijunsun
adb3ef6368 multi-gpu training also should be after apex fp16 2019-07-25 13:09:10 +08:00
Chi-Liang Liu
a7fce6d917 fix squad v1 error (na_prob_file should be None) 2019-07-24 16:11:36 +08:00
thomwolf
6070b55443 fix #868 2019-07-23 17:46:01 +02:00
thomwolf
2c9a3115b7 fix #858 2019-07-23 16:45:55 +02:00
Thomas Wolf
268c6cc160 Merge pull request #845 from rabeehk/master
fixed version issues in run_openai_gpt
2019-07-23 15:29:31 +02:00
Peiqin Lin
76be189b08 typos 2019-07-21 20:39:42 +08:00
Rabeeh KARIMI
f63ff536ad fixed version issues in run_openai_gpt 2019-07-20 12:43:07 +02:00
Thomas Wolf
a615499076 Merge pull request #797 from yzy5630/fix-examples
fix some errors for distributed lm_finetuning
2019-07-18 23:32:33 +02:00
yzy5630
a1fe4ba9c9 use new API for save and load 2019-07-18 15:45:23 +08:00
yzy5630
a7ba27b1b4 add parser for adam 2019-07-18 08:52:51 +08:00
yzy5630
d6522e2873 change loss and optimizer to new API 2019-07-17 21:22:34 +08:00
thomwolf
71d597dad0 fix #800 2019-07-17 13:51:09 +02:00
yzy5630
123da5a2fa fix errors for lm_finetuning examples 2019-07-17 09:56:07 +08:00
yzy5630
60a1bdcdac fix some errors for distributed lm_finetuning 2019-07-17 09:16:20 +08:00
thomwolf
e848b54730 fix #792 2019-07-16 21:22:19 +02:00
thomwolf
1849aa7d39 update readme and pretrained model weight files 2019-07-16 15:11:29 +02:00
thomwolf
f31154cb9d Merge branch 'xlnet' 2019-07-16 11:51:13 +02:00
thomwolf
76da9765b6 fix run_generation test 2019-07-15 17:52:35 +02:00
thomwolf
e691fc0963 update QA models tests + run_generation 2019-07-15 17:45:24 +02:00
thomwolf
15d8b1266c update tokenizer - update squad example for xlnet 2019-07-15 17:30:42 +02:00
thomwolf
3b469cb422 updating squad for compatibility with XLNet 2019-07-15 15:28:37 +02:00
thomwolf
0e9825e252 small fix to run_glue 2019-07-14 23:43:28 +02:00
thomwolf
2397f958f9 updating examples and doc 2019-07-14 23:20:10 +02:00
thomwolf
c490f5ce87 added generation examples in tests 2019-07-13 15:26:58 +02:00
thomwolf
7d4b200e40 good quality generation example for GPT, GPT-2, Transfo-XL, XLNet 2019-07-13 15:25:03 +02:00
thomwolf
7322c314a6 remove python2 testing for examples 2019-07-12 14:24:08 +02:00
thomwolf
936e813c84 clean up examples - added squad example and test 2019-07-12 14:16:06 +02:00
thomwolf
762ded9b1c wip examples 2019-07-12 11:28:52 +02:00
LysandreJik
3821ecbf4a Byte order mark management in TSV glue reading. 2019-07-11 20:16:28 -04:00
thomwolf
c6bf1a400d fix test examples et model pretrained 2019-07-11 22:29:08 +02:00
thomwolf
92a782b108 fix run_glue test 2019-07-11 22:20:10 +02:00
thomwolf
ccb6947dc1 optimization tests 2019-07-11 17:39:47 +02:00
thomwolf
b21d84b027 update examples 2019-07-11 15:37:34 +02:00
thomwolf
ec07cf5a66 rewamp optimization 2019-07-11 14:48:22 +02:00
thomwolf
4fef5919a5 updating examples 2019-07-11 12:03:08 +02:00
thomwolf
50b7e52a7f WIP examples 2019-07-10 15:33:34 +02:00
thomwolf
ed6c8d37f4 fix merge 2019-07-09 17:14:52 +02:00
thomwolf
4ce237c880 update run_glue 2019-07-09 17:00:32 +02:00
thomwolf
3b7cb7bf44 small update to run_glue 2019-07-09 16:12:15 +02:00
thomwolf
d0efbd3cd1 update sequencesummary module 2019-07-09 15:46:43 +02:00
thomwolf
d5481cbe1b adding tests to examples - updating summary module - coverage update 2019-07-09 15:29:42 +02:00
thomwolf
b19786985d unified tokenizer api and serialization + tests 2019-07-09 10:25:18 +02:00
thomwolf
3d5f291386 updates to run_glue 2019-07-05 17:22:15 +02:00
thomwolf
99b90edab1 cleaning up run_glue example 2019-07-05 17:09:35 +02:00
thomwolf
1113f97f33 clean up glue example 2019-07-05 16:31:13 +02:00
thomwolf
162ba383b0 fix model loading 2019-07-05 15:57:14 +02:00
thomwolf
36bca545ff tokenization abstract class - tests for examples 2019-07-05 15:02:59 +02:00
Thomas Wolf
78462aad61 Merge pull request #733 from ceremonious/parallel-generation
Added option to use multiple workers to create training data
2019-07-05 12:04:30 +02:00
thomwolf
0bab55d5d5 [BIG] name change 2019-07-05 11:55:36 +02:00
thomwolf
c41f2bad69 WIP XLM + refactoring 2019-07-03 22:54:39 +02:00
Lei Mao
64b2a828c0 fix evaluation bug 2019-07-01 14:56:24 -07:00
thomwolf
2b56e98892 standardizing API across models - XLNetForSeqClass working 2019-06-28 16:35:09 +02:00
thomwolf
3a00674cbf fix imports 2019-06-27 17:18:46 +02:00
Mayhul Arora
08ff056c43 Added option to use multiple workers to create training data for lm fine tuning 2019-06-26 16:16:12 -07:00
thomwolf
59cefd4f98 fix #726 - get_lr in examples 2019-06-26 11:28:27 +02:00
thomwolf
092dacfd62 changing is_regression to unified API 2019-06-26 09:54:05 +02:00
thomwolf
e55d4c4ede various updates to conversion, models and examples 2019-06-26 00:57:53 +02:00
thomwolf
7334bf6c21 pad on left for xlnet 2019-06-24 15:05:11 +02:00
thomwolf
c888663f18 overwrite output directories if needed 2019-06-24 14:38:24 +02:00
thomwolf
62d78aa37e updating GLUE utils for compatibility with XLNet 2019-06-24 14:36:11 +02:00
thomwolf
24ed0b9346 updating run_xlnet_classifier 2019-06-24 12:00:09 +02:00
thomwolf
f6081f2255 add xlnetforsequence classif and run_classifier example for xlnet 2019-06-24 10:01:07 +02:00
Rocketknight1
c7b2808ed7 Update LM finetuning README to include a literature reference 2019-06-22 15:04:01 +01:00
thomwolf
181075635d updating model loading and adding special tokens ids 2019-06-21 23:23:37 +02:00
thomwolf
ebd2cb8d74 update from_pretrained to load XLNetModel as well 2019-06-21 21:08:44 +02:00
thomwolf
edfe91c36e first version bertology ok 2019-06-19 23:43:04 +02:00
thomwolf
7766ce66dd update bertology 2019-06-19 22:29:51 +02:00
thomwolf
e4b46d86ce update head pruning 2019-06-19 22:16:30 +02:00
thomwolf
0f40e8d6a6 debugger 2019-06-19 15:38:46 +02:00
thomwolf
0e1e8128bf more logging 2019-06-19 15:35:49 +02:00
thomwolf
909d4f1af2 cuda again 2019-06-19 15:32:10 +02:00
thomwolf
14f0e8e557 fix cuda 2019-06-19 15:29:28 +02:00
thomwolf
34d706a0e1 pruning in bertology 2019-06-19 15:25:49 +02:00
thomwolf
dc8e0019b7 updating examples 2019-06-19 13:23:20 +02:00
thomwolf
68ab9599ce small fix and updates to readme 2019-06-19 09:38:38 +02:00
thomwolf
f7e2ac01ea update barrier 2019-06-18 22:43:35 +02:00
thomwolf
4d8c4337ae test barrier in distrib training 2019-06-18 22:41:28 +02:00
thomwolf
3359955622 updating run_classif 2019-06-18 22:23:10 +02:00
thomwolf
29b7b30eaa updating evaluation on a single gpu 2019-06-18 22:20:21 +02:00
thomwolf
7d2001aa44 overwrite_output_dir 2019-06-18 22:13:30 +02:00
thomwolf
16a1f338c4 fixing 2019-06-18 17:06:31 +02:00
thomwolf
92e0ad5aba no numpy 2019-06-18 17:00:52 +02:00
thomwolf
4e6edc3274 hop 2019-06-18 16:57:15 +02:00
thomwolf
f55b60b9ee fixing again 2019-06-18 16:56:52 +02:00
thomwolf
8bd9118294 quick fix 2019-06-18 16:54:41 +02:00
thomwolf
3e847449ad fix out_label_ids 2019-06-18 16:53:31 +02:00
thomwolf
aad3a54e9c fix paths 2019-06-18 16:48:04 +02:00
thomwolf
40dbda6871 updating classification example 2019-06-18 16:45:52 +02:00
thomwolf
7388c83b60 update run_classifier for distributed eval 2019-06-18 16:32:49 +02:00