Commit Graph

49 Commits

Author SHA1 Message Date
altsoph
079bfb32fb Evaluation fixed. 2019-10-28 10:18:58 -04:00
altsoph
438f2730a0 Evaluation code fixed. 2019-10-28 10:18:58 -04:00
Luran He
f382a8decd convert int to str before adding to a str 2019-10-10 19:20:39 -04:00
Thomas Wolf
6596e3d566
Merge pull request #1454 from bkkaggle/pytorch-built-in-tensorboard
Change tensorboard imports to use built-in tensorboard if available
2019-10-10 11:56:55 +02:00
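The tensorboard change merged above boils down to a guarded import. A minimal sketch of the pattern, assuming the PR swaps tensorboardX for PyTorch's bundled writer (inferred from the PR title, not the actual diff):

```python
# Prefer the TensorBoard writer bundled with PyTorch (>= 1.1);
# fall back to the external tensorboardX package otherwise.
try:
    from torch.utils.tensorboard import SummaryWriter
except ImportError:
    from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir="runs/lm_finetuning")  # log_dir is illustrative
writer.add_scalar("loss", 0.42, global_step=1)
writer.close()
```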
Lysandre Debut
e84470ef81
Merge pull request #1384 from huggingface/encoding-qol
Quality of life enhancements in encoding + patch MLM masking
2019-10-09 11:18:24 -04:00
jinoobaek-qz
69629c4f0f Improve naming and only do regex when necessary 2019-10-09 08:48:40 -04:00
jinoobaek-qz
bf34a252b8 Golden path 2019-10-09 08:48:40 -04:00
jinoobaek-qz
528d3f327b Improve readability and make fewer assumptions about checkpoint format 2019-10-09 08:48:40 -04:00
jinoobaek-qz
56301bd9e8 Extract method 2019-10-09 08:48:40 -04:00
jinoobaek-qz
d6c5469712 Delete older checkpoint after saving new checkpoint 2019-10-09 08:48:40 -04:00
jinoobaek-qz
54a31f50fb Add save_total_limit 2019-10-09 08:48:40 -04:00
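The six jinoobaek-qz commits above add checkpoint rotation: after each save, checkpoints beyond `save_total_limit` are deleted, oldest first. A hypothetical sketch of the idea (the helper name, glob pattern, and directory layout are assumptions, not the actual diff):

```python
import glob
import os
import re
import shutil

def rotate_checkpoints(output_dir: str, save_total_limit: int, prefix: str = "checkpoint") -> None:
    """Keep only the `save_total_limit` newest `checkpoint-<step>` directories."""
    if not save_total_limit or save_total_limit <= 0:
        return
    paths = glob.glob(os.path.join(output_dir, prefix + "-*"))

    def step(path: str) -> int:
        # Regex is only needed to pull the step number out of the dir name.
        match = re.search(r"-(\d+)$", path)
        return int(match.group(1)) if match else -1

    # Everything except the newest `save_total_limit` checkpoints is removed.
    for old in sorted(paths, key=step)[:-save_total_limit]:
        shutil.rmtree(old)
```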
Bilal Khan
5ce8d29abe Change tensorboard imports to use built-in tensorboard if available 2019-10-08 16:29:43 -05:00
thomwolf
6c1d0bc066 update encode_plus - add truncation strategies 2019-10-04 17:38:38 -04:00
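For context on the `encode_plus` commit: in the 2019-era API, truncation behavior was selected with a `truncation_strategy` string (later replaced by the `truncation=` argument). A sketch of that era's signature, roughly:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# 2019-era call: truncation_strategy was one of "longest_first",
# "only_first", "only_second", or "do_not_truncate".
encoded = tokenizer.encode_plus(
    "a first sequence",
    "a second, possibly much longer sequence",
    max_length=16,
    truncation_strategy="longest_first",
)
print(encoded["input_ids"])
```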
LysandreJik
aebd83230f Update naming + remove f string in run_lm_finetuning example 2019-10-03 11:31:36 -04:00
LysandreJik
5ed50a93fb LM finetuning won't mask special tokens anymore 2019-10-03 11:31:36 -04:00
Brian Ma
2195c0d5f9 Change evaluation result.txt path (#1286) 2019-10-03 12:49:12 +08:00
Thomas Wolf
963529e29b
Merge pull request #1288 from echan00/master
Fix typo in LM fine-tuning script
2019-10-01 18:46:07 -04:00
thomwolf
f7978f70ec use format instead of f-strings 2019-10-01 18:45:38 -04:00
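The two f-string commits (here and `aebd83230f` above) are pure compatibility fixes, since f-strings require Python >= 3.6; the replacement pattern is just:

```python
global_step = 500  # hypothetical value
# f"checkpoint-{global_step}" is a SyntaxError on Python <= 3.5;
# str.format works on every supported version.
checkpoint_name = "checkpoint-{}".format(global_step)
```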
Denny
9478590630
Update run_lm_finetuning.py
The method as previously phrased did not exist in the class.
2019-09-27 15:18:42 -03:00
mgrankin
f71a4577b8 faster dataset building 2019-09-26 16:53:13 +03:00
thomwolf
31c23bd5ee [BIG] pytorch-transformers => transformers 2019-09-26 10:15:53 +02:00
LysandreJik
bf503158c5 Sentence -> Sequence. Removed output_mask from the special token addition methods. 2019-09-19 10:55:06 +02:00
LysandreJik
88368c2a16 Added DistilBERT to run_lm_finetuning 2019-09-19 10:55:06 +02:00
Erik Chan
f0340eccf9
Typo
2019-09-18 13:42:11 -07:00
Rohit Kumar Singh
e5df36397b
Change return statement of evaluate function
Changed `results` to `result` and removed the `results` dict defined previously
2019-09-09 19:55:57 +05:30
LysandreJik
593c070435 Better examples 2019-09-06 12:00:12 -04:00
thomwolf
06510ccb53 typo 2019-08-23 22:08:10 +02:00
thomwolf
ab7bd5ef98 fixing tokenization and training 2019-08-23 17:31:21 +02:00
Lysandre
2d042274ac Sequence special token handling for BERT and RoBERTa 2019-08-20 14:15:28 -04:00
thomwolf
a690edab17 various fixes and clean-up on run_lm_finetuning 2019-08-20 15:52:12 +02:00
Matthew Carrigan
f19ba35b2b Move old finetuning script into the new folder 2019-03-20 16:47:06 +00:00
thomwolf
5c85fc3977 fix typo - logger info 2019-03-06 10:05:21 +01:00
Davide Fiocco
65df0d78ed
--do_lower_case is duplicated in parser args
Deleting one repetition (please review!)
2019-02-13 15:30:05 +01:00
Thomas Wolf
03cdb2a390
Merge pull request #254 from huggingface/python_2
Adding OpenAI GPT and Transformer-XL models, compatibility with Python 2
2019-02-11 14:19:26 +01:00
tholor
9aebc711c9 adjust error message related to args.do_eval 2019-02-07 11:49:38 +01:00
tholor
4a450b25d5 removing unused argument eval_batch_size from LM finetuning #256 2019-02-07 10:06:38 +01:00
Thomas Wolf
848aae49e1
Merge branch 'master' into python_2 2019-02-06 00:13:20 +01:00
thomwolf
448937c00d python 2 compatibility 2019-02-06 00:07:46 +01:00
Thomas Wolf
e9e77cd3c4
Merge pull request #218 from matej-svejda/master
Fix learning rate problems in run_classifier.py
2019-02-05 15:40:44 +01:00
thomwolf
1579c53635 more explicit notation: num_train_step => num_train_optimization_steps 2019-02-05 15:36:33 +01:00
tholor
ce75b169bd avoid confusion from in-place masking of tokens_a / tokens_b 2019-01-31 11:42:06 +01:00
Matej Svejda
5169069997 make examples consistent, revert error in num_train_steps calculation 2019-01-30 11:47:25 +01:00
Matej Svejda
9c6a48c8c3 fix learning rate/fp16 and warmup problem for all examples 2019-01-27 14:07:24 +01:00
nhatchan
6c65cb2492 lm_finetuning compatibility with Python 3.5
Dicts are not ordered in Python 3.5 and earlier, which caused #175.
This PR replaces one dict with a list to preserve its order.
2019-01-13 21:09:13 +09:00
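A minimal illustration of the kind of change described, with invented names (the actual variable in the diff may differ):

```python
# Before: on Python <= 3.5 the iteration order of a dict literal is
# arbitrary, so code relying on it could behave differently per run.
special_tokens_dict = {"[CLS]": 0, "[SEP]": 1, "[MASK]": 2}

# After: a list of pairs preserves order on every Python version.
special_tokens = [("[CLS]", 0), ("[SEP]", 1), ("[MASK]", 2)]
for token, index in special_tokens:
    print(token, index)
```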
tholor
506e5bb0c8 add do_lower_case arg and adjust model saving for lm finetuning. 2019-01-11 08:32:46 +01:00
thomwolf
2e4db64cab add do_lower_case tokenizer loading option in run_squad and fine-tuning examples 2019-01-07 13:06:42 +01:00
thomwolf
c9fd350567 remove default when action is store_true in arguments 2019-01-07 13:01:54 +01:00
tholor
e5fc98c542 add example training data. update to NVIDIA apex. refactor 'item -> line in doc' mapping. add warning for unknown words. 2018-12-20 18:30:52 +01:00
deepset
a58361f197
Add example for fine tuning BERT language model (#1)
Adds an example of loading a pre-trained BERT model and fine-tuning it as a language model (masked tokens & nextSentence) on a target corpus.
2018-12-18 10:32:25 +01:00
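The original example fine-tunes both BERT pre-training heads, masked-LM and next-sentence prediction. A minimal sketch using today's `transformers` API (`BertForPreTraining` exposes both heads; the attribute names follow the current library, not the 2018 code):

```python
import torch
from transformers import BertForPreTraining, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")

# One (sentence A, sentence B) pair; real fine-tuning would mask tokens
# and feed labels for both heads.
inputs = tokenizer("The cat sat.", "It purred.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.prediction_logits.shape)        # masked-token head
print(outputs.seq_relationship_logits.shape)  # next-sentence head
```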