Commit Graph

168 Commits

Author | SHA1 | Message | Date
Nikolay Korolev | 53282b5bd0 | Change attention mask dtype to be bool. Fix #1119 | 2019-08-27 14:19:03 +03:00
LysandreJik | e08c01aa1a | fix #1102 | 2019-08-26 18:13:06 -04:00
Abhishek Rao | c603d099aa | reraise EnvironmentError in from_pretrained functions of Model and Tokenizer | 2019-08-22 15:25:40 -07:00
Abhishek Rao | 14eef67eb2 | Fix at config rather than model | 2019-08-21 15:48:43 -07:00
Abhishek Rao | 296df2b18c | reraise exception | 2019-08-21 15:29:30 -07:00
thomwolf | fdc487d8b3 | Add max length | 2019-08-21 02:35:01 +02:00
thomwolf | aa05dc8935 | adding gpt-2 large | 2019-08-21 02:29:34 +02:00
Thomas Wolf | e4515faf54 | Merge pull request #1057 from huggingface/fixes: Add a few of typos corrections, bugs fixes and small improvements | 2019-08-21 01:54:05 +02:00
Thomas Wolf | 41789c6c3d | Merge pull request #1059 from GuillemGSubies/master: Better use of spacy tokenizer in open ai and xlm tokenizers | 2019-08-21 01:53:48 +02:00
Thomas Wolf | d30cbaf5dc | Merge branch 'master' into iterative_split_on_token | 2019-08-21 01:33:02 +02:00
Thomas Wolf | e753f249e1 | Merge pull request #806 from wschin/fix-a-path: Fix a path so that a test can run on Windows | 2019-08-21 01:14:40 +02:00
thomwolf | 43489756ad | adding proxies options for the from_pretrained methods | 2019-08-20 16:59:11 +02:00
Guillem García Subies | 388e3251fa | Update tokenization_xlm.py | 2019-08-20 14:19:39 +02:00
Guillem García Subies | f5e2ed0fd8 | Update tokenization_openai.py | 2019-08-20 14:19:25 +02:00
Guillem García Subies | 562b998366 | Update tokenization_openai.py | 2019-08-20 14:10:19 +02:00
Guillem García Subies | bb04446285 | Update tokenization_openai.py | 2019-08-20 14:07:40 +02:00
Guillem García Subies | bfd75056b0 | Update tokenization_xlm.py | 2019-08-20 14:06:17 +02:00
thomwolf | 6d0aa73981 | fix #1034 | 2019-08-20 12:20:21 +02:00
Julien Chaumond | b0b9b8091b | minor typo | 2019-08-20 11:33:46 +02:00
thomwolf | 53c8f700f4 | fix #808 | 2019-08-20 11:29:26 +02:00
thomwolf | 901dde0e45 | fix #1014 | 2019-08-20 11:05:51 +02:00
thomwolf | fecaed0ed4 | add force_download option to from_pretrained methods | 2019-08-20 10:56:12 +02:00
Lysandre | c589862b78 | Doc: loading from config alone does not load the model weights | 2019-08-19 10:17:47 -04:00
LysandreJik | ab05280666 | Order of strings in AutoModel/AutoTokenizer updated. | 2019-08-16 09:53:26 -04:00
LysandreJik | 83dba0b67b | Added RoBERTa tokenizer to AutoTokenizer | 2019-08-15 17:07:07 -04:00
LysandreJik | e24e19ce3b | Added RoBERTa to AutoModel/AutoConfig | 2019-08-15 14:02:11 -04:00
LysandreJik | fe02e45e48 | Release: 1.1.0 | 2019-08-15 11:15:08 -04:00
Lysandre Debut | 88efc65bac | Merge pull request #964 from huggingface/RoBERTa: RoBERTa: model conversion, inference, tests 🔥 | 2019-08-15 11:11:10 -04:00
LysandreJik | 8308170156 | Warning for RoBERTa sequences encoded without special tokens. | 2019-08-15 10:29:04 -04:00
LysandreJik | 572dcfd1db | Doc | 2019-08-14 14:56:14 -04:00
samvelyan | 9ce36e3e4b | Re-implemented tokenize() iteratively in PreTrainedTokenizer. | 2019-08-14 08:57:09 +00:00
LysandreJik | 39f426be65 | Added special tokens <pad> and <mask> to RoBERTa. | 2019-08-13 15:19:50 -04:00
LysandreJik | 3d87991f60 | Fixed error with encoding | 2019-08-13 12:00:24 -04:00
LysandreJik | 634a3172d8 | Added integration tests for sequence builders. | 2019-08-12 15:14:15 -04:00
LysandreJik | 22ac004a7c | Added documentation and changed parameters for special_tokens_sentences_pair. | 2019-08-12 15:13:53 -04:00
Julien Chaumond | b3d83d68db | Fixup 9d0603148b | 2019-08-12 12:28:55 -04:00
thomwolf | aaedfc35a8 | Merge branch 'master' of https://github.com/huggingface/pytorch-transformers | 2019-08-10 20:04:37 +02:00
thomwolf | c683c3d5a5 | fix #993 | 2019-08-10 20:04:35 +02:00
Kevin Trebing | 7060766490 | Corrected logger.error info (Signed-off-by: Kevin Trebing <Kevin.Trebing@gmx.net>) | 2019-08-09 19:36:44 -04:00
LysandreJik | 75d5f98fd2 | Roberta tokenization + fixed tests (py3 + py2). | 2019-08-09 15:02:13 -04:00
LysandreJik | 14e970c271 | Tokenization encode/decode class-based sequence handling | 2019-08-09 15:01:38 -04:00
LysandreJik | 3566d27919 | Clarified PreTrainedModel.from_pretrained warning messages in documentation. | 2019-08-08 19:04:34 -04:00
LysandreJik | fbd746bd06 | Updated test architecture | 2019-08-08 18:21:34 -04:00
LysandreJik | 6c41a8f5dc | Encode and Decode are back in the superclass. They now handle sentence pairs special tokens. | 2019-08-08 18:20:32 -04:00
Julien Chaumond | e367ac469c | [RoBERTa] Re-apply 39d72bcc7b (cc @lysandrejik) | 2019-08-08 11:26:11 -04:00
Julien Chaumond | 9d0603148b | [RoBERTa] RobertaForSequenceClassification + conversion | 2019-08-08 11:24:54 -04:00
LysandreJik | f2b300df6b | fix #976 | 2019-08-08 10:38:57 -04:00
LysandreJik | 7df303f5ad | fix #971 | 2019-08-08 10:36:26 -04:00
LysandreJik | d2cc6b101e | Merge branch 'master' into RoBERTa | 2019-08-08 09:42:05 -04:00
LysandreJik | 39d72bcc7b | Fixed the RoBERTa checkpoint conversion script according to the LM head refactoring. | 2019-08-07 14:21:57 -04:00