Nikolay Korolev | 53282b5bd0 | Change attention mask dtype to be bool. Fix #1119 | 2019-08-27 14:19:03 +03:00
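
A minimal sketch of what this enables at the call site: passing the attention mask as a torch.bool tensor (model and checkpoint names are illustrative; the commit itself changes the library's internal mask handling):

```python
import torch
from pytorch_transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

input_ids = torch.tensor([tokenizer.encode("Hello world")])
# Boolean mask: True = attend to this position, False = ignore it.
attention_mask = torch.ones_like(input_ids, dtype=torch.bool)
outputs = model(input_ids, attention_mask=attention_mask)
```
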
LysandreJik | e08c01aa1a | fix #1102 | 2019-08-26 18:13:06 -04:00
Abhishek Rao | c603d099aa | reraise EnvironmentError in from_pretrained functions of Model and Tokenizer | 2019-08-22 15:25:40 -07:00
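
With this change, a failed download or missing file in from_pretrained is re-raised instead of being swallowed. A minimal sketch of the calling pattern (checkpoint name illustrative):

```python
from pytorch_transformers import BertModel

try:
    model = BertModel.from_pretrained('bert-base-uncased')
except EnvironmentError as err:
    # Network or file errors now surface here rather than
    # silently yielding a None model.
    print("Could not load pretrained weights: {}".format(err))
    raise
```
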
Abhishek Rao | 14eef67eb2 | Fix at config rather than model | 2019-08-21 15:48:43 -07:00
Abhishek Rao | 296df2b18c | reraise exception | 2019-08-21 15:29:30 -07:00
thomwolf | fdc487d8b3 | Add max length | 2019-08-21 02:35:01 +02:00
thomwolf | aa05dc8935 | adding gpt-2 large | 2019-08-21 02:29:34 +02:00
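
Loading the new 774M-parameter GPT-2 large checkpoint follows the usual pattern; a minimal sketch:

```python
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel

# 'gpt2-large' is the checkpoint added by this commit.
tokenizer = GPT2Tokenizer.from_pretrained('gpt2-large')
model = GPT2LMHeadModel.from_pretrained('gpt2-large')
```
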
Thomas Wolf | e4515faf54 | Merge pull request #1057 from huggingface/fixes: Add a few typo corrections, bug fixes, and small improvements | 2019-08-21 01:54:05 +02:00
Thomas Wolf | 41789c6c3d | Merge pull request #1059 from GuillemGSubies/master: Better use of the spaCy tokenizer in the OpenAI and XLM tokenizers | 2019-08-21 01:53:48 +02:00
Thomas Wolf | d30cbaf5dc | Merge branch 'master' into iterative_split_on_token | 2019-08-21 01:33:02 +02:00
Thomas Wolf | e753f249e1 | Merge pull request #806 from wschin/fix-a-path: Fix a path so that a test can run on Windows | 2019-08-21 01:14:40 +02:00
thomwolf | 43489756ad | adding proxies options for the from_pretrained methods | 2019-08-20 16:59:11 +02:00
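
The new proxies argument is passed through to the HTTP requests that fetch weights and vocabulary files. A minimal sketch, with placeholder proxy URLs in the format used by the requests library:

```python
from pytorch_transformers import BertTokenizer, BertModel

# Placeholder endpoints; substitute your own proxy servers.
proxies = {"http": "http://10.10.1.10:3128", "https": "http://10.10.1.10:1080"}
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', proxies=proxies)
model = BertModel.from_pretrained('bert-base-uncased', proxies=proxies)
```
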
Guillem García Subies | 388e3251fa | Update tokenization_xlm.py | 2019-08-20 14:19:39 +02:00
Guillem García Subies | f5e2ed0fd8 | Update tokenization_openai.py | 2019-08-20 14:19:25 +02:00
Guillem García Subies | 562b998366 | Update tokenization_openai.py | 2019-08-20 14:10:19 +02:00
Guillem García Subies | bb04446285 | Update tokenization_openai.py | 2019-08-20 14:07:40 +02:00
Guillem García Subies | bfd75056b0 | Update tokenization_xlm.py | 2019-08-20 14:06:17 +02:00
thomwolf | 6d0aa73981 | fix #1034 | 2019-08-20 12:20:21 +02:00
Julien Chaumond | b0b9b8091b | minor typo | 2019-08-20 11:33:46 +02:00
thomwolf | 53c8f700f4 | fix #808 | 2019-08-20 11:29:26 +02:00
thomwolf | 901dde0e45 | fix #1014 | 2019-08-20 11:05:51 +02:00
thomwolf | fecaed0ed4 | add force_download option to from_pretrained methods | 2019-08-20 10:56:12 +02:00
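
A one-line illustration of the new flag, which bypasses the local cache and downloads the files again (checkpoint name illustrative):

```python
from pytorch_transformers import BertModel

# Ignore any cached copy and fetch the weights afresh.
model = BertModel.from_pretrained('bert-base-uncased', force_download=True)
```
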
Lysandre | c589862b78 | Doc: loading from config alone does not load the model weights | 2019-08-19 10:17:47 -04:00
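
The distinction this documentation fix emphasizes, as a minimal sketch: building a model from a config gives a randomly initialized network; only from_pretrained loads the trained parameters.

```python
from pytorch_transformers import BertConfig, BertModel

config = BertConfig.from_pretrained('bert-base-uncased')
model = BertModel(config)  # architecture only; weights are randomly initialized

model = BertModel.from_pretrained('bert-base-uncased')  # pretrained weights loaded
```
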
LysandreJik | ab05280666 | Order of strings in AutoModel/AutoTokenizer updated. | 2019-08-16 09:53:26 -04:00
LysandreJik | 83dba0b67b | Added RoBERTa tokenizer to AutoTokenizer | 2019-08-15 17:07:07 -04:00
LysandreJik | e24e19ce3b | Added RoBERTa to AutoModel/AutoConfig | 2019-08-15 14:02:11 -04:00
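
These commits make RoBERTa checkpoints resolve through the Auto* factories. The ordering fix matters because the factories dispatch on substrings of the checkpoint name, and 'roberta' contains 'bert', so RoBERTa has to be matched first. A minimal sketch:

```python
from pytorch_transformers import AutoConfig, AutoModel, AutoTokenizer

# 'roberta-base' now dispatches to the RoBERTa classes, not the BERT ones.
config = AutoConfig.from_pretrained('roberta-base')
tokenizer = AutoTokenizer.from_pretrained('roberta-base')
model = AutoModel.from_pretrained('roberta-base')
```
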
LysandreJik | fe02e45e48 | Release: 1.1.0 | 2019-08-15 11:15:08 -04:00
Lysandre Debut | 88efc65bac | Merge pull request #964 from huggingface/RoBERTa: RoBERTa: model conversion, inference, tests 🔥 | 2019-08-15 11:11:10 -04:00
LysandreJik | 8308170156 | Warning for RoBERTa sequences encoded without special tokens. | 2019-08-15 10:29:04 -04:00
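
RoBERTa is pretrained with <s> ... </s> markers around each sequence, so encoding without them degrades results; this commit adds a warning for that case. A minimal sketch of the two encodings, assuming the encode signature of this era:

```python
from pytorch_transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

plain = tokenizer.encode("Hello world")  # no <s>/</s>; now triggers the warning
wrapped = tokenizer.encode("Hello world", add_special_tokens=True)  # <s> ... </s>
```
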
LysandreJik | 572dcfd1db | Doc | 2019-08-14 14:56:14 -04:00
samvelyan | 9ce36e3e4b | Re-implemented tokenize() iteratively in PreTrainedTokenizer. | 2019-08-14 08:57:09 +00:00
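
The idea behind the reimplementation, as an illustrative sketch rather than the library's exact code: the input is split on the added/special tokens with a loop over fragments instead of recursion, so a large added-token vocabulary cannot hit Python's recursion limit.

```python
def split_on_added_tokens(text, added_tokens, base_tokenize):
    """Iteratively split `text` on each added token, then run the underlying
    tokenizer (BPE, WordPiece, ...) on the remaining plain fragments."""
    fragments = [text]
    for token in added_tokens:
        next_fragments = []
        for fragment in fragments:
            if fragment in added_tokens:       # already isolated, keep as-is
                next_fragments.append(fragment)
                continue
            pieces = fragment.split(token)
            for i, piece in enumerate(pieces):
                if piece:
                    next_fragments.append(piece)
                if i < len(pieces) - 1:        # re-insert the token we split on
                    next_fragments.append(token)
        fragments = next_fragments
    tokens = []
    for fragment in fragments:
        if fragment in added_tokens:
            tokens.append(fragment)
        else:
            tokens.extend(base_tokenize(fragment))
    return tokens
```
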
LysandreJik | 39f426be65 | Added special tokens <pad> and <mask> to RoBERTa. | 2019-08-13 15:19:50 -04:00
LysandreJik | 3d87991f60 | Fixed error with encoding | 2019-08-13 12:00:24 -04:00
LysandreJik | 634a3172d8 | Added integration tests for sequence builders. | 2019-08-12 15:14:15 -04:00
LysandreJik | 22ac004a7c | Added documentation and changed parameters for special_tokens_sentences_pair. | 2019-08-12 15:13:53 -04:00
Julien Chaumond | b3d83d68db | Fixup 9d0603148b | 2019-08-12 12:28:55 -04:00
thomwolf | aaedfc35a8 | Merge branch 'master' of https://github.com/huggingface/pytorch-transformers | 2019-08-10 20:04:37 +02:00
thomwolf | c683c3d5a5 | fix #993 | 2019-08-10 20:04:35 +02:00
Kevin Trebing | 7060766490 | Corrected logger.error info (Signed-off-by: Kevin Trebing <Kevin.Trebing@gmx.net>) | 2019-08-09 19:36:44 -04:00
LysandreJik | 75d5f98fd2 | RoBERTa tokenization + fixed tests (py3 + py2). | 2019-08-09 15:02:13 -04:00
LysandreJik | 14e970c271 | Tokenization encode/decode class-based sequence handling | 2019-08-09 15:01:38 -04:00
LysandreJik | 3566d27919 | Clarified PreTrainedModel.from_pretrained warning messages in documentation. | 2019-08-08 19:04:34 -04:00
LysandreJik | fbd746bd06 | Updated test architecture | 2019-08-08 18:21:34 -04:00
LysandreJik | 6c41a8f5dc | Encode and Decode are back in the superclass. They now handle sentence-pair special tokens. | 2019-08-08 18:20:32 -04:00
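
A minimal sketch of the restored superclass behavior, assuming the encode/decode signatures of this era: a text pair is wrapped with the model's sequence-pair special tokens.

```python
from pytorch_transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
ids = tokenizer.encode("First sentence.", "Second sentence.",
                       add_special_tokens=True)
# For BERT this decodes to: [CLS] first sentence. [SEP] second sentence. [SEP]
print(tokenizer.decode(ids))
```
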
Julien Chaumond | e367ac469c | [RoBERTa] Re-apply 39d72bcc7b (cc @lysandrejik) | 2019-08-08 11:26:11 -04:00
Julien Chaumond | 9d0603148b | [RoBERTa] RobertaForSequenceClassification + conversion | 2019-08-08 11:24:54 -04:00
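
A minimal usage sketch of the new classification head (checkpoint name illustrative; models in this era return tuples, with the logits first):

```python
import torch
from pytorch_transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForSequenceClassification.from_pretrained('roberta-base')

input_ids = torch.tensor([tokenizer.encode("Great movie!", add_special_tokens=True)])
logits = model(input_ids)[0]  # shape: (batch_size, num_labels)
```
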
LysandreJik | f2b300df6b | fix #976 | 2019-08-08 10:38:57 -04:00
LysandreJik | 7df303f5ad | fix #971 | 2019-08-08 10:36:26 -04:00
LysandreJik | d2cc6b101e | Merge branch 'master' into RoBERTa | 2019-08-08 09:42:05 -04:00
LysandreJik | 39d72bcc7b | Fixed the RoBERTa checkpoint conversion script according to the LM head refactoring. | 2019-08-07 14:21:57 -04:00