thomwolf | 5dd7b677ad | clean up all byte-level bpe tests | 2019-08-30 12:43:08 +02:00
thomwolf | fd10d79b55 | update GPT2 docstring | 2019-08-30 12:23:12 +02:00
thomwolf | 0517e7a1cb | Fix GPT2 and RoBERTa tokenizer to begin with a space - update RoBERTa tokenizer | 2019-08-30 11:23:49 +02:00
thomwolf | fdc487d8b3 | Add max length | 2019-08-21 02:35:01 +02:00
thomwolf | aa05dc8935 | adding gpt-2 large | 2019-08-21 02:29:34 +02:00
thomwolf | 009273dbdd | big doc update [WIP] | 2019-08-04 12:14:57 +02:00
thomwolf | ac27548b25 | fix unk_token test | 2019-07-27 11:50:47 +02:00
thomwolf | 57e54ec070 | add unk_token to gpt2 | 2019-07-26 17:09:07 +02:00
thomwolf | 15d8b1266c | update tokenizer - update squad example for xlnet | 2019-07-15 17:30:42 +02:00
thomwolf | 699bc7e86e | fix gpt-2 unk token test | 2019-07-12 11:46:57 +02:00
LysandreJik | e3fb4310d6 | From pretrained correct initialization. Unknown token handling for gpt2. | 2019-07-11 18:44:29 -04:00
thomwolf | b19786985d | unified tokenizer api and serialization + tests | 2019-07-09 10:25:18 +02:00
thomwolf | 36bca545ff | tokenization abstract class - tests for examples | 2019-07-05 15:02:59 +02:00
thomwolf | 0bab55d5d5 | [BIG] name change | 2019-07-05 11:55:36 +02:00