Commit Graph

21 Commits

Author SHA1 Message Date
LysandreJik
c10c7d59e7 Mask computing in standalone method. Tests. 2019-09-19 10:55:06 +02:00
LysandreJik
bf503158c5 Sentence -> Sequence. Removed output_mask from the special token addition methods. 2019-09-19 10:55:06 +02:00
LysandreJik
6393261e41 encode + encode_plus tests modified 2019-09-19 10:55:06 +02:00
LysandreJik
af23b626c8 Max encoding length + corresponding tests 2019-09-19 10:55:06 +02:00
LysandreJik
d572d7027b Number of added tokens calculator 2019-09-19 10:55:06 +02:00
LysandreJik
c3df2136e1 Added binary masking tests 2019-09-19 10:55:06 +02:00
thomwolf
5c6cac102b adding test for common properties and cleaning up a bit base class 2019-09-05 21:31:29 +02:00
thomwolf
fede4ef45d fixing #1133 2019-09-02 02:27:39 +02:00
Thomas Wolf
d483cd8e46 Merge pull request #1074 from huggingface/improved_testing: Shortcut to special tokens' ids - fix GPT2 & RoBERTa tokenizers - improved testing for GPT/GPT-2 2019-08-30 23:18:58 +02:00
thomwolf
7044ed6b05 fix tokenizers serialization 2019-08-30 17:36:11 +02:00
thomwolf
69da972ace added test and debug tokenizer configuration serialization 2019-08-30 17:09:36 +02:00
thomwolf
abe734ca1f fix GPT-2 and RoBERTa tests to be clean now 2019-08-30 12:20:18 +02:00
thomwolf
d51f72d5de adding shortcut to the ids of all the special tokens 2019-08-30 11:41:11 +02:00
thomwolf
328afb7097 cleaning up tokenizer tests structure (at last) - last remaining ppb refs 2019-08-05 14:08:56 +02:00
thomwolf
1849aa7d39 update readme and pretrained model weight files 2019-07-16 15:11:29 +02:00
thomwolf
15d8b1266c update tokenizer - update squad example for xlnet 2019-07-15 17:30:42 +02:00
thomwolf
d743f2f34e updating test 2019-07-09 15:58:58 +02:00
thomwolf
c079d7ddff fix python 2 tests 2019-07-09 10:40:59 +02:00
thomwolf
b19786985d unified tokenizer api and serialization + tests 2019-07-09 10:25:18 +02:00
thomwolf
6dacc79d39 fix python2 tests 2019-07-05 15:11:59 +02:00
thomwolf
0bab55d5d5 [BIG] name change 2019-07-05 11:55:36 +02:00