Commit Graph

18 Commits

Author SHA1 Message Date
thomwolf
8678ff8df5 adding 17 and 100 xlm models 2019-08-30 16:26:04 +02:00
thomwolf
82462c5cba Added option to setup pretrained tokenizer arguments 2019-08-30 15:30:41 +02:00
Shijie Wu
ca4baf8ca1 Match order of casing in OSS XLM; Improve document; Clean up dependency 2019-08-27 20:03:18 -04:00
Shijie Wu
e85123d398 Add custom tokenizer for zh and ja 2019-08-23 20:27:52 -04:00
Shijie Wu
436ce07218 Tokenization behave the same as original XLM proprocessing for most languages except zh, ja and th; Change API to allow specifying language in tokenize 2019-08-23 14:40:17 -04:00
Guillem García Subies
388e3251fa Update tokenization_xlm.py 2019-08-20 14:19:39 +02:00
Guillem García Subies
bfd75056b0 Update tokenization_xlm.py 2019-08-20 14:06:17 +02:00
LysandreJik
22ac004a7c Added documentation and changed parameters for special_tokens_sentences_pair. 2019-08-12 15:13:53 -04:00
LysandreJik
14e970c271 Tokenization encode/decode class-based sequence handling 2019-08-09 15:01:38 -04:00
thomwolf
1849aa7d39 update readme and pretrained model weight files 2019-07-16 15:11:29 +02:00
thomwolf
15d8b1266c update tokenizer - update squad example for xlnet 2019-07-15 17:30:42 +02:00
LysandreJik
7fdbc47822 Added the two CLM XLM pretrained checkpoints. Fixed file extensions for config/vocab/merges of XLM models. 2019-07-10 19:37:24 -04:00
LysandreJik
dee3e45b93 Fixed XLM weights conversion script. Added 5 new checkpoints for XLM. 2019-07-10 19:04:21 -04:00
LysandreJik
f773faa258 Fixed all links. Removed TPU. Changed CLI to Converting TF models. Many minor formatting adjustments. Added "TODO Lysandre filled" where necessary. 2019-07-10 14:45:56 -04:00
thomwolf
d5481cbe1b adding tests to examples - updating summary module - coverage update 2019-07-09 15:29:42 +02:00
thomwolf
b19786985d unified tokenizer api and serialization + tests 2019-07-09 10:25:18 +02:00
thomwolf
36bca545ff tokenization abstract class - tests for examples 2019-07-05 15:02:59 +02:00
thomwolf
0bab55d5d5 [BIG] name change 2019-07-05 11:55:36 +02:00