Yiqing-Zhou
b1019d2a8e
token[-1] -> token.rstrip('\n')
2019-07-23 20:41:26 +08:00
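The subject line above describes replacing positional slicing with `rstrip('\n')`. Below is a minimal sketch of why that matters, assuming a vocab file with one token per line whose final line may lack a trailing newline. The sample tokens are hypothetical, and `line[:-1]` stands in for "drop the last character" as an illustration, not as the exact previous code.

```python
import io

# Hypothetical vocab contents: the last line has no trailing '\n'.
data = "[PAD]\n[UNK]\nhello"
lines = io.StringIO(data).readlines()  # ['[PAD]\n', '[UNK]\n', 'hello']

# Blind slicing always drops one character, even when it is not a newline.
sliced = [line[:-1] for line in lines]

# rstrip('\n') removes only trailing newline characters.
stripped = [line.rstrip("\n") for line in lines]

print(sliced)    # ['[PAD]', '[UNK]', 'hell']
print(stripped)  # ['[PAD]', '[UNK]', 'hello']
```

`rstrip('\n')` is also preferable to a bare `strip()`, which would erase tokens that consist only of whitespace-like characters.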
Yiqing-Zhou
bef0c629ca
fix
Remove the trailing '\n' from each token before adding it to the vocab
2019-07-22 22:30:49 +08:00
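A sketch of a line-per-token vocab loader that strips the newline before inserting each token. It shows that without stripping, the stored key would be `'hello\n'` and a lookup for `'hello'` would fail. The function name `load_vocab_sketch` and the sample tokens are hypothetical; this is not the library's actual code.

```python
import collections
import os
import tempfile


def load_vocab_sketch(vocab_file):
    """Hypothetical helper: map each token (one per line) to its line index."""
    vocab = collections.OrderedDict()
    with open(vocab_file, "r", encoding="utf-8") as reader:
        tokens = reader.readlines()
    for index, token in enumerate(tokens):
        # Strip only the newline so the key matches the bare token text.
        vocab[token.rstrip("\n")] = index
    return vocab


# Usage with a throwaway file.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as f:
    f.write("[PAD]\n[UNK]\nhello\n")
    path = f.name
try:
    vocab = load_vocab_sketch(path)
finally:
    os.remove(path)

print(vocab["hello"])      # 2
print("hello\n" in vocab)  # False
```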
Yiqing-Zhou
897d0841be
read().splitlines() -> readlines()
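splitlines() does not behave as expected here for bert-base-chinese because its vocab file contains a '\u2028' (Unicode line separator) token. str.splitlines() treats '\u2028' as a line boundary, so the line holding that token, '\u2028\n', splits into ['', ''] and every later token index shifts by one.
Use readlines() instead, since reading a text file line by line does not treat '\u2028' as a line boundary.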
2019-07-22 20:49:09 +08:00
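A small reproduction of the problem described above, assuming a hypothetical three-token vocab whose second token is the single character U+2028. `str.splitlines()` splits on U+2028, while a text stream's `readlines()` splits only on newline characters.

```python
import io

# Hypothetical vocab contents; the second token is U+2028 (LINE SEPARATOR).
data = "[PAD]\n\u2028\nhello\n"

# str.splitlines() treats U+2028 as a line boundary: the token disappears,
# two empty strings take its place, and 'hello' moves from index 2 to 3.
via_splitlines = data.splitlines()

# readlines() on a text stream does not split on U+2028.
via_readlines = [line.rstrip("\n") for line in io.StringIO(data).readlines()]

print(via_splitlines)  # ['[PAD]', '', '', 'hello']
print(via_readlines)   # ['[PAD]', '\u2028', 'hello']
```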
thomwolf
15d8b1266c
update tokenizer - update squad example for xlnet
2019-07-15 17:30:42 +02:00
thomwolf
ab49fafc04
update tokenization docstrings for #328
2019-07-15 12:51:23 +02:00
thomwolf
a9ab15174c
fix #328
2019-07-15 12:42:12 +02:00
thomwolf
e468192e2f
Merge branch 'pytorch-transformers' into xlnet
2019-07-09 17:05:37 +02:00
thomwolf
d5481cbe1b
adding tests to examples - updating summary module - coverage update
2019-07-09 15:29:42 +02:00
thomwolf
b19786985d
unified tokenizer api and serialization + tests
2019-07-09 10:25:18 +02:00
thomwolf
36bca545ff
tokenization abstract class - tests for examples
2019-07-05 15:02:59 +02:00
thomwolf
0bab55d5d5
[BIG] name change
2019-07-05 11:55:36 +02:00