| Author | Commit | Message | Date |
|---|---|---|---|
| Thomas Wolf | 848aae49e1 | Merge branch 'master' into python_2 | 2019-02-06 00:13:20 +01:00 |
| thomwolf | 448937c00d | python 2 compatibility | 2019-02-06 00:07:46 +01:00 |
| WrRan | 3f60a60eed | text in never_split should not lowercase | 2019-01-08 13:33:57 +08:00 |
| WrRan | 751beb9e73 | never split some text | 2019-01-08 10:54:51 +08:00 |
| Thomas Wolf | 7fb94ab934 | Merge pull request #127 from patrick-s-h-lewis/tokenizer-error-on-long-seqs: raises value error for bert tokenizer for long sequences | 2018-12-19 10:29:17 +01:00 |
| Julien Chaumond | d57763f582 | Fix typos | 2018-12-18 19:23:22 -05:00 |
| Patrick Lewis | 78cf7b4ab4 | added code to raise value error for bert tokenizer for covert_tokens_to_indices | 2018-12-18 14:41:30 +00:00 |
| thomwolf | 4a4b0e5783 | remove logging. basicConfig from library code | 2018-12-14 14:46:25 +01:00 |
| thomwolf | d6f06c03f4 | fixed loading pre-trained tokenizer from directory | 2018-11-30 14:09:06 +01:00 |
| thomwolf | 298107fed7 | Added new bert models | 2018-11-30 13:56:02 +01:00 |
| thomwolf | 32167cdf4b | remove convert_to_unicode and printable_text from examples | 2018-11-26 23:33:22 +01:00 |
| thomwolf | 982339d829 | fixing unicode error | 2018-11-23 12:22:12 +01:00 |
| weiyumou | 37b6c9b21b | Fixed UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3793: ordinal not in range(128) | 2018-11-19 23:01:28 -05:00 |
| thomwolf | 1de35b624b | preparing for first release | 2018-11-15 20:56:10 +01:00 |