Commit Graph

22 Commits

Author SHA1 Message Date
thomwolf
1d8c232324 Fix #436 2019-04-03 10:51:03 +02:00
thomwolf
5c85fc3977 fix typo - logger info 2019-03-06 10:05:21 +01:00
Thomas Wolf
477ec4b6cc Merge pull request #337 from CatalinVoss/patch-2: Allow tokenization of sequences > 512 for caching 2019-03-06 09:45:49 +01:00
Catalin Voss
4a49c22584 Warn instead of raising in BERT and GPT-2 tokenizers as well, to allow for pre-caching of tokens 2019-03-05 12:31:45 -08:00
John Hewitt
4d1ad83236 update docstring of BERT tokenizer to reflect do_wordpiece_only 2019-02-27 14:50:41 -08:00
John Hewitt
e14c6b52e3 add BertTokenizer flag to skip basic tokenization 2019-02-26 20:11:24 -08:00
Yongbo Wang
813e4d18ba typo 2019-02-20 21:10:07 +08:00
thomwolf
edcb56fd96 more explicit variable name 2019-02-08 09:54:49 +01:00
Thomas Wolf
848aae49e1 Merge branch 'master' into python_2 2019-02-06 00:13:20 +01:00
thomwolf
448937c00d python 2 compatibility 2019-02-06 00:07:46 +01:00
WrRan
3f60a60eed text in never_split should not lowercase 2019-01-08 13:33:57 +08:00
WrRan
751beb9e73 never split some text 2019-01-08 10:54:51 +08:00
Thomas Wolf
7fb94ab934 Merge pull request #127 from patrick-s-h-lewis/tokenizer-error-on-long-seqs: raises value error for bert tokenizer for long sequences 2018-12-19 10:29:17 +01:00
Julien Chaumond
d57763f582 Fix typos 2018-12-18 19:23:22 -05:00
Patrick Lewis
78cf7b4ab4 added code to raise value error for bert tokenizer for covert_tokens_to_indices 2018-12-18 14:41:30 +00:00
thomwolf
4a4b0e5783 remove logging.basicConfig from library code 2018-12-14 14:46:25 +01:00
thomwolf
d6f06c03f4 fixed loading pre-trained tokenizer from directory 2018-11-30 14:09:06 +01:00
thomwolf
298107fed7 Added new bert models 2018-11-30 13:56:02 +01:00
thomwolf
32167cdf4b remove convert_to_unicode and printable_text from examples 2018-11-26 23:33:22 +01:00
thomwolf
982339d829 fixing unicode error 2018-11-23 12:22:12 +01:00
weiyumou
37b6c9b21b Fixed UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3793: ordinal not in range(128) 2018-11-19 23:01:28 -05:00
thomwolf
1de35b624b preparing for first release 2018-11-15 20:56:10 +01:00