thomwolf
5c85fc3977
fix typo - logger info
2019-03-06 10:05:21 +01:00
Catalin Voss
9775b2eb27
Allow tokenization of sequences > 512 for caching
For many applications requiring randomized data access, it's easier to cache the tokenized representations than the words. So why not turn this into a warning?
2019-03-02 16:30:21 -08:00
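The idea behind this change, as I read the commit message: instead of refusing token sequences longer than the model's maximum length (512 for BERT-style models), the tokenizer only warns, so callers can tokenize and cache whole documents and truncate later at training time. A minimal sketch of that pattern, assuming a vocab dict and a `max_len` attribute similar to the library's tokenizers (the class here is illustrative, not the actual implementation):

```python
import logging

logger = logging.getLogger(__name__)

class CachingTokenizer:
    """Illustrative tokenizer wrapper; `vocab` maps token strings to ids."""

    def __init__(self, vocab, max_len=512):
        self.vocab = vocab
        self.max_len = max_len

    def convert_tokens_to_ids(self, tokens):
        ids = [self.vocab[token] for token in tokens]
        if len(ids) > self.max_len:
            # Warn instead of raising: callers caching tokenized corpora may
            # legitimately keep sequences longer than the model accepts, as
            # long as they truncate before feeding them to the model.
            logger.warning(
                "Token indices sequence length is longer than the maximum "
                "sequence length for this model (%d > %d). Running this "
                "sequence through the model will result in indexing errors.",
                len(ids), self.max_len,
            )
        return ids
```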
thomwolf
c6bea08448
OpenAI GPT Tokenizer can fall back to using BERT BasicTokenizer
2019-02-13 10:11:00 +01:00
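This commit describes a fallback mechanism: the GPT tokenizer prefers ftfy plus spaCy for text fixing and pre-tokenization, but should still work when those optional dependencies are missing by reusing BERT's rule-based BasicTokenizer. A rough sketch of that pattern, assuming the then-current `pytorch_pretrained_bert.tokenization` module path (the `pre_tokenize` helper is hypothetical):

```python
try:
    # Preferred path: ftfy fixes broken unicode, spaCy handles pre-tokenization.
    import ftfy
    import spacy
    _nlp = spacy.load("en", disable=["parser", "tagger", "ner", "textcat"])

    def pre_tokenize(text):
        return [tok.text for tok in _nlp(ftfy.fix_text(text))]

except ImportError:
    # Fallback: reuse BERT's whitespace/punctuation BasicTokenizer so the GPT
    # tokenizer still works without the optional spaCy/ftfy dependencies.
    from pytorch_pretrained_bert.tokenization import BasicTokenizer

    _basic = BasicTokenizer(do_lower_case=True)

    def pre_tokenize(text):
        return _basic.tokenize(text)
```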
thomwolf
b514a60c36
added tests for OpenAI GPT and Transformer-XL tokenizers
2019-02-11 10:17:16 +01:00
thomwolf
f99f2fb661
docstrings
2019-02-07 17:07:22 +01:00
thomwolf
448937c00d
python 2 compatibility
2019-02-06 00:07:46 +01:00
thomwolf
6179f537a3
clean up tokenization spaces
2019-02-04 17:41:22 +01:00
thomwolf
850da1cc36
strip decoded outputs
2019-02-04 17:35:05 +01:00
thomwolf
01a3966bc6
more options on special tokens
2019-02-04 17:26:25 +01:00
thomwolf
05f961840b
logging
2019-02-04 13:06:19 +01:00
thomwolf
d77dd62ff8
directly load from TF checkpoints + code cleanup
2019-01-28 16:50:23 +01:00
thomwolf
3cf12b235a
added tests + fixed losses
2019-01-08 16:24:23 +01:00
thomwolf
eed51c5bdf
add OpenAI GPT
2019-01-08 12:26:58 +01:00
thomwolf
93f563b8a8
adding OpenAI GPT
2019-01-07 12:55:36 +01:00