thomwolf
5c85fc3977
fix typo - logger info
2019-03-06 10:05:21 +01:00
Catalin Voss
9775b2eb27
Allow tokenization of sequences > 512 for caching
For many applications requiring randomized data access, it's easier to cache the tokenized representations than the words. So why not turn this into a warning?
2019-03-02 16:30:21 -08:00
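The idea behind this change, as I read the commit message: instead of refusing token sequences longer than the model's maximum length (512 for BERT-style models), the tokenizer only warns, so callers can tokenize and cache whole documents and truncate later at training time. A minimal sketch of that pattern, assuming a vocab dict and a `max_len` attribute similar to the library's tokenizers (the class here is illustrative, not the actual implementation):

```python
import logging

logger = logging.getLogger(__name__)

class CachingTokenizer:
    """Illustrative tokenizer wrapper; `vocab` maps token strings to ids."""

    def __init__(self, vocab, max_len=512):
        self.vocab = vocab
        self.max_len = max_len

    def convert_tokens_to_ids(self, tokens):
        ids = [self.vocab[token] for token in tokens]
        if len(ids) > self.max_len:
            # Warn instead of raising: callers caching tokenized corpora may
            # legitimately keep sequences longer than the model accepts, as
            # long as they truncate before feeding them to the model.
            logger.warning(
                "Token indices sequence length is longer than the maximum "
                "sequence length for this model (%d > %d). Running this "
                "sequence through the model will result in indexing errors.",
                len(ids), self.max_len,
            )
        return ids
```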
thomwolf
c6bea08448
OpenAI GPT Tokenizer can fall back to using BERT BasicTokenizer
2019-02-13 10:11:00 +01:00
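This commit describes a fallback mechanism: the GPT tokenizer prefers ftfy plus spaCy for text fixing and pre-tokenization, but should still work when those optional dependencies are missing by reusing BERT's rule-based BasicTokenizer. A rough sketch of that pattern, assuming the then-current `pytorch_pretrained_bert.tokenization` module path (the `pre_tokenize` helper is hypothetical):

```python
try:
    # Preferred path: ftfy fixes broken unicode, spaCy handles pre-tokenization.
    import ftfy
    import spacy
    _nlp = spacy.load("en", disable=["parser", "tagger", "ner", "textcat"])

    def pre_tokenize(text):
        return [tok.text for tok in _nlp(ftfy.fix_text(text))]

except ImportError:
    # Fallback: reuse BERT's whitespace/punctuation BasicTokenizer so the GPT
    # tokenizer still works without the optional spaCy/ftfy dependencies.
    from pytorch_pretrained_bert.tokenization import BasicTokenizer

    _basic = BasicTokenizer(do_lower_case=True)

    def pre_tokenize(text):
        return _basic.tokenize(text)
```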
thomwolf
b514a60c36
added tests for OpenAI GPT and Transformer-XL tokenizers
2019-02-11 10:17:16 +01:00
thomwolf
f99f2fb661
docstrings
2019-02-07 17:07:22 +01:00
thomwolf
448937c00d
python 2 compatibility
2019-02-06 00:07:46 +01:00
thomwolf
6179f537a3
clean up tokenization spaces
2019-02-04 17:41:22 +01:00
thomwolf
850da1cc36
strip decoded outputs
2019-02-04 17:35:05 +01:00
thomwolf
01a3966bc6
more options on special tokens
2019-02-04 17:26:25 +01:00
thomwolf
05f961840b
logging
2019-02-04 13:06:19 +01:00
thomwolf
d77dd62ff8
directly load from TF checkpoints + code cleanup
2019-01-28 16:50:23 +01:00
thomwolf
3cf12b235a
added tests + fixed losses
2019-01-08 16:24:23 +01:00
thomwolf
eed51c5bdf
add OpenAI GPT
2019-01-08 12:26:58 +01:00
thomwolf
93f563b8a8
adding OpenAI GPT
2019-01-07 12:55:36 +01:00