transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-04 21:30:07 +06:00

Author	SHA1	Message	Date
thomwolf	b3c6ee0ac1	tokenization updates	2019-04-15 14:24:52 +02:00
thomwolf	870b734bfd	added tokenizers serialization tests	2019-04-15 12:03:56 +02:00
thomwolf	3e65f255dc	add serialization semantics to tokenizers - fix transfo-xl tokenizer	2019-04-15 11:47:25 +02:00
thomwolf	1d8c232324	Fix #436	2019-04-03 10:51:03 +02:00
thomwolf	5c85fc3977	fix typo - logger info	2019-03-06 10:05:21 +01:00
Thomas Wolf	477ec4b6cc	Merge pull request #337 from CatalinVoss/patch-2: Allow tokenization of sequences > 512 for caching	2019-03-06 09:45:49 +01:00
Catalin Voss	4a49c22584	Warn instead of raising in BERT and GPT-2 tokenizers as well, to allow for pre-caching of tokens	2019-03-05 12:31:45 -08:00
John Hewitt	4d1ad83236	update docstring of BERT tokenizer to reflect do_wordpiece_only	2019-02-27 14:50:41 -08:00
John Hewitt	e14c6b52e3	add BertTokenizer flag to skip basic tokenization	2019-02-26 20:11:24 -08:00
Yongbo Wang	813e4d18ba	typo	2019-02-20 21:10:07 +08:00
thomwolf	edcb56fd96	more explicit variable name	2019-02-08 09:54:49 +01:00
Thomas Wolf	848aae49e1	Merge branch 'master' into python_2	2019-02-06 00:13:20 +01:00
thomwolf	448937c00d	python 2 compatibility	2019-02-06 00:07:46 +01:00
WrRan	3f60a60eed	text in never_split should not lowercase	2019-01-08 13:33:57 +08:00
WrRan	751beb9e73	never split some text	2019-01-08 10:54:51 +08:00
Thomas Wolf	7fb94ab934	Merge pull request #127 from patrick-s-h-lewis/tokenizer-error-on-long-seqs: raises value error for bert tokenizer for long sequences	2018-12-19 10:29:17 +01:00
Julien Chaumond	d57763f582	Fix typos	2018-12-18 19:23:22 -05:00
Patrick Lewis	78cf7b4ab4	added code to raise value error for bert tokenizer for covert_tokens_to_indices	2018-12-18 14:41:30 +00:00
thomwolf	4a4b0e5783	remove logging.basicConfig from library code	2018-12-14 14:46:25 +01:00
thomwolf	d6f06c03f4	fixed loading pre-trained tokenizer from directory	2018-11-30 14:09:06 +01:00
thomwolf	298107fed7	Added new bert models	2018-11-30 13:56:02 +01:00
thomwolf	32167cdf4b	remove convert_to_unicode and printable_text from examples	2018-11-26 23:33:22 +01:00
thomwolf	982339d829	fixing unicode error	2018-11-23 12:22:12 +01:00
weiyumou	37b6c9b21b	Fixed UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3793: ordinal not in range(128)	2018-11-19 23:01:28 -05:00
thomwolf	1de35b624b	preparing for first release	2018-11-15 20:56:10 +01:00

25 Commits