transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-17 03:28:22 +06:00

Author	SHA1	Message	Date
Yiqing-Zhou	b1019d2a8e	token[-1] -> token.rstrip('\n')	2019-07-23 20:41:26 +08:00
Yiqing-Zhou	bef0c629ca	fix: remove '\n' before adding token to vocab	2019-07-22 22:30:49 +08:00
Yiqing-Zhou	897d0841be	read().splitlines() -> readlines(). splitlines() does not work as expected here for bert-base-chinese because the vocab file contains a '\u2028' (Unicode line separator) token. The value of '\u2028\n'.splitlines() is ['', '']. Perhaps we should use readlines() instead.	2019-07-22 20:49:09 +08:00
thomwolf	15d8b1266c	update tokenizer - update squad example for xlnet	2019-07-15 17:30:42 +02:00
thomwolf	ab49fafc04	update tokenization docstrings for #328	2019-07-15 12:51:23 +02:00
thomwolf	a9ab15174c	fix #328	2019-07-15 12:42:12 +02:00
thomwolf	e468192e2f	Merge branch 'pytorch-transformers' into xlnet	2019-07-09 17:05:37 +02:00
thomwolf	d5481cbe1b	adding tests to examples - updating summary module - coverage update	2019-07-09 15:29:42 +02:00
thomwolf	b19786985d	unified tokenizer api and serialization + tests	2019-07-09 10:25:18 +02:00
thomwolf	36bca545ff	tokenization abstract class - tests for examples	2019-07-05 15:02:59 +02:00
thomwolf	0bab55d5d5	[BIG] name change	2019-07-05 11:55:36 +02:00
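The three most recent commits (897d0841be, bef0c629ca, b1019d2a8e) change how the vocabulary file is read. The following is a minimal sketch, assuming a vocab file with one token per line. `load_vocab` and the sample tokens are hypothetical illustrations, not the repository's actual code. The sketch shows why `str.splitlines()` miscounts a U+2028 token while line-by-line reading plus `rstrip('\n')` does not.

```python
import collections
import io


def load_vocab(text):
    """Map each token (one per line) to its line index.

    Hypothetical helper for illustration only: readlines() on a StringIO
    ends lines only at '\n', so a U+2028 token stays a single token.
    """
    vocab = collections.OrderedDict()
    for index, token in enumerate(io.StringIO(text).readlines()):
        # rstrip('\n') removes the newline when present and leaves a final
        # line without a trailing newline untouched.
        vocab[token.rstrip('\n')] = index
    return vocab


vocab_text = "[PAD]\n\u2028\n的\n"

# str.splitlines() also treats U+2028 as a line boundary, so the single
# U+2028 token becomes two empty entries and shifts every later index.
print(vocab_text.splitlines())  # ['[PAD]', '', '', '的']

vocab = load_vocab(vocab_text)
print(vocab["\u2028"], vocab["的"])  # 1 2
```

The design choice is to let the file layer decide line boundaries (only `'\n'` here) and then strip exactly that terminator. This keeps every token, including Unicode separator characters, at its original index.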