transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-15 10:38:23 +06:00

Author	SHA1	Message	Date
Masatoshi Suzuki	48c6c6139f	Support additional dictionaries for BERT Japanese tokenizers (#6515 ) * Update BERT Japanese tokenizers * Update CircleCI config to download unidic * Specify to use the latest dictionary packages	2020-08-17 12:00:23 +08:00
Sam Shleifer	13deb95a40	Move tests/utils.py -> transformers/testing_utils.py (#5350 )	2020-07-01 10:31:17 -04:00
Anthony MOI	36434220fc	[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510 ) * Use tokenizers pre-tokenized pipeline * failing pretrokenized test * Fix is_pretokenized in python * add pretokenized tests * style and quality * better tests for batched pretokenized inputs * tokenizers clean up - new padding_strategy - split the files * [HUGE] refactoring tokenizers - padding - truncation - tests * style and quality * bump up requied tokenizers version to 0.8.0-rc1 * switched padding/truncation API - simpler better backward compat * updating tests for custom tokenizers * style and quality - tests on pad * fix QA pipeline * fix backward compatibility for max_length only * style and quality * Various cleans up - add verbose * fix tests * update docstrings * Fix tests * Docs reformatted * __call__ method documented Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2020-06-15 17:12:51 -04:00
Julien Chaumond	d4c2cb402d	Kill model archive maps (#4636 ) * Kill model archive maps * Fixup * Also kill model_archive_map for MaskedBertPreTrainedModel * Unhook config_archive_map * Tokenizers: align with model id changes * make style && make quality * Fix CI	2020-06-02 09:39:33 -04:00
Julien Chaumond	865d4d595e	[ci] Close #4481	2020-05-20 18:27:42 -04:00
Sam Shleifer	07dd7c2fd8	[cleanup] test_tokenization_common.py (#4390 )	2020-05-19 10:46:55 -04:00
Yohei Tamura	8594dd80dd	BertJapaneseTokenizer accept options for mecab (#3566 ) * BertJapaneseTokenizer accept options for mecab * black * fix mecab_option to Option[str]	2020-04-03 11:12:19 -04:00
Julien Chaumond	83a41d39b3	💄 super	2020-01-15 18:33:50 -05:00
alberduris	81d6841b4b	GPU text generation: mMoved the encoded_prompt to correct device	2020-01-06 15:11:12 +01:00
alberduris	dd4df80f0b	Moved the encoded_prompts to correct device	2020-01-06 15:11:12 +01:00
Aymeric Augustin	1c62e87b34	Use built-in open(). On Python 3, `open is io.open`.	2019-12-22 18:38:56 +01:00
Aymeric Augustin	c824d15aa1	Remove __future__ imports.	2019-12-22 17:47:54 +01:00
Aymeric Augustin	00204f2b4c	Replace CommonTestCases for tokenizers with a mixin. This is the same change as for (TF)CommonTestCases for modeling.	2019-12-22 15:35:25 +01:00
Aymeric Augustin	a3c5883f2c	Rename file for consistency.	2019-12-22 15:35:25 +01:00
Aymeric Augustin	ced0a94204	Switch test files to the standard test_*.py scheme.	2019-12-22 14:15:13 +01:00

15 Commits