transformers

Mirror of https://github.com/huggingface/transformers.git, synced 2025-08-03 03:31:05 +06:00

Author	SHA1	Message	Date
thomwolf	0f091062d4	Merge branch 'glue-example' into tf2	2019-09-25 10:21:52 +02:00
thomwolf	c4acc3a8e9	let encode accept tensor inputs	2019-09-25 10:19:14 +02:00
thomwolf	a6981076ec	various updates	2019-09-24 16:46:26 +02:00
LysandreJik	d340e2329e	create_mask_from_sequences -> create_token_type_ids_from_sequences	2019-09-24 09:09:28 -04:00
LysandreJik	c832f43a4d	`output_token_type` -> `token_type_ids`	2019-09-24 07:21:38 -04:00
LysandreJik	0ea82b246f	Updated tests	2019-09-24 07:10:09 -04:00
LysandreJik	9d44236f70	Updated DistilBERT	2019-09-24 07:03:24 -04:00
LysandreJik	ab984a8b72	Python 2 compatibility	2019-09-19 15:01:33 +02:00
LysandreJik	3df208c93a	Tokenizer accepts token list as well as string	2019-09-19 14:47:52 +02:00
LysandreJik	66ea76b8a9	prepare_for_model and prepare_pair_for_model methods. Added an option to select which sequence will be truncated.	2019-09-19 13:50:51 +02:00
LysandreJik	baa74326ab	Stride + tests + small fixes	2019-09-19 10:55:06 +02:00
LysandreJik	c10c7d59e7	Mask computing in standalone method. Tests.	2019-09-19 10:55:06 +02:00
LysandreJik	bf503158c5	Sentence -> Sequence. Removed output_mask from the special token addition methods.	2019-09-19 10:55:06 +02:00
LysandreJik	8cba057260	Doc + remove artefacts	2019-09-19 10:55:06 +02:00
LysandreJik	dcc9bb3252	Modified encode to return only lists. Added a more complete encode_plus method	2019-09-19 10:55:06 +02:00
LysandreJik	af23b626c8	Max encoding length + corresponding tests	2019-09-19 10:55:06 +02:00
LysandreJik	d572d7027b	Number of added tokens calculator	2019-09-19 10:55:06 +02:00
LysandreJik	2d8ec5a684	Changed warning to be more explicit Co-authored by: julien_c <chaumond@gmail.com>	2019-09-19 10:55:06 +02:00
LysandreJik	c3df2136e1	Added binary masking tests	2019-09-19 10:55:06 +02:00
LysandreJik	e391d4735e	Tokenizers' encode function can output binary masks	2019-09-19 10:55:06 +02:00
thomwolf	5c6cac102b	adding test for common properties and cleaning up a bit base class	2019-09-05 21:31:29 +02:00
maru0kun	d737947725	Fix typo	2019-09-05 19:24:57 +09:00
LysandreJik	31d3373bc9	Appends space before special token	2019-09-01 21:07:00 -04:00
thomwolf	fede4ef45d	fixing #1133	2019-09-02 02:27:39 +02:00
Thomas Wolf	f3d18c71ec	Merge pull request #1152 from epwalsh/fix-special-tokens: fix adding special tokens	2019-08-30 23:21:59 +02:00
Thomas Wolf	d483cd8e46	Merge pull request #1074 from huggingface/improved_testing: Shortcut to special tokens' ids - fix GPT2 & RoBERTa tokenizers - improved testing for GPT/GPT-2	2019-08-30 23:18:58 +02:00
Thomas Wolf	cd65c41a83	Merge branch 'master' into xlm-tokenization	2019-08-30 17:15:16 +02:00
thomwolf	69da972ace	added test and debug tokenizer configuration serialization	2019-08-30 17:09:36 +02:00
thomwolf	88111de07c	saving and reloading tokenizer configurations	2019-08-30 16:55:48 +02:00
thomwolf	82462c5cba	Added option to setup pretrained tokenizer arguments	2019-08-30 15:30:41 +02:00
Thomas Wolf	50e615f43d	Merge branch 'master' into improved_testing	2019-08-30 13:40:35 +02:00
thomwolf	8faf2e086b	more doc on special tokens	2019-08-30 13:36:22 +02:00
thomwolf	d51f72d5de	adding shortcut to the ids of all the special tokens	2019-08-30 11:41:11 +02:00
epwalsh	07e21307b6	fix adding special tokens	2019-08-29 13:44:50 -07:00
Thomas Wolf	5f297c7be3	Merge pull request #1087 from huggingface/fix-warnings: Decode now calls private property instead of public method	2019-08-28 22:22:11 +02:00
Thomas Wolf	0ecfd17f49	Merge pull request #987 from huggingface/generative-finetuning: Generative finetuning	2019-08-28 16:51:50 +02:00
thomwolf	a175a9dc01	add kwargs to base encode function	2019-08-27 14:05:59 +02:00
LysandreJik	529a16dec6	Generic encoding implementation.	2019-08-26 15:00:43 -04:00
thomwolf	3bcbebd440	max_len_single_sentence & max_len_sentences_pair as attributes so they can be modified	2019-08-23 22:07:26 +02:00
thomwolf	47d6853439	adding max_lengths for single sentences and sentences pairs	2019-08-23 17:31:11 +02:00
Abhishek Rao	c603d099aa	reraise EnvironmentError in from_pretrained functions of Model and Tokenizer	2019-08-22 15:25:40 -07:00
LysandreJik	2ba1a14fb0	Decode now calls private property instead of public method	2019-08-22 17:25:55 -04:00
Thomas Wolf	e4515faf54	Merge pull request #1057 from huggingface/fixes: Add a few of typos corrections, bugs fixes and small improvements	2019-08-21 01:54:05 +02:00
Thomas Wolf	d30cbaf5dc	Merge branch 'master' into iterative_split_on_token	2019-08-21 01:33:02 +02:00
thomwolf	43489756ad	adding proxies options for the from_pretrained methods	2019-08-20 16:59:11 +02:00
thomwolf	fecaed0ed4	add force_download option to from_pretrained methods	2019-08-20 10:56:12 +02:00
LysandreJik	572dcfd1db	Doc	2019-08-14 14:56:14 -04:00
samvelyan	9ce36e3e4b	Re-implemented tokenize() iteratively in PreTrainedTokenizer.	2019-08-14 08:57:09 +00:00
LysandreJik	3d87991f60	Fixed error with encoding	2019-08-13 12:00:24 -04:00
LysandreJik	22ac004a7c	Added documentation and changed parameters for special_tokens_sentences_pair.	2019-08-12 15:13:53 -04:00

70 Commits