Commit Graph

2452 Commits

Author SHA1 Message Date
Morgan Funtowicz
63e36007ee Make sure padding, cls and other non-context tokens cannot appear in the answer. 2019-12-10 16:47:35 +01:00
Morgan Funtowicz
40a39ab650 Reuse recent SQuAD refactored data structure inside QA pipelines. 2019-12-10 15:59:38 +01:00
Morgan Funtowicz
aae74065df Added QuestionAnsweringPipeline unit tests. 2019-12-10 13:37:20 +01:00
Morgan Funtowicz
a7d3794a29 Remove token_type_ids for compatibility with DistilBert 2019-12-10 13:37:20 +01:00
Morgan Funtowicz
fe0f552e00 Use attention_mask everywhere. 2019-12-10 13:37:20 +01:00
Morgan Funtowicz
348e19aa21 Expose attention_masks and input_lengths arguments to batch_encode_plus 2019-12-10 13:37:18 +01:00
Morgan Funtowicz
c2407fdd88 Enable the Tensorflow backend. 2019-12-10 13:37:14 +01:00
Morgan Funtowicz
f116cf599c Allow hiding frameworks through environment variables (NO_TF, NO_TORCH). 2019-12-10 13:37:07 +01:00
Morgan Funtowicz
6e61e06051 batch_encode_plus generates the encoder_attention_mask to avoid attending over padded values. 2019-12-10 13:37:07 +01:00
Morgan Funtowicz
02110485b0 Added batching, topk, chars index and scores. 2019-12-10 13:36:55 +01:00
Morgan Funtowicz
e1d89cb24d Added QuestionAnsweringPipeline with batch support. 2019-12-10 13:36:55 +01:00
Morgan Funtowicz
81babb227e Added download command through the cli.
It allows pre-downloading models and tokenizers.
2019-12-10 12:18:59 +01:00
thomwolf
31a3a73ee3 updating CLI 2019-12-10 12:18:59 +01:00
thomwolf
7c1697562a compatibility with sklearn and keras 2019-12-10 12:12:22 +01:00
thomwolf
b81ab431f2 updating AutoModels and AutoConfiguration - adding pipelines 2019-12-10 12:11:33 +01:00
thomwolf
2d8559731a add pipeline - train 2019-12-10 11:34:16 +01:00
thomwolf
72c36b9ea2 [WIP] - CLI 2019-12-10 11:33:14 +01:00
Thomas Wolf
e57d00ee10
Merge pull request #1984 from huggingface/squad-refactor
[WIP] Squad refactor
2019-12-10 11:07:26 +01:00
Thomas Wolf
ecabbf6d28
Merge pull request #2107 from huggingface/encoder-mask-shape
create encoder attention mask from shape of hidden states
2019-12-10 10:07:56 +01:00
Julien Chaumond
1d18930462 Harmonize no_cuda flag with other scripts 2019-12-09 20:37:55 -05:00
Rémi Louf
f7eba09007 clean for release 2019-12-09 20:37:55 -05:00
Rémi Louf
2a64107e44 improve device usage 2019-12-09 20:37:55 -05:00
Rémi Louf
c0707a85d2 add README 2019-12-09 20:37:55 -05:00
Rémi Louf
ade3cdf5ad integrate ROUGE 2019-12-09 20:37:55 -05:00
Rémi Louf
076602bdc4 prevent BERT weights from being downloaded twice 2019-12-09 20:37:55 -05:00
Rémi Louf
5909f71028 add py-rouge dependency 2019-12-09 20:37:55 -05:00
Rémi Louf
a1994a71ee simplified model and configuration 2019-12-09 20:37:55 -05:00
Rémi Louf
3a9a9f7861 default output dir to documents dir 2019-12-09 20:37:55 -05:00
Rémi Louf
693606a75c update the docs 2019-12-09 20:37:55 -05:00
Rémi Louf
c0443df593 remove beam search 2019-12-09 20:37:55 -05:00
Rémi Louf
2403a66598 give transformers API to BertAbs 2019-12-09 20:37:55 -05:00
Rémi Louf
4d18199902 cast bool tensor to long for pytorch < 1.3 2019-12-09 20:37:55 -05:00
Rémi Louf
9f75565ea8 setup training 2019-12-09 20:37:55 -05:00
Rémi Louf
4735c2af07 tweaks to the BeamSearch API 2019-12-09 20:37:55 -05:00
Rémi Louf
ba089c780b share pretrained embeddings 2019-12-09 20:37:55 -05:00
Rémi Louf
9660ba1cbd Add beam search 2019-12-09 20:37:55 -05:00
Rémi Louf
1c71ecc880 load the pretrained weights for encoder-decoder
We currently save the pretrained_weights of the encoder and decoder in
two separate directories `encoder` and `decoder`. However, for the
`from_pretrained` function to operate with automodels we need to
specify the type of model in the path to the weights.

The path to the encoder/decoder weights is handled by the
`PreTrainedEncoderDecoder` class in the `save_pretrained` function. Since
there is no easy way to infer the type of model that was initialized for
the encoder and decoder, we add a parameter `model_type` to the function.
This is not an ideal solution as it is error prone, and the model type
should be carried by the Model classes somehow.

This is a temporary fix that should be changed before merging.
2019-12-09 20:37:55 -05:00
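For context, a minimal sketch of the save layout this commit describes, assuming the `model_type` string is simply folded into the sub-directory names (the function signature and directory naming below are illustrative assumptions, not the actual implementation):

```python
import os

# Illustrative sketch only: encoder and decoder weights are written to
# separate sub-directories, with the model type embedded in the path so
# that an AutoModel-style loader can later infer which architecture to
# instantiate from the path alone.
def save_pretrained(encoder, decoder, save_directory, model_type="bert"):
    encoder_dir = os.path.join(save_directory, f"{model_type}_encoder")
    decoder_dir = os.path.join(save_directory, f"{model_type}_decoder")
    os.makedirs(encoder_dir, exist_ok=True)
    os.makedirs(decoder_dir, exist_ok=True)
    encoder.save_pretrained(encoder_dir)  # each half behaves as a regular pretrained model
    decoder.save_pretrained(decoder_dir)
```

As the commit message notes, passing `model_type` explicitly is error prone; the model classes themselves would ideally carry this information.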
Rémi Louf
07f4cd73f6 update function to add special tokens
Since I started my PR the `add_special_token_single_sequence` function
has been deprecated in favor of another; I replaced it with the new function.
2019-12-09 20:37:55 -05:00
Pierric Cistac
5c877fe94a
fix albert links 2019-12-09 18:53:00 -05:00
Bilal Khan
79526f82f5 Remove unnecessary epoch variable 2019-12-09 16:24:35 -05:00
Bilal Khan
9626e0458c Add functionality to continue training from last saved global_step 2019-12-09 16:24:35 -05:00
Bilal Khan
2d73591a18 Stop saving current epoch 2019-12-09 16:24:35 -05:00
Bilal Khan
0eb973b0d9 Use saved optimizer and scheduler states if available 2019-12-09 16:24:35 -05:00
Bilal Khan
a03fcf570d Save tokenizer after each epoch to be able to resume training from a checkpoint 2019-12-09 16:24:35 -05:00
Bilal Khan
f71b1bb05a Save optimizer state, scheduler state and current epoch 2019-12-09 16:24:35 -05:00
LysandreJik
2a4ef098d6 Add ALBERT and XLM to SQuAD script 2019-12-09 10:46:47 -05:00
Lysandre Debut
00c4e39581
Merge branch 'master' into squad-refactor 2019-12-09 10:41:15 -05:00
Rémi Louf
3520be7824 create encoder attention mask from shape of hidden states
We currently create encoder attention masks (when they're not provided)
based on the shape of the inputs to the encoder. This is obviously
wrong; sequences can be of different lengths. We now create the encoder
attention mask based on the batch_size and sequence_length of the
encoder hidden states.
2019-12-09 11:19:45 +01:00
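A minimal sketch of the default-mask logic described above, assuming a plain PyTorch helper (the function name is hypothetical, not the library's API): when no encoder attention mask is given, it is built from the shape of the encoder hidden states rather than from the encoder inputs.

```python
import torch

# Illustrative sketch: derive the default (all-ones) encoder attention mask
# from the batch size and sequence length of the encoder hidden states,
# so sequences of different lengths are handled consistently.
def default_encoder_attention_mask(encoder_hidden_states: torch.Tensor) -> torch.Tensor:
    batch_size, sequence_length, _ = encoder_hidden_states.size()
    return torch.ones(batch_size, sequence_length,
                      device=encoder_hidden_states.device)
```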
Aymeric Augustin
0cb163865a Remove pytest dependency. (#2093) 2019-12-07 07:46:14 -05:00
Michael Watkins
2670b0d682 Fix bug which lowercases special tokens 2019-12-06 16:15:53 -05:00