The introduction of a decoder requires two changes:
- We need to be able to specify a separate mask in the cross-attention
to mask the positions corresponding to padding tokens in the encoder
state.
- The self-attention in the decoder needs to be causal on top of not
attending to padding tokens.
The definition of `get_masks` would blow up with certain combinations of
arguments. The fix was simply a matter of moving a definition outside of
a control structure.
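As a rough sketch of the masks involved, here is a minimal, hypothetical `get_masks`-style helper; the signature, shapes, and mask convention are assumptions for illustration, not the library's actual implementation:

```python
import torch

def get_masks(input_ids, pad_token_id, causal=False):
    # Padding mask: 1 for real tokens, 0 for padding, broadcastable over
    # heads and query positions. Shape: (batch, 1, 1, seq_len).
    padding_mask = (input_ids != pad_token_id).long()[:, None, None, :]
    if not causal:
        return padding_mask

    # Causal mask: lower-triangular so position i only attends to positions <= i.
    seq_len = input_ids.size(1)
    causal_mask = torch.tril(
        torch.ones(seq_len, seq_len, dtype=torch.long, device=input_ids.device)
    )
    # Combine the two: a key position is visible only if it is both
    # non-padding and not in the future.
    return causal_mask[None, None, :, :] * padding_mask
```

In this sketch the decoder self-attention would use the causal variant, while the cross-attention would use a plain padding mask built from the encoder's input ids.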
We currently instantiate encoders and decoders for the seq2seq by
passing the `is_decoder` keyword argument to the `from_pretrained`
classmethod. On the other hand, the model class looks for the value
of the `is_decoder` attribute in its config.
In order for the value to propagate from the kwarg to the configuration,
we simply need to define `is_decoder` as an attribute of the base
`PretrainedConfig`, with a default of `False`.
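Roughly, the propagation looks like the following simplified sketch (not the actual library code):

```python
class PretrainedConfig:
    def __init__(self, **kwargs):
        # Declaring the attribute on the base config with a default of False
        # means every derived config has it, and a keyword argument forwarded
        # by `from_pretrained` can override it.
        self.is_decoder = kwargs.pop("is_decoder", False)
```

With that in place, a call along the lines of `BertModel.from_pretrained("bert-base-uncased", is_decoder=True)` ends up with `config.is_decoder` set to `True`, which is the attribute the model class reads.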
The data provided by Li Dong et al. were already tokenized, which means
that they are not compatible with all the models in the library. We
thus process the raw data directly and tokenize them using the models'
tokenizers.
We write a function to load and preprocess the CNN/Daily Mail dataset as
provided by Li Dong et al. The issue is that this dataset has already
been tokenized by the authors, so we actually need to find the original,
plain-text dataset if we want to apply it to all models.
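A sketch of what such a loading function could look like, assuming the raw CNN/Daily Mail `.story` files with their `@highlight` summary markers; the function name, file layout, and tokenizer arguments are illustrative:

```python
from pathlib import Path

def load_cnn_dailymail(data_dir, tokenizer, max_length=512):
    """Load plain-text (article, summary) pairs and tokenize them with the
    model's own tokenizer instead of relying on pre-tokenized data."""
    examples = []
    for story_path in Path(data_dir).glob("*.story"):
        text = story_path.read_text(encoding="utf-8")
        # In the raw dataset, summary sentences follow "@highlight" markers.
        article, *highlights = text.split("@highlight")
        summary = ". ".join(h.strip() for h in highlights)
        examples.append(
            (
                tokenizer.encode(article.strip(), max_length=max_length, truncation=True),
                tokenizer.encode(summary, max_length=max_length, truncation=True),
            )
        )
    return examples
```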
In Rothe et al.'s "Leveraging Pre-trained Checkpoints for Sequence
Generation Tasks", Bert2Bert is initialized with pre-trained weights for
the encoder, and only pre-trained embeddings for the decoder. The
current version of the code completely randomizes the weights of the
decoder.
We write a custom function to initialize the weights of the decoder: we
first load the pre-trained weights into the decoder and then re-randomize
everything but the embeddings.
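A minimal sketch of that initialization, assuming the embedding parameters can be identified by an `embeddings` prefix in their module names; the helper and its reset logic are illustrative rather than the actual implementation:

```python
def init_decoder_weights(decoder, pretrained_state_dict):
    # Start from the pre-trained checkpoint...
    decoder.load_state_dict(pretrained_state_dict, strict=False)
    # ...then re-randomize every submodule outside the embeddings, so only
    # the embedding matrices keep their pre-trained values (as in Rothe et al.).
    for name, module in decoder.named_modules():
        if "embeddings" not in name and hasattr(module, "reset_parameters"):
            module.reset_parameters()
    return decoder
```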
Since the preloading of weights relies on the names of the class's
attributes, changing the namespace breaks loading pretrained weights on
Bert and all related models. I reverted `self_attention` to `attention`
and use `crossattention` for the decoder instead.
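The dependence on attribute names comes from the way PyTorch builds `state_dict` keys; a small self-contained illustration of why the rename breaks checkpoint loading:

```python
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        # Checkpoint keys are derived from attribute names, e.g. "attention.weight".
        self.attention = nn.Linear(4, 4)

checkpoint = Block().state_dict()  # keys: "attention.weight", "attention.bias"

class RenamedBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.self_attention = nn.Linear(4, 4)  # renamed attribute -> renamed keys

missing, unexpected = RenamedBlock().load_state_dict(checkpoint, strict=False)
print(missing)     # ['self_attention.weight', 'self_attention.bias']
print(unexpected)  # ['attention.weight', 'attention.bias']
```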