Commit Graph

1850 Commits

Author SHA1 Message Date
Rémi Louf
df85a0ff0b replace double quotes with single quotes 2019-10-10 11:38:26 +02:00
Rémi Louf
9ca788b2e8 merge the two Bert layers classes 2019-10-10 11:33:28 +02:00
Rémi Louf
edfc8f8225 Remove and do the branching in 2019-10-10 10:17:27 +02:00
Rémi Louf
09cfd12235 remove and do the branching in 2019-10-10 10:15:27 +02:00
Rémi Louf
877ef2c6ca override from_pretrained in Bert2Rnd
In the seq2seq model we need to both load pretrained weights in the
encoder and initialize the decoder randomly. Because the
`from_pretrained` method defined in the base class relies on module
names to assign weights, it would also initialize the decoder with
pretrained weights. To avoid this we override the method to only
initialize the encoder with pretrained weights.
2019-10-10 10:02:18 +02:00
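The override described in the commit body above can be sketched as follows. This is a schematic illustration of the pattern, not the repository's actual code: the class and attribute names (`Bert2Rnd`, `PretrainedEncoder`, `RandomDecoder`, `weights`) are hypothetical stand-ins.

```python
class PretrainedEncoder:
    """Stand-in for a BERT encoder with a `from_pretrained` constructor."""

    def __init__(self, weights="random"):
        self.weights = weights

    @classmethod
    def from_pretrained(cls, name):
        # In the real library this would download and assign pretrained weights.
        return cls(weights=f"pretrained:{name}")


class RandomDecoder:
    """Stand-in decoder that keeps its random initialization."""

    def __init__(self):
        self.weights = "random"


class Bert2Rnd:
    """Encoder-decoder pair: pretrained encoder, randomly initialized decoder."""

    def __init__(self, encoder, decoder):
        self.encoder = encoder
        self.decoder = decoder

    @classmethod
    def from_pretrained(cls, name):
        # The override loads pretrained weights for the encoder ONLY, so the
        # base class's module-name matching never touches the decoder.
        return cls(PretrainedEncoder.from_pretrained(name), RandomDecoder())
```

The key point is that the factory method constructs each submodule explicitly instead of letting a name-based weight-assignment pass run over the whole model.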
Rémi Louf
851ef592c5 add comment on recursive weights loading 2019-10-10 10:02:03 +02:00
Rémi Louf
770b15b58c rename class in __init__ 2019-10-08 17:32:28 +02:00
Rémi Louf
61ed889005 remove old seq2seq file 2019-10-08 16:30:58 +02:00
Rémi Louf
8abfee9ec3 rename Bert2Bert -> Bert2Rnd 2019-10-08 16:30:58 +02:00
Rémi Louf
82628b0fc9 add a placeholder test 2019-10-08 16:30:58 +02:00
Rémi Louf
0700983090 Add BertDecoderModel and Bert2Bert classes
I am not sure what happens when the class is initialized with the
pretrained weights.
2019-10-08 16:30:58 +02:00
Rémi Louf
75feacf172 add general structure for Bert2Bert class 2019-10-08 16:30:58 +02:00
Rémi Louf
15a2fc88a6 add General attention classes
The modifications I introduced in a previous commit broke
Bert's internal API. I reverted those changes and added more general
classes to handle the encoder-decoder attention case.

There may be a more elegant way to preserve backward compatibility (I am
not comfortable with the current state of the code), but I cannot see it
right now.
2019-10-08 16:30:58 +02:00
Rémi Louf
cd6a59d5c1 add a decoder layer for Bert 2019-10-08 16:30:58 +02:00
Rémi Louf
a0dcefa382 generalize BertSelfAttention to take separate query, key, value
There is currently no way to specify the query, key and value separately
in the Attention module. However, the decoder's "encoder-decoder
attention" layers take the decoder's last output as the query and the
encoder's states as key and value. We therefore modify the existing code
so that query, key and value can be passed separately.

This obviously strains the naming conventions; `BertSelfAttention` is no
longer strictly a self-attention module, the way the residual is
forwarded is now awkward, etc. We will need to do some refactoring once
the decoder is fully implemented.
2019-10-07 17:53:58 +02:00
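The generalization described above amounts to scaled dot-product attention in which the three inputs need not be the same tensor: pass the same array three times and you get self-attention, pass decoder output as the query and encoder states as key/value and you get encoder-decoder attention. A minimal NumPy sketch, with illustrative (not repository) names and shapes:

```python
import numpy as np


def attention(query, key, value):
    """Scaled dot-product attention with separate query/key/value inputs.

    Shapes: query (batch, tgt_len, d), key/value (batch, src_len, d).
    For self-attention all three arguments are the same tensor; for the
    decoder's encoder-decoder attention, `query` comes from the decoder's
    last output while `key` and `value` come from the encoder's states.
    """
    d_k = query.shape[-1]
    # Similarity of each query position to each key position.
    scores = query @ key.transpose(0, 2, 1) / np.sqrt(d_k)  # (batch, tgt, src)
    # Numerically stable softmax over the source dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ value                                   # (batch, tgt, d)
```

When all key vectors score identically against a query, the output is simply the mean of the value vectors, which is a convenient sanity check.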
Rémi Louf
31adbb247c add class wireframes for Bert decoder 2019-10-07 16:43:21 +02:00
Rémi Louf
dda1adad6d rename BertLayer to BertEncoderLayer 2019-10-07 16:31:46 +02:00
Rémi Louf
0053c0e052 do some (light) housekeeping
Several packages were imported but never used, and indentation and line
spacing did not follow PEP 8.
2019-10-07 16:29:15 +02:00
Rémi Louf
386e86e222 raise exception when class initialized with __init__ 2019-10-07 13:00:06 +02:00
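The guard mentioned in the commit above is a common pattern: make the bare constructor fail so users go through a factory that assembles the model correctly. A hypothetical sketch (names and signature are illustrative, not the repository's actual API):

```python
class Seq2SeqModel:
    """Wireframe showing the constructor guard, not the real class."""

    def __init__(self, encoder=None, decoder=None):
        if encoder is None or decoder is None:
            raise EnvironmentError(
                "Use `Seq2SeqModel.from_pretrained(...)` to build this model; "
                "calling the constructor directly is not supported."
            )
        self.encoder = encoder
        self.decoder = decoder

    @classmethod
    def from_pretrained(cls, encoder, decoder):
        # The factory supplies the submodules the constructor requires,
        # so only bare `Seq2SeqModel()` calls raise.
        return cls(encoder=encoder, decoder=decoder)
```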
Rémi Louf
4446c02b8a add wireframe for seq2seq model 2019-10-07 12:04:05 +02:00
Christopher Goh
904158ac4d Rephrase forward method to reduce ambiguity 2019-10-06 23:40:52 -04:00
Christopher Goh
0f65d8cbbe Fix some typos in README 2019-10-06 23:40:52 -04:00
LysandreJik
f3e0218fbb Correct device assignment in run_generation 2019-10-05 21:05:16 -04:00
VictorSanh
0820bb0555 unnecessary carriage return 2019-10-04 17:23:15 -04:00
VictorSanh
f5891c3821 run_squad --> run_squad_w_distillation 2019-10-04 17:23:15 -04:00
VictorSanh
764a7923ec add distillation+finetuning option in run_squad 2019-10-04 17:23:15 -04:00
Lysandre Debut
bb464289ce New model addition issue template 2019-10-04 16:41:26 -04:00
LysandreJik
7bddb45a6f Decode documentation 2019-10-04 14:27:38 -04:00
Thomas Wolf
b3cfd97946
Merge pull request #1373 from TimYagan/fix-css
Fixed critical css font-family issues
2019-10-03 19:04:02 -04:00
Lysandre Debut
81a1e12469
Merge pull request #1313 from enzoampil/master
Add option to use a 'stop token'
2019-10-03 22:43:57 +00:00
Lysandre Debut
d3f24dfad7
Merge branch 'master' into master 2019-10-03 22:43:09 +00:00
LysandreJik
ecc4f1bdfa XLM use_lang_embedding flag in run_generation 2019-10-03 17:42:16 -04:00
LysandreJik
c2c2ca0fdb Added XLM to run_generation, with prompt language selection. 2019-10-03 17:18:48 -04:00
Thomas Wolf
1569610f2d
Merge pull request #1296 from danai-antoniou/add-duplicate-tokens-error
Added ValueError for duplicates in list of added tokens
2019-10-03 17:06:17 -04:00
drc10723
e1b2949ae6 DistilBert Documentation Code Example fixes 2019-10-03 15:51:33 -04:00
VictorSanh
e2ae9c0b73 fix links in doc index 2019-10-03 11:42:21 -04:00
Brian Ma
7af0777910 Update run_glue.py
add DistilBert model shortcut into ALL_MODELS
2019-10-03 15:31:11 +00:00
VictorSanh
c1689ac301 fix name 2019-10-03 10:56:39 -04:00
VictorSanh
4a790c40b1 update doc for distil* 2019-10-03 10:54:02 -04:00
VictorSanh
6be46a6e64 update links to new weights 2019-10-03 10:27:11 -04:00
VictorSanh
5f07d8f11a prepare release 2019-10-03 10:27:11 -04:00
VictorSanh
35071007cb incoming release 🔥 update links to arxiv preprint 2019-10-03 10:27:11 -04:00
VictorSanh
f1f23ad171 fix bug in convert_pt_chkpt_to_tf2 2019-10-03 10:27:11 -04:00
VictorSanh
2a91f6071f update README - TODO: update link to paper 2019-10-03 10:27:11 -04:00
VictorSanh
c51e533a5f update train.py 2019-10-03 10:27:11 -04:00
VictorSanh
a76c3f9cb0 update requirements 2019-10-03 10:27:11 -04:00
VictorSanh
bb9c5ead54 update distiller 2019-10-03 10:27:11 -04:00
VictorSanh
a12ab0a8db update binarized_data 2019-10-03 10:27:11 -04:00
VictorSanh
4d6dfbd376 update extract 2019-10-03 10:27:11 -04:00
VictorSanh
23edebc079 update extract_distilbert 2019-10-03 10:27:11 -04:00