Rémi Louf
df85a0ff0b
replace double quotes with single quotes
2019-10-10 11:38:26 +02:00
Rémi Louf
9ca788b2e8
merge the two Bert layers classes
2019-10-10 11:33:28 +02:00
Rémi Louf
edfc8f8225
Remove and do the branching in
2019-10-10 10:17:27 +02:00
Rémi Louf
09cfd12235
remove and do the branching in
2019-10-10 10:15:27 +02:00
Rémi Louf
877ef2c6ca
override from_pretrained in Bert2Rnd
...
In the seq2seq model we need to load pretrained weights into the
encoder and initialize the decoder randomly. Because the
`from_pretrained` method defined in the base class relies on module
names to assign weights, it would also initialize the decoder with
pretrained weights. To avoid this we override the method to only
initialize the encoder with pretrained weights.
2019-10-10 10:02:18 +02:00
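The loading scheme described in the commit above can be sketched with a toy model. The class and attribute names here are illustrative, not the actual library code; the real method operates on PyTorch state dicts, which this sketch mimics with plain dicts:

```python
import random

class Seq2Seq:
    """Toy stand-in for an encoder-decoder model; weights are plain dicts."""

    def __init__(self):
        # Both submodules start from a random initialization.
        self.encoder = {"weight": random.random()}
        self.decoder = {"weight": random.random()}

    def state_dict(self):
        # Flat, module-name-prefixed keys, mimicking PyTorch's state_dict().
        flat = {f"encoder.{k}": v for k, v in self.encoder.items()}
        flat.update({f"decoder.{k}": v for k, v in self.decoder.items()})
        return flat

    @classmethod
    def from_pretrained(cls, state_dict):
        model = cls()
        # Copy only the keys that belong to the encoder; a loader that
        # matches every module name would also overwrite the decoder,
        # which must keep its random initialization.
        for key, value in state_dict.items():
            if key.startswith("encoder."):
                model.encoder[key[len("encoder."):]] = value
        return model
```

After `Seq2Seq.from_pretrained(sd)`, the encoder matches the pretrained weights while the decoder's fresh random values are untouched.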
Rémi Louf
851ef592c5
add comment on recursive weights loading
2019-10-10 10:02:03 +02:00
Rémi Louf
770b15b58c
rename class in __init__
2019-10-08 17:32:28 +02:00
Rémi Louf
61ed889005
remove old seq2seq file
2019-10-08 16:30:58 +02:00
Rémi Louf
8abfee9ec3
rename Bert2Bert -> Bert2Rnd
2019-10-08 16:30:58 +02:00
Rémi Louf
82628b0fc9
add a placeholder test
2019-10-08 16:30:58 +02:00
Rémi Louf
0700983090
Add BertDecoderModel and Bert2Bert classes
...
I am not sure what happens when the class is initialized with the
pretrained weights.
2019-10-08 16:30:58 +02:00
Rémi Louf
75feacf172
add general structure for Bert2Bert class
2019-10-08 16:30:58 +02:00
Rémi Louf
15a2fc88a6
add General attention classes
...
The modifications that I introduced in a previous commit broke
Bert's internal API. I reverted these changes and added more general
classes to handle the encoder-decoder attention case.
There may be a more elegant way to deal with backward compatibility (I
am not comfortable with the current state of the code), but I cannot
see it right now.
2019-10-08 16:30:58 +02:00
Rémi Louf
cd6a59d5c1
add a decoder layer for Bert
2019-10-08 16:30:58 +02:00
Rémi Louf
a0dcefa382
generalize BertSelfAttention to take separate query, key, value
...
There is currently no way to specify the query, key and value separately
in the Attention module. However, the decoder's "encoder-decoder
attention" layers take the decoder's last output as the query and the
encoder's states as key and value. We thus modify the existing code so
that query, key and value can be passed separately.
This obviously strains some naming conventions; `BertSelfAttention` is
no longer a self-attention module. The way the residual is forwarded is
now awkward, etc. We will need to do some refactoring once the decoder
is fully implemented.
2019-10-07 17:53:58 +02:00
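The generalized signature described above can be illustrated with a minimal, framework-free sketch of scaled dot-product attention. The function name and list-of-vectors representation are illustrative, not the actual `BertSelfAttention` code:

```python
import math

def attention(query, key, value):
    """Scaled dot-product attention over lists of vectors.

    Self-attention calls this with the same sequence for all three
    arguments; encoder-decoder attention passes the decoder's states
    as `query` and the encoder's states as `key` and `value`.
    """
    dim = len(query[0])
    output = []
    for q in query:
        # Similarity of this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in key]
        # Softmax over the scores (shifted by the max for stability).
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value vectors.
        output.append([sum(w * v[i] for w, v in zip(weights, value))
                       for i in range(len(value[0]))])
    return output
```

With `query is key is value` this reduces to ordinary self-attention, which is why a single module can serve both cases once the three inputs are decoupled.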
Rémi Louf
31adbb247c
add class wireframes for Bert decoder
2019-10-07 16:43:21 +02:00
Rémi Louf
dda1adad6d
rename BertLayer to BertEncoderLayer
2019-10-07 16:31:46 +02:00
Rémi Louf
0053c0e052
do some (light) housekeeping
...
Several packages were imported but never used, indentation and line
spaces did not follow PEP8.
2019-10-07 16:29:15 +02:00
Rémi Louf
386e86e222
raise exception when class initialized with __init__
2019-10-07 13:00:06 +02:00
Rémi Louf
4446c02b8a
add wireframe for seq2seq model
2019-10-07 12:04:05 +02:00
Christopher Goh
904158ac4d
Rephrase forward method to reduce ambiguity
2019-10-06 23:40:52 -04:00
Christopher Goh
0f65d8cbbe
Fix some typos in README
2019-10-06 23:40:52 -04:00
LysandreJik
f3e0218fbb
Correct device assignment in run_generation
2019-10-05 21:05:16 -04:00
VictorSanh
0820bb0555
unnecessary carriage return
2019-10-04 17:23:15 -04:00
VictorSanh
f5891c3821
run_squad --> run_squad_w_distillation
2019-10-04 17:23:15 -04:00
VictorSanh
764a7923ec
add distillation+finetuning option in run_squad
2019-10-04 17:23:15 -04:00
Lysandre Debut
bb464289ce
New model addition issue template
2019-10-04 16:41:26 -04:00
LysandreJik
7bddb45a6f
Decode documentation
2019-10-04 14:27:38 -04:00
Thomas Wolf
b3cfd97946
Merge pull request #1373 from TimYagan/fix-css
...
Fixed critical css font-family issues
2019-10-03 19:04:02 -04:00
Lysandre Debut
81a1e12469
Merge pull request #1313 from enzoampil/master
...
Add option to use a 'stop token'
2019-10-03 22:43:57 +00:00
Lysandre Debut
d3f24dfad7
Merge branch 'master' into master
2019-10-03 22:43:09 +00:00
LysandreJik
ecc4f1bdfa
XLM use_lang_embedding flag in run_generation
2019-10-03 17:42:16 -04:00
LysandreJik
c2c2ca0fdb
Added XLM to run_generation, with prompt language selection.
2019-10-03 17:18:48 -04:00
Thomas Wolf
1569610f2d
Merge pull request #1296 from danai-antoniou/add-duplicate-tokens-error
...
Added ValueError for duplicates in list of added tokens
2019-10-03 17:06:17 -04:00
drc10723
e1b2949ae6
DistilBert Documentation Code Example fixes
2019-10-03 15:51:33 -04:00
VictorSanh
e2ae9c0b73
fix links in doc index
2019-10-03 11:42:21 -04:00
Brian Ma
7af0777910
Update run_glue.py
...
add DistilBert model shortcut into ALL_MODELS
2019-10-03 15:31:11 +00:00
VictorSanh
c1689ac301
fix name
2019-10-03 10:56:39 -04:00
VictorSanh
4a790c40b1
update doc for distil*
2019-10-03 10:54:02 -04:00
VictorSanh
6be46a6e64
update links to new weights
2019-10-03 10:27:11 -04:00
VictorSanh
5f07d8f11a
prepare release
2019-10-03 10:27:11 -04:00
VictorSanh
35071007cb
incoming release 🔥 update links to arxiv preprint
2019-10-03 10:27:11 -04:00
VictorSanh
f1f23ad171
fix bug in convert_pt_chkpt_to_tf2
2019-10-03 10:27:11 -04:00
VictorSanh
2a91f6071f
update README - TODO update link to paper
2019-10-03 10:27:11 -04:00
VictorSanh
c51e533a5f
update train.py
2019-10-03 10:27:11 -04:00
VictorSanh
a76c3f9cb0
update requirements
2019-10-03 10:27:11 -04:00
VictorSanh
bb9c5ead54
update distiller
2019-10-03 10:27:11 -04:00
VictorSanh
a12ab0a8db
update binarized_data
2019-10-03 10:27:11 -04:00
VictorSanh
4d6dfbd376
update extract
2019-10-03 10:27:11 -04:00
VictorSanh
23edebc079
update extract_distilbert
2019-10-03 10:27:11 -04:00