Lysandre
1cfd974868
Option to benchmark only one of the two libraries
2019-10-22 13:32:23 -04:00
Lysandre
777faa8ae7
Fix #1597
2019-10-22 11:26:42 -04:00
Thomas Wolf
b8c9ea0010
Merge pull request #1580 from pminervini/master
...
Gradient norm clipping should be done right before calling the optimiser
2019-10-22 13:59:20 +02:00
Pasquale Minervini
abd7110e21
gradient norm clipping should be done right before calling the optimiser - fixing run_glue and run_ner as well
2019-10-21 19:56:52 +01:00
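The commits above pin down an ordering that is easy to get wrong in a training loop: gradients must be clipped after `backward()` has populated them and immediately before `optimizer.step()` consumes them; clipping anywhere else in the loop either sees no gradients or has no effect. A minimal PyTorch sketch with a placeholder model and optimizer (not the actual run_glue/run_ner code):

```python
import torch

# Toy stand-ins for the real model/optimizer in run_glue and run_ner.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
max_grad_norm = 1.0

for batch in [torch.randn(4, 10)]:
    loss = model(batch).sum()
    loss.backward()
    # Clip right before the optimizer consumes the gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    optimizer.zero_grad()
```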
thomwolf
4d456542e9
Fix citation
2019-10-21 16:34:14 +02:00
Thomas Wolf
0e64fec1ab
Merge pull request #1568 from daemon/patch-1
...
Fix hanging when loading pretrained models
2019-10-21 14:31:57 +02:00
Lorenzo Ampil
3a52b65795
Add special tokens to documentation for BERT examples to resolve issue #1561
2019-10-21 12:55:51 +08:00
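For context on the fix above, a hedged sketch of the input format the BERT docs now show: the `add_special_tokens` flag wraps a sequence in the `[CLS]`/`[SEP]` markers BERT was pretrained with.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# add_special_tokens=True adds the [CLS] prefix and [SEP] suffix.
ids = tokenizer.encode("Hello, world!", add_special_tokens=True)
print(tokenizer.convert_ids_to_tokens(ids))
# ['[CLS]', 'hello', ',', 'world', '!', '[SEP]']
```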
erenup
86a630702d
Merge branch 'huggingface/master'
2019-10-21 12:06:09 +08:00
Pasquale Minervini
3775550c4b
gradient norm clipping should be done right before calling the optimiser
2019-10-20 22:33:56 +01:00
Pasquale Minervini
bf2c36a920
Merge pull request #1 from huggingface/master
...
update
2019-10-20 23:30:45 +02:00
Ralph Tang
a2c8c8ef00
Fix hanging when loading pretrained models
...
- Fix hanging when loading pretrained models from the cache without having internet access. This is a widespread issue on supercomputers whose internal compute nodes are firewalled.
2019-10-19 16:19:20 -04:00
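An illustrative sketch of the failure mode this commit addresses (not the library's actual code; the helper and its names are assumptions): without a timeout, the metadata request to the model hub blocks indefinitely on firewalled nodes, even though the weights are already cached locally.

```python
import os
import requests

def fetch_or_use_cache(url, cache_path, etag_timeout=10):
    """Return a local file path, falling back to the cache when offline."""
    try:
        # Bound the freshness check so an unreachable host fails fast
        # instead of hanging the entire model load.
        requests.head(url, timeout=etag_timeout)
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        if os.path.exists(cache_path):
            return cache_path  # offline: serve the previously cached file
        raise
    # ... online path: compare ETags and re-download if stale (omitted) ...
    return cache_path
```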
LysandreJik
82f6abd98a
Benchmark section added to the documentation
2019-10-18 17:27:10 -04:00
LysandreJik
7dd29ed2f1
Benchmarks example script
2019-10-18 10:53:04 -04:00
Lysandre Debut
8efc0ec91a
Add Benchmarks to issue templates
2019-10-18 10:45:44 -04:00
William Tambellini
0919389d9a
Add speed log to examples/run_squad.py
...
Add a speed estimate log (time per example)
for evaluation to examples/run_squad.py
2019-10-17 14:41:04 -07:00
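The added log amounts to dividing evaluation wall-clock time by the number of examples processed. A self-contained sketch of the idea (names are illustrative, not the exact run_squad.py code):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

eval_batches = [list(range(8))] * 4  # stand-in for the eval dataloader

start = time.time()
num_examples = 0
for batch in eval_batches:
    num_examples += len(batch)
    # ... model forward pass on the batch would go here ...
elapsed = time.time() - start
logger.info("Evaluation done in %.2fs (%.6f sec per example)",
            elapsed, elapsed / num_examples)
```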
VictorSanh
fd97761c5a
soft launch distilroberta
2019-10-17 15:28:58 -04:00
leo-du
ecd15667f3
fix repetition penalty
2019-10-17 14:47:14 -04:00
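For reference, a hedged sketch of a CTRL-style repetition penalty, the mechanism this commit touches in the generation example. The sign check is the subtle part: naively dividing a negative logit by a penalty > 1 would raise, not lower, that token's probability.

```python
import torch

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    # Push previously generated tokens away from being sampled again.
    for token_id in set(generated_ids):
        score = logits[token_id]
        # Divide positive scores, multiply negative ones, so the
        # penalty always reduces the token's probability.
        logits[token_id] = score / penalty if score > 0 else score * penalty
    return logits

logits = torch.tensor([2.0, -1.0, 0.5])
print(apply_repetition_penalty(logits, [0, 1]))
# tensor([ 1.6667, -1.2000,  0.5000])
```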
thomwolf
56e2ee4ead
fix model2model
2019-10-17 16:33:31 +02:00
thomwolf
8cd56e3036
fix data processing in script
2019-10-17 16:33:26 +02:00
Rémi Louf
578d23e061
add training pipeline (formatting temporary)
2019-10-17 14:02:27 +02:00
Rémi Louf
47a06d88a0
use two different tokenizers for story and summary
2019-10-17 13:04:26 +02:00
Rémi Louf
bfb9b540d4
add Model2Model to __init__
2019-10-17 12:59:51 +02:00
Rémi Louf
c1bc709c35
correct the truncation and padding of the dataset
2019-10-17 10:41:53 +02:00
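The intended behavior, sketched with an assumed helper (not the summarization script's actual API): truncate each sequence to `max_length`, then pad shorter ones so every example ends up the same length.

```python
def truncate_and_pad(token_ids, max_length, pad_token_id=0):
    token_ids = token_ids[:max_length]                       # truncate
    padding = [pad_token_id] * (max_length - len(token_ids))
    return token_ids + padding                               # pad to length

assert truncate_and_pad([1, 2, 3, 4], max_length=3) == [1, 2, 3]
assert truncate_and_pad([1, 2], max_length=4) == [1, 2, 0, 0]
```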
Rémi Louf
87d60b6e19
reword explanation of encoder_attention_mask
2019-10-17 10:18:19 +02:00
Rémi Louf
638fe7f5a4
correct composition of padding and causal masks
2019-10-17 10:13:07 +02:00
Rémi Louf
4e0f24348f
document the MLM modification + raise exception on MLM training with encoder-decoder
2019-10-17 09:41:53 +02:00
Rémi Louf
624a5644cc
revert black formatting to conform with lib style
2019-10-17 09:27:56 +02:00
Rémi Louf
9b71fc9a18
tying weights is going to be a clusterfuck
2019-10-16 21:31:38 +02:00
Rémi Louf
95ec1d08be
separate inputs into encoder & decoder inputs
2019-10-16 20:55:42 +02:00
Rémi Louf
e4e0ee14bd
add separator between data import and train
2019-10-16 20:05:32 +02:00
Rémi Louf
a424892fab
correct syntax error: dim() and not dims()
2019-10-16 18:24:32 +02:00
Rémi Louf
33c01368b1
remove Bert2Rnd test
2019-10-16 18:13:05 +02:00
Lysandre Debut
c544194611
Remove special_tokens_mask from inputs in README
...
Co-authored-by: Thomas Wolf @thomwolf
2019-10-16 11:05:13 -04:00
Rémi Louf
0752069617
adapt attention masks for the decoder case
...
The introduction of a decoder introduces 2 changes:
- We need to be able to specify a separate mask in the cross
attention to mask the positions corresponding to padding tokens in the
encoder state.
- The self-attention in the decoder needs to be causal on top of not
attending to padding tokens.
2019-10-16 16:12:22 +02:00
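A sketch of the composition this commit describes, with assumed shapes rather than the library's exact implementation: a lower-triangular causal mask keeps decoder position i from attending past itself, and multiplying in the padding mask additionally removes pad positions.

```python
import torch

seq_len = 4
attention_mask = torch.tensor([1.0, 1.0, 1.0, 0.0])  # last position is padding

causal = torch.tril(torch.ones(seq_len, seq_len))           # (target, source)
padding = attention_mask.unsqueeze(0).expand(seq_len, -1)   # broadcast over queries
combined = causal * padding                                 # both constraints at once
print(combined)
# tensor([[1., 0., 0., 0.],
#         [1., 1., 0., 0.],
#         [1., 1., 1., 0.],
#         [1., 1., 1., 0.]])
```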
Rémi Louf
c5a94a6100
fix function that defines masks in XLM
...
the definition of `get_masks` would blow up given a particular combination
of arguments. It was just a matter of moving a definition outside of a
control structure.
2019-10-16 13:00:32 +02:00
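The commit body describes a classic Python pitfall, sketched generically below (not the actual XLM code): a name bound in only one branch of a control structure is later used unconditionally, so certain argument combinations raise UnboundLocalError. Hoisting the definition out of the branch fixes it.

```python
def buggy(causal):
    if causal:
        mask = "causal mask"
    return mask  # UnboundLocalError when causal is False

def fixed(causal):
    mask = "padding mask"   # defined on every code path
    if causal:
        mask = "causal mask"
    return mask
```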
Rémi Louf
488a664151
add is_decoder attribute to PretrainedConfig
...
We currently instantiate encoders and decoders for the seq2seq by
passing the `is_decoder` keyword argument to the `from_pretrained`
classmethod. On the other hand, the model class looks for the value
of the `is_decoder` attribute in its config.
In order for the value to propagate from the kwarg to the configuration
we simply need to define `is_decoder` as an attribute of the base
`PretrainedConfig`, with a default of `False`.
2019-10-15 21:03:32 +02:00
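A minimal sketch of the propagation described above (simplified; not the real `PretrainedConfig`): because the base config declares the attribute with a default, an `is_decoder` kwarg passed to `from_pretrained` lands on the config object the model later reads.

```python
class PretrainedConfigSketch:
    def __init__(self, **kwargs):
        # Declared on the base class with a default, so the kwarg
        # propagates whenever a caller passes is_decoder=True.
        self.is_decoder = kwargs.pop("is_decoder", False)

encoder_cfg = PretrainedConfigSketch()
decoder_cfg = PretrainedConfigSketch(is_decoder=True)
assert encoder_cfg.is_decoder is False
assert decoder_cfg.is_decoder is True
```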
Rémi Louf
4c81960b9b
comment the seq2seq functions
2019-10-15 20:52:28 +02:00
Rémi Louf
6d6c326737
take path to pretrained for encoder and decoder for init
2019-10-15 16:08:27 +02:00
Rémi Louf
0d81fc853e
specify in readme that both datasets are required
2019-10-15 15:26:33 +02:00
Rémi Louf
19e9964780
remove Bert2Bert from module declaration
2019-10-15 15:20:28 +02:00
Rémi Louf
1aec940587
test the full story processing
2019-10-15 15:18:07 +02:00
Rémi Louf
22e1af6859
truncation function is fully tested
2019-10-15 14:43:50 +02:00
Rémi Louf
260ac7d9a8
wip commit, switching computers
2019-10-15 12:24:35 +02:00
thomwolf
be916cb3fb
Merge branch 'master' of https://github.com/huggingface/transformers
2019-10-15 10:37:13 +02:00
thomwolf
5875aaf762
install tensorboard
2019-10-15 10:36:46 +02:00
Thomas Wolf
40f14ff545
Merge pull request #1513 from slayton58/amp_fp16_einsum
...
Force einsum to run in fp16
2019-10-15 10:25:00 +02:00
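If memory serves, apex exposes `register_half_function` for exactly this: casting a chosen op's inputs to fp16 under AMP so it runs on tensor-core kernels instead of falling back to fp32. A hedged sketch assuming NVIDIA apex is installed (the PR's exact diff may differ):

```python
import torch
from apex import amp  # assumes NVIDIA apex is installed

# Must run before amp.initialize so the patched op is picked up.
amp.register_half_function(torch, "einsum")

model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
```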
Thomas Wolf
e703e4dfe1
Merge pull request #1509 from julian-pani/patch-3
...
remove leftover usage of DUMMY_INPUTS
2019-10-15 10:24:13 +02:00
thomwolf
898ce064f8
add tests for the TF2.0 & PT checkpoint => model conversion functions
2019-10-15 10:04:19 +02:00
Thomas Wolf
d147671c6c
Merge pull request #1508 from tlkh/master
...
Added performance enhancements (XLA, AMP) to examples
2019-10-15 09:57:18 +02:00
thomwolf
2c1d5564ad
add readme information
2019-10-15 09:56:52 +02:00