* Add decoder_head_mask for PyTorch T5 model
* Add decoder_head_mask args into T5Model and T5ForConditionalGeneration
* Slightly change the order of input args to be in accordance
with the convention from BART-based models introduced within the PR #9569.
* Make style for modeling_t5.py
* Add decoder_head_mask for TF T5 models
* Separate head_mask and decoder_head_mask args in TF T5 models
* Slightly change the order of input args to follow convention
of BART-based models updated in PR #9569
* Update test_forward_signature tests/test_modeling_tf_common.py
w.r.t. the changed order of input args
* Add FutureWarnings for T5 and TFT5 models
* Add FutureWarnings for T5 and TFT5 models warning a user that
input argument `head_mask` was split into two arguments -
`head_mask` and `decoder_head_mask`
* Add default behaviour - `decoder_head_mask` is set to copy
`head_mask`
* Fix T5 modeling and FutureWarning
* Make proper usage of head_mask and decoder_head_mask
in cross_attention
* Fix conditions for raising FutureWarning
* Reformat FutureWarning in T5 modeling
* Refactor the warning message
File "/share/apps/anaconda3/envs/my_env/lib/python3.7/site-packages/transformers/integrations.py", line 419, in __init__
self._SummaryWriter = SummaryWriter
UnboundLocalError: local variable 'SummaryWriter' referenced before assignment
* Update past_key_values in gpt2 (#9391)
* Update generation_utils, and rename some items
* Update modeling_gpt2 to avoid an error in gradient_checkpointing
* Remove 'reorder_cache' from util and add variations to XLNet, TransfoXL, GPT-2
* Change the location of '_reorder_cache' in modeling files
* Add '_reorder_cache' in modeling_ctrl
* Fix a bug of my last commit in CTRL
* Add '_reorder_cache' to GPT2DoubleHeadsModel
* Manage 'use_cache' in config of test_modeling_gpt2
* Clean up the doc string
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix the doc string (GPT-2, CTRL)
* improve gradient_checkpointing_behavior
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add head_mask/decoder_head_mask for BART
This branch implement head_mask and decoder_head_mask
for BART-based models. Full list below:
- BART
- MBart
- Blenderbot
- BlenderbotSmall
- Marian
- Pegasus
Everything is accompanied with updated testing.
* Fix test_headmasking for BART models
* Fix text_headmasking for BART-like models
which has only 2 layers in each modules.
The condition
```
self.assertNotEqual(attentions[1][..., 0, :, :].flatten().sum().item(), 0.0)
```
is, therefore, invalid for encoder-decoder models considering
the `head_mask`
```
head_mask = torch.ones(
self.model_tester.num_hidden_layers,
self.model_tester.num_attention_heads,
device=torch_device,
)
head_mask[0, 0] = 0
head_mask[-1, :-1] = 0
```
specified in the `test_headmasking` test/function.
* Adjust test_modeling_common.py to reflect T5 input args
* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
* make fix-copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* note on how to get to deps from shell
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix text
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Switch metrics in run_ner to datasets
* Add flag to return all metrics
* Upstream (and rename) sortish_sampler
* Revert "Upstream (and rename) sortish_sampler"
This reverts commit e07d0dcf65.
* Update run_glue for do_predict with local test data (#9442)
* Update run_glue (#9442): fix comments ('files' to 'a file')
* Update run_glue (#9442): reflect the code review
* Update run_glue (#9442): auto format
* Update run_glue (#9442): reflect the code review