* Tmp.
* Fixing BC for question answering with long context.
* Capping model_max_length to avoid tf overflow.
* Bad workaround bugged roberta.
* Fixing name.
* Fix cross-attention head mask for Torch BART models
* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus
* Enable test_headmasking for M2M_100 model
* Fix cross_head_mask for FSMT, LED and T5
* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5
* It also contains some smaller changes in doc so that
it is be perfectly clear the shape of `cross_head_mask`
is the same as of `decoder_head_mask`
* Update template
* Fix template for BartForCausalLM
* Fix cross_head_mask for Speech2Text models
* Fix cross_head_mask in templates
* Fix args order in BartForCausalLM template
* Fix doc in BART templates
* Make more explicit naming
* `cross_head_mask` -> `cross_attn_head_mask`
* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`
* Fix doc
* make style quality
* Fix speech2text docstring
* Add {decoder_,}head_mask to LED
* Fix create_custom_forward signatue in encoder
* Add head_mask to longformer
* Add head_mask to longformer to fix dependencies
of LED on Longformer.
* Not working yet
* Add mising one input in longofrmer_modeling.py
* make fix-copies
* create model
* add integration
* save current state
* make integration tests pass
* add one more test
* add explanation to tests
* remove from bart
* add padding
* remove unnecessary test
* make all tests pass
* re-add cookie cutter tests
* finish PyTorch
* fix attention test
* Update tests/test_modeling_common.py
* revert change
* remove unused file
* add string to doc
* save intermediate
* make tf integration tests pass
* finish tf
* fix doc
* fix docs again
* add led to doctree
* add to auto tokenizer
* added tips for led
* make style
* apply jplus statements
* correct tf longformer
* apply lysandres suggestions
* apply sylvains suggestions
* Apply suggestions from code review