* Add cross attentions to TFGPT2Model
* Add TFEncoderDecoderModel
* Add TFBaseModelOutputWithPoolingAndCrossAttentions
* Add cross attentions to TFBertModel
* Fix past or past_key_values argument issue
* Fix generation
* Fix save and load
* Add some checks and comments
* Clean the code that deals with past keys/values
* Add kwargs to processing_inputs
* Add serving_output to TFEncoderDecoderModel
* Some cleaning + fix use_cache value issue
* Fix tests + add bert2bert/bert2gpt2 tests
* Fix more tests
* Ignore crossattention.bias when loading GPT2 weights into TFGPT2
* Fix return_dict_in_generate in tf generation
* Fix is_token_logit_eos_token bug in tf generation
* Finalize the tests after fixing some bugs
* Fix another is_token_logit_eos_token bug in tf generation
* Add/Update docs
* Add TFBertEncoderDecoderModelTest
* Clean test script
* Add TFEncoderDecoderModel to the library
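A usage sketch of the new class, mirroring the PyTorch `EncoderDecoderModel` API (the checkpoint names below are just illustrative):

```python
from transformers import TFEncoderDecoderModel

# Initialize a TF bert2gpt2 model from two pretrained checkpoints; the
# cross-attention weights between encoder and decoder are newly initialized.
model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "gpt2"
)
```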
* Add cross attentions to TFRobertaModel
* Add TFRobertaEncoderDecoderModelTest
* make style
* Change the way position_ids are computed
* bug fix
* Fix copies in tf_albert
* Remove some "Copied from" comments and apply some fix-copies
* Remove some more "Copied from" comments
* Add cross attentions to some other TF models
* Remove encoder_hidden_states from TFLayoutLMModel.call for now
* Make style
* Fix TFRemBertForCausalLM
* Revert the change to longformer + Remove copies
* Revert the change to albert and convbert + Remove copies
* make quality
* make style
* Add TFRembertEncoderDecoderModelTest
* make quality and fix-copies
* test TFRobertaForCausalLM
* Fixes for failed tests
* Fixes for failed tests
* fix more tests
* Fixes for failed tests
* Fix Auto mapping order
* Fix TFRemBertEncoder return value
* fix tf_rembert
* Check copies are OK
* Fix "TFBaseModelOutputWithPastAndCrossAttentions is not defined" error
* Add TFEncoderDecoderModelSaveLoadTests
* fix tf weight loading
* check the change of use_cache
* Revert the change
* Add missing test_for_causal_lm for TFRobertaModelTest
* Try cleaning past
* fix _reorder_cache
* Revert some files to original versions
* Keep as many copies as possible
* Apply suggested changes - Use raise ValueError instead of assert
* Move import to top
* Fix wrong require_torch
* Replace more asserts with raise ValueError
* Add test_pt_tf_model_equivalence (the test won't pass for now)
* add test for loading/saving
* finish
* finish
* Remove test_pt_tf_model_equivalence
* Update tf modeling template
* Remove pooling, added in the prev. commit, from MainLayer
* Update tf modeling test template
* Move inputs["use_cache"] = False to modeling_tf_utils.py
* Fix torch.Tensor in the comment
* fix use_cache
* Fix missing use_cache in ElectraConfig
* Add a note to from_pretrained
* Fix style
* Change test_encoder_decoder_save_load_from_encoder_decoder_from_pt
* Fix TFMLP (in TFGPT2) activation issue
* Fix None past_key_values value in serving_output
* Don't call get_encoderdecoder_model in TFEncoderDecoderModelTest.test_configuration_tie until we have a TF checkpoint on Hub
* Apply review suggestions - style for cross_attns in serving_output
* Apply review suggestions - change assert + docstrings
* break the error message to respect the char limit
* deprecate the argument past
* fix docstring style
* Update the encoder-decoder rst file
* fix `Unknown interpreted text role "method"` error
* fix typo
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
We use `torch.unique` here only to check whether all elements have
the same value.
Therefore, we can use `torch.unique_consecutive` here.
This function eliminates all but the first element from every consecutive
group of equivalent elements.
For example, applying this function to `[1, 2, 2, 1]` results in
`[1, 2, 1]`.
As you can see, this is enough for checking whether all elements
have the same value.
Since `torch.unique_consecutive` does less work, it is much faster.
On my computer, it is 25x faster on GPU and 15x faster on CPU.
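A minimal sketch of the check in question (values illustrative):

```python
import torch

t = torch.tensor([1, 2, 2, 1])

print(torch.unique(t))              # tensor([1, 2]) -- sorted unique values
print(torch.unique_consecutive(t))  # tensor([1, 2, 1]) -- collapses consecutive runs only

def all_equal(x: torch.Tensor) -> bool:
    # Exactly one surviving element means every element had the same value.
    return torch.unique_consecutive(x).numel() == 1
```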
* Make gradient_checkpointing a training argument
* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix tests
* Style
* document Gradient Checkpointing as a performance feature
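A usage sketch of the new training-argument style (assuming the post-PR API): the flag moves off the config and onto the model.

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
# Trade compute for memory during training; no longer stored in the config.
model.gradient_checkpointing_enable()
# ... train ...
model.gradient_checkpointing_disable()
```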
* Small rename
* PoC for not using the config
* Adapt BC to new PoC
* Forgot to save
* Rollout changes to all other models
* Fix typo
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
* refactor GPT Config to allow dyn. properties
* make attribute_map a class attribute
* remove old code
* update unit test to test config: Add test for common properties setter
* update unit test to test config: Add test for common properties passed as parameters to __init__
* update to black code format
* Allow that setters are not defined for certain config classes
* update config classes to implement attribute_map
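A minimal sketch of the attribute_map mechanism (class and attribute names here are illustrative, not the exact library code):

```python
class SimpleConfig:
    # Maps user-facing names to the names actually stored on the instance.
    attribute_map = {"num_hidden_layers": "n_layer", "hidden_size": "n_embd"}

    def __init__(self, n_layer=12, n_embd=768):
        self.n_layer = n_layer
        self.n_embd = n_embd

    def __setattr__(self, key, value):
        if key in super().__getattribute__("attribute_map"):
            key = super().__getattribute__("attribute_map")[key]
        super().__setattr__(key, value)

    def __getattribute__(self, key):
        if key != "attribute_map" and key in super().__getattribute__("attribute_map"):
            key = super().__getattribute__("attribute_map")[key]
        return super().__getattribute__(key)

config = SimpleConfig()
config.num_hidden_layers = 24  # transparently writes n_layer
assert config.n_layer == 24 and config.hidden_size == 768
```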
* bugfix lxmert config - id2label was not defined when num_labels was set
* update broken configs - add attribute_maps
* update bart config
* update black codestyle
* update documentation on common config attributes
* update GPTJ config to new attribute map
* update docs on common attributes
* gptj config: add max_position_embeddings
* gptj config: format with black
* update speech to text 2 config
* format doc file to max_len 119
* update config template
* Add option to add flax
* Add flax template for __init__.py
* Add flax template for .rst
* Copy TF modeling template
* Add a missing line in modeling_tf_... template
* Update first half of modeling_flax_..
* Update encoder flax template
* Copy test_modeling_tf... as test_modeling_flax...
* Replace some TF to Flax in test_modeling_flax_...
* Replace tf to np
some functions might not work, e.g. `_assert_tensors_equal`
* Replace remaining tf to np (might not work)
* Fix cookiecutter
* Add Flax in to_replace_... template
* Update transformers-cli add-new-model
* Save generate_flax in configuration.json
This will be read by transformers-cli
* Fix to_replace_... and cli
* Fix replace cli
* Fix cookiecutter name
* Move docstring earlier to avoid not defined error
* Fix a missing Module
* Add encoder-decoder flax template from bart
* Fix flax test
* Make style
* Fix endif
* Fix replace all "utf-8 -> unp-8"
* Update comment
* Fix flax template (add missing ..._DOCSTRING)
* Use flax_bart imports in template (was t5)
* Fix unp
* Update templates/adding_a_new_model/tests
* Revert "Fix unp"
This reverts commit dc9002a41d.
* Remove one line of copied from to suppress CI error
* Use generate_tensorflow_pytorch_and_flax
* Add a missing part
* fix typo
* fix flax config
* add examples for flax
* small rename
* correct modeling imports
* correct auto loading
* corrects some flax tests
* correct small typo
* correct as type
* finish modif
* correct more templates
* final fixes
* add file testers
* up
* make sure tests match template regex
* correct pytorch
* correct tf
* correct more tf
* correct imports
* minor error
* minor error
* correct init
* more fixes
* correct more flax tests
* correct flax test
* more fixes
* correct docs
* update
* fix
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Initial work
* All auto models
* All tf auto models
* All flax auto models
* Tokenizers
* Add feature extractors
* Fix typos
* Fix other typo
* Use the right config
* Remove old mapping names and update logic in AutoTokenizer
* Update check_table
* Fix copies and check_repo script
* Fix last test
* Add back name
* clean up
* Update template
* Update template
* Forgot a )
* Use alternative to fixup
* Fix TF model template
* Address review comments
* Address review comments
* Style
* registering a buffer for token_type_ids, to avoid the device id getting hardcoded when tracing
* style format
* adding the persistent flag to the registered buffers to prevent them from being added to the state_dict, which addresses the backward compatibility issue
* adding a try/except to the fix, as the persistent flag is only available from PT >1.6
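A sketch of the approach, modeled on BERT-style embeddings (names and shapes are illustrative, not the exact library code):

```python
import torch
from packaging import version

class Embeddings(torch.nn.Module):
    def __init__(self, max_position_embeddings=512):
        super().__init__()
        self.register_buffer("position_ids", torch.arange(max_position_embeddings).expand((1, -1)))
        # persistent=False keeps the buffer out of the state_dict (backward
        # compatibility), but the flag only exists from PyTorch 1.6 onwards.
        if version.parse(torch.__version__) > version.parse("1.6.0"):
            self.register_buffer(
                "token_type_ids", torch.zeros(self.position_ids.size(), dtype=torch.long), persistent=False
            )

    def forward(self, input_ids, token_type_ids=None):
        batch_size, seq_length = input_ids.size()
        if token_type_ids is None:
            if hasattr(self, "token_type_ids"):
                # Trim the registered buffer instead of building a new tensor
                # with a hardcoded device, which would break traced models.
                token_type_ids = self.token_type_ids[:, :seq_length].expand(batch_size, seq_length)
            else:
                token_type_ids = torch.zeros_like(input_ids)
        return token_type_ids  # the real module would embed and sum here
```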
* adding version check
* added the condition to only use the token_type_ids buffer when it is auto-generated, not passed by the user
* adding comments and making the condition where token_type_ids is None use the registered buffer
* taking out position embeddings from the if block
* adding comments
* handling the case if buffer for position_ids was not registered
* reverted the changes on position_ids, fixed the issue with the size of the token_type_ids buffer, and moved the modification for generated token_type_ids to BertModel instead of Embeddings
* reverted the token_type_ids handling in the None case to the previous version
* reverted changes on position_ids, adding back the if block
* changes added by running make fix-copies
* changes added by running make fix-copies and added the import version as it was getting used
* changes added by running make fix-copies
* changes added by running make fix-copies
* fixing the import format
* fixing the import format
* modified to use a temp tensor for the trimmed and expanded token_type_ids buffer
* changes made by fix-copies after temp tensor modifications
* changes made by fix-copies after temp tensor modifications
* changes made by fix-copies after temp tensor modifications
* clean up
* clean up
* clean up
* clean up
* Nit
* Nit
* Nit
* modified accordingly to support device conversion on traced models
* modified accordingly to support device conversion on traced models
* modified accordingly to support device conversion on traced models
* modified accordingly to support device conversion on traced models
* changes based on latest in master
* Adapt templates
* Add version import
Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Fixing a bug that appears when using distillation (and potentially other uses).
During the backward pass PyTorch complains with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This happens because the QA model code modifies the start_positions and end_positions input tensors using the clamp_ function: as a consequence, the teacher and the student both modify the inputs, and the backward pass fails.
* Fixing the QA clamp_ bug in all models.
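A minimal sketch of the fix (values illustrative): the out-of-place clamp returns a new tensor, leaving the caller's start_positions/end_positions untouched, so the teacher and the student can safely share the same inputs.

```python
import torch

ignored_index = 384  # e.g. the sequence length
start_positions = torch.tensor([5, 999])  # 999 is out of range

# Before: clamp_ modified the shared input tensor in place
# start_positions.clamp_(0, ignored_index)

# After: clamp returns a new, clamped tensor
start_positions = start_positions.clamp(0, ignored_index)
```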
* Add cross_attn_head_mask to BART
* Fix cross_attentions in TFBart-like models
* This commit enables returning `cross_attentions`
for TFBart-like models
* It also fixes attention head masking in the cross-attention module
* Update TF model templates
* Fix missing , in TF model templates
* Fix typo: congig -> config
* Fix cross-attention head mask for Torch BART models
* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus
* Enable test_headmasking for M2M_100 model
* Fix cross_head_mask for FSMT, LED and T5
* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5
* It also contains some smaller doc changes so that
it is perfectly clear that the shape of `cross_head_mask`
is the same as that of `decoder_head_mask`
* Update template
* Fix template for BartForCausalLM
* Fix cross_head_mask for Speech2Text models
* Fix cross_head_mask in templates
* Fix args order in BartForCausalLM template
* Fix doc in BART templates
* Make naming more explicit
* `cross_head_mask` -> `cross_attn_head_mask`
* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`
* Fix doc
* make style quality
* Fix speech2text docstring
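A usage sketch under the final naming above (assuming the post-PR API; checkpoint name illustrative): both masks share the shape (decoder_layers, decoder_attention_heads).

```python
import torch
from transformers import BartModel

model = BartModel.from_pretrained("facebook/bart-base")
cfg = model.config

# 1.0 keeps a head, 0.0 masks it out.
decoder_head_mask = torch.ones(cfg.decoder_layers, cfg.decoder_attention_heads)
cross_attn_head_mask = torch.ones(cfg.decoder_layers, cfg.decoder_attention_heads)
cross_attn_head_mask[0, 0] = 0.0  # mask head 0 of the first decoder layer's cross-attention

outputs = model(
    input_ids=torch.tensor([[0, 31414, 232, 2]]),
    decoder_head_mask=decoder_head_mask,
    cross_attn_head_mask=cross_attn_head_mask,
    output_attentions=True,
)
```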
* The assumption that padding_idx < 2 might not hold
* Use offset instead of 2
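A sketch of the resulting design, modeled on BART's learned positional embedding: a named offset replaces the hardcoded assumption about padding_idx.

```python
import torch
from torch import nn

class LearnedPositionalEmbedding(nn.Embedding):
    def __init__(self, num_embeddings: int, embedding_dim: int):
        # The first `offset` rows are reserved; position ids are shifted by it.
        self.offset = 2
        super().__init__(num_embeddings + self.offset, embedding_dim)

    def forward(self, input_ids_shape: torch.Size, past_key_values_length: int = 0):
        bsz, seq_len = input_ids_shape[:2]
        positions = torch.arange(
            past_key_values_length, past_key_values_length + seq_len, device=self.weight.device
        )
        return super().forward(positions + self.offset)
```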
* Fix with black
* Change behavior to warning instead for backward compatibility.
* Fix with black
* Remove warning
* Make padding_idx non-required
* padding_idx fix for blenderbot
* padding_idx fix for blenderbot_small
* padding_idx fix for led
* padding_idx fix for mbart
* Remove extra whitespaces
* padding_idx fix for template
* Fix padding_idx passed to nn.Embedding mistake
* Fixed padding_idx passed to positional embedding in template
* Remove padding_idx from pytorch learned positional embeddings
* Remove accidentally added quotes
* Remove padding_idx from tf learned positional embeddings
* Remove zeroing of weights in __init__
Co-authored-by: Wang Ming Rui <mingrui.wang@C02CJTUYMD6M.local>
* change tokenizer requirement
* split line
* Correct typo from list to str
* improve style
* make other function pretty as well
* add comment
* correct typo
* add new test
* pass tests for tok without padding token
* Apply suggestions from code review
* Fix computation of attention_probs when head_mask is provided.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Apply changes to the template
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Update past_key_values in gpt2 (#9391)
* Update generation_utils, and rename some items
* Update modeling_gpt2 to avoid an error in gradient_checkpointing
* Remove 'reorder_cache' from util and add variations to XLNet, TransfoXL, GPT-2
* Change the location of '_reorder_cache' in modeling files
* Add '_reorder_cache' in modeling_ctrl
* Fix a bug from my last commit in CTRL
* Add '_reorder_cache' to GPT2DoubleHeadsModel
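A sketch of the model-specific cache reordering (class name and toy shapes are illustrative), following the GPT-2 layout where each layer's past is a tuple of tensors with the beam dimension first:

```python
import torch

class GPT2StyleCacheMixin:
    @staticmethod
    def _reorder_cache(past, beam_idx):
        # Rows of every cached key/value tensor are re-shuffled so that the
        # cache matches the beams selected at the current generation step.
        return tuple(
            tuple(past_state.index_select(0, beam_idx) for past_state in layer_past)
            for layer_past in past
        )

# Toy usage: 2 layers, each with (key, value) of shape (num_beams, heads, seq, dim)
past = tuple((torch.randn(3, 2, 4, 8), torch.randn(3, 2, 4, 8)) for _ in range(2))
beam_idx = torch.tensor([2, 0, 0])  # beams chosen at this step
reordered = GPT2StyleCacheMixin._reorder_cache(past, beam_idx)
```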
* Manage 'use_cache' in config of test_modeling_gpt2
* Clean up the doc string
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix the doc string (GPT-2, CTRL)
* improve gradient_checkpointing behavior
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Main init work
* Add version
* Change from absolute to relative imports
* Fix imports
* One more typo
* More typos
* Styling
* Make quality script pass
* Add necessary replace in template
* Fix typos
* Spaces are ignored in replace for some reason
* Forgot one model.
* Fixes for import
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* Add documentation
* Styling
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* Resize the biases at the same time as the embeddings
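A usage sketch (assuming the post-PR API): resizing the token embeddings now keeps the LM head bias in sync, so no manual bias surgery is needed after adding tokens.

```python
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

tokenizer.add_tokens(["<new_token>"])
# Embeddings, decoder and bias are all resized together.
model.resize_token_embeddings(len(tokenizer))
```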
* Trigger CI
* Biases are not reset anymore
* Remove get_output_embeddings + better LM model detection in generation utils
* Apply style
* First test on BERT
* Update docstring + new name
* Apply the new resizing logic to all the models
* fix tests
* Apply style
* Update the template
* Fix naming
* Fix naming
* Apply style
* Apply style
* Remove unused import
* Revert get_output_embeddings
* Trigger CI
* Update num parameters
* Restore get_output_embeddings in TFPretrainedModel and add comments
* Style
* Add decoder resizing
* Style
* Fix tests
* Separate bias and decoder resize
* Fix tests
* Fix tests
* Apply style
* Add bias resizing in MPNet
* Trigger CI
* Apply style
* Put models in subfolders
* Styling
* Fix imports in tests
* More fixes in test imports
* Sneaky hidden imports
* Fix imports in doc files
* More sneaky imports
* Finish fixing tests
* Fix examples
* Fix path for copies
* More fixes for examples
* Fix dummy files
* More fixes for example
* More model import fixes
* Is this why you're unhappy GitHub?
* Fix imports in convert command
* Use the CI to identify failing tests
* Remove from all examples and tests
* More default switch
* Fixes
* More test fixes
* More fixes
* Last fixes hopefully
* Run on the real suite
* Fix slow tests