* First draft
* Update self-attention of RoBERTa as a proposal
* Improve conversion script
* Add TrOCR decoder-only model
* More improvements
* Make forward pass with pretrained weights work
* More improvements
* Some more improvements
* More improvements
* Make conversion work
* Clean up print statements
* Add documentation, processor
* Add test files
* Small improvements
* Some more improvements
* Make fix-copies, improve docs
* Make all vision encoder decoder model tests pass
* Make conversion script support other models
* Update URL for OCR image
* Update conversion script
* Fix style & quality
* Add support for the large-printed model
* Fix some issues
* Add print statement for debugging
* Add print statements for debugging
* Add possible fix for sinusoidal embedding
* Further debugging
* Potential fix v2
* Add more print statements for debugging
* Add more print statements for debugging
* Debug more
* Comment out print statements
* Make conversion of large printed model possible, address review comments
* Make it possible to convert the stage1 checkpoints
* Clean up code, apply suggestions from code review
* Apply suggestions from code review, use Microsoft models in tests
* Rename encoder_hidden_size to cross_attention_hidden_size
* Improve docs
* Add cross attentions to TFGPT2Model
* Add TFEncoderDecoderModel
* Add TFBaseModelOutputWithPoolingAndCrossAttentions
* Add cross attentions to TFBertModel
* Fix past or past_key_values argument issue
* Fix generation
* Fix save and load
* Add some checks and comments
* Clean the code that deals with past keys/values
* Add kwargs to processing_inputs
* Add serving_output to TFEncoderDecoderModel
* Some cleaning + fix use_cache value issue
* Fix tests + add bert2bert/bert2gpt2 tests
* Fix more tests
* Ignore crossattention.bias when loading GPT2 weights into TFGPT2
* Fix return_dict_in_generate in tf generation
* Fix is_token_logit_eos_token bug in tf generation
* Finalize the tests after fixing some bugs
* Fix another is_token_logit_eos_token bug in tf generation
* Add/Update docs
* Add TFBertEncoderDecoderModelTest
* Clean test script
* Add TFEncoderDecoderModel to the library
* Add cross attentions to TFRobertaModel
* Add TFRobertaEncoderDecoderModelTest
* make style
* Change the way of position_ids computation
* bug fix
* Fix copies in tf_albert
* Remove some copied from and apply some fix-copies
* Remove some copied-from comments
* Add cross attentions to some other TF models
* Remove encoder_hidden_states from TFLayoutLMModel.call for now
* Make style
* Fix TFRemBertForCausalLM
* Revert the change to longformer + Remove copies
* Revert the change to albert and convbert + Remove copies
* make quality
* make style
* Add TFRembertEncoderDecoderModelTest
* make quality and fix-copies
* test TFRobertaForCausalLM
* Fixes for failed tests
* Fixes for failed tests
* fix more tests
* Fixes for failed tests
* Fix Auto mapping order
* Fix TFRemBertEncoder return value
* fix tf_rembert
* Check copies are OK
* Fix "TFBaseModelOutputWithPastAndCrossAttentions is not defined" error
* Add TFEncoderDecoderModelSaveLoadTests
* fix tf weight loading
* check the change of use_cache
* Revert the change
* Add missing test_for_causal_lm for TFRobertaModelTest
* Try cleaning past
* fix _reorder_cache
* Revert some files to original versions
* Keep as many copies as possible
* Apply suggested changes - Use raise ValueError instead of assert
* Move import to top
* Fix wrong require_torch
* Replace more assert by raise ValueError
* Add test_pt_tf_model_equivalence (the test won't pass for now)
* add test for loading/saving
* finish
* finish
* Remove test_pt_tf_model_equivalence
* Update tf modeling template
* Remove pooling, added in the prev. commit, from MainLayer
* Update tf modeling test template
* Move inputs["use_cache"] = False to modeling_tf_utils.py
* Fix torch.Tensor in the comment
* fix use_cache
* Fix missing use_cache in ElectraConfig
* Add a note to from_pretrained
* Fix style
* Change test_encoder_decoder_save_load_from_encoder_decoder_from_pt
* Fix TFMLP (in TFGPT2) activation issue
* Fix None past_key_values value in serving_output
* Don't call get_encoderdecoder_model in TFEncoderDecoderModelTest.test_configuration_tie until we have a TF checkpoint on Hub
* Apply review suggestions - style for cross_attns in serving_output
* Apply review suggestions - change assert + docstrings
* break the error message to respect the char limit
* deprecate the argument past
* fix docstring style
* Update the encoder-decoder rst file
* fix Unknown interpreted text role "method"
* fix typo
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* adapt wav2vec2
* add example
* add files
* adapt
* remove bogus file
* Apply suggestions from code review
* adapt files more
* upload changes
* del old files
* up
* up
* up
* up
* up
* correct gradient checkpointing
* add readme
* finish
* finish
* up
* more fixes
* up
* up
* add demo run to readme
* up
* Replace all assert by ValueError in src/transformers/models/electra
* Reformat with black to pass check_code_quality test
* Change some asserts to ValueError in modeling_bert & modeling_tf_albert
* Change some asserts in multiple models
* Change assertions in multiple models to ValueError in order to pass the check_code_style test and models template test
* Black reformat
* Change some more asserts in multiple models
* Change assert to ValueError in modeling_layoutlm.py to fix copy error in code_style_check
* Add proper message to ValueError in modeling_tf_albert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Simplify logic in models/bert/modeling_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add ValueError message to models/convbert/modeling_tf_convbert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add error message for ValueError to modeling_tf_electra.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Simplify logic in models/tapas/modeling_tapas.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Simplify logic in models/electra/modeling_electra.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add ValueError message in src/transformers/models/bert/modeling_tf_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Simplify logic in src/transformers/models/rembert/modeling_rembert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Simplify logic in src/transformers/models/albert/modeling_albert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* #12789 Replace assert statements with exceptions
* fix-copies: made copy changes to utils_qa.py in examples/pytorch/question-answering and examples/tensorflow/question-answering
* minor refactor for clarity