* First draft
* Update self-attention of RoBERTa as proposition
* Improve conversion script
* Add TrOCR decoder-only model
* More improvements
* Make forward pass with pretrained weights work
* More improvements
* Some more improvements
* More improvements
* Make conversion work
* Clean up print statements
* Add documentation, processor
* Add test files
* Small improvements
* Some more improvements
* Make fix-copies, improve docs
* Make all vision encoder decoder model tests pass
* Make conversion script support other models
* Update URL for OCR image
* Update conversion script
* Fix style & quality
* Add support for the large-printed model
* Fix some issues
* Add print statement for debugging
* Add print statements for debugging
* Make possible fix for sinusoidal embedding
* Further debugging
* Potential fix v2
* Add more print statements for debugging
* Add more print statements for debugging
* Debug more
* Comment out print statements
* Make conversion of large printed model possible, address review comments
* Make it possible to convert the stage1 checkpoints
* Clean up code, apply suggestions from code review
* Apply suggestions from code review, use Microsoft models in tests
* Rename encoder_hidden_size to cross_attention_hidden_size
* Improve docs