Yoni Gozlan
2b46943195
Add GOT-OCR 2.0 to Transformers (#34721)
* init modular got_ocr2
* Get correct got_ocr architecture
* add processing
* run modular with processing
* add working inference
* apply modular
* Refactor and fix style
* Refactor, cleanup, fix style
* fix init order
* Fix docs
* add base modeling tests
* fix style and consistency
* rename doc file
* fix repo consistency
* fix inference with box
* add image processing and support for crop_to_multi_page
* Fix batch inference
* add tests
* fixup
* fix slow test
* fix docstrings
* Add model doc
* update to new init
* fix input autocast pixel_values dtype
* update doc
* move doc to multimodal
* Reformat crop_image_to_patches and add docstrings
* Fix example in forward docstring
* Address Pablo review
* [run slow] got_ocr2
* remove defaults defined twice
* apply modular
* add torch_device to integration tests
* update modular
* follow-up Pavel review
* add device variable in doc
* fix doc multi-page
* Force eager attention for vision encoder to avoid attn implementation conflict
* revert qwen2vl doc changes
* use Qwen2ForCausalLM instead of Qwen2Model
* make fixup
* refactor gotocr2 to llava style
* uniformize function names and reduce checks
* final nits
* fix pixel_values dtype error
* change checkpoint names
* fix modular
2025-01-31 11:28:13 -05:00