* Image processor compile fix (#38540)
* Added a compile-friendly versiom of resize to BaseImgProcessorFast
* Changed qwen2 processor to use its parent class .resize
* Style
* underlined issue only happens on AMD w/ comment and bool check
* Fixed some utils functions
* Fixed the same issue for bridgetower
* Fixed the same issue for llava_next
* Repo consistency for llava onevision
* Update src/transformers/image_processing_utils_fast.py
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
---------
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
* Added an Expectation to an internvl test
* Made qwen2_vl use the resize method of its parent clas
* Changed to torch.where
---------
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
* i guessreverted all CdGen classes
* style
* llava onevision
* fix copies
* fix some tests
* some more tests
* dump
* skip these
* nevermind, i am dumb
* revert fix not needed
* fixup
* fixup
* another fixup
* more fixup to make ci finally happy
* fixup after rebasing
* fix qwen tests
* add internVL + typos here and there
* image token index -> id
* style
* fix init weights
* revert blip-2 not supported
* address comments
* fix copies
* revert blip2 test file as well
* as discussed internally, revert back CdGen models
* fix some tests
* fix more tests for compile
* CI red
* fix copies
* enumerate explicitly allowed models
* address comments
* fix tests
* fixup
* style again
* add tests for new model class
* another fixup ( x _ x )
* [fixup] unused attributes can be removed post-deprecation
* enable internvl UTs on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style per comments
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
* initial commit
* add convert internvl
* add first end-to-end working internvl
* nit prompt and image proc
* add working chat template
* add conversion llama-based models
* add tests
* pass all tests
* fix isort
* fix modular after main merge
* add video processing for internvl
* add support for interlaced images and videos
* Remove processing and config from modular, add more tests
* add llama model tests
* Modify processor for compatibility with refactored got ocr image processor
* add comments in processor
* Add docs and nits
* change video processing to use custom sample_indices_fn
* rebase and fix tests
* add processor tests
* Add changes Raushan review
* Use the new attention interface for the vision model
* nits
* add support for custom video_load_backend
* remove mention to InternVLTokenizer
* refactor vision model to simplify logic
* refactor processor for better readibility
* fix copies
* fix require av processor test
* refactor internVL vision
* Update processor and fix processing tests
* fix docstring
* update convert_weights for internvl3
* change image processor to fast by default
* remove do_center_crop=True in convert_weights
* force use_cache to True
* push_to_hub before reloading
* fix internVLVision for larger models
* update convert weight for qk norm
* fix convert_weights
* fix eos_token_id in convert
* update docs and integration tests
* make modifs after review
* fix wrong k_norm and reduce modular
* change image_token_index to image_token_id
* change checkpoint to OpenGVLab org
* last nits
* explicitely del self.num_key_value_groups
* add extra special tokens