Commit Graph

10 Commits

Author SHA1 Message Date
Raushan Turganbay
f8b88866f5
[VLMs] support passing embeds along with pixels (#38467)
* VLMs can work with embeds now

* update more models

* fix tests

* fix copies

* fixup

* fix

* style

* unskip tests

* fix copies

* fix tests

* style

* omni modality models

* qwen models had extra indentation

* fix some other tests

* fix copies

* fix test last time

* unrelated changes revert

* we can't rely only on embeds

* delete file

* de-flake mistral3

* fix qwen models

* fix style

* fix tests

* fix copies

* deflake the test

* modular reverted by fixes, fix again

* flaky test, overwritten

* fix copies

* style
2025-07-01 11:33:20 +00:00
Yao Matrix
2100ee6545
fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8 (#39116)
* fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* zamba2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* internvl

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* tp cases

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-30 11:49:03 +02:00
Rémi Ouazan
25c44d4b68
Internvl fix (#38946)
* Image processor compile fix (#38540)

* Added a compile-friendly version of resize to BaseImgProcessorFast

* Changed qwen2 processor to use its parent class .resize

* Style

* underlined issue only happens on AMD w/ comment and bool check

* Fixed some utils functions

* Fixed the same issue for bridgetower

* Fixed the same issue for llava_next

* Repo consistency for llava onevision

* Update src/transformers/image_processing_utils_fast.py

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>

---------

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>

* Added an Expectation to an internvl test

* Made qwen2_vl use the resize method of its parent class

* Changed to torch.where

---------

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
2025-06-26 13:44:59 +02:00
Rémi Ouazan
9ff246db00
Expectation fixes and added AMD expectations (#38729)
2025-06-13 16:14:58 +02:00
Yih-Dar
04cdf83244
Update some tests for torch 2.7.1 (#38701)
* fix 1

* fix 2

* fix 3

* fix 4

* fp16

* break

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-10 11:46:52 +02:00
Yih-Dar
ebeec13609
Fix InternVL integration test (#38612)
* fix

* fix

* fix OOM

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-07 08:30:47 +02:00
Raushan Turganbay
17742bd9c8
🔴 [VLM] Add base model without head (#37033)
* i guess reverted all CdGen classes

* style

* llava onevision

* fix copies

* fix some tests

* some more tests

* dump

* skip these

* nevermind, i am dumb

* revert fix not needed

* fixup

* fixup

* another fixup

* more fixup to make ci finally happy

* fixup after rebasing

* fix qwen tests

* add internVL + typos here and there

* image token index -> id

* style

* fix init weights

* revert blip-2 not supported

* address comments

* fix copies

* revert blip2 test file as well

* as discussed internally, revert back CdGen models

* fix some tests

* fix more tests for compile

* CI red

* fix copies

* enumerate explicitly allowed models

* address comments

* fix tests

* fixup

* style again

* add tests for new model class

* another fixup ( x _ x )

* [fixup] unused attributes can be removed post-deprecation
2025-05-07 17:47:51 +02:00
Yao Matrix
34f26e2c3e
enable internvl UTs on XPU (#37779)
* enable internvl UTs on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style per comments

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-04-30 10:29:40 +02:00
Raushan Turganbay
1e9087368c
[internvl] fix chat template (#37656)
* fix chat template

* update

* update conversion

* rename `fake_image_token` in tests
2025-04-23 16:56:36 +02:00
Yoni Gozlan
a245011252
Add InternVL (2.5 MPO) (#35968)
* initial commit

* add convert internvl

* add first end-to-end working internvl

* nit prompt and image proc

* add working chat template

* add conversion llama-based models

* add tests

* pass all tests

* fix isort

* fix modular after main merge

* add video processing for internvl

* add support for interlaced images and videos

* Remove processing and config from modular, add more tests

* add llama model tests

* Modify processor for compatibility with refactored got ocr image processor

* add comments in processor

* Add docs and nits

* change video processing to use custom sample_indices_fn

* rebase and fix tests

* add processor tests

* Add changes Raushan review

* Use the new attention interface for the vision model

* nits

* add support for custom video_load_backend

* remove mention to InternVLTokenizer

* refactor vision model to simplify logic

* refactor processor for better readability

* fix copies

* fix require av processor test

* refactor internVL vision

* Update processor and fix processing tests

* fix docstring

* update convert_weights for internvl3

* change image processor to fast by default

* remove do_center_crop=True in convert_weights

* force use_cache to True

* push_to_hub before reloading

* fix internVLVision for larger models

* update convert weight for qk norm

* fix convert_weights

* fix eos_token_id in convert

* update docs and integration tests

* make modifs after review

* fix wrong k_norm and reduce modular

* change image_token_index to image_token_id

* change checkpoint to OpenGVLab org

* last nits

* explicitly del self.num_key_value_groups

* add extra special tokens
2025-04-18 18:57:33 +02:00