Raushan Turganbay
a29eabd0eb
Expand inputs in processors for VLMs ( #30962 )
* let it be
* draft
* should not have changed
* add warnings
* fix & add tests
* fix tests
* inputs embeds cannot be passed with pixels
* more updates
* paligemma ready!
* minor typos
* update blip-2
* fix tests & raise error
* docstring
* add blip2 test
* tmp
* add image seq length to config
* update docstring
* delete
* fix tests
* fix blip
* fix paligemma
* out-of-place scatter
* add llava-next-video
* Update src/transformers/models/blip_2/modeling_blip_2.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* remove tmp
* codestyle
* nits
* more nits
* remove overriding in tests
* comprehension when merging video
* fix-copies
* revert changes for embeds test
* fix tests after making comprehension
* Update src/transformers/models/blip_2/processing_blip_2.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Update src/transformers/models/blip_2/processing_blip_2.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* more updates
* fix tests
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2024-08-13 10:14:39 +05:00
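To make the input-expansion change above concrete, here is a minimal sketch, assuming the llava-hf/llava-1.5-7b-hf checkpoint ships the updated processor: after #30962 the processor repeats the <image> placeholder to match the image feature length (the image sequence length mentioned in the commit notes), so input_ids already reserves one position per vision embedding.

```python
# Sketch only: assumes the llava-hf/llava-1.5-7b-hf processor includes the
# input-expansion behavior from #30962; the expected count (576) is illustrative.
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

image = Image.new("RGB", (336, 336))  # synthetic stand-in; a real photo works the same way
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")

# The single <image> placeholder in the prompt is expanded to one token per image
# feature, so the text and vision streams line up before the model merges them.
image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
num_image_tokens = (inputs.input_ids == image_token_id).sum().item()
print(num_image_tokens)  # expected: the processor's image sequence length, e.g. 576
```

The commit notes also flag that inputs embeds cannot be passed together with pixel values; the change enforces this with an explicit error.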
Raushan Turganbay
3aefb4ec7f
LLaVaNeXT: pad on right if training ( #32134 )
* pad on right if training
* docs
* add tests
2024-07-23 10:23:55 +05:00
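The padding-side fix above follows the usual convention for decoder-only VLMs: pad on the right when computing a training loss, pad on the left for batched generation. A minimal sketch of controlling this explicitly, assuming a LLaVA-NeXT processor (the checkpoint name is only an example):

```python
# Sketch only: right-pad for training, left-pad for batched generation,
# per the convention the commit above enforces; checkpoint is illustrative.
from transformers import LlavaNextProcessor

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")

# Training: right padding keeps labels aligned with the prompt tokens.
processor.tokenizer.padding_side = "right"

# Batched generation: left padding so every sequence ends at its last real token.
processor.tokenizer.padding_side = "left"
```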
Raushan Turganbay
e71f2863d7
Add LLaVa NeXT Video ( #31252 )
* squash into single commit
* run diff once more
* docstring
* tests
* minor changes and ready to go
* Update src/transformers/models/llava_next_video/processing_llava_next_video.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/vipllava/test_modeling_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [run-slow] llava-next-video
* [run-slow] llava-next-video
* [run-slow] llava_next_video
* fix two tests
* fix slow tests
* remove logit checks due to numeric errors
* run test once more
* [run-slow] llava_next_video
* final try to pass the test
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* style
* fix
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-06-26 21:52:28 +05:00
Xuan-Phi Nguyen
5ca085b882
Better llava next. ( #29850 )
* Better llava next.
- Batched forward with multiple images of different sizes (number of patches).
- Support training, for cases without any image.
- Support multiple images in the same sequence, e.g.: ["<image> <image> the first image is a dog while the second is a cat", "<image> <image> <image> <image> these 4 images are..."]
Current limitations:
- Haven't done testing
- Only supports right padding (for training)
- Left padding (batched generation) is not ready yet.
- PR not ready.
* fix bugs in batched generation
* add tests
* fix batch-gen bugs, left-padding positions and incorrect attention mask
* remove better modeling llava
* fix formatting
* fix test
* fix test
* fix testing
* fix test
* fix formatting
* Update src/transformers/models/llava_next/modeling_llava_next.py
add clarity
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update modeling_llava_next.py
remove assert
* fix bug modeling_llava_next.py
* update modeling
* fix bugs
* fix format
* fix error
* fix new_token_positions
* Update modeling_llava_next.py
* update formatting
* add args
* remove comments
* add slow tests for batched inference
* failing tf/flax tests
* this one is correct
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix docs
* make fixup
* more fixup
* add test for batch equivalence
* Update tests/models/llava_next/test_modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/image_processing_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/image_processing_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* pr comments
* hardcode padding side for bs=1
* update
* [run-slow] llava_next
* [run-slow] llava_next
* make fix-copies
---------
Co-authored-by: NGUYEN, Xuan Phi <x.nguyen@alibaba-inc.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2024-05-15 19:02:56 +05:00
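To show what the multi-image, mixed-size batching described above looks like in practice, here is a minimal sketch, assuming the llava-hf/llava-v1.6-mistral-7b-hf checkpoint and that the processor matches images to <image> placeholders in order across the batch; the prompt wording mirrors the examples in the commit message and the synthetic images are stand-ins.

```python
# Sketch only: batched generation with a different number of images per prompt,
# as enabled by #29850. Checkpoint and image sizes are illustrative assumptions.
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Two prompts with a different number of <image> placeholders each;
# images of different sizes yield different numbers of patches per sample.
prompts = [
    "[INST] <image> <image>\nThe first image is a dog while the second is a cat. Describe both. [/INST]",
    "[INST] <image>\nWhat is shown in this image? [/INST]",
]
images = [
    Image.new("RGB", (640, 480)),   # goes with the first prompt
    Image.new("RGB", (1024, 768)),  # goes with the first prompt
    Image.new("RGB", (336, 336)),   # goes with the second prompt
]

# Left padding for batched generation; training would pad on the right (see #32134).
processor.tokenizer.padding_side = "left"
inputs = processor(text=prompts, images=images, padding=True, return_tensors="pt")
inputs = inputs.to(model.device, torch.float16)

output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(output_ids, skip_special_tokens=True))
```

Left padding is used here because the batch is for generation; the right-padding path for training is the limitation noted above and was addressed later in #32134.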
Fraser Mince
5090ea3f68
Fix llava half precision and autocast issues ( #29721 )
* Ensure input_embeds and image_features are the same dtype in autocast
* Fix nans in half precision llava-next and fix autocasting behavior.
* Fix styling issues.
* fix randn newline instantiation
* fix broken slow llava test
* Fix llava next init.
* fix styling issues
* [run-slow]llava,llava_next
* fix styling issues
2024-05-01 17:49:44 +01:00
Raushan Turganbay
9d31b32e9d
Use text config's vocab size in testing models ( #30568 )
use text config's vocab size
2024-05-01 12:32:45 +05:00
Raushan Turganbay
e60491adc9
Fix Llava for 0-embeddings ( #30473 )
2024-04-25 20:28:51 +05:00
Raushan Turganbay
cc75f1ac73
Fix vipllava for generation ( #29874 )
* fix vipllava generation
* consistent llava code
* revert llava tests changes
2024-04-03 17:00:08 +01:00
NielsRogge
d91fd7f92c
Add LLaVa-1.6, bis ( #29586 )
* First draft
* Fix tests, add docs
* Improve docstrings
* Fix test
* Address comments
* Address comments
* Remove vocab_size attribute
* Remove batch_size
* Address comment
* Add image processor tests
* Support fx
* Update docstring
* Add support for 34b
* Convert 34b model
* Add integration tests
* Update checkpoints
* Convert vicuna-13b, remove doc tests
* Remove script
* Remove file
* Address comments
* Improve docstrings
* Deprecate vocab_size
* Remove aspect_ratio_setting
* Address comments
* Update READMEs
* Add tips about chat templates
* Fix tests
* Deprecate vocab_size safely
* Update tests
---------
Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-20 15:51:12 +00:00