* Use PyAV instead of Decord
* Get frame indices
* Fix number of frames
* Update src/transformers/models/videomae/image_processing_videomae.py
* Fix up
* Fix copies
* Update timesformer doctests
* Update docstrings
Adds the ALIGN model to transformers. ALIGN is introduced in "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision" by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
* Change the .view call to .reshape
* Change the .view call to .reshape to all the copies from bart attention
* Fix copies and style
* Fix copies and style
* Fix copies and style
* Fix copies and style
* Fix copies and style
* Revert unneccessary changes
* Revert unneccessary changes
* Revert unneccessary changes
* Revert unneccessary changes
* trying to figure out whether model is NLP
* drop my changes and apply easier fix
* trying to handle all int input types
* fix logic
---------
Co-authored-by: Stas Bekman <stas@stason.org>
* rounding_mode = "floor" instead of // to prevent behavioral change
* add other TODO
* use `torch_int_div` from pytrch_utils
* same for tests
* fix copies
* style
* use relative imports when needed
* Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* First commit for the improved PT-TF weight loading
* Remove workarounds from TFEncoderDecoder tests
* Allow a custom weight renaming function in from_pretrained and use that to clean up EncoderDecoder
* make fixup
* First attempt at visionencoderdecoder
* Disable tensorfloat32 in tests to get consistent outputs
* Quick fix to tf_vision_encoder_decoder tests
* make fixup
* Update Blenderbot tests
* Remove unused arg in modeling_tf_opt
* load_tf_sharded_weights had strict=True! This meant transfer learning was impossible, so I'm setting it to False.
* Support prefixes when loading sharded TF checkpoints
* make fixup
* Add test to load sharded models with a weight prefix
* Fix sharded weight loading test
* Add a test for transfer from a sharded checkpoint
* make fixup
* Add test to check that crossloading from PT with a prefix works
* Refactor from_pretrained in the encoderdecoder classes
* Refactor from_pretrained in the encoderdecoder classes
* missmatched -> mismatched
* Explicitly check for None
* No comments showing my very impressive and attractive knowledge of Py3.9+
* Disable TF32 across all TF tests
* Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval
* minor fix return_dict
* implement test for loss computation
---------
Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
* zero shot object detection part 1
* added batch prediction section
* added image guided object detection section
* make style
* added the task guide to the TOC
* minor polishing
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* added embedded owlvit demo
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* minor fix
* make style
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* If applied, this commit fixes generate bug in gptj
* Remove extra same code block
* formatting and test fix
* Conflict fix and declaration error fix
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix the issue of blip model returning loss even when the label is not provoided
* Fix ruff failure
* Incorporate PR feedbacks
* Incorporate PR feedbacks
* Incorporate PR feedbacks
* Incorporate PR feedbacks