Adds an image-guided object detection method to the OwlViTForObjectDetection class, as described in the original paper. One-shot/image-guided object detection lets users provide a query image and search for similar objects in the input image.
Co-Authored-By: Dhruv Karan k4r4n.dhruv@gmail.com
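A minimal sketch of the intended usage (the checkpoint name and file paths are illustrative, and the post-processing call assumes the processor gained a matching `post_process_image_guided_detection` helper):

```python
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch16")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch16")

image = Image.open("scene.jpg")        # image to search in
query_image = Image.open("query.jpg")  # example of the object to find

inputs = processor(images=image, query_images=query_image, return_tensors="pt")
with torch.no_grad():
    outputs = model.image_guided_detection(**inputs)

# rescale predicted boxes to the original image size (height, width)
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_image_guided_detection(outputs=outputs, target_sizes=target_sizes)
```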
* WIP: Added CLIP resources from HuggingFace blog
* ADD: Notebooks documentation to clip
* Add link straight to notebook
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Change notebook links to colab
Co-authored-by: Ambuj Pawar <your_email@abc.example>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Slightly alter Keras dummy loss
* Add sample weight to test_keras_fit
* Fix test_keras_fit for datasets
* Skip the sample_weight stuff for models where the model tester has no batch_size
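A rough sketch of the behavior the test changes above exercise: Keras forwards per-example `sample_weight` into the dummy loss when `fit()` is called with labels inside the input dict (checkpoint and shapes are illustrative):

```python
import numpy as np
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
model.compile(optimizer="adam")  # no explicit loss: the model's internal loss is used

features = {
    "input_ids": np.random.randint(0, model.config.vocab_size, (8, 16)),
    "labels": np.random.randint(0, 2, (8,)),
}
sample_weight = np.ones((8,), dtype=np.float32)  # per-example weights passed through by Keras

model.fit(features, sample_weight=sample_weight, epochs=1)
```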
* allow loading projection in text and vision models
* begin tests
* finish test for CLIPTextModelTest
* style
* add slow tests
* add new classes for projection heads (see the usage sketch below)
* remove with_projection
* add in init
* add in doc
* fix tests
* fix some more tests
* fix copies
* fix docs
* remove leftover from fix-copies
* add the head models in IGNORE_NON_AUTO_CONFIGURED
* fix docstr
* fix tests
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add docstr for models
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
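A usage sketch for the new projection-head classes, mirroring the docstrings added above (`CLIPVisionModelWithProjection` works analogously on pixel values):

```python
import torch
from transformers import AutoTokenizer, CLIPTextModelWithProjection

tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPTextModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")

inputs = tokenizer(["a photo of a cat", "a photo of a dog"], padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# pooled output passed through the projection head: shape (batch, projection_dim)
text_embeds = outputs.text_embeds
```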
* Try PT1.13 by removing torch scatter
* Skip failing tests
* Style
* Remove testing extras for repo utils
* Try with all decorators
* Try to wipe the cache
* Fix all tests?
* Try this way
* Fix comma
* Update to main
* Try with less deps
* Quality
* add `accelerate` support for `ViT` family
- add `_no_split_modules`
- manually cast to the right `dtype`: to change
* enable `float16` for `deit`
* fix `make fixup`
* add `slow` test for `fp16` inference
* another safety check
* Update src/transformers/models/deit/modeling_deit.py
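A short sketch of what the `accelerate` support enables (device placement is guided by the new `_no_split_modules`; the checkpoint name is the standard ViT one):

```python
import torch
from transformers import ViTForImageClassification

# `device_map="auto"` places the model via accelerate, honoring _no_split_modules;
# torch.float16 exercises the half-precision inference path covered by the slow test
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    device_map="auto",
    torch_dtype=torch.float16,
)
```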
* update relative positional embedding
* make fix copies
* add `use_cache` to list of arguments
* fixup
* one-line function
* add `test_decoder_model_past_with_large_inputs_relative_pos_emb`
* add relative pos embedding test for more models
* style
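A sketch of the decoder path the new tests cover, using BERT as a representative model with relative position embeddings plus caching (model choice and shapes are illustrative):

```python
import torch
from transformers import BertConfig, BertLMHeadModel

config = BertConfig(is_decoder=True, position_embedding_type="relative_key")
model = BertLMHeadModel(config)

input_ids = torch.randint(0, config.vocab_size, (1, 8))
out = model(input_ids, use_cache=True)

# with cached key/values, relative position offsets must account for the past length
next_token = torch.randint(0, config.vocab_size, (1, 1))
out2 = model(next_token, past_key_values=out.past_key_values, use_cache=True)
```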
* Fix ImageSegmentationPipelineTests
* Use 0.9
* no zip
* links to show images
* rebase
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
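Roughly what the fixed pipeline test checks, with the raised 0.9 threshold (checkpoint selection and image path are illustrative):

```python
from transformers import pipeline

segmenter = pipeline("image-segmentation")  # default checkpoint; pass `model=` to pin one
outputs = segmenter("street.jpg", threshold=0.9)  # drop low-confidence instance masks

for segment in outputs:
    print(segment["label"], segment["score"])  # each entry also carries a PIL "mask"
```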
* docs: fix: set overflowing image width to auto-scale
* docs: fix: new language Korean is also affected
* docs: fix: unnecessary line break in index page
* merge conflicts
* bos and eos in datacollator
* (temp) hardcode removal of attention mask
* freeze encoder
* actually freeze encoder (see the sketch below)
* set max length / num beams according to gen kwargs
* (temp) fix tests
* don't pop attn mask
* override return attention mask config from Hub
* Hub configs updated 🤗
* final fixes
* update type annotations
* backward comp
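One safe way to implement the "actually freeze encoder" step above, sketched with Whisper as a representative speech seq2seq model:

```python
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# disable gradients for every encoder parameter so only the decoder trains
for param in model.model.encoder.parameters():
    param.requires_grad = False
```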
docs: i18n: first draft of index page
docs: fix: first revision of index page
docs: i18n: missed section - supported frameworks
docs: fix: second revision of index page
review by @ArthurZucker
refactor: remove untranslated files from korean
docs: fix: remove untranslated references from toctree.yml
feat: enable korean docs in gh actions
docs: feat: add in_translation page as placeholder
docs: bug: testing if internal toc needs alphabet chars
docs: fix: custom english anchor for non-alphanumeric headings
review by @sgugger
docs: i18n: translate comments on install methods in _config.py
docs: refactor: more concise wording for translations
* Proposal: remove the weird `inspect` in the ASR pipeline and make
WhisperEncoder just nice to use.
It seems that accepting `attention_mask` is an invariant of our
models. For seq2seq ASR models, we had a special comment on how it
was actually important to send it.
`inspect`-ing seems like a pretty brittle way to handle this case.
My suggestion is to simply accept it as a kwarg and ignore it, with
the docstring explaining why it's ignored.
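A toy illustration of the proposal (a stand-in class, not the real WhisperEncoder): accept `attention_mask` for pipeline compatibility and deliberately ignore it, documenting why:

```python
from typing import Optional

import torch
from torch import nn


class ToyEncoder(nn.Module):
    """Illustrative stand-in for WhisperEncoder, not the actual implementation."""

    def __init__(self, feature_size: int = 80, d_model: int = 4):
        super().__init__()
        self.proj = nn.Linear(feature_size, d_model)

    def forward(self, input_features: torch.Tensor, attention_mask: Optional[torch.Tensor] = None):
        # `attention_mask` is accepted so the ASR pipeline can call every model
        # uniformly, but this encoder does not mask its inputs, so the argument
        # is deliberately ignored (the docstring should explain why).
        del attention_mask
        return self.proj(input_features.transpose(1, 2))
```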
* Fixup.
* Update src/transformers/models/whisper/modeling_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Doc fixing.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add model files etc for MobileNetV2
* rename files for MobileNetV1
* initial implementation of MobileNetV1
* fix conversion script
* cleanup
* write docs
* tweaks
* fix conversion script
* extract hidden states
* fix test cases
* make fixup
* fixup it all
* rename V1 to V2
* fix checkpoints
* fixup
* implement first block + weight conversion
* add remaining layers
* add output stride and dilation
* fixup
* add tests
* add deeplabv3+ head
* a bit of fixup
* finish deeplab conversion
* add link to doc
* fix issue with JIT trace
in_height and in_width would be Tensor objects during JIT trace, which caused Core ML conversion to fail on the remainder op. By making them ints, the result of the padding calculation becomes a constant value.
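A hedged sketch of the fix (function and argument names are illustrative, not the repo's exact helper):

```python
import torch
import torch.nn.functional as F


def tf_same_padding(features: torch.Tensor, kernel: int = 3, stride: int = 2) -> torch.Tensor:
    """TF-style 'SAME' padding; illustrative names, not the repo's exact code."""
    # int() converts traced Tensor dimensions into plain Python ints, so the
    # remainder arithmetic below folds into constants during JIT trace and
    # Core ML never sees a remainder op
    in_height = int(features.shape[-2])
    in_width = int(features.shape[-1])

    pad_h = max(kernel - stride, 0) if in_height % stride == 0 else max(kernel - in_height % stride, 0)
    pad_w = max(kernel - stride, 0) if in_width % stride == 0 else max(kernel - in_width % stride, 0)

    return F.pad(features, (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2))
```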
* cleanup
* fix order of models
* fix rebase error
* remove main from doc link
* add image processor
* remove old feature extractor
* fix converter + other issues
* fixup
* fix unit test
* add to onnx tests (but these appear broken now)
* add post_process_semantic_segmentation
* use google org
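End-to-end usage the conversion and post-processing commits above add up to (the google-org checkpoint name is assumed to be the converted DeepLabV3+ one):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, MobileNetV2ForSemanticSegmentation

checkpoint = "google/deeplabv3_mobilenet_v2_1.0_513"  # assumed converted checkpoint
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = MobileNetV2ForSemanticSegmentation.from_pretrained(checkpoint)

image = Image.open("street.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# upsample the logits and take the per-pixel argmax at the original (height, width)
segmentation = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
```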
* remove unused imports
* move args
* replace weird assert
* Apply fix
* Fix test
* Remove another argument which is not used
* Fix pipeline test
* Add argument back, add deprecation warning
* Add warning at other location
* Use warnings instead
* Add num_channels to config
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
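The deprecation pattern these last commits converge on, in a hypothetical helper (the argument name is a stand-in; per the final commit, the real value moved into the model config as `num_channels`):

```python
import warnings


def preprocess(images, num_channels=None):
    """Hypothetical helper showing the pattern: keep the argument, warn, ignore it."""
    if num_channels is not None:
        # plain `warnings` (not a logger) so callers can filter or promote it
        warnings.warn(
            "The `num_channels` argument is deprecated and ignored; set it on the model config instead.",
            FutureWarning,
        )
    return images
```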