Mirror of https://github.com/huggingface/transformers.git, synced 2025-07-31 02:02:21 +06:00
Add CLIP resources (#26534)
* docs: feat: model resources for CLIP
* fix: resolve suggestion
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix: resolve suggestion
* fix: resolve suggestion
* fix: resolve suggestion
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix: resolve suggestion
* fix: resolve suggestions
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
This commit is contained in:
parent 7cc6f822a3
commit d6e5b02ef3
@@ -83,9 +83,23 @@ This model was contributed by [valhalla](https://huggingface.co/valhalla). The o
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CLIP.
- A blog post on [How to fine-tune CLIP on 10,000 image-text pairs](https://huggingface.co/blog/fine-tune-clip-rsicd).
- CLIP is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text).
- A [notebook](https://colab.research.google.com/drive/1zip3zmrbuKerAfC1d2uS1mqQS-QykXnl?usp=sharing) on how to fine-tune the CLIP model with a Korean multimodal dataset. 🌎🇰🇷
- [Fine tuning CLIP with Remote Sensing (Satellite) images and captions](https://huggingface.co/blog/fine-tune-clip-rsicd), a blog post about how to fine-tune CLIP with the [RSICD dataset](https://github.com/201528014227051/RSICD_optimal) and a comparison of performance changes due to data augmentation.
- This [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text) shows how to train a CLIP-like vision-text dual encoder model using a pre-trained vision and text encoder on the [COCO dataset](https://cocodataset.org/#home) (a minimal training-step sketch follows this list).
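At its core, the contrastive-image-text training above maximizes agreement between matched image and text embeddings. The snippet below is a minimal, illustrative sketch of a single training step with `CLIPModel`; the placeholder images, toy captions, and learning rate are assumptions for brevity, not taken from the linked script.

```python
# Minimal sketch of one contrastive training step with CLIPModel.
# The toy images, captions, and learning rate are placeholders; the linked
# example script handles real datasets, batching, schedulers, and mixed precision.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

images = [Image.new("RGB", (224, 224), color=c) for c in ("red", "blue")]  # placeholder images
captions = ["a red square", "a blue square"]

inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)
outputs = model(**inputs, return_loss=True)  # symmetric image-text contrastive loss

outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```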
<PipelineTag pipeline="image-to-text"/>
- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP for inference with beam search for image captioning. 🌎
**Image retrieval**
- A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing) on image retrieval using pretrained CLIP and computing the MRR (Mean Reciprocal Rank) score (see the minimal embedding-similarity sketch after this list). 🌎
- A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb) on image retrieval that also shows the similarity score. 🌎
- A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing) on how to map images and texts to the same vector space using Multilingual CLIP. 🌎
- A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to run CLIP on semantic image search using [Unsplash](https://unsplash.com) and [TMDB](https://www.themoviedb.org/) datasets. 🌎
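The retrieval notebooks above all rely on the same basic operation: embed the query text and the candidate images into CLIP's shared space and rank by cosine similarity. Below is a minimal sketch of that computation; the placeholder gallery images and query string are assumptions, not taken from any of the notebooks.

```python
# Minimal sketch of text-to-image retrieval with CLIP embeddings.
# The placeholder images and query are illustrative only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.new("RGB", (224, 224), color=c) for c in ("red", "green", "blue")]  # stand-ins for a gallery
query = "a blue image"

with torch.no_grad():
    image_inputs = processor(images=images, return_tensors="pt")
    image_embeds = model.get_image_features(**image_inputs)

    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    text_embeds = model.get_text_features(**text_inputs)

# Normalize and rank the gallery by cosine similarity to the query.
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
scores = (text_embeds @ image_embeds.T).squeeze(0)
ranking = scores.argsort(descending=True)  # indices of the best-matching images
```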
**Explainability**
- A [notebook](https://colab.research.google.com/github/hila-chefer/Transformer-MM-Explainability/blob/main/CLIP_explainability.ipynb) on how to visualize the similarity between input tokens and image segments. 🌎
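As a much cruder alternative to the gradient-based relevancy maps used in the notebook above, one can project CLIP's per-patch and per-token features into the shared embedding space and inspect their pairwise similarities. The sketch below illustrates only that heuristic; the placeholder image and caption are assumptions, and the projections are trained on pooled features, so treat the resulting heatmap as a rough approximation rather than the notebook's method.

```python
# Heuristic sketch: a token-to-patch similarity "heatmap" from CLIP features.
# This is NOT the relevancy-map method from the linked notebook, just a rough
# illustration of comparing text tokens and image patches in the shared space.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224), color="red")  # placeholder image
inputs = processor(text=["a red square"], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    vision_out = model.vision_model(pixel_values=inputs["pixel_values"])
    text_out = model.text_model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])

    # Drop the CLS patch, apply the final layer norm, then project both modalities
    # into the shared space (the projections are trained on pooled features only).
    patch_feats = model.vision_model.post_layernorm(vision_out.last_hidden_state[:, 1:, :])
    patch_embeds = model.visual_projection(patch_feats)
    token_embeds = model.text_projection(text_out.last_hidden_state)

    patch_embeds = patch_embeds / patch_embeds.norm(dim=-1, keepdim=True)
    token_embeds = token_embeds / token_embeds.norm(dim=-1, keepdim=True)
    heatmap = token_embeds @ patch_embeds.transpose(1, 2)  # (batch, tokens, patches)
```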
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we will review it.
The resource should ideally demonstrate something new instead of duplicating an existing resource.