mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-06 22:30:09 +06:00

* Initial add model additions * Test * All weights loading * Can perform full forward pass * Local and remote the same * Matching local and remote * Fixup * Idefics2Model importable; fixup docstrings * Don't skip by default * Remove deprecated use_resampler arg * Remove self.config * DecoupledLinear takes config * Tidy up * Enable eager attention and tidy up * Most tests passing * Update for batch of processed images * Add image processor * Update doc pages * Update conversion script * Remove erroneous breakpoint * Remove accidendtal spelling change * Update to reflect changes on hub - make generate work * Fix up * Image processor tests * Update tests * Add a processor * Add a processor * Update convert script * Update modeling file - remove fixmes * Bug fix * Add processing test * Use processor * Fix up * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Fix test * Update config - PR comments and defaults align with checkpoint * Reviewer comments * Add copied froms for flahs attention * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove qk_layer_norm and freeze_layers functionality * Fix * Remove freeze_layer options from config * Sync with upstream main * Fix attention shapes siglip * Remove Llava-next refs - TO REBASE * Use AutoModel for text model * Add comment to explain vision embeddings * Fix issue with tie_word_embeddings * Address review comments * Fix and fix up * Chat templates for idefics * Fix copies * Fix * Add layer norms to FA2 * Fix tests * Apply suggestions from code review Co-authored-by: Victor SANH <victorsanh@gmail.com> * Fix * Review comments * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update inputs merger * Merge weights in correct order * Update convert script * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update template * Model code examples (fix idefics too) * More review comments * Tidy up * Update processing * Fix attention mask preparation * Update inputs_merger inputs * Vectorize inputs_merger * Update src/transformers/models/idefics2/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/idefics2/modeling_idefics2.py * Review comments * saying bye to the `qk_layer_norms` * Simplify * Update latents * Remove erroneuous readme changes * Return images when applying chat template * Fix bug - prompt images are for a single sample * Update src/transformers/models/idefics2/modeling_idefics2.py * image splitting * fix test * some more comment * some comment * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics2/image_processing_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update processor * Update model tests * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Don't add BOS in template * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Remove index in examples * Update tests to reflect #13 * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * PR comment - consistent typing * Update readme and model doc * Update docs * Update checkpoint references * Update examples * Fix and update tests * Small addition * Update tests - remove copied from as no ignore placement copy could be found * Update example * small fixes * Update docs/source/en/model_doc/idefics2.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update docs/source/en/model_doc/idefics2.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update README.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Connector model as bridge * Fix up * Fix up * Don't pass model inputs for generation kwargs update * IDEFICS-2 -> Idefics2 * Remove config archive name * IDEFICS-2 -> Idefics2 * Add back llava-next * Update readmes * Add requirements for processor tester * Use custom convert_to_rgb to avoid possible BC * Fix doc example * Fix doc example * Skip model doc tests - as model to large * More doc example - account for image splitting * Update src/transformers/image_transforms.py * Fix config doctest --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: Victor SANH <victorsanh@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
99 lines
3.9 KiB
Markdown
99 lines
3.9 KiB
Markdown
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||
the License. You may obtain a copy of the License at
|
||
|
||
http://www.apache.org/licenses/LICENSE-2.0
|
||
|
||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||
specific language governing permissions and limitations under the License.
|
||
|
||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||
rendered properly in your Markdown viewer.
|
||
|
||
-->
|
||
|
||
# Idefics2
|
||
|
||
## Overview
|
||
|
||
The Idefics2 model was created by the [Hugging Face M4](https://huggingface.co/HuggingFaceM4) team and authored by Léo Tronchon, Hugo Laurencon, Victor Sanh.
|
||
The accompanying blog post can be found [here](https://huggingface.co/blog/idefics2).
|
||
|
||
Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text
|
||
outputs. The model can answer questions about images, describe visual content, create stories grounded on multiple
|
||
images, or simply behave as a pure language model without visual inputs. It improves upon IDEFICS-1, notably on
|
||
document understanding, OCR, or visual reasoning. Idefics2 is lightweight (8 billion parameters) and treats
|
||
images in their native aspect ratio and resolution, which allows for varying inference efficiency.
|
||
|
||
Tips:
|
||
- Each sample can contain multiple images, and the number of images can vary between samples. The processor will pad the inputs to the maximum number of images in a batch for input to the model.
|
||
- The processor has a `do_image_splitting` option. If `True`, each input image will be split into 4 sub-images, and concatenated with the original to form 5 images. This is useful for increasing model performance. Make sure `processor.image_processor.do_image_splitting` is set to `False` if the model was not trained with this option.
|
||
- `text` passed to the processor should have the `<image>` tokens where the images should be inserted. And `<end_of_utterance>` at the end of each utterance if the text is a chat message.
|
||
- The processor has its own `apply_chat_template` method to convert chat messages to text that can then be passed as `text` to the processor.
|
||
|
||
Example of how to use the processor on chat messages:
|
||
```python
|
||
import requests
|
||
from PIL import Image
|
||
from transformers import Idefics2Processor, Idefics2ForConditionalGeneration
|
||
|
||
url_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||
url_2 = "http://images.cocodataset.org/val2017/000000219578.jpg"
|
||
|
||
image_1 = Image.open(requests.get(url_1, stream=True).raw)
|
||
image_2 = Image.open(requests.get(url_2, stream=True).raw)
|
||
images = [image_1, image_2]
|
||
|
||
messages = [{
|
||
"role": "user",
|
||
"content": [
|
||
{"type": "text", "text": "What’s the difference between these two images?"},
|
||
{"type": "image"},
|
||
{"type": "image"},
|
||
],
|
||
}]
|
||
|
||
processor = Idefics2Processor.from_pretrained("HuggingFaceM4/idefics2-8b")
|
||
model = Idefics2ForConditionalGeneration.from_pretrained("HuggingFaceM4/idefics2-8b")
|
||
|
||
text = processor.apply_chat_template(messages)
|
||
# "User: What’s the difference between these two images?<image><image><end_of_utterance>\n"
|
||
print(text)
|
||
|
||
inputs = processor(images=images, text=text)
|
||
|
||
generated_text = model.generate(**inputs)
|
||
```
|
||
|
||
This model was contributed by [amyeroberts](https://huggingface.co/amyeroberts).
|
||
The original code can be found [here](https://huggingface.co/HuggingFaceM4/idefics2).
|
||
|
||
|
||
## Idefics2Config
|
||
|
||
[[autodoc]] Idefics2Config
|
||
|
||
|
||
## Idefics2Model
|
||
|
||
[[autodoc]] Idefics2Model
|
||
- forward
|
||
|
||
|
||
## Idefics2ForConditionalGeneration
|
||
|
||
[[autodoc]] Idefics2ForConditionalGeneration
|
||
- forward
|
||
|
||
|
||
## Idefics2ImageProcessor
|
||
[[autodoc]] Idefics2ImageProcessor
|
||
- preprocess
|
||
|
||
|
||
## Idefics2Processor
|
||
[[autodoc]] Idefics2Processor
|
||
- __call__
|