transformers/docs/source/en/model_doc/fuyu.md
Pablo Montalvo caa0ff0bf1
Add fuyu model (#26911)
* initial commit

* add processor, add fuyu naming

* add draft processor

* fix processor

* remove dropout to fix loading of weights

* add image processing fixes from Pedro

* fix

* fix processor

* add basic processing fuyu test

* add documentation and TODO

* address comments, add tests, add doc

* replace assert with torch asserts

* add Mixins and fix tests

* clean imports

* add model tester, clean imports

* fix embedding test

* add updated tests from pre-release model

* Processor: return input_ids used for inference

* separate processing and model tests

* relax test tolerance for embeddings

* add test for logit comparison

* make sure fuyu image processor is imported in the init

* fix formattingh

* more formatting issues

* and more

* fixups

* remove some stuff

* nits

* update init

* remove the fuyu file

* Update integration test with release model

* Update conversion script.

The projection is not used, as confirmed by the authors.

* improve geenration

* Remove duplicate function

* Trickle down patches to model call

* processing fuyu updates

* remove things

* fix prepare_inputs_for_generation to fix generate()

* remove model_input

* update

* add generation tests

* nits

* draft leverage automodel and autoconfig

* nits

* fix dtype patch

* address comments, update READMEs and doc, include tests

* add working processing test, remove refs to subsequences

* add tests, remove Sequence classification

* processing

* update

* update the conversion script

* more processing cleanup

* safe import

* take out ModelTesterMixin for early release

* more cl;eanup

* more cleanup

* more cleanup

* and more

* register a buffer

* nits

* add postprocessing of generate output

* nits

* updates

* add one working test

* fix test

* make fixup works

* fixup

* Arthur's updates

* nits

* update

* update

* fix processor

* update tests

* passe more fixups

* fix

* nits

* don't import torch

* skip fuyu config for now

* fixup done

* fixup

* update

* oups

* nits

* Use input embeddings

* no buffer

* update

* styling processing fuyu

* fix test

* update licence

* protect torch import

* fixup and update not doctested

* kwargs should be passed

* udpates

* update the impofixuprts in the test

* protect import

* protecting imports

* protect imports in type checking

* add testing decorators

* protect top level import structure

* fix typo

* fix check init

* move requires_backend to functions

* Imports

* Protect types

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
2023-10-18 15:24:11 -07:00

115 lines
5.1 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Fuyu
## Overview
The Fuyu model was created by [ADEPT](https://www.adept.ai/blog/fuyu-8b), and authored by Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, Sağnak Taşırlar.
The authors introduced Fuyu-8B, a decoder-only multimodal model based on the classic transformers architecture, with query and key normalization. A linear encoder is added to create multimodal embeddings from image inputs.
By treating image tokens like text tokens and using a special image-newline character, the model knows when an image line ends. Image positional embeddings are removed. This avoids the need for different training phases for various image resolutions. With 8 billion parameters and licensed under Apache, Fuyu-8B is notable for its ability to handle both text and images, its impressive context size of 16K, and its overall performance.
<Tip warning={true}>
The `Fuyu` models were trained using `bfloat16`, but the original inference uses `float16` The checkpoints uploaded on the hub use `torch_dtype = 'float16'` which will be
used by the `AutoModel` API to cast the checkpoints from `torch.float32` to `torch.float16`.
The `dtype` of the online weights is mostly irrelevant, unless you are using `torch_dtype="auto"` when initializing a model using `model = AutoModelForCausalLM.from_pretrained("path", torch_dtype = "auto")`. The reason is that the model will first be downloaded ( using the `dtype` of the checkpoints online) then it will be cast to the default `dtype` of `torch` (becomes `torch.float32`). Users should specify the `torch_dtype` they want, and if they don't it will be `torch.float32`.
Finetuning the model in `float16` is not recommended and known to produce `nan`, as such the model should be fine-tuned in `bfloat16`.
</Tip>
Tips:
- To convert the model, you need to clone the original repository using `git clone https://github.com/persimmon-ai-labs/adept-inference`, then get the checkpoints:
```bash
git clone https://github.com/persimmon-ai-labs/adept-inference
wget path/to/fuyu-8b-model-weights.tar
tar -xvf fuyu-8b-model-weights.tar
python src/transformers/models/fuyu/convert_fuyu_weights_to_hf.py --input_dir /path/to/downloaded/fuyu/weights/ --output_dir /output/path \
--pt_model_path /path/to/fuyu_8b_release/iter_0001251/mp_rank_00/model_optim_rng.pt
--ada_lib_path /path/to/adept-inference
```
For the chat model:
```bash
wget https://axtkn4xl5cip.objectstorage.us-phoenix-1.oci.customer-oci.com/n/axtkn4xl5cip/b/adept-public-data/o/8b_chat_model_release.tar
tar -xvf 8b_base_model_release.tar
```
Then, model can be loaded via:
```py
from transformers import FuyuConfig, FuyuForCausalLM
model_config = FuyuConfig()
model = FuyuForCausalLM(model_config).from_pretrained('/output/path')
```
Inputs need to be passed through a specific Processor to have the correct formats.
A processor requires an image_processor and a tokenizer. Hence, inputs can be loaded via:
```py
from PIL import Image
from transformers import AutoTokenizer
from transformers.models.fuyu.processing_fuyu import FuyuProcessor
from transformers.models.fuyu.image_processing_fuyu import FuyuImageProcessor
tokenizer = AutoTokenizer.from_pretrained('adept-hf-collab/fuyu-8b')
image_processor = FuyuImageProcessor()
processor = FuyuProcessor(image_processor=image_processor, tokenizer=tokenizer)
text_prompt = "Generate a coco-style caption.\\n"
bus_image_url = "https://huggingface.co/datasets/hf-internal-testing/fixtures-captioning/resolve/main/bus.png"
bus_image_pil = Image.open(io.BytesIO(requests.get(bus_image_url).content))
inputs_to_model = processor(text=text_prompt, images=image_pil)
```
This model was contributed by [Molbap](https://huggingface.co/Molbap).
The original code can be found [here](https://github.com/persimmon-ai-labs/adept-inference).
- Fuyu uses a `sentencepiece` based tokenizer, with a `Unigram` model. It supports bytefallback, which is only available in `tokenizers==0.14.0` for the fast tokenizer.
The `LlamaTokenizer` is used as it is a standard wrapper around sentencepiece.
- The authors suggest to use the following prompt for image captioning: `f"Generate a coco-style caption.\\n"`
## FuyuConfig
[[autodoc]] FuyuConfig
## FuyuForCausalLM
[[autodoc]] FuyuForCausalLM
- forward
## FuyuImageProcessor
[[autodoc]] FuyuImageProcessor
- __call__
## FuyuProcessor
[[autodoc]] FuyuProcessor
- __call__