
# T5Gemma

T5Gemma (aka encoder-decoder Gemma) was proposed in a research paper by Google. It is a family of encoder-decoder large language models, developed by adapting pretrained decoder-only models into an encoder-decoder architecture. T5Gemma includes pretrained and instruction-tuned variants. The architecture is based on the transformer encoder-decoder design following T5, with improvements from Gemma 2: GQA, RoPE, GeGLU activation, RMSNorm, and interleaved local/global attention.

T5Gemma has two groups of model sizes: 1) Gemma 2 sizes (2B-2B, 9B-2B, and 9B-9B), which are based on the official Gemma 2 models (2B and 9B); and 2) T5 sizes (Small, Base, Large, and XL), which are pretrained under the Gemma 2 framework following the T5 configuration. In addition, we also provide a model at ML size (medium large, ~2B in total), which is in between T5 Large and T5 XL.

The pretrained variants are trained with two objectives: prefix language modeling with knowledge distillation (PrefixLM) and UL2, separately. We release both variants for each model size. The instruction-tuned variants were post-trained with supervised fine-tuning and reinforcement learning.
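
The encoder and decoder towers are configured separately. The sketch below is illustrative only: the toy sizes and the `encoder`/`decoder` constructor arguments are assumptions based on the [`T5GemmaConfig`] and [`T5GemmaModuleConfig`] classes documented later on this page, not the released checkpoint settings.

```python
from transformers import T5GemmaConfig, T5GemmaModuleConfig

# Each tower carries Gemma 2-style settings (GQA via fewer key/value heads, RoPE, etc.).
# Toy sizes for illustration; real checkpoints ship their own configuration.
encoder_cfg = T5GemmaModuleConfig(
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=4,
    num_attention_heads=4,
    num_key_value_heads=2,  # grouped-query attention
)
decoder_cfg = T5GemmaModuleConfig(
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=4,
    num_attention_heads=4,
    num_key_value_heads=2,
)

config = T5GemmaConfig(encoder=encoder_cfg, decoder=decoder_cfg)
print(config.encoder.num_hidden_layers, config.decoder.num_hidden_layers)
```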

The example below demonstrates how to chat with the model with [`Pipeline`] or the [`AutoModel`] class, and from the command line.

```python
import torch
from transformers import pipeline

pipe = pipeline(
    task="text2text-generation",
    model="google/t5gemma-placeholder",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

pipe("Question: Why is the sky blue?\nAnswer:", max_new_tokens=50)
```

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/t5gemma-placeholder")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/t5gemma-placeholder",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

input_text = "Question: Why is the sky blue?\nAnswer:"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

```bash
echo -e "Question: Why is the sky blue? Answer:" | transformers run --task text2text-generation --model google/t5gemma-placeholder --device 0
```

## T5GemmaConfig

[[autodoc]] T5GemmaConfig

## T5GemmaModuleConfig

[[autodoc]] T5GemmaModuleConfig

## T5GemmaModel

[[autodoc]] T5GemmaModel
    - forward

## T5GemmaEncoderModel

[[autodoc]] T5GemmaEncoderModel
    - forward
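
A minimal sketch of encoder-only usage, assuming the standard encoder API (`input_ids`/`attention_mask` in, `last_hidden_state` out) and reusing the placeholder checkpoint name from the examples above:

```python
import torch
from transformers import AutoTokenizer, T5GemmaEncoderModel

tokenizer = AutoTokenizer.from_pretrained("google/t5gemma-placeholder")
encoder = T5GemmaEncoderModel.from_pretrained(
    "google/t5gemma-placeholder",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(encoder.device)
with torch.no_grad():
    outputs = encoder(**inputs)

# Token-level representations from the encoder stack: (batch, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```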

## T5GemmaForConditionalGeneration

[[autodoc]] T5GemmaForConditionalGeneration
    - forward

## T5GemmaForSequenceClassification

[[autodoc]] T5GemmaForSequenceClassification
    - forward

## T5GemmaForTokenClassification

[[autodoc]] T5GemmaForTokenClassification
    - forward