
T5Gemma
T5Gemma (aka encoder-decoder Gemma) was proposed in a research paper by Google. It is a family of encoder-decoder large language models, developed by adapting pretrained decoder-only models into encoder-decoder models. T5Gemma includes pretrained and instruction-tuned variants. The architecture is based on the transformer encoder-decoder design following T5, with improvements from Gemma 2: GQA, RoPE, GeGLU activation, RMSNorm, and interleaved local/global attention.
T5Gemma has two groups of model sizes: 1) Gemma 2 sizes (2B-2B, 9B-2B, and 9B-9B), which are based on the official Gemma 2 models (2B and 9B); and 2) T5 sizes (Small, Base, Large, and XL), which are pretrained under the Gemma 2 framework following the T5 configuration. In addition, we also provide a model at ML size (medium large, ~2B in total), which sits between T5 Large and T5 XL.
The pretrained variants are trained with two objectives: prefix language modeling with knowledge distillation (PrefixLM) and UL2, separately. We release both variants for each model size. The instruction-tuned variants were post-trained with supervised fine-tuning and reinforcement learning.
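Because the encoder and decoder are configured as separate modules (see T5GemmaConfig and T5GemmaModuleConfig below), the Gemma 2-style settings can be inspected per side. The snippet below is a minimal sketch: it assumes the composite config exposes encoder/decoder sub-configs with Gemma 2-style attribute names such as num_key_value_heads and sliding_window, which is an assumption rather than something this doc confirms.
from transformers import AutoConfig
# Placeholder checkpoint name taken from the usage examples below.
config = AutoConfig.from_pretrained("google/t5gemma-placeholder")
# Assumption: the encoder and decoder each carry their own module config
# with Gemma 2-style attributes (the attribute names here are illustrative).
for side in ("encoder", "decoder"):
    module_config = getattr(config, side, None)
    if module_config is not None:
        print(side,
              getattr(module_config, "num_key_value_heads", None),  # GQA heads
              getattr(module_config, "sliding_window", None))       # local-attention window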
The example below demonstrates how to chat with the model with [Pipeline] or the [AutoModel] class, and from the command line.
# Generate with Pipeline (text2text-generation)
import torch
from transformers import pipeline
pipe = pipeline(
task="text2text-generation",
model="google/t5gemma-placeholder",
torch_dtype=torch.bfloat16,
device="cuda",
)
pipe("Question: Why is the sky blue?\nAnswer:", max_new_tokens=50)
# The same prompt with AutoModelForSeq2SeqLM
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/t5gemma-placeholder")
model = AutoModelForSeq2SeqLM.from_pretrained(
"google/t5gemma-placeholder",
torch_dtype=torch.bfloat16,
device_map="auto"
)
input_text = "Question: Why is the sky blue?\nAnswer:"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# From the command line
echo -e "Question: Why is the sky blue?\nAnswer:" | transformers run --task text2text-generation --model google/t5gemma-placeholder --device 0
T5GemmaConfig
autodoc T5GemmaConfig
T5GemmaModuleConfig
autodoc T5GemmaModuleConfig
T5GemmaModel
autodoc T5GemmaModel - forward
T5GemmaEncoderModel
autodoc T5GemmaEncoderModel - forward
T5GemmaForConditionalGeneration
autodoc T5GemmaForConditionalGeneration - forward
T5GemmaForSequenceClassification
autodoc T5GemmaForSequenceClassification - forward
T5GemmaForTokenClassification
autodoc T5GemmaForTokenClassification - forward