# GraniteMoeHybrid

## Overview

The GraniteMoeHybrid model builds on top of GraniteMoeSharedModel and Bamba. Each of its decoder layers uses either a state space (Mamba) sequence mixer or an attention sequence mixer, combined with a mixture-of-experts block that includes shared experts. By default, the attention layers do not use positional encoding.
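The mix of layer types is recorded in the model configuration. As a minimal sketch for inspecting it (the attribute name `layers_block_type` is an assumption borrowed from other hybrid models such as Bamba; print the config directly if it differs), you can check which decoder layers are Mamba layers and which are attention layers:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ibm-granite/granite-4.0-tiny-preview")

# Attribute names below are assumptions based on similar hybrid models (e.g. Bamba);
# fall back to inspecting `config` manually if neither attribute is present.
block_types = getattr(config, "layers_block_type", None) or getattr(config, "layer_types", None)
print(block_types)  # expected: a per-layer list mixing "mamba" and "attention" entries
```

The full usage example below loads the preview checkpoint and generates text with `model.generate`.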
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-4.0-tiny-preview"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."

# tokenize the text and move the inputs to the model's device
input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)

# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)

# decode output tokens into text
output = tokenizer.batch_decode(output)

# loop over the batch to print; in this example the batch size is 1
for i in output:
    print(i)
```
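For a quicker end-to-end run, the same checkpoint can also be used through the standard `text-generation` pipeline. This is a minimal sketch, not a prescribed usage pattern:

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ibm-granite/granite-4.0-tiny-preview",
    device_map="auto",  # drop this if running on CPU
)

result = generator(
    "Write a code to find the maximum value in a list of numbers.",
    max_new_tokens=100,
)
print(result[0]["generated_text"])
```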
This HF implementation was contributed by Sukriti Sharma and Alexander Brooks.
## GraniteMoeHybridConfig

[[autodoc]] GraniteMoeHybridConfig

## GraniteMoeHybridModel

[[autodoc]] GraniteMoeHybridModel
    - forward

## GraniteMoeHybridForCausalLM

[[autodoc]] GraniteMoeHybridForCausalLM
    - forward