# GraniteMoeShared

## Overview
The GraniteMoe model was proposed in Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox and Rameswar Panda.

Additionally, the `GraniteMoeSharedModel` class adds shared experts to the MoE layers.
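To illustrate the idea of a shared expert, here is a minimal sketch of a MoE layer in which every token passes through one always-active shared expert in addition to its top-k routed experts. This is an illustrative toy module, not the actual `GraniteMoeSharedModel` implementation; all class and parameter names below are made up for the example.

```python
import torch
import torch.nn as nn


class SharedExpertMoE(nn.Module):
    """Toy MoE layer with a shared expert (illustrative, not the HF implementation).

    A router selects top-k routed experts per token; the shared expert is
    applied to every token and its output is added to the routed output.
    """

    def __init__(self, hidden_size, ffn_size, num_experts, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )
        # the shared expert processes all tokens, regardless of routing
        self.shared_expert = nn.Sequential(
            nn.Linear(hidden_size, ffn_size),
            nn.SiLU(),
            nn.Linear(ffn_size, hidden_size),
        )

    def forward(self, x):
        # x: (num_tokens, hidden_size)
        logits = self.router(x)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = torch.softmax(weights, dim=-1)

        routed = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e
                if mask.any():
                    routed[mask] += weights[mask, k, None] * expert(x[mask])

        # shared-expert output is always added on top of the routed output
        return routed + self.shared_expert(x)


layer = SharedExpertMoE(hidden_size=8, ffn_size=16, num_experts=4)
out = layer(torch.randn(5, 8))
print(out.shape)  # torch.Size([5, 8])
```

The design point is that the shared expert gives every token a dense computation path, while the routed experts remain sparse.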
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-research/moe-7b-1b-active-shared-experts"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."

# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")

# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)

# decode output tokens into text
output = tokenizer.batch_decode(output)

# loop over the batch to print; in this example the batch size is 1
for i in output:
    print(i)
```
This HF implementation is contributed by Mayank Mishra, Shawn Tan, and Sukriti Sharma.
## GraniteMoeSharedConfig

[[autodoc]] GraniteMoeSharedConfig

## GraniteMoeSharedModel

[[autodoc]] GraniteMoeSharedModel
    - forward

## GraniteMoeSharedForCausalLM

[[autodoc]] GraniteMoeSharedForCausalLM
    - forward