mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-03 12:50:06 +06:00

* Create push-important-models.yml * feat: add falcon-h1 * fixup * address comment * fix * fix copies * fix copies * fix * fix * fix * fix * fix copies * fix * fix copies * fix test import to at least trigget the cis * yups * update * fix make fix copies * fix inits? * fix style * skip annoying test * add integration test for Falcon H1 * fix copies * fix --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
3.1 KiB
3.1 KiB
FalconH1
Overview
The FalconH1 model was developed by the TII Pretraining team. A comprehensive research paper covering the architecture, pretraining dynamics, experimental results, and conclusions is forthcoming. You can read more about this series in this website.
Contributors
This model was contributed by DhiyaEddine, ybelkada, JingweiZuo, IlyasChahed, and MaksimVelikanov. The original code can be found here.
FalconH1Config
Model | Depth | Dim | Attn Heads | KV | Mamba Heads | d_head | d_state | Ctx Len |
---|---|---|---|---|---|---|---|---|
H1 0.5B | 36 | 1024 | 8 | 2 | 24 | 64 / 64 | 128 | 4K, 16K-SFT |
H1 1.5B | 24 | 2048 | 8 | 2 | 48 | 128 / 64 | 256 | 128K |
H1 1.5B-d | 66 | 1280 | 6 | 2 | 24 | 128 / 64 | 256 | 128K |
H1 3B | 32 | 2560 | 10 | 2 | 32 | 128 / 128 | 256 | 128K |
H1 7B | 44 | 3072 | 12 | 2 | 24 | 128 / 128 | 256 | 256K |
H1 34B | 72 | 5120 | 20 | 4 | 32 | 128 / 128 | 256 | 256K |
autodoc FalconH1Config
FalconH1ForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon-H1-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon-H1-7B-Instruct")
message = ["Mamba is a snake with following properties "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
response = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
autodoc FalconH1ForCausalLM - forward
This HF implementation is contributed by younesbelkada and DhiaEddineRhaiem.