mirror of https://github.com/huggingface/transformers.git synced 2025-07-03 12:50:06 +06:00

* toctree

* not-doctested.txt

* collapse sections

* feedback

* update

* rewrite get started sections

* fixes

* fix

* loading models

* fix

* customize models

* share

* fix link

* contribute part 1

* contribute pt 2

* fix toctree

* tokenization pt 1

* Add new model (#32615)

* v1 - working version

* fix

* fix

* fix

* fix

* rename to correct name

* fix title

* fixup

* rename files

* fix

* add copied from on tests

* rename to `FalconMamba` everywhere and fix bugs

* fix quantization + accelerate

* fix copies

* add `torch.compile` support

* fix tests

* fix tests and add slow tests

* copies on config

* merge the latest changes

* fix tests

* add few lines about instruct

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* fix tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* "to be not" -> "not to be" (#32636)

* "to be not" -> "not to be"

* Update sam.md

* Update trainer.py

* Update modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* fix hfoption tag

* tokenization pt. 2

* image processor

* fix toctree

* backbones

* feature extractor

* fix file name

* processor

* update not-doctested

* update

* make style

* fix toctree

* revision

* make fixup

* fix toctree

* fix

* make style

* fix hfoption tag

* pipeline

* pipeline gradio

* pipeline web server

* add pipeline

* fix toctree

* not-doctested

* prompting

* llm optims

* fix toctree

* fixes

* cache

* text generation

* fix

* chat pipeline

* chat stuff

* xla

* torch.compile

* cpu inference

* toctree

* gpu inference

* agents and tools

* gguf/tiktoken

* finetune

* toctree

* trainer

* trainer pt 2

* optims

* optimizers

* accelerate

* parallelism

* fsdp

* update

* distributed cpu

* hardware training

* gpu training

* gpu training 2

* peft

* distrib debug

* deepspeed 1

* deepspeed 2

* chat toctree

* quant pt 1

* quant pt 2

* fix toctree

* fix

* fix

* quant pt 3

* quant pt 4

* serialization

* torchscript

* scripts

* tpu

* review

* model addition timeline

* modular

* more reviews

* reviews

* fix toctree

* reviews reviews

* continue reviews

* more reviews

* modular transformers

* more review

* zamba2

* fix

* all frameworks

* pytorch

* supported model frameworks

* flashattention

* rm check_table

* not-doctested.txt

* rm check_support_list.py

* feedback

* updates/feedback

* review

* feedback

* fix

* update

* feedback

* updates

* update

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

2025-03-03 10:33:46 -08:00

5.3 KiB

Raw Blame History

GPTSAN-japanese

This model is in maintenance mode only, we don't accept any new PRs changing its code. If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2. You can do so by running the following command: pip install -U transformers==4.40.2.

Overview

The GPTSAN-japanese model was released in the repository by Toshiyuki Sakamoto (tanreinama).

GPTSAN is a Japanese language model using Switch Transformer. It has the same structure as the model introduced as Prefix LM in the T5 paper, and support both Text Generation and Masked Language Modeling tasks. These basic tasks similarly can fine-tune for translation or summarization.

Usage example

The generate() method can be used to generate text using GPTSAN-Japanese model.

>>> from transformers import AutoModel, AutoTokenizer
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("Tanrei/GPTSAN-japanese")
>>> model = AutoModel.from_pretrained("Tanrei/GPTSAN-japanese").cuda()
>>> x_tok = tokenizer("は、", prefix_text="織田信長", return_tensors="pt")
>>> torch.manual_seed(0)
>>> gen_tok = model.generate(x_tok.input_ids.cuda(), token_type_ids=x_tok.token_type_ids.cuda(), max_new_tokens=20)
>>> tokenizer.decode(gen_tok[0])
'織田信長は、2004年に『戦国BASARA』のために、豊臣秀吉'

GPTSAN Features

GPTSAN has some unique features. It has a model structure of Prefix-LM. It works as a shifted Masked Language Model for Prefix Input tokens. Un-prefixed inputs behave like normal generative models. The Spout vector is a GPTSAN specific input. Spout is pre-trained with random inputs, but you can specify a class of text or an arbitrary vector during fine-tuning. This allows you to indicate the tendency of the generated text. GPTSAN has a sparse Feed Forward based on Switch-Transformer. You can also add other layers and train them partially. See the original GPTSAN repository for details.

Prefix-LM Model

GPTSAN has the structure of the model named Prefix-LM in the T5 paper. (The original GPTSAN repository calls it hybrid) In GPTSAN, the Prefix part of Prefix-LM, that is, the input position that can be referenced by both tokens, can be specified with any length. Arbitrary lengths can also be specified differently for each batch. This length applies to the text entered in prefix_text for the tokenizer. The tokenizer returns the mask of the Prefix part of Prefix-LM as token_type_ids. The model treats the part where token_type_ids is 1 as a Prefix part, that is, the input can refer to both tokens before and after.

Usage tips

Specifying the Prefix part is done with a mask passed to self-attention. When token_type_ids=None or all zero, it is equivalent to regular causal mask

for example:

x_token = tokenizer("ｱｲｳｴ") input_ids: | SOT | SEG | ｱ | ｲ | ｳ | ｴ | token_type_ids: | 1 | 0 | 0 | 0 | 0 | 0 | prefix_lm_mask: SOT | 1 0 0 0 0 0 | SEG | 1 1 0 0 0 0 | ｱ | 1 1 1 0 0 0 | ｲ | 1 1 1 1 0 0 | ｳ | 1 1 1 1 1 0 | ｴ | 1 1 1 1 1 1 |

x_token = tokenizer("", prefix_text="ｱｲｳｴ") input_ids: | SOT | ｱ | ｲ | ｳ | ｴ | SEG | token_type_ids: | 1 | 1 | 1 | 1 | 1 | 0 | prefix_lm_mask: SOT | 1 1 1 1 1 0 | ｱ | 1 1 1 1 1 0 | ｲ | 1 1 1 1 1 0 | ｳ | 1 1 1 1 1 0 | ｴ | 1 1 1 1 1 0 | SEG | 1 1 1 1 1 1 |

x_token = tokenizer("ｳｴ", prefix_text="ｱｲ") input_ids: | SOT | ｱ | ｲ | SEG | ｳ | ｴ | token_type_ids: | 1 | 1 | 1 | 0 | 0 | 0 | prefix_lm_mask: SOT | 1 1 1 0 0 0 | ｱ | 1 1 1 0 0 0 | ｲ | 1 1 1 0 0 0 | SEG | 1 1 1 1 0 0 | ｳ | 1 1 1 1 1 0 | ｴ | 1 1 1 1 1 1 |

Spout Vector

A Spout Vector is a special vector for controlling text generation. This vector is treated as the first embedding in self-attention to bring extraneous attention to the generated tokens. In the pre-trained model published from Tanrei/GPTSAN-japanese, the Spout Vector is a 128-dimensional vector that passes through 8 fully connected layers in the model and is projected into the space acting as external attention. The Spout Vector projected by the fully connected layer is split to be passed to all self-attentions.

GPTSanJapaneseConfig

autodoc GPTSanJapaneseConfig

GPTSanJapaneseTokenizer

autodoc GPTSanJapaneseTokenizer

GPTSanJapaneseModel

autodoc GPTSanJapaneseModel

GPTSanJapaneseForConditionalGeneration

autodoc GPTSanJapaneseForConditionalGeneration - forward

5.3 KiB Raw Blame History