mirror of https://github.com/huggingface/transformers.git synced 2025-07-04 05:10:06 +06:00

Transformers cli clean command (#37657)

* transformers-cli -> transformers

* Chat command works with positional argument

* update doc references to transformers-cli

* doc headers

* deepspeed

---------

Co-authored-by: Joao Gante <joao@huggingface.co>

2025-04-30 12:15:43 +01:00

6.4 KiB


GPT-2

GPT-2 is a scaled-up version of GPT, a causal transformer language model, with 10x more parameters and training data. The model was pretrained on a 40GB dataset to predict the next word in a sequence given all the previous words. This training objective enabled the model to perform many downstream tasks in a zero-shot setting.

The model architecture uses a unidirectional (causal) attention mechanism where each token can only attend to previous tokens, making it particularly effective for text generation tasks.
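To make the causal constraint concrete, here is a minimal pure-Python sketch of a masked softmax over attention scores. It is an illustration only, not the library's implementation: the function name `causal_attention_weights` and the all-zero score matrix are assumptions chosen for readability.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over scores[i][j], keeping only positions j <= i."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]                       # token i sees tokens 0..i only
        peak = max(visible)                          # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Future positions (j > i) receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights

# Hypothetical raw scores for a 3-token sequence (all equal, for readability).
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 3) for w in row])
# [1.0, 0.0, 0.0]
# [0.5, 0.5, 0.0]
# [0.333, 0.333, 0.333]
```

Each row sums to 1, and the upper triangle stays at zero, so the prediction for a token never depends on tokens that come after it.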

You can find all the original GPT-2 checkpoints under the OpenAI community organization.

Tip

Click on the GPT-2 models in the right sidebar for more examples of how to apply GPT-2 to different language tasks.

The examples below demonstrate how to generate text with [Pipeline], with [AutoModel], and from the command line.

import torch
from transformers import pipeline

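# Use a name other than `pipeline` for the result so the imported factory function is not shadowed.
generator = pipeline(task="text-generation", model="openai-community/gpt2", torch_dtype=torch.float16, device=0)
generator("Hello, I'm a language model")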

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2", torch_dtype=torch.float16, device_map="auto", attn_implementation="sdpa")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

# Tokenize the prompt and move the tensors to the device the model was placed on.
inputs = tokenizer("Hello, I'm a language model", return_tensors="pt").to(model.device)

output = model.generate(**inputs, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))

echo -e "Hello, I'm a language model" | transformers run --task text-generation --model openai-community/gpt2 --device 0

Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the Quantization overview for more available quantization backends.

The example below uses bitsandbytes to quantize only the weights to 4-bit precision.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

model = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt2-xl",
    quantization_config=quantization_config,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2-xl")
inputs = tokenizer("Once upon a time, there was a magical forest", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
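For a rough sense of why lower precision helps, the sketch below estimates the memory taken by the weights alone at different bit widths. The parameter count of about 1.5 billion for gpt2-xl is an assumption (the commonly cited figure), and `weight_memory_gib` is a hypothetical helper, not a library function.

```python
# Assumption: roughly 1.5 billion parameters for gpt2-xl (commonly cited size).
NUM_PARAMS = 1.5e9

BITS_PER_WEIGHT = {"float32": 32, "float16": 16, "4-bit": 4}

def weight_memory_gib(num_params, bits_per_weight):
    """Memory for the weights alone, in GiB (bits -> bytes -> GiB)."""
    return num_params * bits_per_weight / 8 / 2**30

for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>8}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Treat these numbers as a lower bound. Real usage is higher because of quantization constants, layers that stay in higher precision, activations, and the key-value cache.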

Notes

Pad inputs on the right because GPT-2 uses absolute position embeddings.
GPT-2 can reuse previously computed key-value attention pairs. Access this feature with the past_key_values parameter in [GPT2Model.forward].
Enable the scale_attn_by_inverse_layer_idx and reorder_and_upcast_attn parameters to apply the training stability improvements from Mistral (the Stanford CRFM training codebase, not the Mistral AI models).
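The first note, on right padding, can be illustrated with a small pure-Python sketch. It assumes positions are assigned by index over the padded sequence, which is what happens when no explicit position ids are supplied. The `PAD` marker and both helper functions are hypothetical and exist only for this illustration.

```python
PAD = "<pad>"  # hypothetical pad marker, for illustration only

def pad(tokens, length, side):
    """Pad a token list to `length` on the given side ("right" or "left")."""
    padding = [PAD] * (length - len(tokens))
    return tokens + padding if side == "right" else padding + tokens

def default_positions(padded):
    """Pair each real token with its index in the padded sequence."""
    return [(tok, pos) for pos, tok in enumerate(padded) if tok != PAD]

tokens = ["Hello", ",", "world"]
right = default_positions(pad(tokens, 5, "right"))
left = default_positions(pad(tokens, 5, "left"))
print(right)  # [('Hello', 0), (',', 1), ('world', 2)]
print(left)   # [('Hello', 2), (',', 3), ('world', 4)]
```

With right padding, the real tokens keep the positions 0, 1, 2 that they would have in an unpadded sequence. With left padding they shift to later positions, so each token is paired with a different absolute position embedding than it would get without padding.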

GPT2Config

autodoc GPT2Config

GPT2Tokenizer

autodoc GPT2Tokenizer - save_vocabulary

GPT2TokenizerFast

autodoc GPT2TokenizerFast

GPT2 specific outputs

autodoc models.gpt2.modeling_gpt2.GPT2DoubleHeadsModelOutput

autodoc models.gpt2.modeling_tf_gpt2.TFGPT2DoubleHeadsModelOutput

GPT2Model

autodoc GPT2Model - forward

GPT2LMHeadModel

autodoc GPT2LMHeadModel - forward

GPT2DoubleHeadsModel

autodoc GPT2DoubleHeadsModel - forward

GPT2ForQuestionAnswering

autodoc GPT2ForQuestionAnswering - forward

GPT2ForSequenceClassification

autodoc GPT2ForSequenceClassification - forward

GPT2ForTokenClassification

autodoc GPT2ForTokenClassification - forward

TFGPT2Model

autodoc TFGPT2Model - call

TFGPT2LMHeadModel

autodoc TFGPT2LMHeadModel - call

TFGPT2DoubleHeadsModel

autodoc TFGPT2DoubleHeadsModel - call

TFGPT2ForSequenceClassification

autodoc TFGPT2ForSequenceClassification - call

TFSequenceClassifierOutputWithPast

autodoc modeling_tf_outputs.TFSequenceClassifierOutputWithPast

TFGPT2Tokenizer

autodoc TFGPT2Tokenizer

FlaxGPT2Model

autodoc FlaxGPT2Model - call

FlaxGPT2LMHeadModel

autodoc FlaxGPT2LMHeadModel - call
