mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-03 12:50:06 +06:00
New gpt neo model card (#38505)
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
* updated toctree (EN).
* Commit for new_gpt_model_card.
* Update docs/source/en/model_doc/gpt_neo.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
parent 8046aff520
commit 8e1266de2b
@@ -14,93 +14,94 @@ rendered properly in your Markdown viewer.
-->
-->
# GPT Neo

<div style="float: right;">
    <div class="flex flex-wrap space-x-1">
        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
        <img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=
">
        <img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
    </div>

## Overview

The GPTNeo model was released in the [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) repository by Sid Black, Stella Biderman, Leo Gao, Phil Wang, and Connor Leahy. It is a GPT-2-like causal language model trained on the [Pile](https://pile.eleuther.ai/) dataset.

The architecture is similar to GPT-2, except that GPT Neo uses local attention in every other layer with a window size of 256 tokens.
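
The alternating global/local pattern can be sketched in plain Python. This is a minimal illustration, not the transformers implementation. It assumes the layers alternate starting with a global layer (the order used by the default `attention_types` in `GPTNeoConfig`) and that a local layer lets each token attend to a window of `window_size` positions ending at itself. The helper names `attention_layer_types` and `causal_mask` are hypothetical.

```py
from typing import List


def attention_layer_types(num_layers: int) -> List[str]:
    # Assumed order: even-indexed layers are global, odd-indexed layers are local.
    return ["global" if i % 2 == 0 else "local" for i in range(num_layers)]


def causal_mask(seq_len: int, layer_type: str, window_size: int = 256) -> List[List[bool]]:
    # mask[q][k] is True when the token at position q may attend to position k.
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            allowed = k <= q  # causal: never attend to future tokens
            if layer_type == "local":
                # Local layers only see the last `window_size` positions, including q itself.
                allowed = allowed and (q - k) < window_size
            row.append(allowed)
        mask.append(row)
    return mask


print(attention_layer_types(4))  # ['global', 'local', 'global', 'local']

# A tiny window makes the band visible: each printed row is one query position.
for row in causal_mask(5, "local", window_size=2):
    print("".join("x" if allowed else "." for allowed in row))
```

With `window_size=2`, each row has at most two `x` marks forming a diagonal band, while a global layer keeps the full lower triangle.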

This model was contributed by [valhalla](https://huggingface.co/valhalla).

## Usage example

The `generate()` method can be used to generate text with the GPT Neo model.

```python
>>> from transformers import GPTNeoForCausalLM, GPT2Tokenizer

>>> model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
>>> tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

>>> prompt = (
...     "In a shocking finding, scientists discovered a herd of unicorns living in a remote, "
...     "previously unexplored valley, in the Andes Mountains. Even more surprising to the "
...     "researchers was the fact that the unicorns spoke perfect English."
... )

>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids

>>> gen_tokens = model.generate(
...     input_ids,
...     do_sample=True,
...     temperature=0.9,
...     max_length=100,
... )
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
```
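
With `do_sample=True`, `generate()` samples each next token instead of always taking the most likely one, and `temperature` rescales the logits first (they are divided by the temperature before the softmax). The sketch below illustrates that idea in plain Python on made-up logits. It is a simplified stand-in, not the transformers sampling code, which can also apply other logits processors (for example top-k filtering). The helper names are hypothetical.

```py
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float = 0.9) -> List[float]:
    # Dividing by a temperature < 1.0 sharpens the distribution; > 1.0 flattens it.
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(
    logits: List[float], temperature: float = 0.9, rng: Optional[random.Random] = None
) -> int:
    # Draw one token id with probability given by the temperature-scaled softmax.
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for a 3-token vocabulary
print(softmax_with_temperature(toy_logits, temperature=0.5))  # more peaked
print(softmax_with_temperature(toy_logits, temperature=1.0))  # plain softmax
print(sample_next_token(toy_logits, rng=random.Random(0)))
```

A temperature slightly below 1.0, such as the `0.9` used above, nudges sampling toward higher-probability tokens while keeping some variety.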

## Combining GPT-Neo and Flash Attention 2

First, make sure to install the latest version of Flash Attention 2 to include the sliding window attention feature, and make sure your hardware is compatible with Flash Attention 2. See the [Flash Attention 2 installation notes](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2) for more details.

Also make sure to load your model in half precision (e.g. `torch.float16`).

To load and run a model using Flash Attention 2, refer to the snippet below:

```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> device = "cuda" # the device to load the model onto

>>> model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B", torch_dtype=torch.float16, attn_implementation="flash_attention_2")
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

>>> prompt = "def hello_world():"

>>> model_inputs = tokenizer([prompt], return_tensors="pt").to(device)
>>> model.to(device)

>>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
"def hello_world():\n >>> run_script("hello.py")\n >>> exit(0)\n<|endoftext|>"
```
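
If the same script has to run on machines with and without Flash Attention 2, one option is to choose `attn_implementation` at load time. The sketch below only checks whether the `flash_attn` package is importable and otherwise falls back to the plain `"eager"` implementation. It does not check GPU compatibility or dtype, which still have to be satisfied separately. The helper name `pick_attn_implementation` is hypothetical.

```py
import importlib.util


def pick_attn_implementation() -> str:
    # Hypothetical helper: request FlashAttention-2 only when the flash_attn package is installed.
    # This does NOT verify that the GPU supports FlashAttention-2.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"


attn_implementation = pick_attn_implementation()
print(attn_implementation)

# The result would then be passed to from_pretrained, for example:
# model = AutoModelForCausalLM.from_pretrained(
#     "EleutherAI/gpt-neo-2.7B", torch_dtype=torch.float16, attn_implementation=attn_implementation
# )
```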

### Expected speedups

Below is an expected speedup diagram that compares pure inference time between the native implementation in transformers and the Flash Attention 2 version of the model, using the `EleutherAI/gpt-neo-2.7B` checkpoint.

Note that GPT-Neo cannot be trained or run on very long contexts because its maximum number of [position embeddings](https://huggingface.co/EleutherAI/gpt-neo-2.7B/blob/main/config.json#L58) is 2048. This limit applies to all GPT-Neo models and is not specific to Flash Attention 2.

<div style="text-align: center">
<img src="https://user-images.githubusercontent.com/49240599/272241893-b1c66b75-3a48-4265-bc47-688448568b3d.png">
</div>
</div>
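
Because the prompt and the generated tokens share the same 2048 positions, it can help to check how many new tokens still fit before calling `generate()`. A minimal sketch, assuming the 2048 limit from the `gpt-neo-2.7B` config linked above; the function name `max_new_tokens_budget` is hypothetical.

```py
MAX_POSITION_EMBEDDINGS = 2048  # from the EleutherAI/gpt-neo-2.7B config


def max_new_tokens_budget(
    prompt_len: int, requested: int, max_positions: int = MAX_POSITION_EMBEDDINGS
) -> int:
    # Prompt tokens plus generated tokens may not exceed max_positions.
    if prompt_len >= max_positions:
        raise ValueError(
            f"prompt of {prompt_len} tokens leaves no room; truncate it below {max_positions}"
        )
    return min(requested, max_positions - prompt_len)


print(max_new_tokens_budget(prompt_len=2000, requested=100))  # 48
print(max_new_tokens_budget(prompt_len=50, requested=100))  # 100
```

The returned value could be passed as `max_new_tokens`; prompts that already fill the window need truncation first.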

## Resources

- [Text classification task guide](../tasks/sequence_classification)
- [Causal language modeling task guide](../tasks/language_modeling)

## GPT-Neo

[GPT-Neo](https://zenodo.org/records/5297715) is an open-source alternative to GPT-2 and GPT-3 models, built with Mesh TensorFlow for TPUs. GPT-Neo uses local attention in every other layer for efficiency. It is trained on the [Pile](https://huggingface.co/datasets/EleutherAI/pile), a diverse dataset consisting of 22 smaller high-quality datasets.

You can find all the original GPT-Neo checkpoints under the [EleutherAI](https://huggingface.co/EleutherAI?search_models=gpt-neo) organization.

> [!TIP]
> Click on the GPT-Neo models in the right sidebar for more examples of how to apply GPT Neo to different language tasks.

The example below demonstrates how to generate text with [`Pipeline`] or the [`AutoModel`], and from the command line.

<hfoptions id="usage">
<hfoption id="Pipeline">

```py
import torch
from transformers import pipeline

pipeline = pipeline(task="text-generation", model="EleutherAI/gpt-neo-1.3B", torch_dtype=torch.float16, device=0)
pipeline("Hello, I'm a language model")
```

</hfoption>
<hfoption id="AutoModel">

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B", torch_dtype=torch.float16, device_map="auto", attn_implementation="flash_attention_2")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

# The tokenizer returns a dict-like BatchEncoding (input_ids and attention_mask)
inputs = tokenizer("Hello, I'm a language model", return_tensors="pt").to("cuda")

output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

</hfoption>
<hfoption id="transformers CLI">

```bash
echo -e "Hello, I'm a language model" | transformers-cli run --task text-generation --model EleutherAI/gpt-neo-1.3B --device 0
```

</hfoption>
</hfoptions>

Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.

The example below uses [bitsandbytes](../quantization/bitsandbytes) to quantize only the weights to 4 bits.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True
)

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-2.7B",
    quantization_config=quantization_config,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
inputs = tokenizer("Hello, I'm a language model", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
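
A rough estimate shows why 4-bit weights help. This back-of-the-envelope sketch assumes about 2.7 billion parameters (taken from the checkpoint name) and counts only the weights, ignoring activations, the KV cache, any layers bitsandbytes leaves unquantized, and the small overhead of quantization constants (which `bnb_4bit_use_double_quant=True` reduces). The helper `weight_memory_gib` is hypothetical.

```py
NUM_PARAMS = 2.7e9  # approximate, from the checkpoint name EleutherAI/gpt-neo-2.7B


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    # Weights only: parameter count x bytes per parameter, converted to GiB.
    return num_params * bytes_per_param / 1024**3


for label, nbytes in [("float32", 4.0), ("float16", 2.0), ("4-bit (nf4)", 0.5)]:
    print(f"{label:>12}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB of weights")
```

Going from float16 to 4-bit cuts the weight footprint by roughly 4x; actual savings are somewhat smaller because of the overheads listed above.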

## Notes

- Pad inputs on the right because GPT-Neo uses absolute position embeddings.
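
When position ids are taken from the sequence index, right padding keeps each sequence's real tokens at positions 0, 1, 2, and so on, matching how the absolute position embeddings see unpadded inputs. With a transformers tokenizer this is usually controlled by `tokenizer.padding_side = "right"`; the plain-Python sketch below only illustrates what a right-padded batch and its attention mask look like. It assumes the `<|endoftext|>` id 50256 is reused as the padding id, since the GPT-2 tokenizer used by GPT-Neo defines no pad token. The token ids and the helper `right_pad` are hypothetical.

```py
from typing import List, Tuple

PAD_ID = 50256  # assumption: reuse the <|endoftext|> id as padding


def right_pad(
    batch: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    # Append pad_id to the end of each sequence so real tokens keep positions 0..len-1.
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return input_ids, attention_mask


ids, mask = right_pad([[15496, 11, 314], [15496]])  # hypothetical token ids
print(ids)   # [[15496, 11, 314], [15496, 50256, 50256]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```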

## GPTNeoConfig