From 8e1266de2b480fef402120dee21b9261dd89ea28 Mon Sep 17 00:00:00 2001
From: RogerSinghChugh <35698080+RogerSinghChugh@users.noreply.github.com>
Date: Wed, 4 Jun 2025 22:26:47 +0530
Subject: [PATCH] New gpt neo model card (#38505)

* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Commit for new_gpt_model_card.
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt_neo.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/gpt_neo.md | 167 ++++++++++++++--------------
 1 file changed, 84 insertions(+), 83 deletions(-)

diff --git a/docs/source/en/model_doc/gpt_neo.md b/docs/source/en/model_doc/gpt_neo.md
index f90e0d18498..3830f04378c 100644
--- a/docs/source/en/model_doc/gpt_neo.md
+++ b/docs/source/en/model_doc/gpt_neo.md
@@ -14,93 +14,94 @@ rendered properly in your Markdown viewer.
 -->
 
-# GPT Neo
-
-<div class="flex flex-wrap space-x-1">
-PyTorch
-Flax
-FlashAttention
-</div>
-
-## Overview
-
-The GPTNeo model was released in the [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) repository by Sid
-Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT2 like causal language model trained on the
-[Pile](https://pile.eleuther.ai/) dataset.
-
-The architecture is similar to GPT2 except that GPT Neo uses local attention in every other layer with a window size of
-256 tokens.
-
-This model was contributed by [valhalla](https://huggingface.co/valhalla).
-
-## Usage example
-
-The `generate()` method can be used to generate text using GPT Neo model.
-
-```python
->>> from transformers import GPTNeoForCausalLM, GPT2Tokenizer
-
->>> model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
->>> tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
-
->>> prompt = (
-...     "In a shocking finding, scientists discovered a herd of unicorns living in a remote, "
-...     "previously unexplored valley, in the Andes Mountains. Even more surprising to the "
-...     "researchers was the fact that the unicorns spoke perfect English."
-... )
-
->>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
-
->>> gen_tokens = model.generate(
-...     input_ids,
-...     do_sample=True,
-...     temperature=0.9,
-...     max_length=100,
-... )
->>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
-```
-
-## Combining GPT-Neo and Flash Attention 2
-
-First, make sure to install the latest version of Flash Attention 2 to include the sliding window attention feature, and make sure your hardware is compatible with Flash-Attention 2. More details are available [here](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2) concerning the installation.
-
-Make sure as well to load your model in half-precision (e.g. `torch.float16`).
-
-To load and run a model using Flash Attention 2, refer to the snippet below:
-
-```python
->>> import torch
->>> from transformers import AutoModelForCausalLM, AutoTokenizer
->>> device = "cuda" # the device to load the model onto
-
->>> model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B", torch_dtype=torch.float16, attn_implementation="flash_attention_2")
->>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
-
->>> prompt = "def hello_world():"
-
->>> model_inputs = tokenizer([prompt], return_tensors="pt").to(device)
->>> model.to(device)
-
->>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
->>> tokenizer.batch_decode(generated_ids)[0]
-"def hello_world():\n    >>> run_script("hello.py")\n    >>> exit(0)\n<|endoftext|>"
-```
-
-### Expected speedups
-
-Below is an expected speedup diagram that compares pure inference time between the native implementation in transformers using `EleutherAI/gpt-neo-2.7B` checkpoint and the Flash Attention 2 version of the model.
-Note that for GPT-Neo it is not possible to train / run on very long context as the max [position embeddings](https://huggingface.co/EleutherAI/gpt-neo-2.7B/blob/main/config.json#L58 ) is limited to 2048 - but this is applicable to all gpt-neo models and not specific to FA-2
-
-
+<div style="float: right;">
+    <div class="flex flex-wrap space-x-1">
+        PyTorch
+        Flax
+        FlashAttention
+    </div>
+</div>
-## Resources
+## GPT-Neo
 
-- [Text classification task guide](../tasks/sequence_classification)
-- [Causal language modeling task guide](../tasks/language_modeling)
+[GPT-Neo](https://zenodo.org/records/5297715) is an open-source alternative to GPT-2 and GPT-3 models, built with Mesh TensorFlow for TPUs. GPT-Neo uses local attention in every other layer, with a window size of 256 tokens, to improve efficiency. It is trained on the [Pile](https://huggingface.co/datasets/EleutherAI/pile), a diverse dataset consisting of 22 smaller high-quality datasets.
+
+You can find all the original GPT-Neo checkpoints under the [EleutherAI](https://huggingface.co/EleutherAI?search_models=gpt-neo) organization.
+
+> [!TIP]
+> Click on the GPT-Neo models in the right sidebar for more examples of how to apply GPT-Neo to different language tasks.
+
+The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`], or from the command line.
+
+<hfoptions id="usage">
+<hfoption id="Pipeline">
+
+```py
+import torch
+from transformers import pipeline
+
+pipeline = pipeline(task="text-generation", model="EleutherAI/gpt-neo-1.3B", torch_dtype=torch.float16, device=0)
+pipeline("Hello, I'm a language model")
+```
+
+</hfoption>
+<hfoption id="AutoModel">
+
+```py
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B", torch_dtype=torch.float16, device_map="auto", attn_implementation="flash_attention_2")
+tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
+
+input_ids = tokenizer("Hello, I'm a language model", return_tensors="pt").to("cuda")
+
+output = model.generate(**input_ids)
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+```
+
+</hfoption>
+<hfoption id="transformers CLI">
+
+```bash
+echo -e "Hello, I'm a language model" | transformers-cli run --task text-generation --model EleutherAI/gpt-neo-1.3B --device 0
+```
+
+</hfoption>
+</hfoptions>
+
+Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
+
+The example below uses [bitsandbytes](../quantization/bitsandbytes) to only quantize the weights to 4-bits.
+
+```py
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype="float16",
+    bnb_4bit_use_double_quant=True
+)
+
+model = AutoModelForCausalLM.from_pretrained(
+    "EleutherAI/gpt-neo-2.7B",
+    quantization_config=quantization_config,
+    device_map="auto"
+)
+
+tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
+inputs = tokenizer("Hello, I'm a language model", return_tensors="pt").to("cuda")
+outputs = model.generate(**inputs, max_new_tokens=100)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
+## Notes
+
+- Pad inputs on the right because GPT-Neo uses absolute position embeddings.
 
 ## GPTNeoConfig
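The card's intro mentions the alternating global/local attention pattern but does not show where it is defined. As a minimal sketch, not part of the patch itself, the pattern can be inspected through the existing `GPTNeoConfig` fields `attention_types` and `window_size`; the printed values reflect a released checkpoint's config rather than anything this PR changes.

```py
from transformers import GPTNeoConfig

# Load the configuration shipped with a released GPT-Neo checkpoint.
config = GPTNeoConfig.from_pretrained("EleutherAI/gpt-neo-1.3B")

# attention_types encodes the per-layer pattern, e.g. [[["global", "local"], 12]]:
# the ("global", "local") pair repeated 12 times, so every other layer uses
# local attention.
print(config.attention_types)

# Local attention layers attend only within a sliding window of this many tokens.
print(config.window_size)
```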
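The right-padding note in the card's Notes section matters once inputs are batched. A minimal sketch of the assumed setup: GPT-Neo's GPT-2-style tokenizer ships without a pad token, so the common workaround of reusing the EOS token is applied before padding on the right.

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
# Assumption: reuse EOS as the pad token, since none is defined by default.
tokenizer.pad_token = tokenizer.eos_token
# Pad on the right so real tokens keep positions 0..n-1 under absolute
# position embeddings, per the note above.
tokenizer.padding_side = "right"

batch = tokenizer(
    ["Hello, I'm a language model", "GPT-Neo is"],
    padding=True,
    return_tensors="pt",
)
print(batch.input_ids.shape)  # both sequences padded to the longest one
print(batch.attention_mask)   # zeros mark the right-hand padding positions
```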