
PyTorch Flax

BigBird

BigBird is a transformer model built to handle sequence lengths of up to 4096, compared to 512 for BERT. Traditional transformers struggle with long inputs because full attention compares every token with every other token, so its cost grows quadratically with sequence length. BigBird avoids this with a sparse attention mechanism, which means it doesn't try to look at everything at once. Instead, it mixes local attention, random attention, and a few global tokens to process the whole input. This combination gives it the best of both worlds: the computation stays efficient while the model still captures enough of the sequence to understand it well. Because of this, BigBird is well suited to tasks involving long documents, such as question answering, summarization, and genomic applications.
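
The local/random/global mix is controlled by a few fields on `BigBirdConfig`. The sketch below is only meant to show which knobs exist; the values shown are the library defaults rather than a tuned configuration.

```py
from transformers import BigBirdConfig, BigBirdModel

# Block-sparse attention: each block of queries attends to neighboring blocks (local),
# a few randomly sampled blocks (random), and the global blocks.
config = BigBirdConfig(
    attention_type="block_sparse",  # "original_full" switches back to regular attention
    block_size=64,                  # number of tokens per attention block
    num_random_blocks=3,            # random blocks attended to by each query block
)
model = BigBirdModel(config)

# The same settings can be overridden when loading a pretrained checkpoint.
model = BigBirdModel.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type="block_sparse",
    block_size=64,
    num_random_blocks=3,
)
```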

You can find all the original BigBird checkpoints under the Google organization.

Tip

Click on the BigBird models in the right sidebar for more examples of how to apply BigBird to different language tasks.

The example below demonstrates how to predict the [MASK] token with [Pipeline], [AutoModel], and from the command line.

Pipeline:

```py
import torch
from transformers import pipeline

pipeline = pipeline(
    task="fill-mask",
    model="google/bigbird-roberta-base",
    torch_dtype=torch.float16,
    device=0
)
pipeline("Plants create [MASK] through a process known as photosynthesis.")
```

AutoModel:

```py
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "google/bigbird-roberta-base",
)
model = AutoModelForMaskedLM.from_pretrained(
    "google/bigbird-roberta-base",
    torch_dtype=torch.float16,
    device_map="auto",
)
inputs = tokenizer("Plants create [MASK] through a process known as photosynthesis.", return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print(f"The predicted token is: {predicted_token}")
```

transformers-cli:

```bash
!echo -e "Plants create [MASK] through a process known as photosynthesis." | transformers-cli run --task fill-mask --model google/bigbird-roberta-base --device 0
```

Notes

  • Inputs should be padded on the right because BigBird uses absolute position embeddings.
  • BigBird supports `original_full` and `block_sparse` attention. If the input sequence length is less than 1024, it is recommended to use `original_full` since sparse patterns don't offer much benefit for smaller inputs (see the sketch after this list).
  • The current implementation uses a window size of 3 blocks and 2 global blocks, only supports the ITC implementation, and doesn't support `num_random_blocks=0`.
  • The sequence length must be divisible by the block size.
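
For example, a short input can be run with full attention by overriding the attention type at load time; this is a minimal sketch rather than a benchmark.

```py
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")

# For sequences shorter than ~1024 tokens, full attention avoids the overhead of the sparse pattern.
model = AutoModelForMaskedLM.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type="original_full",
)

# BigBird uses absolute position embeddings, so pad on the right
# (the right side is this tokenizer's default padding side).
inputs = tokenizer(
    ["Plants create [MASK] through a process known as photosynthesis."],
    padding=True,
    return_tensors="pt",
)
outputs = model(**inputs)
```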

Resources

  • Read the BigBird blog post for more details about how its attention works.

BigBirdConfig

[[autodoc]] BigBirdConfig

BigBirdTokenizer

[[autodoc]] BigBirdTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

BigBirdTokenizerFast

[[autodoc]] BigBirdTokenizerFast

BigBird specific outputs

[[autodoc]] models.big_bird.modeling_big_bird.BigBirdForPreTrainingOutput

BigBirdModel

[[autodoc]] BigBirdModel
    - forward

BigBirdForPreTraining

[[autodoc]] BigBirdForPreTraining
    - forward

BigBirdForCausalLM

[[autodoc]] BigBirdForCausalLM
    - forward

BigBirdForMaskedLM

[[autodoc]] BigBirdForMaskedLM
    - forward

BigBirdForSequenceClassification

[[autodoc]] BigBirdForSequenceClassification
    - forward

BigBirdForMultipleChoice

[[autodoc]] BigBirdForMultipleChoice
    - forward

BigBirdForTokenClassification

[[autodoc]] BigBirdForTokenClassification
    - forward

BigBirdForQuestionAnswering

[[autodoc]] BigBirdForQuestionAnswering
    - forward

FlaxBigBirdModel

[[autodoc]] FlaxBigBirdModel
    - __call__

FlaxBigBirdForPreTraining

[[autodoc]] FlaxBigBirdForPreTraining
    - __call__

FlaxBigBirdForCausalLM

[[autodoc]] FlaxBigBirdForCausalLM
    - __call__

FlaxBigBirdForMaskedLM

[[autodoc]] FlaxBigBirdForMaskedLM
    - __call__

FlaxBigBirdForSequenceClassification

[[autodoc]] FlaxBigBirdForSequenceClassification
    - __call__

FlaxBigBirdForMultipleChoice

[[autodoc]] FlaxBigBirdForMultipleChoice
    - __call__

FlaxBigBirdForTokenClassification

[[autodoc]] FlaxBigBirdForTokenClassification
    - __call__

FlaxBigBirdForQuestionAnswering

[[autodoc]] FlaxBigBirdForQuestionAnswering
    - __call__