transformers/docs/source/en/model_doc/bert.md

PyTorch TensorFlow Flax SDPA

BERT

BERT is a bidirectional transformer pretrained on unlabeled text with two objectives: predicting masked tokens in a sentence and predicting whether one sentence follows another. Because tokens are masked at random, the model learns to use context on both the left and the right of each masked position, which gives it a fuller picture of the sentence than a left-to-right model gets. BERT is also versatile: its learned language representations can be adapted to other NLP tasks by fine-tuning an additional layer or head.
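To make the masking idea concrete, here is a minimal pure-Python sketch of how a sentence might be turned into a masked-language-modeling example. The helper name `mask_tokens`, the whitespace tokenization, and the masking probability are assumptions chosen for illustration; the real pretraining recipe works on WordPiece tokens and is more involved.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns (masked_tokens, labels). labels[i] holds the original token at
    masked positions and None elsewhere, so only masked positions would
    contribute to a training loss.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(token)  # the model must recover this token
        else:
            masked.append(token)
            labels.append(None)   # unmasked position, not predicted
    return masked, labels

# Whitespace splitting stands in for a real tokenizer here.
tokens = "plants create energy through a process known as photosynthesis".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

Every unmasked position keeps its token, so a model reading `masked` sees context on both sides of each `[MASK]`.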

You can find all the original BERT checkpoints under the BERT collection.

Tip

Click on the BERT models in the right sidebar for more examples of how to apply BERT to different language tasks.

The example below demonstrates how to predict the [MASK] token with [Pipeline], [AutoModel], and from the command line.

# Option 1: the high-level Pipeline API
import torch
from transformers import pipeline

# Use a distinct name so the imported pipeline() factory is not shadowed.
fill_mask = pipeline(
    task="fill-mask",
    model="google-bert/bert-base-uncased",
    torch_dtype=torch.float16,
    device=0,  # first GPU
)
fill_mask("Plants create [MASK] through a process known as photosynthesis.")

# Option 2: AutoModel with explicit tokenization and decoding
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "google-bert/bert-base-uncased",
)
model = AutoModelForMaskedLM.from_pretrained(
    "google-bert/bert-base-uncased",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa"
)
# Move the inputs to whichever device device_map placed the model on.
inputs = tokenizer("Plants create [MASK] through a process known as photosynthesis.", return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

# Locate the [MASK] position and take the highest-scoring vocabulary id there.
masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print(f"The predicted token is: {predicted_token}")

# Option 3: the transformers command line
echo -e "Plants create [MASK] through a process known as photosynthesis." | transformers run --task fill-mask --model google-bert/bert-base-uncased --device 0
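
The AutoModel example keeps only the single highest-scoring token at the masked position. As a rough illustration of how several ranked candidates with scores can be derived from the same logits, the sketch below applies a softmax and keeps the top `k` entries, using only the standard library. The helper name `top_k_probs`, the four-word vocabulary, and the logit values are all made up for illustration and are not real model output.

```python
import math

def top_k_probs(logits, vocab, k=3):
    """Softmax the logits for one position and return the k most likely tokens."""
    largest = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical vocabulary and logits for the [MASK] position.
vocab = ["energy", "oxygen", "food", "water"]
logits = [4.0, 3.0, 2.5, 0.5]

for token, prob in top_k_probs(logits, vocab):
    print(f"{token}: {prob:.3f}")
```

Taking only the first entry of the ranked list is equivalent to the `argmax` used in the AutoModel example.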

Notes

  • Inputs should be padded on the right because BERT uses absolute position embeddings.
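
The note above can be illustrated with a small standard-library sketch. BERT adds a learned embedding for each absolute position index, so where the padding goes decides which position indexes the real tokens receive. The helper `pad_batch` is hypothetical; the ids 0, 101, and 102 are assumed to be `[PAD]`, `[CLS]`, and `[SEP]` as in the `bert-base-uncased` vocabulary, and the remaining ids are placeholders.

```python
PAD_ID = 0  # assumed [PAD] id (bert-base-uncased vocabulary)

def pad_batch(batch, side="right", pad_id=PAD_ID):
    """Pad every sequence to the longest length; return (input_ids, attention_mask)."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        if side == "right":
            input_ids.append(ids + [pad_id] * n_pad)
            attention_mask.append([1] * len(ids) + [0] * n_pad)
        else:
            input_ids.append([pad_id] * n_pad + ids)
            attention_mask.append([0] * n_pad + [1] * len(ids))
    return input_ids, attention_mask

# 101 / 102 stand for [CLS] / [SEP]; 7, 8, 9 are placeholder word ids.
batch = [[101, 7, 102], [101, 7, 8, 9, 102]]

right_ids, right_mask = pad_batch(batch, side="right")
left_ids, left_mask = pad_batch(batch, side="left")

# Position index of the first real token ([CLS]) in the shorter sequence.
print(right_mask[0].index(1))  # right padding
print(left_mask[0].index(1))   # left padding
```

With right padding, `[CLS]` in the shorter sequence stays at position 0, as it would in an unpadded input. With left padding it moves to position 2, so every real token is paired with a position embedding different from the one it would get without padding.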

BertConfig

autodoc BertConfig - all

BertTokenizer

autodoc BertTokenizer - build_inputs_with_special_tokens - get_special_tokens_mask - create_token_type_ids_from_sequences - save_vocabulary
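
As a rough, standalone illustration of the layout the methods listed above produce for a sentence pair, the sketch below reimplements it on plain Python lists. It is not the library code: the function names are hypothetical stand-ins, 101 and 102 are assumed to be the `[CLS]` and `[SEP]` ids of `bert-base-uncased`, and the other ids are placeholders.

```python
CLS_ID, SEP_ID = 101, 102  # assumed [CLS] / [SEP] ids (bert-base-uncased)

def build_inputs(ids_a, ids_b=None):
    """Lay out ids as [CLS] A [SEP], or [CLS] A [SEP] B [SEP] for a pair."""
    out = [CLS_ID] + ids_a + [SEP_ID]
    if ids_b is not None:
        out += ids_b + [SEP_ID]
    return out

def token_type_ids(ids_a, ids_b=None):
    """0 for [CLS], A, and the first [SEP]; 1 for B and the final [SEP]."""
    types = [0] * (len(ids_a) + 2)
    if ids_b is not None:
        types += [1] * (len(ids_b) + 1)
    return types

def special_tokens_mask(ids_a, ids_b=None):
    """1 where a special token sits, 0 for ordinary sequence tokens."""
    mask = [1] + [0] * len(ids_a) + [1]
    if ids_b is not None:
        mask += [0] * len(ids_b) + [1]
    return mask

ids_a, ids_b = [7, 8], [9]  # placeholder word ids
print(build_inputs(ids_a, ids_b))
print(token_type_ids(ids_a, ids_b))
print(special_tokens_mask(ids_a, ids_b))
```

The three lists always have the same length, which is what lets them be passed to the model side by side.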

BertTokenizerFast

autodoc BertTokenizerFast

BertModel

autodoc BertModel - forward

BertForPreTraining

autodoc BertForPreTraining - forward

BertLMHeadModel

autodoc BertLMHeadModel - forward

BertForMaskedLM

autodoc BertForMaskedLM - forward

BertForNextSentencePrediction

autodoc BertForNextSentencePrediction - forward

BertForSequenceClassification

autodoc BertForSequenceClassification - forward

BertForMultipleChoice

autodoc BertForMultipleChoice - forward

BertForTokenClassification

autodoc BertForTokenClassification - forward

BertForQuestionAnswering

autodoc BertForQuestionAnswering - forward

TFBertTokenizer

autodoc TFBertTokenizer

TFBertModel

autodoc TFBertModel - call

TFBertForPreTraining

autodoc TFBertForPreTraining - call

TFBertLMHeadModel

autodoc TFBertLMHeadModel - call

TFBertForMaskedLM

autodoc TFBertForMaskedLM - call

TFBertForNextSentencePrediction

autodoc TFBertForNextSentencePrediction - call

TFBertForSequenceClassification

autodoc TFBertForSequenceClassification - call

TFBertForMultipleChoice

autodoc TFBertForMultipleChoice - call

TFBertForTokenClassification

autodoc TFBertForTokenClassification - call

TFBertForQuestionAnswering

autodoc TFBertForQuestionAnswering - call

FlaxBertModel

autodoc FlaxBertModel - call

FlaxBertForPreTraining

autodoc FlaxBertForPreTraining - call

FlaxBertForCausalLM

autodoc FlaxBertForCausalLM - call

FlaxBertForMaskedLM

autodoc FlaxBertForMaskedLM - call

FlaxBertForNextSentencePrediction

autodoc FlaxBertForNextSentencePrediction - call

FlaxBertForSequenceClassification

autodoc FlaxBertForSequenceClassification - call

FlaxBertForMultipleChoice

autodoc FlaxBertForMultipleChoice - call

FlaxBertForTokenClassification

autodoc FlaxBertForTokenClassification - call

FlaxBertForQuestionAnswering

autodoc FlaxBertForQuestionAnswering - call

BERT-specific outputs

autodoc models.bert.modeling_bert.BertForPreTrainingOutput

autodoc models.bert.modeling_tf_bert.TFBertForPreTrainingOutput

autodoc models.bert.modeling_flax_bert.FlaxBertForPreTrainingOutput