# MobileBERT
[MobileBERT](https://huggingface.co/papers/2004.02984) is a lightweight and efficient variant of BERT designed for resource-limited devices such as mobile phones. It retains BERT's architecture but significantly reduces model size and inference latency while maintaining strong performance on NLP tasks. MobileBERT achieves this with a bottleneck structure and a carefully balanced ratio between self-attention and feedforward networks. The model is trained by transferring knowledge from a larger inverted-bottleneck BERT (IB-BERT) teacher model.
You can find the original MobileBERT checkpoint under the [Google](https://huggingface.co/google/mobilebert-uncased) organization.
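The bottleneck described above is exposed directly on [`MobileBertConfig`]. Below is a minimal sketch that inspects those dimensions; the attribute names (`use_bottleneck`, `intra_bottleneck_size`, `num_feedforward_networks`) come from the `MobileBertConfig` reference at the bottom of this page, and the printed values are an assumption based on the released checkpoint's defaults.

```py
from transformers import MobileBertConfig

config = MobileBertConfig.from_pretrained("google/mobilebert-uncased")

# block input/output width vs. the narrow width used inside each block
print(config.hidden_size)               # 512
print(config.intra_bottleneck_size)     # 128
# stacked feedforward networks that rebalance attention vs. FFN capacity
print(config.num_feedforward_networks)  # 4
print(config.use_bottleneck)            # True
```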
> [!TIP]
> Click on the MobileBERT models in the right sidebar for more examples of how to apply MobileBERT to different language tasks.
The example below demonstrates how to predict the `[MASK]` token with [`Pipeline`], [`AutoModel`], and the command line.
```py
import torch
from transformers import pipeline

pipeline = pipeline(
    task="fill-mask",
    model="google/mobilebert-uncased",
    torch_dtype=torch.float16,
    device=0,
)
pipeline("The capital of France is [MASK].")
```
```py
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "google/mobilebert-uncased",
)
model = AutoModelForMaskedLM.from_pretrained(
    "google/mobilebert-uncased",
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs)

predictions = outputs.logits

# find the [MASK] position and take the highest-scoring token at that position
masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print(f"The predicted token is: {predicted_token}")
```
```bash
echo -e "The capital of France is [MASK]." | transformers run --task fill-mask --model google/mobilebert-uncased --device 0
```
## Notes
- Inputs should be padded on the right because MobileBERT, like BERT, uses absolute position embeddings, as shown in the sketch below.
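A minimal sketch of right padding with this checkpoint's tokenizer; `padding_side` already defaults to `"right"` here, so `padding=True` is enough.

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
print(tokenizer.padding_side)  # "right"

batch = tokenizer(
    ["Paris is a city.", "The capital of France is [MASK]."],
    padding=True,  # pad the shorter sequence up to the longest one
    return_tensors="pt",
)
# pad token ids (0) appear at the end of the shorter row, i.e. on the right
print(batch["input_ids"])
```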
## MobileBertConfig
[[autodoc]] MobileBertConfig
## MobileBertTokenizer
[[autodoc]] MobileBertTokenizer
## MobileBertTokenizerFast
[[autodoc]] MobileBertTokenizerFast
## MobileBert specific outputs
[[autodoc]] models.mobilebert.modeling_mobilebert.MobileBertForPreTrainingOutput
[[autodoc]] models.mobilebert.modeling_tf_mobilebert.TFMobileBertForPreTrainingOutput
## MobileBertModel
[[autodoc]] MobileBertModel
- forward
## MobileBertForPreTraining
[[autodoc]] MobileBertForPreTraining
- forward
## MobileBertForMaskedLM
[[autodoc]] MobileBertForMaskedLM
- forward
## MobileBertForNextSentencePrediction
[[autodoc]] MobileBertForNextSentencePrediction
- forward
## MobileBertForSequenceClassification
[[autodoc]] MobileBertForSequenceClassification
- forward
## MobileBertForMultipleChoice
[[autodoc]] MobileBertForMultipleChoice
- forward
## MobileBertForTokenClassification
[[autodoc]] MobileBertForTokenClassification
- forward
## MobileBertForQuestionAnswering
[[autodoc]] MobileBertForQuestionAnswering
- forward
## TFMobileBertModel
[[autodoc]] TFMobileBertModel
- call
## TFMobileBertForPreTraining
[[autodoc]] TFMobileBertForPreTraining
- call
## TFMobileBertForMaskedLM
[[autodoc]] TFMobileBertForMaskedLM
- call
## TFMobileBertForNextSentencePrediction
[[autodoc]] TFMobileBertForNextSentencePrediction
- call
## TFMobileBertForSequenceClassification
[[autodoc]] TFMobileBertForSequenceClassification
- call
## TFMobileBertForMultipleChoice
[[autodoc]] TFMobileBertForMultipleChoice
- call
## TFMobileBertForTokenClassification
[[autodoc]] TFMobileBertForTokenClassification
- call
## TFMobileBertForQuestionAnswering
[[autodoc]] TFMobileBertForQuestionAnswering
- call