*Supported frameworks: PyTorch, TensorFlow, Flax · Attention backends: SDPA, FlashAttention*
# DistilBERT

[DistilBERT](https://huggingface.co/papers/1910.01108) is pretrained by knowledge distillation to create a smaller model with faster inference that requires less compute to train. During pretraining it uses a triple loss objective (language modeling loss, distillation loss, and cosine-distance loss), and it achieves performance comparable to a larger transformer language model.

You can find all the original DistilBERT checkpoints under the [DistilBERT](https://huggingface.co/distilbert) organization.

> [!TIP]
> Click on the DistilBERT models in the right sidebar for more examples of how to apply DistilBERT to different language tasks.

The example below demonstrates how to classify text with [`Pipeline`], [`AutoModel`], and from the command line.

```py
import torch
from transformers import pipeline

classifier = pipeline(
    task="text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    torch_dtype=torch.float16,
    device=0
)
result = classifier("I love using Hugging Face Transformers!")
print(result)
# Output: [{'label': 'POSITIVE', 'score': 0.9998}]
```

```py
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa"
)
inputs = tokenizer("I love using Hugging Face Transformers!", return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)

predicted_class_id = torch.argmax(outputs.logits, dim=-1).item()
predicted_label = model.config.id2label[predicted_class_id]
print(f"Predicted label: {predicted_label}")
```

```bash
echo -e "I love using Hugging Face Transformers!" | transformers run --task text-classification --model distilbert-base-uncased-finetuned-sst-2-english
```

## Notes

- DistilBERT doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`), as shown in the sketch after this list.
- DistilBERT doesn't have options to select the input positions (`position_ids` input). This could be added if necessary though, just let us know if you need this option.
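As a minimal sketch of the first note (the checkpoint name and sentences are only illustrative), you can build a two-segment input by joining the segments with `tokenizer.sep_token` and passing the result to the tokenizer as a single string.

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

# DistilBERT has no token_type_ids, so the two segments are distinguished
# only by the [SEP] token placed between them.
text = "How is the weather today?" + tokenizer.sep_token + "It is sunny."
inputs = tokenizer(text, return_tensors="pt")

# Inspect the tokenized sequence to confirm the [SEP] separator is present.
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
```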
## DistilBertConfig

[[autodoc]] DistilBertConfig

## DistilBertTokenizer

[[autodoc]] DistilBertTokenizer

## DistilBertTokenizerFast

[[autodoc]] DistilBertTokenizerFast

## DistilBertModel

[[autodoc]] DistilBertModel
    - forward

## DistilBertForMaskedLM

[[autodoc]] DistilBertForMaskedLM
    - forward

## DistilBertForSequenceClassification

[[autodoc]] DistilBertForSequenceClassification
    - forward

## DistilBertForMultipleChoice

[[autodoc]] DistilBertForMultipleChoice
    - forward

## DistilBertForTokenClassification

[[autodoc]] DistilBertForTokenClassification
    - forward

## DistilBertForQuestionAnswering

[[autodoc]] DistilBertForQuestionAnswering
    - forward

## TFDistilBertModel

[[autodoc]] TFDistilBertModel
    - call

## TFDistilBertForMaskedLM

[[autodoc]] TFDistilBertForMaskedLM
    - call

## TFDistilBertForSequenceClassification

[[autodoc]] TFDistilBertForSequenceClassification
    - call

## TFDistilBertForMultipleChoice

[[autodoc]] TFDistilBertForMultipleChoice
    - call

## TFDistilBertForTokenClassification

[[autodoc]] TFDistilBertForTokenClassification
    - call

## TFDistilBertForQuestionAnswering

[[autodoc]] TFDistilBertForQuestionAnswering
    - call

## FlaxDistilBertModel

[[autodoc]] FlaxDistilBertModel
    - __call__

## FlaxDistilBertForMaskedLM

[[autodoc]] FlaxDistilBertForMaskedLM
    - __call__

## FlaxDistilBertForSequenceClassification

[[autodoc]] FlaxDistilBertForSequenceClassification
    - __call__

## FlaxDistilBertForMultipleChoice

[[autodoc]] FlaxDistilBertForMultipleChoice
    - __call__

## FlaxDistilBertForTokenClassification

[[autodoc]] FlaxDistilBertForTokenClassification
    - __call__

## FlaxDistilBertForQuestionAnswering

[[autodoc]] FlaxDistilBertForQuestionAnswering
    - __call__