diff --git a/docs/source/training.rst b/docs/source/training.rst
index c497fb4b601..bb06750462e 100644
--- a/docs/source/training.rst
+++ b/docs/source/training.rst
@@ -39,7 +39,7 @@ of the specified model are used to initialize the model. The
 library also includes a number of task-specific final layers or 'heads' whose
 weights are instantiated randomly when not present in the specified
 pre-trained model. For example, instantiating a model with
-``BertForSequenceClassification.from_pretrained('bert-base-uncased', num_classes=2)``
+``BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)``
 will create a BERT model instance with encoder weights copied from the
 ``bert-base-uncased`` model and a randomly initialized sequence
 classification head on top of the encoder with an output size of 2. Models
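
Review note: the renamed keyword can be checked without downloading any weights. The following is a minimal sketch, assuming `transformers` and `torch` are installed. It builds the model from a deliberately tiny `BertConfig` (the sizes are illustrative assumptions, not the `bert-base-uncased` values) instead of calling `from_pretrained`, and it only shows that `num_labels` determines the output size of the classification head.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny, illustrative configuration (assumed sizes, NOT those of bert-base-uncased).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,  # the keyword the diff switches to
)

# Building from a config initializes every weight randomly, including the head.
model = BertForSequenceClassification(config)
model.eval()

# A single dummy sequence of four token ids, all below vocab_size.
input_ids = torch.tensor([[1, 2, 3, 4]])
with torch.no_grad():
    # Index 0 of the model output holds the classification logits.
    logits = model(input_ids=input_ids)[0]

print(model.classifier.out_features)
print(tuple(logits.shape))
```

With pretrained weights, the equivalent call is the one on the added line of the diff, `BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)`, which needs network access or a local cache of the checkpoint.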