From 7ecff0ccbb19d119fb6470a1de645c60c00931b0 Mon Sep 17 00:00:00 2001
From: ELanning <38930062+ELanning@users.noreply.github.com>
Date: Mon, 6 Jul 2020 06:14:57 -0700
Subject: [PATCH] Fix typo in training (#5510)

---
 docs/source/training.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/training.rst b/docs/source/training.rst
index c497fb4b601..bb06750462e 100644
--- a/docs/source/training.rst
+++ b/docs/source/training.rst
@@ -39,7 +39,7 @@ of the specified model are used to initialize the model. The library also
 includes a number of task-specific final layers or 'heads' whose weights are
 instantiated randomly when not present in the specified pre-trained model.
 For example, instantiating a model with
-``BertForSequenceClassification.from_pretrained('bert-base-uncased', num_classes=2)``
+``BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)``
 will create a BERT model instance with encoder weights copied from the
 ``bert-base-uncased`` model and a randomly initialized sequence classification
 head on top of the encoder with an output size of 2. Models
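
Below is a minimal sketch of the corrected call from the patched documentation,
shown for context only (it is not part of the patch). It assumes the
transformers library is installed and that the bert-base-uncased checkpoint is
available.

    # Load a pre-trained BERT encoder and attach a sequence classification head.
    # Encoder weights are copied from the 'bert-base-uncased' checkpoint; the
    # classification head is randomly initialized with an output size of 2,
    # as set by num_labels=2 (the parameter the patch corrects from num_classes).
    from transformers import BertForSequenceClassification

    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )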