Documentation additions

Repository: https://github.com/huggingface/transformers.git
Parent: 912a377e90
Commit: 1dc43e56c9
@@ -48,3 +48,4 @@ The library currently contains PyTorch implementations, pre-trained model weight
    model_doc/xlm
    model_doc/xlnet
    model_doc/roberta
+   model_doc/distilbert
docs/source/model_doc/distilbert.rst (new file, 43 lines)
@@ -0,0 +1,43 @@
DistilBERT
----------------------------------------------------

``DistilBertConfig``
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: pytorch_transformers.DistilBertConfig
    :members:


``DistilBertTokenizer``
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: pytorch_transformers.DistilBertTokenizer
    :members:


``DistilBertModel``
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: pytorch_transformers.DistilBertModel
    :members:


``DistilBertForMaskedLM``
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: pytorch_transformers.DistilBertForMaskedLM
    :members:


``DistilBertForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: pytorch_transformers.DistilBertForSequenceClassification
    :members:


``DistilBertForQuestionAnswering``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: pytorch_transformers.DistilBertForQuestionAnswering
    :members:
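A minimal usage sketch for the classes documented in this new page, assuming the standard ``from_pretrained`` loading API and the usual tokenizer helpers; the tokenizer choice follows the "use BertTokenizer(`bert-base-uncased`)" note in the DistilBERT docstrings further down::

    import torch
    from pytorch_transformers import BertTokenizer, DistilBertModel

    # Load tokenizer and model; 'distilbert-base-uncased' is the shortcut name
    # listed in the pretrained-models table added by this commit.
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = DistilBertModel.from_pretrained('distilbert-base-uncased')
    model.eval()

    # DistilBERT keeps BERT's [CLS] ... [SEP] convention but takes no token_type_ids.
    tokens = [tokenizer.cls_token] + tokenizer.tokenize("Hello, my dog is cute") + [tokenizer.sep_token]
    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

    with torch.no_grad():
        last_hidden_state = model(input_ids)[0]  # (batch_size, sequence_length, hidden_size)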
@@ -111,5 +111,13 @@ Here is the full list of the currently provided pretrained models together with
 |                   |                                                            | | ``roberta-large`` fine-tuned on `MNLI <http://www.nyu.edu/projects/bowman/multinli/>`__.                            |
 |                   |                                                            |   (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__)                                 |
 +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+| DistilBERT        | ``distilbert-base-uncased``                                | | 6-layer, 768-hidden, 12-heads, 66M parameters                                                                        |
+|                   |                                                            | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint                                   |
+|                   |                                                            |   (see `details <https://medium.com/@victorsanh/8cf3380435b5>`__)                                                     |
+|                   +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+|                   | ``distilbert-base-uncased-distilled-squad``                | | 6-layer, 768-hidden, 12-heads, 66M parameters                                                                        |
+|                   |                                                            | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint, with an additional linear layer. |
+|                   |                                                            |   (see `details <https://medium.com/@victorsanh/8cf3380435b5>`__)                                                     |
++-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+

 .. <https://huggingface.co/pytorch-transformers/examples.html>`__
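A short sketch of loading the SQuAD-distilled checkpoint listed above; the class name comes from the new documentation page, and the assumption that the model returns ``(start_logits, end_logits)`` first mirrors the BERT question-answering head::

    import torch
    from pytorch_transformers import BertTokenizer, DistilBertForQuestionAnswering

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')
    model.eval()

    question, context = "Who wrote the report?", "The report was written by the documentation team."
    tokens = ([tokenizer.cls_token] + tokenizer.tokenize(question) + [tokenizer.sep_token]
              + tokenizer.tokenize(context) + [tokenizer.sep_token])
    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

    with torch.no_grad():
        start_logits, end_logits = model(input_ids)[:2]

    # Take the most likely answer span (no handling of ties or impossible spans here).
    start, end = start_logits.argmax().item(), end_logits.argmax().item()
    print(" ".join(tokens[start:end + 1]))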
@@ -433,7 +433,7 @@ DISTILBERT_START_DOCSTRING = r"""

    Here are the differences between the interface of Bert and DistilBert:

-    - DistilBert doesn't have `token_type_ids`, you don't need to indicate which token belong to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`)
+    - DistilBert doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`)
    - DistilBert doesn't have options to select the input positions (`position_ids` input). This could be added if necessary though, just let's us know if you need this option.

    For more information on DistilBERT, please refer to our
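To make the segment-separation point concrete, a small sketch using the ``cls_token`` and ``sep_token`` attributes that the docstring above refers to (the example sentences are illustrative)::

    from pytorch_transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

    # Two segments are joined with the separation token; no token_type_ids are built.
    segment_a = tokenizer.tokenize("How old are you?")
    segment_b = tokenizer.tokenize("I am six years old.")
    tokens = [tokenizer.cls_token] + segment_a + [tokenizer.sep_token] + segment_b + [tokenizer.sep_token]
    input_ids = tokenizer.convert_tokens_to_ids(tokens)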
@@ -450,9 +450,9 @@ DISTILBERT_START_DOCSTRING = r"""

DISTILBERT_INPUTS_DOCSTRING = r"""
    Inputs:
-        **input_ids**L ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
-            Indices oof input sequence tokens in the vocabulary.
-            The input sequences should start with `[CLS]` and `[SEP]` tokens.
+        **input_ids** ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
+            Indices of input sequence tokens in the vocabulary.
+            The input sequences should start with `[CLS]` and end with `[SEP]` tokens.

            For now, ONLY BertTokenizer(`bert-base-uncased`) is supported and you should use this tokenizer when using DistilBERT.
        **attention_mask**: (`optional`) ``torch.LongTensor`` of shape ``(batch_size, sequence_length)``:
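A minimal sketch of building the two inputs described here, ``input_ids`` and ``attention_mask``, for a padded batch; it assumes pad id 0, which is the `[PAD]` index in the `bert-base-uncased` vocabulary::

    import torch
    from pytorch_transformers import BertTokenizer, DistilBertModel

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = DistilBertModel.from_pretrained('distilbert-base-uncased')

    sentences = ["Hello world!", "A slightly longer second sentence."]
    batch = [tokenizer.convert_tokens_to_ids(
                 [tokenizer.cls_token] + tokenizer.tokenize(s) + [tokenizer.sep_token])
             for s in sentences]

    # Pad to the longest sequence; real tokens get mask 1, padding gets mask 0.
    max_len = max(len(ids) for ids in batch)
    input_ids = torch.tensor([ids + [0] * (max_len - len(ids)) for ids in batch])
    attention_mask = torch.tensor([[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch])

    outputs = model(input_ids, attention_mask=attention_mask)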