From 0d79de7322f21063e615d555704e14f600c6eb40 Mon Sep 17 00:00:00 2001
From: Amine Abdaoui
Date: Mon, 5 Oct 2020 13:50:56 +0200
Subject: [PATCH] docs(pretrained_models): fix num parameters (#7575)

* docs(pretrained_models): fix num parameters

* fix(pretrained_models): correct typo

Co-authored-by: Amin
---
 docs/source/pretrained_models.rst | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/docs/source/pretrained_models.rst b/docs/source/pretrained_models.rst
index cda55295da7..812b5f894f9 100644
--- a/docs/source/pretrained_models.rst
+++ b/docs/source/pretrained_models.rst
@@ -11,26 +11,26 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
 | BERT | ``bert-base-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
 | | | | Trained on lower-cased English text. |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``bert-large-uncased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
+| | ``bert-large-uncased`` | | 24-layer, 1024-hidden, 16-heads, 336M parameters. |
 | | | | Trained on lower-cased English text. |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``bert-base-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
+| | ``bert-base-cased`` | | 12-layer, 768-hidden, 12-heads, 109M parameters. |
 | | | | Trained on cased English text. |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``bert-large-cased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
+| | ``bert-large-cased`` | | 24-layer, 1024-hidden, 16-heads, 335M parameters. |
 | | | | Trained on cased English text. |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``bert-base-multilingual-uncased`` | | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters. |
+| | ``bert-base-multilingual-uncased`` | | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 168M parameters. |
 | | | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias |
 | | | | |
 | | | | (see `details `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``bert-base-multilingual-cased`` | | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters. |
+| | ``bert-base-multilingual-cased`` | | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 179M parameters. |
 | | | | Trained on cased text in the top 104 languages with the largest Wikipedias |
 | | | | |
 | | | | (see `details `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``bert-base-chinese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
+| | ``bert-base-chinese`` | | 12-layer, 768-hidden, 12-heads, 103M parameters. |
 | | | | Trained on cased Chinese Simplified and Traditional text. |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
 | | ``bert-base-german-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
@@ -38,22 +38,22 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
 | | | | |
 | | | | (see `details on deepset.ai website `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``bert-large-uncased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
+| | ``bert-large-uncased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 336M parameters. |
 | | | | Trained on lower-cased English text using Whole-Word-Masking |
 | | | | |
 | | | | (see `details `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``bert-large-cased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
+| | ``bert-large-cased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 335M parameters. |
 | | | | Trained on cased English text using Whole-Word-Masking |
 | | | | |
 | | | | (see `details `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
+| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 336M parameters. |
 | | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
 | | | | |
 | | | | (see details of fine-tuning in the `example section `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters |
+| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 335M parameters |
 | | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
 | | | | |
 | | | | (see `details of fine-tuning in the example section `__) |
@@ -73,31 +73,31 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
 | | | | |
 | | | | (see `details on dbmdz repository `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``cl-tohoku/bert-base-japanese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
+| | ``cl-tohoku/bert-base-japanese`` | | 12-layer, 768-hidden, 12-heads, 111M parameters. |
 | | | | Trained on Japanese text. Text is tokenized with MeCab and WordPiece and this requires some extra dependencies, |
 | | | | `fugashi `__ which is a wrapper around `MeCab `__. |
 | | | | Use ``pip install transformers["ja"]`` (or ``pip install -e .["ja"]`` if you install from source) to install them. |
 | | | | |
 | | | | (see `details on cl-tohoku repository `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``cl-tohoku/bert-base-japanese-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
+| | ``cl-tohoku/bert-base-japanese-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 111M parameters. |
 | | | | Trained on Japanese text. Text is tokenized with MeCab and WordPiece and this requires some extra dependencies, |
 | | | | `fugashi `__ which is a wrapper around `MeCab `__. |
 | | | | Use ``pip install transformers["ja"]`` (or ``pip install -e .["ja"]`` if you install from source) to install them. |
 | | | | |
 | | | | (see `details on cl-tohoku repository `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``cl-tohoku/bert-base-japanese-char`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
+| | ``cl-tohoku/bert-base-japanese-char`` | | 12-layer, 768-hidden, 12-heads, 90M parameters. |
 | | | | Trained on Japanese text. Text is tokenized into characters. |
 | | | | |
 | | | | (see `details on cl-tohoku repository `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``cl-tohoku/bert-base-japanese-char-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
+| | ``cl-tohoku/bert-base-japanese-char-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 90M parameters. |
 | | | | Trained on Japanese text using Whole-Word-Masking. Text is tokenized into characters. |
 | | | | |
 | | | | (see `details on cl-tohoku repository `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| | ``TurkuNLP/bert-base-finnish-cased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
+| | ``TurkuNLP/bert-base-finnish-cased-v1`` | | 12-layer, 768-hidden, 12-heads, 125M parameters. |
 | | | | Trained on cased Finnish text. |
 | | | | |
 | | | | (see `details on turkunlp.org `__). |
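The corrected figures in this patch can be checked directly against the checkpoints. Below is a minimal sketch, not part of the patch: it assumes ``transformers`` and ``torch`` are installed, that the weights can be downloaded, and the ``count_parameters`` helper name is only illustrative. It loads a checkpoint with ``AutoModel`` and sums the sizes of its weight tensors; exact totals can shift by a million or so depending on which heads (e.g. the pooler) a given architecture class includes. The counts include the token embeddings, which is why checkpoints with large vocabularies (multilingual, Finnish) land well above the 110M usually quoted for the English base model.

.. code-block:: python

    # Sketch: recompute the parameter counts quoted in the table above.
    # Requires `transformers` and `torch`; downloads the weights on first run.
    from transformers import AutoModel

    def count_parameters(model_id: str) -> int:
        """Load a pretrained checkpoint and return its total parameter count."""
        model = AutoModel.from_pretrained(model_id)
        return sum(p.numel() for p in model.parameters())

    for model_id in ("bert-base-cased", "bert-base-multilingual-cased"):
        n = count_parameters(model_id)
        print(f"{model_id}: ~{n / 1e6:.0f}M parameters")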