[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)

* rm all model cards
* Update the .rst

  @sgugger it is still not super crystal clear/streamlined so let me know if any ideas to make it simpler
* Add a rootlevel README.md with simple instructions/context
* Update docs/source/model_sharing.rst

  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review

  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* rm all model cards

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-07-31 02:02:21 +06:00 · 2020-12-12 00:24:42 +01:00 · 3552d0e0d8
commit 3552d0e0d8
parent 29e4597950
758 changed files with 50 additions and 45116 deletions
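
For anyone moving a card by hand, here is a minimal sketch of what a migrated card amounts to: a `README.md` at the root of the model repo, opening with a YAML metadata block like the ones in the cards removed by this commit. The helper names `build_model_card` and `write_model_card` are hypothetical (they are not part of `transformers`), and the field values are placeholders.

```python
from pathlib import Path


def build_model_card(title, language, license_id, body):
    """Return README.md text: a YAML metadata block followed by Markdown."""
    front_matter = "\n".join(
        ["---", f"language: {language}", f"license: {license_id}", "---"]
    )
    return f"{front_matter}\n\n# {title}\n\n{body}\n"


def write_model_card(repo_dir, card_text):
    """Write the card to README.md at the root of a local model repo clone."""
    path = Path(repo_dir) / "README.md"
    path.write_text(card_text, encoding="utf-8")
    return path
```

After writing the file into a local clone of the model repo, it can be committed and pushed like any other file (`git add README.md`, `git commit`, `git push`).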
--- a/docs/source/model_sharing.rst
+++ b/docs/source/model_sharing.rst
@ -60,7 +60,7 @@ Basic steps
 In order to upload a model, you'll need to first create a git repo. This repo will live on the model hub, allowing
 users to clone it and you (and your organization members) to push to it.

-You can create a model repo directly from the website, `here <https://huggingface.co/new>`.
+You can create a model repo directly from `the /new page on the website <https://huggingface.co/new>`__.

 Alternatively, you can use the ``transformers-cli``. The next steps describe that process:

@ -82,12 +82,12 @@ This creates a repo on the model hub, which can be cloned.

 .. code-block:: bash

-    git clone https://huggingface.co/username/your-model-name
-
    # Make sure you have git-lfs installed
    # (https://git-lfs.github.com/)
    git lfs install

+    git clone https://huggingface.co/username/your-model-name
+
 When you have your local clone of your repo and lfs installed, you can then add/remove from that clone as you would
 with any other git repo.

@ -98,8 +98,12 @@ with any other git repo.
    echo "hello" >> README.md
    git add . && git commit -m "Update from $USER"

-We are intentionally not wrapping git too much, so as to stay intuitive and easy-to-use.
+We are intentionally not wrapping git too much, so that you can go on with the workflow you're used to and the tools
+you already know.

+The only learning curve you might have compared to regular git is the one for git-lfs. The documentation at
+`git-lfs.github.com <https://git-lfs.github.com/>`__ is decent, but we'll work on a tutorial with some tips and tricks
+in the coming weeks!

 Make your model work on all frameworks
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -110,7 +114,7 @@ Make your model work on all frameworks
 You probably have your favorite framework, but so will other users! That's why it's best to upload your model with both
 PyTorch `and` TensorFlow checkpoints to make it easier to use (if you skip this step, users will still be able to load
 your model in another framework, but it will be slower, as it will have to be converted on the fly). Don't worry, it's
-super easy to do (and in a future version, it will all be automatic). You will need to install both PyTorch and
+super easy to do (and in a future version, it might all be automatic). You will need to install both PyTorch and
 TensorFlow for this step, but you don't need to worry about the GPU, so it should be very easy. Check the `TensorFlow
 installation page <https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available>`__ and/or the `PyTorch
 installation page <https://pytorch.org/get-started/locally/#start-locally>`__ to see how.
@ -192,7 +196,7 @@ status`` command:
    git add --all
    git status

-Finally, the files should be comitted:
+Finally, the files should be committed:

 .. code-block:: bash

@ -210,23 +214,20 @@ This will upload the folder containing the weights, tokenizer and configuration
 Add a model card
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-To make sure everyone knows what your model can do, what its limitations and potential bias or ethetical
-considerations, please add a README.md model card to the 🤗 Transformers repo under `model_cards/`. It should then be
-placed in a subfolder with your username or organization, then another subfolder named like your model
-(`awesome-name-you-picked`). Or just click on the "Create a model card on GitHub" button on the model page, it will get
-you directly to the right location. If you need one, `here <https://github.com/huggingface/model_card>`__ is a model
-card template (meta-suggestions are welcome).
+To make sure everyone knows what your model can do and what its limitations, potential biases, or ethical
+considerations are, please add a README.md model card to your model repo. You can create the file yourself, or use
+the convenient button titled "Add a README.md" on your model page. If you need one, a model card template can be
+found `here <https://github.com/huggingface/model_card>`__ (meta-suggestions are
+welcome).
+
+.. note::
+
+    Model cards used to live in the 🤗 Transformers repo under `model_cards/`, but for consistency and scalability we
+    migrated every model card from the repo to its corresponding huggingface.co model repo.

 If your model is fine-tuned from another model coming from the model hub (all 🤗 Transformers pretrained models do),
 don't forget to link to its model card so that people can fully trace how your model was built.

-If you have never made a pull request to the 🤗 Transformers repo, look at the :doc:`contributing guide <contributing>`
-to see the steps to follow.
-
-.. note::
-
-    You can also send your model card in the folder you uploaded with the CLI by placing it in a `README.md` file
-    inside `path/to/awesome-name-you-picked/`.

 Using your model
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -262,7 +263,8 @@ First you need to install `git-lfs` in the environment used by the notebook:

    sudo apt-get install git-lfs

-Then you can use the :obj:`transformers-cli` to create your new repo:
+Then you can either create a repo directly from `huggingface.co <https://huggingface.co/>`__, or use the
+:obj:`transformers-cli` to create it:


 .. code-block:: bash
@ -274,13 +276,14 @@ Once it's created, you can clone it and configure it (replace username by your u

 .. code-block:: bash

+    git lfs install
+
    git clone https://username:password@huggingface.co/username/your-model-name
    # Alternatively if you have a token,
    # you can use it instead of your password
    git clone https://username:token@huggingface.co/username/your-model-name

    cd your-model-name
-    git lfs install
    git config --global user.email "email@example.com"
    # Tip: using the same email as for your huggingface.co account will link your commits to your profile
    git config --global user.name "Your name"
--- a/model_cards/Cinnamon/electra-small-japanese-discriminator/README.md
+++ b/model_cards/Cinnamon/electra-small-japanese-discriminator/README.md
@ -1,20 +0,0 @@
---
-language: ja
-license: apache-2.0
---
-
-## Japanese ELECTRA-small
-
-We provide a Japanese **ELECTRA-Small** model, as described in [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
-
-Our pretraining process employs subword units derived from the [Japanese Wikipedia](https://dumps.wikimedia.org/jawiki/latest), using the [Byte-Pair Encoding](https://www.aclweb.org/anthology/P16-1162.pdf) method and building on an initial tokenization with [mecab-ipadic-NEologd](https://github.com/neologd/mecab-ipadic-neologd). For optimal performance, please take care to set your MeCab dictionary appropriately.
-
-## How to use the discriminator in `transformers`
-
-```
-from transformers import BertJapaneseTokenizer, ElectraForPreTraining
-
-tokenizer = BertJapaneseTokenizer.from_pretrained('Cinnamon/electra-small-japanese-discriminator', mecab_kwargs={"mecab_option": "-d /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd"})
-
-model = ElectraForPreTraining.from_pretrained('Cinnamon/electra-small-japanese-discriminator')
-```
--- a/model_cards/Cinnamon/electra-small-japanese-generator/README.md
+++ b/model_cards/Cinnamon/electra-small-japanese-generator/README.md
@ -1,18 +0,0 @@
---
-language: ja
---
-## Japanese ELECTRA-small
-
-We provide a Japanese **ELECTRA-Small** model, as described in [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
-
-Our pretraining process employs subword units derived from the [Japanese Wikipedia](https://dumps.wikimedia.org/jawiki/latest), using the [Byte-Pair Encoding](https://www.aclweb.org/anthology/P16-1162.pdf) method and building on an initial tokenization with [mecab-ipadic-NEologd](https://github.com/neologd/mecab-ipadic-neologd). For optimal performance, please take care to set your MeCab dictionary appropriately.
-
-```
-# ELECTRA-small generator usage
-
-from transformers import BertJapaneseTokenizer, ElectraForMaskedLM
-
-tokenizer = BertJapaneseTokenizer.from_pretrained('Cinnamon/electra-small-japanese-generator', mecab_kwargs={"mecab_option": "-d /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd"})
-
-model = ElectraForMaskedLM.from_pretrained('Cinnamon/electra-small-japanese-generator')
-```
--- a/model_cards/DJSammy/bert-base-danish-uncased_BotXO,ai/README.md
+++ b/model_cards/DJSammy/bert-base-danish-uncased_BotXO,ai/README.md
@ -1,142 +0,0 @@
---
-language: da
-tags:
- bert
- masked-lm
-license: cc-by-4.0
-datasets:
- common_crawl
- wikipedia
-pipeline_tag: fill-mask
-widget:
- text: "København er [MASK] i Danmark."
---
-
-# Danish BERT (uncased) model 
-
-[BotXO.ai](https://www.botxo.ai/) developed this model. For data and training details see their [GitHub repository](https://github.com/botxo/nordic_bert).  
-
-The original model was trained in TensorFlow; I then converted it to PyTorch using [transformers-cli](https://huggingface.co/transformers/converting_tensorflow_models.html?highlight=cli).
-
-For TensorFlow version download here: https://www.dropbox.com/s/19cjaoqvv2jicq9/danish_bert_uncased_v2.zip?dl=1
-
-
-## Architecture
-
-```python
-from transformers import AutoModelForPreTraining
-
-model = AutoModelForPreTraining.from_pretrained("DJSammy/bert-base-danish-uncased_BotXO,ai")
-
-params = list(model.named_parameters())
-print('danish_bert_uncased_v2 has {:} different named parameters.\n'.format(len(params)))
-
-print('==== Embedding Layer ====\n')
-for p in params[0:5]:
-    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))
-
-print('\n==== First Transformer ====\n')
-for p in params[5:21]:
-    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))
-
-print('\n==== Last Transformer ====\n')
-for p in params[181:197]:
-    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))
-
-print('\n==== Output Layer ====\n')
-for p in params[197:]:
-    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))
-
-# danish_bert_uncased_v2 has 206 different named parameters.
-
-# ==== Embedding Layer ====
-
-# bert.embeddings.word_embeddings.weight                  (32000, 768)
-# bert.embeddings.position_embeddings.weight                (512, 768)
-# bert.embeddings.token_type_embeddings.weight                (2, 768)
-# bert.embeddings.LayerNorm.weight                              (768,)
-# bert.embeddings.LayerNorm.bias                                (768,)
-
-# ==== First Transformer ====
-
-# bert.encoder.layer.0.attention.self.query.weight          (768, 768)
-# bert.encoder.layer.0.attention.self.query.bias                (768,)
-# bert.encoder.layer.0.attention.self.key.weight            (768, 768)
-# bert.encoder.layer.0.attention.self.key.bias                  (768,)
-# bert.encoder.layer.0.attention.self.value.weight          (768, 768)
-# bert.encoder.layer.0.attention.self.value.bias                (768,)
-# bert.encoder.layer.0.attention.output.dense.weight        (768, 768)
-# bert.encoder.layer.0.attention.output.dense.bias              (768,)
-# bert.encoder.layer.0.attention.output.LayerNorm.weight        (768,)
-# bert.encoder.layer.0.attention.output.LayerNorm.bias          (768,)
-# bert.encoder.layer.0.intermediate.dense.weight           (3072, 768)
-# bert.encoder.layer.0.intermediate.dense.bias                 (3072,)
-# bert.encoder.layer.0.output.dense.weight                 (768, 3072)
-# bert.encoder.layer.0.output.dense.bias                        (768,)
-# bert.encoder.layer.0.output.LayerNorm.weight                  (768,)
-# bert.encoder.layer.0.output.LayerNorm.bias                    (768,)
-
-# ==== Last Transformer ====
-
-# bert.encoder.layer.11.attention.self.query.weight         (768, 768)
-# bert.encoder.layer.11.attention.self.query.bias               (768,)
-# bert.encoder.layer.11.attention.self.key.weight           (768, 768)
-# bert.encoder.layer.11.attention.self.key.bias                 (768,)
-# bert.encoder.layer.11.attention.self.value.weight         (768, 768)
-# bert.encoder.layer.11.attention.self.value.bias               (768,)
-# bert.encoder.layer.11.attention.output.dense.weight       (768, 768)
-# bert.encoder.layer.11.attention.output.dense.bias             (768,)
-# bert.encoder.layer.11.attention.output.LayerNorm.weight       (768,)
-# bert.encoder.layer.11.attention.output.LayerNorm.bias         (768,)
-# bert.encoder.layer.11.intermediate.dense.weight          (3072, 768)
-# bert.encoder.layer.11.intermediate.dense.bias                (3072,)
-# bert.encoder.layer.11.output.dense.weight                (768, 3072)
-# bert.encoder.layer.11.output.dense.bias                       (768,)
-# bert.encoder.layer.11.output.LayerNorm.weight                 (768,)
-# bert.encoder.layer.11.output.LayerNorm.bias                   (768,)
-
-# ==== Output Layer ====
-
-# bert.pooler.dense.weight                                  (768, 768)
-# bert.pooler.dense.bias                                        (768,)
-# cls.predictions.bias                                        (32000,)
-# cls.predictions.transform.dense.weight                    (768, 768)
-# cls.predictions.transform.dense.bias                          (768,)
-# cls.predictions.transform.LayerNorm.weight                    (768,)
-# cls.predictions.transform.LayerNorm.bias                      (768,)
-# cls.seq_relationship.weight                                 (2, 768)
-# cls.seq_relationship.bias                                       (2,)
-```
-
-## Example Pipeline
-
-```python
-from transformers import pipeline
-unmasker = pipeline('fill-mask', model='DJSammy/bert-base-danish-uncased_BotXO,ai')
-
-unmasker('København er [MASK] i Danmark.')
-
-# Copenhagen is the [MASK] of Denmark.
-# =>
-
-# [{'score': 0.788068950176239,
-#  'sequence': '[CLS] københavn er hovedstad i danmark. [SEP]',
-#  'token': 12610,
-#  'token_str': 'hovedstad'},
-# {'score': 0.07606703042984009,
-#  'sequence': '[CLS] københavn er hovedstaden i danmark. [SEP]',
-#  'token': 8108,
-#  'token_str': 'hovedstaden'},
-# {'score': 0.04299738258123398,
-#  'sequence': '[CLS] københavn er metropol i danmark. [SEP]',
-#  'token': 23305,
-#  'token_str': 'metropol'},
-# {'score': 0.008163209073245525,
-#  'sequence': '[CLS] københavn er ikke i danmark. [SEP]',
-#  'token': 89,
-#  'token_str': 'ikke'},
-# {'score': 0.006238455418497324,
-#  'sequence': '[CLS] københavn er ogsa i danmark. [SEP]',
-#  'token': 25253,
-#  'token_str': 'ogsa'}]
-```
--- a/model_cards/DeepPavlov/bert-base-bg-cs-pl-ru-cased/README.md
+++ b/model_cards/DeepPavlov/bert-base-bg-cs-pl-ru-cased/README.md
@ -1,14 +0,0 @@
---
-language:
- bg
- cs
- pl
- ru
---
-
-# bert-base-bg-cs-pl-ru-cased
-
-SlavicBERT\[1\] \(Slavic \(bg, cs, pl, ru\), cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) was trained on Russian News and four Wikipedias: Bulgarian, Czech, Polish, and Russian. Subtoken vocabulary was built using this data. Multilingual BERT was used as an initialization for SlavicBERT.
-
-
-\[1\]: Arkhipov M., Trofimova M., Kuratov Y., Sorokin A. \(2019\). [Tuning Multilingual Transformers for Language-Specific Named Entity Recognition](https://www.aclweb.org/anthology/W19-3712/). ACL anthology W19-3712.
--- a/model_cards/DeepPavlov/bert-base-cased-conversational/README.md
+++ b/model_cards/DeepPavlov/bert-base-cased-conversational/README.md
@ -1,16 +0,0 @@
---
-language: en
---
-
-# bert-base-cased-conversational
-
-Conversational BERT \(English, cased, 12‑layer, 768‑hidden, 12‑heads, 110M parameters\) was trained on the English part of Twitter, Reddit, DailyDialogues\[1\], OpenSubtitles\[2\], Debates\[3\], Blogs\[4\], Facebook News Comments. We used this training data to build the vocabulary of English subtokens and took English cased version of BERT‑base as an initialization for English Conversational BERT.
-
-
-\[1\]: Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. IJCNLP 2017.
-
-\[2\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation \(LREC 2016\)
-
-\[3\]: Justine Zhang, Ravi Kumar, Sujith Ravi, Cristian Danescu-Niculescu-Mizil. Proceedings of NAACL, 2016.
-
-\[4\]: J. Schler, M. Koppel, S. Argamon and J. Pennebaker \(2006\). Effects of Age and Gender on Blogging in Proceedings of 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs.
--- a/model_cards/DeepPavlov/bert-base-multilingual-cased-sentence/README.md
+++ b/model_cards/DeepPavlov/bert-base-multilingual-cased-sentence/README.md
@ -1,15 +0,0 @@
---
-language:
- multilingual
---
-
-# bert-base-multilingual-cased-sentence
-
-Sentence Multilingual BERT \(101 languages, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) is a representation‑based sentence encoder for 101 languages of Multilingual BERT. It is initialized with Multilingual BERT and then fine‑tuned on English MultiNLI\[1\] and on the dev set of multilingual XNLI\[2\]. Sentence representations are mean pooled token embeddings in the same manner as in Sentence‑BERT\[3\].
-
-
-\[1\]: Williams A., Nangia N. & Bowman S. \(2017\) A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. arXiv preprint [arXiv:1704.05426](https://arxiv.org/abs/1704.05426)
-
-\[2\]: Williams A., Bowman S. \(2018\) XNLI: Evaluating Cross-lingual Sentence Representations. arXiv preprint [arXiv:1809.05053](https://arxiv.org/abs/1809.05053)
-
-\[3\]: N. Reimers, I. Gurevych \(2019\) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint [arXiv:1908.10084](https://arxiv.org/abs/1908.10084)
--- a/model_cards/DeepPavlov/rubert-base-cased-conversational/README.md
+++ b/model_cards/DeepPavlov/rubert-base-cased-conversational/README.md
@ -1,13 +0,0 @@
---
-language:
- ru
---
-
-# rubert-base-cased-conversational
-
-Conversational RuBERT \(Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) was trained on OpenSubtitles\[1\], [Dirty](https://d3.ru/), [Pikabu](https://pikabu.ru/), and a Social Media segment of Taiga corpus\[2\]. We assembled a new vocabulary for Conversational RuBERT model on this data and initialized the model with [RuBERT](../rubert-base-cased).
-
-
-\[1\]: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation \(LREC 2016\)
-
-\[2\]: Shavrina T., Shapovalova O. \(2017\) TO THE METHODOLOGY OF CORPUS CONSTRUCTION FOR MACHINE LEARNING: «TAIGA» SYNTAX TREE CORPUS AND PARSER. in proc. of “CORPORA2017”, international conference , Saint-Petersbourg, 2017.
--- a/model_cards/DeepPavlov/rubert-base-cased-sentence/README.md
+++ b/model_cards/DeepPavlov/rubert-base-cased-sentence/README.md
@ -1,15 +0,0 @@
---
-language:
- ru
---
-
-# rubert-base-cased-sentence
-
-Sentence RuBERT \(Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters\) is a representation‑based sentence encoder for Russian. It is initialized with RuBERT and fine‑tuned on SNLI\[1\] Google-translated to Russian and on the Russian part of the XNLI dev set\[2\]. Sentence representations are mean pooled token embeddings in the same manner as in Sentence‑BERT\[3\].
-
-
-\[1\]: S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning. \(2015\) A large annotated corpus for learning natural language inference. arXiv preprint [arXiv:1508.05326](https://arxiv.org/abs/1508.05326)
-
-\[2\]: Williams A., Bowman S. \(2018\) XNLI: Evaluating Cross-lingual Sentence Representations. arXiv preprint [arXiv:1809.05053](https://arxiv.org/abs/1809.05053)
-
-\[3\]: N. Reimers, I. Gurevych \(2019\) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint [arXiv:1908.10084](https://arxiv.org/abs/1908.10084)
--- a/model_cards/DeepPavlov/rubert-base-cased/README.md
+++ b/model_cards/DeepPavlov/rubert-base-cased/README.md
@ -1,11 +0,0 @@
---
-language:
- ru
---
-
-# rubert-base-cased
-
-RuBERT \(Russian, cased, 12‑layer, 768‑hidden, 12‑heads, 180M parameters\) was trained on the Russian part of Wikipedia and news data. We used this training data to build a vocabulary of Russian subtokens and took a multilingual version of BERT‑base as an initialization for RuBERT\[1\].
-
-
-\[1\]: Kuratov, Y., Arkhipov, M. \(2019\). Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language. arXiv preprint [arXiv:1905.07213](https://arxiv.org/abs/1905.07213).
--- a/model_cards/Geotrend/bert-base-15lang-cased/README.md
+++ b/model_cards/Geotrend/bert-base-15lang-cased/README.md
@ -1,61 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
- text: "Paris est la [MASK] de la France."
- text: "Paris est la capitale de la [MASK]."
- text: "L'élection américaine a eu [MASK] en novembre 2020."
- text: "تقع سويسرا في [MASK] أوروبا"
- text: "إسمي محمد وأسكن في [MASK]."
---
-
-# bert-base-15lang-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-The measurements below have been computed on a [Google Cloud n1-standard-1 machine (1 vCPU, 3.75 GB)](https://cloud.google.com/compute/docs/machine-types\#n1_machine_type):
-
-|             Model               | Num parameters |   Size   |  Memory  | Loading time |
-| ------------------------------- | -------------- | -------- | -------- | ------------ |
-| bert-base-multilingual-cased    |   178 million  |  714 MB  | 1400 MB  |    4.2 sec   |
-| Geotrend/bert-base-15lang-cased |   141 million  |  564 MB  | 1098 MB  |    3.1 sec   |
-
-Handled languages: en, fr, es, de, zh, ar, ru, vi, el, bg, th, tr, hi, ur and sw.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-15lang-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-15lang-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-ar-cased/README.md
+++ b/model_cards/Geotrend/bert-base-ar-cased/README.md
@ -1,47 +0,0 @@
---
-language: ar
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "تقع سويسرا في [MASK] أوروبا"
- text: "إسمي محمد وأسكن في [MASK]."
---
-
-# bert-base-ar-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-ar-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-ar-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-bg-cased/README.md
+++ b/model_cards/Geotrend/bert-base-bg-cased/README.md
@ -1,42 +0,0 @@
---
-language: bg
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-bg-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-bg-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-bg-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-de-cased/README.md
+++ b/model_cards/Geotrend/bert-base-de-cased/README.md
@ -1,42 +0,0 @@
---
-language: de
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-de-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-de-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-de-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-el-cased/README.md
+++ b/model_cards/Geotrend/bert-base-el-cased/README.md
@ -1,42 +0,0 @@
---
-language: el
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-el-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-el-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-el-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-ar-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-ar-cased/README.md
@ -1,49 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
- text: "تقع سويسرا في [MASK] أوروبا"
- text: "إسمي محمد وأسكن في [MASK]."
---
-
-# bert-base-en-ar-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-ar-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-ar-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-bg-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-bg-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-bg-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-bg-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-bg-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-cased/README.md
@ -1,47 +0,0 @@
---
-language: en
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-de-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-de-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-de-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-de-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-de-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-el-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-el-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-el-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-el-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-el-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-es-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-es-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-es-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-es-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-es-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-fr-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-fr-cased/README.md
@ -1,50 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
- text: "Paris est la [MASK] de la France."
- text: "Paris est la capitale de la [MASK]."
- text: "L'élection américaine a eu [MASK] en novembre 2020."
---
-
-# bert-base-en-fr-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-fr-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-fr-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-hi-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-hi-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-hi-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-hi-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-hi-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-ru-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-ru-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-ru-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-ru-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-ru-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-sw-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-sw-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-sw-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-sw-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-sw-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-th-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-th-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-th-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-th-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-th-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-tr-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-tr-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-tr-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-tr-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-tr-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-ur-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-ur-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-ur-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-ur-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-ur-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-vi-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-vi-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-vi-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-vi-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-vi-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-en-zh-cased/README.md
+++ b/model_cards/Geotrend/bert-base-en-zh-cased/README.md
@ -1,47 +0,0 @@
---
-language: multilingual
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
---
-
-# bert-base-en-zh-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-zh-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-en-zh-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-es-cased/README.md
+++ b/model_cards/Geotrend/bert-base-es-cased/README.md
@ -1,42 +0,0 @@
---
-language: es
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-es-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-es-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-es-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-fr-cased/README.md
+++ b/model_cards/Geotrend/bert-base-fr-cased/README.md
@ -1,47 +0,0 @@
---
-language: fr
-
-datasets: wikipedia
-
-license: apache-2.0
-
-widget:
- text: "Paris est la [MASK] de la France."
- text: "Paris est la capitale de la [MASK]."
- text: "L'élection américaine a eu [MASK] en novembre 2020."
---
-
-# bert-base-fr-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-fr-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-fr-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-hi-cased/README.md
+++ b/model_cards/Geotrend/bert-base-hi-cased/README.md
@ -1,42 +0,0 @@
---
-language: hi
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-hi-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-hi-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-hi-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-ru-cased/README.md
+++ b/model_cards/Geotrend/bert-base-ru-cased/README.md
@ -1,42 +0,0 @@
---
-language: ru
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-ru-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-ru-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-ru-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-sw-cased/README.md
+++ b/model_cards/Geotrend/bert-base-sw-cased/README.md
@ -1,42 +0,0 @@
---
-language: sw
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-sw-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions give exactly the same representations produced by the original model which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-sw-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-sw-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-th-cased/README.md
+++ b/model_cards/Geotrend/bert-base-th-cased/README.md
@ -1,42 +0,0 @@
---
-language: th
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-th-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions produce exactly the same representations as the original model, which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-th-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-th-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-tr-cased/README.md
+++ b/model_cards/Geotrend/bert-base-tr-cased/README.md
@ -1,42 +0,0 @@
---
-language: tr
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-tr-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions produce exactly the same representations as the original model, which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-tr-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-tr-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-ur-cased/README.md
+++ b/model_cards/Geotrend/bert-base-ur-cased/README.md
@ -1,42 +0,0 @@
---
-language: ur
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-ur-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions produce exactly the same representations as the original model, which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-ur-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-ur-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-vi-cased/README.md
+++ b/model_cards/Geotrend/bert-base-vi-cased/README.md
@ -1,42 +0,0 @@
---
-language: vi
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-vi-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions produce exactly the same representations as the original model, which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-vi-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-vi-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Geotrend/bert-base-zh-cased/README.md
+++ b/model_cards/Geotrend/bert-base-zh-cased/README.md
@ -1,42 +0,0 @@
---
-language: zh
-
-datasets: wikipedia
-
-license: apache-2.0
---
-
-# bert-base-zh-cased
-
-We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
-
-Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions produce exactly the same representations as the original model, which preserves the original accuracy.
-
-For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
-
-## How to use
-
-```python
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-zh-cased")
-model = AutoModel.from_pretrained("Geotrend/bert-base-zh-cased")
-
-```
-
-To generate other smaller versions of multilingual transformers please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).
-
-### How to cite
-
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-
-## Contact 
-
-Please contact amine@geotrend.fr for any question, feedback or request.
--- a/model_cards/Hate-speech-CNERG/dehatebert-mono-arabic/README.md
+++ b/model_cards/Hate-speech-CNERG/dehatebert-mono-arabic/README.md
@ -1,18 +0,0 @@
-This model is used for detecting **hate speech** in the **Arabic language**. The mono in the name refers to the monolingual setting, where the model is trained using only Arabic language data. It is fine-tuned from the multilingual BERT model.
-The model was trained with several learning rates; the best validation score, 0.877609, was achieved with a learning rate of 2e-5. The training code is available in the [DE-LIMIT repository](https://github.com/punyajoy/DE-LIMIT).
-
-### For more details about our paper
-
-Sai Saketh Aluru, Binny Mathew, Punyajoy Saha and Animesh Mukherjee. "[Deep Learning Models for Multilingual Hate Speech Detection](https://arxiv.org/abs/2004.06465)". Accepted at ECML-PKDD 2020.
-
-***Please cite our paper in any published work that uses any of these resources.***
-
-~~~
-@article{aluru2020deep,
-  title={Deep Learning Models for Multilingual Hate Speech Detection},
-  author={Aluru, Sai Saketh and Mathew, Binny and Saha, Punyajoy and Mukherjee, Animesh},
-  journal={arXiv preprint arXiv:2004.06465},
-  year={2020}
-}
-
-~~~
--- a/model_cards/Hate-speech-CNERG/dehatebert-mono-english/README.md
+++ b/model_cards/Hate-speech-CNERG/dehatebert-mono-english/README.md
@ -1,20 +0,0 @@
-This model is used for detecting **hate speech** in the **English language**. The mono in the name refers to the monolingual setting, where the model is trained using only English language data. It is fine-tuned from the multilingual BERT model.
-The model was trained with several learning rates; the best validation score, 0.726030, was achieved with a learning rate of 2e-5. The training code is available in the [DE-LIMIT repository](https://github.com/punyajoy/DE-LIMIT).
-
-
-
-### For more details about our paper
-
-Sai Saketh Aluru, Binny Mathew, Punyajoy Saha and Animesh Mukherjee. "[Deep Learning Models for Multilingual Hate Speech Detection](https://arxiv.org/abs/2004.06465)". Accepted at ECML-PKDD 2020.
-
-***Please cite our paper in any published work that uses any of these resources.***
-
-~~~
-@article{aluru2020deep,
-  title={Deep Learning Models for Multilingual Hate Speech Detection},
-  author={Aluru, Sai Saketh and Mathew, Binny and Saha, Punyajoy and Mukherjee, Animesh},
-  journal={arXiv preprint arXiv:2004.06465},
-  year={2020}
-}
-
-~~~
--- a/model_cards/Hate-speech-CNERG/dehatebert-mono-french/README.md
+++ b/model_cards/Hate-speech-CNERG/dehatebert-mono-french/README.md
@ -1,20 +0,0 @@
-This model is used for detecting **hate speech** in the **French language**. The mono in the name refers to the monolingual setting, where the model is trained using only French language data. It is fine-tuned from the multilingual BERT model.
-The model was trained with several learning rates; the best validation score, 0.692094, was achieved with a learning rate of 3e-5. The training code is available in the [DE-LIMIT repository](https://github.com/punyajoy/DE-LIMIT).
-
-
-
-### For more details about our paper
-
-Sai Saketh Aluru, Binny Mathew, Punyajoy Saha and Animesh Mukherjee. "[Deep Learning Models for Multilingual Hate Speech Detection](https://arxiv.org/abs/2004.06465)". Accepted at ECML-PKDD 2020.
-
-***Please cite our paper in any published work that uses any of these resources.***
-
-~~~
-@article{aluru2020deep,
-  title={Deep Learning Models for Multilingual Hate Speech Detection},
-  author={Aluru, Sai Saketh and Mathew, Binny and Saha, Punyajoy and Mukherjee, Animesh},
-  journal={arXiv preprint arXiv:2004.06465},
-  year={2020}
-}
-
-~~~
--- a/model_cards/Hate-speech-CNERG/dehatebert-mono-german/README.md
+++ b/model_cards/Hate-speech-CNERG/dehatebert-mono-german/README.md
@ -1,20 +0,0 @@
-This model is used for detecting **hate speech** in the **German language**. The mono in the name refers to the monolingual setting, where the model is trained using only German language data. It is fine-tuned from the multilingual BERT model.
-The model was trained with several learning rates; the best validation score, 0.649794, was achieved with a learning rate of 3e-5. The training code is available in the [DE-LIMIT repository](https://github.com/punyajoy/DE-LIMIT).
-
-
-
-### For more details about our paper
-
-Sai Saketh Aluru, Binny Mathew, Punyajoy Saha and Animesh Mukherjee. "[Deep Learning Models for Multilingual Hate Speech Detection](https://arxiv.org/abs/2004.06465)". Accepted at ECML-PKDD 2020.
-
-***Please cite our paper in any published work that uses any of these resources.***
-
-~~~
-@article{aluru2020deep,
-  title={Deep Learning Models for Multilingual Hate Speech Detection},
-  author={Aluru, Sai Saketh and Mathew, Binny and Saha, Punyajoy and Mukherjee, Animesh},
-  journal={arXiv preprint arXiv:2004.06465},
-  year={2020}
-}
-
-~~~
--- a/model_cards/Hate-speech-CNERG/dehatebert-mono-indonesian/README.md
+++ b/model_cards/Hate-speech-CNERG/dehatebert-mono-indonesian/README.md
@ -1,20 +0,0 @@
-This model is used for detecting **hate speech** in the **Indonesian language**. The mono in the name refers to the monolingual setting, where the model is trained using only Indonesian language data. It is fine-tuned from the multilingual BERT model.
-The model was trained with several learning rates; the best validation score, 0.844494, was achieved with a learning rate of 2e-5. The training code is available in the [DE-LIMIT repository](https://github.com/punyajoy/DE-LIMIT).
-
-
-
-### For more details about our paper
-
-Sai Saketh Aluru, Binny Mathew, Punyajoy Saha and Animesh Mukherjee. "[Deep Learning Models for Multilingual Hate Speech Detection](https://arxiv.org/abs/2004.06465)". Accepted at ECML-PKDD 2020.
-
-***Please cite our paper in any published work that uses any of these resources.***
-
-~~~
-@article{aluru2020deep,
-  title={Deep Learning Models for Multilingual Hate Speech Detection},
-  author={Aluru, Sai Saketh and Mathew, Binny and Saha, Punyajoy and Mukherjee, Animesh},
-  journal={arXiv preprint arXiv:2004.06465},
-  year={2020}
-}
-
-~~~
--- a/model_cards/Hate-speech-CNERG/dehatebert-mono-italian/README.md
+++ b/model_cards/Hate-speech-CNERG/dehatebert-mono-italian/README.md
@ -1,20 +0,0 @@
-This model is used for detecting **hate speech** in the **Italian language**. The mono in the name refers to the monolingual setting, where the model is trained using only Italian language data. It is fine-tuned from the multilingual BERT model.
-The model was trained with several learning rates; the best validation score, 0.837288, was achieved with a learning rate of 3e-5. The training code is available in the [DE-LIMIT repository](https://github.com/punyajoy/DE-LIMIT).
-
-
-
-### For more details about our paper
-
-Sai Saketh Aluru, Binny Mathew, Punyajoy Saha and Animesh Mukherjee. "[Deep Learning Models for Multilingual Hate Speech Detection](https://arxiv.org/abs/2004.06465)". Accepted at ECML-PKDD 2020.
-
-***Please cite our paper in any published work that uses any of these resources.***
-
-~~~
-@article{aluru2020deep,
-  title={Deep Learning Models for Multilingual Hate Speech Detection},
-  author={Aluru, Sai Saketh and Mathew, Binny and Saha, Punyajoy and Mukherjee, Animesh},
-  journal={arXiv preprint arXiv:2004.06465},
-  year={2020}
-}
-
-~~~
--- a/model_cards/Hate-speech-CNERG/dehatebert-mono-polish/README.md
+++ b/model_cards/Hate-speech-CNERG/dehatebert-mono-polish/README.md
@ -1,20 +0,0 @@
-This model is used for detecting **hate speech** in the **Polish language**. The mono in the name refers to the monolingual setting, where the model is trained using only Polish language data. It is fine-tuned from the multilingual BERT model.
-The model was trained with several learning rates; the best validation score, 0.723254, was achieved with a learning rate of 2e-5. The training code is available in the [DE-LIMIT repository](https://github.com/punyajoy/DE-LIMIT).
-
-
-
-### For more details about our paper
-
-Sai Saketh Aluru, Binny Mathew, Punyajoy Saha and Animesh Mukherjee. "[Deep Learning Models for Multilingual Hate Speech Detection](https://arxiv.org/abs/2004.06465)". Accepted at ECML-PKDD 2020.
-
-***Please cite our paper in any published work that uses any of these resources.***
-
-~~~
-@article{aluru2020deep,
-  title={Deep Learning Models for Multilingual Hate Speech Detection},
-  author={Aluru, Sai Saketh and Mathew, Binny and Saha, Punyajoy and Mukherjee, Animesh},
-  journal={arXiv preprint arXiv:2004.06465},
-  year={2020}
-}
-
-~~~
--- a/model_cards/Hate-speech-CNERG/dehatebert-mono-portugese/README.md
+++ b/model_cards/Hate-speech-CNERG/dehatebert-mono-portugese/README.md
@ -1,20 +0,0 @@
-This model is used for detecting **hate speech** in the **Portuguese language**. The mono in the name refers to the monolingual setting, where the model is trained using only Portuguese language data. It is fine-tuned from the multilingual BERT model.
-The model was trained with several learning rates; the best validation score, 0.716119, was achieved with a learning rate of 3e-5. The training code is available in the [DE-LIMIT repository](https://github.com/punyajoy/DE-LIMIT).
-
-
-
-### For more details about our paper
-
-Sai Saketh Aluru, Binny Mathew, Punyajoy Saha and Animesh Mukherjee. "[Deep Learning Models for Multilingual Hate Speech Detection](https://arxiv.org/abs/2004.06465)". Accepted at ECML-PKDD 2020.
-
-***Please cite our paper in any published work that uses any of these resources.***
-
-~~~
-@article{aluru2020deep,
-  title={Deep Learning Models for Multilingual Hate Speech Detection},
-  author={Aluru, Sai Saketh and Mathew, Binny and Saha, Punyajoy and Mukherjee, Animesh},
-  journal={arXiv preprint arXiv:2004.06465},
-  year={2020}
-}
-
-~~~
--- a/model_cards/Hate-speech-CNERG/dehatebert-mono-spanish/README.md
+++ b/model_cards/Hate-speech-CNERG/dehatebert-mono-spanish/README.md
@ -1,20 +0,0 @@
-This model is used for detecting **hate speech** in the **Spanish language**. The mono in the name refers to the monolingual setting, where the model is trained using only Spanish language data. It is fine-tuned from the multilingual BERT model.
-The model was trained with several learning rates; the best validation score, 0.740287, was achieved with a learning rate of 3e-5. The training code is available in the [DE-LIMIT repository](https://github.com/punyajoy/DE-LIMIT).
-
-
-
-### For more details about our paper
-
-Sai Saketh Aluru, Binny Mathew, Punyajoy Saha and Animesh Mukherjee. "[Deep Learning Models for Multilingual Hate Speech Detection](https://arxiv.org/abs/2004.06465)". Accepted at ECML-PKDD 2020.
-
-***Please cite our paper in any published work that uses any of these resources.***
-
-~~~
-@article{aluru2020deep,
-  title={Deep Learning Models for Multilingual Hate Speech Detection},
-  author={Aluru, Sai Saketh and Mathew, Binny and Saha, Punyajoy and Mukherjee, Animesh},
-  journal={arXiv preprint arXiv:2004.06465},
-  year={2020}
-}
-
-~~~
--- a/model_cards/HooshvareLab/bert-base-parsbert-armanner-uncased/README.md
+++ b/model_cards/HooshvareLab/bert-base-parsbert-armanner-uncased/README.md
@ -1,124 +0,0 @@
-## ParsBERT: Transformer-based Model for Persian Language Understanding
-
-ParsBERT is a monolingual language model based on Google’s BERT architecture with the same configurations as BERT-Base. 
-
-Paper presenting ParsBERT: [arXiv:2005.12515](https://arxiv.org/abs/2005.12515)
-
-All the models (downstream tasks) are uncased and trained with whole word masking. (coming soon stay tuned)
-
-
-## Persian NER [ARMAN, PEYMA, ARMAN+PEYMA]
-
-This task aims to extract named entities in the text, such as names, and label them with appropriate `NER` classes such as locations, organizations, etc. The datasets used for this task contain sentences that are marked with the `IOB` format. In this format, tokens that are not part of an entity are tagged as `"O"`, the `"B"` tag corresponds to the first word of an entity, and the `"I"` tag corresponds to the remaining words of the same entity. Both `"B"` and `"I"` tags are followed by a hyphen (or underscore), followed by the entity category. Therefore, the NER task is a multi-class token classification problem that labels the tokens of a raw input text. There are two primary datasets used in Persian NER, `ARMAN` and `PEYMA`. In ParsBERT, we prepared NER models for both datasets as well as for a combination of the two.
-
-
-
-### PEYMA
-
-PEYMA dataset includes 7,145 sentences with a total of 302,530 tokens from which 41,148 tokens are tagged with seven different classes.
-
-1. Organization
-2. Money
-3. Location
-4. Date
-5. Time
-6. Person
-7. Percent
-
-
-|     Label    |   #   |
-|:------------:|:-----:|
-| Organization | 16964 |
-|     Money    |  2037 |
-|   Location   |  8782 |
-|     Date     |  4259 |
-|     Time     |  732  |
-|    Person    |  7675 |
-|    Percent   |  699  |
-
-
-
-**Download**
-You can download the dataset from [here](http://nsurl.org/tasks/task-7-named-entity-recognition-ner-for-farsi/)
-
---
-
-### ARMAN
-
-The ARMAN dataset holds 7,682 sentences with 250,015 tokens tagged over six different classes.
-
-1. Organization
-2. Location
-3. Facility
-4. Event
-5. Product
-6. Person
-
-
-|     Label    |   #   |
-|:------------:|:-----:|
-| Organization | 30108 |
-|   Location   | 12924 |
-|   Facility   |  4458 |
-|     Event    |  7557 |
-|    Product   |  4389 |
-|    Person    | 15645 |
-
-
-
-**Download**
-You can download the dataset from [here](https://github.com/HaniehP/PersianNER)
-
-
-
-## Results
-
-The following table summarizes the F1 score obtained by ParsBERT as compared to other models and architectures.
-
-| Dataset         | ParsBERT | MorphoBERT |  Beheshti-NER  |  LSTM-CRF  |  Rule-Based CRF  |  BiLSTM-CRF  |
-|:---------------:|:--------:|:----------:|:--------------:|:----------:|:----------------:|:------------:|
-|  ARMAN + PEYMA  |   95.13* |      -     |        -       |      -     |         -        |       -      |
-|  PEYMA          |   98.79* |      -     |      90.59     |      -     |       84.00      |       -      |
-|  ARMAN          |   93.10* |    89.9    |      84.03     |    86.55   |         -        |     77.45    |
-
-
-## How to use :hugs:
-| Notebook     |      Description      |   |
-|:----------|:-------------|------:|
-| [How to use Pipelines](https://github.com/hooshvare/parsbert-ner/blob/master/persian-ner-pipeline.ipynb)  | Simple and efficient way to use State-of-the-Art models on downstream tasks through transformers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hooshvare/parsbert-ner/blob/master/persian-ner-pipeline.ipynb) |
-
-
-## Cite 
-
-Please cite the following paper in your publication if you are using [ParsBERT](https://arxiv.org/abs/2005.12515) in your research:
-
-```markdown
-@article{ParsBERT,
-    title={ParsBERT: Transformer-based Model for Persian Language Understanding},
-    author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
-    journal={ArXiv},
-    year={2020},
-    volume={abs/2005.12515}
-}
-```
-
-
-## Acknowledgments
-
-We hereby express our gratitude to the [Tensorflow Research Cloud (TFRC) program](https://tensorflow.org/tfrc) for providing us with the necessary computational resources. We also thank the [Hooshvare](https://hooshvare.com) Research Group for facilitating dataset gathering and scraping online text resources.
-
-
-## Contributors
-
- Mehrdad Farahani: [Linkedin](https://www.linkedin.com/in/m3hrdadfi/), [Twitter](https://twitter.com/m3hrdadfi), [Github](https://github.com/m3hrdadfi)
- Mohammad Gharachorloo:  [Linkedin](https://www.linkedin.com/in/mohammad-gharachorloo/), [Twitter](https://twitter.com/MGharachorloo), [Github](https://github.com/baarsaam)
- Marzieh Farahani:  [Linkedin](https://www.linkedin.com/in/marziehphi/), [Twitter](https://twitter.com/marziehphi), [Github](https://github.com/marziehphi)
- Mohammad Manthouri:  [Linkedin](https://www.linkedin.com/in/mohammad-manthouri-aka-mansouri-07030766/), [Twitter](https://twitter.com/mmanthouri), [Github](https://github.com/mmanthouri)
- Hooshvare Team:  [Official Website](https://hooshvare.com/), [Linkedin](https://www.linkedin.com/company/hooshvare), [Twitter](https://twitter.com/hooshvare), [Github](https://github.com/hooshvare), [Instagram](https://www.instagram.com/hooshvare/)
-
-+ And a special thanks to Sara Tabrizi for her fantastic poster design. Follow her on: [Linkedin](https://www.linkedin.com/in/sara-tabrizi-64548b79/), [Behance](https://www.behance.net/saratabrizi), [Instagram](https://www.instagram.com/sara_b_tabrizi/)
-
-## Releases
-
-### Release v0.1 (May 29, 2019)
-This is the first version of our ParsBERT NER!
--- a/model_cards/HooshvareLab/bert-base-parsbert-ner-uncased/README.md
+++ b/model_cards/HooshvareLab/bert-base-parsbert-ner-uncased/README.md
@ -1,124 +0,0 @@
-## ParsBERT: Transformer-based Model for Persian Language Understanding
-
-ParsBERT is a monolingual language model based on Google’s BERT architecture with the same configurations as BERT-Base. 
-
-Paper presenting ParsBERT: [arXiv:2005.12515](https://arxiv.org/abs/2005.12515)
-
-All the models (downstream tasks) are uncased and trained with whole word masking. (coming soon stay tuned)
-
-
-## Persian NER [ARMAN, PEYMA, ARMAN+PEYMA]
-
-This task aims to extract named entities in the text, such as names, and label them with appropriate `NER` classes such as locations, organizations, etc. The datasets used for this task contain sentences that are marked with the `IOB` format. In this format, tokens that are not part of an entity are tagged as `"O"`, the `"B"` tag corresponds to the first word of an entity, and the `"I"` tag corresponds to the remaining words of the same entity. Both `"B"` and `"I"` tags are followed by a hyphen (or underscore), followed by the entity category. Therefore, the NER task is a multi-class token classification problem that labels the tokens of a raw input text. There are two primary datasets used in Persian NER, `ARMAN` and `PEYMA`. In ParsBERT, we prepared NER models for both datasets as well as for a combination of the two.
-
-
-
-### PEYMA
-
-PEYMA dataset includes 7,145 sentences with a total of 302,530 tokens from which 41,148 tokens are tagged with seven different classes.
-
-1. Organization
-2. Money
-3. Location
-4. Date
-5. Time
-6. Person
-7. Percent
-
-
-|     Label    |   #   |
-|:------------:|:-----:|
-| Organization | 16964 |
-|     Money    |  2037 |
-|   Location   |  8782 |
-|     Date     |  4259 |
-|     Time     |  732  |
-|    Person    |  7675 |
-|    Percent   |  699  |
-
-
-
-**Download**
-You can download the dataset from [here](http://nsurl.org/tasks/task-7-named-entity-recognition-ner-for-farsi/)
-
---
-
-### ARMAN
-
-The ARMAN dataset holds 7,682 sentences with 250,015 tokens tagged over six different classes.
-
-1. Organization
-2. Location
-3. Facility
-4. Event
-5. Product
-6. Person
-
-
-|     Label    |   #   |
-|:------------:|:-----:|
-| Organization | 30108 |
-|   Location   | 12924 |
-|   Facility   |  4458 |
-|     Event    |  7557 |
-|    Product   |  4389 |
-|    Person    | 15645 |
-
-
-
-**Download**
-You can download the dataset from [here](https://github.com/HaniehP/PersianNER)
-
-
-
-## Results
-
-The following table summarizes the F1 score obtained by ParsBERT as compared to other models and architectures.
-
-| Dataset         | ParsBERT | MorphoBERT |  Beheshti-NER  |  LSTM-CRF  |  Rule-Based CRF  |  BiLSTM-CRF  |
-|:---------------:|:--------:|:----------:|:--------------:|:----------:|:----------------:|:------------:|
-|  ARMAN + PEYMA  |   95.13* |      -     |        -       |      -     |         -        |       -      |
-|  PEYMA          |   98.79* |      -     |      90.59     |      -     |       84.00      |       -      |
-|  ARMAN          |   93.10* |    89.9    |      84.03     |    86.55   |         -        |     77.45    |
-
-
-## How to use :hugs:
-| Notebook     |      Description      |   |
-|:----------|:-------------|------:|
-| [How to use Pipelines](https://github.com/hooshvare/parsbert-ner/blob/master/persian-ner-pipeline.ipynb)  | Simple and efficient way to use State-of-the-Art models on downstream tasks through transformers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hooshvare/parsbert-ner/blob/master/persian-ner-pipeline.ipynb) |
-
-
-## Cite 
-
-Please cite the following paper in your publication if you are using [ParsBERT](https://arxiv.org/abs/2005.12515) in your research:
-
-```markdown
-@article{ParsBERT,
-    title={ParsBERT: Transformer-based Model for Persian Language Understanding},
-    author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
-    journal={ArXiv},
-    year={2020},
-    volume={abs/2005.12515}
-}
-```
-
-
-## Acknowledgments
-
-We hereby express our gratitude to the [Tensorflow Research Cloud (TFRC) program](https://tensorflow.org/tfrc) for providing us with the necessary computational resources. We also thank the [Hooshvare](https://hooshvare.com) Research Group for facilitating dataset gathering and scraping online text resources.
-
-
-## Contributors
-
- Mehrdad Farahani: [Linkedin](https://www.linkedin.com/in/m3hrdadfi/), [Twitter](https://twitter.com/m3hrdadfi), [Github](https://github.com/m3hrdadfi)
- Mohammad Gharachorloo:  [Linkedin](https://www.linkedin.com/in/mohammad-gharachorloo/), [Twitter](https://twitter.com/MGharachorloo), [Github](https://github.com/baarsaam)
- Marzieh Farahani:  [Linkedin](https://www.linkedin.com/in/marziehphi/), [Twitter](https://twitter.com/marziehphi), [Github](https://github.com/marziehphi)
- Mohammad Manthouri:  [Linkedin](https://www.linkedin.com/in/mohammad-manthouri-aka-mansouri-07030766/), [Twitter](https://twitter.com/mmanthouri), [Github](https://github.com/mmanthouri)
- Hooshvare Team:  [Official Website](https://hooshvare.com/), [Linkedin](https://www.linkedin.com/company/hooshvare), [Twitter](https://twitter.com/hooshvare), [Github](https://github.com/hooshvare), [Instagram](https://www.instagram.com/hooshvare/)
-
-+ And a special thanks to Sara Tabrizi for her fantastic poster design. Follow her on: [Linkedin](https://www.linkedin.com/in/sara-tabrizi-64548b79/), [Behance](https://www.behance.net/saratabrizi), [Instagram](https://www.instagram.com/sara_b_tabrizi/)
-
-## Releases
-
-### Release v0.1 (May 29, 2019)
-This is the first version of our ParsBERT NER!
--- a/model_cards/HooshvareLab/bert-base-parsbert-peymaner-uncased/README.md
+++ b/model_cards/HooshvareLab/bert-base-parsbert-peymaner-uncased/README.md
@ -1,124 +0,0 @@
-## ParsBERT: Transformer-based Model for Persian Language Understanding
-
-ParsBERT is a monolingual language model based on Google’s BERT architecture with the same configurations as BERT-Base. 
-
-Paper presenting ParsBERT: [arXiv:2005.12515](https://arxiv.org/abs/2005.12515)
-
-All the models (downstream tasks) are uncased and trained with whole word masking. (coming soon stay tuned)
-
-
-## Persian NER [ARMAN, PEYMA, ARMAN+PEYMA]
-
-This task aims to extract named entities in the text, such as names, and label them with appropriate `NER` classes such as locations, organizations, etc. The datasets used for this task contain sentences that are marked with the `IOB` format. In this format, tokens that are not part of an entity are tagged as `"O"`, the `"B"` tag corresponds to the first word of an entity, and the `"I"` tag corresponds to the remaining words of the same entity. Both `"B"` and `"I"` tags are followed by a hyphen (or underscore), followed by the entity category. Therefore, the NER task is a multi-class token classification problem that labels the tokens of a raw input text. There are two primary datasets used in Persian NER, `ARMAN` and `PEYMA`. In ParsBERT, we prepared NER models for both datasets as well as for a combination of the two.
-
-
-
-### PEYMA
-
-PEYMA dataset includes 7,145 sentences with a total of 302,530 tokens from which 41,148 tokens are tagged with seven different classes.
-
-1. Organization
-2. Money
-3. Location
-4. Date
-5. Time
-6. Person
-7. Percent
-
-
-|     Label    |   #   |
-|:------------:|:-----:|
-| Organization | 16964 |
-|     Money    |  2037 |
-|   Location   |  8782 |
-|     Date     |  4259 |
-|     Time     |  732  |
-|    Person    |  7675 |
-|    Percent   |  699  |
-
-
-
-**Download**
-You can download the dataset from [here](http://nsurl.org/tasks/task-7-named-entity-recognition-ner-for-farsi/)
-
---
-
-### ARMAN
-
-The ARMAN dataset holds 7,682 sentences with 250,015 tokens tagged over six different classes.
-
-1. Organization
-2. Location
-3. Facility
-4. Event
-5. Product
-6. Person
-
-
-|     Label    |   #   |
-|:------------:|:-----:|
-| Organization | 30108 |
-|   Location   | 12924 |
-|   Facility   |  4458 |
-|     Event    |  7557 |
-|    Product   |  4389 |
-|    Person    | 15645 |
-
-
-
-**Download**
-You can download the dataset from [here](https://github.com/HaniehP/PersianNER)
-
-
-
-## Results
-
-The following table summarizes the F1 score obtained by ParsBERT as compared to other models and architectures.
-
-| Dataset         | ParsBERT | MorphoBERT |  Beheshti-NER  |  LSTM-CRF  |  Rule-Based CRF  |  BiLSTM-CRF  |
-|:---------------:|:--------:|:----------:|:--------------:|:----------:|:----------------:|:------------:|
-|  ARMAN + PEYMA  |   95.13* |      -     |        -       |      -     |         -        |       -      |
-|  PEYMA          |   93.10* |      -     |      90.59     |      -     |       84.00      |       -      |
-|  ARMAN          |   98.79* |    89.9    |      84.03     |    86.55   |         -        |     77.45    |
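The cards do not say whether F1 is computed per token or per entity. As a sketch only, assuming entity-level scoring over exact-match spans, precision, recall and F1 can be computed as below; the function name and the sample spans are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match scores over two collections of (category, start, end) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [("PER", 0, 2), ("ORG", 3, 4), ("LOC", 5, 8)]
predicted = [("PER", 0, 2), ("LOC", 5, 7)]  # the LOC boundary is wrong
print(precision_recall_f1(gold, predicted))
```

Here only one of the two predictions matches a gold span exactly, so precision is 1/2, recall is 1/3, and F1 is 0.4.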
-
-
-## How to use :hugs:
-| Notebook     |      Description      |   |
-|:----------|:-------------|------:|
-| [How to use Pipelines](https://github.com/hooshvare/parsbert-ner/blob/master/persian-ner-pipeline.ipynb)  | Simple and efficient way to use State-of-the-Art models on downstream tasks through transformers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hooshvare/parsbert-ner/blob/master/persian-ner-pipeline.ipynb) |
-
-
-## Cite 
-
-Please cite the following paper in your publication if you are using [ParsBERT](https://arxiv.org/abs/2005.12515) in your research:
-
-```bibtex
-@article{ParsBERT,
-    title={ParsBERT: Transformer-based Model for Persian Language Understanding},
-    author={Mehrdad Farahani, Mohammad Gharachorloo, Marzieh Farahani, Mohammad Manthouri},
-    journal={ArXiv},
-    year={2020},
-    volume={abs/2005.12515}
-}
-```
-
-
-## Acknowledgments
-
-We hereby express our gratitude to the [Tensorflow Research Cloud (TFRC) program](https://tensorflow.org/tfrc) for providing us with the necessary computation resources. We also thank [Hooshvare](https://hooshvare.com) Research Group for facilitating dataset gathering and scraping online text resources.
-
-
-## Contributors
-
- Mehrdad Farahani: [Linkedin](https://www.linkedin.com/in/m3hrdadfi/), [Twitter](https://twitter.com/m3hrdadfi), [Github](https://github.com/m3hrdadfi)
- Mohammad Gharachorloo:  [Linkedin](https://www.linkedin.com/in/mohammad-gharachorloo/), [Twitter](https://twitter.com/MGharachorloo), [Github](https://github.com/baarsaam)
- Marzieh Farahani:  [Linkedin](https://www.linkedin.com/in/marziehphi/), [Twitter](https://twitter.com/marziehphi), [Github](https://github.com/marziehphi)
- Mohammad Manthouri:  [Linkedin](https://www.linkedin.com/in/mohammad-manthouri-aka-mansouri-07030766/), [Twitter](https://twitter.com/mmanthouri), [Github](https://github.com/mmanthouri)
- Hooshvare Team:  [Official Website](https://hooshvare.com/), [Linkedin](https://www.linkedin.com/company/hooshvare), [Twitter](https://twitter.com/hooshvare), [Github](https://github.com/hooshvare), [Instagram](https://www.instagram.com/hooshvare/)
-
-+ And a special thanks to Sara Tabrizi for her fantastic poster design. Follow her on: [Linkedin](https://www.linkedin.com/in/sara-tabrizi-64548b79/), [Behance](https://www.behance.net/saratabrizi), [Instagram](https://www.instagram.com/sara_b_tabrizi/)
-
-## Releases
-
-### Release v0.1 (May 29, 2019)
-This is the first version of our ParsBERT NER!
--- a/model_cards/HooshvareLab/bert-base-parsbert-uncased/README.md
+++ b/model_cards/HooshvareLab/bert-base-parsbert-uncased/README.md
@ -1,124 +0,0 @@
-## ParsBERT: Transformer-based Model for Persian Language Understanding
-
-ParsBERT is a monolingual language model based on Google’s BERT architecture with the same configurations as BERT-Base. 
-
-Paper presenting ParsBERT: [arXiv:2005.12515](https://arxiv.org/abs/2005.12515)
-
-All the models (downstream tasks) are uncased and trained with whole word masking. (coming soon stay tuned)
-
-
---
-
-## Introduction
-
-This model is pre-trained on a large Persian corpus with various writing styles from numerous subjects (e.g., scientific, novels, news) with more than 2M documents. A large subset of this corpus was crawled manually.
-
-As a part of ParsBERT methodology, an extensive pre-processing combining POS tagging and WordPiece segmentation was carried out to bring the corpus into a proper format. This process produces more than 40M true sentences. 
-
-
-## Evaluation
-
-ParsBERT is evaluated on three NLP downstream tasks: Sentiment Analysis (SA), Text Classification, and Named Entity Recognition (NER). For this matter and due to insufficient resources, two large datasets for SA and two for text classification were manually composed, which are available for public use and benchmarking. ParsBERT outperformed all other language models, including multilingual BERT and other hybrid deep learning models for all tasks, improving the state-of-the-art performance in Persian language modeling.
-
-## Results
-
-The following table summarizes the F1 score obtained by ParsBERT as compared to other models and architectures.
-
-
-
-### Sentiment Analysis (SA) task
-
-|           Dataset          |  ParsBERT | mBERT | DeepSentiPers |
-|:--------------------------:|:---------:|:-----:|:-------------:|
-|   Digikala User Comments   |   81.74*  | 80.74 |       -       |
-|   SnappFood User Comments  |   88.12*  | 87.87 |       -       |
-|   SentiPers (Multi Class)  |   71.11*  |   -   |     69.33     |
-|  SentiPers (Binary Class)  |   92.13*  |   -   |     91.98     |
-
-
-
-### Text Classification (TC) task
-
-|      Dataset      | ParsBERT | mBERT |
-|:-----------------:|:--------:|:-----:|
-| Digikala Magazine |   93.59* | 90.72 |
-|    Persian News   |   97.19* | 95.79 |
-
-
-### Named Entity Recognition (NER) task
-
-| Dataset | ParsBERT |  mBERT   | MorphoBERT |  Beheshti-NER  |  LSTM-CRF  |  Rule-Based CRF  |  BiLSTM-CRF  |
-|:-------:|:--------:|:--------:|:----------:|:--------------:|:----------:|:----------------:|:------------:|
-|  PEYMA  |   93.10* |   86.64  |      -     |      90.59     |      -     |       84.00      |       -      |
-|  ARMAN  |   98.79* |   95.89  |    89.9    |      84.03     |    86.55   |         -        |     77.45    |
-
-
-**If you tested ParsBERT on a public dataset and you want to add your results to the table above, open a pull request or contact us. Also make sure to have your code available online so we can add it as a reference**
-
-## How to use
-
-### TensorFlow 2.0
-
-```python
-from transformers import AutoConfig, AutoTokenizer, TFAutoModel
-
-config = AutoConfig.from_pretrained("HooshvareLab/bert-base-parsbert-uncased")
-tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-base-parsbert-uncased")
-model = TFAutoModel.from_pretrained("HooshvareLab/bert-base-parsbert-uncased")
-
-text = "ما در هوشواره معتقدیم با انتقال صحیح دانش و آگاهی، همه افراد می‌توانند از ابزارهای هوشمند استفاده کنند. شعار ما هوش مصنوعی برای همه است."
-tokenizer.tokenize(text)
-
->>> ['ما', 'در', 'هوش', '##واره', 'معتقدیم', 'با', 'انتقال', 'صحیح', 'دانش', 'و', 'اگاهی', '،', 'همه', 'افراد', 'میتوانند', 'از', 'ابزارهای', 'هوشمند', 'استفاده', 'کنند', '.', 'شعار', 'ما', 'هوش', 'مصنوعی', 'برای', 'همه', 'است', '.']
-
-```
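The `##` prefix in the output above marks a piece that continues the previous token. A toy sketch of greedy longest-match WordPiece splitting shows how such pieces can arise. The three-entry vocabulary is a hypothetical stand-in, not the ParsBERT vocabulary, and the real tokenizer also applies normalization steps that are omitted here.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece fits: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


toy_vocab = {"ما", "هوش", "##واره"}  # hypothetical vocabulary
print(wordpiece("هوشواره", toy_vocab))
```

With this toy vocabulary the word is split into `هوش` and `##واره`, the same two pieces that appear in the tokenizer output above.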
-
-### Pytorch
-
-```python
-from transformers import AutoConfig, AutoTokenizer, AutoModel
-
-config = AutoConfig.from_pretrained("HooshvareLab/bert-base-parsbert-uncased")
-tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-base-parsbert-uncased")
-model = AutoModel.from_pretrained("HooshvareLab/bert-base-parsbert-uncased")
-```
-
-
-## NLP Tasks Tutorial 
-
-Coming soon stay tuned
-
-
-## Cite 
-
-Please cite the following paper in your publication if you are using [ParsBERT](https://arxiv.org/abs/2005.12515) in your research:
-
-```bibtex
-@article{ParsBERT,
-    title={ParsBERT: Transformer-based Model for Persian Language Understanding},
-    author={Mehrdad Farahani, Mohammad Gharachorloo, Marzieh Farahani, Mohammad Manthouri},
-    journal={ArXiv},
-    year={2020},
-    volume={abs/2005.12515}
-}
-```
-
-
-## Acknowledgments
-
-We hereby express our gratitude to the [Tensorflow Research Cloud (TFRC) program](https://tensorflow.org/tfrc) for providing us with the necessary computation resources. We also thank [Hooshvare](https://hooshvare.com) Research Group for facilitating dataset gathering and scraping online text resources.
-
-
-## Contributors
-
- Mehrdad Farahani: [Linkedin](https://www.linkedin.com/in/m3hrdadfi/), [Twitter](https://twitter.com/m3hrdadfi), [Github](https://github.com/m3hrdadfi)
- Mohammad Gharachorloo:  [Linkedin](https://www.linkedin.com/in/mohammad-gharachorloo/), [Twitter](https://twitter.com/MGharachorloo), [Github](https://github.com/baarsaam)
- Marzieh Farahani:  [Linkedin](https://www.linkedin.com/in/marziehphi/), [Twitter](https://twitter.com/marziehphi), [Github](https://github.com/marziehphi)
- Mohammad Manthouri:  [Linkedin](https://www.linkedin.com/in/mohammad-manthouri-aka-mansouri-07030766/), [Twitter](https://twitter.com/mmanthouri), [Github](https://github.com/mmanthouri)
- Hooshvare Team:  [Official Website](https://hooshvare.com/), [Linkedin](https://www.linkedin.com/company/hooshvare), [Twitter](https://twitter.com/hooshvare), [Github](https://github.com/hooshvare), [Instagram](https://www.instagram.com/hooshvare/)
-
-
-## Releases
-
-### Release v0.1 (May 27, 2019)
-This is the first version of our ParsBERT based on BERT<sub>BASE</sub>
--- a/model_cards/HooshvareLab/bert-fa-base-uncased/README.md
+++ b/model_cards/HooshvareLab/bert-fa-base-uncased/README.md
@ -1,147 +0,0 @@
---
-language: fa
-tags:
- bert-fa
- bert-persian
- persian-lm
-license: apache-2.0
---
-
-# ParsBERT (v2.0)
-A Transformer-based Model for Persian Language Understanding
-
-
-We reconstructed the vocabulary and fine-tuned the ParsBERT v1.1 on the new Persian corpora in order to provide some functionalities for using ParsBERT in other scopes!
-Please follow the [ParsBERT](https://github.com/hooshvare/parsbert) repo for the latest information about previous and current models.
-
-## Introduction
-
-ParsBERT is a monolingual language model based on Google’s BERT architecture. This model is pre-trained on large Persian corpora with various writing styles from numerous subjects (e.g., scientific, novels, news) with more than `3.9M` documents, `73M` sentences, and `1.3B` words.
- 
-Paper presenting ParsBERT: [arXiv:2005.12515](https://arxiv.org/abs/2005.12515)
-
-## Intended uses & limitations
-
-You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?search=bert-fa) to look for
-fine-tuned versions on a task that interests you.
-
-
-### How to use
-
-#### TensorFlow 2.0
-
-```python
-from transformers import AutoConfig, AutoTokenizer, TFAutoModel
-
-config = AutoConfig.from_pretrained("HooshvareLab/bert-fa-base-uncased")
-tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-fa-base-uncased")
-model = TFAutoModel.from_pretrained("HooshvareLab/bert-fa-base-uncased")
-
-text = "ما در هوشواره معتقدیم با انتقال صحیح دانش و آگاهی، همه افراد میتوانند از ابزارهای هوشمند استفاده کنند. شعار ما هوش مصنوعی برای همه است."
-tokenizer.tokenize(text)
-
->>> ['ما', 'در', 'هوش', '##واره', 'معتقدیم', 'با', 'انتقال', 'صحیح', 'دانش', 'و', 'اگاهی', '،', 'همه', 'افراد', 'میتوانند', 'از', 'ابزارهای', 'هوشمند', 'استفاده', 'کنند', '.', 'شعار', 'ما', 'هوش', 'مصنوعی', 'برای', 'همه', 'است', '.']
-```
-
-#### Pytorch
-
-```python
-from transformers import AutoConfig, AutoTokenizer, AutoModel
-
-config = AutoConfig.from_pretrained("HooshvareLab/bert-fa-base-uncased")
-tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-fa-base-uncased")
-model = AutoModel.from_pretrained("HooshvareLab/bert-fa-base-uncased")
-```
-
-## Training
-
-ParsBERT was trained on a massive amount of public corpora ([Persian Wikidumps](https://dumps.wikimedia.org/fawiki/), [MirasText](https://github.com/miras-tech/MirasText)) and six other manually crawled text datasets from various types of websites ([BigBang Page](https://bigbangpage.com/) `scientific`, [Chetor](https://www.chetor.com/) `lifestyle`, [Eligasht](https://www.eligasht.com/Blog/) `itinerary`,  [Digikala](https://www.digikala.com/mag/) `digital magazine`, [Ted Talks](https://www.ted.com/talks) `general conversational`, Books `novels, storybooks, short stories from old to the contemporary era`).
-
-As a part of ParsBERT methodology, an extensive pre-processing combining POS tagging and WordPiece segmentation was carried out to bring the corpora into a proper format.
-
-## Goals
-Objective goals during training are as below (after 300k steps).
-
-``` bash
-***** Eval results *****
-global_step = 300000
-loss = 1.4392426
-masked_lm_accuracy = 0.6865794
-masked_lm_loss = 1.4469004
-next_sentence_accuracy = 1.0
-next_sentence_loss = 6.534152e-05
-```
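The masked-LM loss can be read as a perplexity. This is a small sketch, assuming the reported loss is a mean cross-entropy in nats (natural logarithm); perplexity is a derived quantity here and is not reported in the card itself.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean cross-entropy loss (in nats) into perplexity."""
    return math.exp(mean_cross_entropy_nats)


masked_lm_loss = 1.4469004  # value taken from the eval results above
print(f"masked LM perplexity: {perplexity(masked_lm_loss):.2f}")
```

Under that assumption the masked-token perplexity comes out at roughly 4.25, which can be read loosely as the model choosing among about four equally likely candidates per masked position.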
-
-
-## Derivative models
-
-### Base Config
-
-#### ParsBERT v2.0 Model
- [HooshvareLab/bert-fa-base-uncased](https://huggingface.co/HooshvareLab/bert-fa-base-uncased) 
-
-#### ParsBERT v2.0 Sentiment Analysis
- [HooshvareLab/bert-fa-base-uncased-sentiment-digikala](https://huggingface.co/HooshvareLab/bert-fa-base-uncased-sentiment-digikala) 
- [HooshvareLab/bert-fa-base-uncased-sentiment-snappfood](https://huggingface.co/HooshvareLab/bert-fa-base-uncased-sentiment-snappfood) 
- [HooshvareLab/bert-fa-base-uncased-sentiment-deepsentipers-binary](https://huggingface.co/HooshvareLab/bert-fa-base-uncased-sentiment-deepsentipers-binary) 
- [HooshvareLab/bert-fa-base-uncased-sentiment-deepsentipers-multi](https://huggingface.co/HooshvareLab/bert-fa-base-uncased-sentiment-deepsentipers-multi) 
-
-#### ParsBERT v2.0 Text Classification
- [HooshvareLab/bert-fa-base-uncased-clf-digimag](https://huggingface.co/HooshvareLab/bert-fa-base-uncased-clf-digimag) 
- [HooshvareLab/bert-fa-base-uncased-clf-persiannews](https://huggingface.co/HooshvareLab/bert-fa-base-uncased-clf-persiannews) 
-
-#### ParsBERT v2.0 NER 
- [HooshvareLab/bert-fa-base-uncased-ner-peyma](https://huggingface.co/HooshvareLab/bert-fa-base-uncased-ner-peyma) 
- [HooshvareLab/bert-fa-base-uncased-ner-arman](https://huggingface.co/HooshvareLab/bert-fa-base-uncased-ner-arman) 
-
-
-## Eval results
-
-ParsBERT is evaluated on three NLP downstream tasks: Sentiment Analysis (SA), Text Classification, and Named Entity Recognition (NER). For this matter and due to insufficient resources, two large datasets for SA and two for text classification were manually composed, which are available for public use and benchmarking. ParsBERT outperformed all other language models, including multilingual BERT and other hybrid deep learning models for all tasks, improving the state-of-the-art performance in Persian language modeling.
-
-
-### Sentiment Analysis (SA) Task
-
-|          Dataset         | ParsBERT v2 | ParsBERT v1 | mBERT | DeepSentiPers |
-|:------------------------:|:-----------:|:-----------:|:-----:|:-------------:|
-|  Digikala User Comments  |    81.72    |    81.74*   | 80.74 |       -       |
-|  SnappFood User Comments |    87.98    |    88.12*   | 87.87 |       -       |
-|  SentiPers (Multi Class) |    71.31*   |    71.11    |   -   |     69.33     |
-| SentiPers (Binary Class) |    92.42*   |    92.13    |   -   |     91.98     |
-
-
-### Text Classification (TC) Task
-
-|      Dataset      | ParsBERT v2 | ParsBERT v1 | mBERT |
-|:-----------------:|:-----------:|:-----------:|:-----:|
-| Digikala Magazine |    93.65*   |    93.59    | 90.72 |
-|    Persian News   |    97.44*   |    97.19    | 95.79 |
-
-
-### Named Entity Recognition (NER) Task
-
-| Dataset | ParsBERT v2 | ParsBERT v1 | mBERT | MorphoBERT | Beheshti-NER | LSTM-CRF | Rule-Based CRF | BiLSTM-CRF |
-|:-------:|:-----------:|:-----------:|:-----:|:----------:|:------------:|:--------:|:--------------:|:----------:|
-|  PEYMA  |    93.40*   |    93.10    | 86.64 |      -     |     90.59    |     -    |      84.00     |      -     |
-|  ARMAN  |    99.84*   |    98.79    | 95.89 |    89.9    |     84.03    |   86.55  |        -       |    77.45   |
-
-
-
-
-### BibTeX entry and citation info
-
-Please cite in publications as the following:
-
-```bibtex
-@article{ParsBERT,
-    title={ParsBERT: Transformer-based Model for Persian Language Understanding},
-    author={Mehrdad Farahani, Mohammad Gharachorloo, Marzieh Farahani, Mohammad Manthouri},
-    journal={ArXiv},
-    year={2020},
-    volume={abs/2005.12515}
-}
-```
-
-## Questions?
-Post a Github issue on the [ParsBERT Issues](https://github.com/hooshvare/parsbert/issues) repo.
--- a/model_cards/KB/albert-base-swedish-cased-alpha/README.md
+++ b/model_cards/KB/albert-base-swedish-cased-alpha/README.md
@ -1,121 +0,0 @@
---
-language: sv
---
-
-# Swedish BERT Models
-
-The National Library of Sweden / KBLab releases three pretrained language models based on BERT and ALBERT. The models are trained on approximately 15-20GB of text (200M sentences, 3000M tokens) from various sources (books, news, government publications, Swedish Wikipedia and internet forums) aiming to provide a representative BERT model for Swedish text. A more complete description will be published later on.
-
-The following three models are currently available:
-
- **bert-base-swedish-cased** (*v1*) - A BERT trained with the same hyperparameters as first published by Google.
- **bert-base-swedish-cased-ner** (*experimental*) - a BERT fine-tuned for NER using SUC 3.0.
- **albert-base-swedish-cased-alpha** (*alpha*) - A first attempt at an ALBERT for Swedish.
-
-All models are cased and trained with whole word masking.
-
-## Files
-
-| **name**                        | **files** |
-|---------------------------------|-----------|
-| bert-base-swedish-cased         | [config](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased/config.json), [vocab](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased/vocab.txt), [pytorch_model.bin](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased/pytorch_model.bin) |
-| bert-base-swedish-cased-ner     | [config](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased-ner/config.json), [vocab](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased-ner/vocab.txt), [pytorch_model.bin](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased-ner/pytorch_model.bin) |
-| albert-base-swedish-cased-alpha | [config](https://s3.amazonaws.com/models.huggingface.co/bert/KB/albert-base-swedish-cased-alpha/config.json), [sentencepiece model](https://s3.amazonaws.com/models.huggingface.co/bert/KB/albert-base-swedish-cased-alpha/spiece.model), [pytorch_model.bin](https://s3.amazonaws.com/models.huggingface.co/bert/KB/albert-base-swedish-cased-alpha/pytorch_model.bin) |
-
-TensorFlow model weights will be released soon.
-
-## Usage requirements / installation instructions
-
-The examples below require Huggingface Transformers 2.4.1 and Pytorch 1.3.1 or greater. For Transformers<2.4.0 the tokenizer must be instantiated manually and the `do_lower_case` flag parameter set to `False` and `keep_accents` to `True` (for ALBERT).
-
-To create an environment where the examples can be run, run the following in a terminal on your OS of choice.
-
-```
-# git clone https://github.com/Kungbib/swedish-bert-models
-# cd swedish-bert-models
-# python3 -m venv venv
-# source venv/bin/activate
-# pip install --upgrade pip
-# pip install -r requirements.txt
-```
-
-### BERT Base Swedish
-
-A standard BERT base for Swedish trained on a variety of sources. Vocabulary size is ~50k. Using Huggingface Transformers the model can be loaded in Python as follows:
-
-```python
-from transformers import AutoModel,AutoTokenizer
-
-tok = AutoTokenizer.from_pretrained('KB/bert-base-swedish-cased')
-model = AutoModel.from_pretrained('KB/bert-base-swedish-cased')
-```
-
-
-### BERT base fine-tuned for Swedish NER
-
-This model is fine-tuned on the SUC 3.0 dataset. Using the Huggingface pipeline the model can be easily instantiated. For Transformers<2.4.1 it seems the tokenizer must be loaded separately to disable lower-casing of input strings:
-
-```python
-from transformers import pipeline
-
-nlp = pipeline('ner', model='KB/bert-base-swedish-cased-ner', tokenizer='KB/bert-base-swedish-cased-ner')
-
-nlp('Idag släpper KB tre språkmodeller.')
-```
-
-Running the Python code above should produce something like the result below. Entity types used are `TME` for time, `PRS` for personal names, `LOC` for locations, `EVN` for events and `ORG` for organisations. These labels are subject to change.
-
-```python
-[ { 'word': 'Idag', 'score': 0.9998126029968262, 'entity': 'TME' },
-  { 'word': 'KB',   'score': 0.9814832210540771, 'entity': 'ORG' } ]
-```
-
-The BERT tokenizer often splits words into multiple tokens, with the subparts starting with `##`, for example the string `Engelbert kör Volvo till Herrängens fotbollsklubb` gets tokenized as `Engel ##bert kör Volvo till Herr ##ängens fotbolls ##klubb`. To glue parts back together one can use something like this:
-
-```python
-text = 'Engelbert tar Volvon till Tele2 Arena för att titta på Djurgården IF ' +\
-       'som spelar fotboll i VM klockan två på kvällen.'
-
-l = []
-for token in nlp(text):
-    if token['word'].startswith('##'):
-        l[-1]['word'] += token['word'][2:]
-    else:
-        l += [ token ]
-
-print(l)
-```
-
-Which should result in the following (though less cleanly formatted):
-
-```python
-[ { 'word': 'Engelbert',     'score': 0.99..., 'entity': 'PRS'},
-  { 'word': 'Volvon',        'score': 0.99..., 'entity': 'OBJ'},
-  { 'word': 'Tele2',         'score': 0.99..., 'entity': 'LOC'},
-  { 'word': 'Arena',         'score': 0.99..., 'entity': 'LOC'},
-  { 'word': 'Djurgården',    'score': 0.99..., 'entity': 'ORG'},
-  { 'word': 'IF',            'score': 0.99..., 'entity': 'ORG'},
-  { 'word': 'VM',            'score': 0.99..., 'entity': 'EVN'},
-  { 'word': 'klockan',       'score': 0.99..., 'entity': 'TME'},
-  { 'word': 'två',           'score': 0.99..., 'entity': 'TME'},
-  { 'word': 'på',            'score': 0.99..., 'entity': 'TME'},
-  { 'word': 'kvällen',       'score': 0.54..., 'entity': 'TME'} ]
-```
-
-### ALBERT base
-
-The easiest way to do this is, again, using Huggingface Transformers:
-
-```python
-from transformers import AutoModel,AutoTokenizer
-
-tok = AutoTokenizer.from_pretrained('KB/albert-base-swedish-cased-alpha')
-model = AutoModel.from_pretrained('KB/albert-base-swedish-cased-alpha')
-```
-
-## Acknowledgements ❤️
-
- Resources from Stockholm University, Umeå University and Swedish Language Bank at Gothenburg University were used when fine-tuning BERT for NER.
- Model pretraining was made partly in-house at the KBLab and partly (for material without active copyright) with the support of Cloud TPUs from Google's TensorFlow Research Cloud (TFRC).
- Models are hosted on S3 by Huggingface 🤗
-
--- a/model_cards/KB/bert-base-swedish-cased-ner/README.md
+++ b/model_cards/KB/bert-base-swedish-cased-ner/README.md
@ -1,121 +0,0 @@
---
-language: sv
---
-
-# Swedish BERT Models
-
-The National Library of Sweden / KBLab releases three pretrained language models based on BERT and ALBERT. The models are trained on approximately 15-20GB of text (200M sentences, 3000M tokens) from various sources (books, news, government publications, Swedish Wikipedia and internet forums) aiming to provide a representative BERT model for Swedish text. A more complete description will be published later on.
-
-The following three models are currently available:
-
- **bert-base-swedish-cased** (*v1*) - A BERT trained with the same hyperparameters as first published by Google.
- **bert-base-swedish-cased-ner** (*experimental*) - a BERT fine-tuned for NER using SUC 3.0.
- **albert-base-swedish-cased-alpha** (*alpha*) - A first attempt at an ALBERT for Swedish.
-
-All models are cased and trained with whole word masking.
-
-## Files
-
-| **name**                        | **files** |
-|---------------------------------|-----------|
-| bert-base-swedish-cased         | [config](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased/config.json), [vocab](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased/vocab.txt), [pytorch_model.bin](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased/pytorch_model.bin) |
-| bert-base-swedish-cased-ner     | [config](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased-ner/config.json), [vocab](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased-ner/vocab.txt), [pytorch_model.bin](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased-ner/pytorch_model.bin) |
-| albert-base-swedish-cased-alpha | [config](https://s3.amazonaws.com/models.huggingface.co/bert/KB/albert-base-swedish-cased-alpha/config.json), [sentencepiece model](https://s3.amazonaws.com/models.huggingface.co/bert/KB/albert-base-swedish-cased-alpha/spiece.model), [pytorch_model.bin](https://s3.amazonaws.com/models.huggingface.co/bert/KB/albert-base-swedish-cased-alpha/pytorch_model.bin) |
-
-TensorFlow model weights will be released soon.
-
-## Usage requirements / installation instructions
-
-The examples below require Huggingface Transformers 2.4.1 and Pytorch 1.3.1 or greater. For Transformers<2.4.0 the tokenizer must be instantiated manually and the `do_lower_case` flag parameter set to `False` and `keep_accents` to `True` (for ALBERT).
-
-To create an environment where the examples can be run, run the following in a terminal on your OS of choice.
-
-```
-# git clone https://github.com/Kungbib/swedish-bert-models
-# cd swedish-bert-models
-# python3 -m venv venv
-# source venv/bin/activate
-# pip install --upgrade pip
-# pip install -r requirements.txt
-```
-
-### BERT Base Swedish
-
-A standard BERT base for Swedish trained on a variety of sources. Vocabulary size is ~50k. Using Huggingface Transformers the model can be loaded in Python as follows:
-
-```python
-from transformers import AutoModel,AutoTokenizer
-
-tok = AutoTokenizer.from_pretrained('KB/bert-base-swedish-cased')
-model = AutoModel.from_pretrained('KB/bert-base-swedish-cased')
-```
-
-
-### BERT base fine-tuned for Swedish NER
-
-This model is fine-tuned on the SUC 3.0 dataset. Using the Huggingface pipeline the model can be easily instantiated. For Transformers<2.4.1 it seems the tokenizer must be loaded separately to disable lower-casing of input strings:
-
-```python
-from transformers import pipeline
-
-nlp = pipeline('ner', model='KB/bert-base-swedish-cased-ner', tokenizer='KB/bert-base-swedish-cased-ner')
-
-nlp('Idag släpper KB tre språkmodeller.')
-```
-
-Running the Python code above should produce something like the result below. Entity types used are `TME` for time, `PRS` for personal names, `LOC` for locations, `EVN` for events and `ORG` for organisations. These labels are subject to change.
-
-```python
-[ { 'word': 'Idag', 'score': 0.9998126029968262, 'entity': 'TME' },
-  { 'word': 'KB',   'score': 0.9814832210540771, 'entity': 'ORG' } ]
-```
-
-The BERT tokenizer often splits words into multiple tokens, with the subparts starting with `##`, for example the string `Engelbert kör Volvo till Herrängens fotbollsklubb` gets tokenized as `Engel ##bert kör Volvo till Herr ##ängens fotbolls ##klubb`. To glue parts back together one can use something like this:
-
-```python
-text = 'Engelbert tar Volvon till Tele2 Arena för att titta på Djurgården IF ' +\
-       'som spelar fotboll i VM klockan två på kvällen.'
-
-l = []
-for token in nlp(text):
-    if token['word'].startswith('##'):
-        l[-1]['word'] += token['word'][2:]
-    else:
-        l += [ token ]
-
-print(l)
-```
-
-Which should result in the following (though less cleanly formatted):
-
-```python
-[ { 'word': 'Engelbert',     'score': 0.99..., 'entity': 'PRS'},
-  { 'word': 'Volvon',        'score': 0.99..., 'entity': 'OBJ'},
-  { 'word': 'Tele2',         'score': 0.99..., 'entity': 'LOC'},
-  { 'word': 'Arena',         'score': 0.99..., 'entity': 'LOC'},
-  { 'word': 'Djurgården',    'score': 0.99..., 'entity': 'ORG'},
-  { 'word': 'IF',            'score': 0.99..., 'entity': 'ORG'},
-  { 'word': 'VM',            'score': 0.99..., 'entity': 'EVN'},
-  { 'word': 'klockan',       'score': 0.99..., 'entity': 'TME'},
-  { 'word': 'två',           'score': 0.99..., 'entity': 'TME'},
-  { 'word': 'på',            'score': 0.99..., 'entity': 'TME'},
-  { 'word': 'kvällen',       'score': 0.54..., 'entity': 'TME'} ]
-```
-
-### ALBERT base
-
-The easiest way to do this is, again, using Huggingface Transformers:
-
-```python
-from transformers import AutoModel,AutoTokenizer
-
-tok = AutoTokenizer.from_pretrained('KB/albert-base-swedish-cased-alpha')
-model = AutoModel.from_pretrained('KB/albert-base-swedish-cased-alpha')
-```
-
-## Acknowledgements ❤️
-
- Resources from Stockholm University, Umeå University and Swedish Language Bank at Gothenburg University were used when fine-tuning BERT for NER.
- Model pretraining was made partly in-house at the KBLab and partly (for material without active copyright) with the support of Cloud TPUs from Google's TensorFlow Research Cloud (TFRC).
- Models are hosted on S3 by Huggingface 🤗
-
--- a/model_cards/KB/bert-base-swedish-cased/README.md
+++ b/model_cards/KB/bert-base-swedish-cased/README.md
@ -1,121 +0,0 @@
---
-language: sv
---
-
-# Swedish BERT Models
-
-The National Library of Sweden / KBLab releases three pretrained language models based on BERT and ALBERT. The models are trained on approximately 15-20GB of text (200M sentences, 3000M tokens) from various sources (books, news, government publications, Swedish Wikipedia and internet forums) aiming to provide a representative BERT model for Swedish text. A more complete description will be published later on.
-
-The following three models are currently available:
-
- **bert-base-swedish-cased** (*v1*) - A BERT trained with the same hyperparameters as first published by Google.
- **bert-base-swedish-cased-ner** (*experimental*) - a BERT fine-tuned for NER using SUC 3.0.
- **albert-base-swedish-cased-alpha** (*alpha*) - A first attempt at an ALBERT for Swedish.
-
-All models are cased and trained with whole word masking.
-
-## Files
-
-| **name**                        | **files** |
-|---------------------------------|-----------|
-| bert-base-swedish-cased         | [config](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased/config.json), [vocab](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased/vocab.txt), [pytorch_model.bin](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased/pytorch_model.bin) |
-| bert-base-swedish-cased-ner     | [config](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased-ner/config.json), [vocab](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased-ner/vocab.txt), [pytorch_model.bin](https://s3.amazonaws.com/models.huggingface.co/bert/KB/bert-base-swedish-cased-ner/pytorch_model.bin) |
-| albert-base-swedish-cased-alpha | [config](https://s3.amazonaws.com/models.huggingface.co/bert/KB/albert-base-swedish-cased-alpha/config.json), [sentencepiece model](https://s3.amazonaws.com/models.huggingface.co/bert/KB/albert-base-swedish-cased-alpha/spiece.model), [pytorch_model.bin](https://s3.amazonaws.com/models.huggingface.co/bert/KB/albert-base-swedish-cased-alpha/pytorch_model.bin) |
-
-TensorFlow model weights will be released soon.
-
-## Usage requirements / installation instructions
-
-The examples below require Huggingface Transformers 2.4.1 and Pytorch 1.3.1 or greater. For Transformers<2.4.0 the tokenizer must be instantiated manually and the `do_lower_case` flag parameter set to `False` and `keep_accents` to `True` (for ALBERT).
-
-To create an environment where the examples can be run, run the following in a terminal on your OS of choice.
-
-```
-# git clone https://github.com/Kungbib/swedish-bert-models
-# cd swedish-bert-models
-# python3 -m venv venv
-# source venv/bin/activate
-# pip install --upgrade pip
-# pip install -r requirements.txt
-```
-
-### BERT Base Swedish
-
-A standard BERT base for Swedish trained on a variety of sources. Vocabulary size is ~50k. Using Huggingface Transformers the model can be loaded in Python as follows:
-
-```python
-from transformers import AutoModel,AutoTokenizer
-
-tok = AutoTokenizer.from_pretrained('KB/bert-base-swedish-cased')
-model = AutoModel.from_pretrained('KB/bert-base-swedish-cased')
-```
-
-
-### BERT base fine-tuned for Swedish NER
-
-This model is fine-tuned on the SUC 3.0 dataset. Using the Huggingface pipeline the model can be easily instantiated. For Transformers<2.4.1 it seems the tokenizer must be loaded separately to disable lower-casing of input strings:
-
-```python
-from transformers import pipeline
-
-nlp = pipeline('ner', model='KB/bert-base-swedish-cased-ner', tokenizer='KB/bert-base-swedish-cased-ner')
-
-nlp('Idag släpper KB tre språkmodeller.')
-```
-
-Running the Python code above should produce something like the result below. Entity types used are `TME` for time, `PRS` for personal names, `LOC` for locations, `EVN` for events and `ORG` for organisations. These labels are subject to change.
-
-```python
-[ { 'word': 'Idag', 'score': 0.9998126029968262, 'entity': 'TME' },
-  { 'word': 'KB',   'score': 0.9814832210540771, 'entity': 'ORG' } ]
-```
-
-The BERT tokenizer often splits words into multiple tokens, with the subparts starting with `##`, for example the string `Engelbert kör Volvo till Herrängens fotbollsklubb` gets tokenized as `Engel ##bert kör Volvo till Herr ##ängens fotbolls ##klubb`. To glue parts back together one can use something like this:
-
-```python
-text = 'Engelbert tar Volvon till Tele2 Arena för att titta på Djurgården IF ' +\
-       'som spelar fotboll i VM klockan två på kvällen.'
-
-l = []
-for token in nlp(text):
-    if token['word'].startswith('##'):
-        l[-1]['word'] += token['word'][2:]
-    else:
-        l += [ token ]
-
-print(l)
-```
-
-Which should result in the following (though less cleanly formatted):
-
-```python
-[ { 'word': 'Engelbert',     'score': 0.99..., 'entity': 'PRS'},
-  { 'word': 'Volvon',        'score': 0.99..., 'entity': 'OBJ'},
-  { 'word': 'Tele2',         'score': 0.99..., 'entity': 'LOC'},
-  { 'word': 'Arena',         'score': 0.99..., 'entity': 'LOC'},
-  { 'word': 'Djurgården',    'score': 0.99..., 'entity': 'ORG'},
-  { 'word': 'IF',            'score': 0.99..., 'entity': 'ORG'},
-  { 'word': 'VM',            'score': 0.99..., 'entity': 'EVN'},
-  { 'word': 'klockan',       'score': 0.99..., 'entity': 'TME'},
-  { 'word': 'två',           'score': 0.99..., 'entity': 'TME'},
-  { 'word': 'på',            'score': 0.99..., 'entity': 'TME'},
-  { 'word': 'kvällen',       'score': 0.54..., 'entity': 'TME'} ]
-```
-
-### ALBERT base
-
-The easiest way to load the model is, again, using Huggingface Transformers:
-
-```python
-from transformers import AutoModel,AutoTokenizer
-
-tok = AutoTokenizer.from_pretrained('KB/albert-base-swedish-cased-alpha')
-model = AutoModel.from_pretrained('KB/albert-base-swedish-cased-alpha')
-```
-
-## Acknowledgements ❤️
-
- Resources from Stockholms University, Umeå University and Swedish Language Bank at Gothenburg University were used when fine-tuning BERT for NER.
- Model pretraining was done partly in-house at the KBLab and partly (for material without active copyright) with the support of Cloud TPUs from Google's TensorFlow Research Cloud (TFRC).
- Models are hosted on S3 by Huggingface 🤗
-
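The sub-token merging loop in the KB card above modifies the pipeline's token dicts in place and would fail if the very first token started with `##`. The following is a minimal, self-contained sketch of the same idea that avoids both issues. It runs on hand-written token dicts rather than real pipeline output; the function name `merge_wordpieces` and the sample scores are illustrative assumptions, not part of the original card.

```python
def merge_wordpieces(tokens):
    """Glue WordPiece continuation tokens ('##...') onto the preceding token.

    Each token is a dict with at least a 'word' key, mirroring the shape of
    the NER pipeline output shown in the card. Input dicts are not modified.
    """
    merged = []
    for token in tokens:
        word = token["word"]
        if word.startswith("##") and merged:
            # Continuation piece: append its text (minus '##') to the previous word.
            merged[-1]["word"] += word[2:]
        else:
            # Start of a new word: copy so the caller's dict is left untouched.
            merged.append(dict(token))
    return merged


# Hand-written sample in the pipeline's output shape (scores are made up).
sample = [
    {"word": "Engel", "score": 0.99, "entity": "PRS"},
    {"word": "##bert", "score": 0.98, "entity": "PRS"},
    {"word": "Volvo", "score": 0.97, "entity": "OBJ"},
]
merged_sample = merge_wordpieces(sample)
print([t["word"] for t in merged_sample])
```

Note that a merged word keeps the score and entity label of its first piece; averaging the scores of the pieces would be an equally reasonable choice.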
--- a/model_cards/LorenzoDeMattei/GePpeTto/README.md
+++ b/model_cards/LorenzoDeMattei/GePpeTto/README.md
@ -1,141 +0,0 @@
---
-language: it
---
-
-# GePpeTto GPT2 Model 🇮🇹
-
-Pretrained GPT2 117M model for Italian.
-
-You can find further details in the paper:
-
-Lorenzo De Mattei, Michele Cafagna, Felice Dell’Orletta, Malvina Nissim, Marco Guerini "GePpeTto Carves Italian into a Language Model", arXiv preprint. Pdf available at: https://arxiv.org/abs/2004.14253
-
-## Pretraining Corpus
-
-The pretraining set comprises two main sources. The first one is a dump of Italian Wikipedia (November 2019), 
-consisting of 2.8GB of text. The second one is the ItWac corpus (Baroni et al., 2009), which amounts to 11GB of web
-texts. This collection provides a mix of standard and less standard Italian, on a rather wide chronological span, 
-with older texts than the Wikipedia dump (the latter stretches only to the late 2000s).
-
-## Pretraining details
-
-This model was trained using GPT2's Hugging Face implementation on 4 NVIDIA Tesla T4 GPUs for 620k steps.
-
-Training parameters:
-
- GPT-2 small configuration
- vocabulary size: 30k
- Batch size: 32
- Block size: 100
- Adam Optimizer
- Initial learning rate: 5e-5
- Warm up steps: 10k
-
-## Perplexity scores
-
-| Domain | Perplexity |
-|---|---|
-| Wikipedia | 26.1052 |
-| ItWac | 30.3965 |
-| Legal | 37.2197 |
-| News | 45.3859 |
-| Social Media | 84.6408 |
-
-For further details, qualitative analysis and human evaluation check out: https://arxiv.org/abs/2004.14253
-
-## Load Pretrained Model
-
-You can use this model by installing the Huggingface `transformers` library and initializing it directly like this:
-
-```python
-from transformers import GPT2Tokenizer, GPT2Model
-
-model = GPT2Model.from_pretrained('LorenzoDeMattei/GePpeTto')
-tokenizer = GPT2Tokenizer.from_pretrained(
-    'LorenzoDeMattei/GePpeTto',
-)
-```
-
-## Example using GPT2LMHeadModel
-
-```python
-from transformers import AutoTokenizer, AutoModelWithLMHead, pipeline
-
-tokenizer = AutoTokenizer.from_pretrained("LorenzoDeMattei/GePpeTto")
-model = AutoModelWithLMHead.from_pretrained("LorenzoDeMattei/GePpeTto")
-
-text_generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
-prompts = [
-    "Wikipedia Geppetto",
-    "Maestro Ciliegia regala il pezzo di legno al suo amico Geppetto, il quale lo prende per fabbricarsi un burattino maraviglioso"]
-
-
-samples_outputs = text_generator(
-    prompts,
-    do_sample=True,
-    max_length=50,
-    top_k=50,
-    top_p=0.95,
-    num_return_sequences=3
-)
-
-
-for i, sample_outputs in enumerate(samples_outputs):
-    print(100 * '-')
-    print("Prompt:", prompts[i])
-    for sample_output in sample_outputs:
-        print("Sample:", sample_output['generated_text'])
-        print()
-
-```
-
-Output:
-
-```
----------------------------------------------------------------------------------------------------
-Prompt: Wikipedia Geppetto
-Sample: Wikipedia Geppetto rosso (film 1920)
-
-Geppetto rosso ("The Smokes in the Black") è un film muto del 1920 diretto da Henry H. Leonard.
-
-Il film fu prodotto dalla Selig Poly
-
-Sample: Wikipedia Geppetto
-
-Geppetto ("Geppetto" in piemontese) è un comune italiano di 978 abitanti della provincia di Cuneo in Piemonte.
-
-L'abitato, che si trova nel versante valtellinese, si sviluppa nella
-
-Sample: Wikipedia Geppetto di Natale (romanzo)
-
-Geppetto di Natale è un romanzo di Mario Caiano, pubblicato nel 2012.
-
----------------------------------------------------------------------------------------------------
-Prompt: Maestro Ciliegia regala il pezzo di legno al suo amico Geppetto, il quale lo prende per fabbricarsi un burattino maraviglioso
-Sample: Maestro Ciliegia regala il pezzo di legno al suo amico Geppetto, il quale lo prende per fabbricarsi un burattino maraviglioso. Il burattino riesce a scappare. Dopo aver trovato un prezioso sacchetto si reca
-
-Sample: Maestro Ciliegia regala il pezzo di legno al suo amico Geppetto, il quale lo prende per fabbricarsi un burattino maraviglioso, e l'unico che lo possiede, ma, di fronte a tutte queste prove
-
-Sample: Maestro Ciliegia regala il pezzo di legno al suo amico Geppetto, il quale lo prende per fabbricarsi un burattino maraviglioso: - A voi gli occhi, le guance! A voi il mio pezzo!
-```
-
-## Citation
-
-Please use the following bibtex entry:
-
-```
-@misc{mattei2020geppetto,
-    title={GePpeTto Carves Italian into a Language Model},
-    author={Lorenzo De Mattei and Michele Cafagna and Felice Dell'Orletta and Malvina Nissim and Marco Guerini},
-    year={2020},
-    eprint={2004.14253},
-    archivePrefix={arXiv},
-    primaryClass={cs.CL}
-}
-```
-
-## References
-
-Marco Baroni, Silvia Bernardini, Adriano Ferraresi,
-and Eros Zanchetta. 2009. The WaCky wide web: a
-collection of very large linguistically processed webcrawled corpora. Language resources and evaluation, 43(3):209–226.
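The perplexity table in the GePpeTto card above reports one number per domain. As a reminder of what such a number measures, here is a small sketch assuming the standard definition (the exponential of the mean per-token negative log-likelihood). It is not the authors' evaluation code, the function name `perplexity` is illustrative, and the probabilities are made up; the paper's exact tokenization and aggregation may differ.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Sanity check: a model that assigns probability 1/4 to every token
# is as uncertain as a uniform choice among 4 options.
uniform_log_probs = [math.log(1 / 4)] * 10
ppl_uniform = perplexity(uniform_log_probs)
print(round(ppl_uniform, 6))
```

Read this way, a lower perplexity on Wikipedia than on social media text suggests the model finds the former more predictable, which is consistent with the pretraining corpus described in the card.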
--- a/model_cards/Michau/t5-base-en-generate-headline/README.md
+++ b/model_cards/Michau/t5-base-en-generate-headline/README.md
@ -1,47 +0,0 @@
-## About the model
-
-The model has been trained on a collection of 500k articles with headings. Its purpose is to create a one-line heading suitable for the given article.
-
-Sample code with a WikiNews article:
-
-```python
-import torch
-from transformers import T5ForConditionalGeneration,T5Tokenizer
-
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-
-model = T5ForConditionalGeneration.from_pretrained("Michau/t5-base-en-generate-headline")
-tokenizer = T5Tokenizer.from_pretrained("Michau/t5-base-en-generate-headline")
-model = model.to(device)
-
-article = '''
-Very early yesterday morning, the United States President Donald Trump reported he and his wife First Lady Melania Trump tested positive for COVID-19. Officials said the Trumps' 14-year-old son Barron tested negative as did First Family and Senior Advisors Jared Kushner and Ivanka Trump.
-Trump took to social media, posting at 12:54 am local time (0454 UTC) on Twitter, "Tonight, [Melania] and I tested positive for COVID-19. We will begin our quarantine and recovery process immediately. We will get through this TOGETHER!" Yesterday afternoon Marine One landed on the White House's South Lawn flying Trump to Walter Reed National Military Medical Center (WRNMMC) in Bethesda, Maryland.
-Reports said both were showing "mild symptoms". Senior administration officials were tested as people were informed of the positive test. Senior advisor Hope Hicks had tested positive on Thursday.
-Presidential physician Sean Conley issued a statement saying Trump has been given zinc, vitamin D, Pepcid and a daily Aspirin. Conley also gave a single dose of the experimental polyclonal antibodies drug from Regeneron Pharmaceuticals.
-According to official statements, Trump, now operating from the WRNMMC, is to continue performing his duties as president during a 14-day quarantine. In the event of Trump becoming incapacitated, Vice President Mike Pence could take over the duties of president via the 25th Amendment of the US Constitution. The Pence family all tested negative as of yesterday and there were no changes regarding Pence's campaign events.
-'''
-
-text =  "headline: " + article
-
-max_len = 256
-
-encoding = tokenizer.encode_plus(text, return_tensors = "pt")
-input_ids = encoding["input_ids"].to(device)
-attention_masks = encoding["attention_mask"].to(device)
-
-beam_outputs = model.generate(
-    input_ids = input_ids,
-    attention_mask = attention_masks,
-    max_length = 64,
-    num_beams = 3,
-    early_stopping = True,
-)
-
-result = tokenizer.decode(beam_outputs[0], skip_special_tokens=True)
-print(result)
-```
-
-Result:
-
-```Trump and First Lady Melania Test Positive for COVID-19```
--- a/model_cards/MoseliMotsoehli/TswanaBert/README.md
+++ b/model_cards/MoseliMotsoehli/TswanaBert/README.md
@ -1,74 +0,0 @@
---
-language: tn
---
-
-# TswanaBert
-Pretrained model on the Tswana language using a masked language modeling (MLM) objective.
-
-## Model Description
-TswanaBERT is a transformer model pre-trained on a corpus of Setswana in a self-supervised fashion by masking part of the input words and training to predict the masks by using byte-level tokens.
-
-## Intended uses & limitations
-The model can be used for either masked language modeling or next word prediction. It can also be fine-tuned on a specific downstream NLP application.
-
-#### How to use
-
-```python
->>> from transformers import pipeline
->>> from transformers import AutoTokenizer, AutoModelWithLMHead
-
->>> tokenizer = AutoTokenizer.from_pretrained("MoseliMotsoehli/TswanaBert")
->>> model = AutoModelWithLMHead.from_pretrained("MoseliMotsoehli/TswanaBert")
->>> unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
->>> unmasker("Ntshopotse <mask> e godile.")
-
-[{'score': 0.32749542593955994,
-  'sequence': '<s>Ntshopotse setse e godile.</s>',
-  'token': 538,
-  'token_str': 'Ġsetse'},
- {'score': 0.060260992497205734,
-  'sequence': '<s>Ntshopotse le e godile.</s>',
-  'token': 270,
-  'token_str': 'Ġle'},
- {'score': 0.058460816740989685,
-  'sequence': '<s>Ntshopotse bone e godile.</s>',
-  'token': 364,
-  'token_str': 'Ġbone'},
- {'score': 0.05694682151079178,
-  'sequence': '<s>Ntshopotse ga e godile.</s>',
-  'token': 298,
-  'token_str': 'Ġga'},
- {'score': 0.0565204992890358,
-  'sequence': '<s>Ntshopotse, e godile.</s>',
-  'token': 16,
-  'token_str': ','}]
-```
-
-#### Limitations and bias
-The model is trained on a relatively small collection of Setswana text, mostly from news articles and creative writing, and so is not yet representative enough of the language.
-
-## Training data
-
-1. The largest portion of this dataset (10k sentences of text) comes from the [Leipzig Corpora Collection](https://wortschatz.uni-leipzig.de/en/download).
-
-2. I then added SABC news headlines collected by Marivate Vukosi & Sefara Tshephisho (2020), generously made available on [Zenodo](http://doi.org/10.5281/zenodo.3668495). This added 185 Tswana sentences to my corpus.
-
-3. I went on to add 300 more sentences by scraping the following news sites and blogs, which mostly originate in Botswana. I actively continue to expand the dataset.
-
-* http://setswana.blogspot.com/
-* https://omniglot.com/writing/tswana.php
-* http://www.dailynews.gov.bw/
-* http://www.mmegi.bw/index.php
-* https://tsena.co.bw
-* http://www.botswana.co.za/Cultural_Issues-travel/botswana-country-guide-en-route.html
-* https://www.poemhunter.com/poem/2013-setswana/
-* https://www.poemhunter.com/poem/ngwana-wa-mosetsana/
- 
-
-### BibTeX entry and citation info
-
-```bibtex
-@inproceedings{author = {Moseli Motsoehli},
-  year={2020}
-}
-```
--- a/model_cards/MoseliMotsoehli/zuBERTa/README.md
+++ b/model_cards/MoseliMotsoehli/zuBERTa/README.md
@ -1,56 +0,0 @@
---
-language: zu
---
-
-# zuBERTa
-zuBERTa is a RoBERTa-style transformer language model trained on Zulu text.
-
-## Intended uses & limitations
-The model can be used for getting embeddings to use on a down-stream task such as question answering.
-
-#### How to use
-
-```python
->>> from transformers import pipeline
->>> from transformers import AutoTokenizer, AutoModelWithLMHead
-
->>> tokenizer = AutoTokenizer.from_pretrained("MoseliMotsoehli/zuBERTa")
->>> model = AutoModelWithLMHead.from_pretrained("MoseliMotsoehli/zuBERTa")
->>> unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
->>> unmasker("Abafika eNkandla bafika sebeholwa <mask> uMpongo kaZingelwayo.")
-
-[
-  {
-    "sequence": "<s>Abafika eNkandla bafika sebeholwa khona uMpongo kaZingelwayo.</s>",
-    "score": 0.050459690392017365,
-    "token": 555,
-    "token_str": "Ġkhona"
-  },
-  {
-    "sequence": "<s>Abafika eNkandla bafika sebeholwa inkosi uMpongo kaZingelwayo.</s>",
-    "score": 0.03668094798922539,
-    "token": 2321,
-    "token_str": "Ġinkosi"
-  },
-  {
-    "sequence": "<s>Abafika eNkandla bafika sebeholwa ubukhosi uMpongo kaZingelwayo.</s>",
-    "score": 0.028774697333574295,
-    "token": 5101,
-    "token_str": "Ġubukhosi"
-  }
-]
-```
-
-## Training data
-
-1. 30k sentences of text came from the Zulu 2018 corpus of the [Leipzig Corpora Collection](https://wortschatz.uni-leipzig.de/en/download). These were collected from news articles and creative writing.
-2. ~7500 articles of human-generated translations were scraped from the Zulu [Wikipedia](https://zu.wikipedia.org/wiki/Special:AllPages).
-
-### BibTeX entry and citation info
-
-```bibtex
-@inproceedings{author = {Moseli Motsoehli},
-  title = {Towards transformation of Southern African language models through transformers.},
-  year={2020}
-}
-```
--- a/model_cards/Musixmatch/umberto-commoncrawl-cased-v1/README.md
+++ b/model_cards/Musixmatch/umberto-commoncrawl-cased-v1/README.md
@ -1,118 +0,0 @@
---
-language: it
---
-
-# UmBERTo Commoncrawl Cased
-
-[UmBERTo](https://github.com/musixmatchresearch/umberto) is a Roberta-based Language Model trained on large Italian Corpora and uses two innovative approaches: SentencePiece and Whole Word Masking. Now available at [github.com/huggingface/transformers](https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1)
-
-<p align="center">
-    <img src="https://user-images.githubusercontent.com/7140210/72913702-d55a8480-3d3d-11ea-99fc-f2ef29af4e72.jpg" width="700"> </br>
-    Marco Lodola, Monument to Umberto Eco, Alessandria 2019
-</p>
-
-## Dataset
-UmBERTo-Commoncrawl-Cased uses the Italian subcorpus of [OSCAR](https://traces1.inria.fr/oscar/) as the training set for the language model. We used the deduplicated version of the Italian corpus, which consists of 70 GB of plain text data (210M sentences, 11B words), where the sentences have been filtered and shuffled at line level in order to be used for NLP research.
-
-## Pre-trained model
-
-| Model | WWM | Cased | Tokenizer | Vocab Size  | Train Steps |  Download |
-| ------ | ------ | ------ | ------ | ------ |------ | ------ |
-| `umberto-commoncrawl-cased-v1` | YES | YES | SPM | 32K | 125k | [Link](http://bit.ly/35zO7GH) |
-
-This model was trained with [SentencePiece](https://github.com/google/sentencepiece) and Whole Word Masking.
-
-## Downstream Tasks
-These results refer to the umberto-commoncrawl-cased model. All details are at the [Umberto](https://github.com/musixmatchresearch/umberto) Official Page.
-
-#### Named Entity Recognition (NER)
-
-| Dataset | F1 | Precision | Recall | Accuracy |
-| ------ | ------ | ------ |  ------ |  ------ |
-| **ICAB-EvalITA07** | **87.565**  | 86.596  | 88.556  | 98.690 | 
-| **WikiNER-ITA** | **92.531**  | 92.509 | 92.553 | 99.136 | 
-
-#### Part of Speech (POS)
-
-| Dataset | F1 | Precision | Recall | Accuracy |
-| ------ | ------ | ------ |  ------ |  ------ |
-| **UD_Italian-ISDT** | 98.870  | 98.861 | 98.879 | **98.977** | 
-| **UD_Italian-ParTUT** | 98.786 | 98.812 |  98.760 | **98.903** | 
-
-
-
-## Usage
-
-##### Load UmBERTo with AutoModel, Autotokenizer:
-
-```python
-
-import torch
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Musixmatch/umberto-commoncrawl-cased-v1")
-umberto = AutoModel.from_pretrained("Musixmatch/umberto-commoncrawl-cased-v1")
-
-encoded_input = tokenizer.encode("Umberto Eco è stato un grande scrittore")
-input_ids = torch.tensor(encoded_input).unsqueeze(0)  # Batch size 1
-outputs = umberto(input_ids)
-last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output
-```
-
-##### Predict masked token:
-
-```python
-from transformers import pipeline
-
-fill_mask = pipeline(
-	"fill-mask",
-	model="Musixmatch/umberto-commoncrawl-cased-v1",
-	tokenizer="Musixmatch/umberto-commoncrawl-cased-v1"
-)
-
-result = fill_mask("Umberto Eco è <mask> un grande scrittore")
-# {'sequence': '<s> Umberto Eco è considerato un grande scrittore</s>', 'score': 0.18599839508533478, 'token': 5032}
-# {'sequence': '<s> Umberto Eco è stato un grande scrittore</s>', 'score': 0.17816807329654694, 'token': 471}
-# {'sequence': '<s> Umberto Eco è sicuramente un grande scrittore</s>', 'score': 0.16565583646297455, 'token': 2654}
-# {'sequence': '<s> Umberto Eco è indubbiamente un grande scrittore</s>', 'score': 0.0932890921831131, 'token': 17908}
-# {'sequence': '<s> Umberto Eco è certamente un grande scrittore</s>', 'score': 0.054701317101716995, 'token': 5269}
-```
-
-
-## Citation
-All of the original datasets are publicly available or were released with the owners' grant. The datasets are all released under a CC0 or CCBY license.
-
-* UD Italian-ISDT Dataset [Github](https://github.com/UniversalDependencies/UD_Italian-ISDT)
-* UD Italian-ParTUT Dataset [Github](https://github.com/UniversalDependencies/UD_Italian-ParTUT)
-* I-CAB (Italian Content Annotation Bank), EvalITA [Page](http://www.evalita.it/)
-* WIKINER [Page](https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500) , [Paper](https://www.sciencedirect.com/science/article/pii/S0004370212000276?via%3Dihub)
-
-```
-@inproceedings {magnini2006annotazione,
-	title = {Annotazione di contenuti concettuali in un corpus italiano: I - CAB},
-	author = {Magnini,Bernardo and Cappelli,Amedeo and Pianta,Emanuele and Speranza,Manuela and Bartalesi Lenzi,V and Sprugnoli,Rachele and Romano,Lorenza and Girardi,Christian and Negri,Matteo},
-	booktitle = {Proc.of SILFI 2006},
-	year = {2006}
-}
-@inproceedings {magnini2006cab,
-	title = {I - CAB: the Italian Content Annotation Bank.},
-	author = {Magnini,Bernardo and Pianta,Emanuele and Girardi,Christian and Negri,Matteo and Romano,Lorenza and Speranza,Manuela and Lenzi,Valentina Bartalesi and Sprugnoli,Rachele},
-	booktitle = {LREC},
-	pages = {963--968},
-	year = {2006},
-	organization = {Citeseer}
-}
-```
-
-## Authors
-
-**Loreto Parisi**: `loreto at musixmatch dot com`, [loretoparisi](https://github.com/loretoparisi)
-**Simone Francia**: `simone.francia at musixmatch dot com`, [simonefrancia](https://github.com/simonefrancia)
-**Paolo Magnani**: `paul.magnani95 at gmail dot com`, [paulthemagno](https://github.com/paulthemagno)
-
-## About Musixmatch AI
-![Musxmatch Ai mac app icon-128](https://user-images.githubusercontent.com/163333/72244273-396aa380-35ee-11ea-894b-4ea48230c02b.png)
-We do Machine Learning and Artificial Intelligence @[musixmatch](https://twitter.com/Musixmatch)
-Follow us on [Twitter](https://twitter.com/musixmatchai) [Github](https://github.com/musixmatchresearch)
-
-
--- a/model_cards/Musixmatch/umberto-wikipedia-uncased-v1/README.md
+++ b/model_cards/Musixmatch/umberto-wikipedia-uncased-v1/README.md
@ -1,117 +0,0 @@
---
-language: it
---
-
-# UmBERTo Wikipedia Uncased
-
-[UmBERTo](https://github.com/musixmatchresearch/umberto) is a Roberta-based Language Model trained on large Italian Corpora and uses two innovative approaches: SentencePiece and Whole Word Masking. Now available at [github.com/huggingface/transformers](https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1)
-
-<p align="center">
-    <img src="https://user-images.githubusercontent.com/7140210/72913702-d55a8480-3d3d-11ea-99fc-f2ef29af4e72.jpg" width="700"> </br>
-    Marco Lodola, Monument to Umberto Eco, Alessandria 2019
-</p>
-
-## Dataset
-UmBERTo-Wikipedia-Uncased is trained on a relatively small corpus (~7GB) extracted from [Wikipedia-ITA](https://linguatools.org/tools/corpora/wikipedia-monolingual-corpora/).
-
-## Pre-trained model
-
-| Model | WWM | Cased | Tokenizer | Vocab Size  | Train Steps |  Download |
-| ------ | ------ | ------ | ------ | ------ |------ | ------ |
-| `umberto-wikipedia-uncased-v1` | YES | NO | SPM | 32K | 100k | [Link](http://bit.ly/35wbSj6) |
-
-This model was trained with [SentencePiece](https://github.com/google/sentencepiece) and Whole Word Masking.
-
-## Downstream Tasks
-These results refer to the umberto-wikipedia-uncased model. All details are at the [Umberto](https://github.com/musixmatchresearch/umberto) Official Page.
-
-#### Named Entity Recognition (NER)
-
-| Dataset | F1 | Precision | Recall | Accuracy |
-| ------ | ------ | ------ |  ------ |  ----- |
-| **ICAB-EvalITA07** | **86.240** | 85.939 | 86.544 | 98.534 | 
-| **WikiNER-ITA** | **90.483** | 90.328 | 90.638 | 98.661 | 
-
-#### Part of Speech (POS)
-
-| Dataset | F1 | Precision | Recall | Accuracy |
-| ------ | ------ | ------ |  ------ |  ------ |
-| **UD_Italian-ISDT** | 98.563  | 98.508 | 98.618 | **98.717** | 
-| **UD_Italian-ParTUT** | 97.810 | 97.835 |  97.784 | **98.060** | 
-
-
-
-## Usage
-
-##### Load UmBERTo Wikipedia Uncased with AutoModel, Autotokenizer:
-
-```python
-
-import torch
-from transformers import AutoTokenizer, AutoModel
-
-tokenizer = AutoTokenizer.from_pretrained("Musixmatch/umberto-wikipedia-uncased-v1")
-umberto = AutoModel.from_pretrained("Musixmatch/umberto-wikipedia-uncased-v1")
-
-encoded_input = tokenizer.encode("Umberto Eco è stato un grande scrittore")
-input_ids = torch.tensor(encoded_input).unsqueeze(0)  # Batch size 1
-outputs = umberto(input_ids)
-last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output
-```
-
-##### Predict masked token:
-
-```python
-from transformers import pipeline
-
-fill_mask = pipeline(
-	"fill-mask",
-	model="Musixmatch/umberto-wikipedia-uncased-v1",
-	tokenizer="Musixmatch/umberto-wikipedia-uncased-v1"
-)
-
-result = fill_mask("Umberto Eco è <mask> un grande scrittore")
-# {'sequence': '<s> umberto eco è stato un grande scrittore</s>', 'score': 0.5784581303596497, 'token': 361}
-# {'sequence': '<s> umberto eco è anche un grande scrittore</s>', 'score': 0.33813193440437317, 'token': 269}
-# {'sequence': '<s> umberto eco è considerato un grande scrittore</s>', 'score': 0.027196012437343597, 'token': 3236}
-# {'sequence': '<s> umberto eco è diventato un grande scrittore</s>', 'score': 0.013716378249228, 'token': 5742}
-# {'sequence': '<s> umberto eco è inoltre un grande scrittore</s>', 'score': 0.010662357322871685, 'token': 1030}
-```
-
-
-## Citation
-All of the original datasets are publicly available or were released with the owners' grant. The datasets are all released under a CC0 or CCBY license.
-
-* UD Italian-ISDT Dataset [Github](https://github.com/UniversalDependencies/UD_Italian-ISDT)
-* UD Italian-ParTUT Dataset [Github](https://github.com/UniversalDependencies/UD_Italian-ParTUT)
-* I-CAB (Italian Content Annotation Bank), EvalITA [Page](http://www.evalita.it/)
-* WIKINER [Page](https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500) , [Paper](https://www.sciencedirect.com/science/article/pii/S0004370212000276?via%3Dihub)
-
-```
-@inproceedings {magnini2006annotazione,
-	title = {Annotazione di contenuti concettuali in un corpus italiano: I - CAB},
-	author = {Magnini,Bernardo and Cappelli,Amedeo and Pianta,Emanuele and Speranza,Manuela and Bartalesi Lenzi,V and Sprugnoli,Rachele and Romano,Lorenza and Girardi,Christian and Negri,Matteo},
-	booktitle = {Proc.of SILFI 2006},
-	year = {2006}
-}
-@inproceedings {magnini2006cab,
-	title = {I - CAB: the Italian Content Annotation Bank.},
-	author = {Magnini,Bernardo and Pianta,Emanuele and Girardi,Christian and Negri,Matteo and Romano,Lorenza and Speranza,Manuela and Lenzi,Valentina Bartalesi and Sprugnoli,Rachele},
-	booktitle = {LREC},
-	pages = {963--968},
-	year = {2006},
-	organization = {Citeseer}
-}
-```
-
-## Authors
-
-**Loreto Parisi**: `loreto at musixmatch dot com`, [loretoparisi](https://github.com/loretoparisi)
-**Simone Francia**: `simone.francia at musixmatch dot com`, [simonefrancia](https://github.com/simonefrancia)
-**Paolo Magnani**: `paul.magnani95 at gmail dot com`, [paulthemagno](https://github.com/paulthemagno)
-
-## About Musixmatch AI
-![Musxmatch Ai mac app icon-128](https://user-images.githubusercontent.com/163333/72244273-396aa380-35ee-11ea-894b-4ea48230c02b.png)
-We do Machine Learning and Artificial Intelligence @[musixmatch](https://twitter.com/Musixmatch)
-Follow us on [Twitter](https://twitter.com/musixmatchai) [Github](https://github.com/musixmatchresearch)
-
--- a/model_cards/NLP4H/ms_bert/README.md
+++ b/model_cards/NLP4H/ms_bert/README.md
@ -1,54 +0,0 @@
-# MS-BERT
-
-## Introduction
-
-This repository provides the code and models for MS-BERT.
-MS-BERT was pre-trained on neurological examination notes for Multiple Sclerosis (MS) patients at St. Michael's Hospital in Toronto, Canada.
-
-## Data
-
-The dataset contained approximately 75,000 clinical notes, for about 5000 patients, totaling over 35.7 million words.
-These notes were collected from patients who visited St. Michael's Hospital MS Clinic between 2015 and 2019.
-The notes contained a variety of information pertaining to a neurological exam.
-For example, a note can contain information on the patient's condition, their progress over time and diagnosis.
-The gender split within the dataset was observed to be 72% female and 28% male ([which reflects the natural discrepancy seen in MS][1]).
-The following sections describe how MS-BERT was pre-trained on these clinically relevant and rich neurological notes.
-
-## Data pre-processing
-
-The data was pre-processed to remove any identifying information. This includes patient names, doctor names, hospital names, patient identification numbers, phone numbers, addresses, and times. To de-identify the notes, we used a curated database that contained patient and doctor information. This curated database was paired with regular expressions to find and remove any identifying pieces of information. Each of these identifiers was replaced with a specific token. These tokens were chosen based on three criteria: (1) they belong to the current BERT vocab, (2) they have roughly the same semantic meaning as the word they are replacing, and (3) they are not found in the original unprocessed dataset. The replacements that met the criteria above were as follows:
-
-Female first names -> Lucie
-
-Male first names -> Ezekiel
-
-Last/family names -> Salamanca
-
-Dates -> 2010s
-
-Patient IDs -> 999
-
-Phone numbers -> 1718
-
-Addresses -> Silesia
-
-Time -> 1610
-
-Locations/Hospital/Clinic names -> Troy
-
-## Pre-training
-
-The starting point for our model is the already pre-trained and fine-tuned BLUE-BERT base. We further pre-train it using the masked language modelling task from the huggingface transformers [library](https://github.com/huggingface). 
-
-The hyperparameters can be found in the config file in this repository or [here](https://s3.amazonaws.com/models.huggingface.co/bert/NLP4H/ms_bert/config.json)
-
-## Acknowledgements
-
-We would like to thank the researchers and staff at the Data Science and Advanced Analytics (DSAA) department, St. Michael’s Hospital, for providing consistent support and guidance throughout this project.
-We would also like to thank Dr. Marzyeh Ghassemi, Taylor Killan, Nathan Ng and Haoran Zhang for providing us the opportunity to work on this exciting project.
-
-## Disclaimer
-
-MS-BERT shows the results of research conducted at the Data Science and Advanced Analytics (DSAA) department, St. Michael’s Hospital. The results produced by MS-BERT are not intended for direct diagnostic use or medical decision-making without review and oversight by a clinical professional. Individuals should not make decisions about their health solely on the basis of the results produced by MS-BERT. St. Michael’s Hospital does not independently verify the validity or utility of the results produced by MS-BERT. If you have questions about the results produced by MS-BERT please consult a healthcare professional. If you would like more information about the research conducted at DSAA please contact [Zhen Yang](mailto:zhen.yang@unityhealth.to). If you would like more information on neurological examination notes please contact [Dr. Tony Antoniou](mailto:tony.antoniou@unityhealth.to) or [Dr. Jiwon Oh](mailto:jiwon.oh@unityhealth.to) from the MS clinic at St. Michael's Hospital.
-
-[1]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3707353/
--- a/model_cards/Naveen-k/KanBERTo/README.md
+++ b/model_cards/Naveen-k/KanBERTo/README.md
@ -1,28 +0,0 @@
---
-language: kn
---
-
-# Welcome to KanBERTo (ಕನ್ಬರ್ಟೋ)
-
-## Model Description
- 
-> This is a small language model for the [Kannada](https://en.wikipedia.org/wiki/Kannada) language, trained on 1M data samples taken from the
-  [OSCAR page](https://traces1.inria.fr/oscar/files/compressed-orig/kn.txt.gz)
-
-## Training params 
-
- **Dataset** - The model was trained on 1M data samples from the OSCAR page (https://traces1.inria.fr/oscar/). Although the full data set is 1.7 GB, resource constraints limited training
-to 1M samples. If you are interested in collaborating and have computational resources for training, you are most welcome to do so.
-
- **Preprocessing** - ByteLevelBPETokenizer is used to tokenize the sentences at the byte level, and the vocabulary size is set to 52k, following the standard values given by 🤗
- **Hyperparameters** - __ByteLevelBPETokenizer__ : vocabulary size = 52_000 and  min_frequency = 2
-                        __Trainer__ :               num_train_epochs=12 - trained for 12 epochs
-                                                    per_gpu_train_batch_size=64 - batch size for the datasamples is 64
-                                                    save_steps=10_000 - save model for every 10k steps
-                                                    save_total_limit=2 - save limit is set for 2
-
-**Intended uses & limitations**
-  This model is for anyone who wants to use Kannada language models for tasks such as language generation, translation, and many other use cases.
-
-**Whatever else is helpful!**
-  If you are interested in collaborating, feel free to reach me: [Naveen](mailto:naveen.maltesh@gmail.com)
--- a/model_cards/NeuML/bert-small-cord19-squad2/README.md
+++ b/model_cards/NeuML/bert-small-cord19-squad2/README.md
@ -1,26 +0,0 @@
-# BERT-Small CORD-19 fine-tuned on SQuAD 2.0
-
-[bert-small-cord19 model](https://huggingface.co/NeuML/bert-small-cord19) fine-tuned on SQuAD 2.0
-
-## Building the model
-
-```bash
-python run_squad.py \
-    --model_type bert \
-    --model_name_or_path bert-small-cord19 \
-    --do_train \
-    --do_eval \
-    --do_lower_case \
-    --version_2_with_negative \
-    --train_file train-v2.0.json \
-    --predict_file dev-v2.0.json \
-    --per_gpu_train_batch_size 8 \
-    --learning_rate 3e-5 \
-    --num_train_epochs 3.0 \
-    --max_seq_length 384 \
-    --doc_stride 128 \
-    --output_dir bert-small-cord19-squad2 \
-    --save_steps 0 \
-    --threads 8 \
-    --overwrite_cache \
-    --overwrite_output_dir
--- a/model_cards/NeuML/bert-small-cord19/README.md
+++ b/model_cards/NeuML/bert-small-cord19/README.md
@ -1,25 +0,0 @@
-# BERT-Small fine-tuned on CORD-19 dataset
-
-[BERT L6_H-512_A-8 model](https://huggingface.co/google/bert_uncased_L-6_H-512_A-8) fine-tuned on the [CORD-19 dataset](https://www.semanticscholar.org/cord19).
-
-## CORD-19 data subset
-The training data for this dataset is stored as a [Kaggle dataset](https://www.kaggle.com/davidmezzetti/cord19-qa?select=cord19.txt). The training
-data is a subset of the full corpus, focusing on high-quality, study-design detected articles.
-
-## Building the model
-
-```bash
-python run_language_modeling.py \
-    --model_type bert \
-    --model_name_or_path google/bert_uncased_L-6_H-512_A-8 \
-    --do_train \
-    --mlm \
-    --line_by_line \
-    --block_size 512 \
-    --train_data_file cord19.txt \
-    --per_gpu_train_batch_size 4 \
-    --learning_rate 3e-5 \
-    --num_train_epochs 3.0 \
-    --output_dir bert-small-cord19 \
-    --save_steps 0 \
-    --overwrite_output_dir
--- a/model_cards/NeuML/bert-small-cord19qa/README.md
+++ b/model_cards/NeuML/bert-small-cord19qa/README.md
@ -1,63 +0,0 @@
-# BERT-Small fine-tuned on CORD-19 QA dataset
-
-[bert-small-cord19-squad model](https://huggingface.co/NeuML/bert-small-cord19-squad2) fine-tuned on the [CORD-19 QA dataset](https://www.kaggle.com/davidmezzetti/cord19-qa?select=cord19-qa.json).
-
-## CORD-19 QA dataset
-The CORD-19 QA dataset is a SQuAD 2.0 formatted list of question, context, answer combinations covering the [CORD-19 dataset](https://www.semanticscholar.org/cord19).
-
-## Building the model
-
-```bash
-python run_squad.py \
-    --model_type bert \
-    --model_name_or_path bert-small-cord19-squad \
-    --do_train \
-    --do_lower_case \
-    --version_2_with_negative \
-    --train_file cord19-qa.json \
-    --per_gpu_train_batch_size 8 \
-    --learning_rate 5e-5 \
-    --num_train_epochs 10.0 \
-    --max_seq_length 384 \
-    --doc_stride 128 \
-    --output_dir bert-small-cord19qa \
-    --save_steps 0 \
-    --threads 8 \
-    --overwrite_cache \
-    --overwrite_output_dir
-```
-
-## Testing the model
-
-Example usage below:
-
-```python
-from transformers import pipeline
-
-qa = pipeline(
-    "question-answering",
-    model="NeuML/bert-small-cord19qa",
-    tokenizer="NeuML/bert-small-cord19qa"
-)
-
-qa({
-    "question": "What is the median incubation period?",
-    "context": "The incubation period is around 5 days (range: 4-7 days) with a maximum of 12-13 day"
-})
-
-qa({
-    "question": "What is the incubation period range?",
-    "context": "The incubation period is around 5 days (range: 4-7 days) with a maximum of 12-13 day"
-})
-
-qa({
-    "question": "What type of surfaces does it persist?",
-    "context": "The virus can survive on surfaces for up to 72 hours such as plastic and stainless steel ."
-})
-```
-
-```json
-{"score": 0.5970273583242793, "start": 32, "end": 38, "answer": "5 days"}
-{"score": 0.999555868193891, "start": 39, "end": 56, "answer": "(range: 4-7 days)"}
-{"score": 0.9992726505196998, "start": 61, "end": 88, "answer": "plastic and stainless steel"}
-```
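A small check, not part of the original card: the `start` and `end` values in the JSON output above appear to be character offsets into the `context` string, with `end` exclusive. The sketch below copies the context and two of the results from the card and slices the context to confirm that reading. The variable names are illustrative.

```python
# Context string and results copied from the card above.
context = (
    "The incubation period is around 5 days (range: 4-7 days) "
    "with a maximum of 12-13 day"
)
results = [
    {"start": 32, "end": 38, "answer": "5 days"},
    {"start": 39, "end": 56, "answer": "(range: 4-7 days)"},
]

for result in results:
    # Slice the context with the reported offsets (end is exclusive).
    span = context[result["start"]:result["end"]]
    print(repr(span), span == result["answer"])
```

Slicing with the reported offsets is a convenient way to highlight the answer in the source passage rather than searching for the answer text again.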
--- a/model_cards/NlpHUST/vibert4news-base-cased/README.md
+++ b/model_cards/NlpHUST/vibert4news-base-cased/README.md
@ -1,55 +0,0 @@
---
-language: vn
---
-# BERT for Vietnamese trained on more than 20 GB of news data
-
-Applied to the sentiment analysis task using [AIViVN's comments dataset](https://www.aivivn.com/contests/6)
-
-The model achieved 0.90268 on the public leaderboard (the winner's score is 0.90087).
-Bert4news is used in ViNLP, a Vietnamese toolkit for segmentation and Named Entity Recognition (https://github.com/bino282/ViNLP)
-
-***************New Mar 11 , 2020 ***************
-
-**[BERT](https://github.com/google-research/bert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
-
-We use word SentencePiece with basic BERT tokenization and the same config as BERT base, with lowercase = False.
-
-You can download trained model:
- [tensorflow](https://drive.google.com/file/d/1X-sRDYf7moS_h61J3L79NkMVGHP-P-k5/view?usp=sharing).
- [pytorch](https://drive.google.com/file/d/11aFSTpYIurn-oI2XpAmcCTccB_AonMOu/view?usp=sharing).
-
-Use with huggingface/transformers
-```python
-import torch
-from transformers import AutoTokenizer,AutoModel
-tokenizer= AutoTokenizer.from_pretrained("NlpHUST/vibert4news-base-cased")
-bert_model = AutoModel.from_pretrained("NlpHUST/vibert4news-base-cased")
-
-line = "Tôi là sinh viên trường Bách Khoa Hà Nội ."
-input_id = tokenizer.encode(line,add_special_tokens = True)
-att_mask = [int(token_id > 0) for token_id in input_id]
-input_ids = torch.tensor([input_id])
-att_masks = torch.tensor([att_mask])
-with torch.no_grad():
-    features = bert_model(input_ids,att_masks)
-
-print(features)
-
-```
-
-Run training with base config
-
-``` bash
-
-python train_pytorch.py \
-  --model_path=bert4news.pytorch \
-  --max_len=200 \
-  --batch_size=16 \
-  --epochs=6 \
-  --lr=2e-5
-
-```
-
-### Contact information
-For personal communication related to this project, please contact Nha Nguyen Van (nha282@gmail.com).
-
--- a/model_cards/Norod78/hewiki-articles-distilGPT2py-il/README.md
+++ b/model_cards/Norod78/hewiki-articles-distilGPT2py-il/README.md
@ -1,109 +0,0 @@
---
-language: he
-
-thumbnail: https://avatars1.githubusercontent.com/u/3617152?norod.jpg
-widget:
- text: "<|startoftext|>החוק השני של מועדון קרב הוא"
- text: "<|startoftext|>ראש הממשלה בן גוריון"
- text: "<|startoftext|>למידת מכונה (סרט)"
- text: "<|startoftext|>מנשה פומפרניקל"
- text: "<|startoftext|>אי שוויון "
-
-license: mit
---
-
-
-# hewiki-articles-distilGPT2py-il
-
-## A tiny GPT2 model for generating Hebrew text
-
-A distilGPT2 sized model. <br>
-Training data was hewiki-20200701-pages-articles-multistream.xml.bz2 from https://dumps.wikimedia.org/hewiki/20200701/  <br>
-XML has been converted to plain text using Wikipedia Extractor http://medialab.di.unipi.it/wiki/Wikipedia_Extractor  <br>
-I then added <|startoftext|> and <|endoftext|> markers and deleted empty lines.  <br>
-
-#### How to use
-
-```python
-import torch
-import torch.nn as nn
-from transformers import GPT2Tokenizer, GPT2LMHeadModel
-
-tokenizer = GPT2Tokenizer.from_pretrained("Norod78/hewiki-articles-distilGPT2py-il")
-model = GPT2LMHeadModel.from_pretrained("Norod78/hewiki-articles-distilGPT2py-il").eval()
-
-bos_token = tokenizer.bos_token  # Beginning of sentence
-eos_token = tokenizer.eos_token  # End of sentence
-
-def generate_word(model, tokens_tensor, temperature=1.0):
-  """ 
-  Sample a word given a tensor of tokens of previous words from a model. Given 
-  the words we have, sample a plausible word. Temperature is used for 
-  controlling randomness. If using temperature==0 we simply use a greedy arg max. 
-  Else, we sample from a multinomial distribution using a lower inverse 
-  temperature to allow for more randomness to escape repetitions. 
-  """
-  with torch.no_grad():
-    outputs = model(tokens_tensor)
-    predictions = outputs[0]
-    if temperature>0:
-      # Make the distribution more or less skewed based on the temperature
-      predictions = outputs[0]/temperature
-      # Sample from the distribution
-      softmax = nn.Softmax(dim=0)
-      predicted_index = torch.multinomial(softmax(predictions[0,-1,:]),1).item()
-    # Simply take the arg-max of the distribution
-    else:
-      predicted_index = torch.argmax(predictions[0, -1, :]).item()
-    # Decode the encoding to the corresponding word
-    predicted_text = tokenizer.decode([predicted_index])
-  return predicted_text
-
-def generate_sentence(model, tokenizer, initial_text, temperature=1.0):
-  """ Generate a sentence given some initial text using a model and a tokenizer.
-  Returns the new sentence. """
-        
-  # Encode a text inputs
-  text = ""
-  sentence = text
-
-  # We avoid an infinite loop by setting a maximum range
-  for i in range(0,84):
-    indexed_tokens = tokenizer.encode(initial_text + text)
-      
-    # Convert indexed tokens in a PyTorch tensor
-    tokens_tensor = torch.tensor([indexed_tokens])
-    
-    new_word = generate_word(model, tokens_tensor, temperature=temperature)
-
-    # Here the temperature is slowly increased with each generated word,
-    # moving sampling closer to the model's unscaled distribution.
-    # Once it gets close to 1.0 it is reset to 0.996, so it stays below 1.0.
-    if temperature<(1-0.008):
-      temperature += 0.008
-    else:
-      temperature = 0.996
-
-    text = text+new_word
-
-    # Stop generating new words when we have reached the end of the line or the text
-    if eos_token in new_word:
-      # returns the new sentence and whether generation is done
-      return (text.replace(eos_token,"").strip(), True)
-    elif '/' in new_word:
-      return (text.strip(), False)
-    elif bos_token in new_word:
-        return (text.replace(bos_token,"").strip(), False)
-      
-  return (text, True)
-
-for output_num in range(1,5):
-  init_text = "בוקר טוב"
-  text = bos_token + init_text
-  for i in range(0,84):
-    sentence = generate_sentence(model, tokenizer, text, temperature=0.9)    
-    text = init_text + sentence[0]
-    print(text)
-    if (sentence[1] == True):
-      break   
-```
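The sampling step inside `generate_word` above can be illustrated without PyTorch. The following is a minimal standard-library sketch, not the author's code: the helper names `softmax_with_temperature` and `sample_index` and the toy scores are assumptions made for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Greedy arg-max when temperature is 0, otherwise multinomial sampling."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for a three-token vocabulary
print(sample_index(toy_logits, temperature=0))    # index of the largest score
print(sample_index(toy_logits, temperature=0.9))  # an index drawn at random
```

Dividing by a temperature below 1 sharpens the distribution toward the top-scoring token, while values above 1 flatten it.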
--- a/model_cards/Ogayo/Hel-ach-en/README.md
+++ b/model_cards/Ogayo/Hel-ach-en/README.md
@ -1,48 +0,0 @@
---
-language: 
- ach 
- en
-tags:
- translation
-license: cc-by-4.0
-datasets:
- JW300
-metrics:
- bleu
---
-
-# HEL-ACH-EN
-
-## Model description
-
-MT model translating Acholi to English initialized with weights from [opus-mt-luo-en](https://huggingface.co/Helsinki-NLP/opus-mt-luo-en) on HuggingFace.
-
-## Intended uses & limitations
-Machine Translation experiments. Do not use for sensitive tasks.
-#### How to use
-
-```python
-# You can include sample code which will be formatted
-from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
-
-tokenizer = AutoTokenizer.from_pretrained("Ogayo/Hel-ach-en")
-
-model = AutoModelForSeq2SeqLM.from_pretrained("Ogayo/Hel-ach-en")
-
-```
-
-#### Limitations and bias
-
-Trained on Jehovah's Witnesses data, so it reflects their views and Christian views more broadly.
-
-## Training data
-Trained on OPUS JW300 data.
-Initialized with weights from [opus-mt-luo-en](https://huggingface.co/Helsinki-NLP/opus-mt-luo-en?text=Bed+gi+nyasi+mar+chieng%27+nyuol+mopong%27+gi+mor%21#model_card)
-
-## Training procedure
-
-Removed duplicates and rows with no alphabetic characters. Trained on a GPU.
-## Eval results
-testset | BLEU 
--- | --- 
-JW300.luo.en| 46.1
--- a/model_cards/Primer/bart-squad2/README.md
+++ b/model_cards/Primer/bart-squad2/README.md
@ -1,63 +0,0 @@
---
-language: "en"
---
-
-# BART-Squad2
-
-## Model description
-
-BART for extractive (span-based) question answering, trained on Squad 2.0.
-
-F1 score of 87.4.
-
-## Intended uses & limitations
-
-Unfortunately, the Huggingface auto-inference API won't run this model, so if you're attempting to try it through the input box above and it complains, don't be discouraged!
-
-#### How to use
-
-Here's a quick way to get question answering running locally:
-
-```python
-from transformers import AutoTokenizer, AutoModelForQuestionAnswering
-
-tokenizer = AutoTokenizer.from_pretrained("Primer/bart-squad2")
-model = AutoModelForQuestionAnswering.from_pretrained("Primer/bart-squad2")
-model.to('cuda'); model.eval()
-
-def answer(question, text):
-    seq = '<s>' +  question + ' </s> </s> ' + text + ' </s>'
-    tokens = tokenizer.encode_plus(seq, return_tensors='pt', padding='max_length', max_length=1024)
-    input_ids = tokens['input_ids'].to('cuda')
-    attention_mask = tokens['attention_mask'].to('cuda')
-    start, end, _ = model(input_ids, attention_mask=attention_mask)
-    start_idx = int(start.argmax().int())
-    end_idx =  int(end.argmax().int())
-    print(tokenizer.decode(input_ids[0, start_idx:end_idx]).strip())
-    # ^^ it will be an empty string if the model decided "unanswerable"
-
->>> question = "Where does Tom live?"
->>> context = "Tom is an engineer in San Francisco."
->>> answer(question, context)
-San Francisco
-```
-
-(Just drop the `.to('cuda')` stuff if running on CPU).
-
-#### Limitations and bias
-
-Unknown, no further evaluation has been performed. In a technical sense one big limitation is that it's 1.6G 😬
-
-## Training procedure
-
-`run_squad.py` with:
-
-|param|value|
-|---|---|
-|batch size|8|
-|max_seq_length|1024|
-|learning rate|1e-5|
-|epochs|2|
-
-Modified to freeze shared parameters and encoder embeddings.
-
--- a/model_cards/README.md
+++ b/model_cards/README.md
@ -0,0 +1,26 @@
+## 🔥 Model cards now live inside each huggingface.co model repo 🔥
+
+
+For consistency, ease of use and scalability, `README.md` model cards now live directly inside each model repo on the HuggingFace model hub.
+
+### How to update a model card
+
+You can directly update a model card inside any model repo you have **write access** to, i.e.:
+- a model under your username namespace
+- a model under any organization you are a part of.
+
+You can either:
+- update it, commit and push using your usual git workflow (command line, GUI, etc.)
+- or edit it directly from the website's UI.
+
+**What if you want to create or update a model card for a model you don't have write access to?**
+
+In that case, given that we don't have a Pull request system yet on huggingface.co (🤯),
+you can open an issue here, post the card's content, and tag the model author(s) and/or the Hugging Face team.
+
+We might implement a more seamless process at some point, so your early feedback is precious!
+Please let us know if you have any suggestions.
+
+### What happened to the model cards here?
+
+We migrated every model card from the repo to its corresponding huggingface.co model repo. Individual commits were preserved, and they link back to the original commit on GitHub.
--- a/model_cards/Rostlab/prot_bert/README.md
+++ b/model_cards/Rostlab/prot_bert/README.md
@ -1,141 +0,0 @@
---
-language: protein
-tags:
- protein language model
-datasets:
- Uniref100
---
-
-# ProtBert model
-
-Pretrained model on protein sequences using a masked language modeling (MLM) objective. It was introduced in
-[this paper](https://doi.org/10.1101/2020.07.12.199554) and first released in
-[this repository](https://github.com/agemagician/ProtTrans). This model is trained on uppercase amino acids: it only works with capital letter amino acids.
-
-
-## Model description
-
-ProtBert is based on the Bert model, which was pretrained on a large corpus of protein sequences in a self-supervised fashion.
-This means it was pretrained on the raw protein sequences only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those protein sequences.
-
-One important difference between our Bert model and the original Bert version is the way of dealing with sequences as separate documents.
-This means the Next sentence prediction is not used, as each sequence is treated as a complete document.
-The masking follows the original Bert training, randomly masking 15% of the amino acids in the input.
-
-In the end, the features extracted from this model revealed that the LM-embeddings from unlabeled data (only protein sequences) captured important biophysical properties governing protein
-shape.
-This implied learning some of the grammar of the language of life realized in protein sequences.
-
-## Intended uses & limitations
-
-The model could be used for protein feature extraction or to be fine-tuned on downstream tasks.
-We have noticed in some tasks you could gain more accuracy by fine-tuning the model rather than using it as a feature extractor.
-
-### How to use
-
-You can use this model directly with a pipeline for masked language modeling:
-
-```python
->>> from transformers import BertForMaskedLM, BertTokenizer, pipeline
->>> tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False )
->>> model = BertForMaskedLM.from_pretrained("Rostlab/prot_bert")
->>> unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
->>> unmasker('D L I P T S S K L V V [MASK] D T S L Q V K K A F F A L V T')
-
-[{'score': 0.11088453233242035,
-  'sequence': '[CLS] D L I P T S S K L V V L D T S L Q V K K A F F A L V T [SEP]',
-  'token': 5,
-  'token_str': 'L'},
- {'score': 0.08402521163225174,
-  'sequence': '[CLS] D L I P T S S K L V V S D T S L Q V K K A F F A L V T [SEP]',
-  'token': 10,
-  'token_str': 'S'},
- {'score': 0.07328339666128159,
-  'sequence': '[CLS] D L I P T S S K L V V V D T S L Q V K K A F F A L V T [SEP]',
-  'token': 8,
-  'token_str': 'V'},
- {'score': 0.06921856850385666,
-  'sequence': '[CLS] D L I P T S S K L V V K D T S L Q V K K A F F A L V T [SEP]',
-  'token': 12,
-  'token_str': 'K'},
- {'score': 0.06382402777671814,
-  'sequence': '[CLS] D L I P T S S K L V V I D T S L Q V K K A F F A L V T [SEP]',
-  'token': 11,
-  'token_str': 'I'}]
-```
-
-Here is how to use this model to get the features of a given protein sequence in PyTorch:
-
-```python
-from transformers import BertModel, BertTokenizer
-import re
-tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False )
-model = BertModel.from_pretrained("Rostlab/prot_bert")
-sequence_Example = "A E T C Z A O"
-sequence_Example = re.sub(r"[UZOB]", "X", sequence_Example)
-encoded_input = tokenizer(sequence_Example, return_tensors='pt')
-output = model(**encoded_input)
-```
-
-## Training data
-
-The ProtBert model was pretrained on [Uniref100](https://www.uniprot.org/downloads), a dataset consisting of 217 million protein sequences.
-
-## Training procedure
-
-### Preprocessing
-
-The protein sequences are uppercased and tokenized using a single space and a vocabulary size of 21. The rare amino acids "U,Z,O,B" were mapped to "X".
-The inputs of the model are then of the form:
-
-```
-[CLS] Protein Sequence A [SEP] Protein Sequence B [SEP]
-```
-
-Furthermore, each protein sequence was treated as a separate document.
-The preprocessing step was performed twice, once for a combined length (2 sequences) of less than 512 amino acids, and another time using a combined length (2 sequences) of less than 2048 amino acids.
-
-The details of the masking procedure for each sequence followed the original Bert model, as follows:
- 15% of the amino acids are masked.
- In 80% of the cases, the masked amino acids are replaced by `[MASK]`.
- In 10% of the cases, the masked amino acids are replaced by a random amino acid different from the one they replace.
- In the 10% remaining cases, the masked amino acids are left as is.
-
-### Pretraining
-
-The model was trained on a single TPU Pod V3-512 for 400k steps in total.
-300K steps using sequence length 512 (batch size 15k), and 100K steps using sequence length 2048 (batch size 2.5k).
-The optimizer used is Lamb with a learning rate of 0.002, a weight decay of 0.01, learning rate warmup for 40k steps and linear decay of the learning rate after.
-
-## Evaluation results
-
-When fine-tuned on downstream tasks, this model achieves the following results:
-
-Test results :
-
-| Task/Dataset | secondary structure (3-states) | secondary structure (8-states)  |  Localization | Membrane  |
-|:-----:|:-----:|:-----:|:-----:|:-----:|
-|   CASP12  | 75 | 63 |    |    |
-|   TS115   | 83 | 72 |    |    | 
-|   CB513   | 81 | 66 |    |    |
-|  DeepLoc  |    |    | 79 | 91 |
-
-### BibTeX entry and citation info
-
-```bibtex
-@article {Elnaggar2020.07.12.199554,
-	author = {Elnaggar, Ahmed and Heinzinger, Michael and Dallago, Christian and Rehawi, Ghalia and Wang, Yu and Jones, Llion and Gibbs, Tom and Feher, Tamas and Angerer, Christoph and Steinegger, Martin and BHOWMIK, DEBSINDHU and Rost, Burkhard},
-	title = {ProtTrans: Towards Cracking the Language of Life{\textquoteright}s Code Through Self-Supervised Deep Learning and High Performance Computing},
-	elocation-id = {2020.07.12.199554},
-	year = {2020},
-	doi = {10.1101/2020.07.12.199554},
-	publisher = {Cold Spring Harbor Laboratory},
-	abstract = {Computational biology and bioinformatics provide vast data gold-mines from protein sequences, ideal for Language Models (LMs) taken from Natural Language Processing (NLP). These LMs reach for new prediction frontiers at low inference costs. Here, we trained two auto-regressive language models (Transformer-XL, XLNet) and two auto-encoder models (Bert, Albert) on data from UniRef and BFD containing up to 393 billion amino acids (words) from 2.1 billion protein sequences (22- and 112 times the entire English Wikipedia). The LMs were trained on the Summit supercomputer at Oak Ridge National Laboratory (ORNL), using 936 nodes (total 5616 GPUs) and one TPU Pod (V3-512 or V3-1024). We validated the advantage of up-scaling LMs to larger models supported by bigger data by predicting secondary structure (3-states: Q3=76-84, 8 states: Q8=65-73), sub-cellular localization for 10 cellular compartments (Q10=74) and whether a protein is membrane-bound or water-soluble (Q2=89). Dimensionality reduction revealed that the LM-embeddings from unlabeled data (only protein sequences) captured important biophysical properties governing protein shape. This implied learning some of the grammar of the language of life realized in protein sequences. The successful up-scaling of protein LMs through HPC to larger data sets slightly reduced the gap between models trained on evolutionary information and LMs. Availability ProtTrans: \&lt;a href="https://github.com/agemagician/ProtTrans"\&gt;https://github.com/agemagician/ProtTrans\&lt;/a\&gt;Competing Interest StatementThe authors have declared no competing interest.},
-	URL = {https://www.biorxiv.org/content/early/2020/07/21/2020.07.12.199554},
-	eprint = {https://www.biorxiv.org/content/early/2020/07/21/2020.07.12.199554.full.pdf},
-	journal = {bioRxiv}
-}
-```
-
-> Created by [Ahmed Elnaggar/@Elnaggar_AI](https://twitter.com/Elnaggar_AI) | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)
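The masking procedure described in the ProtBert card above (15% of positions selected, then an 80/10/10 split) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' training code: the 20-letter `AMINO_ACIDS` alphabet, the `mask_sequence` helper, and selecting each position independently with probability 0.15 are all choices made for this sketch.

```python
import random

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")  # assumed: the 20 standard residues


def mask_sequence(tokens, rng, mask_prob=0.15):
    """Return (inputs, labels); labels hold the original residue at selected positions."""
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = "[MASK]"  # 80%: replace with the mask token
        elif roll < 0.9:
            # 10%: replace with a random residue different from the original
            inputs[i] = rng.choice([a for a in AMINO_ACIDS if a != tok])
        # remaining 10%: keep the original residue unchanged
    return inputs, labels


rng = random.Random(0)
sequence = "D L I P T S S K L V V L D T S L Q V K K A F F A L V T".split()
masked, targets = mask_sequence(sequence, rng)
print(" ".join(masked))
```

Positions whose label is `None` are not scored by the MLM loss; only the selected positions are.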
--- a/model_cards/Rostlab/prot_bert_bfd/README.md
+++ b/model_cards/Rostlab/prot_bert_bfd/README.md
@ -1,141 +0,0 @@
---
-language: protein
-tags:
- protein language model
-datasets:
- BFD
---
-
-# ProtBert-BFD model
-
-Pretrained model on protein sequences using a masked language modeling (MLM) objective. It was introduced in
-[this paper](https://doi.org/10.1101/2020.07.12.199554) and first released in
-[this repository](https://github.com/agemagician/ProtTrans). This model is trained on uppercase amino acids: it only works with capital letter amino acids.
-
-
-## Model description
-
-ProtBert-BFD is based on the Bert model, which was pretrained on a large corpus of protein sequences in a self-supervised fashion.
-This means it was pretrained on the raw protein sequences only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those protein sequences.
-
-One important difference between our Bert model and the original Bert version is the way of dealing with sequences as separate documents.
-This means the Next sentence prediction is not used, as each sequence is treated as a complete document.
-The masking follows the original Bert training, randomly masking 15% of the amino acids in the input.
-
-In the end, the features extracted from this model revealed that the LM-embeddings from unlabeled data (only protein sequences) captured important biophysical properties governing protein
-shape.
-This implied learning some of the grammar of the language of life realized in protein sequences.
-
-## Intended uses & limitations
-
-The model could be used for protein feature extraction or to be fine-tuned on downstream tasks.
-We have noticed in some tasks you could gain more accuracy by fine-tuning the model rather than using it as a feature extractor.
-
-### How to use
-
-You can use this model directly with a pipeline for masked language modeling:
-
-```python
->>> from transformers import BertForMaskedLM, BertTokenizer, pipeline
->>> tokenizer = BertTokenizer.from_pretrained('Rostlab/prot_bert_bfd', do_lower_case=False )
->>> model = BertForMaskedLM.from_pretrained("Rostlab/prot_bert_bfd")
->>> unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
->>> unmasker('D L I P T S S K L V V [MASK] D T S L Q V K K A F F A L V T')
-
-[{'score': 0.1165614128112793,
-  'sequence': '[CLS] D L I P T S S K L V V L D T S L Q V K K A F F A L V T [SEP]',
-  'token': 5,
-  'token_str': 'L'},
- {'score': 0.08976086974143982,
-  'sequence': '[CLS] D L I P T S S K L V V V D T S L Q V K K A F F A L V T [SEP]',
-  'token': 8,
-  'token_str': 'V'},
- {'score': 0.08864385634660721,
-  'sequence': '[CLS] D L I P T S S K L V V S D T S L Q V K K A F F A L V T [SEP]',
-  'token': 10,
-  'token_str': 'S'},
- {'score': 0.06227643042802811,
-  'sequence': '[CLS] D L I P T S S K L V V A D T S L Q V K K A F F A L V T [SEP]',
-  'token': 6,
-  'token_str': 'A'},
- {'score': 0.06194969266653061,
-  'sequence': '[CLS] D L I P T S S K L V V T D T S L Q V K K A F F A L V T [SEP]',
-  'token': 15,
-  'token_str': 'T'}]
-```
-
-Here is how to use this model to get the features of a given protein sequence in PyTorch:
-
-```python
-from transformers import BertModel, BertTokenizer
-import re
-tokenizer = BertTokenizer.from_pretrained('Rostlab/prot_bert_bfd', do_lower_case=False )
-model = BertModel.from_pretrained("Rostlab/prot_bert_bfd")
-sequence_Example = "A E T C Z A O"
-sequence_Example = re.sub(r"[UZOB]", "X", sequence_Example)
-encoded_input = tokenizer(sequence_Example, return_tensors='pt')
-output = model(**encoded_input)
-```
-
-## Training data
-
-The ProtBert-BFD model was pretrained on [BFD](https://bfd.mmseqs.com/), a dataset consisting of 2.1 billion protein sequences.
-
-## Training procedure
-
-### Preprocessing
-
-The protein sequences are uppercased and tokenized using a single space and a vocabulary size of 21.
-The inputs of the model are then of the form:
-
-```
-[CLS] Protein Sequence A [SEP] Protein Sequence B [SEP]
-```
-
-Furthermore, each protein sequence was treated as a separate document.
-The preprocessing step was performed twice, once for a combined length (2 sequences) of less than 512 amino acids, and another time using a combined length (2 sequences) of less than 2048 amino acids.
-
-The details of the masking procedure for each sequence followed the original Bert model, as follows:
- 15% of the amino acids are masked.
- In 80% of the cases, the masked amino acids are replaced by `[MASK]`.
- In 10% of the cases, the masked amino acids are replaced by a random amino acid different from the one they replace.
- In the 10% remaining cases, the masked amino acids are left as is.
-
-### Pretraining
-
-The model was trained on a single TPU Pod V3-1024 for one million steps in total.
-800k steps using sequence length 512 (batch size 32k), and 200K steps using sequence length 2048 (batch size 6k).
-The optimizer used is Lamb with a learning rate of 0.002, a weight decay of 0.01, learning rate warmup for 140k steps and linear decay of the learning rate after.
-
-## Evaluation results
-
-When fine-tuned on downstream tasks, this model achieves the following results:
-
-Test results :
-
-| Task/Dataset | secondary structure (3-states) | secondary structure (8-states)  |  Localization | Membrane  |
-|:-----:|:-----:|:-----:|:-----:|:-----:|
-|   CASP12  | 76 | 65 |    |    |
-|   TS115   | 84 | 73 |    |    | 
-|   CB513   | 83 | 70 |    |    |
-|  DeepLoc  |    |    | 78 | 91 |
-
-### BibTeX entry and citation info
-
-```bibtex
-@article {Elnaggar2020.07.12.199554,
-	author = {Elnaggar, Ahmed and Heinzinger, Michael and Dallago, Christian and Rehawi, Ghalia and Wang, Yu and Jones, Llion and Gibbs, Tom and Feher, Tamas and Angerer, Christoph and Steinegger, Martin and BHOWMIK, DEBSINDHU and Rost, Burkhard},
-	title = {ProtTrans: Towards Cracking the Language of Life{\textquoteright}s Code Through Self-Supervised Deep Learning and High Performance Computing},
-	elocation-id = {2020.07.12.199554},
-	year = {2020},
-	doi = {10.1101/2020.07.12.199554},
-	publisher = {Cold Spring Harbor Laboratory},
-	abstract = {Computational biology and bioinformatics provide vast data gold-mines from protein sequences, ideal for Language Models (LMs) taken from Natural Language Processing (NLP). These LMs reach for new prediction frontiers at low inference costs. Here, we trained two auto-regressive language models (Transformer-XL, XLNet) and two auto-encoder models (Bert, Albert) on data from UniRef and BFD containing up to 393 billion amino acids (words) from 2.1 billion protein sequences (22- and 112 times the entire English Wikipedia). The LMs were trained on the Summit supercomputer at Oak Ridge National Laboratory (ORNL), using 936 nodes (total 5616 GPUs) and one TPU Pod (V3-512 or V3-1024). We validated the advantage of up-scaling LMs to larger models supported by bigger data by predicting secondary structure (3-states: Q3=76-84, 8 states: Q8=65-73), sub-cellular localization for 10 cellular compartments (Q10=74) and whether a protein is membrane-bound or water-soluble (Q2=89). Dimensionality reduction revealed that the LM-embeddings from unlabeled data (only protein sequences) captured important biophysical properties governing protein shape. This implied learning some of the grammar of the language of life realized in protein sequences. The successful up-scaling of protein LMs through HPC to larger data sets slightly reduced the gap between models trained on evolutionary information and LMs. Availability ProtTrans: \&lt;a href="https://github.com/agemagician/ProtTrans"\&gt;https://github.com/agemagician/ProtTrans\&lt;/a\&gt;Competing Interest StatementThe authors have declared no competing interest.},
-	URL = {https://www.biorxiv.org/content/early/2020/07/21/2020.07.12.199554},
-	eprint = {https://www.biorxiv.org/content/early/2020/07/21/2020.07.12.199554.full.pdf},
-	journal = {bioRxiv}
-}
-```
-
-> Created by [Ahmed Elnaggar/@Elnaggar_AI](https://twitter.com/Elnaggar_AI) | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)
--- a/model_cards/Rostlab/prot_t5_xl_bfd/README.md
+++ b/model_cards/Rostlab/prot_t5_xl_bfd/README.md
@ -1,125 +0,0 @@
---
-language: protein
-tags:
- protein language model
-datasets:
- BFD
---
-
-# ProtT5-XL-BFD model
-
-Pretrained model on protein sequences using a masked language modeling (MLM) objective. It was introduced in
-[this paper](https://doi.org/10.1101/2020.07.12.199554) and first released in
-[this repository](https://github.com/agemagician/ProtTrans). This model is trained on uppercase amino acids: it only works with capital letter amino acids.
-
-
-## Model description
-
-ProtT5-XL-BFD is based on the `t5-3b` model and was pretrained on a large corpus of protein sequences in a self-supervised fashion.
-This means it was pretrained on the raw protein sequences only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those protein sequences.
-
-One important difference between this T5 model and the original T5 version is the denoising objective.
-The original T5-3B model was pretrained using a span denoising objective, while this model was pretrained with a BART-like MLM denoising objective.
-The masking probability is consistent with the original T5 training by randomly masking 15% of the amino acids in the input.
-
-It has been shown that the features extracted from this self-supervised model (LM-embeddings) captured important biophysical properties
-governing protein shape.
-This implied learning some of the grammar of the language of life realized in protein sequences.
-
-## Intended uses & limitations
-
-The model could be used for protein feature extraction or to be fine-tuned on downstream tasks.
-We have noticed that in some tasks one can gain more accuracy by fine-tuning the model rather than using it as a feature extractor.
-We have also noticed that for feature extraction, it's better to use the features extracted from the encoder, not from the decoder.
-
-### How to use
-
-Here is how to use this model to extract the features of a given protein sequence in PyTorch:
-
-```python
-from transformers import T5Tokenizer, T5Model
-import re
-import torch
-
-tokenizer = T5Tokenizer.from_pretrained('Rostlab/prot_t5_xl_bfd', do_lower_case=False)
-
-model = T5Model.from_pretrained("Rostlab/prot_t5_xl_bfd")
-
-sequences_Example = ["A E T C Z A O","S K T Z P"]
-
-sequences_Example = [re.sub(r"[UZOB]", "X", sequence) for sequence in sequences_Example]
-
-ids = tokenizer.batch_encode_plus(sequences_Example, add_special_tokens=True, padding=True)
-
-input_ids = torch.tensor(ids['input_ids'])
-attention_mask = torch.tensor(ids['attention_mask'])
-
-with torch.no_grad():
-    embedding = model(input_ids=input_ids,attention_mask=attention_mask,decoder_input_ids=input_ids)
-
-# For feature extraction we recommend using the encoder embedding
-encoder_embedding = embedding[2].cpu().numpy()
-decoder_embedding = embedding[0].cpu().numpy()
-```
-
-## Training data
-
-The ProtT5-XL-BFD model was pretrained on [BFD](https://bfd.mmseqs.com/), a dataset consisting of 2.1 billion protein sequences.
-
-## Training procedure
-
-### Preprocessing
-
-The protein sequences are uppercased and tokenized using a single space and a vocabulary size of 21. The rare amino acids "U,Z,O,B" were mapped to "X".
-The inputs of the model are then of the form:
-
-```
-Protein Sequence [EOS]
-```
-
-The preprocessing step was performed on the fly, by cutting and padding the protein sequences up to 512 tokens.
-
-The details of the masking procedure for each sequence are as follows:
- 15% of the amino acids are masked.
- In 90% of the cases, the masked amino acids are replaced by `[MASK]` token.
- In 10% of the cases, the masked amino acids are replaced by a random amino acid different from the one they replace.
-
-### Pretraining
-
-The model was trained on a single TPU Pod V3-1024 for 1.2 million steps in total, using sequence length 512 (batch size 4k).
-It has a total of approximately 3B parameters and was trained using the encoder-decoder architecture.
-The optimizer used is AdaFactor with inverse square root learning rate schedule for pre-training.
-
-
-## Evaluation results
-
-When the model is used for feature extraction, it achieves the following results:
-
-Test results :
-
-| Task/Dataset | secondary structure (3-states) | secondary structure (8-states)  |  Localization | Membrane  |
-|:-----:|:-----:|:-----:|:-----:|:-----:|
-|   CASP12  | 77 | 66 |    |    |
-|   TS115   | 85 | 74 |    |    | 
-|   CB513   | 84 | 71 |    |    |
-|  DeepLoc  |    |    | 77 | 91 |
-
-### BibTeX entry and citation info
-
-```bibtex
-@article {Elnaggar2020.07.12.199554,
-	author = {Elnaggar, Ahmed and Heinzinger, Michael and Dallago, Christian and Rehawi, Ghalia and Wang, Yu and Jones, Llion and Gibbs, Tom and Feher, Tamas and Angerer, Christoph and Steinegger, Martin and BHOWMIK, DEBSINDHU and Rost, Burkhard},
-	title = {ProtTrans: Towards Cracking the Language of Life{\textquoteright}s Code Through Self-Supervised Deep Learning and High Performance Computing},
-	elocation-id = {2020.07.12.199554},
-	year = {2020},
-	doi = {10.1101/2020.07.12.199554},
-	publisher = {Cold Spring Harbor Laboratory},
-	abstract = {Computational biology and bioinformatics provide vast data gold-mines from protein sequences, ideal for Language Models (LMs) taken from Natural Language Processing (NLP). These LMs reach for new prediction frontiers at low inference costs. Here, we trained two auto-regressive language models (Transformer-XL, XLNet) and two auto-encoder models (Bert, Albert) on data from UniRef and BFD containing up to 393 billion amino acids (words) from 2.1 billion protein sequences (22- and 112 times the entire English Wikipedia). The LMs were trained on the Summit supercomputer at Oak Ridge National Laboratory (ORNL), using 936 nodes (total 5616 GPUs) and one TPU Pod (V3-512 or V3-1024). We validated the advantage of up-scaling LMs to larger models supported by bigger data by predicting secondary structure (3-states: Q3=76-84, 8 states: Q8=65-73), sub-cellular localization for 10 cellular compartments (Q10=74) and whether a protein is membrane-bound or water-soluble (Q2=89). Dimensionality reduction revealed that the LM-embeddings from unlabeled data (only protein sequences) captured important biophysical properties governing protein shape. This implied learning some of the grammar of the language of life realized in protein sequences. The successful up-scaling of protein LMs through HPC to larger data sets slightly reduced the gap between models trained on evolutionary information and LMs. Availability ProtTrans: \&lt;a href="https://github.com/agemagician/ProtTrans"\&gt;https://github.com/agemagician/ProtTrans\&lt;/a\&gt;Competing Interest StatementThe authors have declared no competing interest.},
-	URL = {https://www.biorxiv.org/content/early/2020/07/21/2020.07.12.199554},
-	eprint = {https://www.biorxiv.org/content/early/2020/07/21/2020.07.12.199554.full.pdf},
-	journal = {bioRxiv}
-}
-```
-
-> Created by [Ahmed Elnaggar/@Elnaggar_AI](https://twitter.com/Elnaggar_AI) | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)
--- a/model_cards/SZTAKI-HLT/hubert-base-cc/README.md
+++ b/model_cards/SZTAKI-HLT/hubert-base-cc/README.md
@ -1,46 +0,0 @@
---
-language: hu
-license: apache-2.0
-datasets:
- common_crawl
- wikipedia
---
-
-# huBERT base model (cased)
-
-## Model description
-
-Cased BERT model for Hungarian, trained on the (filtered, deduplicated) Hungarian subset of the Common Crawl and a snapshot of the Hungarian Wikipedia.
-
-## Intended uses & limitations
-
-The model can be used as any other (cased) BERT model. It has been tested on the chunking and
-named entity recognition tasks and set a new state-of-the-art on the former.
-
-## Training
-
-Details of the training data and procedure can be found in the PhD thesis linked below. (With the caveat that it only contains preliminary results
-based on the Wikipedia subcorpus. Evaluation of the full model will appear in a future paper.)
-
-## Eval results
-
-When fine-tuned (via `BertForTokenClassification`) on chunking and NER, the model outperforms multilingual BERT, achieves state-of-the-art results on the
-former task and comes within 0.5% F1 of the SotA on the latter. The exact scores are:
-
-| NER | Minimal NP | Maximal NP |
-|-----|------------|------------|
-| 97.62% | **97.14%** | **96.97%** |
-
-### BibTeX entry and citation info
-
-The training corpus, parameters and the evaluation methods are discussed in the
-[following PhD thesis](https://hlt.bme.hu/en/publ/nemeskey_2020):
-
-```bibtex
-@PhDThesis{ Nemeskey:2020,
-  author = {Nemeskey, Dávid Márk},
-  title  = {Natural Language Processing Methods for Language Modeling},
-  year   = {2020},
-  school = {E\"otv\"os Lor\'and University}
-}
-```
--- a/model_cards/SparkBeyond/roberta-large-sts-b/README.md
+++ b/model_cards/SparkBeyond/roberta-large-sts-b/README.md
@ -1,50 +0,0 @@
-
-
-# Roberta Large STS-B
-
-This model is a RoBERTa model fine-tuned on STS-B.
-It was trained with these parameters:
-!python /content/transformers/examples/text-classification/run_glue.py \
-    --model_type roberta \
-    --model_name_or_path roberta-large \
-    --task_name STS-B \
-    --do_train \
-    --do_eval \
-    --do_lower_case \
-    --data_dir /content/glue_data/STS-B/ \
-    --max_seq_length 128 \
-    --per_gpu_eval_batch_size=8   \
-    --per_gpu_train_batch_size=8   \
-    --learning_rate 2e-5 \
-    --num_train_epochs 3.0 \
-    --output_dir /content/roberta-sts-b
-
-
-## How to run
-
-```python
-
-
-
-import toolz
-import torch
-batch_size = 6
-
-def roberta_similarity_batches(to_predict):
-  batches = toolz.partition(batch_size, to_predict)
-  similarity_scores = []  
-  for batch in batches: 
-    sentences = [(sentence_similarity["sent1"], sentence_similarity["sent2"])  for sentence_similarity in batch]   
-    batch_scores = similarity_roberta(model, tokenizer,sentences)
-    similarity_scores = similarity_scores + batch_scores[0].cpu().squeeze(axis=1).tolist()
-  return similarity_scores
-
-def similarity_roberta(model, tokenizer, sent_pairs):
-  batch_token = tokenizer(sent_pairs, padding='max_length', truncation=True, max_length=500)
-  res = model(torch.tensor(batch_token['input_ids']).cuda(), attention_mask=torch.tensor(batch_token["attention_mask"]).cuda())  
-  return res
-
-similarity_roberta(model, tokenizer, [('NEW YORK--(BUSINESS WIRE)--Rosen Law Firm, a global investor rights law firm, announces it is investigating potential securities claims on behalf of shareholders of Vale S.A. ( VALE ) resulting from allegations that Vale may have issued materially misleading business information to the investing public',
-                                       'EQUITY ALERT: Rosen Law Firm Announces Investigation of Securities Claims Against Vale S.A. – VALE')])
-                                       
-```                                 
--- a/model_cards/T-Systems-onsite/bert-german-dbmdz-uncased-sentence-stsb/README.md
+++ b/model_cards/T-Systems-onsite/bert-german-dbmdz-uncased-sentence-stsb/README.md
@ -1,9 +0,0 @@
---
-language: de
-license: mit
---
-
-# bert-german-dbmdz-uncased-sentence-stsb
-**This model is outdated!**
-
-The new [T-Systems-onsite/cross-en-de-roberta-sentence-transformer](https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer) model is better for German language. It is also the current best model for English language and works cross-lingually. Please consider using that model.
--- a/model_cards/T-Systems-onsite/cross-en-de-roberta-sentence-transformer/README.md
+++ b/model_cards/T-Systems-onsite/cross-en-de-roberta-sentence-transformer/README.md
@ -1,85 +0,0 @@
---
-language: 
- de
- en
-license: mit
-tags:
- sentence_embedding
- search
- pytorch 
- xlm-roberta 
- roberta
- xlm-r-distilroberta-base-paraphrase-v1
- paraphrase
-datasets:
- STSbenchmark
-metrics:
- Spearman’s rank correlation
- cosine similarity
---
-
-# Cross English & German RoBERTa for Sentence Embeddings
-This model is intended to [compute sentence (text) embeddings](https://www.sbert.net/docs/usage/computing_sentence_embeddings.html) for English and German text. These embeddings can then be compared with [cosine-similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to find sentences with a similar semantic meaning. For example this can be useful for [semantic textual similarity](https://www.sbert.net/docs/usage/semantic_textual_similarity.html), [semantic search](https://www.sbert.net/docs/usage/semantic_search.html), or [paraphrase mining](https://www.sbert.net/docs/usage/paraphrase_mining.html). To do this you have to use the [Sentence Transformers Python framework](https://github.com/UKPLab/sentence-transformers).
-
-The speciality of this model is that it also works cross-lingually. Regardless of the language, the sentences are translated into very similar vectors according to their semantics. This means that you can, for example, enter a search in German and find results according to the semantics in German and also in English. Using an XLM model and _multilingual finetuning with language-crossing_, we reach performance that even exceeds the best current dedicated English large model (see Evaluation section below).
-
-> Sentence-BERT (SBERT) is a  modification  of  the  pretrained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT.
-
-Source: [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)
-
-This model was fine-tuned by [Philip May](https://eniak.de/) and open-sourced by [T-Systems-onsite](https://www.t-systems-onsite.de/). Special thanks to [Nils Reimers](https://www.nils-reimers.de/) for his awesome open-source work, the Sentence Transformers, the models, and his help on GitHub.
-
-## How to use
-**The usage description above - provided by Hugging Face - is wrong for sentence embeddings! Please use this:**
-
-To use this model install the `sentence-transformers` package (see here: <https://github.com/UKPLab/sentence-transformers>).
-
-```python
-from sentence_transformers import SentenceTransformer
-model = SentenceTransformer('T-Systems-onsite/cross-en-de-roberta-sentence-transformer')
-```
-
-For details of usage and examples see here:
- [Computing Sentence Embeddings](https://www.sbert.net/docs/usage/computing_sentence_embeddings.html)
- [Semantic Textual Similarity](https://www.sbert.net/docs/usage/semantic_textual_similarity.html)
- [Paraphrase Mining](https://www.sbert.net/docs/usage/paraphrase_mining.html)
- [Semantic Search](https://www.sbert.net/docs/usage/semantic_search.html)
- [Cross-Encoders](https://www.sbert.net/docs/usage/cross-encoder.html)
- [Examples on GitHub](https://github.com/UKPLab/sentence-transformers/tree/master/examples)
-
-## Training
-The base model is [xlm-roberta-base](https://huggingface.co/xlm-roberta-base). This model has been further trained by [Nils Reimers](https://www.nils-reimers.de/) on a large-scale paraphrase dataset for 50+ languages. [Nils Reimers](https://www.nils-reimers.de/) wrote about this [on GitHub](https://github.com/UKPLab/sentence-transformers/issues/509#issuecomment-712243280):
-
->A paper is upcoming for the paraphrase models.
->
->These models were trained on various datasets with Millions of examples for paraphrases, mainly derived from Wikipedia edit logs, paraphrases mined from Wikipedia and SimpleWiki, paraphrases from news reports, AllNLI-entailment pairs with in-batch-negative loss etc.
->
->In internal tests, they perform much better than the NLI+STSb models as they have see more and broader type of training data. NLI+STSb has the issue that they are rather narrow in their domain and do not contain any domain specific words / sentences (like from chemistry, computer science, math etc.). The paraphrase models has seen plenty of sentences from various domains.
->
->More details with the setup, all the datasets, and a wider evaluation will follow soon.
-
-The resulting model called `xlm-r-distilroberta-base-paraphrase-v1` has been released here: <https://github.com/UKPLab/sentence-transformers/releases/tag/v0.3.8>
-
-Building on this cross-lingual model, we fine-tuned it for English and German on the [STSbenchmark](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) dataset. For German we used our [German STSbenchmark dataset](https://github.com/t-systems-on-site-services-gmbh/german-STSbenchmark), which has been translated with [deepl.com](https://www.deepl.com/translator). In addition to the German and English training samples, we generated samples of English and German crossed. We call this _multilingual finetuning with language-crossing_. It doubled the training data size, and tests show that it further improves performance.
-
-We did an automatic hyperparameter search for 33 trials with [Optuna](https://github.com/optuna/optuna). Using 10-fold cross-validation on the deepl.com test and dev datasets, we found the following best hyperparameters:
- batch_size = 8
- num_epochs = 2
- lr = 1.026343323298136e-05
- eps = 4.462251033010287e-06
- weight_decay = 0.04794438776350409
- warmup_steps_proportion = 0.1609010732760181
-
-The final model was trained with these hyperparameters on the combination of the train and dev datasets from English, German and the crossings of them. The testset was left for testing.
-
-## Evaluation
-The evaluation has been done on English, German and both languages crossed with the STSbenchmark test data. The evaluation-code is available on [Colab](https://colab.research.google.com/drive/1gtGnKq_dYU_sDYqMohTYVMVpxMJjyH0M?usp=sharing). As the metric for evaluation we use the Spearman’s rank correlation between the  cosine-similarity of the sentence embeddings and STSbenchmark labels.
-
-| Model Name                                                    | Spearman<br/>German | Spearman<br/>English | Spearman<br/>EN-DE & DE-EN<br/>(cross) |
-|---------------------------------------------------------------|-------------------|--------------------|------------------|
-| xlm-r-distilroberta-base-paraphrase-v1                        | 0.8079            | 0.8350             | 0.7983           |
-| [xlm-r-100langs-bert-base-nli-stsb-mean-tokens](https://huggingface.co/sentence-transformers/xlm-r-100langs-bert-base-nli-stsb-mean-tokens)                 | 0.7877            | 0.8465             | 0.7908           |
-| xlm-r-bert-base-nli-stsb-mean-tokens                          | 0.7877            | 0.8465             | 0.7908           |
-| [roberta-large-nli-stsb-mean-tokens](https://huggingface.co/sentence-transformers/roberta-large-nli-stsb-mean-tokens)                            | 0.6371            | 0.8639             | 0.4109           |
-| [T-Systems-onsite/<br/>german-roberta-sentence-transformer-v2](https://huggingface.co/T-Systems-onsite/german-roberta-sentence-transformer-v2)       | 0.8529            | 0.8634             | 0.8415           |
-| **T-Systems-onsite/<br/>cross-en-de-roberta-sentence-transformer** | **0.8550**        |  **0.8660**        | **0.8525**       |
--- a/model_cards/T-Systems-onsite/german-roberta-sentence-transformer-v2/README.md
+++ b/model_cards/T-Systems-onsite/german-roberta-sentence-transformer-v2/README.md
@ -1,82 +0,0 @@
---
-language: de
-license: mit
-tags:
- sentence_embedding
- search
- pytorch 
- xlm-roberta 
- roberta
- xlm-r-distilroberta-base-paraphrase-v1
- paraphrase
-datasets:
- STSbenchmark
-metrics:
- Spearman’s rank correlation
- cosine similarity
---
-
-# German RoBERTa for Sentence Embeddings V2
-**The new [T-Systems-onsite/cross-en-de-roberta-sentence-transformer](https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer) model is slightly better for German language. It is also the current best model for English language and works cross-lingually. Please consider using that model.**
-
-This model is intended to [compute sentence (text) embeddings](https://www.sbert.net/docs/usage/computing_sentence_embeddings.html) for German text. These embeddings can then be compared with [cosine-similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to find sentences with a similar semantic meaning. For example this can be useful for [semantic textual similarity](https://www.sbert.net/docs/usage/semantic_textual_similarity.html), [semantic search](https://www.sbert.net/docs/usage/semantic_search.html), or [paraphrase mining](https://www.sbert.net/docs/usage/paraphrase_mining.html). To do this you have to use the [Sentence Transformers Python framework](https://github.com/UKPLab/sentence-transformers).
-
-> Sentence-BERT (SBERT) is a  modification  of  the  pretrained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT.
-
-Source: [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)
-
-This model was fine-tuned by [Philip May](https://eniak.de/) and open-sourced by [T-Systems-onsite](https://www.t-systems-onsite.de/). Special thanks to [Nils Reimers](https://www.nils-reimers.de/) for his awesome open-source work, the Sentence Transformers, the models, and his help on GitHub.
-
-## How to use
-**The usage description above - provided by Hugging Face - is wrong for sentence embeddings! Please use this:**
-
-To use this model install the `sentence-transformers` package (see here: <https://github.com/UKPLab/sentence-transformers>).
-
-```python
-from sentence_transformers import SentenceTransformer
-model = SentenceTransformer('T-Systems-onsite/german-roberta-sentence-transformer-v2')
-```
-
-For details of usage and examples see here:
- [Computing Sentence Embeddings](https://www.sbert.net/docs/usage/computing_sentence_embeddings.html)
- [Semantic Textual Similarity](https://www.sbert.net/docs/usage/semantic_textual_similarity.html)
- [Paraphrase Mining](https://www.sbert.net/docs/usage/paraphrase_mining.html)
- [Semantic Search](https://www.sbert.net/docs/usage/semantic_search.html)
- [Cross-Encoders](https://www.sbert.net/docs/usage/cross-encoder.html)
- [Examples on GitHub](https://github.com/UKPLab/sentence-transformers/tree/master/examples)
-
-## Training
-The base model is [xlm-roberta-base](https://huggingface.co/xlm-roberta-base). This model has been further trained by [Nils Reimers](https://www.nils-reimers.de/) on a large-scale paraphrase dataset for 50+ languages. [Nils Reimers](https://www.nils-reimers.de/) wrote about this [on GitHub](https://github.com/UKPLab/sentence-transformers/issues/509#issuecomment-712243280):
-
->A paper is upcoming for the paraphrase models.
->
->These models were trained on various datasets with Millions of examples for paraphrases, mainly derived from Wikipedia edit logs, paraphrases mined from Wikipedia and SimpleWiki, paraphrases from news reports, AllNLI-entailment pairs with in-batch-negative loss etc.
->
->In internal tests, they perform much better than the NLI+STSb models as they have see more and broader type of training data. NLI+STSb has the issue that they are rather narrow in their domain and do not contain any domain specific words / sentences (like from chemistry, computer science, math etc.). The paraphrase models has seen plenty of sentences from various domains.
->
->More details with the setup, all the datasets, and a wider evaluation will follow soon.
-
-The resulting model called `xlm-r-distilroberta-base-paraphrase-v1` has been released here: <https://github.com/UKPLab/sentence-transformers/releases/tag/v0.3.8>
-
-Building on this cross language model we fine-tuned it for German language on the [deepl.com](https://www.deepl.com/translator) dataset of our [German STSbenchmark dataset](https://github.com/t-systems-on-site-services-gmbh/german-STSbenchmark).
-
-We did an automatic hyperparameter search for 102 trials with [Optuna](https://github.com/optuna/optuna). Using 10-fold cross-validation on the deepl.com test and dev datasets, we found the following best hyperparameters:
- batch_size = 15
- num_epochs = 4
- lr = 2.2995320905210864e-05
- eps = 1.8979875906303792e-06
- weight_decay = 0.003314045812507563
- warmup_steps_proportion = 0.46141685205829014
-
-The final model was trained with these hyperparameters on the combination of `sts_de_train.csv` and `sts_de_dev.csv`. The `sts_de_test.csv` was left for testing.
-
-## Evaluation
-The evaluation has been done on the test set of our [German STSbenchmark dataset](https://github.com/t-systems-on-site-services-gmbh/german-STSbenchmark). The code is available on [Colab](https://colab.research.google.com/drive/1aCWOqDQx953kEnQ5k4Qn7uiixokocOHv?usp=sharing). As the metric for evaluation we use the Spearman’s rank correlation between the  cosine-similarity of the sentence embeddings and STSbenchmark labels.
-
-| Model Name                           | Spearman rank correlation<br/>(German)           |
-|--------------------------------------|-------------------------------------|
-| xlm-r-distilroberta-base-paraphrase-v1                        | 0.8079     |
-| xlm-r-100langs-bert-base-nli-stsb-mean-tokens                 | 0.8194     |
-| xlm-r-bert-base-nli-stsb-mean-tokens                          | 0.8194     |
-| **T-Systems-onsite/<br/>german-roberta-sentence-transformer-v2**   | **0.8529** |
-| **[T-Systems-onsite/<br/>cross-en-de-roberta-sentence-transformer](https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer)** | **0.8550** |
--- a/model_cards/Tereveni-AI/gpt2-124M-uk-fiction/README.md
+++ b/model_cards/Tereveni-AI/gpt2-124M-uk-fiction/README.md
@ -1,39 +0,0 @@
---
-language: uk
---
-
-Note: **default code snippet above won't work** because we are using `AlbertTokenizer` with `GPT2LMHeadModel`, see [issue](https://github.com/huggingface/transformers/issues/4285).
-
-## GPT2 124M Trained on Ukrainian Fiction
-
-### Training details
-
-The model was trained on a corpus of 4040 fiction books, 2.77 GiB in total.
-Evaluation on [brown-uk](https://github.com/brown-uk/corpus) gives a perplexity of 50.16.
-
-### Example usage:
-```python
-from transformers import AlbertTokenizer, GPT2LMHeadModel
-
-tokenizer = AlbertTokenizer.from_pretrained("Tereveni-AI/gpt2-124M-uk-fiction")
-model = GPT2LMHeadModel.from_pretrained("Tereveni-AI/gpt2-124M-uk-fiction")
-
-input_ids = tokenizer.encode("Но зла Юнона, суча дочка,", add_special_tokens=False, return_tensors='pt')
-
-outputs = model.generate(
-    input_ids,
-    do_sample=True,
-    num_return_sequences=3,
-    max_length=50
-)
-
-for i, out in enumerate(outputs):
-    print("{}: {}".format(i, tokenizer.decode(out)))
-```
-
-Prints something like this:
-```text
-0: Но зла Юнона, суча дочка, яка затьмарила всі її таємниці: І хто з'їсть її душу, той помре». І, не дочекавшись гніву богів, посунула в пітьму, щоб не бачити перед собою. Але, за
-1: Но зла Юнона, суча дочка, і довела мене до божевілля. Але він не знав нічого. Після того як я його побачив, мені стало зле. Я втратив рівновагу. Але в мене не було часу на роздуми. Я вже втратив надію
-2: Но зла Юнона, суча дочка, не нарікала нам! — раптом вигукнула Юнона. — Це ти, старий йолопе! — мовила вона, не перестаючи сміятись. — Хіба ти не знаєш, що мені подобається ходити з тобою?
-```
--- a/model_cards/TurkuNLP/bert-base-finnish-cased-v1/README.md
+++ b/model_cards/TurkuNLP/bert-base-finnish-cased-v1/README.md
@ -1,84 +0,0 @@
---
-language: fi
---
-
-## Quickstart
-
-**Release 1.0** (November 25, 2019)
-
-Download the models here:
-
-* Cased Finnish BERT Base: [bert-base-finnish-cased-v1.zip](http://dl.turkunlp.org/finbert/bert-base-finnish-cased-v1.zip)
-* Uncased Finnish BERT Base: [bert-base-finnish-uncased-v1.zip](http://dl.turkunlp.org/finbert/bert-base-finnish-uncased-v1.zip)
-
-We generally recommend the use of the cased model.
-
-Paper presenting Finnish BERT: [arXiv:1912.07076](https://arxiv.org/abs/1912.07076)
-
-## What's this?
-
-A version of Google's [BERT](https://github.com/google-research/bert) deep transfer learning model for Finnish. The model can be fine-tuned to achieve state-of-the-art results for various Finnish natural language processing tasks.
-
-FinBERT features a custom 50,000 wordpiece vocabulary that has much better coverage of Finnish words than e.g. the previously released [multilingual BERT](https://github.com/google-research/bert/blob/master/multilingual.md) models from Google:
-
-| Vocabulary | Example |
-|------------|---------|
-| FinBERT    | Suomessa vaihtuu kesän aikana sekä pääministeri että valtiovarain ##ministeri . |
-| Multilingual BERT | Suomessa vai ##htuu kes ##än aikana sekä p ##ää ##minister ##i että valt ##io ##vara ##in ##minister ##i . |
-
-FinBERT has been pre-trained for 1 million steps on over 3 billion tokens (24B characters) of Finnish text drawn from news, online discussion, and internet crawls. By contrast, Multilingual BERT was trained on Wikipedia texts, where the Finnish Wikipedia text is approximately 3% of the amount used to train FinBERT.
-
-These features allow FinBERT to outperform not only Multilingual BERT but also all previously proposed models when fine-tuned for Finnish natural language processing tasks.
-
-## Results
-
-### Document classification
-
-![learning curves for Yle and Ylilauta document classification](https://raw.githubusercontent.com/TurkuNLP/FinBERT/master/img/yle-ylilauta-curves.png)
-
-FinBERT outperforms multilingual BERT (M-BERT) on document classification over a range of training set sizes on the Yle news (left) and Ylilauta online discussion (right) corpora. (Baseline classification performance with [FastText](https://fasttext.cc/) included for reference.)
-
-[[code](https://github.com/spyysalo/finbert-text-classification)][[Yle data](https://github.com/spyysalo/yle-corpus)] [[Ylilauta data](https://github.com/spyysalo/ylilauta-corpus)]
-
-### Named Entity Recognition
-
-Evaluation on FiNER corpus ([Ruokolainen et al 2019](https://arxiv.org/abs/1908.04212))
-
-| Model          | Accuracy |
-|--------------------|----------|
-| **FinBERT**  | **92.40%** |
-| Multilingual BERT | 90.29% |
-| [FiNER-tagger](https://github.com/Traubert/FiNer-rules) (rule-based) | 86.82%      |
-
-(FiNER tagger results from [Ruokolainen et al. 2019](https://arxiv.org/pdf/1908.04212.pdf))
-
-[[code](https://github.com/jouniluoma/keras-bert-ner)][[data](https://github.com/mpsilfve/finer-data)]
-
-### Part of speech tagging
-
-Evaluation on three Finnish corpora annotated with [Universal Dependencies](https://universaldependencies.org/) part-of-speech tags: the Turku Dependency Treebank (TDT), FinnTreeBank (FTB), and Parallel UD treebank (PUD)
-
-| Model             |     TDT     |     FTB     |     PUD     |
-|-------------------|-------------|-------------|-------------|
-| **FinBERT**       | **98.23%**  | **98.39%**  | **98.08%**  |
-| Multilingual BERT |   96.97%    |   95.87%    |   97.58%    |
-
-[[code](https://github.com/spyysalo/bert-pos)][[data](http://hdl.handle.net/11234/1-2837)]
-
-## Use with PyTorch
-
-If you want to use the model with the huggingface/transformers library, follow the steps in [huggingface_transformers.md](https://github.com/TurkuNLP/FinBERT/blob/master/huggingface_transformers.md)
-
-## Previous releases
-
-### Release 0.2
-
-**October 24, 2019** Beta version of the BERT base uncased model trained from scratch on a corpus of Finnish news, online discussions, and crawled data. 
-
-Download the model here: [bert-base-finnish-uncased.zip](http://dl.turkunlp.org/finbert/bert-base-finnish-uncased.zip)
-
-### Release 0.1
-
-**September 30, 2019** We release a beta version of the BERT base cased model trained from scratch on a corpus of Finnish news, online discussions, and crawled data. 
-
-Download the model here: [bert-base-finnish-cased.zip](http://dl.turkunlp.org/finbert/bert-base-finnish-cased.zip)
--- a/model_cards/TurkuNLP/bert-base-finnish-uncased-v1/README.md
+++ b/model_cards/TurkuNLP/bert-base-finnish-uncased-v1/README.md
@ -1,84 +0,0 @@
---
-language: fi
---
-
-## Quickstart
-
-**Release 1.0** (November 25, 2019)
-
-Download the models here:
-
-* Cased Finnish BERT Base: [bert-base-finnish-cased-v1.zip](http://dl.turkunlp.org/finbert/bert-base-finnish-cased-v1.zip)
-* Uncased Finnish BERT Base: [bert-base-finnish-uncased-v1.zip](http://dl.turkunlp.org/finbert/bert-base-finnish-uncased-v1.zip)
-
-We generally recommend the use of the cased model.
-
-Paper presenting Finnish BERT: [arXiv:1912.07076](https://arxiv.org/abs/1912.07076)
-
-## What's this?
-
-A version of Google's [BERT](https://github.com/google-research/bert) deep transfer learning model for Finnish. The model can be fine-tuned to achieve state-of-the-art results for various Finnish natural language processing tasks.
-
-FinBERT features a custom 50,000 wordpiece vocabulary that has much better coverage of Finnish words than e.g. the previously released [multilingual BERT](https://github.com/google-research/bert/blob/master/multilingual.md) models from Google:
-
-| Vocabulary | Example |
-|------------|---------|
-| FinBERT    | Suomessa vaihtuu kesän aikana sekä pääministeri että valtiovarain ##ministeri . |
-| Multilingual BERT | Suomessa vai ##htuu kes ##än aikana sekä p ##ää ##minister ##i että valt ##io ##vara ##in ##minister ##i . |
-
-FinBERT has been pre-trained for 1 million steps on over 3 billion tokens (24B characters) of Finnish text drawn from news, online discussion, and internet crawls. By contrast, Multilingual BERT was trained on Wikipedia texts, where the Finnish Wikipedia text is approximately 3% of the amount used to train FinBERT.
-
-These features allow FinBERT to outperform not only Multilingual BERT but also all previously proposed models when fine-tuned for Finnish natural language processing tasks.
-
-## Results
-
-### Document classification
-
-![learning curves for Yle and Ylilauta document classification](https://raw.githubusercontent.com/TurkuNLP/FinBERT/master/img/yle-ylilauta-curves.png)
-
-FinBERT outperforms multilingual BERT (M-BERT) on document classification over a range of training set sizes on the Yle news (left) and Ylilauta online discussion (right) corpora. (Baseline classification performance with [FastText](https://fasttext.cc/) included for reference.)
-
-[[code](https://github.com/spyysalo/finbert-text-classification)][[Yle data](https://github.com/spyysalo/yle-corpus)] [[Ylilauta data](https://github.com/spyysalo/ylilauta-corpus)]
-
-### Named Entity Recognition
-
-Evaluation on FiNER corpus ([Ruokolainen et al 2019](https://arxiv.org/abs/1908.04212))
-
-| Model          | Accuracy |
-|--------------------|----------|
-| **FinBERT**  | **92.40%** |
-| Multilingual BERT | 90.29% |
-| [FiNER-tagger](https://github.com/Traubert/FiNer-rules) (rule-based) | 86.82%      |
-
-(FiNER tagger results from [Ruokolainen et al. 2019](https://arxiv.org/pdf/1908.04212.pdf))
-
-[[code](https://github.com/jouniluoma/keras-bert-ner)][[data](https://github.com/mpsilfve/finer-data)]
-
-### Part of speech tagging
-
-Evaluation on three Finnish corpora annotated with [Universal Dependencies](https://universaldependencies.org/) part-of-speech tags: the Turku Dependency Treebank (TDT), FinnTreeBank (FTB), and Parallel UD treebank (PUD)
-
-| Model             |     TDT     |     FTB     |     PUD     |
-|-------------------|-------------|-------------|-------------|
-| **FinBERT**       | **98.23%**  | **98.39%**  | **98.08%**  |
-| Multilingual BERT |   96.97%    |   95.87%    |   97.58%    |
-
-[[code](https://github.com/spyysalo/bert-pos)][[data](http://hdl.handle.net/11234/1-2837)]
-
-## Use with PyTorch
-
-If you want to use the model with the huggingface/transformers library, follow the steps in [huggingface_transformers.md](https://github.com/TurkuNLP/FinBERT/blob/master/huggingface_transformers.md)
-
-## Previous releases
-
-### Release 0.2
-
-**October 24, 2019** Beta version of the BERT base uncased model trained from scratch on a corpus of Finnish news, online discussions, and crawled data. 
-
-Download the model here: [bert-base-finnish-uncased.zip](http://dl.turkunlp.org/finbert/bert-base-finnish-uncased.zip)
-
-### Release 0.1
-
-**September 30, 2019** We release a beta version of the BERT base cased model trained from scratch on a corpus of Finnish news, online discussions, and crawled data. 
-
-Download the model here: [bert-base-finnish-cased.zip](http://dl.turkunlp.org/finbert/bert-base-finnish-cased.zip)
--- a/model_cards/TypicaAI/magbert-ner/README.md
+++ b/model_cards/TypicaAI/magbert-ner/README.md
@ -1,59 +0,0 @@
---
-language: fr
-widget:
- text: "Je m'appelle Hicham et je vis a Fès"
---
-
-# MagBERT-NER: a state-of-the-art NER model for Moroccan French language (Maghreb)
-
-## Introduction
-
-[MagBERT-NER] is a state-of-the-art NER model for the Moroccan French language (Maghreb). The MagBERT-NER model was fine-tuned for the NER task from CamemBERT, a French language model based on the RoBERTa architecture.
-
-For further information or requests, please visit our website at [typica.ai](https://typica.ai/) or send us an email at contactus@typica.ai
-
-## How to use MagBERT-NER with HuggingFace
-
-##### Load MagBERT-NER and its sub-word tokenizer :
-
-```python
-from transformers import AutoTokenizer, AutoModelForTokenClassification
-
-tokenizer = AutoTokenizer.from_pretrained("TypicaAI/magbert-ner")
-model = AutoModelForTokenClassification.from_pretrained("TypicaAI/magbert-ner")
-
-
-##### Process text sample (from wikipedia about the current Prime Minister of Morocco) Using NER pipeline  
-
-from transformers import pipeline
-
-nlp = pipeline('ner', model=model, tokenizer=tokenizer, grouped_entities=True)
-nlp("Saad Dine El Otmani, né le 16 janvier 1956 à Inezgane, est un homme d'État marocain, chef du gouvernement du Maroc depuis le 5 avril 2017")
-
-
-#[{'entity_group': 'I-PERSON',
-#  'score': 0.8941445276141167,
-#  'word': 'Saad Dine El Otmani'},
-# {'entity_group': 'B-DATE',
-#  'score': 0.5967703461647034,
-#  'word': '16 janvier 1956'},
-# {'entity_group': 'B-GPE', 'score': 0.7160899192094803, 'word': 'Inezgane'},
-# {'entity_group': 'B-NORP', 'score': 0.7971733212471008, 'word': 'marocain'},
-# {'entity_group': 'B-GPE', 'score': 0.8921478390693665, 'word': 'Maroc'},
-# {'entity_group': 'B-DATE',
-#  'score': 0.5760444005330404,
-#  'word': '5 avril 2017'}]
-
-```
-
-
-## Authors 
-
-MagBert-NER Model was trained by Hicham Assoudi, Ph.D. 
-For any questions or comments, you can contact me at assoudi@typica.ai
-
-
-## Citation
-
-If you use our work, please cite:
-Hicham Assoudi, Ph.D., MagBERT-NER: a state-of-the-art NER model for Moroccan French language (Maghreb), (2020)
--- a/model_cards/Vamsi/T5_Paraphrase_Paws/README.md
+++ b/model_cards/Vamsi/T5_Paraphrase_Paws/README.md
@ -1,51 +0,0 @@
---
-language: "en"
-tags:
- paraphrase-generation
- text-generation
- Conditional Generation
-inference: false
---
-
-# Paraphrase-Generation
-
-## Model description
-
-T5 model for generating paraphrases of English sentences. Trained on the [Google PAWS](https://github.com/google-research-datasets/paws) dataset.
-
-## How to use
-
-PyTorch and TF models available
-
-```python
-from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
-
-tokenizer = AutoTokenizer.from_pretrained("Vamsi/T5_Paraphrase_Paws")  
-model = AutoModelForSeq2SeqLM.from_pretrained("Vamsi/T5_Paraphrase_Paws")
-
-sentence = "This is something which i cannot understand at all"
-
-text =  "paraphrase: " + sentence + " </s>"
-
-encoding = tokenizer.encode_plus(text,pad_to_max_length=True, return_tensors="pt")
-input_ids, attention_masks = encoding["input_ids"].to(model.device), encoding["attention_mask"].to(model.device)  # keep inputs on the same device as the model
-
-
-outputs = model.generate(
-    input_ids=input_ids, attention_mask=attention_masks,
-    max_length=256,
-    do_sample=True,
-    top_k=120,
-    top_p=0.95,
-    early_stopping=True,
-    num_return_sequences=5
-)
-
-for output in outputs:
-    line = tokenizer.decode(output, skip_special_tokens=True,clean_up_tokenization_spaces=True)
-    print(line)
-
-
-```
-
-For more reference on training your own T5 model or using this model, do check out [Paraphrase Generation](https://github.com/Vamsi995/Paraphrase-Generator).
--- a/model_cards/VictorSanh/roberta-base-finetuned-yelp-polarity/README.md
+++ b/model_cards/VictorSanh/roberta-base-finetuned-yelp-polarity/README.md
@ -1,25 +0,0 @@
---
-language: en
-datasets:
- yelp_polarity
---
-
-# RoBERTa-base-finetuned-yelp-polarity
-
-This is a [RoBERTa-base](https://huggingface.co/roberta-base) checkpoint fine-tuned on binary sentiment classification from [Yelp polarity](https://huggingface.co/nlp/viewer/?dataset=yelp_polarity).
-It gets **98.08%** accuracy on the test set.
-
-## Hyper-parameters
-
-We used the following hyper-parameters to train the model on one GPU:
-```python
-num_train_epochs            = 2.0
-learning_rate               = 1e-05
-weight_decay                = 0.0
-adam_epsilon                = 1e-08
-max_grad_norm               = 1.0
-per_device_train_batch_size = 32
-gradient_accumulation_steps = 1
-warmup_steps                = 3500
-seed                        = 42
-```
--- a/model_cards/ViktorAlm/electra-base-norwegian-uncased-discriminator/README.md
+++ b/model_cards/ViktorAlm/electra-base-norwegian-uncased-discriminator/README.md
@ -1,25 +0,0 @@
---
-language: no
-thumbnail: https://i.imgur.com/QqSEC5I.png
---
-
-# Norwegian Electra
-![Image of norwegian electra](https://i.imgur.com/QqSEC5I.png)
-
-Trained on Oscar + wikipedia + opensubtitles + some other data I had with the awesome power of TPUs(V3-8)
-
-Use with caution. I have no downstream tasks in Norwegian to test on so I have no idea of its performance yet.
-# Model
-## Electra: Pre-training Text Encoders as Discriminators Rather Than Generators
-Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. Manning
- https://openreview.net/pdf?id=r1xMH1BtvB
- https://github.com/google-research/electra
-# Acknowledgments
-### TensorFlow Research Cloud
-Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). Thanks for providing access to the TFRC ❤️
- https://www.tensorflow.org/tfrc
-#### OSCAR corpus
- https://oscar-corpus.com/
-#### OPUS
- http://opus.nlpl.eu/
- http://www.opensubtitles.org/
--- a/model_cards/a-ware/bart-squadv2/README.md
+++ b/model_cards/a-ware/bart-squadv2/README.md
@ -1,76 +0,0 @@
---
-datasets:
- squad_v2
---
-
-# BART-LARGE finetuned on SQuADv2
-
-This is a bart-large model fine-tuned on the SQuADv2 dataset for the question answering task.
-
-## Model details
-BART was proposed in the [paper](https://arxiv.org/abs/1910.13461) **BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension**.
-BART is a seq2seq model intended for both NLG and NLU tasks. 
-
-To use BART for question answering tasks, we feed the complete document into the encoder and decoder, and use the top
-hidden state of the decoder as a representation for each
-word. This representation is used to classify the token. As reported in the paper, bart-large achieves performance comparable to RoBERTa on SQuAD.
-Another notable property of BART is that it can handle sequences of up to 1024 tokens.
-
-| Param               | #Value |
-|---------------------|--------|
-| encoder layers      | 12     |
-| decoder layers      | 12     |
-| hidden size         | 4096   |
-| num attention heads | 16     |
-| on disk size        | 1.63GB |
-
-
-## Model training
-This model was trained with the following parameters using the simpletransformers wrapper:
-```
-train_args = {
-    'learning_rate': 1e-5,
-    'max_seq_length': 512,
-    'doc_stride': 512,
-    'overwrite_output_dir': True,
-    'reprocess_input_data': False,
-    'train_batch_size': 8,
-    'num_train_epochs': 2,
-    'gradient_accumulation_steps': 2,
-    'no_cache': True,
-    'use_cached_eval_features': False,
-    'save_model_every_epoch': False,
-    'output_dir': "bart-squadv2",
-    'eval_batch_size': 32,
-    'fp16_opt_level': 'O2',
-    }
-```
-
-[You can even train your own model using this colab notebook](https://colab.research.google.com/drive/1I5cK1M_0dLaf5xoewh6swcm5nAInfwHy?usp=sharing)
-
-## Results
-```{"correct": 6832, "similar": 4409, "incorrect": 632, "eval_loss": -14.950117511952177}```
-
-## Model in Action  🚀
-```python3
-from transformers import BartTokenizer, BartForQuestionAnswering
-import torch
-
-tokenizer = BartTokenizer.from_pretrained('a-ware/bart-squadv2')
-model = BartForQuestionAnswering.from_pretrained('a-ware/bart-squadv2')
-
-question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
-encoding = tokenizer(question, text, return_tensors='pt')
-input_ids = encoding['input_ids']
-attention_mask = encoding['attention_mask']
-
-start_scores, end_scores = model(input_ids, attention_mask=attention_mask, output_attentions=False)[:2]
-
-all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
-answer = ' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores)+1])
-answer = tokenizer.convert_tokens_to_ids(answer.split())
-answer = tokenizer.decode(answer)
-#answer => 'a nice puppet' 
-```
-
-> Created with ❤️ by A-ware UG [![Github icon](https://cdn0.iconfinder.com/data/icons/octicons/1024/mark-github-32.png)](https://github.com/aware-ai)
--- a/model_cards/a-ware/roberta-large-squad-classification/README.md
+++ b/model_cards/a-ware/roberta-large-squad-classification/README.md
@ -1,48 +0,0 @@
---
-datasets:
- squad_v2
---
-
-# Roberta-LARGE finetuned on SQuADv2
-
-This is a roberta-large model fine-tuned on the SQuADv2 dataset for answerability classification in question answering.
-
-## Model details
-This model is simply a sequence classification model that takes two inputs (context and question) in a list.
-The result is either [1] if the question is answerable or [0] if it is not.
-It was trained for 4 epochs on the SQuADv2 dataset and can be used to filter which contexts are worth passing to a QA model, to avoid bad answers.
-
-## Model training
-This model was trained with the following parameters using the simpletransformers wrapper:
-```
-train_args = {
-    'learning_rate': 1e-5,
-    'max_seq_length': 512,
-    'overwrite_output_dir': True,
-    'reprocess_input_data': False,
-    'train_batch_size': 4,
-    'num_train_epochs': 4,
-    'gradient_accumulation_steps': 2,
-    'no_cache': True,
-    'use_cached_eval_features': False,
-    'save_model_every_epoch': False,
-    'output_dir': "bart-squadv2",
-    'eval_batch_size': 8,
-    'fp16_opt_level': 'O2',
-    }
-```
-
-## Results
-```{"accuracy": 90.48%}```
-## Model in Action  🚀
-```python3
-from simpletransformers.classification import ClassificationModel
-
-model = ClassificationModel('roberta', 'a-ware/roberta-large-squad-classification', num_labels=2, args=train_args)
-
-predictions, raw_outputs = model.predict([["my dog is an year old. he loves to go into the rain", "how old is my dog ?"]])
-print(predictions)
-# ==> [1]
-```
-
-> Created with ❤️ by A-ware UG [![Github icon](https://cdn0.iconfinder.com/data/icons/octicons/1024/mark-github-32.png)](https://github.com/aware-ai)
--- a/model_cards/a-ware/xlmroberta-squadv2/README.md
+++ b/model_cards/a-ware/xlmroberta-squadv2/README.md
@ -1,59 +0,0 @@
---
-datasets:
- squad_v2
---
-
-# XLM-ROBERTA-LARGE finetuned on SQuADv2
-
-This is an xlm-roberta-large model fine-tuned on the SQuADv2 dataset for the question answering task.
-
-## Model details
-XLM-Roberta was proposed in the [paper](https://arxiv.org/pdf/1911.02116.pdf) **XLM-R: State-of-the-art cross-lingual understanding through self-supervision**.
-
-## Model training
-This model was trained with the following parameters using the simpletransformers wrapper:
-```
-train_args = {
-    'learning_rate': 1e-5,
-    'max_seq_length': 512,
-    'doc_stride': 512,
-    'overwrite_output_dir': True,
-    'reprocess_input_data': False,
-    'train_batch_size': 8,
-    'num_train_epochs': 2,
-    'gradient_accumulation_steps': 2,
-    'no_cache': True,
-    'use_cached_eval_features': False,
-    'save_model_every_epoch': False,
-    'output_dir': "bart-squadv2",
-    'eval_batch_size': 32,
-    'fp16_opt_level': 'O2',
-    }
-```
-
-## Results
-```{"correct": 6961, "similar": 4359, "incorrect": 553, "eval_loss": -12.177856394381962}```
-
-## Model in Action  🚀
-```python3
-from transformers import XLMRobertaTokenizer, XLMRobertaForQuestionAnswering
-import torch
-
-tokenizer = XLMRobertaTokenizer.from_pretrained('a-ware/xlmroberta-squadv2')
-model = XLMRobertaForQuestionAnswering.from_pretrained('a-ware/xlmroberta-squadv2')
-
-question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
-encoding = tokenizer(question, text, return_tensors='pt')
-input_ids = encoding['input_ids']
-attention_mask = encoding['attention_mask']
-
-start_scores, end_scores = model(input_ids, attention_mask=attention_mask, output_attentions=False)[:2]
-
-all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
-answer = ' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores)+1])
-answer = tokenizer.convert_tokens_to_ids(answer.split())
-answer = tokenizer.decode(answer)
-#answer => 'a nice puppet' 
-```
-
-> Created with ❤️ by A-ware UG [![Github icon](https://cdn0.iconfinder.com/data/icons/octicons/1024/mark-github-32.png)](https://github.com/aware-ai)
--- a/model_cards/abhilash1910/financial_roberta/README.md
+++ b/model_cards/abhilash1910/financial_roberta/README.md
@ -1,132 +0,0 @@
---
-tags:
- finance
---
-# Roberta Masked Language Model Trained On Financial Phrasebank Corpus 
-
-
-This is a Masked Language Model trained with [Roberta](https://huggingface.co/transformers/model_doc/roberta.html) on a Financial Phrasebank Corpus.
-The model is built using Huggingface transformers.
-The model can be found at: [Financial_Roberta](https://huggingface.co/abhilash1910/financial_roberta)
-
-
-## Specifications
-
-
-The corpus for training is taken from the Financial Phrasebank ([Malo et al.](https://www.researchgate.net/publication/251231107_Good_Debt_or_Bad_Debt_Detecting_Semantic_Orientations_in_Economic_Texts)).
-
-
-## Model Specification
-
-
-The model chosen for training is [Roberta](https://arxiv.org/abs/1907.11692) with the following specifications:
- 1. vocab_size=56000
- 2. max_position_embeddings=514
- 3. num_attention_heads=12
- 4. num_hidden_layers=6
- 5. type_vocab_size=1
-
-
-The model is configured with `RobertaConfig` from the transformers package.
-The model was trained for 10 epochs with a GPU batch size of 64.
-
-
-
-## Usage Specifications
-
-
-To use this model, first import `AutoTokenizer` and `AutoModelWithLMHead` from transformers.
-Then specify the pre-trained model, which in this case is 'abhilash1910/financial_roberta', for both the tokenizer and the model.
-
-
-```python
-from transformers import AutoTokenizer, AutoModelWithLMHead
-
-tokenizer = AutoTokenizer.from_pretrained("abhilash1910/financial_roberta")
-
-model = AutoModelWithLMHead.from_pretrained("abhilash1910/financial_roberta")
-```
-
-
-The model is downloaded at this point; fetching all the model files may take some time.
-To test the model, import `pipeline` from transformers and create a fill-mask pipeline for inference as follows:
-
-
-```python
-from transformers import pipeline
-model_mask = pipeline('fill-mask', model='abhilash1910/financial_roberta')
-model_mask("The  company had a <mask> of 20% in 2020.")
-```
-
-
-Some of the examples are also provided with generic financial statements:
-
-Example 1:
-
-
-```python
-model_mask("The  company had a <mask> of 20% in 2020.")
-```
-
-
-Output:
-
-
-```bash
-[{'sequence': '<s>The  company had a profit of 20% in 2020.</s>',
-  'score': 0.023112965747714043,
-  'token': 421,
-  'token_str': 'Ġprofit'},
- {'sequence': '<s>The  company had a loss of 20% in 2020.</s>',
-  'score': 0.021379893645644188,
-  'token': 616,
-  'token_str': 'Ġloss'},
- {'sequence': '<s>The  company had a year of 20% in 2020.</s>',
-  'score': 0.0185744296759367,
-  'token': 443,
-  'token_str': 'Ġyear'},
- {'sequence': '<s>The  company had a sales of 20% in 2020.</s>',
-  'score': 0.018143286928534508,
-  'token': 428,
-  'token_str': 'Ġsales'},
- {'sequence': '<s>The  company had a value of 20% in 2020.</s>',
-  'score': 0.015319528989493847,
-  'token': 776,
-  'token_str': 'Ġvalue'}]
-  ```
- 
- Example 2:
- 
-```python
-model_mask("The <mask>  is listed under NYSE")
-```
-
-Output:
-
-```bash
-[{'sequence': '<s>The company  is listed under NYSE</s>',
-  'score': 0.1566661298274994,
-  'token': 359,
-  'token_str': 'Ġcompany'},
- {'sequence': '<s>The total  is listed under NYSE</s>',
-  'score': 0.05542507395148277,
-  'token': 522,
-  'token_str': 'Ġtotal'},
- {'sequence': '<s>The value  is listed under NYSE</s>',
-  'score': 0.04729423299431801,
-  'token': 776,
-  'token_str': 'Ġvalue'},
- {'sequence': '<s>The order  is listed under NYSE</s>',
-  'score': 0.02533523552119732,
-  'token': 798,
-  'token_str': 'Ġorder'},
- {'sequence': '<s>The contract  is listed under NYSE</s>',
-  'score': 0.02087237872183323,
-  'token': 635,
-  'token_str': 'Ġcontract'}]
-  ```
-  
-
-## Resources
-
-For all resources, please see the [HuggingFace](https://huggingface.co/) site and its [repositories](https://github.com/huggingface).
--- a/model_cards/abhilash1910/french-roberta/README.md
+++ b/model_cards/abhilash1910/french-roberta/README.md
@ -1,131 +0,0 @@
-# Roberta Trained Model For Masked Language Model On French Corpus :robot:
-
-
-This is a Masked Language Model trained with [Roberta](https://huggingface.co/transformers/model_doc/roberta.html) on a small French News Corpus (Leipzig corpora).
-The model is built using Huggingface transformers.
-The model can be found at: [French-Roberta](https://huggingface.co/abhilash1910/french-roberta)
-
-
-## Specifications
-
-
-The corpus for training is taken from the Leipzig Corpora (French News); the model is trained on a small subset of the corpus (300K).
-
-
-## Model Specification
-
-
-The model chosen for training is [Roberta](https://arxiv.org/abs/1907.11692) with the following specifications:
- 1. vocab_size=32000
- 2. max_position_embeddings=514
- 3. num_attention_heads=12
- 4. num_hidden_layers=6
- 5. type_vocab_size=1
-
-
-The model is configured with `RobertaConfig` from the transformers package. Total training parameters: 68124416.
-The model was trained for 100 epochs with a GPU batch size of 64.
-More details for building custom models can be found at the [HuggingFace Blog](https://huggingface.co/blog/how-to-train)
-
-
-
-## Usage Specifications
-
-
-To use this model, first import `AutoTokenizer` and `AutoModelWithLMHead` from transformers.
-Then specify the pre-trained model, which in this case is 'abhilash1910/french-roberta', for both the tokenizer and the model.
-
-
-```python
-from transformers import AutoTokenizer, AutoModelWithLMHead
-
-tokenizer = AutoTokenizer.from_pretrained("abhilash1910/french-roberta")
-
-model = AutoModelWithLMHead.from_pretrained("abhilash1910/french-roberta")
-```
-
-
-The model is downloaded at this point; fetching all the model files may take some time.
-To test the model, import `pipeline` from transformers and create a fill-mask pipeline for inference as follows:
-
-
-```python
-from transformers import pipeline
-model_mask = pipeline('fill-mask', model='abhilash1910/french-roberta')
-model_mask("Le tweet <mask>.")
-```
-
-
-Some of the examples are also provided with generic French sentences:
-
-Example 1:
-
-
-```python
-model_mask("À ce jour, <mask> projet a entraîné")
-```
-
-
-Output:
-
-
-```bash
-[{'sequence': '<s>À ce jour, belles projet a entraîné</s>',
-  'score': 0.18685665726661682,
-  'token': 6504,
-  'token_str': 'Ġbelles'},
- {'sequence': '<s>À ce jour,- projet a entraîné</s>',
-  'score': 0.0005200508167035878,
-  'token': 17,
-  'token_str': '-'},
- {'sequence': '<s>À ce jour, de projet a entraîné</s>',
-  'score': 0.00045729897101409733,
-  'token': 268,
-  'token_str': 'Ġde'},
- {'sequence': '<s>À ce jour, du projet a entraîné</s>',
-  'score': 0.0004307595663703978,
-  'token': 326,
-  'token_str': 'Ġdu'},
- {'sequence': '<s>À ce jour," projet a entraîné</s>',
-  'score': 0.0004219160182401538,
-  'token': 6,
-  'token_str': '"'}]
-  ```
- 
- Example 2:
- 
-```python
-model_mask("C'est un <mask>")
-```
-
-Output:
-
-```bash
-[{'sequence': "<s>C'est un belles</s>",
-  'score': 0.16440927982330322,
-  'token': 6504,
-  'token_str': 'Ġbelles'},
- {'sequence': "<s>C'est un de</s>",
-  'score': 0.0005495127406902611,
-  'token': 268,
-  'token_str': 'Ġde'},
- {'sequence': "<s>C'est un du</s>",
-  'score': 0.00044988933950662613,
-  'token': 326,
-  'token_str': 'Ġdu'},
- {'sequence': "<s>C'est un-</s>",
-  'score': 0.00044542422983795404,
-  'token': 17,
-  'token_str': '-'},
- {'sequence': "<s>C'est un\t</s>",
-  'score': 0.00037563967634923756,
-  'token': 202,
-  'token_str': 'ĉ'}]
-  ```
-  
-
-## Resources
-
-For all resources, please see the [HuggingFace](https://huggingface.co/) site and its [repositories](https://github.com/huggingface).
-
-
--- a/model_cards/activebus/BERT-DK_laptop/README.md
+++ b/model_cards/activebus/BERT-DK_laptop/README.md
@ -1,43 +0,0 @@
-# ReviewBERT
-
-BERT (post-)trained from review corpus to understand sentiment, options and various e-commerce aspects.
-
-`BERT-DK_laptop` is trained from 100MB laptop corpus under `Electronics/Computers & Accessories/Laptops`. 
-
-
-## Model Description
-
-The original model is from `BERT-base-uncased` trained from Wikipedia+BookCorpus.  
-Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).  
-
-`BERT-DK_laptop` is trained from 100MB laptop corpus under `Electronics/Computers & Accessories/Laptops`. 
-
-## Instructions
-Loading the post-trained weights is as simple as, e.g.:
-
-```python
-import torch
-from transformers import AutoModel, AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-DK_laptop")
-model = AutoModel.from_pretrained("activebus/BERT-DK_laptop")
-
-```
-
-
-## Evaluation Results
-
-Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) 
-
-
-## Citation
-If you find this work useful, please cite it as follows.
-```
-@inproceedings{xu_bert2019,
-    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
-    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
-    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
-    month = "jun",
-    year = "2019",
-}
-```
--- a/model_cards/activebus/BERT-DK_rest/README.md
+++ b/model_cards/activebus/BERT-DK_rest/README.md
@ -1,41 +0,0 @@
-# ReviewBERT
-
-BERT (post-)trained from review corpus to understand sentiment, options and various e-commerce aspects.
-
-`BERT-DK_rest` is trained from 1G (19 types) restaurants from Yelp.  
-
-## Model Description
-
-The original model is from `BERT-base-uncased` trained from Wikipedia+BookCorpus.  
-Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).  
-
-
-## Instructions
-Loading the post-trained weights is as simple as, e.g.:
-
-```python
-import torch
-from transformers import AutoModel, AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-DK_rest")
-model = AutoModel.from_pretrained("activebus/BERT-DK_rest")
-
-```
-
-
-## Evaluation Results
-
-Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) 
-
-
-## Citation
-If you find this work useful, please cite it as follows.
-```
-@inproceedings{xu_bert2019,
-    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
-    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
-    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
-    month = "jun",
-    year = "2019",
-}
-```
--- a/model_cards/activebus/BERT-PT_laptop/README.md
+++ b/model_cards/activebus/BERT-PT_laptop/README.md
@ -1,41 +0,0 @@
-# ReviewBERT
-
-BERT (post-)trained from review corpus to understand sentiment, options and various e-commerce aspects.
-
-`BERT-DK_laptop` is trained from 100MB laptop corpus under `Electronics/Computers & Accessories/Laptops`. 
-`BERT-PT_*` additionally uses SQuAD 1.1.
-
-## Model Description
-
-The original model is from `BERT-base-uncased` trained from Wikipedia+BookCorpus.  
-Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).  
-
-
-## Instructions
-Loading the post-trained weights is as simple as, e.g.:
-
-```python
-import torch
-from transformers import AutoModel, AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-PT_laptop")
-model = AutoModel.from_pretrained("activebus/BERT-PT_laptop")
-
-```
-
-## Evaluation Results
-
-Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) 
-
-
-## Citation
-If you find this work useful, please cite it as follows.
-```
-@inproceedings{xu_bert2019,
-    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
-    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
-    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
-    month = "jun",
-    year = "2019",
-}
-```
--- a/model_cards/activebus/BERT-PT_rest/README.md
+++ b/model_cards/activebus/BERT-PT_rest/README.md
@ -1,42 +0,0 @@
-# ReviewBERT
-
-BERT (post-)trained from review corpus to understand sentiment, options and various e-commerce aspects.
-
-`BERT-DK_rest` is trained from 1G (19 types) restaurants from Yelp.
-`BERT-PT_*` additionally uses SQuAD 1.1.
-
-## Model Description
-
-The original model is from `BERT-base-uncased` trained from Wikipedia+BookCorpus.  
-Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).  
-
-
-## Instructions
-Loading the post-trained weights is as simple as, e.g.:
-
-```python
-import torch
-from transformers import AutoModel, AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-PT_rest")
-model = AutoModel.from_pretrained("activebus/BERT-PT_rest")
-
-```
-
-
-## Evaluation Results
-
-Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) 
-
-
-## Citation
-If you find this work useful, please cite as following.
-```
-@inproceedings{xu_bert2019,
-    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
-    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
-    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
-    month = "jun",
-    year = "2019",
-}
-```
--- a/model_cards/activebus/BERT-XD_Review/README.md
+++ b/model_cards/activebus/BERT-XD_Review/README.md
@ -1,44 +0,0 @@
-# ReviewBERT
-
-BERT (post-)trained on a review corpus to understand sentiment, opinions, and various e-commerce aspects.  
-Please visit https://github.com/howardhsu/BERT-for-RRC-ABSA for details.  
-
-`BERT-XD_Review` is a cross-domain (beyond just `laptop` and `restaurant`) language model, where each example is from a single product / restaurant with the same rating, post-trained (fine-tuned) on a combination of 5-core Amazon reviews and all Yelp data, expected to be 22 G in total. It is trained for 4 epochs on `bert-base-uncased`.
-The preprocessing code is [here](https://github.com/howardhsu/BERT-for-RRC-ABSA/transformers).
-
-## Model Description
-
-The original model is from `BERT-base-uncased`.  
-Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).  
-
-
-## Instructions
-Loading the post-trained weights is as simple as, e.g.:
-
-```python
-import torch
-from transformers import AutoModel, AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("activebus/BERT-XD_Review")
-model = AutoModel.from_pretrained("activebus/BERT-XD_Review")
-
-```
-
-
-## Evaluation Results
-
-Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) 
-`BERT_Review` is expected to have similar performance on domain-specific tasks (such as aspect extraction) as `BERT-DK`, but much better on general tasks such as aspect sentiment classification (different domains mostly share similar sentiment words).
-
-
-## Citation
-If you find this work useful, please cite it as follows.
-```
-@inproceedings{xu_bert2019,
-    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
-    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
-    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
-    month = "jun",
-    year = "2019",
-}
-```
--- a/model_cards/activebus/BERT_Review/README.md
+++ b/model_cards/activebus/BERT_Review/README.md
@ -1,44 +0,0 @@
-# ReviewBERT
-
-BERT (post-)trained on a review corpus to understand sentiment, opinions, and various e-commerce aspects.  
-
-`BERT_Review` is a cross-domain (beyond just `laptop` and `restaurant`) language model with one example from randomly mixed domains, post-trained (fine-tuned) on a combination of 5-core Amazon reviews and all Yelp data, expected to be 22 G in total. It is trained for 4 epochs on `bert-base-uncased`.
-The preprocessing code is [here](https://github.com/howardhsu/BERT-for-RRC-ABSA/transformers).
-
-
-## Model Description
-
-The original model is from `BERT-base-uncased` trained from Wikipedia+BookCorpus.  
-Models are post-trained from [Amazon Dataset](http://jmcauley.ucsd.edu/data/amazon/) and [Yelp Dataset](https://www.yelp.com/dataset/challenge/).  
-
-
-## Instructions
-Loading the post-trained weights is as simple as, e.g.:
-
-```python
-import torch
-from transformers import AutoModel, AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("activebus/BERT_Review")
-model = AutoModel.from_pretrained("activebus/BERT_Review")
-
-```
-
-
-## Evaluation Results
-
-Check our [NAACL paper](https://www.aclweb.org/anthology/N19-1242.pdf) 
-`BERT_Review` is expected to have similar performance on domain-specific tasks (such as aspect extraction) as `BERT-DK`, but much better on general tasks such as aspect sentiment classification (different domains mostly share similar sentiment words).
-
-
-## Citation
-If you find this work useful, please cite it as follows.
-```
-@inproceedings{xu_bert2019,
-    title = "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis",
-    author = "Xu, Hu and Liu, Bing and Shu, Lei and Yu, Philip S.",
-    booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics",
-    month = "jun",
-    year = "2019",
-}
-```
--- a/model_cards/adalbertojunior/PTT5-SMALL-SUM/README.md
+++ b/model_cards/adalbertojunior/PTT5-SMALL-SUM/README.md
@ -1,37 +0,0 @@
----
-language: pt
----
-
-# PTT5-SMALL-SUM
-
-## Model description
-
-This model was trained to summarize texts in Portuguese.
-
-
-based on ```unicamp-dl/ptt5-small-portuguese-vocab```
-
-#### How to use
-
-```python
-from transformers import T5Tokenizer, T5ForConditionalGeneration
-
-tokenizer = T5Tokenizer.from_pretrained('adalbertojunior/PTT5-SMALL-SUM')
-
-t5 = T5ForConditionalGeneration.from_pretrained('adalbertojunior/PTT5-SMALL-SUM')
-
-text="Esse é um exemplo de sumarização."
-
-input_ids = tokenizer.encode(text, return_tensors="pt", add_special_tokens=True)
-
-generated_ids = t5.generate(
-        input_ids=input_ids,
-        num_beams=1,
-        max_length=40,
-        #repetition_penalty=2.5
-    ).squeeze()
-    
-predicted_span = tokenizer.decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
-
-
-```