From 6e8a38568eb874f31eb49c42285c3a634fca12e7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?M=2E=20Yusuf=20Sar=C4=B1g=C3=B6z?= <yusufsarigoz@gmail.com>
Date: Sun, 9 Aug 2020 10:39:51 +0300
Subject: [PATCH] [model_cards] electra-base-turkish-cased-ner (#6350)

* for electra-base-turkish-cased-ner

* Add metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
---
 .../mys/electra-base-turkish-cased-ner/README.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
 create mode 100644 model_cards/mys/electra-base-turkish-cased-ner/README.md

diff --git a/model_cards/mys/electra-base-turkish-cased-ner/README.md b/model_cards/mys/electra-base-turkish-cased-ner/README.md
new file mode 100644
index 00000000000..addd29099b0
--- /dev/null
+++ b/model_cards/mys/electra-base-turkish-cased-ner/README.md
@@ -0,0 +1,16 @@
+---
+language: tr
+---
+
+## What is this
+
+An NER model for Turkish with 48 categories, trained on the dataset [Shrinked TWNERTC Turkish NER Data](https://www.kaggle.com/behcetsenturk/shrinked-twnertc-turkish-ner-data-by-kuzgunlar) by Behçet Şentürk, which is itself a filtered and cleaned version of the following automatically labeled dataset:
+
+> Sahin, H. Bahadir; Eren, Mustafa Tolga; Tirkaz, Caglar; Sonmez, Ozan; Yildiz, Eray (2017), “English/Turkish Wikipedia Named-Entity Recognition and Text Categorization Dataset”, Mendeley Data, v1 http://dx.doi.org/10.17632/cdcztymf4k.1
+
+## Backbone model
+
+The backbone model is [electra-base-turkish-cased-discriminator](https://huggingface.co/dbmdz/electra-base-turkish-cased-discriminator), which I fine-tuned for token classification.
+
+I'm still investigating whether accuracy can be improved with this dataset, but the model is already usable for non-critical applications. You can reach out to me on [Twitter](https://twitter.com/myusufsarigoz) for discussions and issues.
+I will also release a notebook for fine-tuning NER models with Shrinked TWNERTC, as well as sample inference code to demonstrate what's possible with this model.