# ALBERT-Mongolian
[pretraining repo link](https://github.com/bayartsogt-ya/albert-mongolian)

## Model description
This repository provides a pretrained ALBERT model and a trained SentencePiece model for Mongolian text. The training data consists of the Mongolian Wikipedia corpus (from Wikipedia Downloads) and the Mongolian News corpus.

## Evaluation Results
```
loss = 1.7478163
masked_lm_accuracy = 0.6838185
masked_lm_loss = 1.6687671
sentence_order_accuracy = 0.998125
sentence_order_loss = 0.007942731
```

## Fine-tuning Results on the Eduge Dataset
```
                precision    recall  f1-score   support

  байгал орчин       0.83      0.76      0.80       483
     боловсрол       0.79      0.75      0.77       420
         спорт       0.98      0.96      0.97      1391
     технологи       0.85      0.83      0.84       543
       улс төр       0.88      0.87      0.87      1336
    урлаг соёл       0.89      0.94      0.91       726
         хууль       0.87      0.83      0.85       840
   эдийн засаг       0.80      0.84      0.82      1265
    эрүүл мэнд       0.84      0.90      0.87       562

      accuracy                           0.87      7566
     macro avg       0.86      0.85      0.86      7566
  weighted avg       0.87      0.87      0.87      7566
```
(Class labels, in English: environment, education, sports, technology, politics, arts and culture, law, economy, health.)

## References
1. [ALBERT - official repo](https://github.com/google-research/albert)
2. [WikiExtractor](https://github.com/attardi/wikiextractor)
3. [Mongolian BERT](https://github.com/tugstugi/mongolian-bert)
4. [ALBERT - Japanese](https://github.com/alinear-corp/albert-japanese)
5. [Mongolian Text Classification](https://github.com/sharavsambuu/mongolian-text-classification)
6. 
[LAMB: Large Batch Optimization for Deep Learning (You et al.)](https://arxiv.org/abs/1904.00962)

## Citation
```
@misc{albert-mongolian,
  author = {Bayartsogt Yadamsuren},
  title = {ALBERT Pretrained Model on Mongolian Datasets},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/bayartsogt-ya/albert-mongolian/}}
}
```

## For More Information
Please contact me at bayartsogtyadamsuren@icloud.com.
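## How to use
A minimal usage sketch for the pretrained masked-language model described above. It assumes the checkpoint is published on the Hugging Face Hub under the id `bayartsogt/albert-mongolian` (inferred from the model-card path) and that the `transformers` library is installed; the example sentence is illustrative.

```python
# Hypothetical usage sketch: the Hub id "bayartsogt/albert-mongolian" is
# inferred from the model-card path, not stated in this README.
from transformers import pipeline

# fill-mask loads the ALBERT checkpoint and its SentencePiece tokenizer.
fill_mask = pipeline("fill-mask", model="bayartsogt/albert-mongolian")

# ALBERT uses "[MASK]" as its mask token; the pipeline returns the
# top candidate fillings with their scores.
print(fill_mask("Монгол бол [MASK] орон юм."))
```

For the Eduge classification task reported above, the same checkpoint would be fine-tuned with a sequence-classification head rather than used through `fill-mask`.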