transformers/model_cards/bashar-talafha/multi-dialect-bert-base-arabic
2020-07-14 12:51:57 -04:00
..
README.md Update README.md (#5696) 2020-07-14 12:51:57 -04:00

language thumbnail datasets
arabic https://raw.githubusercontent.com/mawdoo3/Multi-dialect-Arabic-BERT/master/multidialct_arabic_bert.png
nadi

Multi-dialect-Arabic-BERT

This is a repository of Multi-dialect Arabic BERT model.

By Mawdoo3-AI.


Background reference: http://www.qfi.org/wp-content/uploads/2018/02/Qfi_Infographic_Mother-Language_Final.pdf

About our Multi-dialect-Arabic-BERT model

Instead of training the Multi-dialect Arabic BERT model from scratch, we initialized the weights of the model using Arabic-BERT and trained it on 10M arabic tweets from the unlabled data of The Nuanced Arabic Dialect Identification (NADI) shared task.

To cite this work

We will update the BibTeX once the paper published.

Usage

The model weights can be loaded using transformers library by HuggingFace.

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic")
model = AutoModel.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic")

Example using pipeline:

from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="bashar-talafha/multi-dialect-bert-base-arabic ",
    tokenizer="bashar-talafha/multi-dialect-bert-base-arabic "
)

fill_mask(" سافر الرحالة من مطار [MASK] ")
[{'sequence': '[CLS] سافر الرحالة من مطار الكويت [SEP]', 'score': 0.08296813815832138, 'token': 3226},
 {'sequence': '[CLS] سافر الرحالة من مطار دبي [SEP]', 'score': 0.05123933032155037, 'token': 4747},
 {'sequence': '[CLS] سافر الرحالة من مطار مسقط [SEP]', 'score': 0.046838656067848206, 'token': 13205},
 {'sequence': '[CLS] سافر الرحالة من مطار القاهرة [SEP]', 'score': 0.03234650194644928, 'token': 4003},
 {'sequence': '[CLS] سافر الرحالة من مطار الرياض [SEP]', 'score': 0.02606341242790222, 'token': 2200}]

Repository

Please check the original repository for more information.