---
language:
- nr
thumbnail: https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg
tags:
- nr
- fill-mask
- pytorch
- roberta
- lm-head
- masked-lm
license: MIT
---

# Takalani Sesame - Ndebele 🇿🇦
<img src="https://pbs.twimg.com/media/EVjR6BsWoAAFaq5.jpg" width="600"/>

## Model description

Takalani Sesame (named after the South African version of Sesame Street) is a project that aims to promote the use of South African languages in NLP, and in particular to explore techniques that allow low-resource languages to equalise performance with larger languages around the world.

## Intended uses & limitations

#### How to use

```python
from transformers import AutoTokenizer, AutoModelWithLMHead

# Load the tokenizer and masked-LM model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("jannesg/takalane_nbl_roberta")
model = AutoModelWithLMHead.from_pretrained("jannesg/takalane_nbl_roberta")
```
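
As a quick check, the model can also be queried through the `fill-mask` pipeline. This is a minimal sketch: the example sentence is an illustrative placeholder, not text from the model card.

```python
from transformers import pipeline

# Masked-token prediction with the fill-mask pipeline
fill_mask = pipeline(
    "fill-mask",
    model="jannesg/takalane_nbl_roberta",
    tokenizer="jannesg/takalane_nbl_roberta",
)

# Placeholder input: substitute a real Ndebele sentence containing the mask token
masked = f"Your Ndebele sentence with a {fill_mask.tokenizer.mask_token} token here."
print(fill_mask(masked))
```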

#### Limitations and bias

Updates will be added continuously to improve performance. This is a very low-resource language, so results may be poor at first.

## Training data

Data collected from [https://wortschatz.uni-leipzig.de/en](https://wortschatz.uni-leipzig.de/en) <br/>
**Sentences:** 318M

## Training procedure

No preprocessing was applied; the model was trained with standard Hugging Face hyperparameters.
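
The card does not include the training script, but masked-LM pretraining with default `Trainer` settings would look roughly like the sketch below. The tokenizer path, corpus file, and block size are assumptions for illustration, not details from the card.

```python
from transformers import (
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Hypothetical paths; the real tokenizer and corpus are not published in the card.
tokenizer = RobertaTokenizerFast.from_pretrained("path/to/tokenizer")
model = RobertaForMaskedLM(RobertaConfig(vocab_size=tokenizer.vocab_size))

dataset = LineByLineTextDataset(
    tokenizer=tokenizer, file_path="path/to/nr_corpus.txt", block_size=128
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="takalane_nbl_roberta"),  # default hyperparameters
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```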

## Author

Jannes Germishuys [website](http://jannesgg.github.io)