transformers/model_cards/codegram/calbert-base-uncased/README.md
2020-04-25 09:16:40 -04:00

---
language: catalan
---
# CALBERT: a Catalan Language Model
## Introduction
CALBERT is an open-source language model for Catalan based on the ALBERT architecture.
It is now available on Hugging Face in its `base-uncased` version, and was pretrained on the [OSCAR dataset](https://traces1.inria.fr/oscar/).
For further information or requests, see the [GitHub repository](https://github.com/codegram/calbert).
## Pre-trained models
| Model                           | Architecture   | Training data          |
|---------------------------------|----------------|------------------------|
| `codegram/calbert-base-uncased` | Base (uncased) | OSCAR (4.3 GB of text) |
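
As a rough illustration of how the model in the table above might be loaded, here is a minimal sketch using the `transformers` Auto classes. It assumes the Hub identifier is `codegram/calbert-base-uncased` (the two parts of the table entry joined with a slash), and that recent versions of `transformers`, `torch`, and `sentencepiece` are installed. The helper names `load_calbert` and `embed` are hypothetical and are not part of CALBERT or of any library.

```python
# Assumed Hub identifier, built from the model name in the table above.
MODEL_ID = "codegram/calbert-base-uncased"


def load_calbert(model_id: str = MODEL_ID):
    """Hypothetical helper: download and return (tokenizer, model)."""
    # Imported lazily so that defining this helper needs no heavy dependencies.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def embed(text: str, tokenizer, model):
    """Hypothetical helper: return the last-layer hidden states for `text`."""
    import torch

    # Tokenize into PyTorch tensors (input_ids, attention_mask, ...).
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for feature extraction
        outputs = model(**inputs)
    # First output: tensor of shape (1, sequence_length, hidden_size).
    return outputs[0]
```

Calling `tokenizer, model = load_calbert()` followed by `embed("M'agrada la llengua catalana.", tokenizer, model)` should yield one contextual vector per token. The first call downloads the weights, so it needs network access.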
## Authors
CALBERT was trained and evaluated by [Txus Bach](https://twitter.com/txustice), as part of [Codegram](https://www.codegram.com)'s applied research.