---
language: catalan
---

# CALBERT: a Catalan Language Model

## Introduction

CALBERT is an open-source language model for Catalan based on the ALBERT architecture.

It is now available on Hugging Face in its `base-uncased` version, and was pretrained on the [OSCAR dataset](https://traces1.inria.fr/oscar/).

For further information or requests, please go to the [GitHub repository](https://github.com/codegram/calbert).

## Pre-trained models

| Model                               | Arch.            | Training data                     |
|-------------------------------------|------------------|-----------------------------------|
| `codegram` / `calbert-base-uncased` | Base (uncased)   | OSCAR (4.3 GB of text)            |

## Authors

CALBERT was trained and evaluated by [Txus Bach](https://twitter.com/txustice), as part of [Codegram](https://www.codegram.com)'s applied research.
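
## Usage sketch

Since the model is published on Hugging Face, it can presumably be loaded with the `transformers` library. The following is a minimal sketch, not an official recipe: it assumes `transformers`, `torch`, and `sentencepiece` are installed, and that the Hub identifier is the two parts of the table entry joined with a slash (`codegram/calbert-base-uncased`). The helper names `hub_id`, `load_calbert`, `embed`, and `main`, as well as the sample sentence, are hypothetical and only serve as illustration.

```python
ORG = "codegram"
MODEL_NAME = "calbert-base-uncased"


def hub_id(org: str = ORG, name: str = MODEL_NAME) -> str:
    """Build the Hub identifier (assumed to be '<org>/<name>')."""
    return f"{org}/{name}"


def load_calbert(model_id: str = ""):
    """Download (or read from cache) the tokenizer and the bare encoder."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModel, AutoTokenizer

    model_id = model_id or hub_id()
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def embed(text: str, tokenizer, model):
    """Return the per-token hidden states for a single sentence."""
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # The first output is the last hidden state:
    # shape (batch, sequence_length, hidden_size).
    return outputs[0]


def main() -> None:
    tokenizer, model = load_calbert()
    hidden = embed("M'agrada molt la llengua catalana.", tokenizer, model)
    print(hidden.shape)
```

Calling `main()` requires network access on first use, because the weights are fetched from the Hub. Because the model is uncased, the tokenizer is expected to lowercase input text, so inputs that differ only in capitalization should yield the same tokens.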