mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 10:12:23 +06:00

Severin Simmler 7f98edd7e3 Model card: Literary German BERT (#2843 ) * feat: create model card * chore: add description * feat: stats plot * Delete prosa-jahre.svg * feat: years plot (again) * chore: add more details * fix: typos * feat: kfold plot * feat: kfold plot * Rename model_cards/severinsimmler/literary-german-bert.md to model_cards/severinsimmler/literary-german-bert/README.md * Support for linked images + add tags cc @severinsimmler Co-authored-by: Julien Chaumond <chaumond@gmail.com>		2020-02-13 15:43:44 -05:00

language	thumbnail
german	kfold.png

German BERT for literary texts

This German BERT is based on bert-base-german-dbmdz-cased and has been adapted to the domain of literary texts by fine-tuning it on the language modeling task with the Corpus of German-Language Fiction. Afterwards, the model was fine-tuned for named entity recognition on the DROC corpus, so you can use it to recognize protagonists in German novels.
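A minimal usage sketch with the `transformers` token classification pipeline. It assumes that the model is published on the model hub as `severinsimmler/literary-german-bert` (inferred from the model card path), that the installed `transformers` version supports `aggregation_strategy`, and that aggregated person spans are reported with the group name `PER`. The helper `person_mentions`, its `min_score` threshold, and the example sentence are hypothetical and only serve as illustration.

```python
def load_tagger(model_id="severinsimmler/literary-german-bert"):
    """Build a NER pipeline; the model id is an assumption, see above."""
    # Imported lazily so the helper below also works without transformers.
    from transformers import pipeline

    # "simple" aggregation merges B-PER/I-PER word pieces into whole spans.
    return pipeline("ner", model=model_id, aggregation_strategy="simple")


def person_mentions(predictions, min_score=0.5):
    """Return the text of all PER spans scoring at least min_score.

    `predictions` is the list of dicts produced by an aggregated NER
    pipeline (keys: entity_group, score, word, ...).
    """
    return [
        p["word"]
        for p in predictions
        if p["entity_group"] == "PER" and p["score"] >= min_score
    ]


def demo():
    """Download the model and tag one illustrative sentence."""
    tagger = load_tagger()
    text = "Effi Briest wartete im Garten auf Innstetten."
    print(person_mentions(tagger(text)))
```

Calling `demo()` downloads the model on first use, so it needs network access. `person_mentions` can be tried on its own with hand-written predictions.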

Stats

Language modeling

The Corpus of German-Language Fiction consists of 3,194 documents with 203,516,988 tokens and 1,520,855 types. The publication years of the texts range from the 18th to the 20th century:

[Figure: prosa-jahre.png, publication years of the texts in the corpus]

Results

After one epoch:

Model	Perplexity
Vanilla BERT	6.82
Fine-tuned BERT	4.98
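The model card does not state how perplexity was computed. The sketch below assumes the common definition, the exponential of the mean cross-entropy loss over the predicted tokens, and derives the relative improvement from the two values in the table. The function name `perplexity` is illustrative.

```python
import math


def perplexity(token_losses):
    """Perplexity as exp of the mean cross-entropy loss (natural log)."""
    return math.exp(sum(token_losses) / len(token_losses))


# Values reported in the table above.
vanilla, fine_tuned = 6.82, 4.98

# Relative reduction in perplexity after one epoch of domain adaptation.
relative_reduction = (vanilla - fine_tuned) / vanilla
print(f"{relative_reduction:.1%}")  # 27.0%
```

Under this definition, a perplexity of 4.98 corresponds to a mean loss of about 1.61 nats per predicted token, compared with about 1.92 for the vanilla model.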

Named entity recognition

The provided model was also fine-tuned for named entity recognition for two epochs on 10,799 training sentences, validated on 547 sentences, and tested on 1,845 sentences, using three labels: B-PER, I-PER and O.
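To illustrate the three-label scheme, here is a small sketch that groups per-token B-PER/I-PER/O tags into person mentions. The function `bio_to_spans` and the example sentence are made up for illustration and are not part of the model or of DROC. Treating a stray I-PER without a preceding B-PER as the start of a mention is one possible convention, chosen here as an assumption.

```python
def bio_to_spans(tokens, tags):
    """Group B-PER/I-PER tags into (start, end, text) spans, end exclusive."""
    spans = []
    start = None  # index of the first token of the currently open mention
    for i, tag in enumerate(tags):
        if tag == "B-PER" or (tag == "I-PER" and start is None):
            # A new mention begins; close any mention that is still open.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "O" and start is not None:
            # An O tag ends the open mention.
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
    if start is not None:
        # The sentence ended while a mention was still open.
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


tokens = ["Effi", "Briest", "sah", "Innstetten", "an", "."]
tags = ["B-PER", "I-PER", "O", "B-PER", "O", "O"]
spans = bio_to_spans(tokens, tags)
print(spans)  # [(0, 2, 'Effi Briest'), (3, 4, 'Innstetten')]
```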

Results

Dataset	Precision	Recall	F1
Dev	96.4	87.3	91.6
Test	92.8	94.9	93.8
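As a quick consistency check, F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The sketch below uses only the values from the table; the function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs from the table above, in percent.
results = {"Dev": (96.4, 87.3), "Test": (92.8, 94.9)}

for name, (p, r) in results.items():
    print(name, round(f1_score(p, r), 1))
# Dev 91.6
# Test 93.8
```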

The model has also been evaluated using 10-fold cross-validation and compared with a classic Conditional Random Field baseline described in Jannidis et al. (2015):

[Figure: kfold.png, 10-fold cross-validation results compared with the CRF baseline]

References

Markus Krug, Lukas Weimer, Isabella Reger, Luisa Macharowsky, Stephan Feldhaus, Frank Puppe, Fotis Jannidis, Description of a Corpus of Character References in German Novels, 2018.

Fotis Jannidis, Isabella Reger, Lukas Weimer, Markus Krug, Martin Toepfer, Frank Puppe, Automatische Erkennung von Figuren in deutschsprachigen Romanen, 2015.