---
language: de
license: mit
---
# bert-german-dbmdz-uncased-sentence-stsb
## How to use
The usage description above - provided by Hugging Face - is wrong! Please use this:
Install the sentence-transformers package (`pip install -U sentence-transformers`). See here: https://github.com/UKPLab/sentence-transformers
```python
from sentence_transformers import models
from sentence_transformers import SentenceTransformer

# load the BERT model from the Hugging Face model hub
word_embedding_model = models.Transformer(
    'T-Systems-onsite/bert-german-dbmdz-uncased-sentence-stsb')

# apply mean pooling to get one fixed-sized sentence vector
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True,
                               pooling_mode_cls_token=False,
                               pooling_mode_max_tokens=False)

# join the BERT model and the pooling layer to get the sentence transformer
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```
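Once the model is assembled, sentence embeddings can be computed with `model.encode` and compared with cosine similarity. The snippet below is a minimal sketch; the two German example sentences are arbitrary and only illustrate the call:

```python
from sentence_transformers import util

# encode two (arbitrary) German sentences into fixed-size vectors
sentences = ['Der Hund spielt im Garten.', 'Ein Hund tollt draußen herum.']
embeddings = model.encode(sentences, convert_to_tensor=True)

# cosine similarity between the two sentence embeddings
similarity = util.pytorch_cos_sim(embeddings[0], embeddings[1])
print(similarity.item())
```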
## Model description
This is a German sentence embedding model trained on the German STSbenchmark dataset. It was trained by Philip May and open-sourced by T-Systems-onsite. The base language model is dbmdz/bert-base-german-uncased from the Bayerische Staatsbibliothek.
## Intended uses
> Sentence-BERT (SBERT) is a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT.
Source: [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)
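As a rough sketch of the use case quoted above (finding the most similar pair in a collection of sentences), the `paraphrase_mining` helper from `sentence_transformers.util` can be used; the German example sentences are arbitrary:

```python
from sentence_transformers import util

# a small collection of (arbitrary) German sentences
sentences = [
    'Eine Katze schläft auf dem Sofa.',
    'Der Zug nach Berlin hat Verspätung.',
    'Auf der Couch liegt eine schlafende Katze.',
]

# score all sentence pairs and return them sorted by cosine similarity
pairs = util.paraphrase_mining(model, sentences)
best_score, i, j = pairs[0]
print(best_score, '|', sentences[i], '|', sentences[j])
```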
## Training procedure
We did an automatic hyperparameter optimization with Optuna and found the following hyperparameters:
- batch_size = 5
- num_epochs = 11
- lr = 2.637549780860126e-05
- eps = 5.0696075038683e-06
- weight_decay = 0.02817210102940054
- warmup_steps = 27.342745941760147 % of total steps
The final model was trained on the combination of all three datasets: `sts_de_dev.csv`, `sts_de_test.csv` and `sts_de_train.csv`.
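For illustration only, the hyperparameters above map onto the standard sentence-transformers training loop roughly as follows. This is a sketch, not the exact script used to train this model: the two `InputExample` pairs are placeholders for the sentence pairs and similarity labels from the three `sts_de_*.csv` files, and the actual fine-tuning started from the dbmdz/bert-base-german-uncased base model rather than the finished checkpoint.

```python
from sentence_transformers import InputExample, losses
from torch.utils.data import DataLoader

# placeholder training data: sentence pairs with similarity labels scaled to [0, 1];
# in the real run these come from sts_de_dev.csv, sts_de_test.csv and sts_de_train.csv
train_examples = [
    InputExample(texts=['Ein Mann spielt Gitarre.', 'Jemand spielt ein Instrument.'], label=0.8),
    InputExample(texts=['Ein Mann spielt Gitarre.', 'Ein Kind isst einen Apfel.'], label=0.1),
]

batch_size = 5
num_epochs = 11
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=batch_size)
train_loss = losses.CosineSimilarityLoss(model)

# warmup_steps is given above as a percentage of the total number of training steps
warmup_steps = int(0.27342745941760147 * num_epochs * len(train_dataloader))

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=num_epochs,
    warmup_steps=warmup_steps,
    optimizer_params={'lr': 2.637549780860126e-05, 'eps': 5.0696075038683e-06},
    weight_decay=0.02817210102940054,
)
```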