Sylvain Gugger 908a28894c Add new token classification example (#8340 ) * Add new token classification example * Remove txt file * Add test * With actual testing done * Less warmup is better * Update examples/token-classification/run_ner_new.py Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com> * Address review comments * Fix test * Make Lysandre happy * Last touches and rename * Rename in tests * Address review comments * More run_ner -> run_ner_old Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>	2020-11-09 11:39:55 -05:00

language	thumbnail
es	https://i.imgur.com/jgBdimh.png

Spanish BERT (BETO) + NER

This model is a version of the cased Spanish BERT (BETO) fine-tuned on NER-C for the NER downstream task.

Details of the downstream task (NER) - Dataset

Dataset: CONLL Corpora ES

I preprocessed the dataset and split it into train and dev sets (80/20).
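The card does not say how the 80/20 split was produced, so the following is only a minimal sketch of one common approach: a seeded random shuffle followed by a cut. The function name `split_train_dev`, the `seed` value, and the toy sentences are assumptions made for illustration, not the author's preprocessing code.

```python
import random


def split_train_dev(sentences, dev_fraction=0.2, seed=0):
    """Shuffle a list of sentences and split it into (train, dev)."""
    shuffled = list(sentences)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


# Toy stand-in for the corpus: 10 placeholder sentences (hypothetical data).
toy_sentences = [f"sentence {i}" for i in range(10)]
train, dev = split_train_dev(toy_sentences)
print(len(train), len(dev))  # 8 2
```

Fixing the seed keeps the dev set stable across runs, which matters when comparing scores between models.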

Dataset	# Examples
Train	8.7 K
Dev	2.2 K

Fine-tuned with the NER script provided by Hugging Face.
Labels covered:

B-LOC
B-MISC
B-ORG
B-PER
I-LOC
I-MISC
I-ORG
I-PER
O
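These labels follow the usual B-/I-/O tagging convention, where B- opens an entity, I- continues it, and O marks tokens outside any entity. As a hedged illustration of how such tags can be grouped into entity spans, here is a small sketch. The helper `bio_to_spans` is hypothetical, and the example sentence and its tags are hand-written, not model output.

```python
LABELS = ["B-LOC", "B-MISC", "B-ORG", "B-PER",
          "I-LOC", "I-MISC", "I-ORG", "I-PER", "O"]


def bio_to_spans(tokens, tags):
    """Group (token, tag) pairs into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B" or (prefix == "I" and entity_type != current_type):
            # A B- tag (or a stray I- tag of a different type) starts a new span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = entity_type, [token]
        elif prefix == "I":
            # Same type as the open span: extend it.
            current_tokens.append(token)
        else:
            # "O" closes any open span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hand-labelled example for illustration only.
tokens = ["Manuel", "Romero", "vive", "en", "España"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
spans = bio_to_spans(tokens, tags)
print(spans)  # [('PER', 'Manuel Romero'), ('LOC', 'España')]
```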

Metrics on evaluation set:

Metric	Score
F1	90.17
Precision	89.86
Recall	90.47
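As a quick sanity check, F1 is the harmonic mean of precision and recall. Assuming the reported scores are aggregated that way (the card does not state the averaging scheme), the numbers above can be cross-checked with a few lines:

```python
precision = 89.86
recall = 90.47

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 90.16
```

The result agrees with the reported 90.17 to within the rounding of the two-decimal inputs.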

Comparison:

Model	F1 score	Size (MB)
bert-base-spanish-wwm-cased (BETO)	88.43	421
bert-spanish-cased-finetuned-ner (this one)	90.17	420
Best Multilingual BERT	87.38	681
TinyBERT-spanish-uncased-finetuned-ner	70.00	55

Model in action

Fast usage with pipelines:

from transformers import pipeline

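nlp_ner = pipeline(
    "ner",
    model="mrm8488/bert-spanish-cased-finetuned-ner",
    # Tokenizer given as a (name, kwargs) tuple; use_fast=False selects the slow (Python) tokenizer.
    tokenizer=(
        "mrm8488/bert-spanish-cased-finetuned-ner",
        {"use_fast": False},
    ),
)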

text = 'Mis amigos están pensando viajar a Londres este verano'

nlp_ner(text)

# Output: [{'entity': 'B-LOC', 'score': 0.9998720288276672, 'word': 'Londres'}]
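The pipeline returns a list of dictionaries with `entity`, `score`, and `word` keys, as in the output above. A minimal sketch of post-processing that list, keeping only confident predictions, might look like this. The helper `filter_entities` and the `min_score` threshold are hypothetical, and the sample list is copied from the output shown above so that the snippet runs without downloading the model.

```python
def filter_entities(predictions, min_score=0.9):
    """Return (word, entity) pairs whose score is at least min_score."""
    return [
        (p["word"], p["entity"])
        for p in predictions
        if p["score"] >= min_score
    ]


# Sample copied from the pipeline output shown above.
sample_output = [
    {"entity": "B-LOC", "score": 0.9998720288276672, "word": "Londres"}
]
print(filter_entities(sample_output))  # [('Londres', 'B-LOC')]
```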

Created by Manuel Romero/@mrm8488

Made with ♥ in Spain