.. | ||
README.md | ||
requirements.txt | ||
run_flax_ner.py |
Token classification examples
Fine-tuning the library models for token classification task such as Named Entity Recognition (NER), Parts-of-speech tagging (POS) or phrase extraction (CHUNKS). The main script run_flax_ner.py leverages the 🤗 Datasets library. You can easily customize it to your needs if you need extra processing on your datasets.
It will either run on a datasets hosted on our hub or with your own text files for training and validation, you might just need to add some tweaks in the data preprocessing.
The following example fine-tunes BERT on CoNLL-2003:
To begin with it is recommended to create a model repository to save the trained model and logs.
Here we call the model "bert-ner-conll2003-test"
, but you can change the model name as you like.
You can do this either directly on huggingface.co (assuming that you are logged in) or via the command line:
huggingface-cli repo create bert-ner-conll2003-test
Next we clone the model repository to add the tokenizer and model files.
git clone https://huggingface.co/<your-username>/bert-ner-conll2003-test
Great, we have set up our model repository. During training, we will automatically push the training logs and model weights to the repo.
Next, let's add a symbolic link to the run_flax_ner.py
.
export MODEL_DIR="./bert-ner-conll2003-test"
ln -s ~/transformers/examples/flax/token-classification/run_flax_ner.py run_flax_ner.py
python run_flax_ner.py \
--model_name_or_path bert-base-cased \
--dataset_name conll2003 \
--max_seq_length 128 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--output_dir ${MODEL_DIR} \
--eval_steps 300 \
--push_to_hub
Using the command above, the script will train for 3 epochs and run eval after each epoch.
Metrics and hyperparameters are stored in Tensorflow event files in --output_dir
.
You can see the results by running tensorboard
in that directory:
$ tensorboard --logdir .
or directly on the hub under Training metrics.
sample Metrics - tfhub.dev