transformers/examples/flax/token-classification
Albert Villanova del Moral a14b055b65
Pass datasets trust_remote_code (#31406)
* Pass datasets trust_remote_code

* Pass trust_remote_code in more tests

* Add trust_remote_dataset_code arg to some tests

* Revert "Temporarily pin datasets upper version to fix CI"

This reverts commit b7672826ca.

* Pass trust_remote_code in librispeech_asr_dummy docstrings

* Revert "Pin datasets<2.20.0 for examples"

This reverts commit 833fc17a3e.

* Pass trust_remote_code to all examples

* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects

* Pass trust_remote_code to tests

* Pass trust_remote_code to docstrings

* Fix flax examples tests requirements

* Pass trust_remote_dataset_code arg to tests

* Replace trust_remote_dataset_code with trust_remote_code in one example

* Fix duplicate trust_remote_code

* Replace args.trust_remote_dataset_code with args.trust_remote_code

* Replace trust_remote_dataset_code with trust_remote_code in parser

* Replace trust_remote_dataset_code with trust_remote_code in dataclasses

* Replace trust_remote_dataset_code with trust_remote_code arg
2024-06-17 17:29:13 +01:00
..
README.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
requirements.txt bump flax version (#14343) 2021-11-09 22:15:22 +05:30
run_flax_ner.py Pass datasets trust_remote_code (#31406) 2024-06-17 17:29:13 +01:00

Token classification examples

Fine-tuning the library models for token classification task such as Named Entity Recognition (NER), Parts-of-speech tagging (POS) or phrase extraction (CHUNKS). The main script run_flax_ner.py leverages the 🤗 Datasets library. You can easily customize it to your needs if you need extra processing on your datasets.

It will either run on a datasets hosted on our hub or with your own text files for training and validation, you might just need to add some tweaks in the data preprocessing.

The following example fine-tunes BERT on CoNLL-2003:

python run_flax_ner.py \
  --model_name_or_path google-bert/bert-base-cased \
  --dataset_name conll2003 \
  --max_seq_length 128 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --per_device_train_batch_size 4 \
  --output_dir ./bert-ner-conll2003 \
  --eval_steps 300 \
  --push_to_hub

Using the command above, the script will train for 3 epochs and run eval after each epoch. Metrics and hyperparameters are stored in Tensorflow event files in --output_dir. You can see the results by running tensorboard in that directory:

$ tensorboard --logdir .

or directly on the hub under Training metrics.

sample Metrics - tfhub.dev