mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-07 14:50:07 +06:00

* Reorganize example folder * Continue reorganization * Change requirements for tests * Final cleanup * Finish regroup with tests all passing * Copyright * Requirements and readme * Make a full link for the documentation * Address review comments * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Add symlink * Reorg again * Apply suggestions from code review Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com> * Adapt title * Update to new strucutre * Remove test * Update READMEs Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
39 lines
1.4 KiB
Markdown
39 lines
1.4 KiB
Markdown
## Adversarial evaluation of model performances
|
|
|
|
Here is an example on evaluating a model using adversarial evaluation of natural language inference with the Heuristic Analysis for NLI Systems (HANS) dataset [McCoy et al., 2019](https://arxiv.org/abs/1902.01007). The example was gracefully provided by [Nafise Sadat Moosavi](https://github.com/ns-moosavi).
|
|
|
|
The HANS dataset can be downloaded from [this location](https://github.com/tommccoy1/hans).
|
|
|
|
This is an example of using test_hans.py:
|
|
|
|
```bash
|
|
export HANS_DIR=path-to-hans
|
|
export MODEL_TYPE=type-of-the-model-e.g.-bert-roberta-xlnet-etc
|
|
export MODEL_PATH=path-to-the-model-directory-that-is-trained-on-NLI-e.g.-by-using-run_glue.py
|
|
|
|
python run_hans.py \
|
|
--task_name hans \
|
|
--model_type $MODEL_TYPE \
|
|
--do_eval \
|
|
--data_dir $HANS_DIR \
|
|
--model_name_or_path $MODEL_PATH \
|
|
--max_seq_length 128 \
|
|
--output_dir $MODEL_PATH \
|
|
```
|
|
|
|
This will create the hans_predictions.txt file in MODEL_PATH, which can then be evaluated using hans/evaluate_heur_output.py from the HANS dataset.
|
|
|
|
The results of the BERT-base model that is trained on MNLI using batch size 8 and the random seed 42 on the HANS dataset is as follows:
|
|
|
|
```bash
|
|
Heuristic entailed results:
|
|
lexical_overlap: 0.9702
|
|
subsequence: 0.9942
|
|
constituent: 0.9962
|
|
|
|
Heuristic non-entailed results:
|
|
lexical_overlap: 0.199
|
|
subsequence: 0.0396
|
|
constituent: 0.118
|
|
```
|