* No more Tuple, List, Dict
* make fixup
* More style fixes
* Docstring fixes with regex replacement
* Trigger tests
* Redo fixes after rebase
* Fix copies
* [test all]
* update
* [test all]
* update
* [test all]
* make style after rebase
* Patch the hf_argparser test
* Patch the hf_argparser test
* style fixes
* style fixes
* style fixes
* Fix docstrings in Cohere test
* [test all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
1. Fixes evaluation errors popping up when you train/eval on squad v2 (one was newly encountered and one that was previously reported Running SQuAD 1.0 sample command raises IndexError #15401 but not completely fixed).
2. Removes boolean arguments that don't use store_true. Please, don't use these: *ANY non-empty string is being converted to True in this case and this clearly is not the desired behavior (and it creates a LOT of confusion).
3. All no-trainer test scripts are now saving metric values in the same way (with the right prefix eval_), which is consistent with the trainer-based versions.
4. Adds forgotten model.eval() in the no-trainer versions. This improved some results, but not everything (see the discussion in the end). Please, see the F1 scores and the discussion below.
* #12789 Replace assert statements with exceptions
* fix-copies: made copy changes to utils_qa.py in examples/pytorch/question-answering and examples/tensorflow/question-answering
* minor refactor for clarity