* cleanup torch unittests: part 2
* remove trailing comma added by isort, and which breaks flake
* one more comma
* revert odd balls
* part 3: odd cases
* more ["key"] -> .key refactoring
* .numpy() is not needed
* more unncessary .numpy() removed
* more simplification
* TF outputs and test on BERT
* Albert to DistilBert
* All remaining TF models except T5
* Documentation
* One file forgotten
* TF outputs and test on BERT
* Albert to DistilBert
* All remaining TF models except T5
* Documentation
* One file forgotten
* Add new models and fix issues
* Quality improvements
* Add T5
* A bit of cleanup
* Fix for slow tests
* Style
* Add SequenceClassification and MultipleChoice TF models to Electra
* Apply style
* Add summary_proj_to_labels to Electra config
* Finally mirroring the PT version of these models
* Apply style
* Fix Electra test
* Kill model archive maps
* Fixup
* Also kill model_archive_map for MaskedBertPreTrainedModel
* Unhook config_archive_map
* Tokenizers: align with model id changes
* make style && make quality
* Fix CI
There's an inconsistency right now where:
- we load some models into CACHE_DIR
- and some models in the default cache
- and often, in both for the same models
When running the RUN_SLOW tests, this takes a lot of disk space, time, and bandwidth.
I'd rather always use the default cache