* Adding warning messages to BERT for missing attention masks
These warning messages when there are pad tokens within the input ids and
no attention masks are given. The warning message should only show up once.
* Adding warning messages to BERT for missing attention masks
These warning messages are shown when the pad_token_id is not None
and no attention masks are given. The warning message should only
show up once.
* Ran fix copies to copy over the changes to some of the other models
* Add logger.warning_once.cache_clear() to the test
* Shows warning when there are no attention masks and input_ids start/end with pad tokens
* Using warning_once() instead and fix indexing in input_ids check
---------
Co-authored-by: JB Lau <hckyn@voyager2.local>
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* update relative positional embedding
* make fix copies
* add `use_cache` to list of arguments
* fixup
* 1line fucntion
* add `test_decoder_model_past_with_large_inputs_relative_pos_emb`
* add relative pos embedding test for more models
* style
* move generation_*.py src files into generation/*.py
* populate generation.__init__ with lazy loading
* move imports and references from generation.xxx.object to generation.object