* Just import torch AdamW instead (see the migration sketch after this list)
* Update docs too
* Make AdamW undocumented
* make fixup
* Add a basic wrapper class
* Add it back to the docs
* Just remove AdamW entirely
* Remove some AdamW references
* Drop AdamW from the public init
* make fix-copies
* Cleanup some references
* make fixup
* Delete lots of transformers.AdamW references
* Remove extra references to adamw_hf
* Fix typos and grammar mistakes in docs and examples
* Fix typos in docstrings and comments
* Fix spelling of `tokenizer` in model tests
* Remove erroneous spaces in decorators
* Remove extra spaces in Markdown link texts
* remove SharedDDP as it was deprecated
* apply review suggestion
* make style
* Oops, forgot to remove the compute_loss context manager in Seq2SeqTrainer.
* remove the unnecessary conditional statement
* keep the logic of IPEX
* clean code
* mixed precision setup & make fixup
---------
Co-authored-by: statelesshz <jihuazhong1@huawei.com>
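
The AdamW commits above amount to a migration from `transformers.AdamW` to PyTorch's built-in optimizer, with a basic wrapper class kept temporarily so old imports do not break. A minimal sketch of that pattern, assuming the wrapper lives where the old class did (illustrative, not the exact PR code):

```python
import warnings

from torch.optim import AdamW as TorchAdamW


class AdamW(TorchAdamW):
    """Temporary shim so `from transformers import AdamW` keeps working
    while users migrate to torch.optim.AdamW (illustrative sketch)."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "This AdamW implementation is deprecated; use torch.optim.AdamW instead.",
            FutureWarning,
        )
        super().__init__(*args, **kwargs)


# Downstream migration is then a one-line import swap:
# - from transformers import AdamW
# + from torch.optim import AdamW
```

Since `torch.optim.AdamW` accepts the same common arguments (`lr`, `weight_decay`, `betas`, `eps`), existing optimizer construction code should work unchanged after the import swap.
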
* Split file_utils into several submodules (see the re-export sketch after this list)
* Fixes
* Add back more objects
* More fixes
* Who exactly decided to import that from there?
* Apply second suggestion from code review
* Revert wrong move
* Fix imports
* Adapt all imports
* Adapt all imports everywhere
* Revert this import, will fix in a separate commit
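
Splitting `file_utils` while keeping every existing import working follows a common backward-compatibility pattern: move the implementations into focused submodules, then keep the old module as a thin re-export shim. A hedged sketch of that shim, where the submodule names and symbols are assumptions for illustration rather than the exact layout from the PR:

```python
# transformers/file_utils.py -- kept as a compatibility shim after the split.
# Nothing is implemented here anymore; everything is re-exported so that
# `from transformers.file_utils import X` keeps working while callers adapt.
# Submodule names and symbols below are illustrative assumptions.
from .utils.generic import ModelOutput  # generic helpers
from .utils.hub import cached_path  # download/cache helpers
from .utils.import_utils import is_torch_available  # optional-dependency checks
```

The repeated "Fix imports" / "Adapt all imports everywhere" commits above reflect the mechanical half of this change: updating internal call sites to import from the new submodules so only external code relies on the shim.
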