* Port core files + ESM (because ESM code is odd)
* Search-replace in modelling code
* Fix up transfo_xl as well
* Fix other core files + tests (still need to add correct import to tests)
* Fix cookiecutter
* make fixup, fix imports in some more core files
* Auto-add imports to tests
* Cleanup, add imports to sagemaker tests
* Use correct exception for importing tf_keras
* Fixes in modeling_tf_utils
* make fixup
* Correct version parsing code
* Ensure the pipeline tests correctly revert to float32 after each test (see the dtype-policy sketch below)
* More tf.keras -> keras
* Add dtype cast
* Better imports of tf_keras (see the import-guard sketch below)
* Add a cast for tf.assign, just in case (see the cast sketch below)
* Fix callback imports
* add: tokenizer training script for TF TPU LM training.
* add: script for preparing the TFRecord shards (see the shard-writing sketch below).
* add: sequence of execution to readme.
* remove limit from the tfrecord shard name.
* Add initial train_model.py
* Add basic training arguments and model init
* Get up to the point of writing the data collator
* Pushing progress so far!
* Complete first draft of model training code
* feat: grouping of texts efficiently (see the group_texts sketch below).
Co-authored-by: Matt <rocketknight1@gmail.com>
* Add proper masking collator and get training loop working (see the collator sketch below)
* fix: things.
* Read sample counts from filenames
* Draft README
* Improve TPU warning
* Use distribute instead of distribute.experimental (see the TPUStrategy sketch below)
* Apply suggestions from code review
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Modularize loading and add MLM probability as arg
* minor refactoring to better use the cli args.
* fill out the readme.
* include tpu and inference sections in the readme.
* table of contents.
* parallelize maps.
* polish readme.
* change script name to run_mlm.py
* address PR feedback (round I).
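
Sketches for a few of the commits above follow; all of them are illustrative rather than the actual code. First, the float32 reversion in the pipeline tests: a minimal sketch, assuming the tests flip the global Keras mixed-precision policy; the helper name is an assumption.

```python
import tensorflow as tf

def reset_keras_dtype_policy():
    # Restore the default global policy so a test that enabled mixed precision
    # (e.g. "mixed_float16") does not leak float16 computation into later tests.
    tf.keras.mixed_precision.set_global_policy("float32")
```

In a unittest-style suite this would typically run from tearDown() after every test.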
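
The tf_keras import commits revolve around a guarded import: prefer the standalone tf_keras package (Keras 2) and fall back to the Keras bundled with TensorFlow when it is missing. A minimal sketch, not the exact code:

```python
try:
    import tf_keras as keras  # standalone Keras 2 package; may not be installed
except ImportError:  # the "correct exception"; also covers ModuleNotFoundError, its subclass
    from tensorflow import keras  # fall back to the Keras bundled with TensorFlow
```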
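
The "cast for tf.assign" commit suggests a defensive dtype cast before assigning into a variable; a hedged illustration with made-up shapes and dtypes:

```python
import tensorflow as tf

variable = tf.Variable(tf.zeros((2, 2), dtype=tf.float32))
new_value = tf.ones((2, 2), dtype=tf.float64)  # dtype may not match the variable
# Cast "just in case" so assign() cannot fail on a dtype mismatch.
variable.assign(tf.cast(new_value, variable.dtype))
```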
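
For the TFRecord shard preparation, a sketch of writing a single shard; the feature name and function signature are assumptions, not the actual script:

```python
import tensorflow as tf

def write_shard(tokenized_examples, path):
    # Serialize each tokenized example as a tf.train.Example with an int64 feature.
    with tf.io.TFRecordWriter(path) as writer:
        for input_ids in tokenized_examples:
            feature = {
                "input_ids": tf.train.Feature(int64_list=tf.train.Int64List(value=input_ids))
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())
```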
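
"Grouping of texts efficiently" is presumably the usual group_texts trick from the MLM examples: concatenate tokenized samples and re-split them into fixed-size blocks so no tokens are wasted on padding. A sketch with illustrative names:

```python
def group_texts(examples, block_size=512):
    # Concatenate every tokenizer output field (input_ids, attention_mask, ...).
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated[next(iter(examples))])
    # Drop the tail that does not fill a complete block.
    total_length = (total_length // block_size) * block_size
    return {
        k: [values[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, values in concatenated.items()
    }
```

It would typically be applied with `dataset.map(group_texts, batched=True, num_proc=...)`, which is presumably also what the "parallelize maps" commit refers to.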
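
One way to get an MLM masking collator that returns TF tensors is shown below purely as an illustration; the actual script may implement masking differently for TPU, and the checkpoint name is a placeholder:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder checkpoint
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # later exposed as a CLI argument in these commits
    return_tensors="tf",
)
```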
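
Finally, the distribute vs distribute.experimental commit: the TPU strategy graduated from the experimental namespace in newer TensorFlow releases, so the stable form is preferred. A sketch:

```python
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver()  # local or named TPU
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
# Stable API; older code used tf.distribute.experimental.TPUStrategy instead.
strategy = tf.distribute.TPUStrategy(resolver)
```
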
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>