* Make training args fully immutable
* Working tests, PyTorch
* In test_trainer
* during testing
* Use proper dataclass way
* Fix test
* Another one
* Fix tf
* Lingering slow
* Exception
* Clean
* An end to accursed version-specific imports
* No more K.is_keras_tensor() either
* Update dependency tables
* Use a cleaner call context function getter
* Add a cap to <2.14
* Add cap to examples requirements too
* Proposed fix for TF example now running on safetensors.
* Adding more warnings and returning keys.
* Trigger CI
* Trigger CI
---------
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
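For context on the "proper dataclass way" mentioned above: once an arguments object is immutable, the usual pattern is to build a modified copy rather than assign to a field. The sketch below assumes a standard-library frozen dataclass; `ExampleArgs` and its fields are hypothetical stand-ins and do not reproduce the real training arguments class or how it enforces immutability.

```python
import dataclasses


# Hypothetical stand-in for an immutable arguments object.
@dataclasses.dataclass(frozen=True)
class ExampleArgs:
    learning_rate: float = 5e-5
    num_train_epochs: int = 3


args = ExampleArgs()

# Direct assignment on a frozen dataclass raises FrozenInstanceError.
try:
    args.learning_rate = 1e-4
except dataclasses.FrozenInstanceError:
    mutated = False
else:
    mutated = True

# dataclasses.replace() returns a new instance with the requested change
# and leaves the original untouched.
updated = dataclasses.replace(args, learning_rate=1e-4)
```

Tests that previously mutated the arguments in place can use `dataclasses.replace` in the same way to obtain a variant for each case.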
* add: tokenizer training script for TF TPU LM training.
* add: script for preparing the TFRecord shards.
* add: sequence of execution to readme.
* remove limit from the tfrecord shard name.
* Add initial train_model.py
* Add basic training arguments and model init
* Get up to the point of writing the data collator
* Pushing progress so far!
* Complete first draft of model training code
* feat: grouping of texts efficiently.
Co-authored-by: Matt <rocketknight1@gmail.com>
* Add proper masking collator and get training loop working
* fix: things.
* Read sample counts from filenames
* Draft README
* Improve TPU warning
* Use distribute instead of distribute.experimental
* Apply suggestions from code review
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Modularize loading and add MLM probability as arg
* minor refactoring to better use the cli args.
* readme fillup.
* include tpu and inference sections in the readme.
* table of contents.
* parallelize maps.
* polish readme.
* change script name to run_mlm.py
* address PR feedback (round I).
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
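The "grouping of texts" step above is not spelled out in these messages. In language-model data preparation it commonly means concatenating tokenized samples and cutting the result into fixed-length blocks. The sketch below assumes that meaning; the function name `group_texts`, the `block_size` parameter, and the column layout are illustrative and are not taken from the script itself.

```python
from itertools import chain


def group_texts(examples, block_size):
    """Concatenate tokenized samples and split them into equal-sized blocks.

    `examples` maps a column name (e.g. "input_ids") to a list of token lists.
    Assumes at least one column and that all columns have matching lengths.
    """
    # Flatten each column into one long token list.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Drop the trailing remainder so every block has exactly block_size tokens.
    total_length = (total_length // block_size) * block_size
    return {
        k: [tokens[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, tokens in concatenated.items()
    }


batch = {"input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9]]}
grouped = group_texts(batch, block_size=4)
```

Dropping the remainder keeps every block the same shape, which matters on TPU where variable shapes trigger recompilation; padding the last block would be the alternative.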
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
[NumPy] Remove references to deprecated NumPy type aliases.
This change replaces references to a number of deprecated NumPy type aliases (np.bool, np.int, np.float, np.complex, np.object, np.str) with their recommended replacement (bool, int, float, complex, object, str).
NumPy 1.24 drops the deprecated aliases, so we must remove uses before updating NumPy.
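A minimal sketch of the replacement pattern described above, assuming NumPy is installed; the array names are illustrative only. The Python builtins are accepted wherever a dtype is expected, so the change is mechanical.

```python
import numpy as np

# Before (fails on NumPy >= 1.24): np.zeros(3, dtype=np.bool)
mask = np.zeros(3, dtype=bool)

# Before: np.arange(3, dtype=np.int)
counts = np.arange(3, dtype=int)

# Before: np.ones(3, dtype=np.float)
weights = np.ones(3, dtype=float)

# The sized scalar types (np.bool_, np.int64, np.float64, ...) are unaffected
# and remain available when an explicit width is needed.
explicit = np.ones(3, dtype=np.float64)
```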
Co-authored-by: Peter Hawkins <phawkins@google.com>
* Finished QA example
* Dodge a merge conflict
* Update text classification and LM examples
* Update NER example
* New Keras metrics WIP, fix NER example
* Update NER example
* Update MC, summarization and translation examples
* Add XLA warnings when shapes are variable
* Make sure batch_size is consistently scaled by num_replicas
* Add PushToHubCallback to all models
* Add docs links for KerasMetricCallback
* Add docs links for prepare_tf_dataset and jit_compile
* Correct inferred model names
* Don't assume the dataset has 'lang'
* Write metrics in text classification
* Add 'framework' to TrainingArguments and TFTrainingArguments
* Export metrics in all examples and add tests
* Fix training args for Flax
* Update command line args for translation test
* make fixup
* Fix accidentally running other tests in fp16
* Remove do_train/do_eval from run_clm.py
* Remove do_train/do_eval from run_mlm.py
* Add tensorflow tests to circleci
* Fix circleci
* Update examples/tensorflow/language-modeling/run_mlm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/test_tensorflow_examples.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/translation/run_translation.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/token-classification/run_ner.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fix save path for tests
* Fix some model card kwargs
* Explain the magical -1000
* Actually enable tests this time
* Skip text classification PR until we fix shape inference
* make fixup
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Delete valohai.yaml
* NLP => ML
* typo
* website supports https
* datasets
* 60k + modalities
* unrelated link fixing for accelerate
* Ok those links were actually broken
* Fix link
* Make `AutoTokenizer` auto-link
* wording tweak
* add at least one non-nlp task
* Migrate metric to Evaluate library in tf examples
Currently the TensorFlow examples use the `load_metric` function from the
Datasets library. This commit migrates those calls to the `load` function
from the Evaluate library.
Fix for #18306
* Migrate `metric` to Evaluate for all tf examples
Currently the TensorFlow examples use the `load_metric` function from the
Datasets library. This commit migrates those calls to the `load` function
from the Evaluate library.