* fix_torch_device_generate_test
* remove @
* upload
* finish dataset streaming
* adapt readme
* finish
* up
* up
* up
* up
* Apply suggestions from code review
* finish
* make style
* make style2
* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Validation split added: custom data files
A validation split is now added when no validation file is given and custom data files are loaded.
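A minimal sketch of the idea, assuming `datasets.load_dataset` and a validation-split percentage like the one used in the library's example scripts (the file name and percentage here are illustrative, not the exact diff):

```python
from datasets import load_dataset

# Hypothetical custom data file; in the example script this comes from the
# command-line arguments.
data_files = {"train": "train.csv"}
validation_split_percentage = 5

raw_datasets = load_dataset("csv", data_files=data_files)
if "validation" not in raw_datasets.keys():
    # No validation file was given, so carve a validation split out of the
    # custom training data.
    raw_datasets["validation"] = load_dataset(
        "csv", data_files=data_files, split=f"train[:{validation_split_percentage}%]"
    )
    raw_datasets["train"] = load_dataset(
        "csv", data_files=data_files, split=f"train[{validation_split_percentage}%:]"
    )
```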
* Updated documentation with custom file usage
Updated documentation with custom file usage
* Update README.md
* Update README.md
* Update README.md
* Made some suggested stylistic changes
* Used logger instead of print.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Made similar changes to add validation split
If the validation file is missing, a validation split will now be used instead.
* max_train_samples to be used for training only
max_train_samples was misplaced; it is now applied to the training data only, not to the whole dataset.
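A short sketch of the corrected placement (the toy `DatasetDict` below is a stand-in for the tokenized datasets built earlier in the script):

```python
from datasets import Dataset, DatasetDict

# Illustrative stand-in for the tokenized datasets produced by the example.
tokenized_datasets = DatasetDict(
    {
        "train": Dataset.from_dict({"input_ids": [[0], [1], [2], [3]]}),
        "validation": Dataset.from_dict({"input_ids": [[4], [5]]}),
    }
)
max_train_samples = 2

# Cap only the training split; the validation split is left untouched.
train_dataset = tokenized_datasets["train"]
if max_train_samples is not None:
    train_dataset = train_dataset.select(range(max_train_samples))
eval_dataset = tokenized_datasets["validation"]
```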
* styled
* changed ordering
* Improved language of documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Improved language of documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fixed styling issue
* Update run_mlm.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add philosophy doc
* fix typos
* update doc
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* address Patrick's suggestions
* add a training example and fix typos
* jit the training step
* jit train step
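A self-contained sketch of what jitting a training step looks like in JAX (the loss and parameters are toy stand-ins, not the example's Flax model):

```python
import jax
import jax.numpy as jnp

def compute_loss(params, batch):
    # Toy linear-regression loss; the real example uses the model's loss.
    preds = batch["x"] @ params["w"]
    return jnp.mean((preds - batch["y"]) ** 2)

@jax.jit  # compile the whole update into one XLA computation
def train_step(params, batch, learning_rate=1e-3):
    loss, grads = jax.value_and_grad(compute_loss)(params, batch)
    params = jax.tree_util.tree_map(lambda p, g: p - learning_rate * g, params, grads)
    return params, loss

params = {"w": jnp.zeros((3,))}
batch = {"x": jnp.ones((8, 3)), "y": jnp.zeros((8,))}
params, loss = train_step(params, batch)
```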
* fix example code
* typo
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Tensorflow MLM example
* Add CLM example
* Style fixes, adding missing checkpoint code from the CLM example
* Fix TPU training, avoid massive dataset warnings
* Fix incorrect training length calculation for multi-GPU training
* Fix incorrect training length calculation for multi-GPU training
* Refactors and nitpicks from the review
* Style pass
* Adding README
Previously the code could not be used for validation only because of this line:
`extension = data_args.train_file.split(".")[-1]`
It assumed the extension must be extracted from the training file, and it ran regardless of the user's training or validation options. This led to an error when the user only wanted to run evaluation and did not want to train (because the training file does not exist). I modified it to extract the extension from the training file when the user wants to train and from the validation file when the user wants to run evaluation. This way the code can be used for training and validation separately.
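A sketch of the described change, assuming the usual `do_train` / `train_file` / `validation_file` arguments of the example scripts:

```python
# Derive the dataset extension from whichever file the run actually needs,
# so evaluation-only runs no longer require a training file.
if training_args.do_train:
    extension = data_args.train_file.split(".")[-1]
else:
    extension = data_args.validation_file.split(".")[-1]
```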
* [WIP] Model card defaults
* finetuned_from default value
* Add all mappings to the mapping file
* Be more defensive on finetuned_from arg
* Add default task tag
* Separate tags from tasks
* Edge case for dataset
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Use text_column_name variable instead of "text"
`text_column_name` was already defined above the lines I changed, and it was also used below them.
This is a very minor change. If a dataset does not use "text" as its column name, `tokenize_function` will now use whatever column is assigned to `text_column_name`; `text_column_name` falls back to the first column name when "text" is not among the columns. This makes the function a little more robust, though I would assume that 90%+ of datasets use "text" anyway.
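A small runnable sketch of that selection logic (the toy dataset and the tokenizer checkpoint are assumptions, not the example's data):

```python
from datasets import Dataset
from transformers import AutoTokenizer

# Toy dataset whose text column is not called "text".
raw_dataset = Dataset.from_dict({"content": ["hello world", "flax is fun"]})
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

column_names = raw_dataset.column_names
# Fall back to the first column when the dataset has no "text" column.
text_column_name = "text" if "text" in column_names else column_names[0]

def tokenize_function(examples):
    return tokenizer(examples[text_column_name])

tokenized = raw_dataset.map(tokenize_function, batched=True, remove_columns=column_names)
```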
* black formatting
* make style
Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
* add readme for flax clm
* use section link for tokenizer
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update metrics
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Pushing partially-complete new GLUE example
* First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.
* Fix to the fit() call
* Bugfixes, making sure TPU and multi-GPU support is ready
* Remove logger line that depends on PyTorch
* Style pass
* Deleting old TF GLUE example
* Include label2id and id2label in the saved model config
* Don't clobber the existing model.config.label2id
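One way to express that guard, sketched against the placeholder mapping that `PretrainedConfig` generates by default (this illustrates the intent; `model`, `num_labels`, and `label_list` are assumed to exist in the script):

```python
from transformers import PretrainedConfig

# Only overwrite the config's label mapping when it still holds the
# auto-generated "LABEL_0", "LABEL_1", ... placeholders.
if model.config.label2id == PretrainedConfig(num_labels=num_labels).label2id:
    model.config.label2id = {label: i for i, label in enumerate(label_list)}
    model.config.id2label = {i: label for label, i in model.config.label2id.items()}
```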
* Style fixes
* Update examples/tensorflow/text-classification/run_glue.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* updated the original RAG implementation to be compatible with the latest PL version
* updated the requirements.txt file
* execute make style
* code quality test
* code quality
* conflict resolved in requirements.txt
* code quality
* changed the MyDDP class name to CustomDDP
* Fix weight decay masking in `run_flax_glue.py`
Issues with the previous implementation:
- The `dict` from `traverse_util.flatten_dict` has keys which are tuples of strings, not one long string with the path separated by periods.
- `optax.masked` applies the transformation wherever the mask is True, so the masks are flipped.
- Flax's LayerNorm calls the scale parameter `scale`, not `weight` (see the sketch below).
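A minimal sketch of masking consistent with the three points above (the parameter names and the optimizer chain are illustrative, not the exact `run_flax_glue.py` code):

```python
import optax
from flax import traverse_util

def decay_mask_fn(params):
    flat_params = traverse_util.flatten_dict(params)
    # Keys are tuples of path components, e.g. ("layer_norm", "scale"),
    # so inspect the last component rather than a dot-joined string.
    flat_mask = {path: path[-1] not in ("bias", "scale") for path in flat_params}
    return traverse_util.unflatten_dict(flat_mask)

# optax.masked applies the wrapped transformation where the mask is True,
# so the mask must be True exactly for the parameters that should be decayed.
tx = optax.chain(
    optax.masked(optax.add_decayed_weights(1e-2), decay_mask_fn),
    optax.sgd(learning_rate=1e-3),
)
```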
* Fix formatting with black
* adapt results
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* initial
* code quality test
* code quality
* added test functions in test_modeling_rag.py and test_retrieval_rag.py to test the end2end retriever
* minor change in test_modeling_rag
* fixed tests
* Update examples/research_projects/rag-end2end-retriever/README.md
typo corrected as suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update examples/research_projects/rag-end2end-retriever/finetune_rag.py
type change suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update src/transformers/models/rag/retrieval_rag.py
Adding this change as mentioned by lhoestq.
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* completed the minor changes suggested by the reviewers
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Remove redundant `nn.log_softmax` in `run_flax_glue.py`
`optax.softmax_cross_entropy` expects unnormalized logits and already applies `nn.log_softmax` internally, so I believe the extra call is not needed here. Since `nn.log_softmax` is idempotent, it should not have made a mathematical difference.
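A small runnable sketch of the loss after the change (shapes and names are illustrative):

```python
import jax
import jax.numpy as jnp
import optax

def loss_fn(logits, labels, num_classes):
    # Raw, unnormalized logits go straight into optax.softmax_cross_entropy;
    # it normalizes internally, so no explicit nn.log_softmax call is needed.
    one_hot = jax.nn.one_hot(labels, num_classes)
    return jnp.mean(optax.softmax_cross_entropy(logits, one_hot))

logits = jnp.array([[2.0, -1.0], [0.5, 0.5]])
labels = jnp.array([0, 1])
print(loss_fn(logits, labels, num_classes=2))
```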
* Remove unused 'flax.linen' import
* add separator for Windows
* fixes test_is_copy_consistent on Windows
* fix file-writing encoding issue in the extended test (for Windows)
* resolving comments
* Adds Flax BERT finetuning example
* fix traced jax tensor type
* Use Optax losses and learning rate schedules
* Add 1GPU training results
* merge into master & make style
* fix input
* del file
* Fix bug in loss and add torch runs
* finish bert flax fine-tune
* Update examples/flax/text-classification/README.md
* Update examples/flax/text-classification/run_flax_glue.py
* add requirements
* finalize
* finalize
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Autogenerate model cards from the Trainer
* ModelCard deprecated
* Fix test
* Style
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments
* Quality
* With all metadata
* Metadata
* Post-merge conflict mess
* Data args and all examples
* Default license and languages when possible
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Set generator in dataloader
* Use generator in all random samplers
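A plain-PyTorch sketch of the pattern (it illustrates passing an explicit generator to a random sampler; it is not the Trainer's exact code):

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

dataset = TensorDataset(torch.arange(10))

# A dedicated generator makes the shuffling order reproducible and gives a
# single RNG state that can be saved and restored together with a checkpoint.
generator = torch.Generator()
generator.manual_seed(42)

sampler = RandomSampler(dataset, generator=generator)
loader = DataLoader(dataset, sampler=sampler, batch_size=4)
```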
* Checkpoint all RNG states
* Final version
* Quality
* Test
* Address review comments
* Quality
* Remove debug util
* Add python and numpy RNGs
* Split states in different files in distributed
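A hypothetical sketch of saving those RNG states at checkpoint time (the file name and layout are illustrative; as the commits describe, each process would write its own file in distributed runs):

```python
import os
import random
import numpy as np
import torch

def save_rng_state(checkpoint_dir, process_index=None):
    rng_states = {
        "python": random.getstate(),
        "numpy": np.random.get_state(),
        "cpu": torch.random.get_rng_state(),
    }
    if torch.cuda.is_available():
        rng_states["cuda"] = torch.cuda.random.get_rng_state_all()
    # One file per process in distributed training, a single file otherwise.
    name = "rng_state.pth" if process_index is None else f"rng_state_{process_index}.pth"
    torch.save(rng_states, os.path.join(checkpoint_dir, name))
```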
* Quality
* local_rank for TPUs
* Only use generator when accepted
* Add test
* Set seed to avoid flakiness
* Make test less flaky
* Quality
* add flax roberta
* make style
* correct initialization
* modify model to save weights
* fix copied from
* fix copied from
* correct some more code
* add more roberta models
* Apply suggestions from code review
* merge from master
* finish
* finish docs
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
The error comes from an inconsistency between the variable holding the number of GPUs in the parser ('gpus') and the name actually used in the train.py script ('n_gpu'); correcting this makes the example work.
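A tiny sketch of the kind of mismatch described (the argument names are taken from the message; the surrounding code is illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
# The parser defines the flag as --gpus, so the parsed value lives in args.gpus ...
parser.add_argument("--gpus", type=int, default=1)
args = parser.parse_args([])

# ... while the training script read it under a different name (args.n_gpu),
# which fails because that attribute does not exist. The fix is to use one
# consistent name on both sides.
num_gpus = args.gpus
```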
* Initial support for upload to hub
* push -> upload
* Fixes + examples
* Fix torchhub test
* Torchhub test I hate you
* push_model_to_hub -> push_to_hub
* Apply mixin to other pretrained models
* Remove ABC inheritance
* Add tests
* Typo
* Run tests
* Install git-lfs
* Change approach
* Add push_to_hub to all
* Staging test suite
* Typo
* Maybe like this?
* More deps
* Cache
* Adapt name
* Quality
* MOAR tests
* Put it in testing_utils
* Docs + torchhub last hope
* Styling
* Wrong method
* Typos
* Update src/transformers/file_utils.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Address review comments
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>