* offline mode start
* add specific values
* fix fallback
* add test
* better values check and range
* test that actually works
* document the offline mode
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* more strict check
* cleaner test
* pt-only test
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
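For reference, a minimal sketch of how the offline mode documented above is meant to be used, assuming the `TRANSFORMERS_OFFLINE` / `HF_DATASETS_OFFLINE` environment variables and the per-call `local_files_only` flag:

```python
import os

# Set before importing transformers so the flags are picked up everywhere.
os.environ["TRANSFORMERS_OFFLINE"] = "1"   # never hit the Hub, use the local cache only
os.environ["HF_DATASETS_OFFLINE"] = "1"    # same for datasets

from transformers import AutoModel, AutoTokenizer

# Works as long as the files are already in the local cache;
# local_files_only=True is the per-call equivalent of the env var.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", local_files_only=True)
model = AutoModel.from_pretrained("bert-base-uncased", local_files_only=True)
```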
* fix run_seq2seq.py; port the DeepSpeed tests to it
* unrefactor
* defensive programming
* defensive programming 2
* port the rest of the trainer tests
* style
* a cleaner scripts dir finder
* cleanup
* remove xnli_compute_metrics; add load_dataset, load_metric, set_seed, and metric.compute
* fix
* fix
* fix
* push
* fix
* everything works
* fix init
* fix
* special treatment for sepconv1d
* style
* 🙏🏽
* add doc and cleanup
* fix doc
* fix doc again
* fix doc again
* Apply suggestions from code review
* make style
* Proposal that should work
* Remove needless code
* Fix test
* Apply suggestions from code review
* remove xnli_compute_metrics; add load_dataset, load_metric, set_seed, and metric.compute
* amend README
* remove data_args.task_name and replace it with task_name = "xnli"; use the split argument to load the train and validation datasets separately; remove __post_init__; remove the --task_name flag from the README.
* remove the task_to_keys dict and use the string "xnli" instead of the task_name variable; change preprocess_function to use examples["premise"] and examples["hypothesis"] directly and drop sentence1_key/sentence2_key; change compute_metrics to cover only the accuracy metric; handle the case where train_language is None when calling datasets.load_dataset()
* removed `torch.distributed.barrier()` and `import torch` as `from_pretrained` is able to do the work; amend README
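A rough sketch of the data/metric flow these XNLI commits describe (not the full run_xnli.py; `preprocess_function` and `compute_metrics` follow the names used in the commit messages above):

```python
from datasets import load_dataset, load_metric

train_language = "de"  # illustrative choice; the script falls back when this is None
raw_train = load_dataset("xnli", train_language, split="train")
raw_valid = load_dataset("xnli", train_language, split="validation")

def preprocess_function(examples, tokenizer, max_length=128):
    # premise/hypothesis are used directly, with no task_to_keys indirection
    return tokenizer(examples["premise"], examples["hypothesis"],
                     truncation=True, max_length=max_length)

metric = load_metric("xnli")  # accuracy only

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = predictions.argmax(axis=-1)
    return metric.compute(predictions=predictions, references=labels)
```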
* change tokenizer requirement
* split line
* Correct typo from list to str
* improve style
* make other function pretty as well
* add comment
* correct typo
* add new test
* pass tests for tok without padding token
* Apply suggestions from code review
* MOD: fit Chinese wwm to new datasets
* MOD: move wwm to new folder
* MOD: format code
* Styling
* MOD: add param and recover trainer
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* Pad to 8x for fp16 multiple choice example (#9752)
* Pad to 8x for fp16 squad trainer example (#9752)
* Pad to 8x for fp16 ner example (#9752)
* Pad to 8x for fp16 swag example (#9752)
* Pad to 8x for fp16 qa beam search example (#9752)
* Pad to 8x for fp16 qa example (#9752)
* Pad to 8x for fp16 seq2seq example (#9752)
* Pad to 8x for fp16 glue example (#9752)
* Pad to 8x for fp16 new ner example (#9752)
* update script template #9752
* Update examples/multiple-choice/run_swag.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update examples/question-answering/run_qa.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update examples/question-answering/run_qa_beam_search.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve code quality #9752
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
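The #9752 commits above round dynamic padding up to a multiple of 8 when fp16 is enabled, so tensor cores see well-shaped matmuls. A minimal sketch using DataCollatorWithPadding:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
fp16 = True  # would come from TrainingArguments in the real scripts

# Pad each batch to the longest sequence, rounded up to a multiple of 8 under fp16.
data_collator = DataCollatorWithPadding(tokenizer, pad_to_multiple_of=8 if fp16 else None)

batch = data_collator([tokenizer("a short sentence"), tokenizer("another example")])
assert batch["input_ids"].shape[1] % 8 == 0
```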
* Auto-resume training from checkpoint
* Update examples/text-classification/run_glue.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Roll out to other examples
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
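A condensed sketch of the auto-resume behaviour rolled out to the examples, assuming the get_last_checkpoint helper from transformers.trainer_utils: if the output dir already holds a checkpoint, training restarts from it instead of from scratch.

```python
import os
from transformers.trainer_utils import get_last_checkpoint

def resolve_checkpoint(training_args):
    # Only look for a checkpoint when reusing an existing, non-overwritten output dir.
    last_checkpoint = None
    if os.path.isdir(training_args.output_dir) and not training_args.overwrite_output_dir:
        last_checkpoint = get_last_checkpoint(training_args.output_dir)
    return last_checkpoint

# later in the script:
# trainer.train(resume_from_checkpoint=resolve_checkpoint(training_args))
```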
* Switch metrics in run_ner to datasets
* Add flag to return all metrics
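A hypothetical, condensed version of the run_ner metric switch: seqeval loaded through datasets, plus a flag (called return_entity_level_metrics here) that either keeps only the overall scores or flattens every per-entity score:

```python
from datasets import load_metric

metric = load_metric("seqeval")
return_entity_level_metrics = False  # the new flag

def compute_ner_metrics(true_predictions, true_labels):
    results = metric.compute(predictions=true_predictions, references=true_labels)
    if return_entity_level_metrics:
        # flatten nested dicts such as {"PER": {"precision": ...}} into "PER_precision"
        flat = {}
        for key, value in results.items():
            if isinstance(value, dict):
                for name, score in value.items():
                    flat[f"{key}_{name}"] = score
            else:
                flat[key] = value
        return flat
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
```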
* Upstream (and rename) sortish_sampler
* Revert "Upstream (and rename) sortish_sampler"
This reverts commit e07d0dcf65.
* Update run_glue for do_predict with local test data (#9442)
* Update run_glue (#9442): fix comments ('files' to 'a file')
* Update run_glue (#9442): reflect the code review
* Update run_glue (#9442): auto format
* Update run_glue (#9442): reflect the code review
* fix a bug in eval_batch_retrieval
* should return parser, as the other staticmethods do
* remove duplicate argument
* these kwargs are no longer accepted (they cause a TypeError in self.generator.generate of modeling_rag.py)
* fixed file paths in README
* moved an arg to add_ray_specific_args
* Add label smoothing in Trainer
* Add options for scheduler and Adafactor in Trainer
* Put Seq2SeqTrainer in the main lib
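A sketch of the new Trainer options named above, using argument names from TrainingArguments (the values here are only examples):

```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="out",
    label_smoothing_factor=0.1,   # label smoothing handled inside the Trainer
    adafactor=True,               # use Adafactor instead of AdamW
    lr_scheduler_type="cosine",   # pick the LR schedule by name
    warmup_steps=500,
)
# trainer = Seq2SeqTrainer(model=model, args=training_args, train_dataset=train_ds, ...)
```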
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments and adapt scripts
* Documentation
* Move test not using script to tests folder
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update the README of the text classification example
* Update examples/README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adapt comment from review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add new run_swag example
* Add check
* Add sample
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Very important change to make Lysandre happy
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* trainer and finetune_trainer enhancements and fixes
* add fallback default
* move the fixing of incorrect keys back into finetune trainer
* s/eval/val/ to match the split
* trainer can now use a different prefix than eval_ for metrics
* document new arg
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* use 'eval' as the default for metric_key_prefix
* finish adjusting var names + disambiguate
* fix logger
* add clarifying comment
* add clarifying comment
* style
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/trainer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* complete removal of optional for metric_key_prefix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
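A sketch of the metric_key_prefix change, assuming a Trainer and eval/test datasets have already been built as in the example scripts: evaluate/predict can prefix their metrics with something other than "eval_", e.g. to distinguish validation from test.

```python
val_metrics = trainer.evaluate(eval_dataset=val_dataset, metric_key_prefix="val")
# -> {"val_loss": ..., "val_runtime": ..., ...}

test_metrics = trainer.predict(test_dataset, metric_key_prefix="test").metrics
# -> {"test_loss": ...}; when the argument is omitted the default prefix stays "eval"
```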
* Reorganize example folder
* Continue reorganization
* Change requirements for tests
* Final cleanup
* Finish regroup with tests all passing
* Copyright
* Requirements and readme
* Make a full link for the documentation
* Address review comments
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add symlink
* Reorg again
* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Adapt title
* Update to new structure
* Remove test
* Update READMEs
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Remove "Model" suffix from Flax models to look more 🤗
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Initial working (forward + backward) for Flax MLM training example.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Simplify code
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing comments, using module and moving to LM task.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Restore parameter name "module", which was wrongly renamed to "model".
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Restore correct output ordering...
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Actually commit the example 😅
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add FlaxBertModelForMaskedLM after rebasing.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make it possible to initialize the training from scratch
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Reuse flax linen example of cross entropy loss
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
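A sketch of the cross-entropy loss in the style of the Flax linen examples (one-hot targets against a log-softmax over the vocabulary):

```python
import jax.numpy as jnp
from jax import nn

def cross_entropy_loss(logits, labels, num_classes):
    one_hot = nn.one_hot(labels, num_classes)        # (batch, seq, vocab)
    log_probs = nn.log_softmax(logits, axis=-1)
    return -jnp.mean(jnp.sum(one_hot * log_probs, axis=-1))
```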
* Added specific data collator for flax
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove todo for data collator
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added evaluation step
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added ability to provide dtype to support bfloat16 on TPU
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable flax tensorboard output
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable jax.pmap support.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Ensure batches are correctly sized to be dispatched with jax.pmap
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
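A sketch of the batch-sizing step for jax.pmap: trim the remainder so the batch divides evenly across devices, then add a leading device axis (shard_batch is an illustrative name, not the example's exact helper):

```python
import jax
import jax.numpy as jnp

def shard_batch(batch):
    n_devices = jax.local_device_count()
    def _shard(x):
        x = x[: (x.shape[0] // n_devices) * n_devices]   # drop the remainder
        return x.reshape((n_devices, -1) + x.shape[1:])   # (devices, per_device, ...)
    return jax.tree_util.tree_map(_shard, batch)

batch = {"input_ids": jnp.zeros((10, 128), dtype=jnp.int32)}
sharded = shard_batch(batch)  # e.g. input_ids becomes (8, 1, 128) on a v3-8 TPU
```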
* Enable bfloat16 with --fp16 cmdline args
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correctly export metrics to tensorboard
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added dropout and ability to use it.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Effectively enable and disable dropout during the training and evaluation steps.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Oops.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable specifying kernel initializer scale
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added warmup step to the learning rate scheduler.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
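A generic sketch of a warmup schedule of this kind (not the exact implementation): linear ramp-up for warmup_steps, then linear decay to zero over the remaining steps.

```python
def lr_schedule(step, base_lr=1e-4, warmup_steps=1000, total_steps=100_000):
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 1.0 - progress)
```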
* Fix typo.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Print training loss
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix linter issue (flake8)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix model matching
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix dummies
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix non default dtype on Flax models
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use the same create_position_ids_from_input_ids for FlaxRoberta
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make Roberta attention as Bert
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix copy
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Wording.
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
* Add new SQUAD example
* Same with a task-specific Trainer
* Address review comment.
* Small fixes
* Initial work for XLNet
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Final clean up and working XLNet script
* Test and debug
* Final working version
* Add tick
* Update README
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Without this fix, training a `BartForSequenceClassification` model with `run_pl_glue.py` gives `TypeError: forward() got an unexpected keyword argument 'token_type_ids'`, because BART does not have token_type_ids. I've solved this issue the same way it is solved for the "distilbert" model, and I can now train BART models on SNLI without errors.
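A sketch of the fix described above; the helper name and the model-type set are illustrative, not the exact code in run_pl_glue.py:

```python
# Only forward token_type_ids for model types that actually accept them
# (BART and DistilBERT do not).
MODELS_WITHOUT_TOKEN_TYPE_IDS = {"bart", "distilbert"}  # illustrative, not exhaustive

def build_inputs(batch, model_type):
    inputs = {
        "input_ids": batch["input_ids"],
        "attention_mask": batch["attention_mask"],
        "labels": batch["labels"],
    }
    if model_type not in MODELS_WITHOUT_TOKEN_TYPE_IDS:
        inputs["token_type_ids"] = batch["token_type_ids"]
    return inputs
```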
* fix DP case on multi-gpu
* make executable
* test all 3 modes
* use the correct check for distributed
* dp doesn't need a special case
* restore original name
* cleanup
* implement support for run-time dependency version checking
* try not escaping !
* use findall that works on py36
* small tweaks
* autoformatter worship
* simplify
* shorter names
* add support for non-versioned checks
* add deps
* revert
* tokenizers not required, check version only if installed
* make a proper distutils cmd and add make target
* tqdm must be checked before tokenizers
* work around the peculiar DistributionNotFound setup
* handle the rest of packages in setup.py
* fully sync setup.py's install_requires - to check them all
* nit
* make install_requires more readable
* typo
* Update setup.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* restyle
* add types
* simplify
* simplify2
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
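A minimal sketch of run-time dependency checking of the kind this PR adds, written with importlib.metadata and packaging; the helper name and parsing here are illustrative, not the library's exact implementation:

```python
import operator
from importlib import metadata
from packaging import version

# Two-character operators must be tried before their one-character prefixes.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq, ">": operator.gt, "<": operator.lt}

def require_version(requirement: str) -> None:
    """Check e.g. 'tqdm>=4.27'; raise if the package is missing or too old."""
    for op_str in (">=", "<=", "==", ">", "<"):
        if op_str in requirement:
            pkg, wanted = requirement.split(op_str)
            break
    else:
        pkg, op_str, wanted = requirement, None, None   # non-versioned check
    try:
        got = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        raise ImportError(f"{pkg} is required but not installed")
    if op_str and not OPS[op_str](version.parse(got), version.parse(wanted)):
        raise ImportError(f"{pkg}{op_str}{wanted} is required, found {pkg}=={got}")

require_version("tqdm>=4.27")   # versioned check
require_version("regex")        # non-versioned check: just has to be installed
```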
* Support BERT relative position embeddings
* Fix typo in README.md
* Address review comment
* Fix failing tests
* [tiny] Fix style_doc.py check by adding an empty line to configuration_bert.py
* make fix copies
* fix configs of electra and albert and fix longformer
* remove copy statement from longformer
* fix albert
* fix electra
* Add bert variants forward tests for various position embeddings
* [tiny] Fix style for test_modeling_bert.py
* improve docstring
* [tiny] improve docstring and remove unnecessary dependency
* [tiny] Remove unused import
* re-add to ALBERT
* make embeddings work for ALBERT
* add test for albert
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
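A sketch of how the new position-embedding variants are selected through the config (position_embedding_type, with "absolute" remaining the default; the same option is wired into ALBERT and ELECTRA per the commits above):

```python
from transformers import BertConfig, BertModel

# "relative_key" or "relative_key_query" enable the relative position embeddings.
config = BertConfig(position_embedding_type="relative_key")
model = BertModel(config)
```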
* replace init_ddp_connection for index init
* style
* add finetune test
* add test data
* move generate tensors to device
* add test on EM metric
* style
* allow multi process test
* keep gloo process group for retrieval
* add multi-gpu test
* use custom accelerator
* clean test finetune
* minor
* style
* style
* typo
* use a python call instead of the imported main function
* return_dict fix in modeling_rag
* use float32 in retrieval
* store as float32 as well in the custom knowledge dataset example
* style
* rename to finetune_rag
* style
* update readme
* rename utils and callbacks to utils_rag and callbacks_rag
* fix test
* patrick's comments
* generate dummy data in the finetune test script
* remove dummy data files
* style