* Fixing roberta for slow-fast tests
* WIP getting equivalence on pipelines
* slow-to-fast equivalence - working on question-answering pipeline
* optional FAISS tests
* Pipeline Q&A
* Move pipeline tests to their own test job again
* update tokenizer to add sequence id methods
* update to tokenizers 0.9.4
* set sentencepiecce as optional
* clean up squad
* clean up pipelines to use sequence_ids
* style/quality
* wording
* Switch to use_fast = True by default
* update tests for use_fast at True by default
* fix rag tokenizer test
* removing protobuf from required dependencies
* fix NER test for use_fast = True by default
* fixing example tests (Q&A examples use slow tokenizers for now)
* protobuf in main deps extras["sentencepiece"] and example deps
* fix protobug install test
* try to fix seq2seq by switching to slow tokenizers for now
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
The new run_ner.py script tries to run prediction on the input
test set `datasets["test"]`, but it should be the tokenized set
`tokenized_datasets["test"]`
* add a multi-gpu job for all example tests
* run only ported tests
* rename
* explain why env is re-activated on each step
* mark all unported/checked tests with @require_torch_non_multigpu_but_fix_me
* style
* Apply suggestions from code review
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* add training tests
* correct longformer
* fix docs
* fix some tests
* fix some more train tests
* remove ipdb
* fix multiple edge case model training
* fix funnel and prophetnet
* clean gpt models
* undo renaming of albert
* Add new token classification example
* Remove txt file
* Add test
* With actual testing done
* Less warmup is better
* Update examples/token-classification/run_ner_new.py
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Address review comments
* Fix test
* Make Lysandre happy
* Last touches and rename
* Rename in tests
* Address review comments
* More run_ner -> run_ner_old
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* use decorator
* remove hardcoded paths
* make the test use more data and do real quality tests
* shave off 10 secs
* add --eval_beams 2, reformat
* reduce train size, use smaller custom dataset
* change TokenClassificationTask class methods to static methods
Since we do not require self in the class methods of TokenClassificationTask we should probably switch to static methods. Also, since the class TokenClassificationTask does not contain a constructor it is currently unusable as is. By switching to static methods this fixes the issue of having to document the intent of the broken class.
Also, since the get_labels and read_examples_from_file methods are ought to be implemented. Static method definitions are unchanged even after inheritance, which means that it can be overridden, similar to other class methods.
* Trigger Build
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* make it possible to invoke testconf.py in both test suites without crashing on having the same option added
* perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts
* add `pytest --make-reports` to all CIs (and artifacts)
* fix
* Make line by line optional in run_mlm
* Add option to disable dynamic padding
* Add option to plm too and update README
* Typos
* More typos
* Even more typos
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Finish the cleanup of the language-modeling examples
* Update main README
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Propagate changes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Add a template for example scripts and apply it to mlm
* Formatting
* Fix test
* Add plm script
* Add a template for example scripts and apply it to mlm
* Formatting
* Fix test
* Add plm script
* Add a template for example scripts and apply it to mlm
* Formatting
* Fix test
* Add plm script
* Styling
* move the helper code into testing_utils
* port test_trainer_distributed to work with pytest
* improve docs
* simplify notes
* doc
* doc
* style
* doc
* further improvements
* torch might not be available
* real fix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* New run_clm script
* Formatting
* More comments
* Remove unused imports
* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Address review comments
* Change link to the hub
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>