* Optimized banned token masking
* Avoid duplicate EOS masking if in bad_words_id
* Updated mask generation to handle empty banned token list
* Addition of unit tests for the updated bad_words_ids masking
* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test
* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test (timeout does not work on Windows)
* Moving Marian import to the test context to allow TF only environments to run
* Moving imports to torch_available test
* Updated operations device and test
* Updated operations device and test
* Added docstring and comment for in-place scores modification
* Moving test to own test_generation_utils, use of lighter models for testing
* removed unneded imports in test_modeling_common
* revert formatting change for ModelTesterMixin
* Updated caching, simplified eos token id test, removed unnecessary @require_torch
* formatting compliance
* Warn if debug requested without TPU fixes (#6308)
Check whether a PyTorch compatible TPU is available before attempting to print TPU metrics after training has completed. This way, users who apply `--debug` without reading the documentation aren't suprised by a stacktrace.
* Style
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* add pl_glue example test
* for now just test that it runs, next validate results of eval or predict?
* complete the run_pl_glue test to validate the actual outcome
* worked on my machine, CI gets less accuracy - trying higher epochs
* match run_pl.sh hparms
* more epochs?
* trying higher lr
* for now just test that the script runs to a completion
* correct the comment
* if cuda is available, add --fp16 --gpus=1 to cover more bases
* style
* Chunked feed forward for Bert
This is an initial implementation to test applying feed forward chunking for BERT.
Will need additional modifications based on output and benchmark results.
* Black and cleanup
* Feed forward chunking in BertLayer class.
* Isort
* add chunking for all models
* fix docs
* Fix typo
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* improve names and tests longformer
* more and better tests for longformer
* add first tf test
* finalize tf basic op functions
* fix merge
* tf shape test passes
* narrow down discrepancies
* make longformer local attn tf work
* correct tf longformer
* add first global attn function
* add more global longformer func
* advance tf longformer
* finish global attn
* upload big model
* finish all tests
* correct false any statement
* fix common tests
* make all tests pass except keras save load
* fix some tests
* fix torch test import
* finish tests
* fix test
* fix torch tf tests
* add docs
* finish docs
* Update src/transformers/modeling_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply Lysandres suggestions
* reverse to assert statement because function will fail otherwise
* applying sylvains recommendations
* Update src/transformers/modeling_longformer.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
as discussed with @sshleifer, removing this TODO to switch to a tiny model, since it won't be able to test the results of the evaluation (i.e. the results are meaningless).
* Add a script to check all models are tested and documented
* Apply suggestions from code review
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Address comments
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>