transformers/tests
Thomas Wolf 9aeacb58ba
Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)
* [WIP] SP tokenizers

* fixing tests for T5

* WIP tokenizers

* serialization

* update T5

* WIP T5 tokenization

* slow to fast conversion script

* Refactoring to move tokenizer implementations inside transformers

* Adding gpt - refactoring - quality

* WIP adding several tokenizers to the fast world

* WIP Roberta - moving implementations

* update to dev4 switch file loading to in-memory loading

* Updating and fixing

* advancing on the tokenizers - updating do_lower_case

* style and quality

* moving forward with tokenizers conversion and tests

* MBart, T5

* dumping the fast version of transformer XL

* Adding to autotokenizers + style/quality

* update init and space_between_special_tokens

* style and quality

* bump up tokenizers version

* add protobuf

* fix pickle Bert JP with Mecab

* fix newly added tokenizers

* style and quality

* fix bert japanese

* fix funnel

* limit tokenizer warning to one occurrence

* clean up file

* fix new tokenizers

* fast tokenizers deep tests

* WIP adding all the special fast tests on the new fast tokenizers

* quick fix

* adding more fast tokenizers in the fast tests

* all tokenizers in fast version tested

* Adding BertGenerationFast

* bump up setup.py for CI

* remove BertGenerationFast (too early)

* bump up tokenizers version

* Clean old docstrings

* Typo

* Update following Lysandre comments

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2020-10-08 11:32:16 +02:00
fixtures Albert pretrain datasets/ datacollator (#6168) 2020-09-10 07:56:29 -04:00
__init__.py GPU text generation: Moved the encoded_prompt to correct device 2020-01-06 15:11:12 +01:00
conftest.py ignore FutureWarning in tests (#7079) 2020-09-14 07:50:51 -04:00
test_activations_tf.py Refactoring the TF activations functions (#7150) 2020-09-16 07:03:47 -04:00
test_activations.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_benchmark_tf.py [Benchmarks] Change all args from no_... to their positive form (#7075) 2020-09-23 13:25:24 -04:00
test_benchmark.py [Benchmarks] Change all args from no_... to their positive form (#7075) 2020-09-23 13:25:24 -04:00
test_cli.py [transformers-cli] fix logger getter (#6777) 2020-08-27 20:01:17 -04:00
test_configuration_auto.py Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
test_configuration_common.py Pass kwargs to configuration (#3147) 2020-03-05 17:16:57 -05:00
test_data_collator.py Mark big downloads slow (#7325) 2020-09-22 12:21:52 -04:00
test_doc_samples.py Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
test_hf_api.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_hf_argparser.py parse arguments from dict (#4869) 2020-07-31 04:44:23 -04:00
test_logging.py adding TRANSFORMERS_VERBOSITY env var (#6961) 2020-09-09 04:08:01 -04:00
test_model_card.py GPU text generation: Moved the encoded_prompt to correct device 2020-01-06 15:11:12 +01:00
test_model_output.py Add tests and fix various bugs in ModelOutput (#7073) 2020-09-11 12:01:33 -04:00
test_modeling_albert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_auto.py Blenderbot (#7418) 2020-10-07 19:09:23 -04:00
test_modeling_bart.py Fix 3 failing slow bart/blender tests (#7652) 2020-10-07 22:05:03 -04:00
test_modeling_bert_generation.py Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. (#6594) 2020-09-10 16:40:51 +02:00
test_modeling_bert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_blenderbot.py Fix 3 failing slow bart/blender tests (#7652) 2020-10-07 22:05:03 -04:00
test_modeling_camembert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_common.py Blenderbot (#7418) 2020-10-07 19:09:23 -04:00
test_modeling_ctrl.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_deberta.py Add DeBERTa model (#5929) 2020-09-30 07:07:30 -04:00
test_modeling_distilbert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_dpr.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_electra.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_encoder_decoder.py clean naming (#7068) 2020-09-11 09:57:53 +02:00
test_modeling_flaubert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_fsmt.py [Seq2Seq] Fix a couple of bugs and clean examples (#7474) 2020-10-01 17:38:50 +02:00
test_modeling_funnel.py Fix FP16 and attention masks in FunnelTransformer (#7374) 2020-09-25 12:20:39 -04:00
test_modeling_gpt2.py Add GPT2ForSequenceClassification based on DialogRPT (#7501) 2020-10-06 17:31:21 -04:00
test_modeling_layoutlm.py Add LayoutLM Model (#7064) 2020-09-22 09:28:02 -04:00
test_modeling_longformer.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_lxmert.py fix (#6946) 2020-09-04 16:08:54 +02:00
test_modeling_marian.py Fix marian slow test (#6854) 2020-08-31 16:10:43 -04:00
test_modeling_mbart.py Blenderbot (#7418) 2020-10-07 19:09:23 -04:00
test_modeling_mobilebert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_openai.py [Generate] Facilitate PyTorch generate using ModelOutputs (#6735) 2020-09-01 12:38:25 +02:00
test_modeling_pegasus.py Enable pegasus fp16 by clamping large activations (#7243) 2020-10-01 04:48:37 -04:00
test_modeling_rag.py [RAG] Fix retrieval offset in RAG's HfIndex and better integration tests (#7372) 2020-09-25 16:12:46 +02:00
test_modeling_reformer.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_roberta.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_squeezebert.py SqueezeBERT architecture (#7083) 2020-10-05 04:25:43 -04:00
test_modeling_t5.py [Seq2Seq] Fix a couple of bugs and clean examples (#7474) 2020-10-01 17:38:50 +02:00
test_modeling_tf_albert.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_tf_auto.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_tf_bert.py Custom TF weights loading (#7422) 2020-10-05 09:58:45 -04:00
test_modeling_tf_camembert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_common.py [Seq2Seq] Fix a couple of bugs and clean examples (#7474) 2020-10-01 17:38:50 +02:00
test_modeling_tf_ctrl.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_distilbert.py test_tf_common: remove un_used mixin class parameters (#6866) 2020-09-02 10:54:40 -04:00
test_modeling_tf_electra.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_flaubert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_funnel.py Fix saving TF custom models (#7291) 2020-09-22 09:31:13 -04:00
test_modeling_tf_gpt2.py [Seq2Seq] Fix a couple of bugs and clean examples (#7474) 2020-10-01 17:38:50 +02:00
test_modeling_tf_longformer.py test_tf_common: remove un_used mixin class parameters (#6866) 2020-09-02 10:54:40 -04:00
test_modeling_tf_lxmert.py Adding the LXMERT pretraining model (MultiModal languageXvision) to HuggingFace's suite of models (#5793) 2020-09-03 04:02:25 -04:00
test_modeling_tf_mobilebert.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_tf_openai.py [Generate] Facilitate PyTorch generate using ModelOutputs (#6735) 2020-09-01 12:38:25 +02:00
test_modeling_tf_roberta.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_t5.py [Seq2Seq] Fix a couple of bugs and clean examples (#7474) 2020-10-01 17:38:50 +02:00
test_modeling_tf_transfo_xl.py test_tf_common: remove un_used mixin class parameters (#6866) 2020-09-02 10:54:40 -04:00
test_modeling_tf_xlm_roberta.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_tf_xlm.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_xlnet.py test_tf_common: remove un_used mixin class parameters (#6866) 2020-09-02 10:54:40 -04:00
test_modeling_transfo_xl.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_xlm_roberta.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_xlm.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_xlnet.py [Seq2Seq] Fix a couple of bugs and clean examples (#7474) 2020-10-01 17:38:50 +02:00
test_onnx.py Fix flaky ONNX tests (#6531) 2020-08-17 09:04:35 -04:00
test_optimization_tf.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_optimization.py Format 2020-08-27 18:31:51 +02:00
test_pipelines.py Mark big downloads slow (#7325) 2020-09-22 12:21:52 -04:00
test_retrieval_rag.py fix rag retriever save pretrained (#7399) 2020-09-25 19:47:12 +02:00
test_skip_decorators.py [testing] skip decorators: docs, tests, bugs (#7334) 2020-09-23 05:16:19 -04:00
test_tokenization_albert.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_auto.py [from_pretrained] Allow tokenizer_type ≠ model_type (#6995) 2020-09-09 04:22:59 -04:00
test_tokenization_bart.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_bert_generation.py Check decorator order (#7326) 2020-09-24 04:54:37 -04:00
test_tokenization_bert_japanese.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_bert.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_bertweet.py Add new pre-trained models BERTweet and PhoBERT (#6129) 2020-09-18 13:16:43 -04:00
test_tokenization_blenderbot.py Blenderbot (#7418) 2020-10-07 19:09:23 -04:00
test_tokenization_camembert.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_common.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_ctrl.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_deberta.py Add DeBERTa model (#5929) 2020-09-30 07:07:30 -04:00
test_tokenization_distilbert.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_dpr.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_fast.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_fsmt.py [ported model] FSMT (FairSeq MachineTranslation) (#6940) 2020-09-17 11:31:29 -04:00
test_tokenization_funnel.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_gpt2.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_layoutlm.py Add LayoutLM Model (#7064) 2020-09-22 09:28:02 -04:00
test_tokenization_lxmert.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_marian.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_mbart.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_openai.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_pegasus.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_phobert.py Add new pre-trained models BERTweet and PhoBERT (#6129) 2020-09-18 13:16:43 -04:00
test_tokenization_rag.py RAG (#6813) 2020-09-22 18:29:58 +02:00
test_tokenization_reformer.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_roberta.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_squeezebert.py SqueezeBERT architecture (#7083) 2020-10-05 04:25:43 -04:00
test_tokenization_t5.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_transfo_xl.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_utils.py Fixes to make life easier with the nlp library (#6423) 2020-08-12 08:00:56 -04:00
test_tokenization_xlm_roberta.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_xlm.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_tokenization_xlnet.py Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) 2020-10-08 11:32:16 +02:00
test_trainer_callback.py Trainer callbacks (#7596) 2020-10-07 10:50:21 -04:00
test_trainer_distributed.py Add tests to Trainer (#6605) 2020-08-20 11:13:50 -04:00
test_trainer.py Expand test to locate flakiness (#7580) 2020-10-05 09:45:47 -04:00
test_utils_check_copies.py Get a better error when check_copies fails (#7457) 2020-09-30 10:05:14 +02:00