transformers/tests
Ola Piktus c754c41c61
RAG (#6813)
* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* Formatting / renaming prior to actual work

* First commit

* improve comments

* Retrieval evaluation scripts

* refactor to include modeling outputs + MPI retriever

* Fix rag-token model + refactor

* Various fixes + finetuning logic

* use_bos fix

* Retrieval refactor

* Finetuning refactoring and cleanup

* Add documentation and cleanup

* Remove set_up_rag_env.sh file

* Fix retrieval with HF index

* Fix import errors

* Fix quality errors

* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867

* fix quality

* Fix RAG Sequence generation

* minor cleanup plus initial tests

* fix test

* fix tests 2

* Comments fix

* post-merge fixes

* Improve readme + post-rebase refactor

* Extra dependencies for tests

* Fix tests

* Fix tests 2

* Refactor test requirements

* Fix tests 3

* Post-rebase refactor

* rename nlp->datasets

* RAG integration tests

* add tokenizer to slow integration test and allow retriever to run on cpu

* add tests; fix position ids warning

* change structure

* change structure

* add from encoder generator

* save working solution

* make all integration tests pass

* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained

* don't save paths

* delete unnecessary imports

* pass config to AutoTokenizer.from_pretrained for Rag tokenizers

* init wiki_dpr only once

* hardcode legacy index and passages paths (todo: add the right urls)

* finalize config

* finalize retriever api and config api

* LegacyIndex index download refactor

* add dpr to autotokenizer

* make from pretrained more flexible

* fix ragfortokengeneration

* small name changes in tokenizer

* add labels to models

* change default index name

* add retrieval tests

* finish token generate

* align test with previous version and make all tests pass

* add tests

* finalize tests

* implement thoms suggestions

* add first version of test

* make first tests work

* make retriever platform agnostic

* naming

* style

* add legacy index URL

* docstrings + simple retrieval test for distributed

* clean model api

* add doc_ids to retriever's outputs

* fix retrieval tests

* finish model outputs

* finalize model api

* fix generate problem for rag

* fix generate for other models

* fix some tests

* save intermediate

* set generate to default

* big refactor generate

* delete rag_api

* correct pip faiss install

* fix auto tokenization test

* fix faiss install

* fix test

* move the distributed logic to examples

* model page

* docs

* finish tests

* fix dependencies

* fix import in __init__

* Refactor eval_rag and finetune scripts

* start docstring

* add psutil to test

* fix tf test

* move require torch to top

* fix retrieval test

* align naming

* finish automodel

* fix repo consistency

* test ragtokenizer save/load

* add rag model output docs

* fix ragtokenizer save/load from pretrained

* fix tokenizer dir

* remove torch in retrieval

* fix docs

* fix finetune scripts

* finish model docs

* finish docs

* remove auto model for now

* add require torch

* remove solved todos

* integrate sylvains suggestions

* sams comments

* correct mistake on purpose

* improve README

* Add generation test cases

* fix rag token

* clean token generate

* fix test

* add note to test

* fix attention mask

* add t5 test for rag

* Fix handling prefix in finetune.py

* don't overwrite index_name

Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2020-09-22 18:29:58 +02:00
fixtures Albert pretrain datasets/ datacollator (#6168) 2020-09-10 07:56:29 -04:00
__init__.py GPU text generation: Moved the encoded_prompt to correct device 2020-01-06 15:11:12 +01:00
conftest.py ignore FutureWarning in tests (#7079) 2020-09-14 07:50:51 -04:00
test_activations_tf.py Refactoring the TF activations functions (#7150) 2020-09-16 07:03:47 -04:00
test_activations.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_benchmark_tf.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_benchmark.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_cli.py [transformers-cli] fix logger getter (#6777) 2020-08-27 20:01:17 -04:00
test_configuration_auto.py Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
test_configuration_common.py Pass kwargs to configuration (#3147) 2020-03-05 17:16:57 -05:00
test_data_collator.py Mark big downloads slow (#7325) 2020-09-22 12:21:52 -04:00
test_doc_samples.py Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
test_hf_api.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_hf_argparser.py parse arguments from dict (#4869) 2020-07-31 04:44:23 -04:00
test_logging.py adding TRANSFORMERS_VERBOSITY env var (#6961) 2020-09-09 04:08:01 -04:00
test_model_card.py GPU text generation: Moved the encoded_prompt to correct device 2020-01-06 15:11:12 +01:00
test_model_output.py Add tests and fix various bugs in ModelOutput (#7073) 2020-09-11 12:01:33 -04:00
test_modeling_albert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_auto.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_bart.py prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654) 2020-08-28 11:15:17 -04:00
test_modeling_bert_generation.py Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. (#6594) 2020-09-10 16:40:51 +02:00
test_modeling_bert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_camembert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_common.py Funnel transformer (#6908) 2020-09-08 08:08:08 -04:00
test_modeling_ctrl.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_distilbert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_dpr.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_electra.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_encoder_decoder.py clean naming (#7068) 2020-09-11 09:57:53 +02:00
test_modeling_flaubert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_fsmt.py [fsmt] rewrite SinusoidalPositionalEmbedding + USE_CUDA test fixes + new TranslationPipeline test (#7224) 2020-09-21 09:13:35 -04:00
test_modeling_funnel.py Funnel transformer (#6908) 2020-09-08 08:08:08 -04:00
test_modeling_gpt2.py [Generate] Facilitate PyTorch generate using ModelOutputs (#6735) 2020-09-01 12:38:25 +02:00
test_modeling_layoutlm.py Add LayoutLM Model (#7064) 2020-09-22 09:28:02 -04:00
test_modeling_longformer.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_lxmert.py fix (#6946) 2020-09-04 16:08:54 +02:00
test_modeling_marian.py Fix marian slow test (#6854) 2020-08-31 16:10:43 -04:00
test_modeling_mbart.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_mobilebert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_openai.py [Generate] Facilitate PyTorch generate using ModelOutputs (#6735) 2020-09-01 12:38:25 +02:00
test_modeling_pegasus.py Pegasus finetune script: add --adafactor (#6811) 2020-08-29 17:43:32 -04:00
test_modeling_rag.py RAG (#6813) 2020-09-22 18:29:58 +02:00
test_modeling_reformer.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_roberta.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_t5.py RAG (#6813) 2020-09-22 18:29:58 +02:00
test_modeling_tf_albert.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_tf_auto.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_tf_bert.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_tf_camembert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_common.py Fix saving TF custom models (#7291) 2020-09-22 09:31:13 -04:00
test_modeling_tf_ctrl.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_distilbert.py test_tf_common: remove un_used mixin class parameters (#6866) 2020-09-02 10:54:40 -04:00
test_modeling_tf_electra.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_flaubert.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_funnel.py Fix saving TF custom models (#7291) 2020-09-22 09:31:13 -04:00
test_modeling_tf_gpt2.py [Generate] Facilitate PyTorch generate using ModelOutputs (#6735) 2020-09-01 12:38:25 +02:00
test_modeling_tf_longformer.py test_tf_common: remove un_used mixin class parameters (#6866) 2020-09-02 10:54:40 -04:00
test_modeling_tf_lxmert.py Adding the LXMERT pretraining model (MultiModal languageXvision) to HuggingFace's suite of models (#5793) 2020-09-03 04:02:25 -04:00
test_modeling_tf_mobilebert.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_tf_openai.py [Generate] Facilitate PyTorch generate using ModelOutputs (#6735) 2020-09-01 12:38:25 +02:00
test_modeling_tf_roberta.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_t5.py [Generate] Facilitate PyTorch generate using ModelOutputs (#6735) 2020-09-01 12:38:25 +02:00
test_modeling_tf_transfo_xl.py test_tf_common: remove un_used mixin class parameters (#6866) 2020-09-02 10:54:40 -04:00
test_modeling_tf_xlm_roberta.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_tf_xlm.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_tf_xlnet.py test_tf_common: remove un_used mixin class parameters (#6866) 2020-09-02 10:54:40 -04:00
test_modeling_transfo_xl.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_xlm_roberta.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_modeling_xlm.py Black 20 release 2020-08-26 17:20:22 +02:00
test_modeling_xlnet.py Black 20 release 2020-08-26 17:20:22 +02:00
test_onnx.py Fix flaky ONNX tests (#6531) 2020-08-17 09:04:35 -04:00
test_optimization_tf.py Update repo to isort v5 (#6686) 2020-08-24 11:03:01 -04:00
test_optimization.py Format 2020-08-27 18:31:51 +02:00
test_pipelines.py Mark big downloads slow (#7325) 2020-09-22 12:21:52 -04:00
test_retrieval_rag.py RAG (#6813) 2020-09-22 18:29:58 +02:00
test_tokenization_albert.py [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510) 2020-06-15 17:12:51 -04:00
test_tokenization_auto.py [from_pretrained] Allow tokenizer_type ≠ model_type (#6995) 2020-09-09 04:22:59 -04:00
test_tokenization_bart.py [tests] fix typos in inputs (#6818) 2020-08-30 18:19:57 +08:00
test_tokenization_bert_generation.py clean naming (#7068) 2020-09-11 09:57:53 +02:00
test_tokenization_bert_japanese.py Support additional dictionaries for BERT Japanese tokenizers (#6515) 2020-08-17 12:00:23 +08:00
test_tokenization_bert.py Add strip_accents to basic BertTokenizer. (#6280) 2020-08-06 18:52:28 +08:00
test_tokenization_bertweet.py Add new pre-trained models BERTweet and PhoBERT (#6129) 2020-09-18 13:16:43 -04:00
test_tokenization_common.py is_pretokenized -> is_split_into_words (#7236) 2020-09-22 09:34:35 -04:00
test_tokenization_ctrl.py [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510) 2020-06-15 17:12:51 -04:00
test_tokenization_distilbert.py Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
test_tokenization_dpr.py Fix tests imports dpr (#5576) 2020-07-07 16:35:12 +02:00
test_tokenization_fast.py is_pretokenized -> is_split_into_words (#7236) 2020-09-22 09:34:35 -04:00
test_tokenization_fsmt.py [ported model] FSMT (FairSeq MachineTranslation) (#6940) 2020-09-17 11:31:29 -04:00
test_tokenization_funnel.py Funnel transformer (#6908) 2020-09-08 08:08:08 -04:00
test_tokenization_gpt2.py [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510) 2020-06-15 17:12:51 -04:00
test_tokenization_layoutlm.py Add LayoutLM Model (#7064) 2020-09-22 09:28:02 -04:00
test_tokenization_lxmert.py Adding the LXMERT pretraining model (MultiModal languageXvision) to HuggingFace's suite of models (#5793) 2020-09-03 04:02:25 -04:00
test_tokenization_marian.py rename prepare_translation_batch -> prepare_seq2seq_batch (#6103) 2020-08-11 15:57:07 -04:00
test_tokenization_mbart.py prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654) 2020-08-28 11:15:17 -04:00
test_tokenization_openai.py [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510) 2020-06-15 17:12:51 -04:00
test_tokenization_pegasus.py prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654) 2020-08-28 11:15:17 -04:00
test_tokenization_phobert.py Add new pre-trained models BERTweet and PhoBERT (#6129) 2020-09-18 13:16:43 -04:00
test_tokenization_rag.py RAG (#6813) 2020-09-22 18:29:58 +02:00
test_tokenization_reformer.py Black 20 release 2020-08-26 17:20:22 +02:00
test_tokenization_roberta.py prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654) 2020-08-28 11:15:17 -04:00
test_tokenization_t5.py [T5Tokenizer] remove prefix_tokens (#7078) 2020-09-11 14:18:45 -04:00
test_tokenization_transfo_xl.py Transformer-XL: Improved tokenization with sacremoses (#6322) 2020-08-28 09:56:17 -04:00
test_tokenization_utils.py Fixes to make life easier with the nlp library (#6423) 2020-08-12 08:00:56 -04:00
test_tokenization_xlm_roberta.py Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
test_tokenization_xlm.py Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
test_tokenization_xlnet.py Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
test_trainer_distributed.py Add tests to Trainer (#6605) 2020-08-20 11:13:50 -04:00
test_trainer.py Mark big downloads slow (#7325) 2020-09-22 12:21:52 -04:00
test_utils_check_copies.py Copy code from Bert to Roberta and add safeguard script (#7219) 2020-09-22 05:02:27 -04:00