transformers/docs/source
Ola Piktus c754c41c61
RAG (#6813)
* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* Formatting / renaming prior to actual work

* First commit

* improve comments

* Retrieval evaluation scripts

* refactor to include modeling outputs + MPI retriever

* Fix rag-token model + refactor

* Various fixes + finetuning logic

* use_bos fix

* Retrieval refactor

* Finetuning refactoring and cleanup

* Add documentation and cleanup

* Remove set_up_rag_env.sh file

* Fix retrieval with HF index

* Fix import errors

* Fix quality errors

* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867

* fix quality

* Fix RAG Sequence generation

* minor cleanup plus initial tests

* fix test

* fix tests 2

* Comments fix

* post-merge fixes

* Improve readme + post-rebase refactor

* Extra dependencies for tests

* Fix tests

* Fix tests 2

* Refactor test requirements

* Fix tests 3

* Post-rebase refactor

* rename nlp->datasets

* RAG integration tests

* add tokenizer to slow integration test and allow retriever to run on cpu

* add tests; fix position ids warning

* change structure

* change structure

* add from encoder generator

* save working solution

* make all integration tests pass

* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained

* don't save paths

* delete unnecessary imports

* pass config to AutoTokenizer.from_pretrained for Rag tokenizers

* init wiki_dpr only once

* hardcode legacy index and passages paths (todo: add the right urls)

* finalize config

* finalize retriever api and config api

* LegacyIndex index download refactor

* add dpr to autotokenizer

* make from pretrained more flexible

* fix ragfortokengeneration

* small name changes in tokenizer

* add labels to models

* change default index name

* add retrieval tests

* finish token generate

* align test with previous version and make all tests pass

* add tests

* finalize tests

* implement Thom's suggestions

* add first version of test

* make first tests work

* make retriever platform agnostic

* naming

* style

* add legacy index URL

* docstrings + simple retrieval test for distributed

* clean model api

* add doc_ids to retriever's outputs

* fix retrieval tests

* finish model outputs

* finalize model api

* fix generate problem for rag

* fix generate for other models

* fix some tests

* save intermediate

* set generate to default

* big refactor generate

* delete rag_api

* correct pip faiss install

* fix auto tokenization test

* fix faiss install

* fix test

* move the distributed logic to examples

* model page

* docs

* finish tests

* fix dependencies

* fix import in __init__

* Refactor eval_rag and finetune scripts

* start docstring

* add psutil to test

* fix tf test

* move require torch to top

* fix retrieval test

* align naming

* finish automodel

* fix repo consistency

* test ragtokenizer save/load

* add rag model output docs

* fix ragtokenizer save/load from pretrained

* fix tokenizer dir

* remove torch in retrieval

* fix docs

* fix finetune scripts

* finish model docs

* finish docs

* remove auto model for now

* add require torch

* remove solved todos

* integrate Sylvain's suggestions

* Sam's comments

* correct mistake on purpose

* improve README

* Add generation test cases

* fix rag token

* clean token generate

* fix test

* add note to test

* fix attention mask

* add t5 test for rag

* Fix handling prefix in finetune.py

* don't overwrite index_name

Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2020-09-22 18:29:58 +02:00
_static Documentation version 2020-09-22 18:04:39 +02:00
imgs Guide to fixed-length model perplexity evaluation (#5449) 2020-07-07 16:04:15 -06:00
internal Doc pipelines (#6175) 2020-08-03 11:44:46 -04:00
main_classes Compute loss method (#7074) 2020-09-11 12:06:31 -04:00
model_doc RAG (#6813) 2020-09-22 18:29:58 +02:00
benchmarks.rst Small docfile fixes (#6328) 2020-08-10 05:37:12 -04:00
bertology.rst [doc] Fix broken links + remove crazy big notebook 2020-05-07 18:44:18 -04:00
conf.py Release: v3.2.0 2020-09-22 17:36:51 +02:00
contributing.md Update installation page and add contributing to the doc (#5084) 2020-06-17 14:01:10 -04:00
converting_tensorflow_models.rst Add ALBERT to the Tensorflow to Pytorch model conversion cli (#3933) 2020-05-11 13:10:00 -04:00
custom_datasets.rst is_pretokenized -> is_split_into_words (#7236) 2020-09-22 09:34:35 -04:00
examples.md per_device instead of per_gpu/error thrown when argument unknown (#4618) 2020-05-27 11:36:55 -04:00
favicon.ico Adding usage examples for common tasks (#2850) 2020-02-25 13:48:24 -05:00
glossary.rst minor docs grammar fixes (#6889) 2020-09-02 06:45:19 -04:00
index.rst RAG (#6813) 2020-09-22 18:29:58 +02:00
installation.md [doc] fix invalid env vars (#6504) 2020-08-17 11:11:40 +08:00
migration.md Add hugs (#5225) 2020-06-24 07:56:14 -04:00
model_sharing.rst add -y to bypass prompt for transformers-cli upload (#7035) 2020-09-10 04:58:29 -04:00
model_summary.rst RAG (#6813) 2020-09-22 18:29:58 +02:00
multilingual.rst Refactor Code samples; Test code samples (#5036) 2020-06-25 16:46:00 -04:00
notebooks.md Update notebooks (#3620) 2020-04-06 14:32:39 -04:00
perplexity.rst tiny ppl doc typo fix (#5751) 2020-07-14 10:39:44 -06:00
philosophy.rst typos (#6505) 2020-08-17 10:57:36 +08:00
preprocessing.rst is_pretokenized -> is_split_into_words (#7236) 2020-09-22 09:34:35 -04:00
pretrained_models.rst Add LayoutLM Model (#7064) 2020-09-22 09:28:02 -04:00
quicktour.rst minor docs grammar fixes (#6889) 2020-09-02 06:45:19 -04:00
serialization.rst fix torchscript docs (#6740) 2020-08-26 04:51:56 -04:00
task_summary.rst replace _ with __ rst links (#6541) 2020-08-17 12:27:02 -04:00
testing.rst @slow has to be last (#7251) 2020-09-20 09:17:29 -04:00
tokenizer_summary.rst [docs] Fix number of 'ug' occurrences in tokenizer_summary (#6574) 2020-08-18 10:23:25 -04:00
training.rst Import accuracy_score (#6480) 2020-08-14 08:16:16 -04:00