transformers/examples/research_projects
NielsRogge 4ef0abb738
Add TAPEX (#16473)
* Add TapexTokenizer

* Improve docstrings and provide option to provide answer

* Remove option for pretokenized inputs

* Add TAPEX to README

* Fix copies

* Remove option for pretokenized inputs

* Initial commit: add tapex fine-tuning examples on both table-based question answering and table-based fact verification.

* - Draft a README file for running the script and introducing some background.
- Remove unused code lines in tabfact script.
- Disable the default `pad_to_max_length` option which is memory-consuming.

* * Support `as_target_tokenizer` function for TapexTokenizer.
* Fix the do_lower_case behaviour of TapexTokenizer.
* Add unit tests for target scenarios and cased/uncased scenarios for both source and target.

* * Replace the label BartTokenizer with TapexTokenizer's as_target_tokenizer function.
* Fix typos in tapex example README.

* * fix the evaluation script - remove the property `task_name`

* * Make the label space more clear for tabfact tasks

* * Using a new fine-tuning script for tapex-base on tabfact.

* * Remove the lowercase code outside the tokenizer - we use the tokenizer to control whether do_lower_case
* Guarantee the hyper-parameter can be run without out-of-memory on 16GB card and report the new reproduced number on wikisql

* * Remove the default tokenizer_name option.
* Provide evaluation command.

* * Support for WikiTableQuestion dataset.

* Fix a typo in README.

* * Fix the datasets's key name in WikiTableQuestions

* Run make fixup and move test to folder

* Fix quality

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply some more suggestions from code review

* Improve docstrings

* Overwrite failing test

* Improve comment in example scripts

* Fix rebase

* Add TAPEX to Auto mapping

* Add TAPEX to auto config mappings

* Put TAPEX higher than BART in auto mapping

* Add TAPEX to doc tests

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
Co-authored-by: SivilTaram <qianlxc@outlook.com>
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-04-08 10:57:51 +02:00
adversarial Update namespaces inside torch.utils.data to the latest. (#13167) 2021-08-19 14:29:51 +02:00
bert-loses-patience Fix tiny typo (#15884) 2022-03-02 15:37:05 +01:00
bertabs make style (#11442) 2021-04-26 13:50:34 +02:00
bertology [style] consistent nn. and nn.functional: part 4 examples (#12156) 2021-06-14 12:28:24 -07:00
codeparrot Update readme with how to train offline and fix BPE command (#15897) 2022-03-24 11:00:46 +01:00
decision_transformer Decision transformer gym (#15845) 2022-03-23 16:18:43 -04:00
deebert remove extra white space from log format (#12360) 2021-06-25 13:20:14 -07:00
distillation Fix minor comment typos (#15740) 2022-02-21 12:41:27 +01:00
fsner Updates the default branch from master to main (#16326) 2022-03-23 03:46:59 -04:00
jax-projects [examples] max samples can't be bigger than the len of dataset (#16501) 2022-03-30 12:33:16 -07:00
longform-qa [style] consistent nn. and nn.functional: part 4 examples (#12156) 2021-06-14 12:28:24 -07:00
luke Add (M)Luke model training for Token Classification in the examples (#14880) 2022-01-31 07:58:18 -05:00
lxmert Upgrade black to version ~=22.0 (#15565) 2022-02-09 09:28:57 -05:00
mlm_wwm [urls to hub] Replace outdated model tags with their now-canonical pipeline types (#14617) 2021-12-06 04:35:01 -05:00
mm-imdb Updates the default branch from master to main (#16326) 2022-03-23 03:46:59 -04:00
movement-pruning Updates the default branch from master to main (#16326) 2022-03-23 03:46:59 -04:00
onnx/summarization Upgrade black to version ~=22.0 (#15565) 2022-02-09 09:28:57 -05:00
performer [urls to hub] Replace outdated model tags with their now-canonical pipeline types (#14617) 2021-12-06 04:35:01 -05:00
pplm [research_projects] deal with security alerts (#15594) 2022-02-11 14:31:09 -05:00
quantization-qdqbert [examples] max samples can't be bigger than the len of dataset (#16501) 2022-03-30 12:33:16 -07:00
rag Updates the default branch from master to main (#16326) 2022-03-23 03:46:59 -04:00
rag-end2end-retriever Updates the default branch from master to main (#16326) 2022-03-23 03:46:59 -04:00
robust-speech-event Updates the default branch from master to main (#16326) 2022-03-23 03:46:59 -04:00
seq2seq-distillation Updates the default branch from master to main (#16326) 2022-03-23 03:46:59 -04:00
tapex Add TAPEX (#16473) 2022-04-08 10:57:51 +02:00
visual_bert Updates the default branch from master to main (#16326) 2022-03-23 03:46:59 -04:00
wav2vec2 [examples] max samples can't be bigger than the len of dataset (#16501) 2022-03-30 12:33:16 -07:00
xtreme-s [research] link to the XTREME-S paper (#16519) 2022-03-31 23:26:50 +04:00
zero-shot-distillation Updates the default branch from master to main (#16326) 2022-03-23 03:46:59 -04:00
README.md Reorganize examples (#9010) 2020-12-11 10:07:02 -05:00

Research projects

This folder contains various research projects using 🤗 Transformers. They are not maintained, and each one requires a specific version of 🤗 Transformers, which is indicated in the requirements file of its folder. Updating them to the most recent version of the library will require some work.

To use any of them, just run the command

pip install -r requirements.txt

inside the folder of your choice.
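
Because each project depends on a specific library version, it can be useful to check which 🤗 Transformers requirement a folder declares before installing. The following is a minimal sketch, not part of the repository: the helper name `find_transformers_pin` is hypothetical, and it assumes the file is named `requirements.txt` and lists one plain requirement per line (URL or VCS-style requirements are not handled).

```python
import re
from pathlib import Path
from typing import Optional


def find_transformers_pin(requirements_file: Path) -> Optional[str]:
    """Return the first requirement line that names `transformers`, or None.

    Hypothetical helper for illustration; assumes one plain requirement per line.
    """
    for raw_line in requirements_file.read_text(encoding="utf-8").splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Match the distribution name exactly, optionally followed by extras
        # and a version specifier, so "sentence-transformers" is skipped.
        if re.match(
            r"^transformers(\[[^\]]*\])?\s*([=<>!~]=?.*)?$",
            line,
            flags=re.IGNORECASE,
        ):
            return line
    return None
```

For example, calling `find_transformers_pin(Path("requirements.txt"))` from inside a project folder returns the matching requirement line, or `None` if the file does not name the library directly.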

If you need help with any of them, contact the author(s) indicated at the top of the README in the corresponding folder.