Commit Graph

97 Commits

Author SHA1 Message Date
Patrick von Platen
dca34695d0
Reformer (#3351)
* first copy & paste commit from Bert and morgans LSH code

* add easy way to compare to trax original code

* translate most of function

* make trax lsh self attention deterministic with numpy seed + copy paste code

* add same config

* add same config

* make layer init work

* implemented hash_vectors function for lsh attention

* continue reformer translation

* hf LSHSelfAttentionLayer gives same output as trax layer

* refactor code

* refactor code

* refactor code

* refactor

* refactor + add reformer config

* delete bogus file

* split reformer attention layer into two layers

* save intermediate step

* save intermediate step

* make test work

* add complete reformer block layer

* finish reformer layer

* implement causal and self mask

* clean reformer test and refactor code

* fix merge conflicts

* fix merge conflicts

* update init

* fix device for GPU

* fix chunk length init for tests

* include morgans optimization

* improve memory a bit

* improve comment

* factorize num_buckets

* better testing parameters

* make whole model work

* make lm model work

* add t5 copy paste tokenizer

* add chunking feed forward

* clean config

* add improved assert statements

* make tokenizer work

* improve test

* correct typo

* extend config

* add complexer test

* add new axial position embeddings

* add local block attention layer

* clean tests

* refactor

* better testing

* save intermediate progress

* clean test file

* make shorter input length work for model

* allow variable input length

* refactor

* make forward pass for pretrained model work

* add generation possibility

* finish dropout and init

* make style

* refactor

* add first version of RevNet Layers

* make forward pass work and add convert file

* make uploaded model forward pass work

* make uploaded model forward pass work

* refactor code

* add namedtuples and cache buckets

* correct head masks

* refactor

* made reformer more flexible

* make style

* remove set max length

* add attention masks

* fix up tests

* fix lsh attention mask

* make random seed optional for the moment

* improve memory in reformer

* add tests

* make style

* make sure masks work correctly

* detach gradients

* save intermediate

* correct backprop through gather

* make style

* change back num hashes

* rename to labels

* fix rotation shape

* fix detach

* update

* fix trainer

* fix backward dropout

* make reformer more flexible

* fix conflict

* fix

* fix

* add tests for fixed seed in reformer layer

* fix trainer typo

* fix typo in activations

* add fp16 tests

* add fp16 training

* support fp16

* correct gradient bug in reformer

* add fast gelu

* re-add dropout for embedding dropout

* better naming

* better naming

* renaming

* finalize test branch

* finalize tests

* add more tests

* finish tests

* fix

* fix type trainer

* fix fp16 tests

* fix tests

* fix tests

* fix tests

* fix issue with dropout

* fix dropout seeds

* correct random seed on gpu

* finalize random seed for dropout

* finalize random seed for dropout

* remove duplicate line

* correct half precision bug

* make style

* refactor

* refactor

* docstring

* remove sinusoidal position encodings for reformer

* move chunking to modeling_utils

* make style

* clean config

* make style

* fix tests

* fix auto tests

* pretrained models

* fix docstring

* update conversion file

* Update pretrained_models.rst

* fix rst

* fix rst

* update copyright

* fix test path

* fix test path

* fix small issue in test

* include reformer in generation tests

* add docs for axial position encoding

* finish docs

* Update convert_reformer_trax_checkpoint_to_pytorch.py

* remove isort

* include sams comments

* remove wrong comment in utils

* correct typos

* fix typo

* Update reformer.rst

* applied morgans optimization

* make style

* make gpu compatible

* remove bogus file

* big test refactor

* add example for chunking

* fix typo

* add to README
2020-05-07 10:17:01 +02:00
Patrick von Platen
fa49b9afea
Clean Encoder-Decoder models with Bart/T5-like API and add generate possibility (#3383)
* change encoder decoder style to bart & t5 style

* make encoder decoder generation dummy work for bert

* make style

* clean init config in encoder decoder

* add tests for encoder decoder models

* refactor and add last tests

* refactor and add last tests

* fix attn masks for bert encoder decoder

* make style

* refactor prepare inputs for Bert

* refactor

* finish encoder decoder

* correct typo

* add docstring to config

* finish

* add tests

* better naming

* make style

* fix flake8

* clean docstring

* make style

* rename
2020-04-28 15:11:09 +02:00
Patrick von Platen
d22894dfd4
[Docs] Add DialoGPT (#3755)
* add dialoGPT

* update README.md

* fix conflict

* update readme

* add code links to docs

* Update README.md

* Update dialo_gpt2.rst

* Update pretrained_models.rst

* Update docs/source/model_doc/dialo_gpt2.rst

Co-Authored-By: Julien Chaumond <chaumond@gmail.com>

* change filename of dialogpt

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-16 09:04:32 +02:00
Lysandre Debut
d5d7d88612
ELECTRA (#3257)
* Electra wip

* helpers

* Electra wip

* Electra v1

* ELECTRA may be saved/loaded

* Generator & Discriminator

* Embedding size instead of halving the hidden size

* ELECTRA Tokenizer

* Revert BERT helpers

* ELECTRA Conversion script

* Archive maps

* PyTorch tests

* Start fixing tests

* Tests pass

* Same configuration for both models

* Compatible with base + large

* Simplification + weight tying

* Archives

* Auto + Renaming to standard names

* ELECTRA is uncased

* Tests

* Slight API changes

* Update tests

* wip

* ElectraForTokenClassification

* temp

* Simpler arch + tests

Removed ElectraForPreTraining which will be in a script

* Conversion script

* Auto model

* Update links to S3

* Split ElectraForPreTraining and ElectraForTokenClassification

* Actually test PreTraining model

* Remove num_labels from configuration

* wip

* wip

* From discriminator and generator to electra

* Slight API changes

* Better naming

* TensorFlow ELECTRA tests

* Accurate conversion script

* Added to conversion script

* Fast ELECTRA tokenizer

* Style

* Add ELECTRA to README

* Modeling Pytorch Doc + Real style

* TF Docs

* Docs

* Correct links

* Correct model initialized

* random fixes

* style

* Addressing Patrick's and Sam's comments

* Correct links in docs
2020-04-03 14:10:54 -04:00
Patrick von Platen
fa9af2468a
Add T5 to docs (#3461)
* add t5 docs basis

* improve docs

* add t5 docs

* improve t5 docstring

* add t5 tokenizer docstring

* finish docstring

* make style

* add pretrained models

* correct typo

* make examples work

* finalize docs
2020-03-27 10:57:16 -04:00
Lysandre Debut
d3eb7d23a4
Pipeline doc (#3055)
* Pipeline doc initial commit

* pipeline abstraction

* Remove modelcard argument from pipeline

* Task-specific pipelines can be instantiated with no model or tokenizer

* All pipelines doc
2020-03-02 14:07:10 -05:00
Lysandre Debut
65e7c90a77
Adding usage examples for common tasks (#2850)
* Usage: Sequence Classification & Question Answering

* Pipeline example

* Language modeling

* TensorFlow code for Sequence classification

* Custom TF/PT toggler in docs

* QA + LM for TensorFlow

* Finish Usage for both PyTorch and TensorFlow

* Addressing Julien's comments

* More assertive

* cleanup

* Favicon
- added favicon option in conf.py along with the favicon image
- updated 🤗 logo. slightly smaller and should appear more consistent across editing programs (no more tongue on the outside of the mouth)

Co-authored-by: joshchagani <joshua@joshuachagani.com>
2020-02-25 13:48:24 -05:00
Sam Shleifer
53ce3854a1
New BartModel (#2745)
* Results same as fairseq
* Wrote a ton of tests
* Struggled with api signatures
* added some docs
2020-02-20 18:11:13 -05:00
Hang Le
b43cb09aaa Add layerdrop 2020-01-30 12:05:01 -05:00
Lysandre
73306d028b FlauBERT documentation 2020-01-30 10:04:18 -05:00
Lysandre
980211a63a XLM-RoBERTa 2020-01-23 09:38:45 -05:00
Lysandre
9bab9b83d2 Glossary 2020-01-23 09:38:45 -05:00
alberduris
81d6841b4b GPU text generation: Moved the encoded_prompt to correct device 2020-01-06 15:11:12 +01:00
alberduris
dd4df80f0b Moved the encoded_prompts to correct device 2020-01-06 15:11:12 +01:00
Stefan Schweter
f09d999641 docs: fix numbering 😅 2019-12-18 19:49:33 +01:00
Stefan Schweter
d35405b7a3 docs: add XLM-RoBERTa to index page 2019-12-18 19:45:10 +01:00
Julien Chaumond
855ff0e91d [doc] Model upload and sharing
ping @lysandrejik @thomwolf

Is this clear enough? Anything we should add?
2019-12-16 12:42:22 -05:00
Pierric Cistac
5c877fe94a
fix albert links 2019-12-09 18:53:00 -05:00
Lysandre
ee4647bd5c CamemBERT & ALBERT doc 2019-11-26 15:10:51 -05:00
LysandreJik
82f6abd98a Benchmark section added to the documentation 2019-10-18 17:27:10 -04:00
LysandreJik
7fe98d8c18 Update CTRL documentation 2019-10-09 12:12:36 -04:00
LysandreJik
8fcc6507ce Multilingual 2019-10-07 15:02:42 -04:00
VictorSanh
e2ae9c0b73 fix links in doc index 2019-10-03 11:42:21 -04:00
LysandreJik
93f0c5fc72 Repository link in the documentation 2019-09-26 11:45:00 -04:00
LysandreJik
de5e4864cb Documentation 2019-09-26 08:04:54 -04:00
LysandreJik
c4ac7a76db GLUE processors 2019-09-26 07:45:40 -04:00
LysandreJik
cf5c5c9e1c Documentation 2019-09-26 07:43:13 -04:00
thomwolf
31c23bd5ee [BIG] pytorch-transformers => transformers 2019-09-26 10:15:53 +02:00
LysandreJik
09363f2a8b Fix documentation index 2019-08-30 19:48:32 -04:00
LysandreJik
e0caab0cf0 fix link 2019-08-30 10:09:17 -04:00
LysandreJik
a600b30cc3 Fix index number in documentation 2019-08-30 10:08:14 -04:00
LysandreJik
20c06fa37d Added DistilBERT to documentation index 2019-08-30 10:06:51 -04:00
LysandreJik
1dc43e56c9 Documentation additions 2019-08-28 09:37:27 -04:00
LysandreJik
572dcfd1db Doc 2019-08-14 14:56:14 -04:00
thomwolf
13936a9621 update doc and tests 2019-08-05 18:48:16 +02:00
thomwolf
00132b7a7a updating docs - adding few tests to tokenizers 2019-08-04 22:42:55 +02:00
thomwolf
009273dbdd big doc update [WIP] 2019-08-04 12:14:57 +02:00
thomwolf
43e0e8fa04 updates to readme and doc 2019-07-16 13:56:47 +02:00
thomwolf
2397f958f9 updating examples and doc 2019-07-14 23:20:10 +02:00
LysandreJik
c82b74b996 Fixed Sphinx errors and warnings 2019-07-10 15:30:19 -04:00
LysandreJik
f773faa258 Fixed all links. Removed TPU. Changed CLI to Converting TF models. Many minor formatting adjustments. Added "TODO Lysandre filled" where necessary. 2019-07-10 14:45:56 -04:00
LysandreJik
83fb311ef7 Patched warnings + Refactored XLNet's Docstrings 2019-07-09 16:38:30 -04:00
LysandreJik
269e73b601 Adding example detailing how to add a new file to the documentation + adding fonts. 2019-07-09 10:11:29 -04:00
LysandreJik
6847e30e1c New page detailing the use of TorchScript. 2019-07-08 17:34:24 -04:00
LysandreJik
ab30651802 Hugging Face theme. 2019-07-08 16:05:26 -04:00
LysandreJik
df759114c9 Single file documentation for each model, accompanied by the Documentation overview. 2019-07-05 17:35:26 -04:00
LysandreJik
03de9686a7 Initial folder structure for the documentation. A draft of documentation change has been made in the BertModel class. 2019-07-05 17:11:13 -04:00