5342 Commits

Author SHA1 Message Date
Teven
9e9a1fb8c7
Adding gradient checkpointing to GPT2 (#7446)
* GPT2 gradient checkpointing

* find_unused_parameters removed if checkpointing

* find_unused_parameters removed if checkpointing

* Update src/transformers/configuration_gpt2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Added a test for generation with checkpointing

* Update src/transformers/configuration_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-29 12:26:26 -04:00
Sylvain Gugger
52e8392b7e
Add automatic best model loading to Trainer (#7431)
* Add automatic best model loading to Trainer

* Some small fixes

* Formatting
2020-09-29 10:41:18 -04:00
Sylvain Gugger
1fc4de69ed
Document new features of make fixup (#7434) 2020-09-29 03:56:57 -04:00
GmailB
205bf0b7ea
Update README.md (#7444)
Hi, just corrected the example code, added 2 links, and fixed some typos
2020-09-29 03:18:01 -04:00
Sam Shleifer
74d8d69bd4
[s2s] consistent output format across eval scripts (#7435) 2020-09-28 23:20:03 -04:00
Typicasoft
671b278e25
Create README.md (#7436)
* Create README.md

MagBERT-NER : Added widget (Text)

* Rename model_cards/README.md to model_cards/TypicaAI/magbert-ner/README.md
2020-09-28 18:25:25 -04:00
Manuel Romero
a1a8ffa512
Update README.md (#7429)
Add links to models fine-tuned on a downstream task
2020-09-28 13:40:09 -04:00
Stas Bekman
f62f2ffdcc
[makefile] 10x speed up checking/fixing (#7403)
* [makefile] check/fix only modified since branching files

* fix phonies

* parametrize dirs

* have only one source for dirs to check

* look ma, no autoformatters here
2020-09-28 10:45:42 -04:00
Lysandre
16c213820e Update docs to version v3.3.0 2020-09-28 16:32:00 +02:00
Lysandre
0613f05226 Release: v3.3.0 2020-09-28 16:24:43 +02:00
Sylvain Gugger
ca3fc36de3
Reorganize documentation navbar (#7423)
* Reorganize documentation navbar

* Update css to have clear sections
2020-09-28 16:22:58 +02:00
Lysandre Debut
7f4115c099
Pull request template (#7392)
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-09-28 09:51:49 -04:00
Sylvain Gugger
0611eab5e3
Document RAG again (#7377)
Do not merge before Monday
2020-09-28 08:31:46 -04:00
Sylvain Gugger
7563d5a3cf
Catch PyTorch warning when saving/loading scheduler (#7401) 2020-09-28 08:20:10 -04:00
Boris Dayma
1749ca317e
docs: fix model sharing file names (#5855)
* docs: fix model sharing file names

* Update docs/source/model_sharing.rst

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* docs(model_sharing.rst): fix new line

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-28 08:17:30 -04:00
Patrick von Platen
8279471506
correct RAG model cards (#7420) 2020-09-28 11:08:39 +02:00
Marcin Zabłocki
4083a55ab0
Flos fix (#7384) 2020-09-28 04:09:26 -04:00
Ola Piktus
ae3e84f3ba
[RAG] Clean Rag readme in examples (#7413)
* Improve README + consolidation script

* Reformat README

* Reformat README

Co-authored-by: Your Name <you@example.com>
2020-09-28 10:06:39 +02:00
Sam Shleifer
748425d47d
[T5] allow config.decoder_layers to control decoder size (#7409)
* Working asymmetrical T5

* rename decoder_layers -> num_decoder_layers

* Fix docstring

* Allow creation of asymmetric t5 students
2020-09-28 03:08:04 -04:00
Sam Shleifer
7296fea1d6
[s2s] rougeLSum expects \n between sentences (#7410)
Co-authored-by: Swetha Mandava <smandava@nvidia.com>
2020-09-27 16:27:19 -04:00
Suraj Patil
eab5f59682
[s2s] add create student script (#7290)
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-27 15:10:46 -04:00
Patrick von Platen
e50a931c11
[Longformer, Bert, Roberta, ...] Fix multi gpu training (#7272)
* fix multi-gpu

* fix longformer

* force to delete unnecessary layers

* fix notifications

* fix warning

* fix roberta

* fix tests

* remove hasattr

* fix tests

* fix roberta

* merge and clean authorized keys
2020-09-25 20:33:21 +02:00
Patrick von Platen
2c8ecdf8a8
fix rag retriever save pretrained (#7399) 2020-09-25 19:47:12 +02:00
Patrick von Platen
1a14687e6f
Update README.md 2020-09-25 19:43:48 +02:00
Patrick von Platen
3327c2b0f6
Update README.md 2020-09-25 19:43:36 +02:00
Ola Piktus
fe326bd5cf
Remove dependency on examples/seq2seq from rag (#7395)
Co-authored-by: Your Name <you@example.com>
2020-09-25 18:20:49 +02:00
Sylvain Gugger
ad39271ae8
Fix FP16 and attention masks in FunnelTransformer (#7374)
* Fix #7371

* Fix training

* Fix test values

* Apply the fix to TF as well
2020-09-25 12:20:39 -04:00
Patrick von Platen
4e5b036bdd
Update README.md 2020-09-25 18:16:46 +02:00
Patrick von Platen
55eccfbb49
Update README.md 2020-09-25 18:16:44 +02:00
Sylvain Gugger
e2e77f02c2
Fix BartModel output documentation (#7390) 2020-09-25 11:48:13 -04:00
Sylvain Gugger
bbb07830ff
Speedup check_copies script (#7394) 2020-09-25 11:47:22 -04:00
Stas Bekman
8859c4f841
[code quality] new make target that combines style and quality targets (#7310)
* [code quality] merge style and quality targets

Any reason why we don't run `flake8` in `make style`? I find myself needing to run `make style` and `make quality` all the time, but I need the latter just for the last 2 checks. Since we have no control over the source code why bother with separating checking and fixing - let's just have one target that fixes and then performs the remaining checks, as we know the first two have been done already.

This PR suggests to merge the 2 targets into one efficient target.

I will edit the docs if this change resonates with the team.

* move checks into style, re-use target

* better name

* add fixup target

* document new target
2020-09-25 11:37:40 -04:00
Sam Shleifer
38a1b03f4d
Remove unhelpful bart warning (#7391) 2020-09-25 11:01:07 -04:00
Patrick von Platen
5ff0d6d7d0
Update README.md 2020-09-25 16:58:29 +02:00
Quentin Lhoest
cf1c88e092
[RAG] Fix retrieval offset in RAG's HfIndex and better integration tests (#7372)
* Fix retrieval offset in RAG's HfIndex

* update slow tests

* style

* fix new test

* style

* add better tests

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-09-25 16:12:46 +02:00
Patrick von Platen
571c7a11c1
[Rag] Fix wrong usage of num_beams and bos_token_id in Rag Sequence generation (#7386)
* fix_rag_sequence

* add second bug fix
2020-09-25 14:35:49 +02:00
Suraj Patil
415071b4c2
doc changes (#7385) 2020-09-25 08:00:36 -04:00
Patrick von Platen
2dd652d757
[RAG] Add missing doc and attention_mask to rag (#7382)
* add docs

* add missing docs and attention_mask in fine-tune
2020-09-25 11:23:55 +02:00
Lysandre Debut
7cdd9da5bf
Check config type using type instead of isinstance (#7363)
* Check config type instead of instance


Bad merge

* Remove for loops

* Style
2020-09-25 05:09:09 -04:00
Sam Shleifer
3c6bf8998f
modeling_bart: 3 small cleanups that don't change outputs (#7381)
* Mbart passing

* boom boom

* cleaner assert

* add assert

* Fix tests
2020-09-25 04:24:14 -04:00
Suraj Patil
9e68d075a4
Seq2SeqTrainer (#6769)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-24 18:46:58 -04:00
Sam Shleifer
d9d0f1140b
[s2s] distributed eval allows num_return_sequences > 1 (#7254) 2020-09-24 17:30:09 -04:00
Patrick von Platen
0804d077c6
correct attention mask (#7373) 2020-09-24 23:22:04 +02:00
Stas Bekman
a8cbc4269c
[fsmt] build/test scripts (#7257)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-24 17:10:26 -04:00
Sylvain Gugger
a8e7982f84
Remove mentions of RAG from the docs (#7376)
* Remove mentions of  RAG from the docs

* Deactivate check
2020-09-24 17:07:14 -04:00
Stas Bekman
eadd870b2f
[seq2seq] make it easier to run the scripts (#7274) 2020-09-24 15:23:48 -04:00
Lysandre Debut
8d3bb781ee
Formatter (#7368)
* Formatter

* Docs
2020-09-24 10:59:21 -04:00
Teven
7dfdf793bb
Fixing case in which Trainer hung while saving model in distributed training (#7365)
* remote debugging

* remote debugging

* moved _store_flos call

* moved _store_flos call

* moved _store_flos call

* removed debugging artefacts
2020-09-24 09:56:40 -04:00
Sylvain Gugger
0ccb6f5c6d
Clean RAG docs and template docs (#7348)
* Clean RAG docs and template docs

* Fix typo

* Better doc
2020-09-24 09:24:41 -04:00
Sylvain Gugger
27174bd4fe
Make PyTorch model files independent from each other (#7352) 2020-09-24 08:53:54 -04:00