transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-24 23:08:57 +06:00

Author	SHA1	Message	Date
Teven	9e9a1fb8c7	Adding gradient checkpointing to GPT2 (#7446 ) * GPT2 gradient checkpointing * find_unused_parameters removed if checkpointing * find_unused_parameters removed if checkpointing * Update src/transformers/configuration_gpt2.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Added a test for generation with checkpointing * Update src/transformers/configuration_gpt2.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-09-29 12:26:26 -04:00
Sylvain Gugger	52e8392b7e	Add automatic best model loading to Trainer (#7431 ) * Add automatic best model loading to Trainer * Some small fixes * Formatting	2020-09-29 10:41:18 -04:00
Sylvain Gugger	1fc4de69ed	Document new features of make fixup (#7434 )	2020-09-29 03:56:57 -04:00
GmailB	205bf0b7ea	Update README.md (#7444 ) Hi, just corrected the example code, add 2 links and fixed some typos	2020-09-29 03:18:01 -04:00
Sam Shleifer	74d8d69bd4	[s2s] consistent output format across eval scripts (#7435 )	2020-09-28 23:20:03 -04:00
Typicasoft	671b278e25	Create README.md (#7436 ) * Create README.md MagBERT-NER : Added widget (Text) * Rename model_cards/README.md to model_cards/TypicaAI/magbert-ner/README.md	2020-09-28 18:25:25 -04:00
Manuel Romero	a1a8ffa512	Update README.md (#7429 ) Add links to models fine-tuned on a downstream task	2020-09-28 13:40:09 -04:00
Stas Bekman	f62f2ffdcc	[makefile] 10x speed up checking/fixing (#7403 ) * [makefile] check/fix only modified since branching files * fix phonies * parametrize dirs * have only one source for dirs to check * look ma, no autoformatters here	2020-09-28 10:45:42 -04:00
Lysandre	16c213820e	Update docs to version v3.3.0	2020-09-28 16:32:00 +02:00
Lysandre	0613f05226	Release: v3.3.0	2020-09-28 16:24:43 +02:00
Sylvain Gugger	ca3fc36de3	Reorganize documentation navbar (#7423 ) * Reorganize documentation navbar * Update css to have clear sections	2020-09-28 16:22:58 +02:00
Lysandre Debut	7f4115c099	Pull request template (#7392 ) Co-authored-by: sgugger <sylvain.gugger@gmail.com>	2020-09-28 09:51:49 -04:00
Sylvain Gugger	0611eab5e3	Document RAG again (#7377 ) Do not merge before Monday	2020-09-28 08:31:46 -04:00
Sylvain Gugger	7563d5a3cf	Catch PyTorch warning when saving/loading scheduler (#7401 )	2020-09-28 08:20:10 -04:00
Boris Dayma	1749ca317e	docs: fix model sharing file names (#5855 ) * docs: fix model sharing file names * Update docs/source/model_sharing.rst Co-authored-by: Julien Chaumond <chaumond@gmail.com> * docs(model_sharing.rst): fix new line Co-authored-by: Julien Chaumond <chaumond@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-09-28 08:17:30 -04:00
Patrick von Platen	8279471506	correct RAG model cards (#7420 )	2020-09-28 11:08:39 +02:00
Marcin Zabłocki	4083a55ab0	Flos fix (#7384 )	2020-09-28 04:09:26 -04:00
Ola Piktus	ae3e84f3ba	[RAG] Clean Rag readme in examples (#7413 ) * Improve README + consolidation script * Reformat README * Reformat README Co-authored-by: Your Name <you@example.com>	2020-09-28 10:06:39 +02:00
Sam Shleifer	748425d47d	[T5] allow config.decoder_layers to control decoder size (#7409 ) * Working asymmetrical T5 * rename decoder_layers -> num_decoder_layers * Fix docstring * Allow creation of asymmetric t5 students	2020-09-28 03:08:04 -04:00
Sam Shleifer	7296fea1d6	[s2s] rougeLSum expects \n between sentences (#7410 ) Co-authored-by: Swetha Mandava <smandava@nvidia.com>	2020-09-27 16:27:19 -04:00
Suraj Patil	eab5f59682	[s2s] add create student script (#7290 ) Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sam Shleifer <sshleifer@gmail.com>	2020-09-27 15:10:46 -04:00
Patrick von Platen	e50a931c11	[Longformer, Bert, Roberta, ...] Fix multi gpu training (#7272 ) * fix multi-gpu * fix longformer * force to delete unnecessary layers * fix notifications * fix warning * fix roberta * fix tests * remove hasattr * fix tests * fix roberta * merge and clean authorized keys	2020-09-25 20:33:21 +02:00
Patrick von Platen	2c8ecdf8a8	fix rag retriever save pretrained (#7399 )	2020-09-25 19:47:12 +02:00
Patrick von Platen	1a14687e6f	Update README.md	2020-09-25 19:43:48 +02:00
Patrick von Platen	3327c2b0f6	Update README.md	2020-09-25 19:43:36 +02:00
Ola Piktus	fe326bd5cf	Remove dependency on examples/seq2seq from rag (#7395 ) Co-authored-by: Your Name <you@example.com>	2020-09-25 18:20:49 +02:00
Sylvain Gugger	ad39271ae8	Fix FP16 and attention masks in FunnelTransformer (#7374 ) * Fix #7371 * Fix training * Fix test values * Apply the fix to TF as well	2020-09-25 12:20:39 -04:00
Patrick von Platen	4e5b036bdd	Update README.md	2020-09-25 18:16:46 +02:00
Patrick von Platen	55eccfbb49	Update README.md	2020-09-25 18:16:44 +02:00
Sylvain Gugger	e2e77f02c2	Fix BartModel output documentation (#7390 )	2020-09-25 11:48:13 -04:00
Sylvain Gugger	bbb07830ff	Speedup check_copies script (#7394 )	2020-09-25 11:47:22 -04:00
Stas Bekman	8859c4f841	[code quality] new make target that combines style and quality targets (#7310 ) * [code quality] merge style and quality targets Any reason why we don't run `flake8` in `make style`? I find myself needing to run `make style` and `make quality` all the time, but I need the latter just for the last 2 checks. Since we have no control over the source code why bother with separating checking and fixing - let's just have one target that fixes and then performs the remaining checks, as we know the first two have been done already. This PR suggests to merge the 2 targets into one efficient target. I will edit the docs if this change resonates with the team. * move checks into style, re-use target * better name * add fixup target * document new target	2020-09-25 11:37:40 -04:00
Sam Shleifer	38a1b03f4d	Remove unhelpful bart warning (#7391 )	2020-09-25 11:01:07 -04:00
Patrick von Platen	5ff0d6d7d0	Update README.md	2020-09-25 16:58:29 +02:00
Quentin Lhoest	cf1c88e092	[RAG] Fix retrieval offset in RAG's HfIndex and better integration tests (#7372 ) * Fix retrieval offset in RAG's HfIndex * update slow tests * style * fix new test * style * add better tests Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-09-25 16:12:46 +02:00
Patrick von Platen	571c7a11c1	[Rag] Fix wrong usage of `num_beams` and `bos_token_id` in Rag Sequence generation (#7386 ) * fix_rag_sequence * add second bug fix	2020-09-25 14:35:49 +02:00
Suraj Patil	415071b4c2	doc changes (#7385 )	2020-09-25 08:00:36 -04:00
Patrick von Platen	2dd652d757	[RAG] Add missing doc and attention_mask to rag (#7382 ) * add docs * add missing docs and attention_mask in fine-tune	2020-09-25 11:23:55 +02:00
Lysandre Debut	7cdd9da5bf	Check config type using `type` instead of `isinstance` (#7363 ) * Check config type instead of instance Bad merge * Remove for loops * Style	2020-09-25 05:09:09 -04:00
Sam Shleifer	3c6bf8998f	modeling_bart: 3 small cleanups that don't change outputs (#7381 ) * Mbart passing * boom boom * cleaner assert * add assert * Fix tests	2020-09-25 04:24:14 -04:00
Suraj Patil	9e68d075a4	Seq2SeqTrainer (#6769 ) Co-authored-by: Sam Shleifer <sshleifer@gmail.com>	2020-09-24 18:46:58 -04:00
Sam Shleifer	d9d0f1140b	[s2s] distributed eval allows num_return_sequences > 1 (#7254 )	2020-09-24 17:30:09 -04:00
Patrick von Platen	0804d077c6	correct attention mask (#7373 )	2020-09-24 23:22:04 +02:00
Stas Bekman	a8cbc4269c	[fsmt] build/test scripts (#7257 ) Co-authored-by: Sam Shleifer <sshleifer@gmail.com>	2020-09-24 17:10:26 -04:00
Sylvain Gugger	a8e7982f84	Remove mentions of RAG from the docs (#7376 ) * Remove mentions of RAG from the docs * Deactivate check	2020-09-24 17:07:14 -04:00
Stas Bekman	eadd870b2f	[seq2seq] make it easier to run the scripts (#7274 )	2020-09-24 15:23:48 -04:00
Lysandre Debut	8d3bb781ee	Formatter (#7368 ) * Formatter * Docs	2020-09-24 10:59:21 -04:00
Teven	7dfdf793bb	Fixing case in which `Trainer` hung while saving model in distributed training (#7365 ) * remote debugging * remote debugging * moved _store_flos call * moved _store_flos call * moved _store_flos call * removed debugging artefacts	2020-09-24 09:56:40 -04:00
Sylvain Gugger	0ccb6f5c6d	Clean RAG docs and template docs (#7348 ) * Clean RAG docs and template docs * Fix typo * Better doc	2020-09-24 09:24:41 -04:00
Sylvain Gugger	27174bd4fe	Make PyTorch model files independent from each other (#7352 )	2020-09-24 08:53:54 -04:00

5342 Commits