transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-16 02:58:23 +06:00

Author	SHA1	Message	Date
Patrick von Platen	5ff0d6d7d0	Update README.md	2020-09-25 16:58:29 +02:00
Quentin Lhoest	cf1c88e092	[RAG] Fix retrieval offset in RAG's HfIndex and better integration tests (#7372 ) * Fix retrieval offset in RAG's HfIndex * update slow tests * style * fix new test * style * add better tests Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-09-25 16:12:46 +02:00
Patrick von Platen	571c7a11c1	[Rag] Fix wrong usage of `num_beams` and `bos_token_id` in Rag Sequence generation (#7386 ) * fix_rag_sequence * add second bug fix	2020-09-25 14:35:49 +02:00
Suraj Patil	415071b4c2	doc changes (#7385 )	2020-09-25 08:00:36 -04:00
Patrick von Platen	2dd652d757	[RAG] Add missing doc and attention_mask to rag (#7382 ) * add docs * add missing docs and attention_mask in fine-tune	2020-09-25 11:23:55 +02:00
Lysandre Debut	7cdd9da5bf	Check config type using `type` instead of `isinstance` (#7363 ) * Check config type instead of instance Bad merge * Remove for loops * Style	2020-09-25 05:09:09 -04:00
Sam Shleifer	3c6bf8998f	modeling_bart: 3 small cleanups that dont change outputs (#7381 ) * Mbart passing * boom boom * cleaner assert * add assert * Fix tests	2020-09-25 04:24:14 -04:00
Suraj Patil	9e68d075a4	Seq2SeqTrainer (#6769 ) Co-authored-by: Sam Shleifer <sshleifer@gmail.com>	2020-09-24 18:46:58 -04:00
Sam Shleifer	d9d0f1140b	[s2s] distributed eval allows num_return_sequences > 1 (#7254 )	2020-09-24 17:30:09 -04:00
Patrick von Platen	0804d077c6	correct attention mask (#7373 )	2020-09-24 23:22:04 +02:00
Stas Bekman	a8cbc4269c	[fsmt] build/test scripts (#7257 ) Co-authored-by: Sam Shleifer <sshleifer@gmail.com>	2020-09-24 17:10:26 -04:00
Sylvain Gugger	a8e7982f84	Remove mentions of RAG from the docs (#7376 ) * Remove mentions of RAG from the docs * Deactivate check	2020-09-24 17:07:14 -04:00
Stas Bekman	eadd870b2f	[seq2seq] make it easier to run the scripts (#7274 )	2020-09-24 15:23:48 -04:00
Lysandre Debut	8d3bb781ee	Formatter (#7368 ) * Formatter * Docs	2020-09-24 10:59:21 -04:00
Teven	7dfdf793bb	Fixing case in which `Trainer` hung while saving model in distributed training (#7365 ) * remote debugging * remote debugging * moved _store_flos call * moved _store_flos call * moved _store_flos call * removed debugging artefacts	2020-09-24 09:56:40 -04:00
Sylvain Gugger	0ccb6f5c6d	Clean RAG docs and template docs (#7348 ) * Clean RAG docs and template docs * Fix typo * Better doc	2020-09-24 09:24:41 -04:00
Sylvain Gugger	27174bd4fe	Make PyTorch model files independent from each other (#7352 )	2020-09-24 08:53:54 -04:00
Julien Plu	d161ed1682	Update the TF models to remove their interdependencies (#7238 ) * Refacto the models to remove their interdependencies * Fix Flaubert model * Fix Flaubert * Fix XLM * Fix Albert * Fix Roberta * Fix Albert * Fix Flaubert * Apply style + remove unused imports * Fix Distilbert * remove unused import * fix Distilbert * Fix Flaubert * Apply style * Fix Flaubert * Add the copy comments for the check_copies script * Fix MobileBert model name * Address Morgan's comments * Fix typo * Oops typo	2020-09-24 08:30:59 -04:00
Jabin Huang	0cffa424f8	Updata tokenization_auto.py (#6870 ) Updata tokenization_auto.py to handle Inherited tokenizer	2020-09-24 06:52:10 -04:00
Daquan Lin	03fb8e79c6	Update modeling_tf_longformer.py (#7359 ) correct a very small mistake	2020-09-24 11:37:29 +02:00
Sylvain Gugger	1ff5bd38a3	Check decorator order (#7326 ) * Check decorator order * Adapt for parametrized decorators * Fix typos	2020-09-24 04:54:37 -04:00
Sylvain Gugger	0be5f4a00c	Expand a bit the documentation doc (#7350 )	2020-09-24 04:34:18 -04:00
Sam Shleifer	38f1703795	wip: Code to add lang tags to marian model cards (#6586 )	2020-09-23 18:11:06 -04:00
Theo Linnemann	129fdae040	Remove reference to args in XLA check (#7344 ) Previously, the TFTrainingArguments object did a check to see if XLA was enabled, but did this by referencing `self.args.xla`, when it should be `self.xla`, because it is the args object. This can be verified a few lines above, where the XLA field is set.	2020-09-23 13:56:21 -04:00
Felipe Curti	d266613635	[Benchmarks] Change all args to from `no_...` to their positive form (#7075 ) * Changed name to all no_... arguments and all references to them, inverting the boolean condition * Change benchmark tests to use new Benchmark Args * Update src/transformers/benchmark/benchmark_args_utils.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/benchmark/benchmark.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Fix Style. Add --no options in help * fix some part of tests * Update src/transformers/benchmark/benchmark_args_utils.py * Update src/transformers/benchmark/benchmark_args_utils.py * Update src/transformers/benchmark/benchmark_args_utils.py * fix all tests * make style * add backwards compability * make backwards compatible Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: fmcurti <fcurti@DESKTOP-RRQURBM.localdomain>	2020-09-23 13:25:24 -04:00
Doug Blank	8c697d58ef	Ensure that integrations are imported before transformers or ml libs (#7330 ) * Ensure that intergrations are imported before transformers or ml libs * Black reformatter wanted a newline * isort requests * black requests * flake8 requests	2020-09-23 13:23:45 -04:00
Sylvain Gugger	3323146e90	Models doc (#7345 ) * Clean up model documentation * Formatting * Preparation work * Long lines * Main work on rst files * Cleanup all config files * Syntax fix * Clean all tokenizers * Work on first models * Models beginning * FaluBERT * All PyTorch models * All models * Long lines again * Fixes * More fixes * Update docs/source/model_doc/bert.rst Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update docs/source/model_doc/electra.rst Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Last fixes Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-09-23 13:20:45 -04:00
Wissam Antoun	58405a527b	Fixed evaluation_strategy on epoch end bug (#7340 ) * Fixed evaluation_strategy on epoch end bug move the evaluation script outside the the iteration loop * black formatting	2020-09-23 13:17:00 -04:00
Stas Bekman	28cf873036	[testing] skip decorators: docs, tests, bugs (#7334 ) * skip decorators: docs, tests, bugs * another important note * style * bloody style * add @pytest.mark.parametrize * add note * no idea what it wants :(	2020-09-23 05:16:19 -04:00
Stas Bekman	df53643807	[code quality] fix confused flake8 (#7309 ) * fix confused flake We run `black --target-version py35 ...` but flake8 doesn't know that, so currently with py38 flake8 fails suggesting that black should have reformatted 63 files. Indeed if I run: ``` black --line-length 119 --target-version py38 examples templates tests src utils ``` it indeed reformats 63 files. The only solution I found is to create a black config file as explained at https://github.com/psf/black#configuration-format, which is what this PR adds. Now flake8 knows that py35 is the standard and no longer gets confused regardless of the user's python version. * adjust the other files that will now rely on black's config file	2020-09-22 22:12:36 -04:00
Sam Shleifer	78387cc63e	[s2s] only save metrics.json from rank zero (#7331 )	2020-09-22 18:27:28 -04:00
Sam Shleifer	e53138a1b9	[s2s] add src_lang kwarg for distributed eval (#7300 )	2020-09-22 18:26:37 -04:00
blinovpd	a9c7849cfa	[model_cards] blinoff/roberta-base-russian-v0 (#7317 )	2020-09-22 18:26:13 -04:00
Sylvain Gugger	f5518e5631	Formatting	2020-09-22 14:55:12 -04:00
Chady Kamar	17099ebd58	Add num workers cli arg (#7322 ) * Add dataloader_num_workers to TrainingArguments This argument is meant to be used to set the number of workers for the PyTorch DataLoader. * Pass num_workers argument on DataLoader init	2020-09-22 14:44:42 -04:00
Sam Shleifer	25b0463d0b	[s2s] add supported architecures to MD (#7252 )	2020-09-22 13:09:35 -04:00
Pavel Soriano	d6bc72c469	Fixed results of SQuAD-FR evaluation (#7313 ) The score for the F1 metric was reported as the Exact Match and vice-versa.	2020-09-22 12:39:07 -04:00
Huang Lianzhe	6303b5a718	[Bug Fix] The actual batch_size is inconsistent with the settings. (#7235 ) * [bug fix] fixed the bug that the actual batch_size is inconsistent with the parameter settings * reformat * reformat * reformat * add support for dict and BatchEncoding * add support for dict and BatchEncoding * add documentation for DataCollatorForNextSentencePrediction * Some more nits for the docstring Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Some more nits for the docstring Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Some more nits for the docstring Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Some more nits for the docstring Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Some more nits for the docstring Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * rename variables Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-09-22 12:31:21 -04:00
Ola Piktus	c754c41c61	RAG (#6813 ) * added rag WIP * path fix * Formatting / renaming prior to actual work * added rag WIP * path fix * Formatting / renaming prior to actual work * added rag WIP * path fix * Formatting / renaming prior to actual work * added rag WIP * Formatting / renaming prior to actual work * First commit * improve comments * Retrieval evaluation scripts * refactor to include modeling outputs + MPI retriever * Fix rag-token model + refactor * Various fixes + finetuning logic * use_bos fix * Retrieval refactor * Finetuning refactoring and cleanup * Add documentation and cleanup * Remove set_up_rag_env.sh file * Fix retrieval wit HF index * Fix import errors * Fix quality errors * Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867 * fix quality * Fix RAG Sequence generation * minor cleanup plus initial tests * fix test * fix tests 2 * Comments fix * post-merge fixes * Improve readme + post-rebase refactor * Extra dependencied for tests * Fix tests * Fix tests 2 * Refactor test requirements * Fix tests 3 * Post-rebase refactor * rename nlp->datasets * RAG integration tests * add tokenizer to slow integration test and allow retriever to run on cpu * add tests; fix position ids warning * change structure * change structure * add from encoder generator * save working solution * make all integration tests pass * add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained * don't save paths * delete unnecessary imports * pass config to AutoTokenizer.from_pretrained for Rag tokenizers * init wiki_dpr only once * hardcode legacy index and passages paths (todo: add the right urls) * finalize config * finalize retriver api and config api * LegacyIndex index download refactor * add dpr to autotokenizer * make from pretrained more flexible * fix ragfortokengeneration * small name changes in tokenizer * add labels to models * change default index name * add retrieval tests * finish token generate * align test with previous version and make all tests pass * add tests * finalize tests * implement thoms suggestions * add first version of test * make first tests work * make retriever platform agnostic * naming * style * add legacy index URL * docstrings + simple retrieval test for distributed * clean model api * add doc_ids to retriever's outputs * fix retrieval tests * finish model outputs * finalize model api * fix generate problem for rag * fix generate for other modles * fix some tests * save intermediate * set generate to default * big refactor generate * delete rag_api * correct pip faiss install * fix auto tokenization test * fix faiss install * fix test * move the distributed logic to examples * model page * docs * finish tests * fix dependencies * fix import in __init__ * Refactor eval_rag and finetune scripts * start docstring * add psutil to test * fix tf test * move require torch to top * fix retrieval test * align naming * finish automodel * fix repo consistency * test ragtokenizer save/load * add rag model output docs * fix ragtokenizer save/load from pretrained * fix tokenizer dir * remove torch in retrieval * fix docs * fixe finetune scripts * finish model docs * finish docs * remove auto model for now * add require torch * remove solved todos * integrate sylvains suggestions * sams comments * correct mistake on purpose * improve README * Add generation test cases * fix rag token * clean token generate * fix test * add note to test * fix attention mask * add t5 test for rag * Fix handling prefix in finetune.py * don't overwrite index_name Co-authored-by: Patrick Lewis <plewis@fb.com> Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair> Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair> Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair> Co-authored-by: Your Name <you@example.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>	2020-09-22 18:29:58 +02:00
Sylvain Gugger	1ee2194fb6	Mark big downloads slow (#7325 ) * Make big downloads as slow * Add import * Right order for slow decorator * More slow tests	2020-09-22 12:21:52 -04:00
Julien Plu	585217c87f	Add generic text classification example in TF (#5716 ) * Add new example with nlp * Update README * replace nlp by datasets * Update examples/text-classification/README.md Add Lysandre's suggestion. Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-09-22 12:05:05 -04:00
Lysandre	6e21f24220	Documentation version	2020-09-22 18:04:39 +02:00
Lysandre	3ebb1b3a2b	Release: v3.2.0	2020-09-22 17:36:51 +02:00
Sylvain Gugger	01f0fd0bab	Fixes for LayoutLM (#7318 )	2020-09-22 10:37:11 -04:00
Julien Plu	702a76ff92	Create an XLA parameter and fix the mixed precision (#7311 ) * Create an XLA parameter and fix mixed precision creation * Fix issue brought by intellisense * Complete docstring	2020-09-22 10:19:34 -04:00
Sylvain Gugger	596342c2b9	Support for Windows in check_copies (#7316 )	2020-09-22 10:17:48 -04:00
Sylvain Gugger	89edf504bf	Add possibility to evaluate every epoch (#7302 ) * Add possibility to evaluate every epoch * Remove multitype arg * Remove needless import * Use a proper enum * Apply suggestions from @LysandreJik Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * One else and formatting Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-09-22 09:52:29 -04:00
Sylvain Gugger	21ca148090	is_pretokenized -> is_split_into_words (#7236 ) * is_pretokenized -> is_split_into_words * Fix tests	2020-09-22 09:34:35 -04:00
Julien Plu	324f361e91	Fix saving TF custom models (#7291 ) * Fix #7277 * Apply style * Add a full training pipeline test * Apply style	2020-09-22 09:31:13 -04:00
Minghao Li	cd9a0585ea	Add LayoutLM Model (#7064 ) * first version * finish test docs readme model/config/tokenization class * apply make style and make quality * fix layoutlm GitHub link * fix conflict in index.rst and add layoutlm to pretrained_models.rst * fix bug in test_parents_and_children_in_mappings * reformat modeling_auto.py and tokenization_auto.py * fix bug in test_modeling_layoutlm.py * Update docs/source/model_doc/layoutlm.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update docs/source/model_doc/layoutlm.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * remove inh, add tokenizer fast, and update some doc * copy and rename necessary class from modeling_bert to modeling_layoutlm * Update src/transformers/configuration_layoutlm.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/configuration_layoutlm.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/configuration_layoutlm.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/configuration_layoutlm.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/modeling_layoutlm.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/modeling_layoutlm.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/modeling_layoutlm.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * add mish to activations.py, import ACT2FN and import logging from utils Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-09-22 09:28:02 -04:00

5759 Commits