transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Thomas Wolf	ba8c4d0ac0	[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659 ) * splitting fast and slow tokenizers [WIP] * [WIP] splitting sentencepiece and tokenizers dependencies * update dummy objects * add name_or_path to models and tokenizers * prefix added to file names * prefix * styling + quality * spliting all the tokenizer files - sorting sentencepiece based ones * update tokenizer version up to 0.9.0 * remove hard dependency on sentencepiece 🎉 * and removed hard dependency on tokenizers 🎉 * update conversion script * update missing models * fixing tests * move test_tokenization_fast to main tokenization tests - fix bugs * bump up tokenizers * fix bert_generation * update ad fix several tokenizers * keep sentencepiece in deps for now * fix funnel and deberta tests * fix fsmt * fix marian tests * fix layoutlm * fix squeezebert and gpt2 * fix T5 tokenization * fix xlnet tests * style * fix mbart * bump up tokenizers to 0.9.2 * fix model tests * fix tf models * fix seq2seq examples * fix tests without sentencepiece * fix slow => fast conversion without sentencepiece * update auto and bert generation tests * fix mbart tests * fix auto and common test without tokenizers * fix tests without tokenizers * clean up tests lighten up when tokenizers + sentencepiece are both off * style quality and tests fixing * add sentencepiece to doc/examples reqs * leave sentencepiece on for now * style quality split hebert and fix pegasus * WIP Herbert fast * add sample_text_no_unicode and fix hebert tokenization * skip FSMT example test for now * fix style * fix fsmt in example tests * update following Lysandre and Sylvain's comments * Update src/transformers/testing_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/testing_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-10-18 20:51:24 +02:00
Raza Habib	c65863ce53	Remove duplicated mish activation function (#7856 ) * Remove duplicated mish activation function * Update activations.py	2020-10-17 17:31:53 -04:00
Patrick von Platen	f5c45a19e6	Fix Rag example docstring (#7872 ) * fix rag examples * fix token generate example	2020-10-17 22:46:47 +02:00
Stas Bekman	9f7b2b2432	[s2s testing] turn all to unittests, use auto-delete temp dirs (#7859 )	2020-10-17 14:33:21 -04:00
Patrick von Platen	dc552b9b70	Fix typo in sequence model card	2020-10-16 16:05:06 +02:00
Stas Bekman	1652ddad35	[seq2seq testing] improve readability (#7845 )	2020-10-16 09:05:29 -04:00
Quentin Lhoest	466115b279	Fix missing reference titles in retrieval evaluation of RAG (#7817 )	2020-10-16 10:15:49 +02:00
Stas Bekman	464b53f5e4	[testing] disable FutureWarning in examples tests (#7842 ) * [testing] disable FutureWarning in examples tests same as tests/conftest.py, we can't resolve those warning, so turn the noise off. * fix	2020-10-16 03:35:39 -04:00
Sylvain Gugger	eb186bc14e	Small fixes to HP search (#7839 )	2020-10-16 03:23:44 -04:00
Stas Bekman	d8ca57d2ce	fix/hide warnings (#7837 ) s	2020-10-16 03:19:51 -04:00
vblagoje	c6e865ac2b	Remove masked_lm_labels from returned dictionary (#7818 )	2020-10-16 03:12:10 -04:00
Sam Shleifer	96e47d9229	[cleanup] assign todos, faster bart-cnn test (#7835 ) * 2 beam output * unassign/remove TODOs * remove one more	2020-10-16 03:11:18 -04:00
rmroczkowski	7b13bd01df	Herbert polish model (#7798 ) * HerBERT transformer model for Polish language understanding. * HerbertTokenizerFast generated with HerbertConverter * Herbert base and large model cards * Herbert model cards with tags * Herbert tensorflow models * Herbert model tests based on Bert test suit * src/transformers/tokenization_herbert.py edited online with Bitbucket * src/transformers/tokenization_herbert.py edited online with Bitbucket * docs/source/model_doc/herbert.rst edited online with Bitbucket * Herbert tokenizer tests and bug fixes * src/transformers/configuration_herbert.py edited online with Bitbucket * Copyrights and tests for TFHerbertModel * model_cards/allegro/herbert-base-cased/README.md edited online with Bitbucket * model_cards/allegro/herbert-large-cased/README.md edited online with Bitbucket * Bug fixes after testing * Reformat modified_only_fixup * Proper order of configuration * Herbert proper documentation formatting * Formatting with make modified_only_fixup * Dummies fixed * Adding missing models to documentation * Removing HerBERT model as it is a simple extension of BERT * Update model_cards/allegro/herbert-base-cased/README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com> * Update model_cards/allegro/herbert-large-cased/README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com> * HerbertTokenizer deprecated configuration removed Co-authored-by: Julien Chaumond <chaumond@gmail.com>	2020-10-16 03:06:51 -04:00
Julien Chaumond	99898dcd27	[Pipelines] Fix links to model lists (#7826 )	2020-10-16 02:57:02 -04:00
Lysandre Debut	52c9e84285	Fix DeBERTa integration tests (#7729 )	2020-10-16 02:49:13 -04:00
Stas Bekman	2255c2c7a0	[seq2seq] get_git_info fails gracefully (#7843 ) Co-authored-by: Sam Shleifer <sshleifer@gmail.com>	2020-10-16 00:22:43 -04:00
Katarina Slama	dfa4c26bc0	Typo and fix the input of labels to `cross_entropy` (#7841 ) The current version caused some errors. The changes fixed it for me. Hope this is helpful!	2020-10-15 19:36:31 -04:00
Stas Bekman	a5a8eeb772	fix DeprecationWarning (#7834 ) in `tests/test_utils_check_copies.py` I was getting intermittently: ``` utils/check_copies.py:52 /mnt/nvme1/code/transformers-comet/utils/check_copies.py:52: DeprecationWarning: invalid escape sequence \s while line_index < len(lines) and re.search(f"^{indent}(class|def)\s+{name}", lines[line_index]) is None: ``` So this should fix it.	2020-10-15 16:21:09 -04:00
David S. Lim	9c71cca316	model card for bert-base-NER (#7799 ) * model card for bert-base-NER * add meta data up top Co-authored-by: Julien Chaumond <chaumond@gmail.com> Co-authored-by: Julien Chaumond <chaumond@gmail.com>	2020-10-15 21:55:00 +02:00
Stas Bekman	4dbca50022	fix wandb/comet problems (#7830 ) * fix wandb/comet problems * simplify * Update src/transformers/integrations.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-10-15 15:23:24 -04:00
Julien Chaumond	e7aa64838c	[model_cards] facebook/bart-large-mnli: register ZSC for the inference API cc @Narsil @mfuntowicz @joeddav	2020-10-15 19:02:10 +02:00
Sylvain Gugger	2ce3ddab2d	Small fixes to NotebookProgressCallback (#7813 )	2020-10-15 10:30:34 -04:00
Julien Chaumond	6f45dd2fac	[model_cards] Fix yaml for Facebook/wmt19-* see `d99ed7ad61`	2020-10-15 16:14:08 +02:00
Julien Chaumond	d99ed7ad61	[model_cards] Facebook: add thumbnail	2020-10-15 12:53:29 +02:00
Lysandre	2485b8b0ac	Set XLA example time to 500s	2020-10-15 12:34:29 +02:00
Lysandre	2dba7d5702	Notebook catch all errors	2020-10-15 12:21:32 +02:00
Nicolas Patry	9ade8e7499	Upgrading TFAutoModelWithLMHead to (#7730 ) - TFAutoModelForCausalLM - TFAutoModelForMaskedLM - TFAutoModelForSeq2SeqLM as per deprecation warning. No tests as it simply removes current warnings from tests.	2020-10-15 05:26:08 -04:00
Sylvain Gugger	62b5622e6b	Add specific notebook ProgressCalback (#7793 )	2020-10-15 05:05:08 -04:00
Nicolas Patry	0911b6bd86	Improving Pipelines by defaulting to framework='tf' when pytorch seems unavailable. (#7728 ) * Improving Pipelines by defaulting to framework='tf' when pytorch seems unavailable. * Actually changing the default resolution order to account for model defaults Adding a new tests for each pipeline to check that pipeline(task) works too without manually adding the framework too.	2020-10-15 09:42:07 +02:00
Julien Plu	3a134f7c67	Fix TF savedmodel in Roberta (#7795 ) * Remove wrong parameter. * Same in Longformer	2020-10-14 23:48:50 +02:00
Nils Reimers	3032de9369	Model Card (#7752 ) * Create README.md * Update model_cards/sentence-transformers/LaBSE/README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com> Co-authored-by: Julien Chaumond <chaumond@gmail.com>	2020-10-14 13:30:58 -04:00
sarahlintang	3fdbeba83c	[model_cards] sarahlintang/IndoBERT (#7748 ) * Create README.md * Update model_cards/sarahlintang/IndoBERT/README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com>	2020-10-14 13:10:31 -04:00
Julien Chaumond	ba654270b3	[model_cards] rename to correct model name	2020-10-14 19:02:48 +02:00
Zhuosheng Zhang	08978487e7	Create README.md (#7722 )	2020-10-14 12:56:12 -04:00
Sagor Sarker	3557509127	added evaluation results for classification task (#7790 )	2020-10-14 12:50:43 -04:00
Sylvain Gugger	bb9559a7f9	Don't use `store_xxx` on optional bools (#7786 ) * Don't use `store_xxx` on optional bools * Refine test * Refine test	2020-10-14 12:05:02 -04:00
Sylvain Gugger	a1d1b332d0	Add predict step accumulation (#7767 ) * Add eval_accumulation_step and clean distributed eval * Add TPU test * Add TPU stuff * Fix arg name * Fix Seq2SeqTrainer * Fix total_size * Update src/transformers/trainer_pt_utils.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Doc and add test to TPU * Add unit test * Adapt name Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-10-14 11:41:45 -04:00
Sam Shleifer	8feb0cc967	fix examples/rag imports, tests (#7712 )	2020-10-14 11:35:00 -04:00
XiaoqiJiao	890e790e16	[model_cards] TinyBERT (HUAWEI Noah's Ark Lab) (#7775 )	2020-10-14 09:31:01 -04:00
Jonathan Chang	121dd4332b	Add batch inferencing support for GPT2LMHeadModel (#7552 ) * Add support for gpt2 batch inferencing * add test * remove typo Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>	2020-10-14 13:40:24 +02:00
Quentin Lhoest	0c64b18840	Fix bert position ids in DPR convert script (#7776 ) * fix bert position ids in DPR convert script * style	2020-10-14 05:30:02 -04:00
Sylvain Gugger	7968051aba	Fix typo	2020-10-13 17:30:46 -04:00
Sam Shleifer	2977bd528f	Faster pegasus tokenization test with reduced data size (#7762 )	2020-10-13 16:22:29 -04:00
François Lagunas	2d6e2ad4fa	Adding optional trial argument to model_init (#7759 ) * Adding optional trial argument to model_init Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-10-13 17:07:02 +02:00
Tiger	7e73c12805	fixed lots of typos. (#7758 )	2020-10-13 10:00:20 -04:00
Noam Wies	8cb4ecca25	Avoid unnecessary DDP synchronization when gradient_accumulation_steps > 1 (#7742 ) * use DDP no_sync when possible * fix is_nlp_available addition mistake * reformat trainer.py * reformat trainer.py * drop support for pytorch < 1.2 * return support for pytorch < 1.2	2020-10-13 09:46:44 -04:00
Lysandre Debut	52f7d74398	Do not softmax when num_labels==1 (#7726 ) * Do not softmax when num_labels==1 * Update src/transformers/pipelines.py Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com> Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>	2020-10-13 09:42:27 -04:00
Patrick von Platen	82b09a8481	[Rag] Fix loading of pretrained Rag Tokenizer (#7756 ) * fix rag * Update tokenizer save_pretrained Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>	2020-10-13 14:34:22 +02:00
Patrick von Platen	2d4e928d97	Update PULL_REQUEST_TEMPLATE.md Putting my name on a couple more issues to directly redirect them to me	2020-10-13 12:18:31 +02:00
Felipe Curti	dcba9ee03b	Gpt1 for sequence classification (#7683 ) * Add Documentation for GPT-1 Classification * Add GPT-1 with Classification head * Add tests for GPT-1 Classification * Add GPT-1 For Classification to auto models * Remove authorized missing keys, change checkpoint to openai-gpt	2020-10-13 05:06:15 -04:00
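
The warning quoted in commit a5a8eeb772 comes from writing `\s` inside a non-raw f-string, where Python treats it as an invalid string escape. The log does not show the patch itself, so the following is only a minimal sketch of one common remedy, a raw f-string. The helper `find_definition` and the sample `source` list are hypothetical names made up for illustration, and the `re.escape` calls are an extra precaution that is not in the quoted line.

```python
import re


def find_definition(lines, name, indent=""):
    """Return the index of the first line that defines `name`, or None.

    Hypothetical helper: it mirrors the shape of the loop condition quoted
    in the commit message, not the actual code in utils/check_copies.py.
    """
    # The rf"" prefix keeps `\s` as a regex token instead of a string escape,
    # so no "invalid escape sequence" DeprecationWarning is raised.
    # re.escape is an added safeguard (an assumption, not part of the quoted line).
    pattern = re.compile(rf"^{re.escape(indent)}(class|def)\s+{re.escape(name)}")
    for index, line in enumerate(lines):
        if pattern.search(line):
            return index
    return None


# Made-up sample input for illustration.
source = ["import os", "class Foo:", "    def bar(self):", "        pass"]
print(find_definition(source, "Foo"))
print(find_definition(source, "bar", indent="    "))
```

Doubling the backslash (`\\s`) in a plain f-string would silence the warning as well; the raw prefix simply keeps the pattern closer to how it reads in regex documentation.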


5531 Commits