Commit Graph

5495 Commits

Author SHA1 Message Date
Sylvain Gugger
a1d1b332d0
Add predict step accumulation (#7767)
* Add eval_accumulation_steps and clean distributed eval

* Add TPU test

* Add TPU stuff

* Fix arg name

* Fix Seq2SeqTrainer

* Fix total_size

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Doc and add test to TPU

* Add unit test

* Adapt name

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-14 11:41:45 -04:00
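For context, a minimal sketch of how the new argument is used — not taken from the PR itself; `model` and `eval_dataset` are hypothetical placeholders:

```python
# Hedged sketch: eval_accumulation_steps offloads accumulated predictions to
# the CPU every N eval steps, so device memory does not fill up with logits.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_eval_batch_size=8,
    eval_accumulation_steps=4,  # move predictions to CPU every 4 eval steps
)
trainer = Trainer(model=model, args=args, eval_dataset=eval_dataset)  # placeholders
metrics = trainer.evaluate()
```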
Sam Shleifer
8feb0cc967
fix examples/rag imports, tests (#7712) 2020-10-14 11:35:00 -04:00
XiaoqiJiao
890e790e16
[model_cards] TinyBERT (HUAWEI Noah's Ark Lab) (#7775) 2020-10-14 09:31:01 -04:00
Jonathan Chang
121dd4332b
Add batch inferencing support for GPT2LMHeadModel (#7552)
* Add support for gpt2 batch inferencing

* add test

* remove typo

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-10-14 13:40:24 +02:00
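A hedged sketch of what batched GPT-2 generation looks like after this change; the checkpoint name and prompts are illustrative. Left padding plus an attention mask is the usual recipe, since GPT-2 ships without a pad token:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.padding_side = "left"            # pad left so generation continues the real text
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the pad token

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.config.pad_token_id = model.config.eos_token_id

batch = tokenizer(["Hello, my name is", "The weather today is"],
                  return_tensors="pt", padding=True)
out = model.generate(batch["input_ids"],
                     attention_mask=batch["attention_mask"],
                     max_length=20)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```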
Quentin Lhoest
0c64b18840
Fix bert position ids in DPR convert script (#7776)
* fix bert position ids in DPR convert script

* style
2020-10-14 05:30:02 -04:00
Sylvain Gugger
7968051aba
Fix typo 2020-10-13 17:30:46 -04:00
Sam Shleifer
2977bd528f
Faster pegasus tokenization test with reduced data size (#7762) 2020-10-13 16:22:29 -04:00
François Lagunas
2d6e2ad4fa
Adding optional trial argument to model_init (#7759)
* Adding optional trial argument to model_init

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-13 17:07:02 +02:00
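A minimal sketch of the user-side shape of this change, assuming a standard hyperparameter-search setup (checkpoint name illustrative):

```python
# Hedged sketch: Trainer calls model_init with no argument for a normal run,
# and may pass the current trial during hyperparameter search.
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

def model_init(trial=None):
    # trial is None outside a search; inside one it carries the sampled trial,
    # which can be inspected to vary the model configuration per run
    return BertForSequenceClassification.from_pretrained("bert-base-uncased")

trainer = Trainer(model_init=model_init, args=TrainingArguments(output_dir="out"))
```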
Tiger
7e73c12805
fixed lots of typos. (#7758) 2020-10-13 10:00:20 -04:00
Noam Wies
8cb4ecca25
Avoid unnecessary DDP synchronization when gradient_accumulation_steps > 1 (#7742)
* use DDP no_sync when possible

* fix is_nlp_available addition mistake

* reformat trainer.py

* reformat trainer.py

* drop support for pytorch < 1.2

* restore support for pytorch < 1.2
2020-10-13 09:46:44 -04:00
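A hedged sketch of the underlying PyTorch mechanism (names here are placeholders, not the Trainer's internals): DistributedDataParallel's `no_sync()` context skips the gradient all-reduce on intermediate accumulation steps, so gradients are only synchronized on the step that actually calls `optimizer.step()`:

```python
import contextlib

for step, batch in enumerate(dataloader):  # dataloader/model/optimizer assumed defined
    is_sync_step = (step + 1) % gradient_accumulation_steps == 0
    # model is assumed wrapped in torch.nn.parallel.DistributedDataParallel
    ctx = contextlib.nullcontext() if is_sync_step else model.no_sync()
    with ctx:
        loss = model(**batch).loss / gradient_accumulation_steps
        loss.backward()  # all-reduce is skipped inside no_sync()
    if is_sync_step:
        optimizer.step()
        optimizer.zero_grad()
```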
Lysandre Debut
52f7d74398
Do not softmax when num_labels==1 (#7726)
* Do not softmax when num_labels==1

* Update src/transformers/pipelines.py

Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
2020-10-13 09:42:27 -04:00
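A hedged sketch of the decision this encodes (the function name is illustrative, not the pipeline's internal API): softmax over a single logit always returns 1.0, so a single-label head should be treated as a regression-style score instead:

```python
import torch

def postprocess(logits: torch.Tensor, num_labels: int) -> torch.Tensor:
    # Hedged sketch, not the pipeline's actual code path.
    if num_labels == 1:
        return logits  # regression-style score; softmax would collapse it to 1.0
    return torch.softmax(logits, dim=-1)
```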
Patrick von Platen
82b09a8481
[Rag] Fix loading of pretrained Rag Tokenizer (#7756)
* fix rag

* Update tokenizer save_pretrained

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-13 14:34:22 +02:00
Patrick von Platen
2d4e928d97
Update PULL_REQUEST_TEMPLATE.md
Putting my name on a couple more issues to directly redirect them to me
2020-10-13 12:18:31 +02:00
Felipe Curti
dcba9ee03b
GPT-1 for sequence classification (#7683)
* Add Documentation for GPT-1 Classification

* Add GPT-1 with Classification head

* Add tests for GPT-1 Classification

* Add GPT-1 For Classification to auto models

* Remove authorized missing keys, change checkpoint to openai-gpt
2020-10-13 05:06:15 -04:00
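A hedged sketch of the new head in use; `OpenAIGPTForSequenceClassification` is the class this PR adds and `openai-gpt` the checkpoint it targets, while the example text and label count are illustrative:

```python
from transformers import OpenAIGPTForSequenceClassification, OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTForSequenceClassification.from_pretrained("openai-gpt", num_labels=2)

inputs = tokenizer("a delightful film", return_tensors="pt")
logits = model(**inputs).logits  # shape (1, num_labels)
```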
Lysandre Debut
f34b4cd1bd
ElectraTokenizerFast (#7754) 2020-10-13 04:50:41 -04:00
Sam Shleifer
9c2b2db2cd
[marian] Automate Tatoeba-Challenge conversion (#7709) 2020-10-12 12:24:25 -04:00
Alex Combessie
aacac8f708
Add license info to nlptown/bert-base-multilingual-uncased-sentiment (#7738) 2020-10-12 11:56:10 -04:00
Lysandre Debut
1f1d950b28
Fix #7331 (#7732) 2020-10-12 09:10:52 -04:00
Julien Plu
d9ffb87efb
Fix tf text class (#7724)
* Fix test

* fix generic text classification

* fix test

* Fix tests
2020-10-12 08:45:15 -04:00
sgugger
d6175a4268 Fix code quality 2020-10-12 08:22:27 -04:00
Jonathan Chang
1d5ea34f6a
Fix trainer callback (#7720)
Fix a bug that happens when subclassing Trainer and
overriding evaluate() without calling prediction_loop()
2020-10-12 07:45:12 -04:00
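A hedged sketch of the subclassing pattern the fix has in mind (the metric computation is a placeholder):

```python
from transformers import Trainer

class MyTrainer(Trainer):
    def evaluate(self, eval_dataset=None, **kwargs):
        # Custom evaluation that never goes through prediction_loop();
        # the callback/state bookkeeping previously assumed that loop had run.
        metrics = {"eval_custom_metric": 0.0}  # placeholder computation
        self.log(metrics)
        return metrics
```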
Kelvin
f176e70723
The input training data files (multiple files in glob format). (#7717)
Splitting large files into smaller ones can often keep the tokenizer from running out of memory in environments like Colab that have no swap memory.
2020-10-12 07:44:02 -04:00
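A hedged sketch of what a glob-style training-file argument expands to; the path pattern is illustrative:

```python
import glob

train_files = sorted(glob.glob("data/shards/train-*.txt"))
# Tokenizing many small shards one at a time keeps peak memory bounded on
# swapless environments such as Colab.
```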
AndreaSottana
34fcfb44e3
Update tokenization_utils_base.py (#7696)
Minor spelling corrections in docstrings. "information" is uncountable in English and has no plural.
2020-10-12 06:09:20 -04:00
fteufel
2f34bcf3e7
check for tpu availability in save_pretrained (#7699)
Added is_torch_tpu_available() to the condition
for saving a model as an xla model. The "xla_device"
property of the config can also be True on a non-xla
device, when loading a checkpoint that was trained
on xla before.

Resolves #7695
2020-10-12 04:10:17 -04:00
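A hedged sketch of the guarded save path this describes (`model` is a placeholder): only route through `xm.save()` when a TPU is actually available, since `config.xla_device` may be True on a machine with no TPU:

```python
import torch
from transformers.file_utils import is_torch_tpu_available

if is_torch_tpu_available():
    import torch_xla.core.xla_model as xm
    xm.save(model.state_dict(), "pytorch_model.bin")  # moves tensors off the XLA device
else:
    torch.save(model.state_dict(), "pytorch_model.bin")
```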
Sylvain Gugger
13c1857718
Fix typo in all model docs (#7714) 2020-10-12 04:06:59 -04:00
Berowne
83086858f8
fixed typo in warning on line 207 (#7718)
Replace 'men_len' with 'mem_len' to match the parameter name
2020-10-12 03:58:58 -04:00
Miguel Victor
03ec02a667
Corrected typo: maked → masked (#7703) 2020-10-11 16:45:00 -04:00
Sam Shleifer
827c519494
[examples] bump pl=0.9.0 (#7053) 2020-10-11 16:39:38 -04:00
Alexandr Maslov
ba4bbd92bc
Fix docstring in AutoModel class (#7694) 2020-10-10 21:08:08 -04:00
Andrew Kane
26d5475d4b
Added license information for default and distilbert models (#7688) 2020-10-10 03:55:11 -04:00
Sylvain Gugger
c6e18de9f8
Fix flaky test in test_trainer (#7689) 2020-10-09 20:01:15 -04:00
Sylvain Gugger
2c9e83f7b8
Fix title level in Blenderbot doc (#7687) 2020-10-09 19:24:10 -04:00
Doug Blank
9618cd6964
Import integration libraries first (#7650)
* Import integration libraries first

* isort and black happiness

* flake8 happiness

* Add a test

* Black reformat

* Ignore import order in tests

* A heavy-handed method of disabling comet for tests

* Remove comet_ml tests

* Run black on setup.py
2020-10-09 12:13:22 -04:00
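A hedged sketch of the ordering constraint behind this change: comet_ml instruments frameworks at import time, so it has to be imported before torch and transformers to hook them:

```python
import comet_ml  # noqa: F401  (must come first to auto-instrument)
import torch
import transformers
```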
sgugger
4dcc424de3 Complete release instruction 2020-10-09 12:12:03 -04:00
Sylvain Gugger
a3cea6a8cc
Better links for models in READMED and doc index (#7680) 2020-10-09 11:17:16 -04:00
Sam Shleifer
0af53b1ef9
Delete extra test file (#7681) 2020-10-09 11:16:35 -04:00
Stas Bekman
b0f05e0c4c
[pegasus] Faster tokenizer tests (#7672) 2020-10-09 11:10:32 -04:00
sgugger
bc00b37a0d Revert "Better model links in the README and index"
This reverts commit 76e05518bb.
2020-10-09 10:56:13 -04:00
sgugger
76e05518bb Better model links in the README and index 2020-10-09 10:54:40 -04:00
Julien Plu
9ad830596d
Fix dataset cardinality (#7678)
* Fix test

* Fix cardinality issue

* Fix test
2020-10-09 10:38:25 -04:00
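A hedged sketch of the TensorFlow API involved: the cardinality of a tf.data pipeline can be UNKNOWN after mapping or filtering, so the number of steps cannot always be read off the dataset:

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(100).batch(8)
n = tf.data.experimental.cardinality(dataset)
if n == tf.data.experimental.UNKNOWN_CARDINALITY:
    steps = None     # fall back to an explicit steps argument
else:
    steps = int(n)   # 13 batches here
```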
Joe Davison
a1ac082879
add license to xlm-roberta-large-xnli card 2020-10-09 09:16:06 -04:00
Funtowicz Morgan
21ed3a6b99
Reintroduce the clean_text call in BertTokenizer, which was removed by mistake in #4723 (#5749)
* Reintroduce clean_text call which was removed by mistake in #4723

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added unittest for clean_text parameter on Bert tokenizer.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Better unittest name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Adapt unittest to use untrained tokenizer.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Code quality + update test

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-10-09 08:07:28 -04:00
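A hedged sketch of what the cleaning step does conceptually; this mirrors, but does not copy, the tokenizer's internal text-cleaning logic:

```python
import unicodedata

def clean_text(text: str) -> str:
    out = []
    for ch in text:
        if ch.isspace():
            out.append(" ")  # normalize all whitespace to a single space
        elif ord(ch) in (0, 0xFFFD) or unicodedata.category(ch).startswith("C"):
            continue         # drop NUL, U+FFFD, and control characters
        else:
            out.append(ch)
    return "".join(out)
```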
Noah Trenaman
5668fdb09e
Update XLM-RoBERTa details (#7669) 2020-10-09 05:16:58 -04:00
guhur
0578a91300
fix nn.DataParallel compatibility with PyTorch 1.5 (#7671)
The same type of errors as in https://github.com/huggingface/transformers/pull/4300
2020-10-09 05:15:08 -04:00
Sam Shleifer
297233fa92
[s2s] Switch README urls to cdn (#7670) 2020-10-08 21:22:22 -04:00
Sam Shleifer
a1ecc90d6b
[pseudo] Switch URLS to CDN (#7661) 2020-10-08 14:12:39 -04:00
Suraj Patil
06a973fd2a
[s2s] configure lr_scheduler from command line (#7641) 2020-10-08 13:06:35 -04:00
Lysandre Debut
4a00613c24
Fix RobertaForCausalLM docs (#7642)
* Fix RobertaForCausalLM docs

* Apply review suggestion

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-10-08 08:36:00 -04:00
Thomas Wolf
55cb2ee62e
Green tests: update torch-hub test dependencies (add protobuf and pin tokenizer 0.9.0-RC2) (#7658)
* pin torch-hub test

* add protobuf dep
2020-10-08 13:21:15 +02:00
Thomas Wolf
9aeacb58ba
Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)
* [WIP] SP tokenizers

* fixing tests for T5

* WIP tokenizers

* serialization

* update T5

* WIP T5 tokenization

* slow to fast conversion script

* Refactoring to move tokenizer implementations inside transformers

* Adding gpt - refactoring - quality

* WIP adding several tokenizers to the fast world

* WIP Roberta - moving implementations

* update to dev4; switch file loading to in-memory loading

* Updating and fixing

* advancing on the tokenizers - updating do_lower_case

* style and quality

* moving forward with tokenizers conversion and tests

* MBart, T5

* dropping the fast version of Transfo-XL

* Adding to autotokenizers + style/quality

* update init and space_between_special_tokens

* style and quality

* bump up tokenizers version

* add protobuf

* fix pickle Bert JP with Mecab

* fix newly added tokenizers

* style and quality

* fix bert japanese

* fix funnel

* limit tokenizer warning to one occurrence

* clean up file

* fix new tokenizers

* fast tokenizers deep tests

* WIP adding all the special fast tests on the new fast tokenizers

* quick fix

* adding more fast tokenizers in the fast tests

* all tokenizers in fast version tested

* Adding BertGenerationFast

* bump up setup.py for CI

* remove BertGenerationFast (too early)

* bump up tokenizers version

* Clean old docstrings

* Typo

* Update following Lysandre comments

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2020-10-08 11:32:16 +02:00
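A hedged sketch of the user-facing effect: SentencePiece-backed models such as T5 can now resolve to a fast tokenizer (protobuf is needed for the slow-to-fast conversion), while Transfo-XL loses its fast variant and stays slow-only:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small", use_fast=True)
print(tok.is_fast)  # True once the fast T5 tokenizer is available
```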