transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Suraj Patil	18177a1a60	lm_labels => labels (#5080 )	2020-06-18 09:16:29 +02:00
Lysandre	efeb75b805	Remove misleading comment closes #4958	2020-06-17 18:24:35 -04:00
Saurabh Misra	bb154ac50c	Fixing TPU training by disabling wandb.watch gradients logging for TPU (#4926 )	2020-06-17 18:04:11 -04:00
Suraj Patil	fb6cccb863	fix qa example (#4929 )	2020-06-17 17:54:16 -04:00
Karthikeyan Singaravelan	38bba9cdd5	Fix deprecation warnings due to invalid escape sequences. (#4924 )	2020-06-17 17:46:58 -04:00
Sam Shleifer	f1a3d03741	add pandas to setup.cfg (#5093 )	2020-06-17 16:39:17 -04:00
Sam Shleifer	90c833870c	[MarianTokenizer] Switch to sacremoses for punc normalization (#5092 )	2020-06-17 16:31:05 -04:00
Pranav Dayanand Pawar	049e14f0e3	very minor spelling correction in script command (#5090 ) actual script name - counts_parameters.py	2020-06-17 16:08:43 -04:00
Sylvain Gugger	20fa828984	Make default_data_collator more flexible and deprecate old behavior (#5060 ) * Make default_data_collator more flexible * Accept tensors for all features * Document code * Refactor * Formatting	2020-06-17 15:24:51 -04:00
Yacine Jernite	5e06963394	Some changes to simplify the generation function (#5031 ) * moving logits post-processing out of beam search * moving logits post-processing out of beam search * first step cache * fix_Encoder_Decoder * patrick_version_postprocess * add_keyword_arg	2020-06-17 14:48:06 -04:00
Sylvain Gugger	204ebc25e6	Update installation page and add contributing to the doc (#5084 ) * Update installation page and add contributing to the doc * Remove mention of symlinks	2020-06-17 14:01:10 -04:00
Sam Shleifer	043f9f51f9	[examples] SummarizationModule improvements (#4951 )	2020-06-17 13:51:34 -04:00
Sylvain Gugger	cd40f6564e	Add header and fix command (#5082 )	2020-06-17 11:45:05 -04:00
Julien Chaumond	70bc3ead4f	[TextClassificationPipeline] Hotfix: make json serializable	2020-06-17 15:09:27 +00:00
Sylvain Gugger	7291ea0bff	Reorganize documentation (#5064 ) * Reorganize topics and add all models	2020-06-17 07:55:20 -04:00
Sylvain Gugger	e4aaa45805	Update pipeline examples to doctest syntax (#5030 )	2020-06-16 18:14:58 -04:00
Sylvain Gugger	011cc0be51	Fix all sphynx warnings (#5068 )	2020-06-16 16:50:02 -04:00
flozi00	af497b5672	Typo (#5069 )	2020-06-16 16:46:20 -04:00
Yacine Jernite	49c5202522	Eli5 examples (#4968 ) * add eli5 examples * add dense query script * query_di * merging * merging * add_utils * adds nearest neighbor wikipedia * batch queries * training_retriever * new notebooks * moved retriever traiing script * finished wiki40b * max_len_fix * train_s2s * retriever_batch_checkpointing * cleanup * merge * dim_fix * fix_indexer * fix_wiki40b_snippets * fix_embed_for_r * fp32 index * fix_sparse_q * joint_training * remove obsolete datasets * add_passage_nn_results * add_passage_nn_results * add_batch_nn * add_batch_nn * add_data_scripts * notebook * notebook * notebook * fix_multi_gpu * add_app * full_caching * full_caching * notebook * sparse_done * images * notebook * add_image_gif * with_Gif * add_contr_image * notebook * notebook * notebook * train_functions * notebook * min_retrieval_length * pandas_option * notebook * min_retrieval_length * notebook * notebook * eval_Retriever * notebook * images * notebook * add_example * add_example * notebook * fireworks * notebook * notebook * joe's notebook comments * app_update * notebook * notebook_link * captions * notebook * assing RetriBert model * add RetriBert to Auto * change AutoLMHead to AutoSeq2Seq * notebook downloads from hf models * style_black * style_black * app_update * app_update * fix_app_update * style * style * isort * Delete WikiELI5training.ipynb * Delete evaluate_eli5.py * Delete WikiELI5explore.ipynb * Delete ExploreWikiELI5Support.html * Delete explainlikeimfive.py * Delete wiki_snippets.py * children before parent * children before parent * style_black * style_black_only * isort * isort_new * Update src/transformers/modeling_retribert.py Co-authored-by: Julien Chaumond <chaumond@gmail.com> * typo fixes * app_without_asset * cleanup * Delete ELI5animation.gif * Delete ELI5contrastive.svg * Delete ELI5wiki_index.svg * Delete choco_bis.svg * Delete fireworks.gif * Delete huggingface_logo.jpg * Delete huggingface_logo.svg * Delete Long_Form_Question_Answering_with_ELI5_and_Wikipedia.ipynb * Delete eli5_app.py * Delete eli5_utils.py * readme * Update README.md * unused imports * moved_info * default_beam * ftuned model * disclaimer * Update src/transformers/modeling_retribert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * black * add_doc * names * isort_Examples * isort_Examples * Add doc to index Co-authored-by: Julien Chaumond <chaumond@gmail.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2020-06-16 16:36:58 -04:00
Sam Shleifer	c3e607496c	[cleanup] examples test_run_squad uses tiny model (#5059 )	2020-06-16 14:06:45 -04:00
Sylvain Gugger	439aa1d6e9	Remove old section + caching in install (#5027 )	2020-06-16 13:03:41 -04:00
Sam Shleifer	3d495c61ef	Fix marian tokenizer save pretrained (#5043 )	2020-06-16 09:48:19 -04:00
Sylvain Gugger	d5477baf7d	Convert hans to Trainer (#5025 ) * Convert hans to Trainer * Tick box	2020-06-16 08:06:31 -04:00
Amil Khare	c852036b4a	[cleanup] Hoist ModelTester objects to top level (#4939 ) Co-authored-by: Sam Shleifer <sshleifer@gmail.com>	2020-06-16 08:03:43 -04:00
Manuel Romero	0c55a384f8	Add reference to NLP dataset (#5028 ) * Add reference to NLP dataset * Update README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com>	2020-06-16 04:19:09 -04:00
Manuel Romero	0946d1209d	Add reference to NLP (package) dataset (#5029 ) * Add reference to NLP (package) dataset * Update README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com>	2020-06-16 04:17:46 -04:00
Boris Dayma	edcb3ac59a	refactor(wandb): consolidate import (#5044 )	2020-06-16 03:40:43 -04:00
Funtowicz Morgan	9e03364999	Ability to pickle/unpickle BatchEncoding pickle (reimport) (#5039 ) * Added is_fast property on BatchEncoding to indicate if the object comes from a Fast Tokenizer. * Added __get_state__() & __set_state__() to be pickable. * Correct tokens() return type from List[int] to List[str] * Added unittest for BatchEncoding pickle/unpickle * Added unittest for BatchEncoding is_fast * More careful checking on BatchEncoding unpickle tests. * Formatting. * is_fast should assertTrue on Rust tokenizers. * Ensure tensorflow has correct way of checking array_equal * More formatting.	2020-06-16 09:25:25 +02:00
Sylvain Gugger	f9f8a5312e	Add DistilBertForMultipleChoice (#5032 ) * Add `DistilBertForMultipleChoice`	2020-06-15 18:31:41 -04:00
Anthony MOI	36434220fc	[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510 ) * Use tokenizers pre-tokenized pipeline * failing pretrokenized test * Fix is_pretokenized in python * add pretokenized tests * style and quality * better tests for batched pretokenized inputs * tokenizers clean up - new padding_strategy - split the files * [HUGE] refactoring tokenizers - padding - truncation - tests * style and quality * bump up requied tokenizers version to 0.8.0-rc1 * switched padding/truncation API - simpler better backward compat * updating tests for custom tokenizers * style and quality - tests on pad * fix QA pipeline * fix backward compatibility for max_length only * style and quality * Various cleans up - add verbose * fix tests * update docstrings * Fix tests * Docs reformatted * __call__ method documented Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2020-06-15 17:12:51 -04:00
Patrick von Platen	ebba39e4e1	[Bart] Question Answering Model is added to tests (#5024 ) * fix test * Update tests/test_modeling_common.py * Update tests/test_modeling_common.py	2020-06-15 22:50:09 +02:00
Sylvain Gugger	bbad4c6989	Add position_ids (#5021 )	2020-06-15 15:50:17 -04:00
Boris Dayma	1bf4098e03	feat(TFTrainer): improve logging (#4946 ) * feat(tftrainer): improve logging * fix(trainer): consider case with evaluation only * refactor(tftrainer): address comments * refactor(tftrainer): move self.epoch_logging to __init__	2020-06-15 14:06:17 -04:00
Funtowicz Morgan	7b5a1e7d51	Fix importing transformers on Windows (#4997 )	2020-06-15 19:36:57 +02:00
Sam Shleifer	a9f1fc6c94	Add bart-base (#5014 )	2020-06-15 13:29:26 -04:00
Funtowicz Morgan	7b685f5229	Increase pipeline support for ONNX export. (#5005 ) * Increase pipeline support for ONNX export. * Style.	2020-06-15 19:13:58 +02:00
Sylvain Gugger	1affde2f10	Make DataCollator a callable (#5015 ) * Make DataCollator a callable * Update src/transformers/data/data_collator.py Co-authored-by: Julien Chaumond <chaumond@gmail.com>	2020-06-15 11:58:33 -04:00
Bram Vanroy	f7c93b3cee	Possible fix to make AMP work with DDP in the trainer (#4728 ) * manually set device in trainer args * check if current device is cuda before set_device * Explicitly set GPU ID when using single GPU This addresses https://github.com/huggingface/transformers/issues/4657#issuecomment-642228099	2020-06-15 10:10:26 -04:00
ipuneetrathore	66bcfbb130	Create README.md (#4975 ) * Create README.md * Update model_cards/ipuneetrathore/bert-base-cased-finetuned-finBERT/README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com>	2020-06-15 08:43:50 -04:00
Stefan Schweter	d812e6d76e	NER: fix construction of input examples for RoBERTa (#4943 ) * utils_ner: do not add extra sep token for RoBERTa model * run_pl_ner: do not add extra sep token for RoBERTa model	2020-06-15 08:30:40 -04:00
Suraj Patil	ebab096e86	[model card] model card for bart-large-finetuned-squadv1 (#4977 ) * [model card] model card for bart-large-finetuned-squadv1 * add metadata link to the dataset	2020-06-15 05:39:41 -04:00
Funtowicz Morgan	9ad36ad57f	Improve ONNX logging (#4999 ) * Improve ONNX export logging to give more information about the generated graph. * Correctly handle input and output in the logging.	2020-06-15 11:04:51 +02:00
ZhuBaohe	9931f817b7	fix (#4976 )	2020-06-14 21:36:14 +02:00
Suraj Patil	9208f57b16	BartTokenizerFast (#4878 )	2020-06-14 13:04:49 -04:00
Sylvain Gugger	403d309857	Hans data (#4854 ) * Update hans data to be able to use Trainer * Fixes * Deal with tokenizer that don't have token_ids * Clean up things * Simplify data use * Fix the input dict * Formatting + proper path in README	2020-06-13 09:35:13 -04:00
Julien Chaumond	ca5e1cdf8e	model_cards: we can now tag datasets see corresponding model pages to see how it's rendered	2020-06-12 23:19:07 +02:00
Suraj Patil	e93ccb3290	BartForQuestionAnswering (#4908 )	2020-06-12 15:47:57 -04:00
Sylvain Gugger	538531cde5	Add AlbertForMultipleChoice (#4959 ) * Add AlbertForMultipleChoice * Make up to date and add all models to common tests	2020-06-12 14:20:19 -04:00
Manuel Romero	fe24139702	Create README.md (#4865 )	2020-06-12 09:03:43 -04:00
Yannis Papanikolaou	9aa219a1fe	Create README.md (#4872 )	2020-06-12 09:03:13 -04:00

4230 Commits