transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Sylvain Gugger	f3065abdb8	Doc tokenizer (#6110 ) * Start doc tokenizers * Tokenizer documentation * Start doc tokenizers * Tokenizer documentation * Formatting after rebase * Formatting after merge * Update docs/source/main_classes/tokenizer.rst Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Address comment * Update src/transformers/tokenization_utils_base.py Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com> * Address Thom's comments Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>	2020-07-30 14:51:19 -04:00
guillaume-be	e642c78908	Addition of a DialoguePipeline (#5516 ) * initial commit for pipeline implementation Addition of input processing and history concatenation * Conversation pipeline tested and working for single & multiple conversation inputs * Added docstrings for dialogue pipeline * Addition of dialogue pipeline integration tests * Delete test_t5.py * Fixed max code length * Updated styling * Fixed test broken by formatting tools * Removed unused import * Added unit test for DialoguePipeline * Fixed Tensorflow compatibility * Fixed multi-framework support using framework flag * - Fixed docstring - Added `min_length_for_response` as an initialization parameter - Renamed `args` to `conversations`, `conversations` being a `Conversation` or a `List[Conversation]` - Updated truncation to truncate entire segments of conversations, instead of cutting in the middle of a user/bot input - renamed pipeline name from dialogue to conversational - removed hardcoded default value of 1000 and use config.max_length instead - added `append_response` and `set_history` method to the Conversation class to avoid direct fields mutation - fixed bug in history truncation method * - Updated ConversationalPipeline to accept only active conversations (otherwise a ValueError is raised) * - Simplified input tensor conversion * - Updated attention_mask value for Tensorflow compatibility * - Updated last dialogue reference to conversational & fixed integration tests * Fixed conflict with master * Updates following review comments * Updated formatting * Added Conversation and ConversationalPipeline to the library __init__, addition of docstrings for Conversation, added both to the docs * Update src/transformers/pipelines.py Updated docsting following review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-07-30 14:11:39 -04:00
Lysandre Debut	ec0267475c	Fix FlauBERT GPU test (#6142 ) * Fix GPU test * Remove legacy constructor	2020-07-30 11:11:48 -04:00
Sylvain Gugger	91cb95461e	Switch from return_tuple to return_dict (#6138 ) * Switch from return_tuple to return_dict * Fix test * [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614) * Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests * AutoModels Tiny tweaks * Style * Final changes before merge * Re-order for simpler review * Final fixes * Addressing @sgugger's comments * Test MultipleChoice * Rework TF trainer (#6038) * Fully rework training/prediction loops * fix method name * Fix variable name * Fix property name * Fix scope * Fix method name * Fix tuple index * Fix tuple index * Fix indentation * Fix variable name * fix eval before log * Add drop remainder for test dataset * Fix step number + fix logging datetime * fix eval loss value * use global step instead of step + fix logging at step 0 * Fix logging datetime * Fix global_step usage * Fix breaking loop + logging datetime * Fix step in prediction loop * Fix step breaking * Fix train/test loops * Force TF at least 2.2 for the trainer * Use assert_cardinality to facilitate the dataset size computation * Log steps per epoch * Make tfds compliant with TPU * Make tfds compliant with TPU * Use TF dataset enumerate instead of the Python one * revert previous commit * Fix data_dir * Apply style * rebase on master * Address Sylvain's comments * Address Sylvain's and Lysandre comments * Trigger CI * Remove unused import * Switch from return_tuple to return_dict * Fix test * Add recent model Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Julien Plu <plu.julien@gmail.com>	2020-07-30 09:17:00 -04:00
Sylvain Gugger	562b6369c4	Tf trainer cleanup (#6143 ) * Clean up TFTrainer * Add import * Fix conflicts	2020-07-30 09:13:16 -04:00
Oren Amsalem	c127d055e6	add another e.g. to avoid confusion (#6055 )	2020-07-30 08:53:35 -04:00
Oren Amsalem	d24ea708d7	Actually the extra_id are from 0-99 and not from 1-100 (#5967 ) a = tokenizer.encode("we got a <extra_id_99>", return_tensors='pt',add_special_tokens=True) print(a) >tensor([[ 62, 530, 3, 9, 32000]]) a = tokenizer.encode("we got a <extra_id_100>", return_tensors='pt',add_special_tokens=True) print(a) >tensor([[ 62, 530, 3, 9, 3, 2, 25666, 834, 23, 26, 834, 2915, 3155]])	2020-07-30 06:13:29 -04:00
Stas Bekman	3212b8850d	[s2s] add support for overriding config params (#6149 )	2020-07-30 01:09:46 -04:00
Julien Plu	54f9fbeff8	Rework TF trainer (#6038 ) * Fully rework training/prediction loops * fix method name * Fix variable name * Fix property name * Fix scope * Fix method name * Fix tuple index * Fix tuple index * Fix indentation * Fix variable name * fix eval before log * Add drop remainder for test dataset * Fix step number + fix logging datetime * fix eval loss value * use global step instead of step + fix logging at step 0 * Fix logging datetime * Fix global_step usage * Fix breaking loop + logging datetime * Fix step in prediction loop * Fix step breaking * Fix train/test loops * Force TF at least 2.2 for the trainer * Use assert_cardinality to facilitate the dataset size computation * Log steps per epoch * Make tfds compliant with TPU * Make tfds compliant with TPU * Use TF dataset enumerate instead of the Python one * revert previous commit * Fix data_dir * Apply style * rebase on master * Address Sylvain's comments * Address Sylvain's and Lysandre comments * Trigger CI * Remove unused import	2020-07-29 14:32:01 -04:00
Lysandre Debut	3f94170a10	[WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614 ) * Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests * AutoModels Tiny tweaks * Style * Final changes before merge * Re-order for simpler review * Final fixes * Addressing @sgugger's comments * Test MultipleChoice	2020-07-29 14:26:26 -04:00
Sylvain Gugger	8a8ae27617	Use google style to document properties (#6130 ) * Use google style to document properties * Update src/transformers/configuration_utils.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-07-29 12:28:12 -04:00
Julien Plu	fc64559c45	Fix TF CTRL model naming (#6134 )	2020-07-29 12:20:00 -04:00
Lysandre Debut	641b873c13	XLNet PLM Readme (#6121 )	2020-07-29 11:38:15 -04:00
Timo Moeller	8d157c930b	add deepset/xlm-roberta-large-squad2 model card (#6128 ) * Add xlm-r QA model card * Add tags	2020-07-29 17:34:16 +02:00
Funtowicz Morgan	6c002853a6	Added capability to quantize a model while exporting through ONNX. (#6089 ) * Added capability to quantize a model while exporting through ONNX. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> We do not support multiple extensions Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Reformat files Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * More quality Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Ensure test_generate_identified_name compares the same object types Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added documentation everywhere on ONNX exporter Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Use pathlib.Path instead of plain-old string Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Use f-string everywhere Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Use the correct parameters for black formatting Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Use Python 3 super() style. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Use packaging.version to ensure installed onnxruntime version match requirements Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fixing imports sorting order. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Missing raise(s) Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added quantization documentation Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix some spelling. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix bad list header format Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>	2020-07-29 13:21:29 +02:00
Sylvain Gugger	25de74ccfe	Use FutureWarning to deprecate (#6111 )	2020-07-29 05:20:53 -04:00
Funtowicz Morgan	640550fc7a	ONNX documentation (#5992 ) * Move torchscript and add ONNX documentation under modle_export Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Remove previously introduced tree element Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * WIP doc Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * ONNX documentation Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix invalid link Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Improve spelling Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Final wording pass Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>	2020-07-29 11:02:35 +02:00
Sam Shleifer	92f8ce2ed6	Fix deebert tests (#6102 )	2020-07-28 18:30:16 -04:00
Sam Shleifer	c49cd927f7	[Fix] position_ids tests again (#6100 )	2020-07-28 18:29:35 -04:00
Sam Shleifer	40796c5801	[fix] add bart to LM_MAPPING (#6099 )	2020-07-28 18:29:18 -04:00
Sam Shleifer	5abe50381a	Fix #6096 : MBartTokenizer's mask token (#6098 )	2020-07-28 18:27:58 -04:00
Joe Davison	b1c8b76907	Fix zero-shot pipeline single seq output shape (#6104 )	2020-07-28 14:46:03 -04:00
Lysandre Debut	06834bc332	Logs should not be hidden behind a logger.info (#6097 )	2020-07-28 12:44:25 -04:00
Sam Shleifer	dafa296c95	[s2s] Delete useless method, log tokens_per_batch (#6081 )	2020-07-28 11:24:23 -04:00
Tanmay Thakur	dc4755c6d5	create model-card for lordtt13/emo-mobilebert (#6030 )	2020-07-28 10:00:23 -04:00
Sylvain Gugger	28931f81b7	Fix #6092 (#6093 ) * Fix #6092 * Format	2020-07-28 09:48:39 -04:00
Manuel Romero	5e97c82940	Create README.md (#6076 )	2020-07-28 09:36:00 -04:00
Clement	54f49af4ae	Add inference widget examples (#5825 )	2020-07-28 09:14:00 -04:00
Sylvain Gugger	0206efb4cf	Make all data collators accept dict (#6065 ) * Make all data collators accept dict * Style	2020-07-28 09:08:20 -04:00
Sam Shleifer	31a5486e42	github issue template suggests who to tag (#5790 ) Co-authored-by: Julien Chaumond <chaumond@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Teven <teven.lescao@gmail.com>	2020-07-28 08:41:27 -04:00
Stas Bekman	f0c70085c2	link to README.md (#6068 ) * add a link to README.md * Update README.md	2020-07-28 20:34:58 +08:00
Pavel Soriano	4f814fd587	[Model Card] camembert-base-squadFR-fquad-piaf (#6087 )	2020-07-28 20:33:52 +08:00
Sam Shleifer	3c7fbf35a6	MBART: support summarization tasks where max_src_len > max_tgt_len (#6003 ) * MBART: support summarization tasks * fix test * Style * add tokenizer test	2020-07-28 08:18:11 -04:00
Tanmay Thakur	842eb45606	New Community NB Add (#5824 ) Signed-off-by: lordtt13 <thakurtanmay72@yahoo.com>	2020-07-28 04:25:12 -04:00
Andrés Felipe Cruz	018d61fa24	Moving transformers package import statements to relative imports in some files (#5796 ) * Moving rom transformers statements to relative imports in some files under src/ * Import order Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-07-28 04:19:17 -04:00
Lysandre Debut	7214954db4	Should return a tuple for serialization (#6061 )	2020-07-28 03:14:31 -04:00
Sam Shleifer	7a68d40138	[s2s] Don't mention packed data in README (#6079 )	2020-07-27 20:07:21 -04:00
Sam Shleifer	b7345d22d0	[fix] no warning for position_ids buffer (#6063 )	2020-07-27 20:00:44 -04:00
Sam Shleifer	1e00ef681d	[s2s] dont document packing because it hurts performance (#6077 )	2020-07-27 18:26:00 -04:00
sgugger	9d0d3a6645	Pin TF while we wait for a fix	2020-07-27 18:03:09 -04:00
Ramsri Goutham Golla	769e6ba01f	Create README.md (#6032 ) Adding model card - readme	2020-07-27 16:25:37 -04:00
Sylvain Gugger	fd347e0da7	Add fire to setup.cfg to make isort happy (#6066 )	2020-07-27 15:17:33 -04:00
Sam Shleifer	11792d7826	CL util to convert models to fp16 before upload (#5953 )	2020-07-27 12:21:25 -04:00
Sam Shleifer	4302ace5bd	[pack_dataset] don't sort before packing, only pack train (#5954 )	2020-07-27 12:14:23 -04:00
Suraj Patil	c8bdf7f4ec	Add new AutoModel classes in pipeline (#6062 ) * use new AutoModel classed * make style and quality	2020-07-27 11:50:08 -04:00
Cola	5779e5434d	✏️ Fix typo (#5734 )	2020-07-27 10:55:15 -04:00
Suraj Patil	d1d15d6f2d	[examples (seq2seq)] fix preparing decoder_input_ids for T5 (#5994 )	2020-07-27 10:10:43 -04:00
Joe Davison	3deffc1d67	Zero shot classification pipeline (#5760 ) * add initial zero-shot pipeline * change default args * update default template * add label string splitting * add str labels support, remove nli from name * style * add input validation and working tf defaults * tests * quality check * add docstring to __call__ * add slow tests * Change truncation to only_first also lower precision on tests for readibility * style	2020-07-27 09:42:58 -04:00
Sylvain Gugger	1246b20f6d	Fix the return documentation rendering for all model outputs (#6022 ) * Fix the return documentation rendering for all model outputs * Formatting	2020-07-27 09:18:59 -04:00
Sylvain Gugger	3b64ad5d5c	Remove unused file (#6023 )	2020-07-27 08:31:24 -04:00

4689 Commits