transformers/docs/source/main_classes

Latest commit: Suraj Patil, 88ef8893cd
Add caching mechanism to BERT, RoBERTa (#9183) (usage sketch below)
* add past_key_values

* add use_cache option

* make mask before cutting ids

* adjust position_ids according to past_key_values

* flatten past_key_values

* fix positional embeds

* fix _reorder_cache

* set use_cache to false when not decoder, fix attention mask init

* add test for caching

* add past_key_values for Roberta

* fix position embeds

* add caching test for roberta

* add doc

* make style

* doc, fix attention mask, test

* small fixes

* address Patrick's comments

* input_ids shouldn't start with pad token

* use_cache only when decoder

* make consistent with bert

* make copies consistent

* add use_cache to encoder

* add past_key_values to tapas attention

* apply suggestions from code review

* make copies consistent

* add attn mask in tests

* remove copied from longformer

* apply suggestions from code review

* fix bart test

* nit

* simplify model outputs

* fix doc

* fix output ordering
2020-12-23 23:01:32 +05:30
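
The commit above adds `past_key_values` inputs and a `use_cache` option to BERT and RoBERTa so that decoder-style generation can reuse cached key/value states instead of recomputing attention over the whole prefix at every step. Below is a minimal sketch (not taken from the PR itself) of greedy incremental decoding with `BertLMHeadModel` using the new cache; the `bert-base-uncased` checkpoint, the 5-step loop, and the hand-rolled greedy decoding are illustrative assumptions.

    import torch
    from transformers import BertConfig, BertLMHeadModel, BertTokenizer

    # Illustrative checkpoint; output quality is not the point of this sketch,
    # since bert-base-uncased was not trained as a causal language model.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    config = BertConfig.from_pretrained("bert-base-uncased", is_decoder=True)
    model = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)
    model.eval()

    input_ids = tokenizer("Hello, my dog", return_tensors="pt").input_ids
    past_key_values = None

    with torch.no_grad():
        for _ in range(5):
            # After the first step only the newest token is fed; the cached
            # key/value states returned in past_key_values cover the prefix.
            outputs = model(
                input_ids=input_ids if past_key_values is None else input_ids[:, -1:],
                past_key_values=past_key_values,
                use_cache=True,
            )
            past_key_values = outputs.past_key_values
            next_token = outputs.logits[:, -1, :].argmax(dim=-1, keepdim=True)
            input_ids = torch.cat([input_ids, next_token], dim=-1)

    print(tokenizer.decode(input_ids[0]))

Per the commit log, RoBERTa receives the same `past_key_values` / `use_cache` support, so the loop above should carry over to `RobertaForCausalLM` unchanged.
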
File | Last commit message | Last commit date
callback.rst | Copyright (#8970) | 2020-12-07 18:36:34 -05:00
configuration.rst | Copyright (#8970) | 2020-12-07 18:36:34 -05:00
logging.rst | Copyright (#8970) | 2020-12-07 18:36:34 -05:00
model.rst | [Flax] Align FlaxBertForMaskedLM with BertForMaskedLM, implement from_pretrained, init (#9054) | 2020-12-16 13:03:32 +01:00
optimizer_schedules.rst | Seq2seq trainer (#9241) | 2020-12-22 11:33:44 -05:00
output.rst | Add caching mechanism to BERT, RoBERTa (#9183) | 2020-12-23 23:01:32 +05:30
pipelines.rst | TableQuestionAnsweringPipeline (#9145) | 2020-12-16 12:31:50 -05:00
processors.rst | Copyright (#8970) | 2020-12-07 18:36:34 -05:00
tokenizer.rst | Copyright (#8970) | 2020-12-07 18:36:34 -05:00
trainer.rst | Seq2seq trainer (#9241) | 2020-12-22 11:33:44 -05:00