transformers/docs/source/main_classes

Latest commit: Suraj Patil, 88ef8893cd
Add caching mechanism to BERT, RoBERTa (#9183) (usage sketch below)
* add past_key_values

* add use_cache option

* make mask before cutting ids

* adjust position_ids according to past_key_values

* flatten past_key_values

* fix positional embeds

* fix _reorder_cache

* set use_cache to false when not decoder, fix attention mask init

* add test for caching

* add past_key_values for Roberta

* fix position embeds

* add caching test for roberta

* add doc

* make style

* doc, fix attention mask, test

* small fixes

* address Patrick's comments

* input_ids shouldn't start with pad token

* use_cache only when decoder

* make consistent with bert

* make copies consistent

* add use_cache to encoder

* add past_key_values to tapas attention

* apply suggestions from code review

* make copies consistent

* add attn mask in tests

* remove copied from longformer

* apply suggestions from code review

* fix bart test

* nit

* simplify model outputs

* fix doc

* fix output ordering
2020-12-23 23:01:32 +05:30
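
The commit above adds `past_key_values` inputs and a `use_cache` option to BERT and RoBERTa so that decoder-style generation can reuse cached key/value states instead of recomputing attention over the whole prefix at every step. Below is a minimal sketch (not taken from the PR itself) of greedy incremental decoding with `BertLMHeadModel` using the new cache; the `bert-base-uncased` checkpoint, the 5-step loop, and the hand-rolled greedy decoding are illustrative assumptions.

    import torch
    from transformers import BertConfig, BertLMHeadModel, BertTokenizer

    # Illustrative checkpoint; output quality is not the point of this sketch,
    # since bert-base-uncased was not trained as a causal language model.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    config = BertConfig.from_pretrained("bert-base-uncased", is_decoder=True)
    model = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)
    model.eval()

    input_ids = tokenizer("Hello, my dog", return_tensors="pt").input_ids
    past_key_values = None

    with torch.no_grad():
        for _ in range(5):
            # After the first step only the newest token is fed; the cached
            # key/value states returned in past_key_values cover the prefix.
            outputs = model(
                input_ids=input_ids if past_key_values is None else input_ids[:, -1:],
                past_key_values=past_key_values,
                use_cache=True,
            )
            past_key_values = outputs.past_key_values
            next_token = outputs.logits[:, -1, :].argmax(dim=-1, keepdim=True)
            input_ids = torch.cat([input_ids, next_token], dim=-1)

    print(tokenizer.decode(input_ids[0]))

Per the commit log, RoBERTa receives the same `past_key_values` / `use_cache` support, so the loop above should carry over to `RobertaForCausalLM` unchanged.
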
File | Last commit message | Last commit date
callback.rst | Copyright (#8970) | 2020-12-07 18:36:34 -05:00
configuration.rst | Copyright (#8970) | 2020-12-07 18:36:34 -05:00
logging.rst | Copyright (#8970) | 2020-12-07 18:36:34 -05:00
model.rst | [Flax] Align FlaxBertForMaskedLM with BertForMaskedLM, implement from_pretrained, init (#9054) | 2020-12-16 13:03:32 +01:00
optimizer_schedules.rst | Seq2seq trainer (#9241) | 2020-12-22 11:33:44 -05:00
output.rst | Add caching mechanism to BERT, RoBERTa (#9183) | 2020-12-23 23:01:32 +05:30
pipelines.rst | TableQuestionAnsweringPipeline (#9145) | 2020-12-16 12:31:50 -05:00
processors.rst | Copyright (#8970) | 2020-12-07 18:36:34 -05:00
tokenizer.rst | Copyright (#8970) | 2020-12-07 18:36:34 -05:00
trainer.rst | Seq2seq trainer (#9241) | 2020-12-22 11:33:44 -05:00