[doc] Add blurb about large-scale model downloads

cc @n1t0 @lysandrejik @thomwolf
This commit is contained in:
Julien Chaumond 2019-09-02 12:27:11 -04:00
parent 7b0c99add9
commit 2dcc5a1629

View File

@ -52,6 +52,12 @@ If you want to reproduce the original tokenization process of the ``OpenAI GPT``
If you don't install ``ftfy`` and ``SpaCy``\ , the ``OpenAI GPT`` tokenizer will default to tokenize using BERT's ``BasicTokenizer`` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
Note on model downloads (Continuous Integration or large-scale deployments)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way faster, and cheaper. Feel free to contact us privately if you need any help.
Do you want to run a Transformer model on a mobile device?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^