mirror of
https://github.com/huggingface/transformers.git
synced 2025-08-01 02:31:11 +06:00
Model sharing rst (#8439)
* Update RST
* Finer details
* Re-organize
* Style
parent ad2303a401
commit 9cebee38ad
@@ -18,39 +18,65 @@ done something similar on your task, either using the model directly in your own
|
|||||||
:class:`~.transformers.Trainer`/:class:`~.transformers.TFTrainer` class. Let's see how you can share the result on the
|
:class:`~.transformers.Trainer`/:class:`~.transformers.TFTrainer` class. Let's see how you can share the result on the
|
||||||
`model hub <https://huggingface.co/models>`__.
|
`model hub <https://huggingface.co/models>`__.
|
||||||
|
|
||||||
|
Model versioning
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
Since version v3.5.0, the model hub has built-in model versioning based on git and git-lfs. It is based on the paradigm
|
||||||
|
that one model *is* one repo.
|
||||||
|
|
||||||
|
This allows:
|
||||||
|
|
||||||
|
- built-in versioning
|
||||||
|
- access control
|
||||||
|
- scalability
|
||||||
|
|
||||||
|
This is built around *revisions*, which are a way to pin a specific version of a model, using a commit hash, tag or
|
||||||
|
branch.
|
||||||
|
|
||||||
|
For instance:
|
||||||
|
|
||||||
|
.. code-block::
|
||||||
|
|
||||||
|
>>> tokenizer = AutoTokenizer.from_pretrained(
|
||||||
|
>>> "julien-c/EsperBERTo-small",
|
||||||
|
>>> revision="v2.0.1" # tag name, or branch name, or commit hash
|
||||||
|
>>> )
|
||||||
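As a rough illustration of what pinning a revision means, the sketch below builds the URL of a single file at a given revision. The ``resolve`` URL layout, the ``main`` default and the helper name ``pinned_file_url`` are assumptions made for this example; they are not stated in the text above.

```python
from urllib.parse import quote

# Assumed URL layout for one file of a model repo at a given revision.
HUB_TEMPLATE = "https://huggingface.co/{model_id}/resolve/{revision}/{filename}"


def pinned_file_url(model_id: str, filename: str, revision: str = "main") -> str:
    """Build the URL of one file at a pinned revision (tag, branch or commit hash)."""
    return HUB_TEMPLATE.format(
        model_id=model_id,
        # Escape characters such as "/" so a branch name stays one path segment.
        revision=quote(revision, safe=""),
        filename=filename,
    )


url = pinned_file_url("julien-c/EsperBERTo-small", "config.json", revision="v2.0.1")
print(url)
```

Because a tag or commit hash does not move, the same URL keeps pointing at the same file contents, which is what makes a pinned revision reproducible; a branch name, by contrast, follows new commits.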
|
|
||||||
Basic steps
|
Basic steps
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
..
|
In order to upload a model, you'll need to first create a git repo. This repo will live on the model hub, allowing
|
||||||
When #5258 is merged, we can remove the need to create the directory.
|
users to clone it and you (and your organization members) to push to it. First, you should ensure you are logged in to the
|
||||||
|
``transformers-cli``:
|
||||||
|
|
||||||
First, pick a directory with the name you want your model to have on the model hub (its full name will then be
|
Open a terminal and run the following command. Run it in the virtual environment where you installed 🤗
|
||||||
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`) and create it with either
|
Transformers, since the :obj:`transformers-cli` command comes from the library.
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
mkdir path/to/awesome-name-you-picked
|
transformers-cli login
|
||||||
|
|
||||||
or in python
|
|
||||||
|
Once you are logged in with your model hub credentials, you can start building your repositories. To create a repo:
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
import os
|
transformers-cli repo create your-model-name
|
||||||
os.makedirs("path/to/awesome-name-you-picked")
|
|
||||||
|
|
||||||
then you can save your model and tokenizer with:
|
This creates a repo on the model hub, which can be cloned. You can then add files to or remove files from that repo as you would with any
|
||||||
|
other git repo.
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
model.save_pretrained("path/to/awesome-name-you-picked")
|
git clone https://huggingface.co/username/your-model-name
|
||||||
tokenizer.save_pretrained("path/to/awesome-name-you-picked")
|
|
||||||
|
|
||||||
Or, if you're using the Trainer API
|
# Then commit as usual
|
||||||
|
cd your-model-name
|
||||||
|
echo "hello" >> README.md
|
||||||
|
git add . && git commit -m "Update from $USER"
|
||||||
|
|
||||||
.. code-block::
|
We are intentionally not wrapping git too much, so as to stay intuitive and easy-to-use.
|
||||||
|
|
||||||
trainer.save_model("path/to/awesome-name-you-picked")
|
|
||||||
tokenizer.save_pretrained("path/to/awesome-name-you-picked")
|
|
||||||
|
|
||||||
Make your model work on all frameworks
|
Make your model work on all frameworks
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
@@ -71,13 +97,13 @@ or removing TF. For instance, if you trained a :class:`~transformers.DistilBertF
|
|||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
from transformers import TFDistilBertForSequenceClassification
|
>>> from transformers import TFDistilBertForSequenceClassification
|
||||||
|
|
||||||
and if you trained a :class:`~transformers.TFDistilBertForSequenceClassification`, try to type
|
and if you trained a :class:`~transformers.TFDistilBertForSequenceClassification`, try to type
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
from transformers import DistilBertForSequenceClassification
|
>>> from transformers import DistilBertForSequenceClassification
|
||||||
|
|
||||||
This will give back an error if your model does not exist in the other framework (something that should be pretty rare
|
This will give back an error if your model does not exist in the other framework (something that should be pretty rare
|
||||||
since we're aiming for full parity between the two frameworks). In this case, skip this and go to the next step.
|
since we're aiming for full parity between the two frameworks). In this case, skip this and go to the next step.
|
||||||
@@ -87,20 +113,20 @@ model class:
|
|||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
|
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
|
||||||
tf_model.save_pretrained("path/to/awesome-name-you-picked")
|
>>> tf_model.save_pretrained("path/to/awesome-name-you-picked")
|
||||||
|
|
||||||
and if you trained your model in TensorFlow and have to create a PyTorch version, adapt the following code to your
|
and if you trained your model in TensorFlow and have to create a PyTorch version, adapt the following code to your
|
||||||
model class:
|
model class:
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
|
>>> pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
|
||||||
pt_model.save_pretrained("path/to/awesome-name-you-picked")
|
>>> pt_model.save_pretrained("path/to/awesome-name-you-picked")
|
||||||
|
|
||||||
That's all there is to it!
|
That's all there is to it!
|
||||||
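The two conversions above differ only in which flag is passed to ``from_pretrained``. A minimal sketch of that choice follows; the helper name ``conversion_kwargs`` and the framework labels ``"pytorch"`` and ``"tensorflow"`` are hypothetical and exist only for this example.

```python
from typing import Dict


def conversion_kwargs(trained_with: str) -> Dict[str, bool]:
    """Return the ``from_pretrained`` flag for loading a checkpoint saved by the other framework."""
    flags = {
        "pytorch": {"from_pt": True},     # load PyTorch weights into a TF model class
        "tensorflow": {"from_tf": True},  # load TensorFlow weights into a PyTorch model class
    }
    try:
        return dict(flags[trained_with])
    except KeyError:
        raise ValueError(f"unknown framework: {trained_with!r}") from None


print(conversion_kwargs("pytorch"))
```

The returned dictionary would be unpacked into the call on the target class, for example ``TFDistilBertForSequenceClassification.from_pretrained(path, **conversion_kwargs("pytorch"))``.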
|
|
||||||
Check the directory before uploading
|
Check the directory before pushing to the model hub
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Make sure there are no garbage files in the directory you'll upload. It should only have:
|
Make sure there are no garbage files in the directory you'll upload. It should only have:
|
||||||
@@ -116,62 +142,46 @@ Make sure there are no garbage files in the directory you'll upload. It should o
|
|||||||
|
|
||||||
Other files can safely be deleted.
|
Other files can safely be deleted.
|
||||||
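The check described above can be scripted. The sketch below lists every entry in a directory that is not in a caller-supplied set of expected names. The helper name ``unexpected_files`` is hypothetical, and the file names in the demo are placeholders, not the full list of files a model directory should contain.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def unexpected_files(directory: str, expected: Iterable[str]) -> List[str]:
    """Return the sorted names of entries in ``directory`` that are not in ``expected``."""
    allowed = set(expected)
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.name not in allowed
    )


# Demo on a throwaway directory holding one expected file and one stray file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("config.json", "notes.tmp"):
        (Path(tmp) / name).touch()
    leftovers = unexpected_files(tmp, expected={"config.json"})
    print(leftovers)
```

When the directory is a cloned repo, ``.git`` would need to be part of ``expected`` as well, since the function reports directories along with files.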
|
|
||||||
Upload your model with the CLI
|
|
||||||
|
Uploading your files
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
Now go in a terminal and run the following command. It should be in the virtual environment where you installed 🤗
|
Once the repo is cloned, you can add the model, configuration and tokenizer files. For instance, saving the model and
|
||||||
Transformers, since that command :obj:`transformers-cli` comes from the library.
|
tokenizer files:
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
transformers-cli login
|
>>> model.save_pretrained("path/to/repo/clone/your-model-name")
|
||||||
|
>>> tokenizer.save_pretrained("path/to/repo/clone/your-model-name")
|
||||||
|
|
||||||
Then log in using the same credentials as on huggingface.co. To upload your model, just type
|
Or, if you're using the Trainer API:
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
transformers-cli upload path/to/awesome-name-you-picked/
|
>>> trainer.save_model("path/to/repo/clone/your-model-name")
|
||||||
|
|
||||||
This will upload the folder containing the weights, tokenizer and configuration we prepared in the previous section.
|
You can then add these files to the staging area and verify that they have been correctly staged with the ``git
|
||||||
|
status`` command:
|
||||||
By default you will be prompted to confirm that you want these files to be uploaded. If you are uploading multiple
|
|
||||||
models and need to script that process, you can add `-y` to bypass the prompt. For example:
|
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
transformers-cli upload -y path/to/awesome-name-you-picked/
|
git add --all
|
||||||
|
git status
|
||||||
|
|
||||||
|
Finally, the files should be committed:
|
||||||
If you want to upload a single file (a new version of your model, or the other framework checkpoint you want to add),
|
|
||||||
just type:
|
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
transformers-cli upload path/to/awesome-name-you-picked/that-file
|
git commit -m "First version of the your-model-name model and tokenizer."
|
||||||
|
|
||||||
or
|
And pushed to the remote:
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
transformers-cli upload path/to/awesome-name-you-picked/that-file --filename awesome-name-you-picked/new_name
|
git push
|
||||||
|
|
||||||
if you want to change its filename.
|
This will upload the folder containing the weights, tokenizer and configuration we have just prepared.
|
||||||
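For anyone who needs to script the clone, stage, commit and push sequence described above, here is a minimal sketch that only assembles the git command lines and does not execute them. The helper name ``publish_commands`` and its parameters are hypothetical; the use of ``git -C`` to target the clone directory is a choice made for this example rather than something the text prescribes.

```python
import shlex
from typing import List


def publish_commands(username: str, model_name: str, message: str) -> List[List[str]]:
    """Assemble the git invocations for publishing a model repo, in order."""
    repo_url = f"https://huggingface.co/{username}/{model_name}"
    return [
        ["git", "clone", repo_url],
        # -C runs the command inside the cloned directory without a separate cd.
        ["git", "-C", model_name, "add", "--all"],
        ["git", "-C", model_name, "commit", "-m", message],
        ["git", "-C", model_name, "push"],
    ]


for argv in publish_commands("username", "your-model-name", "Update model card"):
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(argv))
```

Each list could be handed to ``subprocess.run(argv, check=True)`` once the model and tokenizer files have been saved into the clone, between the ``clone`` step and the ``add`` step.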
|
|
||||||
This uploads the model to your personal account. If you want your model to be namespaced by your organization name
|
|
||||||
rather than your username, add the following flag to any command:
|
|
||||||
|
|
||||||
.. code-block::
|
|
||||||
|
|
||||||
--organization organization_name
|
|
||||||
|
|
||||||
so for instance:
|
|
||||||
|
|
||||||
.. code-block::
|
|
||||||
|
|
||||||
transformers-cli upload path/to/awesome-name-you-picked/ --organization organization_name
|
|
||||||
|
|
||||||
Your model will then be accessible through its identifier, which is, as we saw above,
|
|
||||||
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`.
|
|
||||||
|
|
||||||
Add a model card
|
Add a model card
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
@@ -203,20 +213,15 @@ Anyone can load it from code:
|
|||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
tokenizer = AutoTokenizer.from_pretrained("namespace/awesome-name-you-picked")
|
>>> tokenizer = AutoTokenizer.from_pretrained("namespace/awesome-name-you-picked")
|
||||||
model = AutoModel.from_pretrained("namespace/awesome-name-you-picked")
|
>>> model = AutoModel.from_pretrained("namespace/awesome-name-you-picked")
|
||||||
|
|
||||||
Additional commands
|
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
|
|
||||||
You can list all the files you uploaded on the hub like this:
|
You may specify a revision by using the ``revision`` argument of the ``from_pretrained`` method:
|
||||||
|
|
||||||
.. code-block::
|
.. code-block::
|
||||||
|
|
||||||
transformers-cli s3 ls
|
>>> tokenizer = AutoTokenizer.from_pretrained(
|
||||||
|
>>> "julien-c/EsperBERTo-small",
|
||||||
You can also delete unneeded files with
|
>>> revision="v2.0.1" # tag name, or branch name, or commit hash
|
||||||
|
>>> )
|
||||||
.. code-block::
|
|
||||||
|
|
||||||
transformers-cli s3 rm awesome-name-you-picked/filename
|
|
||||||
|