Finish Making Quick Tour respect the model object (#11467)

* finish quicktour

* fix import

* fix print

* explain config default better

* Update docs/source/quicktour.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Hamel Husain, 2021-04-27 07:04:12 -07:00, committed by GitHub
parent 88ac60f7b5
commit 7ceff67e1a


@@ -285,16 +285,24 @@ We can see we get the numbers from before:
     tensor([[2.2043e-04, 9.9978e-01],
             [5.3086e-01, 4.6914e-01]], grad_fn=<SoftmaxBackward>)
 
-If you have labels, you can provide them to the model, it will return a tuple with the loss and the final activations.
+If you provide the model with labels in addition to inputs, the model output object will also contain a ``loss``
+attribute:
 
 .. code-block::
 
     >>> ## PYTORCH CODE
     >>> import torch
     >>> pt_outputs = pt_model(**pt_batch, labels = torch.tensor([1, 0]))
+    >>> print(pt_outputs)
+    SequenceClassifierOutput(loss=tensor(0.3167, grad_fn=<NllLossBackward>), logits=tensor([[-4.0833,  4.3364],
+            [ 0.0818, -0.0418]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)
     >>> ## TENSORFLOW CODE
     >>> import tensorflow as tf
     >>> tf_outputs = tf_model(tf_batch, labels = tf.constant([1, 0]))
+    >>> print(tf_outputs)
+    TFSequenceClassifierOutput(loss=<tf.Tensor: shape=(2,), dtype=float32, numpy=array([2.2051287e-04, 6.3326043e-01], dtype=float32)>, logits=<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
+    array([[-4.0832963 ,  4.3364143 ],
+           [ 0.081807  , -0.04178282]], dtype=float32)>, hidden_states=None, attentions=None)
 
 Models are standard `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ or `tf.keras.Model
 <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__ so you can use them in your usual training loop. 🤗
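Since the new output objects expose ``loss`` directly, a standard training step can use it without tuple indexing. A minimal sketch (not part of this commit), assuming the ``pt_model`` and ``pt_batch`` objects built earlier in the quick tour:

.. code-block::

    import torch

    # Any optimizer works; AdamW is a common choice for transformer fine-tuning.
    optimizer = torch.optim.AdamW(pt_model.parameters(), lr=5e-5)

    pt_model.train()
    optimizer.zero_grad()
    outputs = pt_model(**pt_batch, labels=torch.tensor([1, 0]))
    outputs.loss.backward()  # the output object exposes the loss as an attribute
    optimizer.step()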
@@ -322,6 +330,7 @@ loading a saved PyTorch model in a TensorFlow model, use :func:`~transformers.TF
 
 .. code-block::
 
+    from transformers import TFAutoModel
     tokenizer = AutoTokenizer.from_pretrained(save_directory)
    model = TFAutoModel.from_pretrained(save_directory, from_pt=True)
@@ -329,6 +338,7 @@ and if you are loading a saved TensorFlow model in a PyTorch model, you should u
 
 .. code-block::
 
+    from transformers import AutoModel
     tokenizer = AutoTokenizer.from_pretrained(save_directory)
     model = AutoModel.from_pretrained(save_directory, from_tf=True)
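The two hunks above cover the cross-framework loading path. A sketch of the full round trip they enable (not shown in the commit), assuming the ``pt_model``, ``tokenizer``, and ``save_directory`` names from the surrounding docs:

.. code-block::

    # Save a fine-tuned PyTorch model and its tokenizer to disk...
    pt_model.save_pretrained(save_directory)
    tokenizer.save_pretrained(save_directory)

    # ...then reload the same weights as a TensorFlow model.
    from transformers import TFAutoModel
    tf_model = TFAutoModel.from_pretrained(save_directory, from_pt=True)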
@@ -339,10 +349,12 @@ Lastly, you can also ask the model to return all hidden states and all attention
 
     >>> ## PYTORCH CODE
     >>> pt_outputs = pt_model(**pt_batch, output_hidden_states=True, output_attentions=True)
-    >>> all_hidden_states, all_attentions = pt_outputs[-2:]
+    >>> all_hidden_states = pt_outputs.hidden_states
+    >>> all_attentions = pt_outputs.attentions
     >>> ## TENSORFLOW CODE
     >>> tf_outputs = tf_model(tf_batch, output_hidden_states=True, output_attentions=True)
-    >>> all_hidden_states, all_attentions = tf_outputs[-2:]
+    >>> all_hidden_states = tf_outputs.hidden_states
+    >>> all_attentions = tf_outputs.attentions
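The new attribute access returns the same tuples of tensors as the old indexing. A sketch of what to expect on the PyTorch side (the shape comments assume the DistilBERT checkpoint used earlier in the quick tour, which has 6 layers and 12 attention heads):

.. code-block::

    print(len(pt_outputs.hidden_states))      # one tensor per layer, plus the embedding output
    print(pt_outputs.hidden_states[0].shape)  # (batch_size, sequence_length, hidden_size)
    print(pt_outputs.attentions[0].shape)     # (batch_size, num_heads, sequence_length, sequence_length)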
Accessing the code
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -375,16 +387,16 @@ directly instantiate model and tokenizer without the auto magic:
 Customizing the model
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-If you want to change how the model itself is built, you can define your custom configuration class. Each architecture
-comes with its own relevant configuration (in the case of DistilBERT, :class:`~transformers.DistilBertConfig`) which
-allows you to specify any of the hidden dimension, dropout rate, etc. If you do core modifications, like changing the
-hidden size, you won't be able to use a pretrained model anymore and will need to train from scratch. You would then
-instantiate the model directly from this configuration.
+If you want to change how the model itself is built, you can define a custom configuration class. Each architecture
+comes with its own relevant configuration. For example, :class:`~transformers.DistilBertConfig` allows you to specify
+parameters such as the hidden dimension, dropout rate, etc. for DistilBERT. If you do core modifications, like changing
+the hidden size, you won't be able to use a pretrained model anymore and will need to train from scratch. You would
+then instantiate the model directly from this configuration.
 
-Here we use the predefined vocabulary of DistilBERT (hence load the tokenizer with the
-:func:`~transformers.DistilBertTokenizer.from_pretrained` method) and initialize the model from scratch (hence
-instantiate the model from the configuration instead of using the
-:func:`~transformers.DistilBertForSequenceClassification.from_pretrained` method).
+Below, we load a predefined vocabulary for a tokenizer with the
+:func:`~transformers.DistilBertTokenizer.from_pretrained` method. However, unlike the tokenizer, we wish to initialize
+the model from scratch. Therefore, we instantiate the model from a configuration instead of using the
+:func:`~transformers.DistilBertForSequenceClassification.from_pretrained` method.
 
 .. code-block::
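The diff view truncates the code block this hunk leads into. For reference, an instantiation-from-config along the lines the new text describes would look roughly like this (a sketch with hypothetical sizes, not the commit's exact snippet):

.. code-block::

    from transformers import (
        DistilBertConfig,
        DistilBertForSequenceClassification,
        DistilBertTokenizer,
    )

    # Hypothetical sizes; a model built this way starts from random weights.
    config = DistilBertConfig(n_heads=8, dim=512, hidden_dim=4 * 512)
    tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
    model = DistilBertForSequenceClassification(config)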
@@ -401,9 +413,9 @@ instantiate the model from the configuration instead of using the
 For something that only changes the head of the model (for instance, the number of labels), you can still use a
 pretrained model for the body. For instance, let's define a classifier for 10 different labels using a pretrained body.
-We could create a configuration with all the default values and just change the number of labels, but more easily, you
-can directly pass any argument a configuration would take to the :func:`from_pretrained` method and it will update the
-default configuration with it:
+Instead of creating a new configuration with all the default values just to change the number of labels, we can pass
+any argument a configuration would take to the :func:`from_pretrained` method and it will update the default
+configuration appropriately:
 
 .. code-block::
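The final code block is likewise truncated in the diff view. A sketch of the shortcut the new wording describes, keeping every other configuration default (checkpoint name assumed from the quick tour):

.. code-block::

    from transformers import DistilBertForSequenceClassification

    # `num_labels` is forwarded to the configuration; all other defaults are kept,
    # and the pretrained body is reused under a freshly initialized 10-label head.
    model = DistilBertForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=10
    )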