diff --git a/docs/source/en/quicktour.mdx b/docs/source/en/quicktour.mdx
index 4c7defc039e..221cf91ff85 100644
--- a/docs/source/en/quicktour.mdx
+++ b/docs/source/en/quicktour.mdx
@@ -532,12 +532,12 @@ All models are a standard [`tf.keras.Model`](https://www.tensorflow.org/api_docs
    ... )  # doctest: +SKIP
    ```

-5. When you're ready, you can call `compile` and `fit` to start training:
+5. When you're ready, you can call `compile` and `fit` to start training. Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

    ```py
    >>> from tensorflow.keras.optimizers import Adam

-   >>> model.compile(optimizer=Adam(3e-5))
+   >>> model.compile(optimizer=Adam(3e-5))  # No loss argument!
    >>> model.fit(tf_dataset)  # doctest: +SKIP
    ```

diff --git a/docs/source/en/tasks/language_modeling.mdx b/docs/source/en/tasks/language_modeling.mdx
index ea25e17efa4..676156ede86 100644
--- a/docs/source/en/tasks/language_modeling.mdx
+++ b/docs/source/en/tasks/language_modeling.mdx
@@ -306,12 +306,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPre
 ... )
 ```

-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

 ```py
 >>> import tensorflow as tf

->>> model.compile(optimizer=optimizer)
+>>> model.compile(optimizer=optimizer)  # No loss argument!
 ```

 This can be done by specifying where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
diff --git a/docs/source/en/tasks/masked_language_modeling.mdx b/docs/source/en/tasks/masked_language_modeling.mdx
index c41d6493c58..c81b27f33c8 100644
--- a/docs/source/en/tasks/masked_language_modeling.mdx
+++ b/docs/source/en/tasks/masked_language_modeling.mdx
@@ -301,12 +301,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPre
 ... )
 ```

-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

 ```py
 >>> import tensorflow as tf

->>> model.compile(optimizer=optimizer)
+>>> model.compile(optimizer=optimizer)  # No loss argument!
 ```

 This can be done by specifying where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
diff --git a/docs/source/en/tasks/multiple_choice.mdx b/docs/source/en/tasks/multiple_choice.mdx
index 4e4ff06e7f7..5a06c24d110 100644
--- a/docs/source/en/tasks/multiple_choice.mdx
+++ b/docs/source/en/tasks/multiple_choice.mdx
@@ -335,10 +335,10 @@ Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPre
 ... )
 ```

-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

 ```py
->>> model.compile(optimizer=optimizer)
+>>> model.compile(optimizer=optimizer)  # No loss argument!
 ```

 The last two things to setup before you start training is to compute the accuracy from the predictions, and provide a way to push your model to the Hub. Both are done by using [Keras callbacks](../main_classes/keras_callbacks).
diff --git a/docs/source/en/tasks/semantic_segmentation.mdx b/docs/source/en/tasks/semantic_segmentation.mdx
index eac2507c9a1..89abe14c757 100644
--- a/docs/source/en/tasks/semantic_segmentation.mdx
+++ b/docs/source/en/tasks/semantic_segmentation.mdx
@@ -377,7 +377,7 @@ Start by defining the hyperparameters, optimizer and learning rate schedule:
 ```

 Then, load SegFormer with [`TFAutoModelForSemanticSegmentation`] along with the label mappings, and compile it with the
-optimizer:
+optimizer. Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

 ```py
 >>> from transformers import TFAutoModelForSemanticSegmentation
@@ -387,7 +387,7 @@ optimizer:
 ...     id2label=id2label,
 ...     label2id=label2id,
 ... )
->>> model.compile(optimizer=optimizer)
+>>> model.compile(optimizer=optimizer)  # No loss argument!
 ```

 Convert your datasets to the `tf.data.Dataset` format using the [`~datasets.Dataset.to_tf_dataset`] and the [`DefaultDataCollator`]:
diff --git a/docs/source/en/tasks/sequence_classification.mdx b/docs/source/en/tasks/sequence_classification.mdx
index fa15b5be30b..db7bdc15b33 100644
--- a/docs/source/en/tasks/sequence_classification.mdx
+++ b/docs/source/en/tasks/sequence_classification.mdx
@@ -259,12 +259,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPre
 ... )
 ```

-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

 ```py
 >>> import tensorflow as tf

->>> model.compile(optimizer=optimizer)
+>>> model.compile(optimizer=optimizer)  # No loss argument!
 ```

 The last two things to setup before you start training is to compute the accuracy from the predictions, and provide a way to push your model to the Hub. Both are done by using [Keras callbacks](../main_classes/keras_callbacks).
diff --git a/docs/source/en/tasks/summarization.mdx b/docs/source/en/tasks/summarization.mdx
index a9b890dcea9..ce480bccb75 100644
--- a/docs/source/en/tasks/summarization.mdx
+++ b/docs/source/en/tasks/summarization.mdx
@@ -267,12 +267,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPre
 ... )
 ```

-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

 ```py
 >>> import tensorflow as tf

->>> model.compile(optimizer=optimizer)
+>>> model.compile(optimizer=optimizer)  # No loss argument!
 ```

 The last two things to setup before you start training is to compute the ROUGE score from the predictions, and provide a way to push your model to the Hub. Both are done by using [Keras callbacks](../main_classes/keras_callbacks).
diff --git a/docs/source/en/tasks/token_classification.mdx b/docs/source/en/tasks/token_classification.mdx
index d85272d0f9f..90af3a21bb7 100644
--- a/docs/source/en/tasks/token_classification.mdx
+++ b/docs/source/en/tasks/token_classification.mdx
@@ -361,12 +361,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPre
 ... )
 ```

-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

 ```py
 >>> import tensorflow as tf

->>> model.compile(optimizer=optimizer)
+>>> model.compile(optimizer=optimizer)  # No loss argument!
 ```

 The last two things to setup before you start training is to compute the seqeval scores from the predictions, and provide a way to push your model to the Hub. Both are done by using [Keras callbacks](../main_classes/keras_callbacks).
diff --git a/docs/source/en/tasks/translation.mdx b/docs/source/en/tasks/translation.mdx
index 361efbf54cc..530390f1c33 100644
--- a/docs/source/en/tasks/translation.mdx
+++ b/docs/source/en/tasks/translation.mdx
@@ -276,12 +276,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPre
 ... )
 ```

-Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
+Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

 ```py
 >>> import tensorflow as tf

->>> model.compile(optimizer=optimizer)
+>>> model.compile(optimizer=optimizer)  # No loss argument!
 ```

 The last two things to setup before you start training is to compute the SacreBLEU metric from the predictions, and provide a way to push your model to the Hub. Both are done by using [Keras callbacks](../main_classes/keras_callbacks).
diff --git a/docs/source/en/training.mdx b/docs/source/en/training.mdx
index 52b4f157a34..24afa907aec 100644
--- a/docs/source/en/training.mdx
+++ b/docs/source/en/training.mdx
@@ -191,7 +191,7 @@ tokenized_data = dict(tokenized_data)
 labels = np.array(dataset["label"])  # Label is already an array of 0 and 1
 ```

-Finally, load, [`compile`](https://keras.io/api/models/model_training_apis/#compile-method), and [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) the model:
+Finally, load, [`compile`](https://keras.io/api/models/model_training_apis/#compile-method), and [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) the model. Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

 ```py
 from transformers import TFAutoModelForSequenceClassification
@@ -200,7 +200,7 @@ from tensorflow.keras.optimizers import Adam
 # Load and compile our model
 model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
 # Lower learning rates are often better for fine-tuning transformers
-model.compile(optimizer=Adam(3e-5))
+model.compile(optimizer=Adam(3e-5))  # No loss argument!
 model.fit(tokenized_data, labels)
 ```

@@ -261,7 +261,7 @@ list of samples into a batch and apply any preprocessing you want. See our
 Once you've created a `tf.data.Dataset`, you can compile and fit the model as before:

 ```py
-model.compile(optimizer=Adam(3e-5))
+model.compile(optimizer=Adam(3e-5))  # No loss argument!
 model.fit(tf_dataset)
 ```

diff --git a/examples/tensorflow/contrastive-image-text/run_clip.py b/examples/tensorflow/contrastive-image-text/run_clip.py
index 2c696244f62..42e6dd76b38 100644
--- a/examples/tensorflow/contrastive-image-text/run_clip.py
+++ b/examples/tensorflow/contrastive-image-text/run_clip.py
@@ -561,6 +561,8 @@ def main():
         weight_decay_rate=training_args.weight_decay,
         adam_global_clipnorm=training_args.max_grad_norm,
     )
+    # Transformers models compute the right loss for their task by default when labels are passed, and will
+    # use this for training unless you specify your own loss function in compile().
     model.compile(optimizer=optimizer, jit_compile=training_args.xla)

     if not training_args.do_eval:
diff --git a/examples/tensorflow/image-classification/run_image_classification.py b/examples/tensorflow/image-classification/run_image_classification.py
index 6a4b7df4d0a..ea70d8f53d8 100644
--- a/examples/tensorflow/image-classification/run_image_classification.py
+++ b/examples/tensorflow/image-classification/run_image_classification.py
@@ -497,6 +497,8 @@ def main():
         collate_fn=collate_fn,
     ).with_options(dataset_options)

+    # Transformers models compute the right loss for their task by default when labels are passed, and will
+    # use this for training unless you specify your own loss function in compile().
     model.compile(optimizer=optimizer, jit_compile=training_args.xla, metrics=["accuracy"])

     push_to_hub_model_id = training_args.push_to_hub_model_id
diff --git a/examples/tensorflow/language-modeling-tpu/run_mlm.py b/examples/tensorflow/language-modeling-tpu/run_mlm.py
index 30923b982e1..e9e9862a6da 100644
--- a/examples/tensorflow/language-modeling-tpu/run_mlm.py
+++ b/examples/tensorflow/language-modeling-tpu/run_mlm.py
@@ -235,8 +235,10 @@ def main(args):
         num_warmup_steps=total_train_steps // 20,
         init_lr=args.learning_rate,
         weight_decay_rate=args.weight_decay_rate,
-        # TODO Add the other Adam parameters?
     )
+
+    # Transformers models compute the right loss for their task by default when labels are passed, and will
+    # use this for training unless you specify your own loss function in compile().
     model.compile(optimizer=optimizer, metrics=["accuracy"])

     def decode_fn(example):
diff --git a/examples/tensorflow/language-modeling/run_clm.py b/examples/tensorflow/language-modeling/run_clm.py
index 645dae55be8..650459d4181 100755
--- a/examples/tensorflow/language-modeling/run_clm.py
+++ b/examples/tensorflow/language-modeling/run_clm.py
@@ -537,7 +537,8 @@ def main():
         adam_global_clipnorm=training_args.max_grad_norm,
     )

-    # no user-specified loss = will use the model internal loss
+    # Transformers models compute the right loss for their task by default when labels are passed, and will
+    # use this for training unless you specify your own loss function in compile().
     model.compile(optimizer=optimizer, jit_compile=training_args.xla)
     # endregion

diff --git a/examples/tensorflow/language-modeling/run_mlm.py b/examples/tensorflow/language-modeling/run_mlm.py
index cff0f51df09..89d68ade4d4 100755
--- a/examples/tensorflow/language-modeling/run_mlm.py
+++ b/examples/tensorflow/language-modeling/run_mlm.py
@@ -559,8 +559,9 @@ def main():
         adam_global_clipnorm=training_args.max_grad_norm,
     )

-    # no user-specified loss = will use the model internal loss
-    model.compile(optimizer=optimizer, jit_compile=training_args.xla, run_eagerly=True)
+    # Transformers models compute the right loss for their task by default when labels are passed, and will
+    # use this for training unless you specify your own loss function in compile().
+    model.compile(optimizer=optimizer, jit_compile=training_args.xla)
     # endregion

     # region Preparing push_to_hub and model card
diff --git a/examples/tensorflow/multiple-choice/run_swag.py b/examples/tensorflow/multiple-choice/run_swag.py
index d3ddca3f134..7abceab543a 100644
--- a/examples/tensorflow/multiple-choice/run_swag.py
+++ b/examples/tensorflow/multiple-choice/run_swag.py
@@ -455,6 +455,8 @@ def main():
         )
     else:
         optimizer = None
+    # Transformers models compute the right loss for their task by default when labels are passed, and will
+    # use this for training unless you specify your own loss function in compile().
     model.compile(optimizer=optimizer, metrics=["accuracy"], jit_compile=training_args.xla)
     # endregion

diff --git a/examples/tensorflow/question-answering/run_qa.py b/examples/tensorflow/question-answering/run_qa.py
index a42d5111814..15c6884d94a 100755
--- a/examples/tensorflow/question-answering/run_qa.py
+++ b/examples/tensorflow/question-answering/run_qa.py
@@ -656,7 +656,8 @@ def main():
             adam_global_clipnorm=training_args.max_grad_norm,
         )

-        # no user-specified loss = will use the model internal loss
+        # Transformers models compute the right loss for their task by default when labels are passed, and will
+        # use this for training unless you specify your own loss function in compile().
         model.compile(optimizer=optimizer, jit_compile=training_args.xla, metrics=["accuracy"])

     else:
diff --git a/examples/tensorflow/summarization/run_summarization.py b/examples/tensorflow/summarization/run_summarization.py
index f3195d39d96..21b77ed639e 100644
--- a/examples/tensorflow/summarization/run_summarization.py
+++ b/examples/tensorflow/summarization/run_summarization.py
@@ -674,6 +674,8 @@ def main():
     # endregion

     # region Training
+    # Transformers models compute the right loss for their task by default when labels are passed, and will
+    # use this for training unless you specify your own loss function in compile().
     model.compile(optimizer=optimizer, jit_compile=training_args.xla)
     eval_metrics = None
     if training_args.do_train:
diff --git a/examples/tensorflow/text-classification/run_glue.py b/examples/tensorflow/text-classification/run_glue.py
index df062c342e5..13f840ce174 100644
--- a/examples/tensorflow/text-classification/run_glue.py
+++ b/examples/tensorflow/text-classification/run_glue.py
@@ -453,6 +453,8 @@ def main():
             metrics = []
         else:
             metrics = ["accuracy"]
+        # Transformers models compute the right loss for their task by default when labels are passed, and will
+        # use this for training unless you specify your own loss function in compile().
         model.compile(optimizer=optimizer, metrics=metrics, jit_compile=training_args.xla)
     # endregion

diff --git a/examples/tensorflow/text-classification/run_text_classification.py b/examples/tensorflow/text-classification/run_text_classification.py
index 64799eda3c0..81d5683d1ee 100644
--- a/examples/tensorflow/text-classification/run_text_classification.py
+++ b/examples/tensorflow/text-classification/run_text_classification.py
@@ -487,6 +487,8 @@ def main():
             metrics = []
         else:
             metrics = ["accuracy"]
+        # Transformers models compute the right loss for their task by default when labels are passed, and will
+        # use this for training unless you specify your own loss function in compile().
         model.compile(optimizer=optimizer, metrics=metrics)
     # endregion

diff --git a/examples/tensorflow/token-classification/run_ner.py b/examples/tensorflow/token-classification/run_ner.py
index a358cd12dd6..2d5ed748fe9 100644
--- a/examples/tensorflow/token-classification/run_ner.py
+++ b/examples/tensorflow/token-classification/run_ner.py
@@ -454,7 +454,8 @@ def main():
             weight_decay_rate=training_args.weight_decay,
             adam_global_clipnorm=training_args.max_grad_norm,
         )
-
+        # Transformers models compute the right loss for their task by default when labels are passed, and will
+        # use this for training unless you specify your own loss function in compile().
         model.compile(optimizer=optimizer, jit_compile=training_args.xla)
         # endregion

diff --git a/examples/tensorflow/translation/run_translation.py b/examples/tensorflow/translation/run_translation.py
index 1f31c69245f..551d89d0335 100644
--- a/examples/tensorflow/translation/run_translation.py
+++ b/examples/tensorflow/translation/run_translation.py
@@ -643,6 +643,8 @@ def main():

     # region Training
     eval_metrics = None
+    # Transformers models compute the right loss for their task by default when labels are passed, and will
+    # use this for training unless you specify your own loss function in compile().
     model.compile(optimizer=optimizer, jit_compile=training_args.xla)

     if training_args.do_train:
diff --git a/src/transformers/modeling_tf_utils.py b/src/transformers/modeling_tf_utils.py
index bac575e249d..41ee0402b70 100644
--- a/src/transformers/modeling_tf_utils.py
+++ b/src/transformers/modeling_tf_utils.py
@@ -1498,7 +1498,7 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin, TFGenerationMixin, Pu
     def compile(
         self,
         optimizer="rmsprop",
-        loss="passthrough",
+        loss="auto_with_warning",
         metrics=None,
         loss_weights=None,
         weighted_metrics=None,
@@ -1510,13 +1510,16 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin, TFGenerationMixin, Pu
         This is a thin wrapper that sets the model's loss output head as the loss if the user does not specify a loss
         function themselves.
         """
-        if loss == "passthrough":
-            logger.warning(
+        if loss in ("auto_with_warning", "passthrough"):  # "passthrough" for workflow backward compatibility
+            logger.info(
                 "No loss specified in compile() - the model's internal loss computation will be used as the "
                 "loss. Don't panic - this is a common way to train TensorFlow models in Transformers! "
                 "To disable this behaviour please pass a loss argument, or explicitly pass "
-                "`loss=None` if you do not want your model to compute a loss."
+                "`loss=None` if you do not want your model to compute a loss. You can also specify `loss='auto'` to "
+                "get the internal loss without printing this info string."
             )
+            loss = "auto"
+        if loss == "auto":
             loss = dummy_loss
             self._using_dummy_loss = True
         else: