Optimization
-----------------------------------------------------------------------------------------------------------------------

The ``.optimization`` module provides:

- an optimizer with weight decay fixed that can be used to fine-tune models,
- several schedules in the form of schedule objects that inherit from ``_LRSchedule``, and
- a gradient accumulation class to accumulate the gradients of multiple batches.
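
A typical fine-tuning setup pairs the optimizer with one of the schedules documented below. The following is a minimal
sketch, assuming ``model`` is an already instantiated PyTorch model; the hyperparameter values are illustrative, not
recommendations:

.. code-block:: python

    from transformers import AdamW, get_linear_schedule_with_warmup

    # model is assumed to be an existing PyTorch nn.Module
    optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

    # warm up for 100 steps, then decay linearly to 0 over the remaining steps
    lr_scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=1000
    )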

AdamW (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AdamW
    :members:
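
A common pattern is to exclude bias and layer-normalization weights from weight decay by passing parameter groups. A
minimal sketch, assuming ``model`` is an existing PyTorch model and the values shown are placeholders:

.. code-block:: python

    from transformers import AdamW

    no_decay = ["bias", "LayerNorm.weight"]
    grouped_parameters = [
        {
            "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
            "weight_decay": 0.01,
        },
        {
            "params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]
    optimizer = AdamW(grouped_parameters, lr=5e-5, eps=1e-8)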

AdaFactor (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Adafactor
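
When ``relative_step=True``, Adafactor derives its own step sizes and no external learning rate schedule is needed. A
minimal sketch, assuming ``model`` is an existing PyTorch model:

.. code-block:: python

    from transformers import Adafactor

    # let Adafactor compute the learning rate internally (relative step sizes with warmup)
    optimizer = Adafactor(
        model.parameters(),
        lr=None,
        scale_parameter=True,
        relative_step=True,
        warmup_init=True,
    )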

AdamWeightDecay (TensorFlow)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AdamWeightDecay

.. autofunction:: transformers.create_optimizer
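
A minimal sketch of building the TensorFlow optimizer directly; the hyperparameter values are placeholders, and the
``exclude_from_weight_decay`` patterns are the usual bias/LayerNorm names:

.. code-block:: python

    from transformers import AdamWeightDecay

    optimizer = AdamWeightDecay(
        learning_rate=3e-5,
        weight_decay_rate=0.01,
        # variables whose names match these patterns are excluded from weight decay
        exclude_from_weight_decay=["bias", "LayerNorm"],
    )

``create_optimizer`` is a convenience helper that builds such an optimizer together with a warmup/decay schedule; see
its signature above for the exact arguments and return value.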

Schedules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Learning Rate Schedules (PyTorch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autofunction:: transformers.get_constant_schedule

.. autofunction:: transformers.get_constant_schedule_with_warmup

.. image:: /imgs/warmup_constant_schedule.png
    :target: /imgs/warmup_constant_schedule.png
    :alt:

.. autofunction:: transformers.get_cosine_schedule_with_warmup

.. image:: /imgs/warmup_cosine_schedule.png
    :target: /imgs/warmup_cosine_schedule.png
    :alt:

.. autofunction:: transformers.get_cosine_with_hard_restarts_schedule_with_warmup

.. image:: /imgs/warmup_cosine_hard_restarts_schedule.png
    :target: /imgs/warmup_cosine_hard_restarts_schedule.png
    :alt:

.. autofunction:: transformers.get_linear_schedule_with_warmup

.. image:: /imgs/warmup_linear_schedule.png
    :target: /imgs/warmup_linear_schedule.png
    :alt:
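
These functions return standard PyTorch ``LambdaLR`` schedulers, stepped once per optimizer update. A minimal sketch of
the usual ordering inside a training loop; ``model``, ``dataloader`` and ``compute_loss`` are placeholders, as are the
hyperparameter values:

.. code-block:: python

    from transformers import AdamW, get_cosine_schedule_with_warmup

    optimizer = AdamW(model.parameters(), lr=5e-5)
    scheduler = get_cosine_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=1000
    )

    for batch in dataloader:
        loss = compute_loss(model, batch)  # placeholder for the forward pass and loss
        loss.backward()
        optimizer.step()
        scheduler.step()  # update the learning rate once per optimizer step
        optimizer.zero_grad()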

Warmup (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: transformers.WarmUp
    :members:
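
``WarmUp`` wraps another ``tf.keras`` learning rate schedule and ramps the learning rate up (linearly by default)
before handing over to it. A minimal sketch, with illustrative step counts and learning rates:

.. code-block:: python

    import tensorflow as tf
    from transformers import WarmUp

    # decay from 5e-5 to 0 over 900 steps once warmup is finished
    decay_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
        initial_learning_rate=5e-5, decay_steps=900, end_learning_rate=0.0
    )
    # linear warmup from 0 to 5e-5 over the first 100 steps, then follow decay_schedule
    lr_schedule = WarmUp(
        initial_learning_rate=5e-5,
        decay_schedule_fn=decay_schedule,
        warmup_steps=100,
    )
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)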

Gradient Strategies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GradientAccumulator (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: transformers.GradientAccumulator
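
A minimal sketch of accumulating gradients over several batches before applying them; ``model``, ``dataset`` and
``compute_loss`` are placeholders, and the accumulation factor of 4 is arbitrary:

.. code-block:: python

    import tensorflow as tf
    from transformers import GradientAccumulator

    accumulation_steps = 4
    accumulator = GradientAccumulator()
    optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)

    for batch in dataset:
        with tf.GradientTape() as tape:
            loss = compute_loss(model, batch)  # placeholder forward pass and loss
        # store this batch's gradients in the accumulator
        accumulator(tape.gradient(loss, model.trainable_variables))

        if accumulator.step == accumulation_steps:
            # apply the accumulated (summed) gradients, then clear the accumulator
            optimizer.apply_gradients(zip(accumulator.gradients, model.trainable_variables))
            accumulator.reset()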