Utilities for Tokenizers
------------------------

This page lists the utilities used by the tokenizers, mainly the class
:class:`~transformers.tokenization_utils_base.PreTrainedTokenizerBase`, which implements the methods shared by
:class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast`, and the mixin
:class:`~transformers.tokenization_utils_base.SpecialTokensMixin`.

Most of these are only useful if you are studying the code of the tokenizers in the library.

``PreTrainedTokenizerBase``
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.tokenization_utils_base.PreTrainedTokenizerBase
    :special-members: __call__
    :members:


``SpecialTokensMixin``
~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.tokenization_utils_base.SpecialTokensMixin
    :members:


Enums and namedtuples
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.tokenization_utils_base.ExplicitEnum

.. autoclass:: transformers.tokenization_utils_base.PaddingStrategy

.. autoclass:: transformers.tokenization_utils_base.TensorType

.. autoclass:: transformers.tokenization_utils_base.TruncationStrategy

.. autoclass:: transformers.tokenization_utils_base.CharSpan

.. autoclass:: transformers.tokenization_utils_base.TokenSpan
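
To give a feel for how the enums and namedtuples in this section behave, here is a minimal standard-library sketch. It is not the library's code: the names ``ExplicitEnumSketch``, ``PaddingStrategySketch`` and ``CharSpanSketch`` are hypothetical stand-ins, and the member values and the exclusive ``end`` index are assumptions chosen for illustration. Check the generated reference for the real definitions.

```python
from enum import Enum
from typing import NamedTuple


class ExplicitEnumSketch(Enum):
    """Hypothetical stand-in: an Enum whose failed lookups list the valid values."""

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when `value` matches no member.
        valid = [member.value for member in cls]
        raise ValueError(f"{value!r} is not a valid {cls.__name__}; choose one of {valid}")


class PaddingStrategySketch(ExplicitEnumSketch):
    """Hypothetical stand-in for a padding-strategy enum (values are assumptions)."""

    LONGEST = "longest"
    MAX_LENGTH = "max_length"
    DO_NOT_PAD = "do_not_pad"


class CharSpanSketch(NamedTuple):
    """Hypothetical stand-in for a character span; `end` is assumed to be exclusive."""

    start: int
    end: int


# Look a member up by its string value, as a user-facing argument would be resolved.
strategy = PaddingStrategySketch("longest")

# An unknown value fails with a message that names the accepted values.
try:
    PaddingStrategySketch("bogus")
except ValueError as err:
    error_message = str(err)

# A span can be used directly to slice the original text.
text = "Hello world"
span = CharSpanSketch(start=6, end=11)
word = text[span.start:span.end]
```

Routing string arguments through an enum of this kind means a typo fails immediately with the list of accepted values, instead of being silently ignored further down the call chain.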