transformers/docs/source/en/internal/tokenization_utils.md
Sylvain Gugger eb849f6604
Migrate doc files to Markdown. (#24376)
* Rename index.mdx to index.md

* With saved modifs

* Address review comment

* Treat all files

* .mdx -> .md

* Remove special char

* Update utils/tests_fetcher.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-06-20 18:07:47 -04:00

1.5 KiB

Utilities for Tokenizers

This page lists all the utility functions used by the tokenizers, mainly the class [~tokenization_utils_base.PreTrainedTokenizerBase] that implements the common methods between [PreTrainedTokenizer] and [PreTrainedTokenizerFast] and the mixin [~tokenization_utils_base.SpecialTokensMixin].

Most of those are only useful if you are studying the code of the tokenizers in the library.

PreTrainedTokenizerBase

autodoc tokenization_utils_base.PreTrainedTokenizerBase - call - all

SpecialTokensMixin

autodoc tokenization_utils_base.SpecialTokensMixin

Enums and namedtuples

autodoc tokenization_utils_base.TruncationStrategy

autodoc tokenization_utils_base.CharSpan

autodoc tokenization_utils_base.TokenSpan