* kinda works
* update
* add tests
* update
* use special tokens in processors
* typo
* fix copies
* fix
* fix moshi after rebase
* update
* fix tests
* update
* Update docs/source/en/main_classes/tokenizer.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs
* test for load time adding tokens
* fix some more tests which are now fetched better
* one more fix
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* PR SPLIT: moving origina changes for adding user defined symbols
* adding gemma test and generalizing gemma converter
* ruff
* update common test
* update serialization test
* deberta v2 tests updates as rust version adds '.' as a user added token, so a space is not added
* removing commented lines
* applying feedback - user only added_tokens to add and check piece.type instead of trainer_spec for user_defined_symbols
* add comment referencing sentencepiece
* Added test cases for rembert refering to albert and reformer test_tokenization
* removed CURL_CA_BUNDLE='
* Added flag test_sentencepiece_ignore_case and space_between_special_tokens to True
* Overrided test_added_tokens_serialization
* As slow->fast token failed due to the different initialization for [MASK] for slow and fast, Therefore it required to make the initialization for [MASK] token uniform between fast and slow token
* Added few more test cases in test_encode_decode_round_trip and modefied the slow token (mask_token) to have AddedToken instance with lstrip=True
* Added few test cases in test_encoder_decoder round trip and also modified slow tokenizer of rembert to have mask_token as AddedToken with lstrip = True
* Cleaned the code and added fmt: skip to avoid line breaks after make style + added comments to indicate from the copied test cases
* Corrected few comments
* Fixed quality issue
* Ran fix-copies
* Fixed few minor issues as (make fix-copies) broke few test cases while stripping the text
* Reverted the changes made by repo-consistancy
---------
Co-authored-by: Kokane <kokanen@apac.corpdir.net>