mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-31 02:02:21 +06:00
[LlamaTokenizerFast] Update documentation (#24132)
* Update documentation
* nits
This commit is contained in:
parent 62fe753325
commit 5af3a1aa48
@@ -65,6 +65,7 @@ This model was contributed by [zphang](https://huggingface.co/zphang) with contr
- build_inputs_with_special_tokens
- get_special_tokens_mask
- create_token_type_ids_from_sequences
- update_post_processor
- save_vocabulary

## LlamaModel
@@ -48,6 +48,12 @@ class LlamaTokenizerFast(PreTrainedTokenizerFast):
>>> [1, 15043, 445, 338, 263, 1243]
```
If you want to change the `bos_token` or the `eos_token`, make sure to specify them when initializing the model, or call `tokenizer.update_post_processor()` to make sure that the post-processing is correctly done (otherwise the values of the first token and final token of an encoded sequence will not be correct). For more details, check out the [post-processors](https://huggingface.co/docs/tokenizers/api/post-processors) documentation.
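The failure mode this paragraph warns about can be sketched with a toy stand-in. `ToyTokenizer` and `ToyPostProcessor` below are illustrative names, not the real transformers classes; the point is only that the post-processor caches the special-token ids, so changing `bos_token_id` alone leaves the cached template stale until `update_post_processor()` rebuilds it:

```python
class ToyPostProcessor:
    """Caches the bos token id used to frame every encoded sequence."""

    def __init__(self, bos_token_id):
        self.bos_token_id = bos_token_id

    def process(self, ids):
        # Prepend the cached bos id, the way a template post-processor would.
        return [self.bos_token_id] + ids


class ToyTokenizer:
    """Minimal sketch: the special token and the post-processor are separate state."""

    def __init__(self, bos_token_id):
        self.bos_token_id = bos_token_id
        self._post = ToyPostProcessor(bos_token_id)

    def set_bos_token_id(self, new_id):
        # Changing the special token alone does NOT touch the cached template.
        self.bos_token_id = new_id

    def update_post_processor(self):
        # Rebuild the template from the current special tokens.
        self._post = ToyPostProcessor(self.bos_token_id)

    def encode(self, ids):
        return self._post.process(ids)


tok = ToyTokenizer(bos_token_id=1)
tok.set_bos_token_id(99)
stale = tok.encode([15043])   # first token is still the old id 1
tok.update_post_processor()
fresh = tok.encode([15043])   # first token is now 99
```

This mirrors why the docstring tells you to either pass the new special tokens at initialization time or call `update_post_processor()` after changing them.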
This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.
@@ -108,6 +114,9 @@ class LlamaTokenizerFast(PreTrainedTokenizerFast):
self.can_save_slow_tokenizer = False if not self.vocab_file else True
def update_post_processor(self):
    """
    Updates the underlying post processor with the current `bos_token` and `eos_token`.
    """
    bos = self.bos_token
    bos_token_id = self.bos_token_id