mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-30 17:52:35 +06:00
* fix: Apostraphe splitting in the BasicTokenizer for CLIPTokenizer * account for apostrophe at start of new word * remove _run_split_on_punc, use re.findall instead * remove debugging, make style and quality * use pattern and punc splitting, repo-consistency will fail * remove commented out debugging * adds bool args to BasicTokenizer, remove pattern * do_split_on_punc default True * clean stray comments and line breaks * rebase, repo-consistency * update to just do punctuation split * add unicode normalizing back * remove redundant line |
||
---|---|---|
__init__.py | ||
test_modeling_bert.py | ||
test_modeling_flax_bert.py | ||
test_modeling_tf_bert.py | ||
test_tokenization_bert_tf.py | ||
test_tokenization_bert.py |