transformers/tests/models/bert
Connor Henderson 5739726fcc
fix: Text splitting in the BasicTokenizer (#22280)
* fix: Apostrophe splitting in the BasicTokenizer for CLIPTokenizer

* account for apostrophe at start of new word

* remove _run_split_on_punc, use re.findall instead

* remove debugging, make style and quality

* use pattern and punc splitting, repo-consistency will fail

* remove commented out debugging

* adds bool args to BasicTokenizer, remove pattern

* do_split_on_punc default True

* clean stray comments and line breaks

* rebase, repo-consistency

* update to just do punctuation split

* add unicode normalizing back

* remove redundant line
2023-07-11 11:07:58 -04:00
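The commit bullets above describe dropping the regex-based apostrophe handling in favor of a plain punctuation split gated by a boolean flag (`do_split_on_punc`, default `True`), with Unicode normalization kept. The sketch below illustrates that behavior under stated assumptions. It is not the actual `transformers` implementation: `basic_split` and `is_punctuation` are hypothetical names, and the punctuation rule (ASCII symbols plus any Unicode `P*` category) and NFC normalization are assumed BERT-style conventions.

```python
import unicodedata


def is_punctuation(char: str) -> bool:
    # Assumed BERT-style rule: non-alphanumeric printable ASCII symbols count
    # as punctuation, as does any character whose Unicode category starts with "P".
    cp = ord(char)
    if 33 <= cp <= 47 or 58 <= cp <= 64 or 91 <= cp <= 96 or 123 <= cp <= 126:
        return True
    return unicodedata.category(char).startswith("P")


def basic_split(text: str, do_split_on_punc: bool = True) -> list[str]:
    """Hypothetical sketch: whitespace split, then an optional punctuation split."""
    # Normalize to NFC so composed and decomposed forms of a character match.
    text = unicodedata.normalize("NFC", text)
    tokens = []
    for word in text.split():
        if not do_split_on_punc:
            # Flag off: keep each whitespace-delimited word intact (e.g. "don't").
            tokens.append(word)
            continue
        current = []
        for char in word:
            if is_punctuation(char):
                # Flush the pending word piece, then emit the punctuation alone.
                if current:
                    tokens.append("".join(current))
                    current = []
                tokens.append(char)
            else:
                current.append(char)
        if current:
            tokens.append("".join(current))
    return tokens


print(basic_split("don't stop!"))                          # ['don', "'", 't', 'stop', '!']
print(basic_split("don't stop!", do_split_on_punc=False))  # ["don't", 'stop!']
```

With the flag on, the apostrophe becomes its own token; with it off, a caller (for example, a tokenizer that handles contractions with its own pattern afterwards) receives whole words.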
__init__.py Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
test_modeling_bert.py Fix flaky test_for_warning_if_padding_and_no_attention_mask (#24706) 2023-07-07 11:55:21 +02:00
test_modeling_flax_bert.py Update quality tooling for formatting (#21480) 2023-02-06 18:10:56 -05:00
test_modeling_tf_bert.py Speed up TF tests by reducing hidden layer counts (#24595) 2023-06-30 16:30:33 +01:00
test_tokenization_bert_tf.py Fix past CI (#20967) 2023-01-12 18:04:21 +01:00
test_tokenization_bert.py fix: Text splitting in the BasicTokenizer (#22280) 2023-07-11 11:07:58 -04:00