Anthony MOI
|
36434220fc
|
[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
* Use tokenizers pre-tokenized pipeline
* failing pretrokenized test
* Fix is_pretokenized in python
* add pretokenized tests
* style and quality
* better tests for batched pretokenized inputs
* tokenizers clean up - new padding_strategy - split the files
* [HUGE] refactoring tokenizers - padding - truncation - tests
* style and quality
* bump up requied tokenizers version to 0.8.0-rc1
* switched padding/truncation API - simpler better backward compat
* updating tests for custom tokenizers
* style and quality - tests on pad
* fix QA pipeline
* fix backward compatibility for max_length only
* style and quality
* Various cleans up - add verbose
* fix tests
* update docstrings
* Fix tests
* Docs reformatted
* __call__ method documented
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
|
2020-06-15 17:12:51 -04:00 |
|