Commit Graph

9 Commits

Author SHA1 Message Date
Thomas Wolf
9aeacb58ba
Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)
* [WIP] SP tokenizers

* fixing tests for T5

* WIP tokenizers

* serialization

* update T5

* WIP T5 tokenization

* slow to fast conversion script

* Refactoring to move tokenzier implementations inside transformers

* Adding gpt - refactoring - quality

* WIP adding several tokenizers to the fast world

* WIP Roberta - moving implementations

* update to dev4 switch file loading to in-memory loading

* Updating and fixing

* advancing on the tokenizers - updating do_lower_case

* style and quality

* moving forward with tokenizers conversion and tests

* MBart, T5

* dumping the fast version of transformer XL

* Adding to autotokenizers + style/quality

* update init and space_between_special_tokens

* style and quality

* bump up tokenizers version

* add protobuf

* fix pickle Bert JP with Mecab

* fix newly added tokenizers

* style and quality

* fix bert japanese

* fix funnel

* limite tokenizer warning to one occurence

* clean up file

* fix new tokenizers

* fast tokenizers deep tests

* WIP adding all the special fast tests on the new fast tokenizers

* quick fix

* adding more fast tokenizers in the fast tests

* all tokenizers in fast version tested

* Adding BertGenerationFast

* bump up setup.py for CI

* remove BertGenerationFast (too early)

* bump up tokenizers version

* Clean old docstrings

* Typo

* Update following Lysandre comments

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2020-10-08 11:32:16 +02:00
Sam Shleifer
13deb95a40
Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
Sam Shleifer
07dd7c2fd8
[cleanup] test_tokenization_common.py (#4390) 2020-05-19 10:46:55 -04:00
Lysandre Debut
6435b9f908
Updating the TensorFlow models to work as expected with tokenizers v3.0.0 (#3684)
* Updating modeling tf files; adding tests

* Merge `encode_plus` and `batch_encode_plus`
2020-04-08 16:22:44 -04:00
alberduris
81d6841b4b GPU text generation: mMoved the encoded_prompt to correct device 2020-01-06 15:11:12 +01:00
alberduris
dd4df80f0b Moved the encoded_prompts to correct device 2020-01-06 15:11:12 +01:00
Aymeric Augustin
c824d15aa1 Remove __future__ imports. 2019-12-22 17:47:54 +01:00
Aymeric Augustin
7e98e211f0 Remove unittest.main() in test modules.
This construct isn't used anymore these days.

Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.

Use python -m unittest tests/test_foo.py instead.
2019-12-22 14:42:03 +01:00
Aymeric Augustin
ced0a94204 Switch test files to the standard test_*.py scheme. 2019-12-22 14:15:13 +01:00