5034 Commits

Author SHA1 Message Date
Thomas Wolf
9f5f646442 Merge pull request #2211 from huggingface/fast-tokenizers
Fast tokenizers
2019-12-27 10:24:29 +01:00
Anthony MOI
2818e50569 Add tests for fast tokenizers 2019-12-24 13:29:01 -05:00
Aymeric Augustin
e6c0019c80 Remove unused variables in tests. 2019-12-23 22:38:18 +01:00
Aymeric Augustin
1c62e87b34 Use built-in open().
On Python 3, `open is io.open`.
2019-12-22 18:38:56 +01:00
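The note `open is io.open` can be checked directly. The sketch below assumes CPython 3 and shows that the built-in is the same function object as `io.open`, so it already accepts keyword arguments such as `encoding`. The file name `vocab.txt` is purely illustrative.

```python
import builtins
import io
import os
import tempfile

# On Python 3 the built-in open() and io.open() are the same function object,
# so importing open from io adds nothing.
same_function = builtins.open is io.open
print(same_function)  # True

# Because they are identical, the built-in accepts io.open's keyword arguments,
# e.g. an explicit text encoding. "vocab.txt" is a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("héllo\n")
    with open(path, encoding="utf-8") as f:
        text = f.read()

print(text == "héllo\n")  # True
```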
Aymeric Augustin
798b3b3899 Remove sys.version_info[0] == 2 or 3. 2019-12-22 18:38:42 +01:00
Aymeric Augustin
8af25b1664 Remove six. 2019-12-22 17:56:09 +01:00
Aymeric Augustin
c824d15aa1 Remove __future__ imports. 2019-12-22 17:47:54 +01:00
Aymeric Augustin
00204f2b4c Replace CommonTestCases for tokenizers with a mixin.
This is the same change as for (TF)CommonTestCases for modeling.
2019-12-22 15:35:25 +01:00
Aymeric Augustin
a3c5883f2c Rename file for consistency. 2019-12-22 15:35:25 +01:00
Aymeric Augustin
daf8bebcdd Remove unused GPTModelTester.
It isn't imported anywhere.
2019-12-22 15:35:25 +01:00
Aymeric Augustin
345c23a60f Replace (TF)CommonTestCases for modeling with a mixin.
I suspect the wrapper classes were created in order to prevent the
abstract base class (TF)CommonModelTester from being included in test
discovery and running, because that would fail.

I solved this by replacing the abstract base class with a mixin.

Code changes are just de-indenting and automatic reformattings
performed by black to use the extra line space.
2019-12-22 15:35:18 +01:00
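The mixin approach described above can be sketched with the standard `unittest` module. This is a minimal illustration, not the project's actual test code: `ModelTesterMixin`, `DummyModel` and `DummyModelTest` are hypothetical names. Because the mixin does not subclass `unittest.TestCase`, `TestLoader.loadTestsFromModule` skips it and only collects the concrete class that combines the mixin with `TestCase`.

```python
import io
import types
import unittest


class ModelTesterMixin:
    """Shared model checks. Deliberately NOT a unittest.TestCase subclass,
    so test discovery never collects and runs it on its own."""

    model_class = None  # each concrete test class supplies the model under test

    def test_forward_returns_tuple(self):
        model = self.model_class()
        outputs = model.forward([1, 2, 3])
        self.assertIsInstance(outputs, tuple)


class DummyModel:
    """Hypothetical stand-in for a real model class."""

    def forward(self, input_ids):
        return (list(input_ids),)


class DummyModelTest(ModelTesterMixin, unittest.TestCase):
    # The concrete class mixes in the shared checks and provides the model.
    model_class = DummyModel


# Simulate discovery: place both classes in a throwaway module and load from it.
fake_module = types.ModuleType("test_fake_modeling")
fake_module.ModelTesterMixin = ModelTesterMixin
fake_module.DummyModelTest = DummyModelTest

suite = unittest.TestLoader().loadTestsFromModule(fake_module)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(suite.countTestCases(), result.wasSuccessful())  # 1 True
```

If `ModelTesterMixin` inherited from `unittest.TestCase` instead, the loader would also collect it, and its test would fail with a `TypeError` because `model_class` is `None`; that is the failure mode the wrapper classes were presumably guarding against.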
Aymeric Augustin
7e98e211f0 Remove unittest.main() in test modules.
This construct isn't used anymore these days.

Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.

Use python -m unittest tests/test_foo.py instead.
2019-12-22 14:42:03 +01:00
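The effect described above can be observed with a small experiment. Strictly speaking, running a script prepends the script's directory to `sys.path` (rather than changing the `PYTHONPATH` environment variable). The sketch below assumes a hypothetical layout `<root>/tests/test_foo.py` whose only job is to print `sys.path[0]`, and assumes `PYTHONSAFEPATH` is not set (on Python 3.11+ it disables this behaviour).

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical test module that just reports the first sys.path entry.
    tests_dir = os.path.join(root, "tests")
    os.makedirs(tests_dir)
    script = os.path.join(tests_dir, "test_foo.py")
    with open(script, "w", encoding="utf-8") as f:
        f.write("import sys\nprint(sys.path[0])\n")

    # Equivalent of running "python tests/test_foo.py" from <root>.
    completed = subprocess.run(
        [sys.executable, script],
        cwd=root,
        capture_output=True,
        text=True,
        check=True,
    )
    first_entry = completed.stdout.strip()
    # Compare resolved paths so symlinked temp dirs (e.g. on macOS) still match.
    script_dir_on_path = os.path.realpath(first_entry) == os.path.realpath(tests_dir)

print(script_dir_on_path)  # True: tests/ itself became an import root
```

With `tests/` as an import root, sibling test modules become importable as top-level modules, which is not how they resolve under `python -m unittest` run from the repository root.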
Aymeric Augustin
ced0a94204 Switch test files to the standard test_*.py scheme. 2019-12-22 14:15:13 +01:00
Aymeric Augustin
067395d5c5 Move tests outside of library. 2019-12-22 13:47:17 +01:00
thomwolf
1484d67de9 [LARGE] updating all tests and API 2019-07-02 12:13:17 +02:00
thomwolf
4f8b5f687c add fix for serialization of tokenizer 2019-06-29 23:35:21 +02:00
thomwolf
d9184620f9 fix tests and new API 2019-06-29 23:10:40 +02:00
thomwolf
7e3070ae4f add from_pretrained method to all configuration classes 2019-06-26 11:12:00 +02:00
thomwolf
93e9971c54 fix tests 2019-06-26 10:02:45 +02:00
thomwolf
62d78aa37e updating GLUE utils for compatibility with XLNet 2019-06-24 14:36:11 +02:00
thomwolf
c946bb51a6 fix xlnet tokenizer and python2 2019-06-22 22:28:49 +02:00
thomwolf
ebd2cb8d74 update from_pretrained to load XLNetModel as well 2019-06-21 21:08:44 +02:00
thomwolf
483cbc36a9 test deviation with tf model: max ~1e-3 should be ok 2019-06-21 16:38:01 +02:00
thomwolf
24d8068982 weights loading script ok 2019-06-21 12:33:44 +02:00
thomwolf
32da75486b add tokenizer and tests 2019-06-21 11:09:51 +02:00
thomwolf
45709d7532 model running with simple inputs 2019-06-21 00:28:42 +02:00
thomwolf
34d706a0e1 pruning in bertology 2019-06-19 15:25:49 +02:00
thomwolf
33d3db5c43 updating head masking, readme and docstrings 2019-06-17 15:51:28 +02:00
thomwolf
965f172de6 output all hidden layers states in GPT/GPT-2 2019-06-17 14:34:12 +02:00
thomwolf
f12007e421 add head masking and pruning to openai GPT 2019-06-17 14:19:40 +02:00
thomwolf
b860e47cf5 add head masking and pruning to gpt-2 2019-06-17 14:12:10 +02:00
thomwolf
7220d47a1c adding head pruning and tests 2019-06-17 13:20:45 +02:00
thomwolf
96c4d3d988 add head masking tests 2019-06-17 12:17:26 +02:00
thomwolf
5e1207b8ad add attention to all bert models and add test 2019-06-14 16:28:25 +02:00
thomwolf
bcc9e93e6f fix test 2019-06-14 15:38:20 +02:00
thomwolf
a3274ac40b adding attention outputs in bert 2019-06-03 16:11:45 -05:00
thomwolf
c30139a013 add special tokens to gpt-2 2019-04-30 10:45:26 +02:00
lukovnikov
56a47ce2b7 - replaced OpenAIGPTAdam with OpenAIAdam in docs 2019-04-25 16:05:28 +02:00
lukovnikov
704037ad51 - updated docs for new LR API
- added some images for illustration
- updated comments in optimization
2019-04-25 15:59:39 +02:00
lukovnikov
bb7557d3ab - removed __all__ in optimization
- removed unused plotting code
- using ABC for LRSchedule
- added some schedule object init tests
2019-04-21 13:48:33 +02:00
lukovnikov
34ccc8ebf4 Merge remote-tracking branch 'upstream/master' 2019-04-21 13:16:15 +02:00
thomwolf
34ae5bf838 small clean up in tests 2019-04-17 14:52:12 +02:00
thomwolf
265550ec34 relax network connection requirements 2019-04-17 14:22:35 +02:00
thomwolf
31d387604c adding s3 model tests with --runslow 2019-04-17 11:58:27 +02:00
thomwolf
bc70779bf0 fixed GPT-2 tokenization on python 2 2019-04-17 10:56:15 +02:00
thomwolf
18a8a15f78 improving GPT2 tokenization and adding tests 2019-04-16 17:00:55 +02:00
thomwolf
9761aa4845 add to_json_file method to configuration classes 2019-04-15 14:12:08 +02:00
thomwolf
e8568a3b17 fixing tests 2019-04-15 12:55:38 +02:00
thomwolf
870b734bfd added tokenizers serialization tests 2019-04-15 12:03:56 +02:00
lukovnikov
20686b78fc schedule fix 2019-04-03 18:13:52 +02:00
lukovnikov
1b4ce76c38 schedule fix 2019-04-03 17:40:12 +02:00
lukovnikov
23bd2eebf5 schedule fix 2019-04-03 17:10:34 +02:00
lukovnikov
91a073f804 schedule fix 2019-04-03 17:10:08 +02:00
lukovnikov
b64cc63a77 optimization schedule test update 2019-04-03 16:42:40 +02:00
lukovnikov
d164867d90 - updated docs for optimization 2019-04-03 16:13:51 +02:00
lukovnikov
262a9992d7 class weights 2019-03-18 18:29:12 +01:00
thomwolf
2dd8f524f5 removing test for long sequences error following #337 2019-03-06 10:10:41 +01:00
thomwolf
009ee86a19 fix tests - bump up version 2019-02-17 23:57:23 +01:00
thomwolf
ffd623823d adding gpt2 2019-02-17 23:38:51 +01:00
thomwolf
884ca81d87 transposing the inputs of Transformer-XL to have a unified interface 2019-02-11 13:19:59 +01:00
thomwolf
0a9860daa7 tests pass on python 2 and 3 2019-02-11 10:47:52 +01:00
thomwolf
2071a9b86e fix python 2.7 imports 2019-02-11 10:35:36 +01:00
thomwolf
b514a60c36 added tests for OpenAI GPT and Transformer-XL tokenizers 2019-02-11 10:17:16 +01:00
thomwolf
9bdcba53fd fix tests 2019-02-09 17:07:12 +01:00
thomwolf
1320e4ec0c mc_token_mask => mc_token_ids 2019-02-09 16:58:53 +01:00
thomwolf
2df41663f1 added test 2019-02-07 17:05:49 +01:00
thomwolf
ba9e4eb354 fix unicode in tokenization tests 2019-02-06 00:28:00 +01:00
thomwolf
448937c00d python 2 compatibility 2019-02-06 00:07:46 +01:00
thomwolf
98c96fb1a7 splitting position and tokens embeddings in OpenAI GPT - updating tf imports - tests 2019-01-29 10:31:42 +01:00
thomwolf
a45a9cc0e1 update tests 2019-01-28 17:16:02 +01:00
thomwolf
dc5df92fa8 added LM head for OpenAI 2019-01-08 17:18:47 +01:00
thomwolf
3cf12b235a added tests + fixed losses 2019-01-08 16:24:23 +01:00
Patrick Lewis
78cf7b4ab4 added code to raise value error for bert tokenizer for covert_tokens_to_indices 2018-12-18 14:41:30 +00:00
thomwolf
0f544625f4 fix swag example for work with apex 2018-12-13 13:35:59 +01:00
thomwolf
52c53f39d0 clean up apex integration 2018-12-13 13:02:17 +01:00
thomwolf
85fff78c2d compatibility PT 1.0 and 0.4.1 2018-12-13 12:48:13 +01:00
Deyu Fu
c8ea286048 change to apex for better fp16 and multi-gpu support 2018-12-11 17:13:58 -08:00
thomwolf
7f7c41b0c1 tests for all model classes with and without labels 2018-11-30 22:54:33 +01:00
thomwolf
757750d6f6 fix tests 2018-11-17 11:58:14 +01:00
thomwolf
1de35b624b preparing for first release 2018-11-15 20:56:10 +01:00
Yaser Martinez Palenzuela
4d124baf8f Add test for Chinese tokenization 2018-11-05 23:04:29 +01:00
thomwolf
3d291dea4a clean up tests 2018-11-04 21:27:19 +01:00
thomwolf
87da161c2a finishing model test 2018-11-04 21:27:10 +01:00
thomwolf
f8276008df update readme, file names, removing TF code, moving tests 2018-11-03 23:35:14 +01:00