LysandreJik | c10c7d59e7 | Mask computing in standalone method. Tests. | 2019-09-19 10:55:06 +02:00
LysandreJik | bf503158c5 | Sentence -> Sequence. Removed output_mask from the special token addition methods. | 2019-09-19 10:55:06 +02:00
LysandreJik | bac332fec0 | Updated the GLUE data processor. Corrections to RoBERTa and XLNet. | 2019-09-19 10:55:06 +02:00
LysandreJik | e391d4735e | Tokenizers' encode function can output binary masks | 2019-09-19 10:55:06 +02:00
Thomas Wolf | d483cd8e46 | Merge pull request #1074 from huggingface/improved_testing: Shortcut to special tokens' ids - fix GPT2 & RoBERTa tokenizers - improved testing for GPT/GPT-2 | 2019-08-30 23:18:58 +02:00
thomwolf | 7044ed6b05 | fix tokenizers serialization | 2019-08-30 17:36:11 +02:00
Thomas Wolf | 50e615f43d | Merge branch 'master' into improved_testing | 2019-08-30 13:40:35 +02:00
thomwolf | f8aace6bcd | update tokenizers to use self.XX_token_id instead of converting self.XX_token | 2019-08-30 13:39:52 +02:00
thomwolf | 3bcbebd440 | max_len_single_sentence & max_len_sentences_pair as attributes so they can be modified | 2019-08-23 22:07:26 +02:00
thomwolf | 47d6853439 | adding max_lengths for single sentences and sentences pairs | 2019-08-23 17:31:11 +02:00
LysandreJik | 22ac004a7c | Added documentation and changed parameters for special_tokens_sentences_pair. | 2019-08-12 15:13:53 -04:00
LysandreJik | 14e970c271 | Tokenization encode/decode class-based sequence handling | 2019-08-09 15:01:38 -04:00
thomwolf | 009273dbdd | big doc update [WIP] | 2019-08-04 12:14:57 +02:00
thomwolf | 1b35d05d4b | update conversion scripts and __main__ | 2019-07-16 09:41:55 +02:00
thomwolf | 15d8b1266c | update tokenizer - update squad example for xlnet | 2019-07-15 17:30:42 +02:00
thomwolf | 7d4b200e40 | good quality generation example for GPT, GPT-2, Transfo-XL, XLNet | 2019-07-13 15:25:03 +02:00
LysandreJik | f773faa258 | Fixed all links. Removed TPU. Changed CLI to Converting TF models. Many minor formatting adjustments. Added "TODO Lysandre filled" where necessary. | 2019-07-10 14:45:56 -04:00
thomwolf | b19786985d | unified tokenizer api and serialization + tests | 2019-07-09 10:25:18 +02:00
thomwolf | 36bca545ff | tokenization abstract class - tests for examples | 2019-07-05 15:02:59 +02:00
thomwolf | 0bab55d5d5 | [BIG] name change | 2019-07-05 11:55:36 +02:00