Sam Shleifer
|
be1520d3a3
|
rename prepare_translation_batch -> prepare_seq2seq_batch (#6103)
|
2020-08-11 15:57:07 -04:00 |
|
Sam Shleifer
|
e0d58ddb65
|
[fix] Marian tests import (#5442)
|
2020-07-01 11:42:22 -04:00 |
|
Sam Shleifer
|
43cb03a93d
|
MarianTokenizer.prepare_translation_batch uses new tokenizer API (#5182)
|
2020-07-01 10:32:50 -04:00 |
|
Sam Shleifer
|
3d495c61ef
|
Fix marian tokenizer save pretrained (#5043)
|
2020-06-16 09:48:19 -04:00 |
|
Anthony MOI
|
36434220fc
|
[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
* Use tokenizers pre-tokenized pipeline
* failing pretokenized test
* Fix is_pretokenized in python
* add pretokenized tests
* style and quality
* better tests for batched pretokenized inputs
* tokenizers clean up - new padding_strategy - split the files
* [HUGE] refactoring tokenizers - padding - truncation - tests
* style and quality
* bump up required tokenizers version to 0.8.0-rc1
* switched padding/truncation API - simpler better backward compat
* updating tests for custom tokenizers
* style and quality - tests on pad
* fix QA pipeline
* fix backward compatibility for max_length only
* style and quality
* Various cleans up - add verbose
* fix tests
* update docstrings
* Fix tests
* Docs reformatted
* __call__ method documented
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
|
2020-06-15 17:12:51 -04:00 |
|
Sam Shleifer
|
4ab7424597
|
[cleanup/marian] pipelines test and new kwarg (#4812)
|
2020-06-05 18:45:19 -04:00 |
|
Sam Shleifer
|
efbc1c5a9d
|
[MarianTokenizer] implement save_vocabulary and other common methods (#4389)
|
2020-05-19 19:45:49 -04:00 |
|