Commit Graph

2 Commits

Author SHA1 Message Date
Nicolas Patry
8e5f76f511
Upgrading to tokenizers 0.19.0 (#30289)
* [DO NOT MERGE] Testing tokenizers 0.19.0rc0

* Accounting for the breaking change.

* Ruff.

* Upgrading to tokenizers `0.19` (new release with prepend_scheme fixed
and new surface for BPE tiktoken bug).
2024-04-17 17:17:50 +02:00
Patrick von Platen
31c3e7e75b
[Flax] Add T5 pretraining script (#12355)
* fix_torch_device_generate_test

* remove @

* add length computation

* finish masking

* finish

* upload

* fix some bugs

* finish

* fix dependency table

* correct tensorboard

* Apply suggestions from code review

* correct processing

* slight change init

* correct some more mistakes

* apply suggestions

* improve readme

* fix indent

* Apply suggestions from code review

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* correct tokenizer

* finish

* finish

* finish

* finish

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
2021-06-28 20:11:29 +01:00