Commit Graph

11 Commits

Author SHA1 Message Date
Thomas Wolf
827d6d6ef0
Cleanup fast tokenizers integration (#3706)
* First pass on utility classes and python tokenizers

* finishing cleanup pass

* style and quality

* Fix tests

* Updating following @mfuntowicz comment

* style and quality

* Fix Roberta

* fix batch_size/seq_length inBatchEncoding

* add alignement methods + tests

* Fix OpenAI and Transfo-XL tokenizers

* adding trim_offsets=True default for GPT2 et RoBERTa

* style and quality

* fix tests

* add_prefix_space in roberta

* bump up tokenizers to rc7

* style

* unfortunately tensorfow does like these - removing shape/seq_len for now

* Update src/transformers/tokenization_utils.py

Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Adding doc and docstrings

* making flake8 happy

Co-authored-by: Stefan Schweter <stefan@schweter.it>
2020-04-18 13:43:57 +02:00
elk-cloner
5ebd898953
fix dataset shuffling for Distributed training (#huggingface#3721) (#3766) 2020-04-13 10:11:18 -04:00
Nicolas
c50aa67bff
Resizing embedding matrix before sending it to the optimizer. (#3532)
* Resizing embedding matrix after sending it to the optimizer prevents from updating the newly resized matrix.

* Remove space for style matter
2020-04-02 15:00:05 -04:00
Mark Kockerbeck
1b10159950
Adding should_continue check for retraining (#3509) 2020-04-02 14:07:08 -04:00
Julien Chaumond
eaabaaf750 [run_language_modeling] Fix: initialize a new model from a config object 2020-03-24 17:56:40 -04:00
Julien Chaumond
f8823bad9a Expose missing mappings (see #3415) 2020-03-24 17:46:25 -04:00
Julien Chaumond
a8e3336a85 [examples] Use AutoModels in more examples 2020-03-23 20:11:14 -04:00
Victor SANH
6b1ff25084
fix n_gpu count when no_cuda flag is activated (#3077)
* fix n_gpu count when no_cuda flag is activated

* someone was left behind
2020-03-02 10:20:21 -05:00
Lysandre
f54a5bd37f Raise error when using an mlm flag for a clm model + correct TextDataset 2020-02-12 13:23:14 -05:00
Lysandre
569897ce2c Fix a few issues regarding the language modeling script 2020-02-12 13:23:14 -05:00
Julien Chaumond
42f08e596f [examples] rename run_lm_finetuning to run_language_modeling 2020-02-07 09:15:28 -05:00