Commit Graph

17 Commits

Author SHA1 Message Date
Sylvain Gugger
b01483faa0
Truncate max length if needed in all examples (#10034) 2021-02-08 05:03:55 -05:00
Stas Bekman
8ea412a86f
[examples] make run scripts executable (#10037)
* make executable

* make executable

* same for the template

* cleanup
2021-02-05 15:51:18 -08:00
Sylvain Gugger
b4e559cfa1
Deprecate model_path in Trainer.train (#9854) 2021-01-28 08:32:46 -05:00
Sylvain Gugger
f2fabedbab
Setup logging with a stdout handler (#9816) 2021-01-27 03:39:11 -05:00
Sylvain Gugger
caf4abf768
Auto-resume training from checkpoint (#9776)
* Auto-resume training from checkpoint

* Update examples/text-classification/run_glue.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Roll out to other examples

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-25 12:03:51 -05:00
Sylvain Gugger
a1ad16a446
Restrain tokenizer.model_max_length default (#9681)
* Restrain tokenizer.model_max_length default

* Fix indent
2021-01-20 04:17:39 -05:00
Sylvain Gugger
453a70d4cb
Allow example to use a revision and work with private models (#9407)
* Allow example to use a revision and work with private models

* Copy to other examples and template

* Styling
2021-01-06 06:49:23 -05:00
Sylvain Gugger
ab17758874
Add speed metrics to all example scripts + template (#9260) 2020-12-22 14:02:26 -05:00
Teven
2a7e8e1608
[Examples] Add automatic dataset splitting in language-modeling examples (#9133)
* replaced jnp.split + removing textual model inputs + ensuring warmup_steps > 0

* Add automatic dataset splitting in language-modeling examples
2020-12-15 16:02:43 -05:00
Sylvain Gugger
a0c62d2493
Fix training from scratch in new scripts (#8623) 2020-11-18 12:15:26 -05:00
Julien Chaumond
042a6aa777
Tokenizers: ability to load from model subfolder (#8586)
* <small>tiny typo</small>

* Tokenizers: ability to load from model subfolder

* use subfolder for local files as well

* Uniformize model shortcut name => model id

* from s3 => from huggingface.co

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2020-11-17 08:58:45 -05:00
Julien Plu
27b3ff316a
Try to understand and apply Sylvain's comments (#8458) 2020-11-12 13:43:00 -05:00
zeyuyun1
924c624a46
quick fix on concatenating text to support more datasets (#8474) 2020-11-12 09:47:08 -05:00
Sylvain Gugger
9c4aa4ac1a
Clean up data collators and datasets (#8308)
* Clean up data collators and datasets

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove needless clone

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-04 17:24:49 -05:00
Sylvain Gugger
cf89724696
Fix validation file loading in scripts (#8298) 2020-11-04 10:42:18 -05:00
Sylvain Gugger
e1b1b614b1
Add line by line option to mlm/plm scripts (#8240)
* Make line by line optional in run_mlm

* Add option to disable dynamic padding

* Add option to plm too and update README

* Typos

* More typos

* Even more typos

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-02 12:27:04 -05:00
Sylvain Gugger
691176283d
Add a template for examples and apply it for mlm and plm examples (#8153)
* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Styling
2020-10-29 13:38:11 -04:00