8 Commits

Author SHA1 Message Date
Sylvain Gugger
00aa9dbca2
Copyright (#8970)
* Add copyright everywhere missing

* Style
2020-12-07 18:36:34 -05:00
Sylvain Gugger
9c4aa4ac1a
Clean up data collators and datasets (#8308)
* Clean up data collators and datasets

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove needless clone

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-04 17:24:49 -05:00
Sylvain Gugger
7f556d2e39
Data collator for token classification (#8274)
* Add DataCollatorForTokenClassification and clean tests

* Make quality
2020-11-03 16:33:27 -05:00
Sylvain Gugger
077478637d
Fix label name in DataCollatorForNextSentencePrediction test (#8048)
2020-10-26 09:23:12 -04:00
Sylvain Gugger
1ee2194fb6
Mark big downloads slow (#7325)
* Make big downloads as slow

* Add import

* Right order for slow decorator

* More slow tests
2020-09-22 12:21:52 -04:00
Yu Liu
762cba3bda
Albert pretrain datasets/ datacollator (#6168)
* add dataset for albert pretrain

* datacollator for albert pretrain

* naming, comprehension, file reading change

* data cleaning is not needed after this modification

* delete prints

* fix a bug

* file structure change

* add tests for albert datacollator

* remove random seed

* add back len and get item function

* sample file for testing and test code added

* format change for black

* more format change

* Style

* var assignment issue resolve

* add back wrongly deleted DataCollatorWithPadding in init file

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-09-10 07:56:29 -04:00
Huang Lianzhe
2de7ee0385
Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)
* add datacollator and dataset for next sentence prediction task

* bug fix (numbers of special tokens & truncate sequences)

* bug fix (+ dict inputs support for data collator)

* add padding for nsp data collator; renamed cached files to avoid conflict.

* add test for nsp data collator

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-31 08:25:00 -04:00
Sylvain Gugger
573bdb0a5d
Add tests to Trainer (#6605)
* Add tests to Trainer

* Test if removing long breaks everything

* Remove ugly hack

* Fix distributed test

* Use float for number of epochs
2020-08-20 11:13:50 -04:00