2613 Commits

Author SHA1 Message Date
Matt
7e22609e0f
Tensorflow LM examples (#12358)
* Tensorflow MLM example

* Add CLM example

* Style fixes, adding missing checkpoint code from the CLM example

* Fix TPU training, avoid massive dataset warnings

* Fix incorrect training length calculation for multi-GPU training

* Fix incorrect training length calculation for multi-GPU training

* Refactors and nitpicks from the review

* Style pass

* Adding README
2021-06-28 19:31:44 +01:00
Patrick von Platen
2d70c91206
[Flax] Adapt flax examples to include push_to_hub (#12391)
* fix_torch_device_generate_test

* remove @

* finish

* correct summary writer

* correct push to hub

* fix indent

* finish

* finish

* finish

* finish

* finish

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-06-28 19:23:35 +01:00
Sylvain Gugger
276bc149d2 Fix copies 2021-06-28 12:26:40 -04:00
Patrick von Platen
27b6ac4611
Update README.md 2021-06-28 17:22:10 +01:00
Patrick von Platen
89b57a6669
[Flax community event] Add more description to readme (#12398)
* fix_torch_device_generate_test

* remove @

* boom boom

* correct typos

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Apply suggestions from code review

Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>
2021-06-28 17:18:42 +01:00
Bhadresh Savani
04dbea31a9
[Examples] Added context manager to datasets map (#12367)
* added context manager to datasets map

* fixed style and spaces

* fixed warning of deprecation

* changed desc
2021-06-28 09:14:00 -07:00
Sylvain Gugger
57461ac0b4
Add possibility to maintain full copies of files (#12312) 2021-06-28 10:02:53 -04:00
Taha ValizadehAslani
9490d668d2
Update run_mlm.py (#12344)
Previously the script could not be used for validation only, because this line:
extension = data_args.train_file.split(".")[-1]
assumed that the extension must be extracted from the training file. The line ran regardless of whether the user had requested training or validation, which caused an error when the user wanted to run evaluation only (because the training file does not exist). I modified it to extract the extension from the training file if the user wants to train, and from the validation file if the user wants to run eval. This way the script can be used for training and validation separately.
2021-06-28 07:49:22 -04:00
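The behaviour described in the run_mlm.py commit above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual patch: `DataArgs`, `pick_extension`, the `validation_file` field, and the `do_train` flag are hypothetical names introduced here; only `train_file` and the `split(".")[-1]` expression come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataArgs:
    # Hypothetical stand-in for the script's data arguments.
    train_file: Optional[str] = None
    validation_file: Optional[str] = None


def pick_extension(data_args: DataArgs, do_train: bool) -> str:
    """Read the extension from the file that the requested mode actually uses."""
    source = data_args.train_file if do_train else data_args.validation_file
    if source is None:
        raise ValueError("No data file was provided for the requested mode.")
    return source.split(".")[-1]


# Evaluation only: no training file is needed.
eval_only = DataArgs(validation_file="dev.json")
extension = pick_extension(eval_only, do_train=False)
```

Choosing the source file first keeps the extension logic in one place, so an evaluation-only run never touches `train_file`.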
Bhadresh Savani
ff5cdc086b
replace print with logger (#12368) 2021-06-26 09:31:25 -07:00
Bhadresh Savani
539ee456d4
[Examples] Replicates the new --log_level feature to all trainer-based pytorch (#12359)
* added log_level

* fix comment

* fixed log_level

* Trigger CI

* Unified logging

* simplified args for log_level
2021-06-25 14:58:42 -07:00
Stas Bekman
64e6098094
[trainer] add main_process_first context manager (#12351)
* main_process_first context manager

* handle multi-node, add context description

* sync desc
2021-06-25 14:58:03 -07:00
Stas Bekman
4a872caef4
remove extra white space from log format (#12360) 2021-06-25 13:20:14 -07:00
Vasudev Gupta
332a245861
Add FlaxBigBird QuestionAnswering script (#12233)
* port bigbird script

* adapt script a bit

* change location

* adapt more

* save progress

* init commit

* style

* dataset script tested

* readme add
2021-06-25 18:05:48 +01:00
michal pitr
d4ce31e839
fixed typo (#12356) 2021-06-25 07:49:29 -04:00
Patrick von Platen
aa550c4a11
Update README.md 2021-06-25 11:55:51 +01:00
Marc van Zee
f2c4ce7e33
Add flax/jax quickstart (#12342) 2021-06-24 17:04:18 +01:00
Suraj Patil
aef3823e1a
[examples/Flax] move the examples table up (#12341) 2021-06-24 16:03:37 +05:30
Sylvain Gugger
2150dfed31 v4.9.0.dev0 2021-06-23 13:31:19 -04:00
Sylvain Gugger
9252a5127f Release: v4.8.0 2021-06-23 13:25:56 -04:00
Patrick von Platen
44739c8180
[Flax/JAX] Add how to propose projects markdown (#12311)
* fix_torch_device_generate_test

* remove @

* finish

* make style
2021-06-23 14:50:35 +01:00
Suraj Patil
c0fe3c9a7a
Flax summarization script (#12230)
* add summarization script

* fix arguments, preprocessing, metrics

* add generation and metrics

* auto model, prediction loop

* prettify

* label smoothing

* address Sylvain's and Patrick's suggestions

* dynamically import shift_tokens_right

* fix shift_tokens_right_fn call
2021-06-23 15:49:30 +05:30
Stas Bekman
ebe5413589
[trainer] 2 bug fixes and a rename (#12309)
* bug fixes and a rename

* add extended DDP test
2021-06-22 11:13:23 -07:00
Patrick von Platen
64029abe4c
[Flax] Main doc for event orga (#12305)
* fix_torch_device_generate_test

* remove @

* push

* finish

* some typos

* add more info on communication

* add suggestions
2021-06-22 18:02:52 +01:00
Stas Bekman
dad414d5f9
[trainer + examples] set log level from CLI (#12276)
* set log level from CLI

* add log_level_replica + test + extended docs

* cleanup

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename datasets objects to allow datasets module

* improve the doc

* style

* doc improve

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-21 19:30:50 -07:00
Matt
e3cb7a0b60
Tensorflow QA example (#12252)
* New Tensorflow QA example!

* Style pass

* Updating README.md for the new example

* flake8 fixes

* Update examples/tensorflow/question-answering/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-21 16:37:28 +01:00
Vishal Burman
b53bc55ba9
Fix for making student ProphetNet for Seq2Seq Distillation (#12130)
* make_student.py: fix to make student ProphetNet

* reformat
2021-06-21 09:36:44 -04:00
Bhavitvya Malik
e43e11260f
update desc for map in all examples (#12226)
* update desc for map in all examples

* added plm

* suggestions
2021-06-17 15:37:31 -04:00
Lysandre
0daadc1919 Docs for v4.8.0 2021-06-17 18:17:42 +02:00
Lysandre
7a6c9fab8e Release: v4.7.0 2021-06-17 17:57:42 +02:00
Sylvain Gugger
7d7ceca396
Model card defaults (#12122)
* [WIP] Model card defaults

* finetuned_from default value

* Add all mappings to the mapping file

* Be more defensive on finetuned_from arg

* Add default task tag

* Separate tags from tasks

* Edge case for dataset

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-06-15 16:01:37 -04:00
kumapo
955b2b97a6
Enable add_prefix_space if model_type is roberta or gpt2 (#12116) 2021-06-15 09:33:21 -04:00
Avital Oliver
9b393240a2
Use a released version of optax rather than installing from Git. (#12173)
Use a released version of optax rather than installing from Git
2021-06-15 16:42:51 +05:30
Stas Bekman
88e84186e5
[style] consistent nn. and nn.functional: part 4 examples (#12156)
* consistent nn. and nn.functional: p4 examples

* restore
2021-06-14 12:28:24 -07:00
Kumar Abhishek
9de62cfbce
[lm examples] Replicate --config_overrides addition to other LM examples (#12135)
* [lm examples] Replicate --config_overrides addition to other LM examples

* Removing no trainer files changes

* Update README

Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
2021-06-14 08:12:22 -04:00
Nicholas Broad
cd7961b632
Use text_column_name variable instead of "text" (#12132)
* Use text_column_name variable instead of "text"

`text_column_name` was already defined above where I made the changes and it was also used below where I made changes.

This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90% + of datasets use "text" anyway.

* black formatting

* make style

Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
2021-06-14 08:11:13 -04:00
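The column fallback described in the commit above might look like the sketch below. It is illustrative only: `resolve_text_column` is a hypothetical helper name, and whitespace splitting stands in for a real tokenizer. Only `text_column_name`, `tokenize_function`, and the "first column if there is no `text` column" rule come from the commit message.

```python
from typing import Dict, List


def resolve_text_column(column_names: List[str]) -> str:
    # Prefer "text"; otherwise fall back to the first column name.
    return "text" if "text" in column_names else column_names[0]


def tokenize_function(examples: Dict[str, List[str]], text_column_name: str) -> List[List[str]]:
    # Placeholder tokenizer (assumption): split on whitespace.
    return [line.split() for line in examples[text_column_name]]


batch = {"sentence": ["hello world", "a b c"], "label": ["0", "1"]}
text_column_name = resolve_text_column(list(batch.keys()))
tokens = tokenize_function(batch, text_column_name)
```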
Sylvain Gugger
b8ab541340
Don't log anything before logging is setup in examples (#12121)
* Don't log anything before logging is setup in examples

* Last example
2021-06-14 08:03:33 -04:00
Patrick von Platen
7566fefa69
[Flax] Add links to google colabs (#12146)
* fix_torch_device_generate_test

* remove @

* add colab links
2021-06-14 11:00:29 +01:00
Suraj Patil
d36fce8237
add readme for flax clm (#12111)
* add readme for flax clm

* use section link for tokenizer

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update metrics

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-14 15:03:55 +05:30
Patrick von Platen
16c0efca2c
Add mlm pretraining xla torch readme (#12011)
* fix_torch_device_generate_test

* remove @

* upload

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Update examples/flax/language-modeling/README.md

* add more info

* finish

* fix

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-06-14 10:31:21 +01:00
Suraj Patil
15b498f3b8
Flax CLM script (#12023)
* first draft

* max_seq_length => block_size

* fix arg names

* fix typos

* fix loss calculation

* add max examples, fix train eval steps, metrics

* optimizer mask

* fix perplexity, metric logging

* fix logging

* data_collator = > data_loader

* refactor loss_fn

* support single GPU

* pass distributed to write_metric

* fix jitting

* fix single device training

* fix single device metrics

* close inner progress bars once finished

* add overwrite_cache arg

* fix dataset caching issue

* add more logs

* few small fixes,

* address nicholas suggestions

* fix docstr

* address patricks suggestions

* make flake happy

* pass new new_dropout_rng to apply_gradients

* reset train metrics after every epoch

* remove distributed logic, small fixes
2021-06-11 15:16:20 +05:30
Bhavitvya Malik
d2753dcbec
add relevant description to tqdm in examples (#11927)
* add relevant `desc` in examples

* require_version datasets>=1.8.0
2021-06-10 15:59:55 -04:00
Matt
bebbdd0fc9
Appending label2id and id2label to models to ensure inference works properly (#12102) 2021-06-10 15:25:04 +01:00
Matt
4cda08decb
Minor style edits 2021-06-10 15:10:57 +01:00
Matt
7f08dbd10a
Update README.md to cover the TF GLUE example. 2021-06-10 14:33:42 +01:00
Sylvain Gugger
d72e5a3a6d Fix quality 2021-06-10 09:27:11 -04:00
Matt
73a532651a
New TF GLUE example (#12028)
* Pushing partially-complete new GLUE example

* First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.

* Fix to the fit() call

* Bugfixes, making sure TPU and multi-GPU support is ready

* Remove logger line that depends on Pytorch

* Style pass

* Deleting old TF GLUE example

* Include label2id and id2label in the saved model config

* Don't clobber the existing model.config.label2id

* Style fixes

* Update examples/tensorflow/text-classification/run_glue.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-10 14:14:37 +01:00
kumapo
472a867626
Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args (#12083)
* Add text_column_name and label_column_name to run_ner args

* Minor fix: grouping for text and label column name
2021-06-10 08:03:20 -04:00
Stas Bekman
61e191987d
rm require_version_examples (#12088) 2021-06-09 11:02:52 -07:00
Suraj Patil
d1500d9151
pass decay_mask fn to optimizer (#12087) 2021-06-09 18:49:27 +01:00
Anton Lozhkov
d472bd7b18
Wav2Vec2 Pretraining (#11306)
* Working quantizer forward

* Working quantizer forward

* Clean up unused model parts, test reproducibility

* Working quantizer forward

* Clean up unused model parts, test reproducibility

* Remove custom outputs from the shared ones

* correct conversion

* correct bug

* add first pretrain script

* save intermediate

* static shapes

* save intermediate

* finish first pretrain script version

* more refactor

* remove wandb

* refactor more

* improve test

* correct perplexity compute bug

* finish model implementation

* add to docs

* finish docs

* finish pretraining script

* finish pretraining script

* remove wandb

* finish PR for merge

* finish config

* finish

* make deepspeed work

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions

* fix flaky test

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-09 18:40:56 +01:00
Stas Bekman
d14e0af274
sync LayerDrop for Wav2Vec2Encoder + tests (#12076) 2021-06-09 13:21:03 +01:00
Koichi Yasuoka
82a2b76c95
Update run_ner.py with id2label config (#12001) 2021-06-09 07:27:05 -04:00
Stas Bekman
11d86d3de4
[Deepspeed Wav2vec2] integration (#11638)
* wip

* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044

* cleanup

* workaround

* working 5/8 modes

* solve fp32 distributed zero3

* style

* sync

* sync

* rework

* deprecation

* cleanup

* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged

* clean up

* add a guide

* more prose

* more prose

* fix

* more prose

* sub_group_size was too big

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refactor

* bug fix

* make the true check explicit

* new deepspeed release

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-08 12:32:03 -07:00
Sylvain Gugger
fd6902838a
Properly indent block_size (#12070) 2021-06-08 10:27:02 -04:00
cdleong
49bee0aea4
Add torch to requirements.txt in language-modeling (#12040)
* Add torch to requirements.txt in language-modeling

* Update examples/pytorch/language-modeling/requirements.txt

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-08 09:02:35 -04:00
Mario Šaško
f5eec0d8e9
Replace legacy tensor.Tensor with torch.tensor/torch.empty (#12027)
* Replace legacy torch.Tensor constructor with torch.{tensor, empty}

* Remove torch.Tensor in examples
2021-06-08 13:58:38 +01:00
Shamane Siri
e33085d648
updated the original RAG implementation to be compatible with latest Pytorch-Lightning (#11806)
* updated the original RAG implementation to be compatible with the latest PL version

* updated the requirements.txt file

* execute make style

* code quality test

* code quality

* conflict resolved in requirements.txt

* code quality

* changed the MyDDP class name to CustomDDP
2021-06-08 13:42:49 +01:00
Russell Klopfer
e363e1d936
adds metric prefix. (#12057)
* adds metric prefix.

* update tests to include prefix
2021-06-07 22:34:10 -04:00
Patrick von Platen
242ec31aa5
[Flax] Refactor MLM (#12013)
* fix_torch_device_generate_test

* remove @

* finish refactor

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-06-03 16:31:32 +01:00
Nicholas Vadivelu
4674061b2a
Fix weight decay masking in run_flax_glue.py (#11964)
* Fix weight decay masking in `run_flax_glue.py`

Issues with the previous implementation:
- The `dict` from `traverse_util.flatten_dict` has keys which are tuples of strings, not one long string with the path separated by periods.
- `optax.masked` applies the transformation wherever the mask is True, so the masks are flipped.
- Flax's LayerNorm calls the scale parameter `scale` not `weight`

* Fix formatting with black

* adapt results

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-06-03 11:35:26 +01:00
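The three issues listed in the weight decay commit above can be illustrated with a dependency-free sketch. Assumptions are labeled: `flatten` is a hypothetical stand-in that only mimics the tuple-of-strings keys the commit attributes to `traverse_util.flatten_dict`, `decay_mask` is a made-up name, and excluding biases and LayerNorm scales from weight decay is a common convention assumed here rather than something the commit message spells out.

```python
from typing import Any, Dict, Tuple

Path = Tuple[str, ...]


def flatten(tree: Dict[str, Any], prefix: Path = ()) -> Dict[Path, Any]:
    # Hypothetical stand-in: keys of the flat dict are tuples of strings,
    # not one dotted string.
    flat: Dict[Path, Any] = {}
    for name, value in tree.items():
        path = prefix + (name,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def decay_mask(params: Dict[str, Any]) -> Dict[Path, bool]:
    # True means "apply weight decay to this parameter", matching the
    # commit's note that the transformation is applied where the mask is True.
    # The LayerNorm parameter is named "scale", not "weight".
    return {
        path: path[-1] != "bias" and path[-2:] != ("LayerNorm", "scale")
        for path in flatten(params)
    }


params = {
    "dense": {"kernel": 1.0, "bias": 0.0},
    "LayerNorm": {"scale": 1.0, "bias": 0.0},
}
mask = decay_mask(params)
```

Comparing on tuple suffixes such as `path[-2:]` avoids the string matching that fails when keys are not dotted paths.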
dependabot[bot]
6db3a87de2
Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxmert (#11983)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.25.8...1.26.5)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-06-02 03:40:20 -04:00
Fan Zhang
7e73601f32
modify qa-trainer (#11872)
* modify qa-trainer

* fix flax model
2021-06-01 08:28:41 -04:00
Shamane Siri
9ec0f01b6c
RAG-2nd2end-revamp (#11893)
* initial

* code quality test

* code quality

* added test functions in test_modeling_rag.py and test_retrieval_rag.py to test end2end retriever

* minor change in test_modeling_rag

* fixed tests

* Update examples/research_projects/rag-end2end-retriever/README.md

typo corrected as suggested by lhoestq

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* Update examples/research_projects/rag-end2end-retriever/finetune_rag.py

type change suggested by lhoestq

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* Update src/transformers/models/rag/retrieval_rag.py

Adding this change as mentioned by lhoestq.

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* completed the minor changes suggested by the reviewers

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
2021-06-01 07:32:26 +01:00
Philip May
cfca638acb
Add MT5ForConditionalGeneration as supported arch. to summarization README (#11961)
* Add MT5ForConditionalGeneration as supported arch.

* Update README.md
2021-05-31 21:24:33 +05:30
Nicholas Vadivelu
1ab147d648
Remove redundant nn.log_softmax in run_flax_glue.py (#11920)
* Remove redundant `nn.log_softmax` in `run_flax_glue.py`

`optax.softmax_cross_entropy` expects unnormalized logits, and so it already calls `nn.log_softmax`, so I believe it is not needed here. `nn.log_softmax` is idempotent so mathematically it shouldn't have made a difference.

* Remove unused 'flax.linen' import
2021-05-31 15:29:04 +01:00
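The idempotence claim in the commit above can be checked numerically. The sketch below uses a plain-Python `log_softmax` written for this illustration (it is not the `nn.log_softmax` implementation): applying it a second time leaves the values unchanged up to floating-point error, because the exponentials of its output already sum to 1.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Subtract the maximum first for numerical stability.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


once = log_softmax([2.0, 0.5, -1.0])
twice = log_softmax(once)
max_difference = max(abs(a - b) for a, b in zip(once, twice))
```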
Avital Oliver
2df546918e
Link official Cloud TPU JAX docs (#11892) 2021-05-26 15:44:40 -04:00
Stas Bekman
1b6530104d
[Examples] create model with custom config on the fly (#11798)
* create custom model on the fly

* better wording

* add update_from_string

* cleanup

* cleanup

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more bool options

* style

* fix logger

* add test

* add the doc

* assert on conflict of options

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-25 10:40:49 -07:00
Stas Bekman
6287c929c1
[lm examples] fix overflow in perplexity calc (#11855)
* fix overflow in perplexity calc

* use inf

* fix
2021-05-25 08:11:26 -07:00
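A minimal sketch of the overflow guard that the perplexity commit above describes, assuming perplexity is computed as the exponential of the evaluation loss. The function name `perplexity` is chosen for this illustration; the "use inf" bullet is the only detail taken from the commit.

```python
import math


def perplexity(eval_loss: float) -> float:
    # math.exp raises OverflowError for very large inputs;
    # report infinity instead of crashing.
    try:
        return math.exp(eval_loss)
    except OverflowError:
        return float("inf")


small = perplexity(0.0)
huge = perplexity(1000.0)
```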
Sylvain Gugger
f086652b16
Add option to log only once in multinode training (#11819)
* Add option to log only once in multinode training

* Use an alternate property
2021-05-25 08:03:43 -04:00
Wang Ran (汪然)
b8344a274f
typo (#11858) 2021-05-25 04:23:46 -04:00
Patrick von Platen
f580604157
[Flax] Fix PyTorch import error (#11839)
* fix_torch_device_generate_test

* remove @

* change pytorch import to flax import
2021-05-24 10:41:10 +01:00
Patrick von Platen
da22245ed9
Add flax text class colab (#11824)
* fix_torch_device_generate_test

* remove @

* add flax glue link
2021-05-21 23:11:58 +01:00
Patrick von Platen
82335185fe
[Flax] Small fixes in run_flax_glue.py (#11820)
* fix_torch_device_generate_test

* remove @

* correct best seed for flax fine-tuning

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-21 16:52:23 +01:00
Patrick von Platen
bd9871657b
[Flax] Align GLUE training script with mlm training script (#11778)
* speed up flax glue

* remove unnecessary line

* remove folder

* remove run in loop

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-21 09:36:56 +01:00
Keren Fuentes
223943872e
Fix failing test on Windows Platform (#11589)
* add separator for windows

* fixes test_is_copy_consistent on Windows

* fixing writing encoding issue on extended test (for Windows)

* resolving comments
2021-05-20 19:54:23 -04:00
Patrick von Platen
00440e350f
[Flax MLM] Refactor run mlm with optax (#11745)
* refactor

* update

* update

* update

* refactor run mlm

* finalize

* refactor more

* fix typo

* update

* finish refactor

* modify run mlm

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* small fixes

* upload

* upload

* finish run mlm script

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-19 12:00:58 +01:00
Tomy Hsieh
eb3e072a3b
Fix a small error in summarization example (#11762) 2021-05-18 14:38:36 -04:00
Avital Oliver
77f9bd18af
Add Flax Examples and Cloud TPU README (#11753)
* Add Flax Examples README

* Apply suggestions from code review

* Update examples/flax/README.md

* add nice table

* fix

* fix

* apply suggestions

* upload

* finish flax readme.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-18 17:45:16 +01:00
Philipp Schmid
04e25c6286
add dataset_name to data_args and added accuracy metric (#11760)
* add `dataset_name` to data_args and added accuracy metric

* added documentation for dataset_name

* spelling correction
2021-05-18 16:27:29 +02:00
Patrick von Platen
cebb96f53a
Add more subsections to main doc (#11758)
* add headers to main doc

* Apply suggestions from code review

* update

* upload
2021-05-18 14:38:56 +01:00
Tommy Chiang
da7e73b721
Fix incorrect newline in #11650 (#11757) 2021-05-18 15:28:13 +02:00
Sylvain Gugger
936b57158a
Use new evaluation loop in TrainerQA (#11746) 2021-05-17 10:10:13 -04:00
Marc van Zee
726e953d44
Improvements to Flax finetuning script (#11727)
* Add Cloud details to README

* Flax script and readme updates

* Some simplifications of Flax script
2021-05-17 09:26:33 +01:00
Marc van Zee
94a2348706
Add Cloud details to README (#11706)
* Add Cloud details to README

* Flax script and readme updates
2021-05-14 14:51:25 +01:00
Patrick von Platen
113eaa7575
correct example script (#11726) 2021-05-14 12:02:57 +01:00
Lysandre
d77eb0cf92 Docs for v4.7.0.dev0 2021-05-12 17:08:35 +02:00
Lysandre
64e78564a5 Release: v4.6.0 2021-05-12 17:03:03 +02:00
Philip May
77f4c46b50
remove defaults to None if optional (#11703) 2021-05-12 09:11:10 -04:00
Marc van Zee
6797cdc077
Updates README and fixes bug (#11701) 2021-05-12 13:52:52 +01:00
Marc van Zee
4ce6bcc310
Adds Flax BERT finetuning example on GLUE (#11564)
* Adds Flax BERT finetuning example

* fix traced jax tensor type

* Use Optax losses and learning schedulers

* Add 1GPU training results

* merge into master & make style

* fix input

* del file

* Fix bug in loss and add torch runs

* finish bert flax fine-tune

* Update examples/flax/text-classification/README.md

* Update examples/flax/text-classification/run_flax_glue.py

* add requirements

* finalize

* finalize

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-11 19:02:59 +01:00
Sylvain Gugger
a135f59536
Auto modelcard (#11599)
* Autogenerate model cards from the Trainer

* ModelCard deprecated

* Fix test

* Style

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

* Quality

* With all metadata

* Metadata

* Post-merge conflict mess

* Data args and all examples

* Default license and languages when possible

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-11 11:30:34 -04:00
Jonathan Chang
64232bc0df
Add --text_column to run_summarization_no_trainer (#11673) 2021-05-11 07:58:38 -04:00
Matt
ef8d32c5ea
Fix suggested by @bhadreshpsavani (#11660) 2021-05-10 14:28:04 +01:00
Quentin Lhoest
1a0b41781d
Update requirements.txt (#11634) 2021-05-10 11:19:52 +05:30
Tommy Chiang
7e406f4a65
[Examples] Fix invalid links after reorg (#11650) 2021-05-10 11:16:48 +05:30
Tommy Chiang
f2ffcaf49f
[Examples] Check key exists in datasets first (#11503) 2021-05-09 15:42:38 -04:00
Stas Bekman
ba0d50f214
[examples] fix sys.path in conftest.py (#11636)
* restore conftest.py

* fix conftest and make copies

* remove unneeded parts

* remove unwanted files
2021-05-07 14:44:22 -07:00
Jonathan Chang
6f40e31766
Fix comment in run_clm_no_trainer.py (#11624) 2021-05-07 12:32:30 +05:30
Vipul Raheja
f594090a93
fix typo in command (#11605) 2021-05-06 12:32:54 +05:30
Patrick von Platen
3e3e41ae20
Pytorch - Lazy initialization of models (#11471)
* lazy_init_weights

* remove ipdb

* save int

* add necessary code

* remove unnecessary utils

* Update src/transformers/models/t5/modeling_t5.py

* clean

* add tests

* correct

* finish tests

* finish tests

* fix some more tests

* fix xlnet & transfo-xl

* fix more tests

* make sure tests are independent

* fix tests more

* finish tests

* final touches

* Update src/transformers/modeling_utils.py

* Apply suggestions from code review

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* clean tests

* give arg positive name

* add more mock weights to xlnet

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-05-05 17:22:20 +02:00
Sylvain Gugger
6b241e0e3b
Reproducible checkpoint (#11582)
* Set generator in dataloader

* Use generator in all random samplers

* Checkpoint all RNG states

* Final version

* Quality

* Test

* Address review comments

* Quality

* Remove debug util

* Add python and numpy RNGs

* Split states in different files in distributed

* Quality

* local_rank for TPUs

* Only use generator when accepted

* Add test

* Set seed to avoid flakiness

* Make test less flaky

* Quality
2021-05-04 16:20:56 -04:00
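The idea behind "Checkpoint all RNG states" in the commit above can be sketched with the standard library alone. This illustration covers only Python's built-in `random` module, and the helper names are hypothetical; a real checkpoint would also need to capture the NumPy and framework generators through their own state accessors.

```python
import pickle
import random


def save_rng_state() -> bytes:
    # Serialize the state of Python's built-in generator.
    return pickle.dumps(random.getstate())


def load_rng_state(blob: bytes) -> None:
    # Restore the generator to exactly where it was when the state was saved.
    random.setstate(pickle.loads(blob))


random.seed(0)
checkpoint = save_rng_state()
first_run = [random.random() for _ in range(3)]

load_rng_state(checkpoint)
resumed_run = [random.random() for _ in range(3)]
```

After restoring, the resumed run draws the same numbers as the original one, which is what makes resuming from a checkpoint reproducible.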
Patrick von Platen
084a187da3
[FlaxRoberta] Add FlaxRobertaModels & adapt run_mlm_flax.py (#11470)
* add flax roberta

* make style

* correct initialization

* modify model to save weights

* fix copied from

* fix copied from

* correct some more code

* add more roberta models

* Apply suggestions from code review

* merge from master

* finish

* finish docs

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-04 19:57:59 +02:00
Sylvain Gugger
87dd1a00ef
Fix metric computation in run_glue_no_trainer (#11569) 2021-05-03 11:42:55 -04:00
Bhadresh Savani
84326a28f8
[Examples] Added support for test-file in QA examples with no trainer (#11510)
* added support for test-file

* fixed typo

* added suggested changes

* reformatted code

* modified files

* fix post processing error

* Trigger CI

* removed extra lines
2021-04-30 09:02:50 -04:00
Suraj Patil
57c8e822f7
resize token embeds (#11524) 2021-04-30 08:47:01 -04:00
Matt
20d6931e32
Update TF text classification example (#11496)
Big refactor, fixes and multi-GPU/TPU support
2021-04-30 13:45:33 +01:00
Manuel Romero
58c789e3d2
Update README.md (#11489)
Add link to code
2021-04-30 04:29:59 -04:00
Sylvain Gugger
b29eb247d3
Split checkpoint from model_name_or_path in examples (#11492)
* Split checkpoint from model_name_or_path in examples

* Address review comments

* Address review comments
2021-04-29 18:33:47 -04:00
Jaimeen Ahn
0661abc545
Variable Correction for Consistency in Distillation Example (#11444)
The error comes from an inconsistency in the name of the variable that holds the number of GPUs: the parser defines 'gpus', while the train.py script actually uses 'n_gpu'. This correction makes the example work.
2021-04-26 13:30:48 -04:00
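The kind of mismatch fixed in the distillation commit above can be reproduced with a small `argparse` sketch. The names `n_gpu` and `gpus` come from the commit message; the helper functions, and the choice to keep `n_gpu` as the single name, are assumptions made for illustration only.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Define the flag under the same name that the training code reads.
    parser.add_argument("--n_gpu", type=int, default=1)
    return parser


def describe_run(args: argparse.Namespace) -> str:
    # Reading args.gpus here would raise AttributeError, because the
    # parser never defined an attribute with that name.
    return f"training on {args.n_gpu} GPU(s)"


def run(argv: List[str]) -> str:
    return describe_run(build_parser().parse_args(argv))


summary = run(["--n_gpu", "2"])
```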
Bhadresh Savani
1d30ec95c7
[Examples] Fixes inconsistency around eval vs val and predict vs test (#11380)
* added changes for uniformity

* modified files

* corrected typo

* fixed qa scripts

* fix typos

* fixed predict typo in qa no trainer

* fixed test file

* reverted trainer changes

* reverted trainer changes in custom examples

* updated readme

* added changes in deepspeed test

* added changes for predict and eval
2021-04-26 09:24:31 -07:00
Amine Abdaoui
e3e70f9551
docs(examples): fix link to TPU launcher script (#11427) 2021-04-26 09:08:43 -04:00
Patrick von Platen
32dbb2d954
make style (#11442) 2021-04-26 13:50:34 +02:00
Sylvain Gugger
1ef152eb48
Default to accuracy metric (#11405) 2021-04-23 14:49:59 -04:00
Sylvain Gugger
bf2e0cf70b
Trainer push to hub (#11328)
* Initial support for upload to hub

* push -> upload

* Fixes + examples

* Fix torchhub test

* Torchhub test I hate you

* push_model_to_hub -> push_to_hub

* Apply mixin to other pretrained models

* Remove ABC inheritance

* Add tests

* Typo

* Run tests

* Install git-lfs

* Change approach

* Add push_to_hub to all

* Staging test suite

* Typo

* Maybe like this?

* More deps

* Cache

* Adapt name

* Quality

* MOAR tests

* Put it in testing_utils

* Docs + torchhub last hope

* Styling

* Wrong method

* Typos

* Update src/transformers/file_utils.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address review comments

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-23 09:17:37 -04:00
Yoshitomo Matsubara
c3d6f33918
fixed typos (#11391) 2021-04-23 07:48:42 -04:00
Max Del
a90d3f1862
Fix typo in text (#11396) 2021-04-23 07:37:19 -04:00
Patrick von Platen
b48cf7124c
correct typo (#11393) 2021-04-23 11:34:59 +02:00
Matt
2617396094
Correctly cast num_train_epochs to int (#11379) 2021-04-22 13:49:59 +01:00
johnson7788
5b5e4ca366
[run_translation.py] fix typo (#11372)
fix typo

Co-authored-by: johnson <johnson@github.com>
2021-04-22 17:47:11 +05:30
Matt
6fe79e57d7
Move old TF text classification script to legacy (#11361)
And update README to explain the work-in-progress!
2021-04-21 17:36:18 +01:00
Matt
ac588594e2
Merge new TF example script (#11360)
First of the new and more idiomatic TF examples!
2021-04-21 17:04:55 +01:00
Sylvain Gugger
dabeb15292
Examples reorg (#11350)
* Base move

* Examples reorganization

* Update references

* Put back test data

* Move conftest

* More fixes

* Move test data to test fixtures

* Update path

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments and clean

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-21 11:11:20 -04:00
Sylvain Gugger
f1b938fda8
Update to use datasets remove_columns method (#11343)
* Update to use datasets remove_columns method

* Quality
2021-04-20 14:12:01 -04:00
rajvi-k
bfd83c17a7
Added translation example script (#11196)
* initial changes

* modified evaluation

* updated evaluation

* updated evaluation on text translation example script

* added translation example script

* Formatted translation example script

* Reformatted translation example

* Fixed evaluation bug and added support for other tokenisers

* Fixed evaluation bug and added support for other tokenisers

* Added translation example script

* Formatted summarization example script

* Removed typos from summarization example script
2021-04-20 07:18:47 -04:00
Sudharsan S T
f25444cb22
Close open files to suppress ResourceWarning (#11240)
Co-authored-by: Sudharsan Thirumalai <sudharsan.t@sprinklr.com>
2021-04-14 10:31:04 -04:00
Nithin Holla
653076ca30
Save the Wav2Vec2 processor before training starts (#10910)
Co-authored-by: nithin19 <nithin@amberscript.com>
2021-04-14 14:52:06 +03:00
Philipp Schmid
9fa2995993
added cache_dir=model_args.cache_dir to all example with cache_dir arg (#11220) 2021-04-13 18:35:18 +02:00
Takuya Makino
cb251ba619
Fix typo (#11188) 2021-04-12 17:35:32 -04:00
Masatoshi TSUCHIYA
ef102c4886
model_path should be ignored as the checkpoint path (#11157)
* model_path is referred to as the path of the trainer, and should be ignored as the checkpoint path.

* Improved according to Sgugger's comment.
2021-04-12 09:06:41 -04:00
Stas Bekman
07f0bb691d
[examples run_clm] fix _LazyModule hasher error (#11168)
* fix _LazyModule hasher error

* reword
2021-04-09 11:39:12 -07:00
Suraj Patil
c161dd56df
[examples/translation] support mBART-50 and M2M100 fine-tuning (#11170)
* keep a list of multilingual tokenizers

* add forced_bos_token argument
2021-04-09 23:58:42 +05:30
Saviour Owolabi
6060746570
Update README.md (#11161)
Corrected a typo ('Downlowd' to 'Download')
2021-04-09 11:52:21 -04:00
Stas Bekman
66446909b2
[tests] relocate core integration tests (#11146)
* relocate core integration tests

* add sys.path context manager

* cleanup

* try

* try2

* fix path

* doc

* style

* add dep

* add 2 more deps
2021-04-08 13:13:17 -07:00
Andrea Cappelli
6c40e49712
Run mlm pad to multiple for fp16 (#11128)
* Add mlm collator pad to multiple option (#10627)

* Use padding to 8x in run mlm (#10627)
2021-04-08 16:12:49 -04:00
Stas Bekman
c6d664849b
[DeepSpeed] ZeRO Stage 3 (#10753)
* synced gpus

* fix

* fix

* need to use t5-small for quality tests

* notes

* complete merge

* fix a disappearing std stream problem

* start zero3 tests

* wip

* tune params

* sorting out the pre-trained model loading

* reworking generate loop wip

* wip

* style

* fix tests

* split the tests

* refactor tests

* wip

* parameterized

* fix

* workout the resume from non-ds checkpoint pass + test

* cleanup

* remove no longer needed code

* split getter/setter functions

* complete the docs

* suggestions

* gpus and their compute capabilities link

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* style

* remove invalid paramgd

* automatically configure zero3 params that rely on hidden size

* make _get_resized_embeddings zero3-aware

* add test exercising resize_token_embeddings()

* add docstring

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-08 09:53:01 -07:00
Stas Bekman
acc851e1ff
[run_clm] clarify why we get the tokenizer warning on long input (#11145)
* clarify why we get the warning here

* Update examples/language-modeling/run_clm.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* wording

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-08 09:46:28 -07:00
Stas Bekman
424419f549
[examples] fix white space (#11099)
these get concatenated without whitespace, so fix it
2021-04-07 09:20:58 -04:00
Stas Bekman
c9035e4537
fix: The 'warn' method is deprecated (#11105)
* The 'warn' method is deprecated

* fix test
2021-04-07 09:20:06 -04:00
Sylvain Gugger
fd338abdeb Style 2021-04-06 19:54:13 -04:00
SHYAM SUNDER KUMAR
aef4cf8c52
accelerate question answering examples with no trainer (#11091)
* accelerate question answering examples with no trainer

* removed train and eval flags also fixed fill np array function

* Update examples/question-answering/run_qa_beam_search_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/question-answering/run_qa_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-06 19:35:21 -04:00
Lysandre
9853c5dd58 Development on v4.6.0dev0 2021-04-06 12:53:25 -04:00
Lysandre
4906a29f7f Release v4.5.0 2021-04-06 12:37:47 -04:00
Hemil Desai
6ab7d1a429
Add Readme for language modeling scripts with accelerate (#11073) 2021-04-05 20:56:12 -04:00
Hemil Desai
b51b87c41d
Add examples/language_modeling/run_clm_no_trainer.py (#11026)
* Initial draft for clm no trainer

* Remove unwanted args

* Fix bug

* Update examples/language-modeling/run_clm_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-05 12:27:52 -04:00
Stas Bekman
3d39226a51
s|Pretrained|PreTrained| (#11048) 2021-04-04 18:08:42 -07:00
versis
335c0ca35c
fixed typo: logging instead of logger (#11025) 2021-04-02 09:22:22 -04:00
Hemil Desai
838f83d84c
Add examples/language_modeling/run_mlm_no_trainer.py (#11001)
* Add initial script for finetuning MLM models with accelerate

* Add evaluation metric calculation

* Fix bugs

* Use no_grad on evaluation

* update script docstring

* Update examples/language-modeling/run_mlm_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* PR feedback

* Fix CI failure

* Update examples/language-modeling/run_mlm_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-31 18:49:45 -04:00
Sylvain Gugger
acc3bd9d2a
Enforce string-formatting with f-strings (#10980)
* First third

* Styling and fix mistake

* Quality

* All the rest

* Treat %s and %d

* typo

* Missing )

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-31 10:00:27 -04:00
WybeKoper
645f45c462
Fixed some typos and removed legacy url (#10989)
* Fixed typos

* Removed legacy colab notebook from readme

Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>
2021-03-31 16:53:15 +05:30
Yih-Dar
e031162a6b
fix md file to avoid evaluation crash (#10962) 2021-03-30 21:26:22 +03:00
Philipp Schmid
3e09d813aa
[examples/s2s] added py7zr dep (#10971)
* added py7zr

* comment out check_min for sagemaker test

* added min version again
2021-03-30 23:17:12 +05:30
Stas Bekman
05c966f24b
[vulnerability] dep fix (#10954)
Fixes https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/Pygments/open

@LysandreJik
2021-03-29 17:25:47 -04:00
Daniel Stancl
5057213bcc
Add examples/multiple-choice/run_swag_no_trainer.py (#10934)
* Initial commit

* Another bunch of updates

* make style quality + delete debug arg from bash script

* Use compute_metrics func

* Do a few fixes

* Add copyright

* Fix typos
2021-03-29 16:41:09 -04:00
Sylvain Gugger
4002f95eb6 Remove duplicate code 2021-03-29 15:27:12 -04:00
Daniel Stancl
d7b50ce469
Add examples/run_ner_no_trainer.py (#10902)
* Add NER example with accelerate library

* This commit contains the first (yet really unfinished)
version of a script for showing how to train HuggingFace model
with their new accelerate library.

* Fix metric calculation

* make style quality

* mv ner_no_trainer to token-classification dir

* Delete --debug flag from running script

* hf_datasets -> raw_datasets

* Make a few slight adjustments

* Add an informative comment + rewrite a help comment

* Change header

* Fix a few things

* Enforce to use fast tokenizers only

* DataCollatorWithPadding -> DataCollatorForTokenClassification

* Change bash script: python3 -> accelerate launch

* make style

* Add a few missing things (see below)

* Add a max-length padding to predictions and labels to
enable accelerate gather functionality

* Add PyTorch no trainer example to the example README.md

* Remove --do-train from args as being redundant for now

* DataCollatorWithPadding -> DataCollatorForTokenClassification

* Remove some obsolete args.do_train conditions from the script

* Delete --do_train from bash running script

* Delete use_slow_tokenizer from args

* Add unintentionally removed flag --label_all_tokens

* Delete --debug flag from running script
2021-03-29 15:11:23 -04:00
WybeKoper
ddea8771c6
Updated colab links in readme of examples (#10932)
Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>
2021-03-29 08:47:09 -04:00
Bhadresh Savani
4f21e1ddd6
fixed filename (#10939) 2021-03-28 09:48:12 -07:00
Stas Bekman
3c27d246e5
[vulnerability] fix dependency (#10914)
this PR fixes https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/PyYAML/open
2021-03-26 09:06:11 -04:00
Jethro Kuan
5f1491d3b3
run_glue_no_trainer: datasets -> raw_datasets (#10898)
Use the correct variable (raw_datasets) instead of the module (datasets)
where appropriate.
2021-03-25 08:28:17 -04:00
Bhadresh Savani
7ef40120a0
[Examples] Added predict stage and Updated Example Template (#10868)
* added predict stage

* added test keyword in exception message

* removed example specific saving predictions

* fixed f-string error

* removed extra line

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-23 10:37:59 -07:00
Eliza Szczechla
9f8fa4e973
Use DataCollatorForSeq2Seq in run_summarization in all cases (#10856)
Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>
2021-03-22 15:05:39 -04:00
Boris Dayma
125ccead71
feat(wandb): logging and configuration improvements (#10826)
* feat: ensure unique artifact id

* feat: allow manual init

* fix: simplify reinit logic

* fix: no dropped value + immediate commits

* fix: wandb use in sagemaker

* docs: improve documentation and formatting

* fix: typos

* docs: improve formatting
2021-03-22 10:45:17 -04:00
Stas Bekman
8fb4671811
[vulnerability] in example deps fix (#10817)
Takes care of:
https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/jinja2/open

@LysandreJik

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-22 09:05:24 -04:00
dependabot[bot]
dbfe379514
Bump jinja2 from 2.11.2 to 2.11.3 in /examples/research_projects/lxmert (#10818)
Bumps [jinja2](https://github.com/pallets/jinja) from 2.11.2 to 2.11.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/master/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/2.11.2...2.11.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-03-22 08:54:50 -04:00
Qiushi Pan
29904a967b
Update FINE_TUNE_XLSR_WAV2VEC2.md (#10849)
Fix typo.
2021-03-22 07:58:59 -04:00
Patrick von Platen
0f226f78ce
push (#10846) 2021-03-22 10:32:21 +03:00
Suraj Patil
82b8d8c7b0
Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-21 22:47:09 +05:30
Patrick von Platen
af6125ffdb
Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-21 12:31:33 +03:00
Patrick von Platen
5aaf6e1460
small improvements for wav2vec2 info script (#10829) 2021-03-21 11:41:44 +03:00
Suraj Patil
68b55885ed
add doc for Local machine (#10828) 2021-03-21 13:25:34 +05:30
Julien Chaumond
1438c487df
wav2vec doc tweaks (#10808)
* wording/typos tweaks

* Make model upload instructions simpler
2021-03-19 12:48:54 -04:00
Patrick von Platen
b9570a813c
Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-19 19:45:28 +03:00
Sylvain Gugger
946400fb68
Expand a bit the presentation of examples (#10799)
* Expand a bit the presentation of examples

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-19 10:06:08 -04:00
Bhadresh Savani
fd1d9f1ab8
[Example] Updating Question Answering examples for Predict Stage (#10792)
* added prediction stage and eval fix

* style correction

* removed extra lines
2021-03-19 09:42:17 -04:00
Patrick von Platen
e8968bd03a
[XLSR-Wav2Vec2 Info doc] Add a couple of lines (#10806)
* finish

* fix

* fix

* fix

* fix
2021-03-19 12:52:54 +03:00
Stas Bekman
427ea3fecb
addressing vulnerability report in research project deps (#10802)
Following up on a security alert:
https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/Pillow/open
2021-03-18 22:02:10 -04:00
Patrick von Platen
2ae678229f
Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-19 00:29:20 +03:00
Patrick von Platen
68a3215949
Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-19 00:27:40 +03:00
Patrick von Platen
03df3fbcb4
Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-19 00:26:49 +03:00
Patrick von Platen
e84adbed40
Add XLSR-Wav2Vec2 Fine-Tuning README.md (#10786)
* upload

* upload fine-tuning script

* improve

* adapt

* Apply suggestions from code review

* correct

* upload

* finalize

* remove @

* correct typos
2021-03-19 00:22:43 +03:00
Stas Bekman
9352b5151a
[examples/seq2seq/README.md] fix t5 examples (#10734)
* [examples/seq2seq] fix t5 examples

This PR:
* fixes T5 examples to include `--source_prefix` - it's **not** optional. If you give it a try you will see that you get 10x worse bleu scores w/o it. w/ `27.6849`, w/o `2.374`
* added a normal translation example w/o the peculiarities of MBart and T5
* reduces the default max samples to 50 so it's much faster to test quickly

summarization seems to be broken for t5 score-wise: https://github.com/huggingface/transformers/issues/10733

@sgugger

* specify explicitly the t5 models requiring the special handling

* one more

* update the t5 summarization example to use cnn_dailymail

* move max*samples into the top level README.md

* better wording

* better wording
2021-03-18 09:55:39 -07:00
Julien Chaumond
4f3e93cfaf
[file_utils] do not gobble certain kinds of requests.ConnectionError (#10235)
* do not gobble certain kinds of requests.ConnectionError

* Apply review comments

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-18 12:37:45 -04:00
Suraj Patil
5f19c07a70
add run_common_voice script (#10767)
* add initial script

* finish script

* add shell script example

* accept chars_to_ignore as cl arg

* align the script with other example scripts

* add torchaudio dep
2021-03-18 17:21:16 +05:30
Mohamed El-Geish
af8afdc88d
wav2vec2: support datasets other than LibriSpeech (#10581)
* wav2vec2: support datasets other than LibriSpeech

* Formatting run_asr.py to pass code quality test

* bundled orthography options and added verbose logs

* fixing a typo in timit fine-tuning script

* update comment for clarity

* resize_lm_head and load custom vocab from file

* adding a max_duration_in_seconds filter

* do not assign `duration_filter` lambda, use a def

* log untransliterated text as well

* fix base model for arabic

* fix duration filter when target_sr is not set

* drop duration_in_seconds when unneeded

* script for wav2vec2-large-lv60-timit-asr

* fix for "tha" in arabic corpus (huggingface#10581)

* adding more options to work with common_voice

* PR feedback (huggingface#10581)

* small README change
2021-03-18 10:20:26 +03:00
Stas Bekman
393739194e
[examples] document resuming (#10776)
* document resuming in examples

* fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* put trainer code last, adjust notes

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-17 12:48:35 -07:00
Stas Bekman
cd8c93f701
[DeepSpeed] improve checkpoint loading code plus tests (#10760)
* deepspeed checkpoint loading code plus tests

* style

* style
2021-03-17 10:22:58 -07:00
Cheng Li
c83fbc5f2d
[Deepspeed] Allow HF optimizer and scheduler to be passed to deepspeed (#10464)
* pass hf optimizer and scheduler to deepspeed if not specified in ds config

* pass hf optimizer and scheduler to deepspeed if not specified in ds config

* update

* make init_deepspeed support config dict

* fix docstring formatting

* clean up trainer's comments

* add new tests

* fix type

* composite argparse doesn't work

* style

* add a new test, rename others

* document new functionality

* complete tests, add docs

* style

* correct level

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add new methods to the doc

* must tell DS we are using a non-native optimizer

* add protection against cpu_offload + HF optimizer combo

* fix the cli overrides

* sync docs + tests

* restore AdamW

* better docs

* need new version

* no longer needed

* remove outdated information

* refactor duplicated code

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-16 15:51:09 -07:00
Lysandre
1b5ce1e63b Development on v4.5.0dev0 2021-03-16 11:41:15 -04:00
Lysandre
c988db5af2 Release v4.4.0 2021-03-16 11:33:35 -04:00
Russell Klopfer
87d685b8a9
independent training / eval with local files (#10710)
* independent training / eval with local files

* remove redundant assert
2021-03-15 19:35:26 -04:00
Sylvain Gugger
4c379daf64
Add minimum version check in examples (#10724)
* Add minimum version check in examples

* Style

* No need for new line maybe?

* Add helpful comment
2021-03-15 19:29:54 -04:00
Joe Davison
966ba081c9
zero-shot pipeline multi_class -> multi_label (#10727) 2021-03-15 16:02:46 -06:00
Théo Matussière
6f840990a7
split seq2seq script into summarization & translation (#10611)
* split seq2seq script, update docs

* needless diff

* fix readme

* remove test diff

* s/summarization/translation

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* cr

* fix arguments & better mbart/t5 refs

* copyright

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* reword readme

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* s/summarization/translation

* short script names

* fix tests

* fix isort, include mbart doc

* delete old script, update tests

* automate source prefix

* automate source prefix for translation

* s/translation/trans

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* fix script name (short version)

* typos

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* exact parameter

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* remove superfluous source_prefix calls in docs

* rename scripts & warn for source prefix

* black

* flake8

Co-authored-by: theo <theo@matussie.re>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-15 09:11:42 -04:00
Stas Bekman
4c32f9f26e
AdamW is now supported by default (#9624) 2021-03-12 13:40:07 -08:00
Lysandre Debut
9fbb4cdc80
Specify minimum version for sacrebleu (#10662) 2021-03-11 13:45:06 -05:00
ArvidYin
27d9e05ce2
Update README.md (#10647)
correct spell error: 'nether'
2021-03-11 08:58:04 -05:00
Sylvain Gugger
efb5c0a453
Add new GLUE example with no Trainer. (#10555)
* Add new GLUE example with no Trainer.

* Style

* Address review comments
2021-03-10 09:29:19 -05:00
Allen Wang
6f52fce673
Fixes an issue in text-classification where MNLI eval/test datasets are not being preprocessed. (#10621)
* Fix MNLI tests

* Linter fix
2021-03-09 22:13:45 -05:00
Sylvain Gugger
0d909f6bd8
Fairscale FSDP fix model save (#10596)
* Hotfix fairscale FSDP

* Evaluation works

* Save on process zero
2021-03-09 14:42:07 -05:00
Stas Bekman
f284089ec4
[examples tests on multigpu] resolving require_torch_non_multi_gpu_but_fix_me (#10561)
* batch 1

* this is tpu

* deebert attempt

* the rest
2021-03-08 11:11:40 -08:00
Bhadresh Savani
dfd16af832
Added max_sample_ arguments (#10551)
* reverted changes of logging and saving metrics

* added max_sample arguments

* fixed code

* white space diff

* reformatting code

* reformatted code
2021-03-08 13:57:10 -05:00
Stas Bekman
917f104502
[examples tests] various fixes (#10584)
* fix sharded ddp enum

* test fixes

* stronger validation + apex breaks other tests
2021-03-08 10:28:44 -08:00
Stas Bekman
e6ce636e02
fix nltk lookup (#10585) 2021-03-07 22:09:58 -08:00
Stas Bekman
88a951e3cc
offline mode for firewalled envs (#10407)
* offline mode start

* add specific values

* fix fallback

* add test

* better values check and range

* test that actually works

* document the offline mode

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more strict check

* cleaner test

* pt-only test

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-05 17:27:48 -08:00
Patrick von Platen
395ffcd757
fix run seq2seq (#10547) 2021-03-05 18:17:12 +03:00
Sylvain Gugger
a5bd40b75c
Not always consider a local model a checkpoint in run_glue (#10517) 2021-03-04 11:11:39 -05:00
Sylvain Gugger
745ea78dcc Revert "Not always consider a local model a checkpoint in run_glue"
This reverts commit f3660613bc.
2021-03-04 09:45:18 -05:00
Sylvain Gugger
f3660613bc Not always consider a local model a checkpoint in run_glue 2021-03-04 09:44:02 -05:00
Patrick von Platen
0234de8418
Add Fine-Tuning for Wav2Vec2 (#10145)
* add encode labels function to tokenizer

* start adding finetuning

* init dropout

* upload

* correct convert script

* apply changes

* fix second typo

* make first dummy training run

* adapt convert script

* push config for comparison

* remove conf

* finish training

* adapt data collator

* add research folder

* update according to fairseq feedback

* some minor corrections

* refactor masking indices a bit

* some minor changes

* clean tokenizer

* finish clean-up

* remove previous logic

* update run script

* correct training

* finish changes

* finish model

* correct bug

* fix training a bit more

* add some tests

* finish gradient checkpointing

* finish example

* correct gradient checkpointing

* improve tokenization method

* revert changes in tokenizer

* revert general change

* adapt fine-tuning

* update

* save intermediate test

* Update README.md

* finish finetuning

* delete conversion script

* Update src/transformers/models/wav2vec2/configuration_wav2vec2.py

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* finish wav2vec2 script

* finish wav2vec2 fine-tuning

* finalize test

* correct test

* adapt tests

* finish

* remove test file

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-01 12:13:17 +03:00
Bhadresh Savani
aca6288ff4
updated logging and saving metrics (#10436)
* updated logging and saving metrics

* space removal
2021-02-27 09:53:44 -08:00
Stas Bekman
f52a15897b
[run_seq2seq.py] restore functionality: saving to test_generations.txt (#10428)
This PR restores the original functionality that for some reason was modified.

Fixes: https://github.com/huggingface/transformers/issues/10381

@sgugger
2021-02-27 08:21:50 -08:00
Stas Bekman
ee04b69822
[examples] better model example (#10427)
* refactors

* typo
2021-02-26 17:01:01 -08:00
Sylvain Gugger
17b6e0d474
Fix run_glue evaluation when model has a label correspondence (#10401) 2021-02-25 15:30:38 -05:00
Sylvain Gugger
9d14be5c20
Add support for ZeRO-2/3 and ZeRO-offload in fairscale (#10354)
* Add support for ZeRO-2/3 and ZeRO-offload in fairscale

* Quality

* Rework from review comments

* Add doc

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-02-25 11:07:53 -05:00
Patrick von Platen
cb38ffcc5e
[PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer (#10324)
* push to show

* small improvement

* small improvement

* Update src/transformers/feature_extraction_utils.py

* Update src/transformers/feature_extraction_utils.py

* implement base

* add common tests

* make all tests pass for wav2vec2

* make padding work & add more tests

* finalize feature extractor utils

* add call method to feature extraction

* finalize feature processor

* finish tokenizer

* finish general processor design

* finish tests

* typo

* remove bogus file

* finish docstring

* add docs

* finish docs

* small fix

* correct docs

* save intermediate

* load changes

* apply changes

* apply changes to doc

* change tests

* apply surajs recommend

* final changes

* Apply suggestions from code review

* fix typo

* fix import

* correct docstring
2021-02-25 17:42:46 +03:00
Stas Bekman
3437d12134
[Trainer/Deepspeed] handle get_last_lr() before first step() (#10362)
* handle get_last_lr() before first step()

* abstract away the lr getting logic

* cleanup

* add test

* move to utils
2021-02-23 17:42:25 -08:00
Akmal
23e87c27be
Fix broken examples/seq2seq/README.md markdown (#10344) 2021-02-23 10:49:25 -05:00
Stas Bekman
622a8c5995
[trainer] add Trainer methods for metrics logging and saving (#10266)
* make logging and saving trainer built-in

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-22 13:02:53 -08:00
Stas Bekman
eab0afc19c
[Trainer] implement gradient_accumulation_steps support in DeepSpeed integration (#10310)
* implement gradient_accumulation_steps support in DeepSpeed integration

* typo

* cleanup

* cleanup
2021-02-22 11:15:59 -08:00
Stas Bekman
f991daed18
defensive programming + expand/correct README (#10295) 2021-02-22 10:58:50 -08:00
Julien Plu
536aee99bb
Move the TF NER example (#10276) 2021-02-19 16:06:13 -05:00
Joe Davison
cbadb5243c
Zero shot distillation script cuda patch (#10284) 2021-02-19 14:06:57 -05:00
Joe Davison
c6fe17557e
Script for distilling zero-shot classifier to more efficient student (#10244)
* add zero-shot distillation script

* readme wordsmithing

* clean up code

* add multi-gpu teacher inference
plus tidying up more code

* add use_fast_tokenizer arg

* update results in readme

* more readme wordsmithing

* style

* Add handle to readme

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix code block

* add error+docs about distributed & tpu

* add @sgugger format requests

* xla -> tpu

* support fp16 for teacher preds

* no checkpoint by default

* add demo colab link

* add model sharing prompt + model link

* correct resulting acc of example

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-02-18 17:08:45 -05:00
Stas Bekman
97e688bc22
[Trainer] memory tracker metrics (#10225)
* memory tracker metrics

* go back to eval for somewhat consistency

* handle no-gpu case

* deal with stackable eval calls

* restore callback order

* style

* simplify the API

* add test

* docs

* consistently use eval_ prefix

* improve docs

* Update src/transformers/trainer_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename method

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 09:27:32 -08:00
Stas Bekman
d1eb88f42d
[CI] 2 fixes (#10248)
* fix invalid port

* missing requirements
2021-02-17 14:12:39 -08:00
Zhang Cheng
df1b0fb54d
set tgt_lang of MBart Tokenizer for summarization (#10205) 2021-02-16 09:39:37 -05:00
Suraj Patil
1c8c2d9ab3
[WIP][examples/seq2seq] move old s2s scripts to legacy (#10136)
* move old s2s scripts to legacy

* add the tests back

* proper rename

* restore

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-15 10:48:02 -08:00
Stas Bekman
0b1f552a24
fix run_seq2seq.py; porting trainer tests to it (#10162)
* fix run_seq2seq.py; porting DeepSpeed tests to it

* unrefactor

* defensive programming

* defensive programming 2

* port the rest of the trainer tests

* style

* a cleaner scripts dir finder

* cleanup
2021-02-15 09:12:17 -08:00
Suraj Patil
f51188cbe7
[examples/run_s2s] remove task_specific_params and update rouge computation (#10133)
* fix rouge metrics and task specific params

* fix typo

* round metrics

* typo

* remove task_specific_params
2021-02-12 17:18:21 +05:30
Stas Bekman
b54cb0bd82
[DeepSpeed in notebooks] Jupyter + Colab (#10130)
* init devices/setup explicitly

* docs + test

* simplify

* cleanup

* cleanup

* cleanup

* correct the required dist setup

* derive local_rank from env LOCAL_RANK
2021-02-11 14:02:05 -08:00
Qbiwan
8dcfaea08d
Update run_xnli.py to use Datasets library (#9829)
* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed,metric.compute,load_metric

* fix

* fix

* fix

* push

* fix

* everything works

* fix init

* fix

* special treatment for sepconv1d

* style

* 🙏🏽

* add doc and cleanup


* fix doc

* fix doc again

* fix doc again

* Apply suggestions from code review

* make style

* Proposal that should work

* Remove needless code

* Fix test

* Apply suggestions from code review

* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed,metric.compute,load_metric

* amend README

* removed data_args.task_name and replaced with task_name = "xnli"; use split function to load train and validation dataset separately; remove __post_init__; remove flag --task_name from README.

* removed dict task_to_keys, use str "xnli" instead of variable task_name, change preprocess_function to use examples["premise"], examples["hypothesis"] directly, remove sentence1_key and sentence2_key, change compute_metrics function to cater only to accuracy metric, add condition for train_language is None when using dataset.load_dataset()

* removed `torch.distributed.barrier()` and `import torch` as `from_pretrained` is able to do the work; amend README
2021-02-11 10:27:23 +05:30
Stas Bekman
77b862847b
[DeepSpeed] restore memory for evaluation (#10114)
* free up memory at the end of train

* rework tests

* consistent formatting

* correction
2021-02-10 09:09:48 -08:00
Lysandre Debut
0d8e554d42
Line endings should be LF across repo and not CRLF (#10119) 2021-02-10 10:50:00 -05:00
Boris Dayma
7c7962ba89
doc: update W&B related doc (#10086)
* doc: update W&B related doc

* doc(wandb): mention report_to

* doc(wandb): commit suggestion

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* doc(wandb): fix typo

* doc(wandb): remove WANDB_DISABLED

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-09 14:47:52 -05:00
Suraj Patil
63fddcf69c
[examples/s2s] add test set predictions (#10085)
* add do_predict, pass eval_beams during eval

* update help

* apply suggestions from code review
2021-02-09 20:41:41 +05:30
Stas Bekman
781220acab
transition to new tests dir (#10080) 2021-02-08 12:41:52 -08:00
Stas Bekman
322037e842
[trainer] deepspeed bug fixes and tests (#10039)
* deepspeed bug fixes and tests

* manual wrap?
2021-02-08 09:44:02 -08:00
Olivier
ece6c51458
[s2s examples] Replace -100 token ids with the tokenizer pad_id for compute_metrics (#10046)
* replace -100 token ids with the tokenizer pad_id for compute_metrics

* fixed typo for label_ids
2021-02-08 10:08:16 -05:00
Sylvain Gugger
b01483faa0
Truncate max length if needed in all examples (#10034) 2021-02-08 05:03:55 -05:00
Stas Bekman
24db8cc329
Can't mix --fp16 and --device cpu (#10041) 2021-02-07 17:54:20 -08:00
Stas Bekman
769948fad2
json to jsonlines, and doc, and typo (#10043) 2021-02-07 17:51:34 -08:00
Stas Bekman
8ea412a86f
[examples] make run scripts executable (#10037)
* make executable

* make executable

* same for the template

* cleanup
2021-02-05 15:51:18 -08:00
Suraj Patil
1cd16512dc
[examples/seq2seq] support label smoothing (#9844)
* add prepare_decoder_input_ids_from_labels in s2s models

* support lbl smoothing and enc/emb freezing

* fix freezing

* use pad_token_id from config

* remove embed freezing and add warning

* prepare decoder_input_ids inside DataCollatorForSeq2Seq
2021-02-05 23:21:57 +05:30
Suraj Patil
bca0dd5ee3
[run_clm.py] fix getting extension 2021-02-03 20:14:42 +05:30
Stas Bekman
d55e10beab
[research proj] [lxmert] rm bleach dependency (#9970)
Looks like a vulnerability and it's not really used anywhere in the code, so just as well remove it completely from deps.
https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/bleach/open
2021-02-03 05:24:40 -05:00
Patrick von Platen
538b3b4607
[Tokenizer Utils Base] Make pad function more flexible (#9928)
* change tokenizer requirement

* split line

* Correct typo from list to str

* improve style

* make other function pretty as well

* add comment

* correct typo

* add new test

* pass tests for tok without padding token

* Apply suggestions from code review
2021-02-02 10:35:27 +03:00
Sylvain Gugger
115d97dd2f
Remove subclass for sortish sampler (#9907)
* Remove subclass for sortish sampler

* Use old Seq2SeqTrainer in script

* Styling
2021-02-01 08:06:32 -05:00
wlhgtc
1682804ebd
Fit chinese wwm to new datasets (#9887)
* MOD: fit chinese wwm to new datasets

* MOD: move wwm to new folder

* MOD: format code

* Styling

* MOD add param and recover trainer

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2021-02-01 03:37:59 -05:00
Stas Bekman
6bab83683b
fix logger format for non-main process (#9911) 2021-02-01 03:08:12 -05:00
Stas Bekman
6bf94bc0b6
correctly handle mt5 (#9879) 2021-01-29 08:11:22 -08:00
Sylvain Gugger
b4e559cfa1
Deprecate model_path in Trainer.train (#9854) 2021-01-28 08:32:46 -05:00
Sylvain Gugger
f2fabedbab
Setup logging with a stdout handler (#9816) 2021-01-27 03:39:11 -05:00
Yusuke Mori
059bb25817
Fix a bug in run_glue.py (#9812) (#9815) 2021-01-26 14:32:19 -05:00
Magdalena Biesialska
8f6c12d306
Fix fine-tuning translation scripts (#9809) 2021-01-26 11:30:31 -05:00
Andrea Cappelli
10e5f28212
Improve pytorch examples for fp16 (#9796)
* Pad to 8x for fp16 multiple choice example (#9752)

* Pad to 8x for fp16 squad trainer example (#9752)

* Pad to 8x for fp16 ner example (#9752)

* Pad to 8x for fp16 swag example (#9752)

* Pad to 8x for fp16 qa beam search example (#9752)

* Pad to 8x for fp16 qa example (#9752)

* Pad to 8x for fp16 seq2seq example (#9752)

* Pad to 8x for fp16 glue example (#9752)

* Pad to 8x for fp16 new ner example (#9752)

* update script template #9752

* Update examples/multiple-choice/run_swag.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/question-answering/run_qa.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/question-answering/run_qa_beam_search.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* improve code quality #9752

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-26 04:47:07 -05:00
Sylvain Gugger
caf4abf768
Auto-resume training from checkpoint (#9776)
* Auto-resume training from checkpoint

* Update examples/text-classification/run_glue.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Roll out to other examples

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-25 12:03:51 -05:00
Wilfried L. Bounsi
9152f16023
Fix broken [Open in Colab] links (#9761) 2021-01-23 15:11:46 +05:30
Sylvain Gugger
411c582109
Fixes to run_seq2seq and instructions (#9734)
* Fixes to run_seq2seq and instructions

* Add more defaults for summarization
2021-01-22 10:03:57 -05:00
Stefan Schweter
08b22722c7
examples: fix XNLI url (#9741) 2021-01-22 18:13:52 +05:30
Sylvain Gugger
5f80c15ef5
Fix memory regression in Seq2Seq example (#9713)
* Fix memory regression in Seq2Seq example

* Fix test and properly deal with -100

* Easier condition with device safety

* Patch for MBartTokenizerFast
2021-01-21 12:05:46 -05:00
Sylvain Gugger
582f516adb
Use datasets squad_v2 metric in run_qa (#9677) 2021-01-20 04:52:13 -05:00
Sylvain Gugger
a1ad16a446
Restrain tokenizer.model_max_length default (#9681)
* Restrain tokenizer.model_max_length default

* Fix indent
2021-01-20 04:17:39 -05:00
Sylvain Gugger
e4c06ed664
New run_seq2seq script (#9605)
* New run_seq2seq script

* Add tests

* Mark as slow

* Update examples/seq2seq/run_seq2seq.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/data/data_collator.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/data/data_collator.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Address review comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-01-19 15:22:17 -05:00
Sylvain Gugger
97b787fb4e
Fix old Seq2SeqTrainer (#9675) 2021-01-19 09:56:25 -05:00
Stas Bekman
c60e0e1ee4
deepspeed + grad accum (#9622) 2021-01-15 10:12:26 -08:00
Sylvain Gugger
329fe2746a
Upstream (and rename) sortish sampler (#9574)
* Upstream (and rename) sortish sampler

* Use proper sampler

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-14 10:38:14 -05:00
Sylvain Gugger
46ed56cfd1
Switch metrics in run_ner to datasets (#9567)
* Switch metrics in run_ner to datasets

* Add flag to return all metrics

* Upstream (and rename) sortish_sampler

* Revert "Upstream (and rename) sortish_sampler"

This reverts commit e07d0dcf65.
2021-01-14 03:37:07 -05:00
Yusuke Mori
eabad8fd9c
Update run_glue for do_predict with local test data (#9442) (#9486)
* Update run_glue for do_predict with local test data (#9442)

* Update run_glue (#9442): fix comments ('files' to 'a file')

* Update run_glue (#9442): reflect the code review

* Update run_glue (#9442): auto format

* Update run_glue (#9442): reflect the code review
2021-01-13 07:48:35 -05:00
Pavel Tarashkevich
27d0e01d75
Fix classification script: enable dynamic padding with truncation (#9554)
Co-authored-by: Pavel Tarashkevich <Pavel.Tarashkievich@orange.com>
2021-01-13 07:46:48 -05:00
Stas Bekman
2df34f4aba
[trainer] deepspeed integration (#9211)
* deepspeed integration

* style

* add test

* ds wants to do its own backward

* fp16 assert

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

* for clarity extract what args are being passed to deepspeed

* introduce the concept of self.wrapped_model

* s/self.wrapped_model/self.model_wrapped/

* complete transition to self.wrapped_model / self.model

* fix

* doc

* give ds its own init

* add custom overrides, handle bs correctly

* fix test

* clean up model_init logic, fix small bug

* complete fix

* collapse --deepspeed_config into --deepspeed

* style

* start adding doc notes

* style

* implement hf2ds optimizer and scheduler configuration remapping

* oops

* call get_num_training_steps absolutely when needed

* workaround broken auto-formatter

* deepspeed_config arg is no longer needed - fixed in deepspeed master

* use hf's fp16 args in config

* clean

* start on the docs

* rebase cleanup

* finish up --fp16

* clarify the supported stages

* big refactor thanks to discovering deepspeed.init_distributed

* cleanup

* revert fp16 part

* add checkpoint-support

* more init ds into integrations

* extend docs

* cleanup

* unfix docs

* clean up old code

* imports

* move docs

* fix logic

* make it clear which file it's referring to

* document nodes/gpus

* style

* wrong format

* style

* deepspeed handles gradient clipping

* easier to read

* major doc rewrite

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* docs

* switch to AdamW optimizer

* style

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* clarify doc

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-12 19:05:18 -08:00
Sylvain Gugger
3ec40299c1
Remove nested lxmert (#9440) 2021-01-07 04:10:41 -05:00
Sylvain Gugger
453a70d4cb
Allow example to use a revision and work with private models (#9407)
* Allow example to use a revision and work with private models

* Copy to other examples and template

* Styling
2021-01-06 06:49:23 -05:00
Patrick von Platen
eef66035a2
[PyTorch Bart] Split Bart into different models (#9343)
* first try

* remove old template

* finish bart

* finish mbart

* delete unnecessary line

* init pegasus

* save intermediate

* correct pegasus

* finish pegasus

* remove cookie cutter leftover

* add marian

* finish blenderbot

* replace in file

* correctly split blenderbot

* delete "old" folder

* correct "add statement"

* adapt config for tf comp

* correct configs for tf

* remove ipdb

* fix more stuff

* fix mbart

* push pegasus fix

* fix mbart

* more fixes

* fix research projects code

* finish docs for bart, mbart, and marian

* delete unnecessary file

* correct attn typo

* correct configs

* remove pegasus for seq class

* correct peg docs

* correct peg docs

* finish configs

* further improve docs

* add copied from statements to mbart

* fix copied from in mbart

* add copy statements to marian

* add copied from to marian

* add pegasus copied from

* finish pegasus

* finish copied from

* Apply suggestions from code review

* make style

* backward comp blenderbot

* apply lysandres and sylvains suggestions

* apply suggestions

* push last fixes

* fix docs

* fix tok tests

* fix imports code style

* fix doc
2021-01-05 22:00:05 +01:00
Yusuke Mori
57a6626929
[examples/text-classification] Fix a bug for using one's own dataset of a regression task (#9411) 2021-01-05 08:15:06 -05:00
dependabot[bot]
5dd389d1c7
Bump notebook from 6.1.4 to 6.1.5 in /examples/research_projects/lxmert (#9402)
Bumps [notebook](https://github.com/jupyter/jupyterhub) from 6.1.4 to 6.1.5.
- [Release notes](https://github.com/jupyter/jupyterhub/releases)
- [Changelog](https://github.com/jupyterhub/jupyterhub/blob/master/CHECKLIST-Release.md)
- [Commits](https://github.com/jupyter/jupyterhub/commits)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-04 10:02:07 -05:00
Sylvain Gugger
23a71449c0
Put back LXMert example (#9401) 2021-01-04 09:59:07 -05:00
Sam Shleifer
8eb7f26d5d
simplify marian distillation script (#9394) 2021-01-04 11:21:24 +05:30
Yoshitomo Matsubara
d944966b19
Fix typos in README and bugs in RAG example code for end-to-end evaluation and finetuning (#9355)
* fix a bug in eval_batch_retrieval

* should return parser as well as other staticmethod

* remove duplicate argument

* these kwargs are no longer accepted (cause TypeError in self.generator.generate of modeling_rag.py)

* fixed file paths in README

* moved an arg to add_ray_specific_args
2021-01-03 16:00:30 +01:00
Sylvain Gugger
a1cb6e9866
Adapt to new name of label_smoothing_factor training arg (#9282) 2020-12-23 11:05:21 -05:00
Sylvain Gugger
e6c1f1cad8
Revert renaming in finetune_trainer (#9262) 2020-12-22 15:42:34 -05:00
Sylvain Gugger
ab17758874
Add speed metrics to all example scripts + template (#9260) 2020-12-22 14:02:26 -05:00
Manuel Romero
37d6fb5d04
Fix link to bertabs/README.md (#9255) 2020-12-22 11:41:23 -05:00
Manuel Romero
189c1b91a6
Fix link to old language modeling script (#9254) 2020-12-22 11:40:47 -05:00
Sylvain Gugger
490b39e614
Seq2seq trainer (#9241)
* Add label smoothing in Trainer

* Add options for scheduler and Adafactor in Trainer

* Put Seq2SeqTrainer in the main lib

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments and adapt scripts

* Documentation

* Move test not using script to tests folder

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-22 11:33:44 -05:00
Sylvain Gugger
ec07da65e2
Update the README of the text classification example (#9237)
* Update the README of the text classification example

* Update examples/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Adapt comment from review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-21 15:23:40 -05:00
Teven
4eef5889ac
Adding performer fine-tuning research example (#9239)
* added run_mlm_performer.py research example

* make style

* make style

* Added a README !
2020-12-21 21:19:41 +01:00
Amog Kamsetty
a4b21cdd20
[RAG] Add Ray implementation for distributed retrieval (#9197)
* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* uncomment

* uncomment

* wip

* updates

* add docstring

* updates

* fix arg

* fixes

* add unit tests

* update readme

* update readme

* update finetune script

* update test

* add test

* add ray to test dependencies

* separate ray and ray tune

* formatting

* shutdown ray at end of test

* fix tests

* formatting

* formatting

* even more formatting

* address comments

* formatting

* add files

* Update examples/research_projects/rag/test_distributed_retriever.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* address comments

* addressing comments

Co-authored-by: Ubuntu <ubuntu@ip-172-31-21-208.us-west-2.compute.internal>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-12-21 10:39:30 +01:00
Stas Bekman
f38c4ad302
better logging and help (#9203) 2020-12-20 10:28:28 -08:00
Stas Bekman
6b850b671d
[run_glue] add speed metrics (#9198)
* add speed metrics

* suggestions
2020-12-18 17:09:30 -08:00
Aleksey Tikhonov
291974c65c
GPT-model attention heads pruning example (#9189)
* Pruning for GPT attn heads

* The code formatted according to the transformers requirements

* Update run_prune_gpt.py

* Update run_prune_gpt.py
2020-12-18 16:32:10 -05:00
Sylvain Gugger
1198ba8fba
Add timing inside Trainer (#9196)
* Add timing inside Trainer

* Fix tests

* Add n_objs for train

* Sort logs
2020-12-18 15:10:39 -05:00
Sylvain Gugger
9a25c5bd3a
Add new run_swag example (#9175)
* Add new run_swag example

* Add check

* Add sample

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Very important change to make Lysandre happy

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-12-18 14:19:24 -05:00
Manuel Romero
077a5dce32
Fix link to old SQUAD fine-tuning script (#9181) 2020-12-18 09:12:10 -05:00
Wissam Antoun
fd7b6a5274
fixed JSON error in run_qa with fp16 (#9186) 2020-12-18 07:53:23 -05:00
Manuel Romero
66a14a2f6f
Fix link to old NER fine-tuning script (#9182) 2020-12-17 19:50:01 -05:00
Stas Bekman
f06d0fadc9
[trainer] apex fixes and tests (#9180) 2020-12-17 16:49:11 -08:00
Stas Bekman
63841c559b
add tests for the new sharded ddp fairscale integration (#9177) 2020-12-17 14:24:03 -08:00
Sylvain Gugger
9a67185344
Experimental support for fairscale ShardedDDP (#9139)
* Experimental support for fairscale ShardedDDP

* Add import error if fairscale not available

* Address review comments

* Fix seq2seq trainer
2020-12-16 13:47:48 -05:00
Sylvain Gugger
4d48973523
Update notebook table and transformers intro notebook (#9136) 2020-12-16 10:24:31 -05:00
Patrick von Platen
640e6fe190
[Flax] Align FlaxBertForMaskedLM with BertForMaskedLM, implement from_pretrained, init (#9054)
* save intermediate

* save intermediate

* save intermediate

* correct flax bert model file

* new module / model naming

* make style

* almost finish BERT

* finish roberta

* make fix-copies

* delete keys file

* last refactor

* fixes in run_mlm_flax.py

* remove pooled from run_mlm_flax.py

* fix gelu | gelu_new

* remove Module from inits

* splits

* dirty print

* preventing warmup_steps == 0

* smaller splits

* make fix-copies

* dirty print

* dirty print

* initial_evaluation argument

* declaration order fix

* proper model initialization/loading

* proper initialization

* run_mlm_flax improvements: improper model inputs bugfix + automatic dataset splitting + tokenizers parallelism warning + avoiding warmup_steps=0 bug

* removed tokenizers warning hack, fixed model re-initialization

* reverted training_args.py changes

* fix flax from pretrained

* improve test in flax

* apply sylvains tips

* update init

* make 0.3.0 compatible

* revert tevens changes

* revert tevens changes 2

* finalize revert

* fix bug

* add docs

* add pretrained to init

* Update src/transformers/modeling_flax_utils.py

* fix copies

* final improvements

Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
2020-12-16 13:03:32 +01:00
Teven
2a7e8e1608
[Examples] Add automatic dataset splitting in language-modeling examples (#9133)
* replaced jnp.split + removing textual model inputs + ensuring warmup_steps > 0

* Add automatic dataset splitting in language-modeling examples
2020-12-15 16:02:43 -05:00
Stas Bekman
14c79c3e31
native amp leak fix landed in 1.7.1 (#9115)
update README with good news that the leak fix has been applied to pytorch-1.7.1.
2020-12-15 09:10:41 -05:00
Yoshitomo Matsubara
44c340f45f
fix a bug in eval_batch_retrieval (#9089) 2020-12-15 14:46:55 +01:00
Stas Bekman
c19d04623e
[finetune_trainer] enhancements and fixes (#9042)
* trainer and finetune_trainer enhancements and fixes

* add fallback default

* move the fixing of incorrect keys back into finetune trainer

* s/eval/val/ to match the split

* trainer can now use a different prefix than eval_ for metrics

* document new arg

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* use 'eval' as the default for metric_key_prefix

* complete adjust var names + disambiguate

* fix logger

* add clarifying comment

* add clarifying comment

* style

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/trainer.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* complete removal of optional for metric_key_prefix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-14 17:45:33 -08:00
Sylvain Gugger
29e4597950
Fix min_null_pred in the run_qa script (#9067) 2020-12-11 16:26:05 -05:00
dependabot[bot]
24f6cdeab6
Bump notebook in /examples/research_projects/movement-pruning/lxmert (#9062)
Bumps [notebook](https://github.com/jupyter/jupyterhub) from 6.1.4 to 6.1.5.
- [Release notes](https://github.com/jupyter/jupyterhub/releases)
- [Changelog](https://github.com/jupyterhub/jupyterhub/blob/master/CHECKLIST-Release.md)
- [Commits](https://github.com/jupyter/jupyterhub/commits)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2020-12-11 10:32:43 -05:00
Sylvain Gugger
783d7d2629
Reorganize examples (#9010)
* Reorganize example folder

* Continue reorganization

* Change requirements for tests

* Final cleanup

* Finish regroup with tests all passing

* Copyright

* Requirements and readme

* Make a full link for the documentation

* Address review comments

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add symlink

* Reorg again

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Adapt title

* Update to new structure

* Remove test

* Update READMEs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-12-11 10:07:02 -05:00
NatLun137
91ab02af28
Fix typo #9012 (#1) (#9038)
There is a tiny typo in the code "transformers/examples/language-modeling/run_mlm_wwm.py" at line 284. [Details.](https://github.com/huggingface/transformers/issues/9012)
2020-12-10 16:41:00 -05:00
Funtowicz Morgan
75627148ee
Flax Masked Language Modeling training example (#8728)
* Remove "Model" suffix from Flax models to look more 🤗

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Initial working (forward + backward) for Flax MLM training example.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Simplify code

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing comments, using module and moving to LM task.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Restore parameter name "module" wrongly renamed model.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Restore correct output ordering...

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Actually commit the example 😅

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add FlaxBertModelForMaskedLM after rebasing.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make it possible to initialize the training from scratch

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Reuse flax linen example of cross entropy loss

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added specific data collator for flax

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove todo for data collator

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added evaluation step

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added ability to provide dtype to support bfloat16 on TPU

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable flax tensorboard output

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable jax.pmap support.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Ensure batches are correctly sized to be dispatched with jax.pmap

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable bfloat16 with --fp16 cmdline args

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correctly export metrics to tensorboard

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added dropout and ability to use it.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Effectively enable & disable during training and evaluation steps.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Oops.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable specifying kernel initializer scale

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added warmup step to the learning rate scheduler.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix typo.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Print training loss

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix linter issue (flake8)

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix model matching

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix dummies

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix non default dtype on Flax models

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use the same create_position_ids_from_input_ids for FlaxRoberta

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make Roberta attention as Bert

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix copy

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Wording.

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
2020-12-09 17:13:56 +01:00
Sylvain Gugger
447808c85f
New squad example (#8992)
* Add new SQUAD example

* Same with a task-specific Trainer

* Address review comment.

* Small fixes

* Initial work for XLNet

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Final clean up and working XLNet script

* Test and debug

* Final working version

* Add tick

* Update README

* Address review comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-08 14:39:29 -05:00
Sylvain Gugger
00aa9dbca2
Copyright (#8970)
* Add copyright everywhere missing

* Style
2020-12-07 18:36:34 -05:00
Sylvain Gugger
62d30e0583
Small fix to the run clm script (#8973) 2020-12-07 17:32:09 -05:00
Sylvain Gugger
7f9ccffc5b
Use word_ids to get labels in run_ner (#8962)
* Use word_ids to get labels in run_ner

* Add sanity check
2020-12-07 14:26:36 -05:00
Ethan Perez
8dfc8c7221
Don't pass in token_type_ids to BART for GLUE (#8929)
Without this fix, training a `BARTForSequenceClassification` model with `run_pl_glue.py` gives `TypeError: forward() got an unexpected keyword argument 'token_type_ids'`, because BART does not have token_type_ids. I've solved this issue in the same way as it's solved for the "distilbert" model, and I can train BART models on SNLI without errors now.
2020-12-05 09:52:16 -05:00
Stas Bekman
df311a5ccf
[seq2seq] document the caveat of leaky native amp (#8930)
* document the caveat of leaky native amp

* Update examples/seq2seq/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-12-04 15:43:35 -08:00
Stas Bekman
4c3d98dddc
[s2s finetune_trainer] add instructions for distributed training (#8884) 2020-12-03 16:05:55 -08:00
Stas Bekman
379005c9d2
start using training_args.parallel_mode (#8882) 2020-12-01 11:40:36 -08:00
Stas Bekman
7f34d75780
[s2s trainer] fix DP mode (#8823)
* fix DP case on multi-gpu

* make executable

* test all 3 modes

* use the correct check for distributed

* dp doesn't need a special case

* restore original name

* cleanup
2020-11-30 12:55:56 -08:00
Sylvain Gugger
5530299096
Remove deprecated evaluate_during_training (#8852)
* Remove deprecated `evaluate_during_training`

* Update src/transformers/training_args_tf.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-30 11:12:15 -05:00
Stefan Schweter
19fa01ce2a
token-classification: use is_world_process_zero instead of deprecated is_world_master() (#8828) 2020-11-30 09:21:56 -05:00
Stas Bekman
ddf3c64654
potpourri of small fixes (#8807) 2020-11-26 14:06:27 -08:00
chutaklee
52708d2637
Fix PPLM (#8779)
* Fix pplm

* fix style

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-26 22:23:36 +01:00
Patrick von Platen
8f07f5c44b
Revert "finetune.py: specifying generation min_length (#8478)" (#8805)
This reverts commit 5aa361f3e5.
2020-11-26 20:12:01 +01:00
Daniel Khashabi
5aa361f3e5
finetune.py: specifying generation min_length (#8478) 2020-11-26 12:33:02 +05:30
Stas Bekman
82d443a7fd
[core] implement support for run-time dependency version checking (#8645)
* implement support for run-time dependency version checking

* try not escaping !

* use findall that works on py36

* small tweaks

* autoformatter worship

* simplify

* shorter names

* add support for non-versioned checks

* add deps

* revert

* tokenizers not required, check version only if installed

* make a proper distutils cmd and add make target

* tqdm must be checked before tokenizers

* workaround the DistributionNotFound peculiar setup

* handle the rest of packages in setup.py

* fully sync setup.py's install_requires - to check them all

* nit

* make install_requires more readable

* typo

* Update setup.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* restyle

* add types

* simplify

* simplify2

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-24 13:22:25 -05:00
Quentin Lhoest
a7d73cfdd4
fix rag index names in eval_rag.py example (#8730) 2020-11-24 17:04:47 +01:00
zhiheng-huang
2c83b3c38d
Support various BERT relative position embeddings (2nd) (#8276)
* Support BERT relative position embeddings

* Fix typo in README.md

* Address review comment

* Fix failing tests

* [tiny] Fix style_doc.py check by adding an empty line to configuration_bert.py

* make fix copies

* fix configs of electra and albert and fix longformer

* remove copy statement from longformer

* fix albert

* fix electra

* Add bert variants forward tests for various position embeddings

* [tiny] Fix style for test_modeling_bert.py

* improve docstring

* [tiny] improve docstring and remove unnecessary dependency

* [tiny] Remove unused import

* re-add to ALBERT

* make embeddings work for ALBERT

* add test for albert

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-24 14:40:53 +01:00
Sylvain Gugger
367f497dec
Fix max length in run_plm script (#8738) 2020-11-23 16:02:31 -05:00
Stas Bekman
1e45bef0a7
[trainer] make generate work with multigpu (#8716)
* make generate work with multigpu

* better fix - thanks @sgugger
2020-11-23 10:57:27 -08:00
Santiago Castro
e1f3156b21
Fix many typos (#8708) 2020-11-21 22:58:10 -05:00
Quentin Lhoest
8062fa63c5
Fix rag finetuning + add finetuning test (#8585)
* replace init_ddp_connection for index init

* style

* add finetune test

* add test data

* move generate tensors to device

* add test on EM metric

* style

* allow multi process test

* keep gloo process group for retrieval

* add multi-gpu test

* use custom accelerator

* clean test finetune

* minor

* style

* style

* typo

* use python call instead of imported main function

* return_dict fix in modeling_rag

* use float32 in retrieval

* store as float32 as well in the custom knowledge dataset example

* style

* rename to finetune_rag

* style

* update readme

* rename utils and callbacks to utils_rag and callbacks_rag

* fix test

* patrick's comments

* generate dummy data in the finetune test script

* remove dummy data files

* style
2020-11-20 19:05:03 +01:00
Stas Bekman
0ad45e108d
[examples/seq2seq] fix PL deprecation warning (#8577)
* fix deprecation warning

* fix
2020-11-19 21:46:04 +01:00
Sylvain Gugger
20b658607e
Fix run_ner script (#8664)
* Fix run_ner script

* Pin datasets
2020-11-19 13:59:30 -05:00
Sylvain Gugger
cb3e5c33f7
Fix a few last paths for the new repo org (#8666) 2020-11-19 11:56:42 -05:00
Matthias
a79a96ddaa
fix small typo (#8644)
Fixed a small typo on the XLNet and permutation language modelling section
2020-11-19 11:24:11 -05:00
Sylvain Gugger
4208f496ee
Better filtering of the model outputs in Trainer (#8633)
* Better filtering of the model outputs in Trainer

* Fix examples tests

* Add test for Lysandre
2020-11-19 10:43:15 -05:00
Quentin Lhoest
62cd9ce9f8
fix missing return dict (#8653) 2020-11-19 15:17:18 +01:00
Tim Isbister
28d16e7ac5
Update README.md (#8635) 2020-11-18 18:35:23 -05:00
Stas Bekman
d86d57faa3
[s2s] distillation apex breaks return_dict obj (#8631)
* apex breaks return_dict obj

* style
2020-11-18 12:51:29 -08:00
Sylvain Gugger
a0c62d2493
Fix training from scratch in new scripts (#8623) 2020-11-18 12:15:26 -05:00
Stas Bekman
cdf1b7ae82
fix to adjust for #8530 changes (#8612) 2020-11-18 10:25:00 -05:00
Stas Bekman
2819da02f7
[s2s] broken test (#8613) 2020-11-18 10:15:53 -05:00
Sylvain Gugger
dd52804f5f
Remove deprecated (#8604)
* Remove old deprecated arguments

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Remove needless imports

* Fix tests

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-11-17 15:11:29 -05:00
Stas Bekman
f0435f5a61
these should run fine on multi-gpu (#8582) 2020-11-17 14:00:41 -05:00
Julien Chaumond
042a6aa777
Tokenizers: ability to load from model subfolder (#8586)
* <small>tiny typo</small>

* Tokenizers: ability to load from model subfolder

* use subfolder for local files as well

* Uniformize model shortcut name => model id

* from s3 => from huggingface.co

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2020-11-17 08:58:45 -05:00
Sylvain Gugger
c89bdfbe72
Reorganize repo (#8580)
* Put models in subfolders

* Styling

* Fix imports in tests

* More fixes in test imports

* Sneaky hidden imports

* Fix imports in doc files

* More sneaky imports

* Finish fixing tests

* Fix examples

* Fix path for copies

* More fixes for examples

* Fix dummy files

* More fixes for example

* More model import fixes

* Is this why you're unhappy GitHub?

* Fix imports in convert command
2020-11-16 21:43:42 -05:00
Sylvain Gugger
1073a2bde5
Switch return_dict to True by default. (#8530)
* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Run on the real suite

* Fix slow tests
2020-11-16 11:43:00 -05:00
Thomas Wolf
f4e04cd2c6
[breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (#8073)
* Fixing roberta for slow-fast tests

* WIP getting equivalence on pipelines

* slow-to-fast equivalence - working on question-answering pipeline

* optional FAISS tests

* Pipeline Q&A

* Move pipeline tests to their own test job again

* update tokenizer to add sequence id methods

* update to tokenizers 0.9.4

* set sentencepiece as optional

* clean up squad

* clean up pipelines to use sequence_ids

* style/quality

* wording

* Switch to use_fast = True by default

* update tests for use_fast at True by default

* fix rag tokenizer test

* removing protobuf from required dependencies

* fix NER test for use_fast = True by default

* fixing example tests (Q&A examples use slow tokenizers for now)

* protobuf in main deps extras["sentencepiece"] and example deps

* fix protobuf install test

* try to fix seq2seq by switching to slow tokenizers for now

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-15 22:50:59 +01:00
Julien Plu
27b3ff316a
Try to understand and apply Sylvain's comments (#8458) 2020-11-12 13:43:00 -05:00
zeyuyun1
924c624a46
quick fix on concatenating text to support more datasets (#8474) 2020-11-12 09:47:08 -05:00
Sumithra Bhakthavatsalam
81ebd70671
[s2s] distill t5-large -> t5-small (#8376)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-11 17:58:45 -05:00
sarnoult
a38d1c7c31
Example NER script predicts on tokenized dataset (#8468)
The new run_ner.py script tries to run prediction on the input
test set `datasets["test"]`, but it should be the tokenized set
`tokenized_datasets["test"]`
2020-11-11 10:28:23 -05:00
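A minimal sketch of the bug fixed in #8468, assuming the raw test split holds only words while the tokenized split also carries `input_ids`. The toy `tokenize` and `predict` helpers and the sample data are hypothetical stand-ins, not code from run_ner.py.

```python
# Hypothetical raw data: one example made of words only.
raw_datasets = {"test": [{"tokens": ["Hugging", "Face"]}]}


def tokenize(example):
    # Toy tokenizer (assumption): use each word's length as a fake token id.
    return {**example, "input_ids": [len(tok) for tok in example["tokens"]]}


# Tokenize every split, mirroring the datasets -> tokenized_datasets step.
tokenized_datasets = {
    split: [tokenize(ex) for ex in examples]
    for split, examples in raw_datasets.items()
}


def predict(examples):
    # A model consumes input_ids, so raw examples without them are rejected.
    if any("input_ids" not in ex for ex in examples):
        raise KeyError("input_ids")
    # Toy prediction: one label id (0) per input id.
    return [[0] * len(ex["input_ids"]) for ex in examples]


# Predicting on the tokenized split works; the raw split would raise KeyError.
predictions = predict(tokenized_datasets["test"])
```

Passing `raw_datasets["test"]` to `predict` fails in this sketch because the field the model needs is missing, which is the kind of mismatch the commit message describes.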
Stas Bekman
02bdfc0251
using multi_gpu consistently (#8446)
* s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g'

* doc
2020-11-10 13:23:58 -05:00
Stas Bekman
5d4972e608
[examples] better PL version check (#8429) 2020-11-10 09:33:23 -05:00
Shichao Sun
ae1cb4ec22
[s2s/distill] hparams.tokenizer_name = hparams.teacher (#8382) 2020-11-10 09:32:01 -05:00
Julien Chaumond
55e8d0cea2 Update links from s3 to huggingface.co 2020-11-10 14:03:29 +01:00
Stas Bekman
190df58560
[github CI] add a multi-gpu job for all example tests (#8341)
* add a multi-gpu job for all example tests

* run only ported tests

* rename

* explain why env is re-activated on each step

* mark all unported/checked tests with @require_torch_non_multigpu_but_fix_me

* style

* Apply suggestions from code review

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-09 15:47:38 -05:00
Patrick von Platen
9c83b96e62
[Tests] Add Common Test for Training + Fix a couple of bugs (#8415)
* add training tests

* correct longformer

* fix docs

* fix some tests

* fix some more train tests

* remove ipdb

* fix multiple edge case model training

* fix funnel and prophetnet

* clean gpt models

* undo renaming of albert
2020-11-09 18:24:41 +01:00
Sylvain Gugger
5c766ecb50 Fix typo 2020-11-09 11:50:51 -05:00
Sylvain Gugger
908a28894c
Add new token classification example (#8340)
* Add new token classification example

* Remove txt file

* Add test

* With actual testing done

* Less warmup is better

* Update examples/token-classification/run_ner_new.py

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address review comments

* Fix test

* Make Lysandre happy

* Last touches and rename

* Rename in tests

* Address review comments

* More run_ner -> run_ner_old

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-11-09 11:39:55 -05:00
Sam Shleifer
ebde57acac
examples/docs: caveat that PL examples don't work on TPU (#8309) 2020-11-09 08:55:22 -05:00
Sam Shleifer
e6d9cdaafe
[s2s/distill] remove run_distiller.sh, fix xsum script (#8412) 2020-11-08 16:57:43 -05:00
Stas Bekman
66582492d3
[s2s test_finetune_trainer] failing multigpu test (#8400) 2020-11-08 16:45:40 -05:00
Stas Bekman
f62755a600
[s2s examples test] fix data path (#8398) 2020-11-08 16:44:18 -05:00
Jonathan Chang
5807ba3fa9
Fix typo (#8351) 2020-11-06 11:19:41 -05:00
Stas Bekman
9edafaebef
[s2s] test_bash_script.py - actually learn something (#8318)
* use decorator

* remove hardcoded paths

* make the test use more data and do real quality tests

* shave off 10 secs

* add --eval_beams 2, reformat

* reduce train size, use smaller custom dataset
2020-11-05 23:15:14 -05:00
Leandro von Werra
17450397a7
Docs bart training ref (#8330)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-05 17:20:57 -05:00
Stas Bekman
d787935a14
[s2s] test_distributed_eval (#8315)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-05 16:01:15 -05:00
Sam Shleifer
7abc1d96d1
no warn (#8329) 2020-11-05 11:42:24 -05:00
Bobby Donchev
52f44dd6d2
change TokenClassificationTask class methods to static methods (#7902)
* change TokenClassificationTask class methods to static methods

The methods of TokenClassificationTask do not use self, so they should probably be static methods. The class also has no constructor, which makes it unusable as is. Switching to static methods removes the need to document the intent of the broken class.

The get_labels and read_examples_from_file methods are meant to be implemented by subclasses. Static methods are still inherited, so a subclass can override them like any other method.

* Trigger Build

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-11-05 09:38:30 -05:00
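A minimal sketch of the pattern described in #7902, assuming a base class whose static methods are placeholders that subclasses override. The method signatures, the `ToyNER` subclass, and its label set are hypothetical; only the names TokenClassificationTask, get_labels and read_examples_from_file come from the commit message.

```python
from typing import List


class TokenClassificationTask:
    # Static methods need neither an instance nor a constructor.
    @staticmethod
    def read_examples_from_file(data_dir: str, mode: str) -> List[dict]:
        raise NotImplementedError

    @staticmethod
    def get_labels(path: str) -> List[str]:
        raise NotImplementedError


class ToyNER(TokenClassificationTask):
    # Overriding a static method works like overriding any other method.
    @staticmethod
    def get_labels(path: str) -> List[str]:
        # Hypothetical fixed label set; a real task would read it from `path`.
        return ["O", "B-PER", "I-PER"]


# Call the override on the class itself, without creating an instance.
labels = ToyNER.get_labels("labels.txt")
```

Methods that `ToyNER` does not override, such as `read_examples_from_file`, still resolve to the base class and raise NotImplementedError.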
Guillem García Subies
77c8f6c627
Corrected typo in readme (#8320) 2020-11-05 07:48:36 -05:00
Sylvain Gugger
9c4aa4ac1a
Clean up data collators and datasets (#8308)
* Clean up data collators and datasets

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove needless clone

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-04 17:24:49 -05:00
Manuel Romero
b1d3e95eb5
Fix path to old run_language_modeling.py script (#8302) 2020-11-04 13:17:57 -05:00
Sylvain Gugger
cf89724696
Fix validation file loading in scripts (#8298) 2020-11-04 10:42:18 -05:00
Pengzhi Gao
734afa37f6
Fix typo in language-modeling README.md (#8287) 2020-11-04 09:38:02 -05:00
Stas Bekman
1bb4bba53c
[CIs] Better reports everywhere (#8275)
* make it possible to invoke testconf.py in both test suites without crashing on having the same option added

* perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts

* add `pytest --make-reports` to all CIs (and artifacts)

* fix
2020-11-03 16:57:12 -05:00
Patrick von Platen
068e6b5edd
make files independent (#8267) 2020-11-03 21:13:33 +01:00
Stas Bekman
cd360dcb26
[examples] minimal version requirement run-time check in PL (#8133)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-03 13:17:11 -05:00
Lysandre
eb6313e823 Fix Tatoeba skip 2020-11-03 10:35:00 -05:00
Sam Shleifer
b63beb743c
Skip tatoeba tests if Tatoeba-Challenge not cloned (#8260) 2020-11-03 09:49:29 -05:00
Patrick von Platen
9f1747f999
[Seq2Seq] Correct import in Seq2Seq Trainer (#8254) 2020-11-03 07:56:41 -05:00
Sylvain Gugger
e1b1b614b1
Add line by line option to mlm/plm scripts (#8240)
* Make line by line optional in run_mlm

* Add option to disable dynamic padding

* Add option to plm too and update README

* Typos

* More typos

* Even more typos

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-02 12:27:04 -05:00
Patrick von Platen
9bd30f7cf4
[Seq2SeqTrainer] Move import to init to make file self-contained (#8194)
* boom boom

* reverse order
2020-11-01 23:31:55 +01:00
Sylvain Gugger
9eb3a410cd
Remove deprecated arguments from new run_clm (#8197) 2020-10-30 15:27:20 -04:00
Sylvain Gugger
cdc48ce92d
Finalize lm examples (#8188)
* Finish the cleanup of the language-modeling examples

* Update main README

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Propagate changes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-30 14:20:18 -04:00
wlhgtc
9a21b50614
Fix eval ref miss in Chinese WWM. (#8115)
* ADD: add whole word mask proxy for both eng and chinese

* MOD: adjust format

* MOD: reformat code

* MOD: update import

* MOD: fix bug

* MOD: add import

* MOD: fix bug

* MOD: decouple code and update readme

* MOD: reformat code

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change wwm to whole_word_mask

* reformat code

* reformat

* format

* Code quality

* ADD: update chinese ref readme

* MOD: small changes

* MOD: small changes2

* update readme

* fix eval ref file miss bug

* format file

* MOD: move ref code to contrib

* MOD: add delimiter check

* reformat code

* reformat code

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-29 17:08:39 -04:00
Sylvain Gugger
691176283d
Add a template for examples and apply it for mlm and plm examples (#8153)
* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Styling
2020-10-29 13:38:11 -04:00
Sam Shleifer
49e4fece5c
[s2s] distillBART docs for paper replication (#8150) 2020-10-29 12:01:15 -04:00
Sylvain Gugger
acf56408d8
Smarter prediction loop and no- -> no_ in console args (#8151)
* Smarter prediction loop and no- -> no_ in console args

* Fix test
2020-10-29 10:56:25 -04:00
Santiago Castro
969859d5f6
Fix doc errors and typos across the board (#8139)
* Fix doc errors and typos across the board

* Fix a typo

* Fix the CI

* Fix more typos

* Fix CI

* More fixes

* Fix CI

* More fixes

* More fixes
2020-10-29 10:33:33 -04:00
Stas Bekman
825925dfaa
[s2s test] cleanup (#8131) 2020-10-28 16:50:36 -04:00
Sean Naren
5e24982e58
Upgrade PyTorch Lightning to 1.0.2 (#7852)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-28 14:59:14 -04:00
Sylvain Gugger
378142afdf
Rename add_start_docstrings_to_callable (#8120) 2020-10-28 13:42:31 -04:00
Stas Bekman
5423f2a9d4
[testing] port test_trainer_distributed to distributed pytest + TestCasePlus enhancements (#8107)
* move the helper code into testing_utils

* port test_trainer_distributed to work with pytest

* improve docs

* simplify notes

* doc

* doc

* style

* doc

* further improvements

* torch might not be available

* real fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-28 11:51:32 -04:00
Sylvain Gugger
47dfa65b0c
New run_clm script (#8105)
* New run_clm script

* Formatting

* More comments

* Remove unused imports

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address review comments

* Change link to the hub

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-28 10:38:58 -04:00
Sylvain Gugger
1e01db3579 Remove header 2020-10-27 17:36:13 -04:00
Sylvain Gugger
b715e40ced Fix typo 2020-10-27 17:34:05 -04:00
Sylvain Gugger
41cc5f3f59
Move installation instructions to the top (#8106) 2020-10-27 17:32:20 -04:00
Stas Bekman
bfd5e370a7
[CI] generate separate report files as artifacts (#7995)
* better reports

* a whole bunch of reports in their own files

* clean up

* improvements

* github artifacts experiment

* style

* complete the report generator with multiple improvements/fixes

* fix

* save all reports under one dir to ease upload

* can remove temp failing tests

* doc fix

* some cleanup
2020-10-27 09:25:07 -04:00
Patrick von Platen
664c7ec453
[Seq2Seq Trainer] Make sure padding is implemented for models without pad_token (#8043)
* make sure padding is implemented for non-padding tokens models as well

* add better error message

* add better warning

* remove results files

* Update examples/seq2seq/seq2seq_trainer.py

* remove unnecessary copy line

* correct usage of labels

* delete test files
2020-10-26 17:28:16 +01:00
mohammadreza-Banaei73
098ddc2244
Update README.md (#8050)
--wwm can't be used as an argument to run_language_modeling.py and should be changed to --whole_word_mask
2020-10-26 12:00:18 -04:00
suliuzh
20a0894d1a
update version for scipy (#7998) 2020-10-26 08:56:56 -04:00
Patrick von Platen
3c682ea15c
[Examples] Allow EncoderDecoderModels to be trained with Seq2Seq (#7809)
* Make Seq2Seq Trainer more similar to Trainer

* fix typo

* fix seq2seq trainer

* remove from tests

* remove lock

* remove train files

* delete test files

* correct typo

* check at init

* make sure trainer is not slowed down on TPU

* correct isort

* remove use cache

* fix use cache

* add last use cache = false
2020-10-23 23:05:51 +02:00
Ethan Perez
d39da5a2ab
Handling longformer model_type (#7990)
Updating the run_squad training script to handle the "longformer" `model_type`. The longformer is trained in the same way as RoBERTa, so I've added the "longformer" `model_type` (that's the right huggingface name for the LongFormer model, right?) everywhere there was a "roberta" `model_type` reference. The longformer (like RoBERTa) doesn't use `token_type_ids` (as I understand from looking at the [longformer notebook](https://github.com/patil-suraj/Notebooks/blob/master/longformer_qa_training.ipynb)), which is what gets updated after this change.

This fix might be related to [this issue](https://github.com/huggingface/transformers/issues/7249) with SQuAD training when using run_squad.py
2020-10-23 10:34:06 -04:00
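A minimal sketch of the behaviour described in #7990, assuming the script drops `token_type_ids` for model types that do not use them. The set name, the `build_inputs` helper, and the sample batch are hypothetical and are not taken from run_squad.py.

```python
# Assumption: model types that take no token_type_ids, per the commit message.
MODEL_TYPES_WITHOUT_TOKEN_TYPE_IDS = {"roberta", "longformer"}


def build_inputs(batch: dict, model_type: str) -> dict:
    # Copy the batch so the caller's dictionary is left untouched.
    inputs = dict(batch)
    if model_type in MODEL_TYPES_WITHOUT_TOKEN_TYPE_IDS:
        # Remove the segment ids for models that do not accept them.
        inputs.pop("token_type_ids", None)
    return inputs


batch = {"input_ids": [0, 5, 2], "attention_mask": [1, 1, 1], "token_type_ids": [0, 0, 0]}
longformer_inputs = build_inputs(batch, "longformer")
bert_inputs = build_inputs(batch, "bert")
```

In this sketch "longformer" follows the same branch as "roberta", while a model type outside the set keeps its `token_type_ids`.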
Lalit Pagaria
88b3a91e61
Handle the case when title is None (#7941) 2020-10-23 15:54:45 +02:00
Stas Bekman
023f0f3708
[s2s trainer] tests to use distributed on multi-gpu machine (#7965) 2020-10-22 17:26:22 -04:00
Sylvain Gugger
2e5052d4f1
New run glue script (#7917)
* Start simplification

* More progress

* Finished script

* Address comments and update tests instructions

* Wrong test

* Accept files as inputs and fix test

* Update src/transformers/trainer_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Fix labels and add combined score

* Add special labels

* Update TPU command

* Revert to old label strategy

* Use model labels

* Fix for STS-B

* Styling

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Code styling

* Fix review comments

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-22 11:42:22 -04:00
wlhgtc
a16e568f22
# Add whole word mask support for lm fine-tune (#7925)
* ADD: add whole word mask proxy for both eng and chinese

* MOD: adjust format

* MOD: reformat code

* MOD: update import

* MOD: fix bug

* MOD: add import

* MOD: fix bug

* MOD: decouple code and update readme

* MOD: reformat code

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change wwm to whole_word_mask

* reformat code

* reformat

* format

* Code quality

* ADD: update chinese ref readme

* MOD: small changes

* MOD: small changes2

* update readme

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2020-10-22 09:19:00 -04:00
Stas Bekman
8b38173398
[seq2seq testing] multigpu test run via subprocess (#7281)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-21 17:20:53 -04:00
Stas Bekman
0e24e4c136
[s2s] create doc for pegasus/fsmt replication (#7934) 2020-10-20 15:07:52 -04:00
Stas Bekman
3e31e7f956
[testing] rename skip targets + docs (#7863)
* rename skip targets + docs

* fix quotes

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* small improvements

* fix

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-20 04:39:13 -04:00
Quentin Lhoest
033f29c625
Allow Custom Dataset in RAG Retriever (#7763)
* add CustomHFIndex

* typo in config

* update tests

* add custom dataset example

* clean script

* update test data

* minor in test

* docs

* docs

* style

* fix imports

* allow to pass the indexed dataset directly

* update tests

* use multiset DPR

* address thom and patrick's comments

* style

* update dpr tokenizer

* add output_dir flag in use_own_knowledge_dataset.py

* allow custom datasets in examples/rag/finetune.py

* add test for custom dataset in distributed rag retriever
2020-10-19 19:42:45 +02:00
Thomas Wolf
ba8c4d0ac0
[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659)
* splitting fast and slow tokenizers [WIP]

* [WIP] splitting sentencepiece and tokenizers dependencies

* update dummy objects

* add name_or_path to models and tokenizers

* prefix added to file names

* prefix

* styling + quality

* splitting all the tokenizer files - sorting sentencepiece based ones

* update tokenizer version up to 0.9.0

* remove hard dependency on sentencepiece 🎉

* and removed hard dependency on tokenizers 🎉

* update conversion script

* update missing models

* fixing tests

* move test_tokenization_fast to main tokenization tests - fix bugs

* bump up tokenizers

* fix bert_generation

* update and fix several tokenizers

* keep sentencepiece in deps for now

* fix funnel and deberta tests

* fix fsmt

* fix marian tests

* fix layoutlm

* fix squeezebert and gpt2

* fix T5 tokenization

* fix xlnet tests

* style

* fix mbart

* bump up tokenizers to 0.9.2

* fix model tests

* fix tf models

* fix seq2seq examples

* fix tests without sentencepiece

* fix slow => fast  conversion without sentencepiece

* update auto and bert generation tests

* fix mbart tests

* fix auto and common test without tokenizers

* fix tests without tokenizers

* clean up tests lighten up when tokenizers + sentencepiece are both off

* style quality and tests fixing

* add sentencepiece to doc/examples reqs

* leave sentencepiece on for now

* style quality split hebert and fix pegasus

* WIP Herbert fast

* add sample_text_no_unicode and fix hebert tokenization

* skip FSMT example test for now

* fix style

* fix fsmt in example tests

* update following Lysandre and Sylvain's comments

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-18 20:51:24 +02:00
Stas Bekman
9f7b2b2432
[s2s testing] turn all to unittests, use auto-delete temp dirs (#7859) 2020-10-17 14:33:21 -04:00
Stas Bekman
1652ddad35
[seq2seq testing] improve readability (#7845) 2020-10-16 09:05:29 -04:00
Quentin Lhoest
466115b279
Fix missing reference titles in retrieval evaluation of RAG (#7817) 2020-10-16 10:15:49 +02:00
Stas Bekman
464b53f5e4
[testing] disable FutureWarning in examples tests (#7842)
* [testing] disable FutureWarning in examples tests

same as tests/conftest.py, we can't resolve those warnings, so turn the noise off.

* fix
2020-10-16 03:35:39 -04:00
Sam Shleifer
96e47d9229
[cleanup] assign todos, faster bart-cnn test (#7835)
* 2 beam output

* unassign/remove TODOs

* remove one more
2020-10-16 03:11:18 -04:00
Stas Bekman
2255c2c7a0
[seq2seq] get_git_info fails gracefully (#7843)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-16 00:22:43 -04:00
Lysandre
2485b8b0ac Set XLA example time to 500s 2020-10-15 12:34:29 +02:00
Sylvain Gugger
bb9559a7f9
Don't use store_xxx on optional bools (#7786)
* Don't use `store_xxx` on optional bools

* Refine test

* Refine test
2020-10-14 12:05:02 -04:00
Sylvain Gugger
a1d1b332d0
Add predict step accumulation (#7767)
* Add eval_accumulation_step and clean distributed eval

* Add TPU test

* Add TPU stuff

* Fix arg name

* Fix Seq2SeqTrainer

* Fix total_size

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Doc and add test to TPU

* Add unit test

* Adapt name

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-14 11:41:45 -04:00
Sam Shleifer
8feb0cc967
fix examples/rag imports, tests (#7712) 2020-10-14 11:35:00 -04:00
Tiger
7e73c12805
fixed lots of typos. (#7758) 2020-10-13 10:00:20 -04:00
Sam Shleifer
9c2b2db2cd
[marian] Automate Tatoeba-Challenge conversion (#7709) 2020-10-12 12:24:25 -04:00
Julien Plu
d9ffb87efb
Fix tf text class (#7724)
* Fix test

* fix generic text classification

* fix test

* Fix tests
2020-10-12 08:45:15 -04:00
sgugger
d6175a4268 Fix code quality 2020-10-12 08:22:27 -04:00
Kelvin
f176e70723
The input training data files (multiple files in glob format). (#7717)
Splitting large files into smaller ones often prevents the tokenizer from running out of memory in environments such as Colab that do not have swap memory.
2020-10-12 07:44:02 -04:00
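A minimal sketch of the idea behind #7717, assuming training text is split across several shard files matched by a glob pattern and read one line at a time. The `iter_lines` helper and the file names are hypothetical, not the script's actual argument handling.

```python
import glob
import os
import tempfile


def iter_lines(pattern: str):
    # Expand the glob pattern and stream lines file by file, so only one
    # line is held in memory at a time.
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    # Write two small shards in place of one large training file.
    for index, text in enumerate(["first shard\n", "second shard\n"]):
        shard_path = os.path.join(tmp_dir, f"train_{index}.txt")
        with open(shard_path, "w", encoding="utf-8") as handle:
            handle.write(text)

    # Read every shard back through a single glob pattern.
    lines = list(iter_lines(os.path.join(tmp_dir, "train_*.txt")))
```

Sorting the matched paths keeps the shard order deterministic, since `glob.glob` does not guarantee any ordering.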
Sam Shleifer
827c519494
[examples] bump pl=0.9.0 (#7053) 2020-10-11 16:39:38 -04:00
Julien Plu
9ad830596d
Fix dataset cardinality (#7678)
* Fix test

* Fix cardinality issue

* Fix test
2020-10-09 10:38:25 -04:00
Sam Shleifer
297233fa92
[s2s] Switch README urls to cdn (#7670) 2020-10-08 21:22:22 -04:00
Sam Shleifer
a1ecc90d6b
[pseudo] Switch URLS to CDN (#7661) 2020-10-08 14:12:39 -04:00
Suraj Patil
06a973fd2a
[s2s] configure lr_scheduler from command line (#7641) 2020-10-08 13:06:35 -04:00
Sam Shleifer
aba4e22944
[pseudolabels] cleanup markdown table (#7653) 2020-10-07 23:04:18 -04:00
Sam Shleifer
e2bb9abb6a
[s2s] release pseudolabel links and instructions (#7639) 2020-10-07 11:20:44 -04:00
Sylvain Gugger
08ba4b4902
Trainer callbacks (#7596)
* Initial callback proposal

* Finish various callbacks

* Post-rebase conflicts

* Fix tests

* Don't use something that's not set

* Documentation

* Remove unwanted print.

* Document all models can work

* Add tests + small fixes

* Update docs/source/internal/trainer_utils.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Fix TF tests

* Real fix this time

* This one should work

* Fix typo

* Really fix typo

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-07 10:50:21 -04:00
Sam Shleifer
500be01c5d
[s2s] save first batch to json for debugging purposes (#6810) 2020-10-06 16:11:56 -04:00
Sam Shleifer
d5d2744aa7
Support T5 Distillation w/hidden state supervision (#7599) 2020-10-05 21:31:48 -04:00
Suraj Patil
99cb924bfb
[s2s] add config params like Dropout in Seq2SeqTrainingArguments (#7532) 2020-10-04 12:42:30 -04:00
Sam Shleifer
9bdce3a4f9
[s2s] fix lockfile and peg distillation constants (#7545) 2020-10-02 15:58:14 -04:00
Sam Shleifer
de4d7b004a
[s2s] Adafactor support for builtin trainer (#7522) 2020-10-01 17:27:45 -04:00
Sam Shleifer
d3a9601a11
[s2s] trainer scripts: Remove --run_name, thanks sylvain! (#7521) 2020-10-01 17:18:47 -04:00
Sylvain Gugger
bdcc4b78a2
Fix seq2seq example test (#7518)
* Fix seq2seq example test

* Fix bad copy-paste

* Also save the state
2020-10-01 14:13:29 -04:00
Sam Shleifer
2a358f45ef
[s2s] fix nltk pytest race condition with FileLock (#7515) 2020-10-01 12:51:09 -04:00
Suraj Patil
72d363d979
[examples/s2s] clean up finetune_trainer (#7509) 2020-10-01 12:19:29 -04:00
Sam Shleifer
48f23f92a8
[s2sTrainer] test + code cleanup (#7467) 2020-10-01 00:33:01 -04:00
Julien Chaumond
0acd1ffa09
[doc] rm Azure buttons as not implemented yet 2020-09-30 17:31:08 -04:00
Sam Shleifer
03e46c1de3
[s2s] fix kwargs style (#7488) 2020-09-30 17:00:06 -04:00
Sam Shleifer
6fe8a693eb
[s2s] Fix t5 warning for distributed eval (#7487) 2020-09-30 16:58:03 -04:00
Amanpreet Singh
c031d01023
Seq2SeqDataset: avoid passing src_lang everywhere (#7470)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-30 13:27:48 -04:00
Suraj Patil
08939cfdf7
[s2strainer] fix eval dataset loading (#7477) 2020-09-30 12:39:13 -04:00
Sam Shleifer
74d8d69bd4
[s2s] consistent output format across eval scripts (#7435) 2020-09-28 23:20:03 -04:00
Ola Piktus
ae3e84f3ba
[RAG] Clean Rag readme in examples (#7413)
* Improve README + consolidation script

* Reformat README

* Reformat README

Co-authored-by: Your Name <you@example.com>
2020-09-28 10:06:39 +02:00
Sam Shleifer
748425d47d
[T5] allow config.decoder_layers to control decoder size (#7409)
* Working asymmetrical T5

* rename decoder_layers -> num_decoder_layers

* Fix docstring

* Allow creation of asymmetric t5 students
2020-09-28 03:08:04 -04:00
Sam Shleifer
7296fea1d6
[s2s] rougeLSum expects \n between sentences (#7410)
Co-authored-by: Swetha Mandava <smandava@nvidia.com>
2020-09-27 16:27:19 -04:00
Suraj Patil
eab5f59682
[s2s] add create student script (#7290)
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-27 15:10:46 -04:00
Ola Piktus
fe326bd5cf
Remove dependency on examples/seq2seq from rag (#7395)
Co-authored-by: Your Name <you@example.com>
2020-09-25 18:20:49 +02:00
Quentin Lhoest
cf1c88e092
[RAG] Fix retrieval offset in RAG's HfIndex and better integration tests (#7372)
* Fix retrieval offset in RAG's HfIndex

* update slow tests

* style

* fix new test

* style

* add better tests

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-09-25 16:12:46 +02:00
Suraj Patil
415071b4c2
doc changes (#7385) 2020-09-25 08:00:36 -04:00
Patrick von Platen
2dd652d757
[RAG] Add missing doc and attention_mask to rag (#7382)
* add docs

* add missing docs and attention_mask in fine-tune
2020-09-25 11:23:55 +02:00
Suraj Patil
9e68d075a4
Seq2SeqTrainer (#6769)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-24 18:46:58 -04:00
Sam Shleifer
d9d0f1140b
[s2s] distributed eval allows num_return_sequences > 1 (#7254) 2020-09-24 17:30:09 -04:00
Patrick von Platen
0804d077c6
correct attention mask (#7373) 2020-09-24 23:22:04 +02:00
Stas Bekman
eadd870b2f
[seq2seq] make it easier to run the scripts (#7274) 2020-09-24 15:23:48 -04:00
Felipe Curti
d266613635
[Benchmarks] Change all args to from no_... to their positive form (#7075)
* Changed name to all no_... arguments and all references to them, inverting the boolean condition

* Change benchmark tests to use new Benchmark Args

* Update src/transformers/benchmark/benchmark_args_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/benchmark/benchmark.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix Style. Add --no options in help

* fix some part of tests

* Update src/transformers/benchmark/benchmark_args_utils.py

* Update src/transformers/benchmark/benchmark_args_utils.py

* Update src/transformers/benchmark/benchmark_args_utils.py

* fix all tests

* make style

* add backwards compatibility

* make backwards compatible

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: fmcurti <fcurti@DESKTOP-RRQURBM.localdomain>
2020-09-23 13:25:24 -04:00
Sam Shleifer
78387cc63e
[s2s] only save metrics.json from rank zero (#7331) 2020-09-22 18:27:28 -04:00
Sam Shleifer
e53138a1b9
[s2s] add src_lang kwarg for distributed eval (#7300) 2020-09-22 18:26:37 -04:00
Sam Shleifer
25b0463d0b
[s2s] add supported architectures to MD (#7252) 2020-09-22 13:09:35 -04:00
Ola Piktus
c754c41c61
RAG (#6813)
* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* Formatting / renaming prior to actual work

* First commit

* improve comments

* Retrieval evaluation scripts

* refactor to include modeling outputs + MPI retriever

* Fix rag-token model + refactor

* Various fixes + finetuning logic

* use_bos fix

* Retrieval refactor

* Finetuning refactoring and cleanup

* Add documentation and cleanup

* Remove set_up_rag_env.sh file

* Fix retrieval with HF index

* Fix import errors

* Fix quality errors

* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867

* fix quality

* Fix RAG Sequence generation

* minor cleanup plus initial tests

* fix test

* fix tests 2

* Comments fix

* post-merge fixes

* Improve readme + post-rebase refactor

* Extra dependencies for tests

* Fix tests

* Fix tests 2

* Refactor test requirements

* Fix tests 3

* Post-rebase refactor

* rename nlp->datasets

* RAG integration tests

* add tokenizer to slow integration test and allow retriever to run on cpu

* add tests; fix position ids warning

* change structure

* change structure

* add from encoder generator

* save working solution

* make all integration tests pass

* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained

* don't save paths

* delete unnecessary imports

* pass config to AutoTokenizer.from_pretrained for Rag tokenizers

* init wiki_dpr only once

* hardcode legacy index and passages paths (todo: add the right urls)

* finalize config

* finalize retriever api and config api

* LegacyIndex index download refactor

* add dpr to autotokenizer

* make from pretrained more flexible

* fix ragfortokengeneration

* small name changes in tokenizer

* add labels to models

* change default index name

* add retrieval tests

* finish token generate

* align test with previous version and make all tests pass

* add tests

* finalize tests

* implement thoms suggestions

* add first version of test

* make first tests work

* make retriever platform agnostic

* naming

* style

* add legacy index URL

* docstrings + simple retrieval test for distributed

* clean model api

* add doc_ids to retriever's outputs

* fix retrieval tests

* finish model outputs

* finalize model api

* fix generate problem for rag

* fix generate for other models

* fix some tests

* save intermediate

* set generate to default

* big refactor generate

* delete rag_api

* correct pip faiss install

* fix auto tokenization test

* fix faiss install

* fix test

* move the distributed logic to examples

* model page

* docs

* finish tests

* fix dependencies

* fix import in __init__

* Refactor eval_rag and finetune scripts

* start docstring

* add psutil to test

* fix tf test

* move require torch to top

* fix retrieval test

* align naming

* finish automodel

* fix repo consistency

* test ragtokenizer save/load

* add rag model output docs

* fix ragtokenizer save/load from pretrained

* fix tokenizer dir

* remove torch in retrieval

* fix docs

* fix finetune scripts

* finish model docs

* finish docs

* remove auto model for now

* add require torch

* remove solved todos

* integrate sylvains suggestions

* sams comments

* correct mistake on purpose

* improve README

* Add generation test cases

* fix rag token

* clean token generate

* fix test

* add note to test

* fix attention mask

* add t5 test for rag

* Fix handling prefix in finetune.py

* don't overwrite index_name

Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2020-09-22 18:29:58 +02:00
Julien Plu
585217c87f
Add generic text classification example in TF (#5716)
* Add new example with nlp

* Update README

* replace nlp by datasets

* Update examples/text-classification/README.md

Add Lysandre's suggestion.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 12:05:05 -04:00
Sam Shleifer
656c27c3a3
[s2s] save hostname with repo info (#7301)
* save hostname
2020-09-21 17:26:24 -04:00
Stas Bekman
af4b98ed97
[s2s] adjust finetune + test to work with fsmt (#7263) 2020-09-21 15:13:19 -04:00
Stas Bekman
8d562a2d1a
[s2s] s/alpha_loss_encoder/alpha_encoder_loss/ (#7298)
fix to match `distillation.py:        self.alpha_encoder_loss`
2020-09-21 14:14:26 -04:00
Stas Bekman
cbb2f75a16
[s2s tests] fix test_run_eval_search (#7297) 2020-09-21 14:00:40 -04:00
Lysandre
aae4edb5f0 Addressing review comment 2020-09-21 11:37:00 +02:00
Suraj Patil
43b9d93875
[example/glue] fix compute_metrics_fn for bart like models (#7248)
* fix compute_metrics_fn

* p.predictions -> preds

* apply suggestions
2020-09-21 05:34:20 -04:00
Stas Bekman
7cbf0f722d
examples/seq2seq/__init__.py mutates sys.path (#7194) 2020-09-20 16:54:42 -04:00
Sam Shleifer
83dba10b8f
[s2s] distributed_eval.py saves better speed info (#7242) 2020-09-18 15:46:01 -04:00
Stefan Schweter
ee9eae4e06
token-classification: update url of GermEval 2014 dataset (#6571) 2020-09-18 06:18:06 -04:00
Sam Shleifer
67d9fc50d9
[s2s] remove double assert (#7223) 2020-09-17 18:32:31 -04:00
Sam Shleifer
a5638b2b3a
[s2s] dynamic batch size with --max_tokens_per_batch (#7030) 2020-09-17 15:19:34 -04:00
Stas Bekman
efeab6a3f1
[s2s] run_eval/run_eval_search tweaks (#7192)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-17 14:26:38 -04:00
Stas Bekman
1eeb206bef
[ported model] FSMT (FairSeq MachineTranslation) (#6940)
* ready for PR

* cleanup

* correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST

* fix

* perfectionism

* revert change from another PR

* odd, already committed this one

* non-interactive upload workaround

* backup the failed experiment

* store langs in config

* workaround for localizing model path

* doc clean up as in https://github.com/huggingface/transformers/pull/6956

* style

* back out debug mode

* document: run_eval.py --num_beams 10

* remove unneeded constant

* typo

* re-use bart's Attention

* re-use EncoderLayer, DecoderLayer from bart

* refactor

* send to cuda and fp16

* cleanup

* revert (moved to another PR)

* better error message

* document run_eval --num_beams

* solve the problem of tokenizer finding the right files when model is local

* polish, remove hardcoded config

* add a note that the file is autogenerated to avoid losing changes

* prep for org change, remove unneeded code

* switch to model4.pt, update scores

* s/python/bash/

* missing init (but doesn't impact the finetuned model)

* cleanup

* major refactor (reuse-bart)

* new model, new expected weights

* cleanup

* cleanup

* full link

* fix model type

* merge porting notes

* style

* cleanup

* have to create a DecoderConfig object to handle vocab_size properly

* doc fix

* add note (not a public class)

* parametrize

* - add bleu scores integration tests

* skip test if sacrebleu is not installed

* cache heavy models/tokenizers

* some tweaks

* remove tokens that aren't used

* more purging

* simplify code

* switch to using decoder_start_token_id

* add doc

* Revert "major refactor (reuse-bart)"

This reverts commit 226dad15ca.

* decouple from bart

* remove unused code #1

* remove unused code #2

* remove unused code #3

* update instructions

* clean up

* move bleu eval to examples

* check import only once

* move data+gen script into files

* reuse via import

* take less space

* add prepare_seq2seq_batch (auto-tested)

* cleanup

* recode test to use json instead of yaml

* ignore keys not needed

* use the new -y in transformers-cli upload -y

* [xlm tok] config dict: fix str into int to match definition (#7034)

* [s2s] --eval_max_generate_length (#7018)

* Fix CI with change of name of nlp (#7054)

* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last

* extending to support allen_nlp wmt models

- allow a specific checkpoint file to be passed
- more arg settings
- scripts for allen_nlp models

* sync with changes

* s/fsmt-wmt/wmt/ in model names

* s/fsmt-wmt/wmt/ in model names (p2)

* s/fsmt-wmt/wmt/ in model names (p3)

* switch to a better checkpoint

* typo

* make non-optional args such - adjust tests where possible or skip when there is no other choice

* consistency

* style

* adjust header

* cards moved (model rename)

* use best custom hparams

* update info

* remove old cards

* cleanup

* s/stas/facebook/

* update scores

* s/allen_nlp/allenai/

* url maps aren't needed

* typo

* move all the doc / build /eval generators to their own scripts

* cleanup

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix indent

* duplicated line

* style

* use the correct add_start_docstrings

* oops

* resizing can't be done with the core approach, due to 2 dicts

* check that the arg is a list

* style

* style

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-17 11:31:29 -04:00
RafaelWO
709745927b
Transformer-XL: Remove unused parameters (#7087)
* Removed 'tgt_len' and 'ext_len' from Transformer-XL

 * Some changes are still to be done

* Removed 'tgt_len' and 'ext_len' from Transformer-XL (2)

 * Removed comments
 * Fixed quality

* Changed warning to info
2020-09-17 06:10:34 -04:00
Sam Shleifer
45b0b1ff2f
[s2s] fix kwarg typo (#7196) 2020-09-16 21:58:57 -04:00
Sam Shleifer
0203ad43bc
[s2s] distributed eval cleanup (#7186) 2020-09-16 15:38:37 -04:00
sgugger
3babef815c Formatting 2020-09-16 14:57:09 -04:00
Stas Bekman
fdaf8ab349
[s2s run_eval] new features (#7109)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-16 13:59:57 -04:00
Stas Bekman
b0cbcdb05b
[logging] remove no longer needed verbosity override (#7100) 2020-09-15 04:01:14 -04:00
Sam Shleifer
33d479d2b2
[s2s] distributed eval in one command (#7124) 2020-09-14 15:57:56 -04:00
Antonio V Mendoza
e0e0675ac7
Demoing LXMERT with raw images by incorporating the FRCNN model for roi-pooled extraction and bounding-box prediction on the GQA answer set. (#6986)
* adding demo

* Update examples/lxmert/requirements.txt

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update examples/lxmert/checkpoint.sh

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* added user input for .py demo

* updated model loading, data extraction, checkpoints, and lots of other automation

* adding normalizing for bounding boxes

* Update requirements.txt

* some optimizations for extracting data

* added data extracting file

* added data extraction file

* minor fixes to reqs and readme

* Style

* remove options

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-09-14 10:07:04 -04:00
Stas Bekman
3ca1874ca4
[examples testing] restore code (#7099)
For some reason https://github.com/huggingface/transformers/pull/5512 re-added temp dir creation code that was removed by
https://github.com/huggingface/transformers/pull/6494 defeating the purpose of that PR for those tests.
2020-09-14 08:54:23 -04:00
Lysandre Debut
bb3106f741
Temporarily skip failing tests due to dependency change (#7118)
* Temporarily skip failing tests due to dependency change

* Remove trace
2020-09-14 07:42:13 -04:00
Sam Shleifer
0fab39695a
[s2s distill] allow pegasus-12-12 (#7104) 2020-09-14 00:03:59 -04:00
Sam Shleifer
de9e297964
[s2s] distributed eval cleanup (#7110) 2020-09-13 23:40:38 -04:00
Sam Shleifer
e7f8d2ab64
[s2s] two stage run_distributed_eval.py (#7105) 2020-09-13 17:28:18 -04:00
Sam Shleifer
b76cb1c3df
[s2s] run_eval supports --prefix clarg. (#6953) 2020-09-12 01:08:21 -04:00
Sam Shleifer
77950c485a
[wip/s2s] DistributedSortishSampler (#7056) 2020-09-10 15:23:44 -04:00
Sylvain Gugger
514486739c
Fix CI with change of name of nlp (#7054)
* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last
2020-09-10 14:51:08 -04:00
Sam Shleifer
e9a2f772bc
[s2s] --eval_max_generate_length (#7018) 2020-09-10 14:11:34 -04:00
Manuel Romero
1b76936d1a
Fix typo (#6994) 2020-09-08 04:22:57 -04:00
Lysandre
1650130b0f Remove misleading docstring 2020-09-07 14:16:59 +02:00
Boris Dayma
995a958dd1
feat: allow prefix for any generative model (#5885)
* feat: allow padding_text for any generative model

* docs(pipelines.py): correct typo

* Update src/transformers/pipelines.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* feat: rename padding_text to prefix

* fix: cannot tokenize empty text

* fix: pass prefix arg to pipeline

* test: add prefix to text-generation pipeline

* style: fix style

* style: clean code and variable name more explicit

* set arg docstring to optional

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-07 03:03:45 -04:00
Sam Shleifer
ce37be9d94
[s2s] warn if --fp16 for torch 1.6 (#6977) 2020-09-06 20:41:29 -04:00
Stas Bekman
48ff6d5109
[doc] remove the implied defaults to :obj:`None`, s/True/:obj:`True`/, etc. (#6956)
* remove the implied defaults to :obj:`None`

* fix bug in the original

* replace to :obj:`True`, :obj:`False`
2020-09-04 18:22:25 -04:00
Sam Shleifer
a4fc0c80b1
[s2s] run_eval.py parses generate_kwargs (#6948) 2020-09-04 14:19:31 -04:00
Sam Shleifer
6078b12098
[s2s] distill: --normalize_hidden --supervise_forward (#6834) 2020-09-04 14:05:56 -04:00
Sam Shleifer
e95d262f25
[s2s] support early stopping based on loss, rather than rouge (#6927) 2020-09-03 17:31:35 -04:00
Sam Shleifer
207ed8cb78
[s2s] use --eval_beams command line arg (#6926) 2020-09-03 12:42:09 -04:00
Sam Shleifer
39ed68d597
[s2s] allow task_specific_params=summarization_xsum (#6923) 2020-09-03 11:11:40 -04:00
Sam Shleifer
5a318f075a
[s2s]: script to convert pl checkpoints to hf checkpoints (#6911)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-03 09:47:00 -04:00
brett koonce
b8e4906c97
tweak tar command in readme (#6919) 2020-09-03 09:29:01 -04:00
Jin Young (Daniel) Sohn
21d719238c
Add cache_dir to save features TextDataset (#6879)
* Add cache_dir to save features TextDataset

This covers the case where the dataset is on a read-only filesystem, as
happens in tests (GKE TPU tests).

* style
2020-09-01 11:42:17 -04:00
Sam Shleifer
431ab19d7a
[fix] typo in available in helper function (#6859) 2020-08-31 17:59:34 -04:00
Sam Shleifer
b9772897ec
[s2s] command line args for faster val steps (#6833) 2020-08-31 16:16:10 -04:00
Sam Shleifer
61b7ba93f5
Marian distill scripts + integration test (#6799) 2020-08-31 13:48:26 -04:00
Sam Shleifer
dfa10a41ba
[s2s README] Add more dataset download instructions (#6737) 2020-08-30 16:29:24 -04:00
xujiaze13
32fe44086c
clearly indicate shuffle=False (#6312)
* Clarify shuffle

* clarify shuffle

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-30 19:26:10 +08:00
Sam Shleifer
0f58903bb6
Pegasus finetune script: add --adafactor (#6811) 2020-08-29 17:43:32 -04:00
Sam Shleifer
ac47458a02
[s2s] round runtime in run_eval (#6798) 2020-08-29 17:36:31 -04:00
Sam Shleifer
5ab21b072f
[s2s] Test hub configs in self-scheduled CI (#6809) 2020-08-28 17:05:52 -04:00
Sam Shleifer
9336086ab5
prepare_seq2seq_batch makes labels; decoder_input_ids are made later. (#6654)
* broken test

* batch parity

* tests pass

* boom boom

* boom boom

* split out bart tokenizer tests

* fix tests

* boom boom

* Fixed dataset bug

* Fix marian

* Undo extra

* Get marian working

* Fix t5 tok tests

* Test passing

* Cleanup

* better assert msg

* require torch

* Fix mbart tests

* undo extra decoder_attn_mask change

* Fix import

* pegasus tokenizer can ignore src_lang kwargs

* unused kwarg test cov

* boom boom

* add todo for pegasus issue

* cover one word translation edge case

* Cleanup

* doc
2020-08-28 11:15:17 -04:00
Sam Shleifer
fb78a90d6a
PL: --adafactor option (#6776) 2020-08-27 22:19:46 -04:00
Tom Grek
c225e872ed
Fix it to work with BART (#6756) 2020-08-27 09:04:50 -04:00
Julien Plu
6f289dc97a
Fix the TF Trainer gradient accumulation and the TF NER example (#6713)
* Align TF NER example over the PT one

* Fix Dataset call

* Fix gradient accumulation training

* Apply style

* Address Sylvain's comments

* Address Sylvain's comments

* Apply style
2020-08-27 08:45:34 -04:00
Sam Shleifer
4bd7be9a42
s2s distillation uses AutoModelForSeq2SeqLM (#6761) 2020-08-26 23:25:11 -04:00
Sam Shleifer
61518e2df3
[s2s] run_eval.py QOL improvements and cleanup (#6746) 2020-08-26 18:59:20 -04:00
Lysandre
a75c64d80c Black 20 release 2020-08-26 17:20:22 +02:00
Joel Hanson
4db2fa77d7
Allow tests in examples to use cuda or fp16, if they are available (#5512)
* Allow tests in examples to use cuda or fp16, if they are available

The tests in examples didn't use cuda or fp16 even if they were available.
- The text classification example (`run_glue.py`) didn't use fp16 even if it was available, but
  the device was chosen based on availability (cuda/cpu).
- The language-modeling example (`run_language_modeling.py`) had a `--no_cuda` argument
  which made the test run without cuda. This example has an issue when running with fp16,
  so fp16 is not enabled (it hit an assertion error on perplexity, due to its higher value).
- cuda and fp16 are not enabled for the question-answering example (`run_squad.py`), as it shows a
  difference in the f1 score.
- The text-generation example (`run_generation.py`) will use cuda or fp16 whenever they are available.

Resolves some of: #5057

* Unwanted import of is_apex_available was removed

* Made changes to the test examples file to pass --fp16 only if cuda and apex are available
- run_glue.py: Removed the check for cuda and fp16.
- run_generation.py: Removed the check for cuda and fp16 also removed unwanted flag creation.

* Incorrectly sorted imports fixed

* The model needs to be converted to half precision

* Formatted single line if condition statement to multiline

* The torch_device also needed to be checked before running the test on examples
- The tests in examples which use cuda should also depend on the USE_CUDA flag,
  similarly to the rest of the test suite. Even if we decide to set USE_CUDA to
  True by default, setting USE_CUDA to False should result in the examples not using CUDA

* Format some of the code in test_examples file

* The improper import of is_apex_available was sorted

* Formatted the code to keep the style standards

* The comma at the end of list giving a flake8 issue was fixed

* Import sort was fixed

* Removed the clean_test_dir function as it's not used right now
2020-08-25 06:02:07 -04:00
Sam Shleifer
0344428f79
[s2s] round bleu, rouge to 4 digits (#6704) 2020-08-25 00:33:11 -04:00
vblagoje
dd522da004
Fix PL token classification examples (#6682) 2020-08-24 11:30:06 -04:00
Sylvain Gugger
a573777901
Update repo to isort v5 (#6686)
* Run new isort

* More changes

* Update CI, CONTRIBUTING and benchmarks
2020-08-24 11:03:01 -04:00
Suraj Patil
6f972e1423
update xnli-mt url (#6580) 2020-08-18 13:10:47 -04:00
Sam Shleifer
d2da2cb232
allow spaces in bash args with "$@" (#6521) 2020-08-17 09:06:35 -04:00
Stas Bekman
9dbe4094f2
[testing] a new TestCasePlus subclass + get_auto_remove_tmp_dir() (#6494)
* [testing] switch to a new TestCasePlus + get_auto_remove_tmp_dir() for auto-removal of tmp dirs

* respect after=True for tempfile, simplify code

* comments

* comment fix

* put `before` last in args, so can make debug even faster
2020-08-17 08:12:19 -04:00
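Note on #6494: the commit above describes a `TestCasePlus` subclass whose `get_auto_remove_tmp_dir()` hands a test a temporary directory and removes it automatically. The following is only a minimal standard-library sketch of that idea, not the code from the PR. The class name `TmpDirTestCase`, the method body, and the exact meaning given to `after` and `before` are assumptions made for illustration; only the method name and the fact that `before` comes last in the arguments are taken from the commit message.

```python
import shutil
import tempfile
import unittest
from pathlib import Path


class TmpDirTestCase(unittest.TestCase):
    """Hypothetical stand-in for the TestCasePlus idea (not the PR's code)."""

    def get_auto_remove_tmp_dir(self, tmp_dir=None, after=True, before=False):
        # Assumed semantics: `after` removes the directory once the test ends,
        # `before` empties a caller-chosen directory before the test uses it.
        if tmp_dir is None:
            # No path given: create a fresh, uniquely named directory.
            path = Path(tempfile.mkdtemp())
        else:
            path = Path(tmp_dir).resolve()
            if before and path.exists():
                shutil.rmtree(path, ignore_errors=True)
            path.mkdir(parents=True, exist_ok=True)
        if after:
            # unittest runs cleanups even when the test fails.
            self.addCleanup(shutil.rmtree, path, ignore_errors=True)
        return str(path)


class ExampleTest(TmpDirTestCase):
    def test_writes_output(self):
        out_dir = Path(self.get_auto_remove_tmp_dir())
        (out_dir / "metrics.json").write_text("{}")
        self.assertTrue((out_dir / "metrics.json").exists())
```

Passing a fixed `tmp_dir` with `after=False` keeps the directory around for inspection, which is one plausible reading of the "make debug even faster" remark in the commit message.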
Sam Shleifer
84c265ffcc
[lightning_base] fix s2s logging, only make train_loader once (#6404) 2020-08-16 22:49:41 -04:00
Sam Shleifer
72add6c98f
[s2s] docs, document desired filenames nicely (#6525) 2020-08-16 20:31:22 -04:00
Kyle Piira
2060181126
Fixes paths with spaces in seq2seq example (#6493) 2020-08-16 13:36:38 -04:00
Kevin Canwen Xu
eb613b566a
Use hash to clean the test dirs (#6475)
* Use hash to clean the test dirs

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* fix
2020-08-14 15:34:39 +08:00
Kevin Canwen Xu
7bc00569df
Clean directory after script testing (#6453)
* Clean Dir after testing

* remove pabee ignore
2020-08-14 00:34:03 +08:00
Sam Shleifer
e92efcf728
Mult rouge by 100: standard units (#6359) 2020-08-13 12:15:54 -04:00
vblagoje
eda07efaa5
Add POS tagging and Phrase chunking token classification examples (#6457)
* Add more token classification examples

* POS tagging example

* Phrase chunking example

* PR review fixes

* Add conllu to third party list (used in token classification examples)
2020-08-13 12:09:51 -04:00
Sam Shleifer
f94a52cd79
[s2s] add BartTranslationDistiller for distilling mBART (#6363) 2020-08-12 11:41:04 -04:00
Stas Bekman
87b359439f
[test] replace capsys with the more refined CaptureStderr/CaptureStdout (#6422)
* replace capsys with the more refined CaptureStderr/CaptureStdout

* Update examples/seq2seq/test_seq2seq_examples.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-12 07:54:28 -04:00
Lysandre Debut
4ffea5ce2f
Disabled pabee test (#6431) 2020-08-12 02:52:50 -04:00
Sam Shleifer
3f071c4b6e
[examples] add pytest dependency (#6425) 2020-08-11 17:58:09 -04:00
Stas Bekman
ece0903e11
lr_schedulers: add get_polynomial_decay_schedule_with_warmup (#6361)
* [wip] add get_polynomial_decay_schedule_with_warmup

* style

* add assert

* change lr_end to a much smaller default number

* check for exact equality

* [model_cards] electra-base-turkish-cased-ner (#6350)

* for electra-base-turkish-cased-ner

* Add metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Temporarily de-activate TPU CI

* Update modeling_tf_utils.py (#6372)

fix typo: ckeckpoint->checkpoint

* the test now works again (#6371)

* correct pl link in readme (#6364)

* refactor almost identical tests (#6339)

* refactor almost identical tests

* important to add a clear assert error message

* make the assert error even more descriptive than the original bt

* Small docfile fixes (#6328)

* Patch models (#6326)

* TFAlbertFor{TokenClassification, MultipleChoice}

* Patch models

* BERT and TF BERT info


* Update check_repo

* Ci GitHub caching (#6382)

* Cache Github Actions CI

* Remove useless file

* Colab button (#6389)

* Add colab button

* Add colab link for tutorials

* Fix links for open in colab (#6391)

* Update src/transformers/optimization.py

consistently use lr_end=1e-7 default

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [wip] add get_polynomial_decay_schedule_with_warmup

* style

* add assert

* change lr_end to a much smaller default number

* check for exact equality

* Update src/transformers/optimization.py

consistently use lr_end=1e-7 default

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove dup (leftover from merge)

* convert the test into the new refactored format

* stick to using the current_step as is, without ++

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Alexander Measure <ameasure@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-11 17:56:41 -04:00
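Note on #6361: the commit above adds `get_polynomial_decay_schedule_with_warmup`. As a rough illustration of what such a schedule computes, here is a dependency-free sketch that returns the learning rate for a given step: linear warmup from 0 to the initial rate, then polynomial decay down to `lr_end`. The function name `polynomial_decay_lr`, its parameter names, and the example values are assumptions for illustration; only the `lr_end=1e-7` default and the use of the current step as-is come from the commit message. The library function itself returns a scheduler object rather than a plain number.

```python
def polynomial_decay_lr(step, *, lr_init, num_warmup_steps, num_training_steps,
                        lr_end=1e-7, power=1.0):
    """Learning rate at `step` (hypothetical helper, not the library API)."""
    if step < num_warmup_steps:
        # Linear warmup from 0 up to lr_init.
        return lr_init * step / max(1, num_warmup_steps)
    if step > num_training_steps:
        # Past the end of training the rate stays at its floor.
        return lr_end
    # Fraction of the decay phase still remaining, in [0, 1].
    decay_steps = max(1, num_training_steps - num_warmup_steps)
    pct_remaining = 1 - (step - num_warmup_steps) / decay_steps
    return (lr_init - lr_end) * pct_remaining ** power + lr_end


# Example values (assumed): 10 warmup steps, 110 training steps in total.
schedule = [
    polynomial_decay_lr(s, lr_init=1e-3, num_warmup_steps=10, num_training_steps=110)
    for s in (0, 5, 10, 60, 110)
]
```

With `power=1.0` the decay phase is linear; larger powers drop the rate faster early in the decay phase.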
Stas Bekman
0203d6517f
[pl] restore lr logging behavior for glue, ner examples (#6314) 2020-08-11 16:27:11 -04:00
Sam Shleifer
be1520d3a3
rename prepare_translation_batch -> prepare_seq2seq_batch (#6103) 2020-08-11 15:57:07 -04:00
Sam Shleifer
66fa8ceaea
PegasusForConditionalGeneration (torch version) (#6340)
Co-authored-by: Jingqing  Zhang <jingqing.zhang15@imperial.ac.uk>
2020-08-11 14:31:23 -04:00
Stas Bekman
f6cb0f806e
[s2s] wmt download script use less ram (#6405) 2020-08-11 12:04:17 -04:00
Stas Bekman
7c6a085ebf
pl version: examples/requirements.txt is single source of truth (#6309) 2020-08-11 10:58:54 -04:00
Stas Bekman
f6c0680d36
add pl_glue example test (#6034)
* add pl_glue example test

* for now just test that it runs, next validate results of eval or predict?

* complete the run_pl_glue test to validate the actual outcome

* worked on my machine, CI gets less accuracy - trying higher epochs

* match run_pl.sh hparms

* more epochs?

* trying higher lr

* for now just test that the script runs to a completion

* correct the comment

* if cuda is available, add --fp16 --gpus=1 to cover more bases

* style
2020-08-11 03:16:52 -04:00
Sam Shleifer
b9ecd92ee4
[s2s] Script to save wmt data to disk (#6403) 2020-08-10 22:49:39 -04:00
Rohit Gupta
35eb96de4d
correct pl link in readme (#6364) 2020-08-10 03:08:46 -04:00
Stas Bekman
0830e79512
the test now works again (#6371) 2020-08-10 02:55:52 -04:00
Sam Shleifer
9a5ef83748
[s2s] fix --gpus clarg collision (#6358) 2020-08-08 21:51:37 -04:00
Suraj Patil
9bed355449
[s2s] fix label_smoothed_nll_loss (#6344) 2020-08-08 04:21:12 -04:00
Sam Shleifer
99f73bcc71
[s2s] tiny QOL improvement: run_eval prints scores (#6341) 2020-08-08 02:45:55 -04:00
Stas Bekman
322dffc6c9
remove a TODO item to use a tiny model (#6338)
as discussed with @sshleifer, removing this TODO to switch to a tiny model, since it won't be able to test the results of the evaluation (i.e. the results are meaningless).
2020-08-07 21:30:39 -04:00
zcain117
1b8a7ffcfd
Add setup for TPU CI to run every hour. (#6219)
* Add setup for TPU CI to run every hour.

* Re-organize config.yml

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-07 11:17:07 -04:00
Stas Bekman
6695450a23
[examples] consistently use --gpus, instead of --n_gpu (#6315) 2020-08-07 10:36:32 -04:00
Stas Bekman
175cd45e13
fix the shuffle argument usage and the default (#6307) 2020-08-06 20:32:28 -04:00
Bhashithe Abeysinghe
ffceef2042
[Fix] text-classification PL example (#6027)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-06 15:46:43 -04:00
xujiaze13
eb2bd8d6eb
Remove redundant line in run_pl_glue.py (#6305) 2020-08-06 15:43:45 -04:00
Sam Shleifer
2804fff839
[s2s]Use prepare_translation_batch for Marian finetuning (#6293)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-06 14:58:38 -04:00
Doug Blank
b923871bb7
Adds comet_ml to the list of auto-experiment loggers (#6176)
* Support for Comet.ml

* Need to import comet first

* Log this model, not the one in the backprop step

* Log args as hyperparameters; use framework to allow fine control

* Log hyperparameters with context

* Apply black formatting

* isort fix integrations

* isort fix __init__

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer_tf.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address review comments

* Style + Quality, remove Tensorboard import test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-06 11:31:30 -04:00
Stas Bekman
376c02e9a9
[WIP] lightning_base: support --lr_scheduler with multiple possibilities (#6232)
* support --lr_scheduler with multiple possibilities

* correct the error message

* add a note about supported schedulers

* cleanup

* cleanup2

* needs the argument default

* style

* add another assert in the test

* implement requested changes

* cleanups

* fix relative import

* cleanup
2020-08-05 09:01:17 -04:00
Sam Shleifer
57eb1cb68d
[s2s] Document better mbart finetuning command (#6229)
* Document better MT command

* improve multigpu command
2020-08-03 18:22:31 -04:00
Victor SANH
0513f8d275
correct label extraction + add note on discrepancies on trained MNLI model and HANS (#6221) 2020-08-03 15:02:51 -04:00
Sam Shleifer
b6b2f2270f
s2s: fix LR logging, remove some dead code. (#6205) 2020-08-03 10:36:26 -04:00
Stas Bekman
d8dbf3b75d
[s2s] clean up + doc (#6184)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-01 14:51:07 -04:00
Stas Bekman
f250beb8aa
enable easy checkout switch (#5645)
* enable easy checkout switch

allow having multiple repository checkouts and not needing to remember to rerun 'pip install -e .[dev]' when switching between checkouts and running tests.

* make isort happy

* examples needs one too
2020-07-31 04:34:46 -04:00
Sylvain Gugger
91cb95461e
Switch from return_tuple to return_dict (#6138)
* Switch from return_tuple to return_dict

* Fix test

* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)

* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice

* Rework TF trainer (#6038)

* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre's comments

* Trigger CI

* Remove unused import

* Switch from return_tuple to return_dict

* Fix test

* Add recent model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
2020-07-30 09:17:00 -04:00
Stas Bekman
3212b8850d
[s2s] add support for overriding config params (#6149) 2020-07-30 01:09:46 -04:00
Julien Plu
54f9fbeff8
Rework TF trainer (#6038)
* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre's comments

* Trigger CI

* Remove unused import
2020-07-29 14:32:01 -04:00
Lysandre Debut
641b873c13
XLNet PLM Readme (#6121) 2020-07-29 11:38:15 -04:00
Sam Shleifer
92f8ce2ed6
Fix deebert tests (#6102) 2020-07-28 18:30:16 -04:00
Sam Shleifer
dafa296c95
[s2s] Delete useless method, log tokens_per_batch (#6081) 2020-07-28 11:24:23 -04:00
Stas Bekman
f0c70085c2
link to README.md (#6068)
* add a link to README.md

* Update README.md
2020-07-28 20:34:58 +08:00
Sam Shleifer
3c7fbf35a6
MBART: support summarization tasks where max_src_len > max_tgt_len (#6003)
* MBART: support summarization tasks

* fix test

* Style

* add tokenizer test
2020-07-28 08:18:11 -04:00
Sam Shleifer
7a68d40138
[s2s] Don't mention packed data in README (#6079) 2020-07-27 20:07:21 -04:00
Sam Shleifer
1e00ef681d
[s2s] don't document packing because it hurts performance (#6077) 2020-07-27 18:26:00 -04:00
Sam Shleifer
11792d7826
CL util to convert models to fp16 before upload (#5953) 2020-07-27 12:21:25 -04:00
Sam Shleifer
4302ace5bd
[pack_dataset] don't sort before packing, only pack train (#5954) 2020-07-27 12:14:23 -04:00
Suraj Patil
d1d15d6f2d
[examples (seq2seq)] fix preparing decoder_input_ids for T5 (#5994) 2020-07-27 10:10:43 -04:00
Sam Shleifer
c69ea5efc4
[CI] Don't test apex (#6021) 2020-07-24 15:34:16 -04:00
Sam Shleifer
c3206eef44
[test] partial coverage for train_mbart_enro_cc25.sh (#5976) 2020-07-22 14:34:49 -04:00
Sam Shleifer
feeb956a19
[docs] Add integration test example to copy pasta template (#5961)
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-22 12:48:38 -04:00
Sam Shleifer
9dab39feea
seq2seq/run_eval.py can take decoder_start_token_id (#5949) 2020-07-21 16:58:45 -04:00
Sam Shleifer
5b193b39b0
[examples/seq2seq]: add --label_smoothing option (#5919) 2020-07-21 16:51:39 -04:00
Sam Shleifer
95d1962b9c
[Doc] explaining romanian postprocessing for MBART BLEU hacking (#5943) 2020-07-21 14:12:48 -04:00
Aditya Soni
ccbf74a685
typos in seq2seq/readme (#5937) 2020-07-21 09:44:59 -04:00
Qingqing Cao
8e0bcb56ec
DataParallel fix: multi gpu evaluation (#5926)
DataParallel training was fixed in https://github.com/huggingface/transformers/pull/5733; this commit also fixes evaluation. This is more convenient when the user enables both `do_train` and `do_eval`.
2020-07-20 17:54:08 -04:00
Sam Shleifer
f1a4e06f1f
[Fix] seq2seq pack_dataset.py actually packs (#5913)
Huge MT speedup!
2020-07-20 15:18:26 -04:00
Stas Bekman
35cb101eae
DataParallel fixes (#5733)
* DataParallel fixes:

1. switched to a more precise check
-        if self.args.n_gpu > 1:
+        if isinstance(model, nn.DataParallel):

2. fix tests - require the same fixup under DataParallel as the training module

* another fix
2020-07-20 09:29:12 -04:00
Sam Shleifer
09a2f40684
Seq2SeqDataset uses linecache to save memory by @Pradhy729 (#5792)
Co-authored-by: Pradhy729 <49659913+Pradhy729@users.noreply.github.com>
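A rough sketch of the idea named in the commit title, not the actual `Seq2SeqDataset` code: lines are fetched by number with the standard-library `linecache` module instead of building every example up front. The class name `LazyLineDataset` and the file name `train.source` are hypothetical. Note that `linecache` still caches the raw text of a file once it has been read, so any saving presumably comes from not holding pre-processed examples; that reading is an assumption.

```python
import linecache
import os
import tempfile


class LazyLineDataset:
    """Hypothetical dataset that fetches raw lines on demand by line number."""

    def __init__(self, path):
        self.path = path
        # Count lines once; no examples are built or stored here.
        with open(path, encoding="utf-8") as f:
            self.num_lines = sum(1 for _ in f)

    def __len__(self):
        return self.num_lines

    def __getitem__(self, index):
        # linecache.getline is 1-indexed and returns "" for a missing line.
        return linecache.getline(self.path, index + 1).rstrip("\n")


def demo():
    # Write a tiny three-line file and read its second line through the dataset.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.source")
        with open(path, "w", encoding="utf-8") as f:
            f.write("first example\nsecond example\nthird example\n")
        dataset = LazyLineDataset(path)
        result = (len(dataset), dataset[1])
        linecache.clearcache()
        return result
```

Tokenization would then happen per item (or per batch) rather than for the whole file at construction time.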
2020-07-18 13:57:33 -04:00
Sam Shleifer
dad5e12e54
[seq2seq] distillation.py accepts trainer arguments (#5865) 2020-07-18 07:43:57 -04:00
Sam Shleifer
ba2400189b
[seq2seq] MAX_LEN env var for MT commands (#5837) 2020-07-17 22:51:31 -04:00
Nathan Raw
529850ae7b
Lightning Updates for v0.8.5 (#5798)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-07-17 22:43:06 -04:00
Sam Shleifer
e238e3d55a
[seq2seq] Don't copy self.source in sortishsampler (#5818) 2020-07-17 01:53:25 -04:00
Sam Shleifer
283500ff9f
[seq2seq] pack_dataset.py rewrites dataset in max_tokens format (#5819) 2020-07-16 14:06:49 -04:00
Sam Shleifer
1a647abf0b
[fix] check code quality (#5772) 2020-07-15 14:59:38 -04:00
Sam Shleifer
d0486c8bc2
[cleanup] T5 test, warnings (#5761) 2020-07-15 08:23:22 -04:00
Boris Dayma
4d5a8d6557
docs(wandb): explain how to use W&B integration (#5607)
* docs(wandb): explain how to use W&B integration

fix #5262

* Also mention TensorBoard

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-14 05:12:33 -04:00
Julien Chaumond
201d23f285 Update The Big Table of Tasks
Co-Authored-By: Suraj Patil <surajp815@gmail.com>
Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-10 18:07:29 +02:00
Lysandre Debut
0533cf4706
Test XLA examples (#5583)
* Test XLA examples

* Style

* Using `require_torch_tpu`

* Style

* No need for pytest
2020-07-09 09:19:19 -04:00
Ji Xin
cfbb982974
Add DeeBERT (entropy-based early exiting for *BERT) (#5477)
* Add deebert code

* Add readme of deebert

* Add test for deebert

Update test for Deebert

* Update DeeBert (README, class names, function refactoring); remove requirements.txt

* Format update

* Update test

* Update readme and model init methods
2020-07-08 08:17:59 +08:00
Patrick von Platen
fde217c679
readme for benchmark (#5363) 2020-07-07 23:21:23 +02:00
Sam Shleifer
353b8f1e7a
Add mbart-large-cc25, support translation finetuning (#5129)
improve unittests for finetuning, especially w.r.t testing frozen parameters
fix freeze_embeds for T5
add streamlit setup.cfg
2020-07-07 13:23:01 -04:00
Patrick von Platen
4dc65591b5
[Almost all TF models] TF clean up: add missing CLM / MLM loss; fix T5 naming and keras compile (#5395)
* add first version of clm tf

* make style

* add more tests for bert

* update tf clm loss

* fix tests

* correct tf ner script

* add mlm loss

* delete bogus file

* clean tf auto model + add tests

* finish adding clm loss everywhere

* fix training in distilbert

* fix flake8

* save intermediate

* fix tf t5 naming

* remove prints

* finish up

* up

* fix tf gpt2

* fix new test utils import

* fix flake8

* keep backward compatibility

* Update src/transformers/modeling_tf_albert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_electra.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_roberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_mobilebert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_distilbert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply sylvains suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-07 18:15:53 +02:00
Suraj Patil
e49393c361
[examples] Add trainer support for question-answering (#4829)
* add SquadDataset

* add DataCollatorForQuestionAnswering

* update __init__

* add run_squad with trainer

* add DataCollatorForQuestionAnswering in __init__

* pass data_collator to trainer

* doc tweak

* Update run_squad_trainer.py

* Update __init__.py

* Update __init__.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-07 08:57:08 -04:00
Shashank Gupta
3dcb748e31
Added data collator for permutation (XLNet) language modeling and related calls (#5522)
* Added data collator for XLNet language modeling and related calls

Added DataCollatorForXLNetLanguageModeling in data/data_collator.py
to generate necessary inputs for language modeling training with
XLNetLMHeadModel. Also added related arguments, logic and calls in
examples/language-modeling/run_language_modeling.py.

Resolves: #4739, #2008 (partially)

* Changed name to `DataCollatorForPermutationLanguageModeling`

Changed the name of `DataCollatorForXLNetLanguageModeling` to the more general `DataCollatorForPermutationLanguageModeling`.
Removed the `--mlm` flag requirement for the new collator and defined a separate `--plm_probability` flag for its use.
CTRL uses a CLM loss just like GPT and GPT-2, so should work out of the box with this script (provided `past` is taken care of
similar to `mems` for XLNet).
Changed calls and imports appropriately.

* Added detailed comments, changed variable names

Added more detailed comments to `DataCollatorForPermutationLanguageModeling` in `data/data_collator.py` to explain how it works. Also cleaned up variable names and made them more informative.

* Added tests for new data collator

Added tests in `tests/test_trainer.py` for DataCollatorForPermutationLanguageModeling based on those in DataCollatorForLanguageModeling. A specific test has been added to check for odd-length sequences.

* Fixed styling issues
2020-07-07 10:17:37 +02:00
Lysandre Debut
9d9b872b66
The add_space_before_punct_symbol is only for TransfoXL (#5549) 2020-07-06 12:17:05 -04:00
Sylvain Gugger
734a28a767
Clean up diffs in Trainer/TFTrainer (#5417)
* Cleanup and unify Trainer/TFTrainer

* Forgot to adapt TFTrainingArgs

* In tf scripts n_gpu -> n_replicas

* Update src/transformers/training_args.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Formatting

* Fix typo

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-01 11:00:20 -04:00
Sam Shleifer
13deb95a40
Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
Sylvain Gugger
4ade7491f4
Fix examples titles and optimization doc page (#5408) 2020-07-01 08:11:25 -04:00
Hong Xu
501040fd30
In the run_ner.py example, give the optional label arg a default value (#5326)
Otherwise, if label is not specified, the following error occurs:

	Traceback (most recent call last):
	  File "run_ner.py", line 303, in <module>
	    main()
	  File "run_ner.py", line 101, in main
	    model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
	  File "/home/user/anaconda3/envs/bert/lib/python3.7/site-packages/transformers/hf_argparser.py", line 159, in parse_json_file
	    obj = dtype(**inputs)
	TypeError: __init__() missing 1 required positional argument: 'labels'
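The failure mode in the traceback above can be reproduced with plain dataclasses. This is a minimal sketch, not the `run_ner.py` code: the class names, the `data_dir` field, and the `build` helper are hypothetical, and only the `labels` field name comes from the traceback.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArgsWithoutDefault:
    data_dir: str
    labels: str  # required: leaving it out of the JSON raises TypeError


@dataclass
class ArgsWithDefault:
    data_dir: str
    # Giving the optional field a default lets it be omitted from the inputs.
    labels: Optional[str] = field(default=None, metadata={"help": "Path to a labels file."})


def build(cls, inputs):
    """Instantiate cls from a dict, returning the error text instead of raising."""
    try:
        return cls(**inputs)
    except TypeError as err:
        return f"TypeError: {err}"
```

With `inputs = {"data_dir": "x"}`, `build(ArgsWithoutDefault, inputs)` returns the error text, while `build(ArgsWithDefault, inputs)` returns an instance whose `labels` is `None`.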
2020-06-30 19:45:35 -04:00
Sam Shleifer
27a7fe7a8d
examples/seq2seq: never override $WANDB_PROJECT (#5407) 2020-06-30 15:29:13 -04:00
Kevin Canwen Xu
331d8d2936
Upload DistilBART artwork (#5394) 2020-06-30 18:11:11 +08:00
MichaelJanz
9a473f1e43
Update Bertabs example to work again (#5355)
* Fix the bug 'Attempted relative import with no known parent package' when using the bertabs example. Also change the model used from bertabs-finetuned-cnndm, since it no longer seems to be accessible

* Update run_summarization.py

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-06-30 14:05:01 +08:00
Sam Shleifer
a316a6aaa8
[seq2seq docs] Move evaluation down, fix typo (#5365) 2020-06-29 10:36:04 -04:00
Patrick von Platen
4bcc35cd69
[Docs] Benchmark docs (#5360)
* first doc version

* add benchmark docs

* fix typos

* improve README

* Update docs/source/benchmarks.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix naming and docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-29 16:08:57 +02:00
Sam Shleifer
45e26125de
save_pretrained: mkdir(exist_ok=True) (#5258)
* all save_pretrained methods mkdir if not os.path.exists
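A small sketch of the pattern the title refers to, using the standard-library `os.makedirs(..., exist_ok=True)`. The function name `save_to_directory` and the `config.json` file name are placeholders for illustration, not the library's `save_pretrained` implementation.

```python
import os
import tempfile


def save_to_directory(save_directory, payload):
    """Write payload into save_directory, creating the directory if needed."""
    if os.path.isfile(save_directory):
        raise ValueError(f"{save_directory} is a file, expected a directory")
    # Creates missing parent directories and does nothing if the directory
    # already exists, so no separate os.path.exists check is required.
    os.makedirs(save_directory, exist_ok=True)
    path = os.path.join(save_directory, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write(payload)
    return path
```

Calling it twice with the same directory is safe: the second call overwrites the file instead of failing because the directory exists.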
2020-06-28 14:53:47 -04:00
Suraj Patil
12dfbd4f7a
[examples] fix example links (#5344) 2020-06-28 12:54:54 -04:00
Sam Shleifer
393b8dc09a
examples/seq2seq/run_eval.py fixes and docs (#5322) 2020-06-26 19:20:43 -04:00
Sam Shleifer
5543b30aa6
[pl_examples] default warmup steps=0 (#5316) 2020-06-26 15:03:41 -04:00
Thomas Wolf
601d4d699c
[tokenizers] Updates data processors, docstring, examples and model cards to the new API (#5308)
* remove references to old API in docstring - update data processors

* style

* fix tests - better type checking error messages

* better type checking

* include awesome fix by @LysandreJik for #5310

* updated doc and examples
2020-06-26 19:48:14 +02:00
Patrick von Platen
79a82cc06a
[Benchmarks] improve Example Plotter (#5245)
* improve plotting

* better labels

* fix time plot
2020-06-26 15:00:14 +02:00
Lysandre Debut
7cc15bdd96
Closes #5218 2020-06-25 18:19:21 -04:00
Sam Shleifer
e008d520bb
[examples/seq2seq] more README improvements (#5274) 2020-06-25 10:13:01 -04:00
Sam Shleifer
40457bcebb
examples/seq2seq supports translation (#5202) 2020-06-24 23:58:11 -04:00
Victor SANH
4965aee064
[HANS] Fix label_list for RoBERTa/BART (class flipping) (#5196)
* fix weirdness in roberta/bart for mnli trained checkpoints

* black compliance

* isort code check
2020-06-24 14:38:15 -04:00
Patrick von Platen
9fe09cec76
[Benchmark] Extend Benchmark to all model type extensions (#5241)
* add benchmark for all kinds of models

* improved import

* delete bogus files

* make style
2020-06-24 15:11:42 +02:00
Sylvain Gugger
7c41057d50
Add hugs (#5225) 2020-06-24 07:56:14 -04:00
Sylvain Gugger
5e85b324ec
Use the script in utils (#5224) 2020-06-24 07:55:58 -04:00
Kevin Canwen Xu
54e9ce785d
Fix PABEE division by zero error (#5233)
* Fix PABEE division by zero error

* patience=0 by default
2020-06-24 16:10:36 +08:00
Sam Shleifer
76e5af4cfd
[pl_examples] revert deletion of optimizer_step (#5227) 2020-06-23 16:40:45 -04:00
Sam Shleifer
f5c2a122e3
Upgrade examples to pl=0.8.1 (#5146) 2020-06-22 20:40:10 -04:00
Patrick von Platen
fa0be6d761
Benchmarks (#4912)
* finish benchmark

* fix isort

* fix setup cfg

* retab

* fix time measuring of tf graph mode

* fix tf cuda

* clean code

* better error message
2020-06-22 12:06:56 +02:00
Ilya Boytsov
bc3a0c0607
[examples] fixes arguments for summarization finetune scripts (#5157)
Authored-by: i.boytsov <i.boytsov@MAC867.local>
2020-06-21 11:51:21 -04:00
Kevin Canwen Xu
c0c577cf8f
Fix PABEE's result table (#5158) 2020-06-20 22:56:39 +08:00
Kevin Canwen Xu
2fd28d4363
Add BERT Loses Patience (Patience-based Early Exit) (#5078)
* Add BERT Loses Patience (Patience-based Early Exit)

* update model archive

* update format

* sort import

* flake8

* Add results

* full results

* align the table

* refactor to inherit

* default per gpu eval = 1

* Formatting

* Formatting

* isort

* modify readme

* Add check

* Fix format

* Fix format

* Doc strings

* ALBERT & BERT for sequence classification don't inherit from the original anymore

* Remove incorrect comments

* Remove incorrect comments

* Remove incorrect comments

* Sync up with new code

* Sync up with new code

* Add a test

* Add a test

* Add a test

* Add a test

* Add a test

* Add a test

* Finishing up!
2020-06-20 13:41:46 +08:00
Sam Shleifer
2db1e2f415
[cleanup] remove redundant code in SummarizationDataset (#5119) 2020-06-18 20:34:48 -04:00
Lysandre
efeb75b805 Remove misleading comment
closes #4958
2020-06-17 18:24:35 -04:00
Sam Shleifer
f1a3d03741
add pandas to setup.cfg (#5093) 2020-06-17 16:39:17 -04:00
Pranav Dayanand Pawar
049e14f0e3
very minor spelling correction in script command (#5090)
actual script name - counts_parameters.py
2020-06-17 16:08:43 -04:00
Sam Shleifer
043f9f51f9
[examples] SummarizationModule improvements (#4951) 2020-06-17 13:51:34 -04:00
Sylvain Gugger
cd40f6564e
Add header and fix command (#5082) 2020-06-17 11:45:05 -04:00
flozi00
af497b5672
Typo (#5069) 2020-06-16 16:46:20 -04:00
Yacine Jernite
49c5202522
Eli5 examples (#4968)
* add eli5 examples

* add dense query script

* query_di

* merging

* merging

* add_utils

* adds nearest neighbor wikipedia

* batch queries

* training_retriever

* new notebooks

* moved retriever training script

* finished wiki40b

* max_len_fix

* train_s2s

* retriever_batch_checkpointing

* cleanup

* merge

* dim_fix

* fix_indexer

* fix_wiki40b_snippets

* fix_embed_for_r

* fp32 index

* fix_sparse_q

* joint_training

* remove obsolete datasets

* add_passage_nn_results

* add_passage_nn_results

* add_batch_nn

* add_batch_nn

* add_data_scripts

* notebook

* notebook

* notebook

* fix_multi_gpu

* add_app

* full_caching

* full_caching

* notebook

* sparse_done

* images

* notebook

* add_image_gif

* with_Gif

* add_contr_image

* notebook

* notebook

* notebook

* train_functions

* notebook

* min_retrieval_length

* pandas_option

* notebook

* min_retrieval_length

* notebook

* notebook

* eval_Retriever

* notebook

* images

* notebook

* add_example

* add_example

* notebook

* fireworks

* notebook

* notebook

* joe's notebook comments

* app_update

* notebook

* notebook_link

* captions

* notebook

* assing RetriBert model

* add RetriBert to Auto

* change AutoLMHead to AutoSeq2Seq

* notebook downloads from hf models

* style_black

* style_black

* app_update

* app_update

* fix_app_update

* style

* style

* isort

* Delete WikiELI5training.ipynb

* Delete evaluate_eli5.py

* Delete WikiELI5explore.ipynb

* Delete ExploreWikiELI5Support.html

* Delete explainlikeimfive.py

* Delete wiki_snippets.py

* children before parent

* children before parent

* style_black

* style_black_only

* isort

* isort_new

* Update src/transformers/modeling_retribert.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* typo fixes

* app_without_asset

* cleanup

* Delete ELI5animation.gif

* Delete ELI5contrastive.svg

* Delete ELI5wiki_index.svg

* Delete choco_bis.svg

* Delete fireworks.gif

* Delete huggingface_logo.jpg

* Delete huggingface_logo.svg

* Delete Long_Form_Question_Answering_with_ELI5_and_Wikipedia.ipynb

* Delete eli5_app.py

* Delete eli5_utils.py

* readme

* Update README.md

* unused imports

* moved_info

* default_beam

* ftuned model

* disclaimer

* Update src/transformers/modeling_retribert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* black

* add_doc

* names

* isort_Examples

* isort_Examples

* Add doc to index

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-16 16:36:58 -04:00
Sam Shleifer
c3e607496c
[cleanup] examples test_run_squad uses tiny model (#5059) 2020-06-16 14:06:45 -04:00
Sylvain Gugger
d5477baf7d
Convert hans to Trainer (#5025)
* Convert hans to Trainer

* Tick box
2020-06-16 08:06:31 -04:00
Anthony MOI
36434220fc
[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
* Use tokenizers pre-tokenized pipeline

* failing pretokenized test

* Fix is_pretokenized in python

* add pretokenized tests

* style and quality

* better tests for batched pretokenized inputs

* tokenizers clean up - new padding_strategy - split the files

* [HUGE] refactoring tokenizers - padding - truncation - tests

* style and quality

* bump up required tokenizers version to 0.8.0-rc1

* switched padding/truncation API - simpler better backward compat

* updating tests for custom tokenizers

* style and quality - tests on pad

* fix QA pipeline

* fix backward compatibility for max_length only

* style and quality

* Various cleanups - add verbose

* fix tests

* update docstrings

* Fix tests

* Docs reformatted

* __call__ method documented

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-15 17:12:51 -04:00
Sylvain Gugger
1affde2f10
Make DataCollator a callable (#5015)
* Make DataCollator a callable

* Update src/transformers/data/data_collator.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-15 11:58:33 -04:00
Stefan Schweter
d812e6d76e
NER: fix construction of input examples for RoBERTa (#4943)
* utils_ner: do not add extra sep token for RoBERTa model

* run_pl_ner: do not add extra sep token for RoBERTa model
2020-06-15 08:30:40 -04:00
Sylvain Gugger
403d309857
Hans data (#4854)
* Update hans data to be able to use Trainer

* Fixes

* Deal with tokenizers that don't have token_ids

* Clean up things

* Simplify data use

* Fix the input dict

* Formatting + proper path in README
2020-06-13 09:35:13 -04:00
VictorSanh
473808da0d update mvmt-pruning/saving_prunebert (updating torch to 1.5) 2020-06-11 19:42:45 +00:00
Sylvain Gugger
e8db8b845a
Remove unused arguments in Multiple Choice example (#4853)
* Remove unused arguments

* Formatting

* Remove second todo comment
2020-06-09 20:05:09 -04:00
songyouwei
29c36e9f36
run_pplm.py bug fix (#4867)
`is_leaf` may become `False` after `.to(device=device)` function call.
2020-06-09 19:14:27 -04:00
Sam Shleifer
f90bc44d9a
[examples] Cleanup summarization docs (#4876) 2020-06-09 17:38:28 -04:00
Amil Khare
02e5f79662
[examples] consolidate summarization examples (#4837) 2020-06-09 11:14:12 -04:00
daniel-shan
b6f365a8ed
Updates args in tf squad example. (#4820)
Co-authored-by: Daniel Shan <daniel.shan@workday.com>
2020-06-08 05:36:09 -04:00
Mr Ruben
ddf9a3dfc7
Updated path "cd examples/text-generation/pplm" (#4778)
https://github.com/huggingface/transformers/issues/4776
2020-06-05 21:16:48 -04:00
Sam Shleifer
875288b344
[isort] add matplotlib to known 3rd party dependencies (#4800) 2020-06-05 17:27:31 -04:00
Julien Chaumond
b9109f2de1 [doc] Make it clearer that text-generation does not involve training 2020-06-05 14:59:22 +02:00
Stefan Schweter
2a4b9e09c0
NER: Add new WNUT’17 example (#4681)
* ner: add preprocessing script for examples that splits longer sentences

* ner: example shell scripts use local preprocessing now

* ner: add new example section for WNUT’17 NER task. Remove old English CoNLL-03 results

* ner: satisfy black and isort
2020-06-04 19:13:17 -04:00
prajjwal1
48a05026de removed deprecated use of Variable api from pplm example 2020-06-04 18:07:49 -04:00
Jason Phang
492b352ab6
Remove unnecessary model_type arg in example (#4771) 2020-06-04 13:41:24 -04:00
Jin Young Sohn
b231a413f5
Add cache_dir to save features in GLUE + Differentiate match/mismatch for MNLI metrics (#4621)
* Glue task cleanup

* Enable writing cache to cache_dir in case the dataset lives on a read-only filesystem.
* Differentiate match vs mismatch for MNLI metrics.

* Style

* Fix pytype

* Fix type

* Use cache_dir in mnli mismatch eval dataset

* Small Tweaks

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-02 13:40:14 -04:00
Julien Chaumond
b42586ea56
Fix CI after killing archive maps (#4724)
* 🐛 Fix model ids for BART and Flaubert
2020-06-02 10:21:09 -04:00
Julien Chaumond
d4c2cb402d
Kill model archive maps (#4636)
* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI
2020-06-02 09:39:33 -04:00
Lysandre Debut
88762a2f8c
Specify PyTorch versions for examples (#4710) 2020-06-02 04:29:28 -04:00
Victor SANH
bf760c80b5 finish README 2020-06-01 09:23:31 -04:00
Victor SANH
9d7d9b3ae0 weird import 2020-06-01 09:23:31 -04:00
Victor SANH
2a3c88a659 Update examples/movement-pruning/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-01 09:23:31 -04:00
Victor SANH
4ac462bfb8 Update examples/movement-pruning/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-01 09:23:31 -04:00
Victor SANH
35fa0bbca0 clarify README 2020-06-01 09:23:31 -04:00
Victor SANH
cc746a5020 flake8 compliance 2020-06-01 09:23:31 -04:00
Victor SANH
b11386e158 less prints in saving prunebert 2020-06-01 09:23:31 -04:00
Victor SANH
8b5d4003ab complete README 2020-06-01 09:23:31 -04:00
Victor SANH
5c8e5b3709 complying with isort 2020-06-01 09:23:31 -04:00
Victor SANH
db2a3b2e01 space 2020-06-01 09:23:31 -04:00
Victor SANH
5f8f2d849a add floppy bert model notebook 2020-06-01 09:23:31 -04:00
Victor SANH
b41948f5cd add requirements 2020-06-01 09:23:31 -04:00
Victor SANH
fb8f4277b2 add scripts 2020-06-01 09:23:31 -04:00
Victor SANH
d489a6d3d5 add masked_run_* 2020-06-01 09:23:31 -04:00
Victor SANH
e4c07faf0a add sparsity modules 2020-06-01 09:23:31 -04:00
Patrick von Platen
96f57c9ccb
[Benchmark] Memory benchmark utils (#4198)
* improve memory benchmarking

* correct typo

* fix current memory

* check torch memory allocated

* better pytorch function

* add total cached gpu memory

* add total gpu required

* improve torch gpu usage

* update memory usage

* finalize memory tracing

* save intermediate benchmark class

* fix conflict

* improve benchmark

* improve benchmark

* finalize

* make style

* improve benchmarking

* correct typo

* make train function more flexible

* fix csv save

* better repr of bytes

* better print

* fix __repr__ bug

* finish plot script

* rename plot file

* delete csv and small improvements

* fix in plot

* fix in plot

* correct usage of timeit

* remove redundant line

* remove redundant line

* fix bug

* add hf parser tests

* add versioning and platform info

* make style

* add gpu information

* ensure backward compatibility

* finish adding all tests

* Update src/transformers/benchmark/benchmark_args.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/benchmark/benchmark_args_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* delete csv files

* fix isort ordering

* add out of memory handling

* add better train memory handling

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-05-27 23:22:16 +02:00
Lysandre Debut
6a17688021
per_device instead of per_gpu/error thrown when argument unknown (#4618)
* per_device instead of per_gpu/error thrown when argument unknown

* [docs] Restore examples.md symlink

* Correct absolute links so that symlink to the doc works correctly

* Update src/transformers/hf_argparser.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Warning + reorder

* Docs

* Style

* not for squad

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-27 11:36:55 -04:00
Hao Tan
a9aa7456ac
Add back --do_lower_case to uncased models (#4245)
The option `--do_lower_case` is currently required by the uncased models (i.e., bert-base-uncased, bert-large-uncased).

Results:
BERT-BASE without --do_lower_case:  'exact': 73.83, 'f1': 82.22
BERT-BASE with --do_lower_case:  'exact': 81.02, 'f1': 88.34
2020-05-26 21:13:07 -04:00
Antonis Maronikolakis
50d1ce411f
add DistilBERT to supported models (#4558) 2020-05-25 14:50:45 -04:00
Zhangyx
49296533ca
Adds predict stage for glue tasks, and generate result files which can be submitted to gluebenchmark.com (#4463)
* Adds predict stage for glue tasks, and generate result files which could be submitted to gluebenchmark.com website.

* Use Split enum + always output the label name

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-21 09:17:44 -04:00
Tobias Lee
271bedb485
[examples] fix no grad in second pruning in run_bertology (#4479)
* fix no grad in second pruning and typo

* fix prune heads attention mismatch problem

* fix

* fix

* fix

* run make style

* run make style
2020-05-21 09:17:03 -04:00
Patrick von Platen
aa925a52fa
[Tests, GPU, SLOW] fix a bunch of GPU hardcoded tests in Pytorch (#4468)
* fix gpu slow tests in pytorch

* change model to device syntax
2020-05-19 21:35:04 +02:00
Julien Chaumond
5e7fe8b585
Distributed eval: SequentialDistributedSampler + gather all results (#4243)
* Distributed eval: SequentialDistributedSampler + gather all results

* For consistency only write to disk from world_master

Close https://github.com/huggingface/transformers/issues/4272

* Working distributed eval

* Hook into scripts

* Fix #3721 again

* TPU.mesh_reduce: stay in tensor space

Thanks @jysohn23

* Just a small comment

* whitespace

* torch.hub: pip install packaging

* Add test scenarii
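To illustrate what sequential sharding plus gathering means for distributed evaluation, here is a pure-Python sketch. It is an assumption about the general scheme, not the library's `SequentialDistributedSampler` code: each rank takes a contiguous slice of the indices, the tail is padded by wrapping around so that all ranks get the same count, and the padding is dropped after results are concatenated in rank order. The function names are hypothetical, and `num_examples` is assumed to be at least 1.

```python
import math


def sequential_shard(num_examples, num_replicas, rank):
    """Return the contiguous block of example indices assigned to one rank."""
    num_samples = math.ceil(num_examples / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(num_examples))
    # Pad by wrapping around so every rank receives exactly num_samples items.
    padding = total_size - num_examples
    indices += [indices[i % num_examples] for i in range(padding)]
    return indices[rank * num_samples : (rank + 1) * num_samples]


def gather_in_order(per_rank_results, num_examples):
    """Concatenate per-rank results in rank order and drop the padded tail."""
    merged = [item for shard in per_rank_results for item in shard]
    return merged[:num_examples]
```

Because the slices are contiguous, concatenating per-rank predictions in rank order and truncating to the dataset length restores the original example order, which an interleaving sampler would not do without extra bookkeeping.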
2020-05-18 22:02:39 -04:00
Boris Dayma
d9ece8233d
fix(run_language_modeling): use arg overwrite_cache (#4407) 2020-05-18 11:37:35 -04:00
Julien Chaumond
757baee846 Fix un-prefixed f-string
see https://github.com/huggingface/transformers/pull/4367#discussion_r426356693

Hat/tip @girishponkiya
2020-05-18 11:20:46 -04:00
Julien Chaumond
15550ce0d1 [skip ci] remove local rank 2020-05-15 17:08:38 -04:00
Lysandre Debut
edf9ac11d4
Should return overflowing information for the log (#4385) 2020-05-15 09:49:11 -04:00
Julien Chaumond
af2e6bf87c [examples] Streamline doc 2020-05-14 20:34:31 -04:00
Julien Chaumond
448c467256
Fix: unpin flake8 and fix cs errors (#4367)
* Fix: unpin flake8 and fix cs errors

* Ok we still need to quote those
2020-05-14 13:14:26 -04:00
Julien Chaumond
c547f15a17 Use Filelock to ensure distributed barriers
see context in https://github.com/huggingface/transformers/pull/4223
2020-05-14 11:58:32 -04:00
Julien Plu
ca13618681
Question Answering for TF trainer (#4320)
* Add QA trainer example for TF

* Make data_dir optional

* Fix parameter logic

* Fix feature convert

* Update the READMEs to add the question-answering task

* Apply style

* Change 'sequence-classification' to 'text-classification' and prefix with 'eval' all the metric names

* Apply style

* Apply style
2020-05-13 09:22:31 -04:00
Julien Chaumond
241759101e
(v2) Improvements to the wandb integration (#4324)
* Improvements to the wandb integration

* small reorg + no global necessary

* feat(trainer): log epoch and final metrics

* Simplify logging a bit

* Fixup

* Fix crash when just running eval

Co-authored-by: Chris Van Pelt <vanpelt@gmail.com>
Co-authored-by: Boris Dayma <boris.dayma@gmail.com>
2020-05-12 21:52:01 -04:00
Viktor Alm
e4512aab3b
Add MultipleChoice to TFTrainer [WIP] (#4270)
* catch gpu len 1 set to gpu0

* Add mpc to trainer

* Add MPC for TF

* fix TF automodel for MPC and add Albert

* Apply style

* Fix import

* Note to self: double check

* Make shape None, None for datasetgenerator output shapes

* Add from_pt bool which doesn't seem to work

* Original checkpoint dir

* Fix docstrings for automodel

* Update readme and apply style

* Colab should probably not be from users

* Colabs should probably not be from users

* Add colab

* Update README.md

* Update README.md

* Cleanup __init__

* Cleanup flake8 trailing comma

* Update src/transformers/training_args_tf.py

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Viktor Alm <viktoralm@pop-os.localdomain>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-12 08:48:48 -04:00
Stefan Schweter
3f42eb979f
Documentation: fix links to NER examples (#4279)
* docs: fix link to token classification (NER) example

* examples: fix links to NER scripts
2020-05-11 12:48:21 -04:00
Julien Chaumond
7b75aa9fa5
[TPU] Doc, fix xla_spawn.py, only preprocess dataset once (#4223)
* [TPU] Doc, fix xla_spawn.py, only preprocess dataset once

* Update examples/README.md

* [xla_spawn] Add `_mp_fn` to other Trainer scripts

* [TPU] Fix: eval dataloader was None
2020-05-08 14:10:05 -04:00
Julien Chaumond
c99fe0386b [doc] Fix broken links + remove crazy big notebook 2020-05-07 18:44:18 -04:00
Julien Chaumond
6669915b65 [examples] Add column for pytorch-lightning support 2020-05-07 15:26:58 -04:00
Julien Chaumond
612fa1b10b Examples readme.md (#4215)
* README

* Update README.md
2020-05-07 15:00:06 -04:00
Julien Chaumond
0ae96ff8a7 BIG Reorganize examples (#4213)
* Created using Colaboratory

* [examples] reorganize files

* remove run_tpu_glue.py as superseded by TPU support in Trainer

* Bugfix: int, not tuple

* move files around
2020-05-07 13:48:44 -04:00
Lysandre Debut
ebf80e2e70
Tpu trainer (#4146)
* wip

* wip

* a last wip

* Better logging when using TPUs

* Correct argument name

* Tests

* fix

* Metrics in evaluation

* Update src/transformers/training_args.py

* [tpu] Use launcher script instead

* [tpu] lots of tweaks

* Fix formatting

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-07 10:34:04 -04:00
Julien Plu
aad50151f3
TF version of the trainer (#4017)
* First commit to add a TF version of the trainer.

* Make the TF trainer closer to what the PT trainer looks like

* Refactoring common code between the PT and TF trainer into an util file.

* Some bugfix + better similarity with the PT trainer

* Add missing class in transformers init

* Bugfix over prediction + use classification report instead of simple metrics

* Fix name error

* Fix optimization tests + style

* Apply style

* Several bugfixes for multi-gpu training

* Apply style

* Apply style

* Add glue example for the TF trainer

* Several bugfixes + address the reviews

* Fix on the TF training args file

* Add a debug mode

* Bugfix in utils_ner.py when segment_ids is None

* Apply style

* Apply style

* Add TPU strategy

* Fix selection strategy
2020-05-06 12:56:52 -04:00
Simone Primarosa
25296b12aa
Fix overwrite_cache behaviour for pytorch lightning examples (#4093) 2020-05-06 12:24:49 -04:00
William Falcon
4c5bd92183
Update run_pl_glue.py (#4117) 2020-05-02 10:38:30 -04:00
William Falcon
5282b31df4
Update run_pl_ner.py (#4118) 2020-05-02 10:38:21 -04:00
Stefan Schweter
1e616c0af3
NER: parse args from .args file or JSON (#4110)
* ner: parse args from .args file or JSON

* examples: mention json-based configuration file support for run_ner script
2020-05-02 10:29:17 -04:00
Julien Chaumond
b8686174be
Merge pull request #3934 from huggingface/examples_args_from_files
[qol] example scripts: parse args from .args file or JSON
2020-04-30 22:40:13 -04:00
Julien Chaumond
455c639093
CDN urls (#4030)
* [file_utils] use_cdn + documentation

* Move to cdn. urls for weights

* [urls] Hotfix for bert-base-japanese
2020-04-28 20:27:14 -04:00
Sam Shleifer
d714dfeaa8
[isort] add known 3rd party to setup.cfg (#4053)
* add known 3rd party to setup.cfg

* comment

* Update CONTRIBUTING.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-28 17:12:00 -04:00
Patrick von Platen
180585741c
[Generation] Generation should allow to start with empty prompt (#3993)
* fix empty prompt

* fix length in generation pipeline
2020-04-28 14:33:15 +02:00
Julien Chaumond
c811526004 [examples] For convenience, also save the tokenizer
Close #3921
2020-04-24 09:52:42 -04:00
Cola
b0167632ce
Shuffle train subset for summarization example (#3909)
* Shuffle train subset

* Cleaner shuffle
2020-04-24 07:55:34 -04:00
Julien Chaumond
1dc9b3c784 Fixes #3877 2020-04-22 01:15:10 +00:00
Julien Chaumond
dd9d483d03
Trainer (#3800)
* doc

* [tests] Add sample files for a regression task

* [HUGE] Trainer

* Feedback from @sshleifer

* Feedback from @thomwolf + logging tweak

* [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes

* [glue] Use default max_seq_length of 128 like before

* [glue] move DataTrainingArguments around

* [ner] Change interface of InputExample, and align run_{tf,pl}

* Re-align the pl scripts a little bit

* ner

* [ner] Add integration test

* Fix language_modeling with API tweak

* [ci] Tweak loss target

* Don't break console output

* amp.initialize: model must be on right device before

* [multiple-choice] update for Trainer

* Re-align to 827d6d6ef0
2020-04-21 20:11:56 -04:00
Andrey Kulagin
b1ff0b2ae7 Fix bug in examples: double wrap into DataParallel during eval 2020-04-20 19:37:44 -04:00
Jared T Nielsen
c79b550dd0
Add qas_id to SquadResult and SquadExample (#3745)
* Add qas_id

* Fix incorrect name in squad.py

* Make output files optional for squad eval
2020-04-20 16:08:57 -04:00
Sam Shleifer
a504cb49ec
[examples] fix summarization do_predict (#3866) 2020-04-20 10:49:56 -04:00
Thomas Wolf
827d6d6ef0
Cleanup fast tokenizers integration (#3706)
* First pass on utility classes and python tokenizers

* finishing cleanup pass

* style and quality

* Fix tests

* Updating following @mfuntowicz comment

* style and quality

* Fix Roberta

* fix batch_size/seq_length in BatchEncoding

* add alignment methods + tests

* Fix OpenAI and Transfo-XL tokenizers

* adding trim_offsets=True default for GPT2 et RoBERTa

* style and quality

* fix tests

* add_prefix_space in roberta

* bump up tokenizers to rc7

* style

* unfortunately tensorflow doesn't like these - removing shape/seq_len for now

* Update src/transformers/tokenization_utils.py

Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Adding doc and docstrings

* making flake8 happy

Co-authored-by: Stefan Schweter <stefan@schweter.it>
2020-04-18 13:43:57 +02:00
Sam Shleifer
f0c96fafd1
[examples] summarization/bart/finetune.py supports t5 (#3824)
renames `run_bart_sum.py` to `finetune.py`
2020-04-16 15:15:19 -04:00
Patrick von Platen
80a1694514
[Examples, T5] Change newstest2013 to newstest2014 and clean up (#3817)
* Refactored use of newstest2013 to newstest2014. Fixed bug where argparse consumed first command line argument as model_size argument rather than using default model_size by forcing explicit --model_size flag inclusion

* More pythonic file handling through 'with' context

* COSMETIC - ran Black and isort

* Fixed reference to number of lines in newstest2014

* Fixed failing test. More pythonic file handling

* finish PR from tholiao

* remove outcommented lines

* make style

* make isort happy

Co-authored-by: Thomas Liao <tholiao@gmail.com>
2020-04-16 20:00:41 +02:00
Davide Fiocco
b1e2368b32
Typo fix (#3821) 2020-04-16 11:04:32 -04:00
Sam Shleifer
c59b1e682d
[examples] unit test for run_bart_sum (#3544)
- adds pytorch-lightning dependency
2020-04-15 18:35:01 -04:00
Patrick von Platen
01c37dcdb5
[Config, Caching] Remove output_past everywhere and replace by use_cache argument (#3734)
* remove output_past from pt

* make style

* add optional input length for gpt2

* add use cache to prepare input

* save memory in gpt2

* correct gpt2 test inputs

* make past input optional for gpt2

* finish use_cache for all models

* make style

* delete modeling_gpt2 change in test file

* correct docstring

* correct is true statements for gpt2
2020-04-14 14:40:28 -04:00
elk-cloner
5ebd898953
fix dataset shuffling for Distributed training (#huggingface#3721) (#3766) 2020-04-13 10:11:18 -04:00
Jin Young Sohn
700ccf6e35
Fix glue_convert_examples_to_features API breakage (#3742) 2020-04-10 16:03:27 -04:00
Jin Young Sohn
551b450527
Add run_glue_tpu.py that trains models on TPUs (#3702)
* Initial commit to get BERT + run_glue.py on TPU

* Add README section for TPU and address comments.

* Cleanup TPU bits from run_glue.py (#3)

TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.

We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.

* Cleanup TPU bits from run_glue.py

TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.

We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.

* No need to call `xm.mark_step()` explicitly (#4)

Since for gradient accumulation we're accumulating on batches from
`ParallelLoader` instance which on next() marks the step itself.

* Resolve R/W conflicts from multiprocessing (#5)

* Add XLNet in list of models for `run_glue_tpu.py` (#6)

* Add RoBERTa to list of models in TPU GLUE (#7)

* Add RoBERTa and DistilBert to list of models in TPU GLUE (#8)

* Use barriers to reduce duplicate work/resources (#9)

* Shard eval dataset and aggregate eval metrics (#10)

* Shard eval dataset and aggregate eval metrics

Also, instead of calling `eval_loss.item()` every time do summation with
tensors on device.

* Change defaultdict to float

* Reduce the pred, label tensors instead of metrics

As brought up during review some metrics like f1 cannot be aggregated
via averaging. GLUE task metrics depends largely on the dataset, so
instead we sync the prediction and label tensors so that the metrics can
be computed accurately on those instead.

* Only use tb_writer from master (#11)

* Apply huggingface black code formatting

* Style

* Remove `--do_lower_case` as example uses cased

* Add option to specify tensorboard logdir

This is needed for our testing framework which checks regressions
against key metrics written by the summary writer.

* Using configuration for `xla_device`

* Prefix TPU specific comments.

* num_cores clarification and namespace eval metrics

* Cache features file under `args.cache_dir`

Instead of under `args.data_dir`. This is needed as our test infra uses
data_dir with a read-only filesystem.

* Rename `run_glue_tpu` to `run_tpu_glue`

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-04-10 12:53:54 -04:00
Julien Chaumond
cbad305ce6
[docs] The use of do_lower_case in scripts is on its way to deprecation (#3738) 2020-04-10 12:34:04 -04:00
Julien Chaumond
b169ac9c2b
[examples] Generate argparsers from type hints on dataclasses (#3669)
* [examples] Generate argparsers from type hints on dataclasses

* [HfArgumentParser] way simpler API

* Restore run_language_modeling.py for easier diff

* [HfArgumentParser] final tweaks from code review
2020-04-10 12:21:58 -04:00
Julien Chaumond
f98d0ef2a2
Big cleanup of glue_convert_examples_to_features (#3688)
* Big cleanup of `glue_convert_examples_to_features`

* Use batch_encode_plus

* Cleaner wrapping of glue_convert_examples_to_features for TF

@lysandrejik

* Cleanup syntax, thanks to @mfuntowicz

* Raise explicit error in case of user error
2020-04-10 10:20:18 -04:00
Sam Shleifer
715aa5b135
[Bart] Replace config.output_past with use_cache kwarg (#3632) 2020-04-07 19:08:26 -04:00
Sam Shleifer
e344e3d402
[examples] SummarizationDataset cleanup (#3451) 2020-04-07 19:05:58 -04:00
Patrick von Platen
80fa0f7812
[Examples, Benchmark] Improve benchmark utils (#3674)
* improve and add features to benchmark utils

* update benchmark style

* remove output files
2020-04-07 16:25:57 -04:00
Ethan Perez
e52d1258e0
Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py (#3631)
* Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py

`convert_examples_to_features` sets `pad_token=0` by default, which is correct for BERT but incorrect for RoBERTa (`pad_token=1`) and XLNet (`pad_token=5`). I think the other arguments to `convert_examples_to_features` are correct, but it might be helpful if someone checked who is more familiar with this part of the codebase.

* Simplifying change to match recent commits
2020-04-06 16:52:22 -04:00
Nicolas
c50aa67bff
Resizing embedding matrix before sending it to the optimizer. (#3532)
* Resizing embedding matrix after sending it to the optimizer prevents from updating the newly resized matrix.

* Remove space for style matter
2020-04-02 15:00:05 -04:00
Mark Kockerbeck
1b10159950
Adding should_continue check for retraining (#3509) 2020-04-02 14:07:08 -04:00
Patrick von Platen
ab5d06a094
[T5, examples] replace heavy t5 models with tiny random models (#3556)
* replace heavy t5 models with tiny random models as was done by sshleifer

* fix isort
2020-04-02 12:34:05 +02:00
Julien Chaumond
50e15c825c
Tokenizers: Start cleaning examples a little (#3455)
* Start cleaning examples

* Fixup
2020-04-01 07:13:40 -04:00
Patrick von Platen
ae6834e028
[Examples] Clean summarization and translation example testing files for T5 and Bart (#3514)
* fix conflicts

* add model size argument to summarization

* correct wrong import

* fix isort

* correct imports

* other isort make style

* make style
2020-03-31 17:54:13 +02:00
Ethan Perez
e5c393dceb
[Bug fix] Using loaded checkpoint with --do_predict (instead of… (#3437)
* Using loaded checkpoint with --do_predict

Without this fix, I'm getting near-random validation performance for a trained model, and the validation performance differs per validation run. I think this happens since the `model` variable isn't set with the loaded checkpoint, so I'm using a randomly initialized model. Looking at the model activations, they differ each time I run evaluation (but they don't with this fix).

* Update checkpoint loading

* Fixing model loading
2020-03-30 17:06:08 -04:00
Sam Shleifer
8deff3acf2
[bart-tiny-random] Put a 5MB model on S3 to allow faster exampl… (#3488) 2020-03-30 12:28:27 -04:00
Julien Plu
d38bbb225f
Update the NER TF script (#3511)
* Update the NER TF script to remove the softmax and make the pad token label id to -1

* Reformat the quality and style

Co-authored-by: Julien Plu <julien.plu@adevinta.com>
2020-03-30 09:50:12 -04:00
Sam Shleifer
33ef7002e1
[Docs] examples/summarization/bart: Simplify CNN/DM preprocessi… (#3516) 2020-03-29 13:25:42 -04:00
Patrick von Platen
17dceae7a1
Fix circle ci flaky fail of wmt example (#3485)
* force bleu

* fix wrong file name

* rename file

* different filenames for each example test

* test files should clean up after themselves

* test files should clean up after themselves

* do not force bleu

* correct typo

* fix isort
2020-03-27 13:01:28 -04:00
Funtowicz Morgan
b08259a120
run_ner.py / bert-base-multilingual-cased can output empty tokens (#2991)
* Use tokenizer.num_added_tokens to count number of added special_tokens instead of hardcoded numbers.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* run_ner.py - Do not add a label to the labels_ids if word_tokens is empty.

This can happen when using bert-base-multilingual-cased with an input containing an unique space.
In this case, the tokenizer will output just an empty word_tokens thus leading to an non-consistent behavior
over the labels_ids tokens adding one more tokens than tokens vector.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-27 10:59:55 -04:00
Patrick von Platen
f4f4946836
Rename t5-large to t5-base in README.md 2020-03-27 15:57:58 +01:00
Lysandre Debut
ff80b73157
Add option to choose T5 model size. (#3480)
T5-small in test


isort
2020-03-27 15:56:59 +01:00
Patrick von Platen
5ad2ea06af
Add wmt translation example (#3428)
* add translation example

* make style

* adapt docstring

* add gpu device as input for example

* small renaming

* better README
2020-03-26 19:07:59 +01:00
Patrick von Platen
e703e923ca
Add t5 summarization example (#3411)
* rebase to master

* change tf to pytorch

* change to pytorch

* small fix

* renaming

* add gpu training possibility

* renaming

* improve README

* incorporate collins feedback

* better Readme

* better README.md
2020-03-26 18:17:55 +01:00
Lysandre Debut
ffcffebe85
Force the return of token type IDs (#3439) 2020-03-26 09:41:36 +01:00
Andre Carrera
3d76df3a12
BART for summarization training with CNN/DM using pytorch-lightning 2020-03-24 21:00:24 -04:00
Julien Chaumond
eaabaaf750 [run_language_modeling] Fix: initialize a new model from a config object 2020-03-24 17:56:40 -04:00
Julien Chaumond
f8823bad9a Expose missing mappings (see #3415) 2020-03-24 17:46:25 -04:00
Julien Chaumond
a8e3336a85 [examples] Use AutoModels in more examples 2020-03-23 20:11:14 -04:00
Julien Chaumond
f7dcf8fcea [BertAbs] Move files around for more consistent naming 2020-03-23 13:58:49 -04:00
Julien Chaumond
cf72479bf1 One last reorder of {scheduler,optimizer}.step() 2020-03-20 18:05:50 -04:00
Elijah Rippeth
634bf6cf7e fixes lr_scheduler warning
For more details, see https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
2020-03-20 18:03:50 -04:00
Patrick von Platen
95e00d0808
Clean special token init in modeling_....py (#3264)
* make style

* fix conflicts
2020-03-20 21:41:04 +01:00
Nitish Shirish Keskar
8becb73293
removing torch.cuda.empty_cache() from TF function (#3267)
torch.cuda.empty_cache() was being called from a TF function (even when torch is unavailable)
not sure any replacement is needed if TF OOMs
2020-03-19 23:25:30 +01:00
Julien Chaumond
656e1386a2 Fix #3305: run_ner only possible on ModelForTokenClassification models 2020-03-19 16:41:28 -04:00
mataney
c44a17db1b
[FIX] not training when epoch is small (#3006)
* solving bug where for small epochs and large gradient_accumulation_steps we never train

* black formatting

* no need to change these files
2020-03-19 11:21:21 -04:00
J.P Lee
2b60a26b46
Update examples/ner/run_ner.py to use AutoModel (#3305)
* Update examples/ner/run_ner.py to use AutoModel

* Fix missing code and apply `make style` command
2020-03-17 12:30:10 -04:00
Nathan Raw
930c9412b4
[WIP] Lightning glue example (#3290)
* Alter base pl transformer to use automodels

* 🐛 Add batch size env variable to function call

* 💄 Apply black code style from Makefile

* 🚚 Move lightning base out of ner directory

* Add lightning glue example

* 💄 self

* move _feature_file to base class

* Move eval logging to custom callback

* 💄 Apply black code style

* 🐛 Add parent to pythonpath, remove copy command

* 🐛 Add missing max_length kwarg
2020-03-17 11:46:42 -04:00
Patrick von Platen
e8f44af5bf
[generate] do_sample default back to False (#3298)
* change do_samples back

* None better default as boolean

* adapt do_sample to True in test example

* make style
2020-03-17 10:52:37 -04:00
Thomas Wolf
2187c49f5c
CPU/GPU memory benchmarking utilities - Remove support for python 3.5 (now only 3.6+) (#3186)
* memory benchmark rss

* have both forward pass and line-by-line mem tracing

* cleaned up tracing

* refactored and cleaning up API

* no f-strings yet...

* add GPU mem logging

* fix GPU memory monitoring

* style and quality

* clean up and doc

* update with comments

* Switching to python 3.6+

* fix quality
2020-03-17 10:17:11 -04:00
Sam Shleifer
5ea8ba67b4
[BART] Remove unused kwargs (#3279)
* Remove unused kwargs
* dont call forward in tests
2020-03-15 23:00:44 -04:00
Thomas Wolf
3814e167d9
Merge pull request #3225 from patrickvonplaten/finalize_merge_bart_generate_into_default_generate
Complete merge Seq-2-Seq generation into default generation
2020-03-14 15:08:59 +01:00
Patrick von Platen
4f75d380a4 make style 2020-03-13 16:35:52 +01:00
Patrick von Platen
c2ee3840ae update file to new starting token logic 2020-03-13 16:34:44 +01:00
dependabot[bot]
afea70c01c Bump psutil from 5.6.3 to 5.6.6 in /examples/distillation
Bumps [psutil](https://github.com/giampaolo/psutil) from 5.6.3 to 5.6.6.
- [Release notes](https://github.com/giampaolo/psutil/releases)
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-5.6.3...release-5.6.6)

Signed-off-by: dependabot[bot] <support@github.com>
2020-03-12 21:14:56 -04:00
Sam Shleifer
2e81b9d8d7
Bart: update example for #3140 compatibility (#3233)
* Update bart example docs
2020-03-12 10:36:37 -04:00
Patrick von Platen
5b3000d933 renamed min_len to min_length 2020-03-11 11:06:56 +01:00
Shubham Agarwal
5ca356a464
NER - pl example (#3180)
* 1. seqeval required by ner pl example. install from examples/requirements. 2. unrecognized arguments: save_steps

* pl checkpoint callback filenotfound error: make directory and pass

* #3159 pl checkpoint path difference

* 1. Updated Readme for pl 2. pl script now also correct displays logs 3. pass gpu ids compared to number of gpus

* Updated results in readme

* 1. updated readme 2. removing deprecated pl methods 3. finalizing scripts

* comment length check

* using deprecated validation_end for stable results

* style related changes
2020-03-09 20:43:38 -04:00
Sam Shleifer
3aca02efb3
Bart example: model.to(device) (#3194) 2020-03-09 15:09:35 -04:00
Lysandre
eb3e6cb04f cased -> uncased in BERT SQuAD example
closes #3183
2020-03-09 10:54:18 -04:00
Sam Shleifer
857e0a0d3b
Rename BartForMaskedLM -> BartForConditionalGeneration (#3114)
* improved documentation
2020-03-05 17:41:18 -05:00
Sam Shleifer
5b396457e5
Summarization Examples: add Bart CNN Evaluation (#3082)
* Rename and improve example

* Add test

* slightly faster test

* style

* This breaks remy prolly

* shorter test string

* no slow

* newdir structure

* New tree

* Style

* shorter

* docs

* clean

* Attempt future import

* more import hax
2020-03-03 15:29:59 -05:00
Davide Fiocco
c0c7ec3458
Don't crash if fine-tuned model doesn't end with a number (#3099)
That's the same fix applied in https://github.com/huggingface/transformers/issues/2258 , but for the GLUE example
2020-03-03 08:59:47 -05:00
Victor SANH
6b1ff25084
fix n_gpu count when no_cuda flag is activated (#3077)
* fix n_gpu count when no_cuda flag is activated

* someone was left behind
2020-03-02 10:20:21 -05:00
Julien Chaumond
298bed16a8 make style 2020-03-01 14:08:01 -05:00
VictorSanh
852e032ca6 include roberta in run_squad_w_distillation - cc @graviraja 2020-03-01 01:56:50 +00:00
VictorSanh
b5509abb36 --do_lower_case will always trick me... 2020-03-01 01:39:24 +00:00
srush
908fa43b54
Changes to NER examples for PLT and TPU (#3053)
* changes to allow for tpu training

* black

* tpu

* tpu
2020-02-27 16:45:32 -05:00
Lysandre Debut
8bcb37bfb8
NER support for Albert in run_ner.py and NerPipeline (#2983)
* * Added support for Albert when fine-tuning for NER

* Added support for Albert in NER pipeline

* Added command-line options to examples/ner/run_ner.py to better control tokenization

* Added class AlbertForTokenClassification

* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens

* Added ,

* Now passes style guide enforcement

* Changes from reviews.

* Code now passes style enforcement

* Added test for AlbertForTokenClassification

* Added test for AlbertForTokenClassification
2020-02-27 10:22:55 -05:00
Martin Malmsten
d762d4289c Code now passes style enforcement 2020-02-26 23:50:40 +01:00
Martin Malmsten
9495d38b0d Changes from reviews. 2020-02-26 23:36:39 +01:00
Andrew Walker
5bc99e7f33
fix several typos in Distil* readme (#3034) 2020-02-26 12:39:54 -05:00
Jhuo IH
7a7ee28cb9
missing ner link (#2967) 2020-02-25 14:06:57 -05:00
Patrick von Platen
65d74c4965
Add preprocessing step for transfo-xl tokenization to avoid tokenizing words followed by punctuation to <unk> (#2987)
* add preprocessing to add space before punctuation for transfo_xl

* improve warning messages

* make style

* compile regex at instantiation of tokenizer object
2020-02-24 15:11:10 -05:00
Martin Malmsten
105dcb4162 Now passes style guide enforcement 2020-02-23 21:47:59 +01:00
Martin Malmsten
33eb8a165d Added , 2020-02-23 21:43:31 +01:00
Martin Malmsten
869b66f6b3 * Added support for Albert when fine-tuning for NER
* Added support for Albert in NER pipeline

* Added command-line options to examples/ner/run_ner.py to better control tokenization

* Added class AlbertForTokenClassification

* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens
2020-02-23 21:13:03 +01:00
saippuakauppias
cafc4dfc7c fix hardcoded path in examples readme 2020-02-22 11:12:38 -05:00
Patrick von Platen
fc38d4c86f
Improve special_token_id logic in run_generation.py and add tests (#2885)
* improving generation

* finalized special token behaviour for no_beam_search generation

* solved modeling_utils merge conflict

* solve merge conflicts in modeling_utils.py

* add run_generation improvements from PR #2749

* adapted language generation to not use hardcoded -1 if no padding token is available

* remove the -1 removal as hard coded -1`s are not necessary anymore

* add lightweight language generation testing for randomly initialized models - just checking whether no errors are thrown

* add slow language generation tests for pretrained models using hardcoded output with pytorch seed

* delete ipdb

* check that all generated tokens are valid

* renaming

* renaming Generation -> Generate

* make style

* updated so that generate_beam_search has same token behavior than generate_no_beam_search

* consistent return format for run_generation.py

* deleted pretrain lm generate tests -> will be added in another PR

* cleaning of unused if statements and renaming

* run_generate will always return an iterable

* make style

* consistent renaming

* improve naming, make sure generate function always returns the same tensor, add docstring

* add slow tests for all lmhead models

* make style and improve example comments modeling_utils

* better naming and refactoring in modeling_utils

* changed fast random lm generation testing design to more general one

* delete in old testing design in gpt2

* correct old variable name

* temporary fix for encoder_decoder lm generation tests - has to be updated when t5 is fixed

* adapted all fast random generate tests to new design

* better warning description in modeling_utils

* better comment

* better comment and error message

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-02-21 12:09:59 -05:00
maximeilluin
c749a543fa
Added CamembertForQuestionAnswering (#2746)
* Added CamembertForQuestionAnswering

* fixed camembert tokenizer case
2020-02-21 12:01:02 -05:00
Martin Malmsten
4452b44b90
Labels are now added to model config under id2label and label2id (#2945) 2020-02-21 08:53:05 -05:00
Sam Shleifer
53ce3854a1
New BartModel (#2745)
* Results same as fairseq
* Wrote a ton of tests
* Struggled with api signatures
* added some docs
2020-02-20 18:11:13 -05:00
srush
889d3bfdbb
default arg fix (#2937) 2020-02-20 15:31:17 -05:00
srush
b662f0e625
Support for torch-lightning in NER examples (#2890)
* initial pytorch lightning commit

* tested multigpu

* Fix learning rate schedule

* black formatting

* fix flake8

* isort

* isort

* .

Co-authored-by: Check your git settings! <chris@chris-laptop>
2020-02-20 11:50:05 -05:00
VictorSanh
2ae98336d1 fix vocab size in binarized_data (distil): int16 vs int32 2020-02-18 16:17:35 +00:00
VictorSanh
0dbddba6d2 fix typo in hans example call 2020-02-17 20:19:57 +00:00
Manuel Romero
4e597c8e4d Fix typo 2020-02-14 09:07:42 -05:00
Julien Chaumond
4d36472b96 [run_ner] Don't crash if fine-tuning local model that doesn't end with digit 2020-02-14 03:25:29 +00:00
Lysandre
f54a5bd37f Raise error when using an mlm flag for a clm model + correct TextDataset 2020-02-12 13:23:14 -05:00
Lysandre
569897ce2c Fix a few issues regarding the language modeling script 2020-02-12 13:23:14 -05:00
VictorSanh
ee5a6856ca distilbert-base-cased weights + Readmes + omissions 2020-02-07 15:28:13 -05:00
Julien Chaumond
42f08e596f [examples] rename run_lm_finetuning to run_language_modeling 2020-02-07 09:15:28 -05:00
Julien Chaumond
4f7bdb0958 [examples] Fix broken markdown 2020-02-07 09:15:28 -05:00
Peter Izsak
6fc3d34abd Fix multi-gpu evaluation in run_glue.py 2020-02-06 16:38:55 -05:00
Julien Chaumond
ada24def22 [run_lm_finetuning] Tweak fix for non-long tensor, close #2728
see 1ebfeb7946 and #2728

Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2020-02-05 12:49:18 -05:00
Yuval Pinter
d1ab1fab1b
pass langs parameter to certain XLM models (#2734)
* pass langs parameter to certain XLM models

Adding an argument that specifies the language the SQuAD dataset is in so language-sensitive XLMs (e.g. `xlm-mlm-tlm-xnli15-1024`) don't default to language `0`.
Allows resolution of issue #1799 .

* fixing from `make style`

* fixing style (again)
2020-02-04 17:12:42 -05:00
Lysandre
3bf5417258 Revert erroneous fix 2020-02-04 16:31:07 -05:00
Lysandre
1ebfeb7946 Cast to long when masking tokens 2020-02-04 15:56:16 -05:00
Lysandre
239dd23f64 [Follow up 213]
Masked indices should have -1 and not -100. Updating documentation + scripts that were forgotten
2020-02-03 16:08:05 -05:00
Antonio Carlos Falcão Petri
2ba147ecff Fix typo in examples/utils_ner.py
"%s-%d".format() -> "{}-{}".format()
2020-02-01 11:10:57 -05:00
Lysandre
d18d47be67 run_generation style 2020-01-31 12:05:48 -05:00
Lysandre
7365f01d43 do_sample should be set to True in run_generation.py 2020-01-31 11:49:32 -05:00
Jared Nielsen
71a382319f Correct documentation 2020-01-30 18:41:24 -05:00
Hang Le
f0a4fc6cd6 Add Flaubert 2020-01-30 10:04:18 -05:00
Jared Nielsen
adb8c93134 Remove lines causing a KeyError 2020-01-29 14:01:16 -05:00
Lysandre
335dd5e68a Default save steps 50 to 500 in all scripts 2020-01-28 09:42:11 -05:00
Julien Chaumond
6b4c3ee234 [run_lm_finetuning] GPT2 tokenizer doesn't have a pad_token
ping @lysandrejik
2020-01-27 20:14:02 -05:00
VictorSanh
1ce3fb5cc7 update correct eval metrics (distilbert & co) 2020-01-24 11:45:22 -05:00
Julien Chaumond
1a8e87be4e Line-by-line text dataset (including padding) 2020-01-21 16:57:38 -05:00
Julien Chaumond
b94cf7faac change order 2020-01-21 16:57:38 -05:00
Julien Chaumond
2eaa8b6e56 Easier to not support this, as it could be confusing
cc @lysandrejik
2020-01-21 16:57:38 -05:00
Julien Chaumond
801aaa5508 make style 2020-01-21 16:57:38 -05:00
Julien Chaumond
56d4ba8ddb [run_lm_finetuning] Train from scratch 2020-01-21 16:57:38 -05:00
jiyeon_baek
6d5049a24d Fix typo in examples/run_squad.py
Rul -> Run
2020-01-17 11:22:51 -05:00
Lysandre
6e2c28a14a Run SQuAD warning when the doc stride may be too high 2020-01-16 13:59:26 -05:00
thomwolf
258ed2eaa8 adding details in readme 2020-01-16 13:21:30 +01:00
thomwolf
50ee59578d update formating - make flake8 happy 2020-01-16 13:21:30 +01:00
thomwolf
1c9333584a formating 2020-01-16 13:21:30 +01:00
thomwolf
e25b6fe354 updating readme 2020-01-16 13:21:30 +01:00
thomwolf
27c7b99015 adding details in readme - moving file 2020-01-16 13:21:30 +01:00
Nafise Sadat Moosavi
99d4515572 HANS evaluation 2020-01-16 13:21:30 +01:00
Julien Chaumond
83a41d39b3 💄 super 2020-01-15 18:33:50 -05:00
Julien Chaumond
715fa638a7 Merge branch 'master' into from_scratch_training 2020-01-14 18:58:21 +00:00
Julien Chaumond
b803b067bf Config to Model mapping 2020-01-13 20:05:20 +00:00
IWillPull
a3085020ed Added repetition penalty to PPLM example (#2436)
* Added repetition penalty

* Default PPLM repetition_penalty to neutral

* Minor modifications to comply with reviewer's suggestions. (j -> token_idx)

* Formatted code with `make style`
2020-01-10 23:00:07 -05:00
VictorSanh
e83d9f1c1d cleaning - change ' to " (black requirements) 2020-01-10 19:34:25 -05:00
VictorSanh
ebba9e929d minor spring cleaning - missing configs + processing 2020-01-10 19:14:58 -05:00
Victor SANH
331065e62d missing import 2020-01-10 11:42:53 +01:00
Victor SANH
414e9e7122 indents test 2020-01-10 11:42:53 +01:00
Victor SANH
3cdb38a7c0 indents 2020-01-10 11:42:53 +01:00
Victor SANH
ebd45980a0 Align with run_squad + fix some errors 2020-01-10 11:42:53 +01:00
Victor SANH
45634f87f8 fix Sampler in distributed training - evaluation 2020-01-10 11:42:53 +01:00
Victor SANH
af1ee9e648 Move torch.nn.utils.clip_grad_norm_ 2020-01-10 11:42:53 +01:00
Lysandre
164c794eb3 New SQuAD API for distillation script 2020-01-10 11:42:53 +01:00
Lysandre
16ce15ed4b DistilBERT token type ids removed from inputs in run_squad 2020-01-08 13:18:30 +01:00
Lysandre Debut
f24232cd1b Fix error with global step in run_squad.py 2020-01-08 11:39:00 +01:00
Oren Amsalem
43114b89ba spelling correction (#2434) 2020-01-07 17:25:25 +01:00
Lysandre Debut
27c1b656cc Fix error with global step in run_lm_finetuning.py 2020-01-07 16:16:12 +01:00
Simone Primarosa
176d3b3079 Add support for Albert and XLMRoberta for the Glue example (#2403)
* Add support for Albert and XLMRoberta for the Glue example
2020-01-07 14:55:55 +01:00
alberduris
81d6841b4b GPU text generation: moved the encoded_prompt to correct device 2020-01-06 15:11:12 +01:00
alberduris
dd4df80f0b Moved the encoded_prompts to correct device 2020-01-06 15:11:12 +01:00
karajan1001
f01b3e6680 fix #2399 an ImportError in official example (#2400)
* fix #2399 an ImportError in official example

* style

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-01-05 12:50:20 -05:00
Julien Chaumond
629b22adcf [run_lm_finetuning] mask_tokens: document types 2020-01-01 12:55:10 -05:00
Thomas Wolf
0412f3d929
Merge pull request #2291 from aaugustin/fix-flake8-F841
Fix F841 flake8 warning
2019-12-25 22:37:42 +01:00
Aymeric Augustin
a8d34e534e Remove [--editable] in install instructions.
Use -e only in docs targeted at contributors.

If a user copy-pastes a command line with [--editable], they will hit
an error. If they don't know the --editable option, we're giving them
a choice to make before they can move forwards, but this isn't a choice
they need to make right now.
2019-12-24 08:46:08 +01:00
Aymeric Augustin
81422c4e6d Remove unused variables in examples. 2019-12-23 22:29:02 +01:00
Aymeric Augustin
c3783399db Remove redundant requirements with transformers. 2019-12-23 19:17:27 +01:00
Aymeric Augustin
9fc8dcb2a0 Standardize import.
Every other file uses this pattern.
2019-12-23 18:45:42 +01:00
Aymeric Augustin
1c62e87b34 Use built-in open().
On Python 3, `open is io.open`.
2019-12-22 18:38:56 +01:00
Aymeric Augustin
d6eaf4e6d2 Update comments mentioning Python 2. 2019-12-22 18:38:56 +01:00
Aymeric Augustin
75a23d24af Remove import fallbacks. 2019-12-22 18:38:56 +01:00
Aymeric Augustin
798b3b3899 Remove sys.version_info[0] == 2 or 3. 2019-12-22 18:38:42 +01:00
Aymeric Augustin
6b2200fc88 Remove u-prefixes. 2019-12-22 17:47:54 +01:00
Aymeric Augustin
c824d15aa1 Remove __future__ imports. 2019-12-22 17:47:54 +01:00
Aymeric Augustin
7e98e211f0 Remove unittest.main() in test modules.
This construct isn't used anymore these days.

Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.

Use python -m unittest tests/test_foo.py instead.
2019-12-22 14:42:03 +01:00
Aymeric Augustin
ced0a94204 Switch test files to the standard test_*.py scheme. 2019-12-22 14:15:13 +01:00
Aymeric Augustin
c11b3e2926 Sort imports for optional third-party libraries.
These libraries aren't always installed in the virtual environment where
isort is running. Declaring them properly avoids mixing these
third-party imports with local imports.
2019-12-22 11:19:13 +01:00
Aymeric Augustin
939148b050 Fix F401 flake8 warning (x28).
Do manually what autoflake couldn't manage.
2019-12-22 10:59:08 +01:00
Aymeric Augustin
783a616999 Fix F401 flake8 warning (x88 / 116).
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive --remove-all-unused-imports --ignore-init-module-imports examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
2019-12-22 10:59:08 +01:00
Aymeric Augustin
80327a13ea Fix F401 flake8 warning (x152 / 268).
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
2019-12-22 10:59:08 +01:00
Aymeric Augustin
fa2ccbc081 Fix E266 flake8 warning (x90). 2019-12-22 10:59:08 +01:00
Aymeric Augustin
2ab78325f0 Fix F821 flake8 warning (x47).
Ignore warnings related to Python 2, because it's going away soon.
2019-12-22 10:59:07 +01:00
Aymeric Augustin
631be27078 Fix E722 flake8 warnings (x26). 2019-12-22 10:59:07 +01:00
Aymeric Augustin
b0f7db73cd Fix E741 flake8 warning (x14). 2019-12-22 10:59:07 +01:00
Aymeric Augustin
fd2f17a7a1 Fix E714 flake8 warning (x8). 2019-12-22 10:59:07 +01:00
Aymeric Augustin
5eab3cf6bc Fix W605 flake8 warning (x5). 2019-12-22 10:59:07 +01:00
Aymeric Augustin
7dce8dc7ac Fix E731 flake8 warning (x3). 2019-12-22 10:59:07 +01:00
Aymeric Augustin
357db7098c Fix E712 flake8 warning (x1). 2019-12-22 10:59:07 +01:00
Aymeric Augustin
f9c5317db2 Fix E265 flake8 warning (x1). 2019-12-22 10:59:07 +01:00
Aymeric Augustin
28e608a2c2 Remove trailing whitespace from all Python files.
Fixes flake8 warning W291 (x224).
2019-12-22 10:59:07 +01:00
Aymeric Augustin
158e82e061 Sort imports with isort.
This is the result of:

    $ isort --recursive examples templates transformers utils hubconf.py setup.py
2019-12-22 10:57:46 +01:00
Aymeric Augustin
fa84ae26d6 Reformat source code with black.
This is the result of:

    $ black --line-length 119 examples templates transformers utils hubconf.py setup.py

There are a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.

This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.
2019-12-21 17:52:29 +01:00
Thomas Wolf
73f6e9817c
Merge pull request #2115 from suvrat96/add_mmbt_model
[WIP] Add MMBT Model to Transformers Repo
2019-12-21 15:26:08 +01:00
thomwolf
344126fe58 move example to mm-imdb folder 2019-12-21 15:06:52 +01:00
Thomas Wolf
5b7fb6a4a1
Merge pull request #2134 from bkkaggle/saving-and-resuming
closes #1960 Add saving and resuming functionality for remaining examples
2019-12-21 15:03:53 +01:00
Thomas Wolf
6f68d559ab
Merge pull request #2130 from huggingface/ignored-index-coherence
[BREAKING CHANGE] Setting all ignored index to the PyTorch standard
2019-12-21 14:55:40 +01:00
thomwolf
1ab25c49d3 Merge branch 'master' into pr/2115 2019-12-21 14:54:30 +01:00
thomwolf
b03872aae0 fix merge 2019-12-21 14:49:54 +01:00
Thomas Wolf
518ba748e0
Merge branch 'master' into saving-and-resuming 2019-12-21 14:41:39 +01:00
Thomas Wolf
18601c3b6e
Merge pull request #2173 from erenup/master
run_squad with roberta
2019-12-21 14:33:16 +01:00
Thomas Wolf
eeb70cdd77
Merge branch 'master' into saving-and-resuming 2019-12-21 14:29:59 +01:00
Thomas Wolf
ed9b84816e
Merge pull request #1840 from huggingface/generation_sampler
[WIP] Sampling sequence generator for transformers
2019-12-21 14:27:35 +01:00
thomwolf
cfa0380515 Merge branch 'master' into generation_sampler 2019-12-21 14:12:52 +01:00
thomwolf
300ec3003c fixing run_generation example - using torch.no_grad 2019-12-21 14:02:19 +01:00
thomwolf
1c37746892 fixing run_generation 2019-12-21 13:52:49 +01:00
thomwolf
8a2be93b4e fix merge 2019-12-21 13:31:28 +01:00
Thomas Wolf
562f864038
Merge branch 'master' into fix-xlnet-squad2.0 2019-12-21 12:48:10 +01:00
Thomas Wolf
59941c5d1f
Merge pull request #2189 from stefan-it/xlmr
Add support for XLM-RoBERTa
2019-12-20 13:26:38 +01:00
Julien Chaumond
a5a06a851e [doc] Param name consistency 2019-12-19 16:24:20 -05:00
Aidan Kierans
1718fb9e74 Minor/basic text fixes (#2229)
* Small clarification

Matches line 431 to line 435 for additional clarity and consistency.

* Fixed minor typo

The letter "s" was previously omitted from the word "docstrings".
2019-12-19 16:23:18 -05:00
Francesco
62c1fc3c1e Removed duplicate XLMConfig, XLMForQuestionAnswering and XLMTokenizer from import statement of run_squad.py script 2019-12-19 09:50:56 -05:00
Ejar
284572efc0 Updated typo on the link
Updated documentation due to typo
2019-12-19 09:36:43 -05:00
Stefan Schweter
a26ce4dee1 examples: add XLM-RoBERTa to glue script 2019-12-19 02:23:01 +01:00
thomwolf
3d2096f516 further cleanup 2019-12-18 11:50:54 +01:00
thomwolf
83bc5235cf Merge branch 'master' into pr/2189 2019-12-17 11:47:32 +01:00
Thomas Wolf
f061606277
Merge pull request #2164 from huggingface/cleanup-configs
[SMALL BREAKING CHANGE] Cleaning up configuration classes - Adding Model Cards
2019-12-17 09:10:16 +01:00
Lysandre
18a879f475 fix #2180 2019-12-16 16:44:29 -05:00
Lysandre
d803409215 Fix run squad evaluate during training 2019-12-16 16:31:38 -05:00
Stefan Schweter
71b4750517 examples: add support for XLM-RoBERTa to run_ner script 2019-12-16 16:37:27 +01:00
thomwolf
dc667ce1a7 double check cc @LysandreJik 2019-12-14 09:56:27 +01:00
thomwolf
7140363e09 update bertabs 2019-12-14 09:44:53 +01:00
Thomas Wolf
a52d56c8d9
Merge branch 'master' into cleanup-configs 2019-12-14 09:43:07 +01:00
erenup
c7780700f5 Merge branch 'refs/heads/squad_roberta'
# Conflicts:
#	transformers/data/processors/squad.py
2019-12-14 08:53:59 +08:00
erenup
8e9526b4b5 add multiple processing 2019-12-14 08:43:58 +08:00
Lysandre
c8ed1c82c8 [SQUAD] Load checkpoint when evaluating without training 2019-12-13 12:13:48 -05:00
Pierric Cistac
5a5c4349e8
Fix summarization to_cpu doc 2019-12-13 10:02:33 -05:00
thomwolf
47f0e3cfb7 cleaning up configuration classes 2019-12-13 14:33:24 +01:00
erenup
9b312f9d41 initial version for roberta squad 2019-12-13 14:51:40 +08:00
LysandreJik
7296f1010b Clean up squad and allow train_file and predict_file usage 2019-12-12 13:01:04 -05:00
LysandreJik
3fd71c4431 Update example scripts 2019-12-12 12:08:54 -05:00
Alan deLevie
fbf5455a86 Fix typo in examples/run_glue.py args declaration.
deay -> decay
2019-12-12 11:16:19 -05:00
Bilal Khan
6aa919469d Update run_xnli to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:31:22 -06:00
Bilal Khan
89896fe04f Update run_ner to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:31:22 -06:00
Bilal Khan
fdc05cd68f Update run_squad to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:31:22 -06:00
Bilal Khan
854ec5784e Update run_glue to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:30:36 -06:00
LysandreJik
b72f9d340e Correct index in script 2019-12-10 18:33:17 -05:00
LysandreJik
6a73382706 Complete warning + cleanup 2019-12-10 14:33:24 -05:00
Lysandre
dc4e9e5cb3 DataParallel for SQuAD + fix XLM 2019-12-10 19:21:20 +00:00
Rémi Louf
07bc8efbc3 add greedy decoding and sampling 2019-12-10 17:27:50 +01:00
Rémi Louf
4b82c485de remove misplaced summarization documentation 2019-12-10 09:13:33 -05:00
Thomas Wolf
e57d00ee10
Merge pull request #1984 from huggingface/squad-refactor
[WIP] Squad refactor
2019-12-10 11:07:26 +01:00
Suvrat Bhooshan
df3961121f Add MMBT Model to Transformers Repo 2019-12-09 18:36:48 -08:00
Julien Chaumond
1d18930462 Harmonize no_cuda flag with other scripts 2019-12-09 20:37:55 -05:00
Rémi Louf
f7eba09007 clean for release 2019-12-09 20:37:55 -05:00
Rémi Louf
2a64107e44 improve device usage 2019-12-09 20:37:55 -05:00
Rémi Louf
c0707a85d2 add README 2019-12-09 20:37:55 -05:00
Rémi Louf
ade3cdf5ad integrate ROUGE 2019-12-09 20:37:55 -05:00
Rémi Louf
076602bdc4 prevent BERT weights from being downloaded twice 2019-12-09 20:37:55 -05:00
Rémi Louf
a1994a71ee simplified model and configuration 2019-12-09 20:37:55 -05:00
Rémi Louf
3a9a9f7861 default output dir to documents dir 2019-12-09 20:37:55 -05:00
Rémi Louf
693606a75c update the docs 2019-12-09 20:37:55 -05:00
Rémi Louf
2403a66598 give transformers API to BertAbs 2019-12-09 20:37:55 -05:00
Rémi Louf
ba089c780b share pretrained embeddings 2019-12-09 20:37:55 -05:00
Rémi Louf
9660ba1cbd Add beam search 2019-12-09 20:37:55 -05:00
Rémi Louf
1c71ecc880 load the pretrained weights for encoder-decoder
We currently save the pretrained_weights of the encoder and decoder in
two separate directories `encoder` and `decoder`. However, for the
`from_pretrained` function to operate with automodels we need to
specify the type of model in the path to the weights.

The path to the encoder/decoder weights is handled by the
`PreTrainedEncoderDecoder` class in the `save_pretrained` function. Since
there is no easy way to infer the type of model that was initialized for
the encoder and decoder we add a parameter `model_type` to the function.
This is not an ideal solution as it is error prone, and the model type
should be carried by the Model classes somehow.

This is a temporary fix that should be changed before merging.
2019-12-09 20:37:55 -05:00
Rémi Louf
07f4cd73f6 update function to add special tokens
Since I started my PR the `add_special_token_single_sequence` function
has been deprecated for another; I replaced it with the new function.
2019-12-09 20:37:55 -05:00
Bilal Khan
79526f82f5 Remove unnecessary epoch variable 2019-12-09 16:24:35 -05:00
Bilal Khan
9626e0458c Add functionality to continue training from last saved global_step 2019-12-09 16:24:35 -05:00
Bilal Khan
2d73591a18 Stop saving current epoch 2019-12-09 16:24:35 -05:00
Bilal Khan
0eb973b0d9 Use saved optimizer and scheduler states if available 2019-12-09 16:24:35 -05:00
Bilal Khan
a03fcf570d Save tokenizer after each epoch to be able to resume training from a checkpoint 2019-12-09 16:24:35 -05:00
Bilal Khan
f71b1bb05a Save optimizer state, scheduler state and current epoch 2019-12-09 16:24:35 -05:00
LysandreJik
2a4ef098d6 Add ALBERT and XLM to SQuAD script 2019-12-09 10:46:47 -05:00
Lysandre Debut
00c4e39581
Merge branch 'master' into squad-refactor 2019-12-09 10:41:15 -05:00
Thomas Wolf
5482822a2b
Merge pull request #2046 from jplu/tf2-ner-example
Add NER TF2 example.
2019-12-06 12:12:22 +01:00
LysandreJik
e9217da5ff Cleanup
Improve global visibility on the run_squad script, remove unused files and fixes related to XLNet.
2019-12-05 16:01:51 -05:00
LysandreJik
9ecd83dace Patch evaluation for impossible values + cleanup 2019-12-05 14:44:57 -05:00
VictorSanh
35ff345fc9 update requirements 2019-12-05 12:07:04 -05:00
VictorSanh
552c44a9b1 release distilm-bert 2019-12-05 10:14:58 -05:00
Rosanne Liu
ee53de7aac Pr for pplm (#2060)
* license

* changes

* ok

* Update paper link and commands to run

* pointer to uber repo
2019-12-05 09:20:07 -05:00
Julien Plu
9200a759d7 Add a few tests on the TF optimization file with some info in the documentation. Complete the README. 2019-12-05 12:56:43 +01:00
thomwolf
75a97af6bc fix #1450 - add doc 2019-12-05 11:26:55 +01:00
LysandreJik
f7e4a7cdfa Cleanup 2019-12-04 16:24:15 -05:00
LysandreJik
cca75e7884 Kill the demon spawn 2019-12-04 15:42:29 -05:00
LysandreJik
9ddc3f1a12 Naming update + XLNet/XLM evaluation 2019-12-04 10:37:00 -05:00
thomwolf
5bfcd0485e fix #1991 2019-12-04 14:53:11 +01:00
Julien Plu
ecb923da9c Create a NER example similar to the Pytorch one. It takes the same options, and can be run the same way. 2019-12-04 09:43:15 +01:00
LysandreJik
de276de1c1 Working evaluation 2019-12-03 17:15:51 -05:00
Julien Chaumond
7edb51f3a5 [pplm] split classif head into its own file 2019-12-03 22:07:25 +00:00
VictorSanh
48cbf267c9 Use full dataset for eval (SequentialSampler in Distributed setting) 2019-12-03 11:01:37 -05:00
Julien Chaumond
f434bfc623 [pplm] Update S3 links
Co-Authored-By: Piero Molino <w4nderlust@gmail.com>
2019-12-03 10:53:02 -05:00
Ethan Perez
96e83506d1 Always use SequentialSampler during evaluation
When evaluating, shouldn't we always use the SequentialSampler instead of DistributedSampler? Evaluation only runs on 1 GPU no matter what, so if you use the DistributedSampler with N GPUs, I think you'll only evaluate on 1/N of the evaluation set. That's at least what I'm finding when I run an older/modified version of this repo.
2019-12-03 10:15:39 -05:00
Julien Chaumond
3b48806f75 [pplm] README: add setup + tweaks 2019-12-03 10:14:02 -05:00
Julien Chaumond
0cb2c90890 readme
Co-Authored-By: Rosanne Liu <mimosavvy@gmail.com>
2019-12-03 10:14:02 -05:00
Julien Chaumond
1efb2ae7fc [pplm] move scripts under examples/pplm/ 2019-12-03 10:14:02 -05:00
Piero Molino
a59fdd1627 generate_text_pplm now works with batch_size > 1 2019-12-03 10:14:02 -05:00
w4nderlust
893d0d64fe Changed order of some parameters to be more consistent. Identical results. 2019-12-03 10:14:02 -05:00
w4nderlust
f42816e7fc Added additional check for url and path in discriminator model params 2019-12-03 10:14:02 -05:00
w4nderlust
f10b925015 Improvements: model_path renamed pretrained_model, tokenizer loaded from pretrained_model, pretrained_model set to discriminator's when discrim is specified, sample = False by default but cli parameter introduced. To obtain identical samples call the cli with --sample 2019-12-03 10:14:02 -05:00
w4nderlust
75904dae66 Removed global variable device 2019-12-03 10:14:02 -05:00
piero
7fd54b55a3 Added support for generic discriminators 2019-12-03 10:14:02 -05:00
piero
b0eaff36e6 Added a +1 to epoch when saving weights 2019-12-03 10:14:02 -05:00
piero
611961ade7 Added tqdm to preprocessing 2019-12-03 10:14:02 -05:00
piero
afc7dcd94d Now run_pplm works on cpu. Identical output as before (when using gpu). 2019-12-03 10:14:02 -05:00