Commit Graph

1536 Commits

Author SHA1 Message Date
Patrick von Platen
68a3215949
Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-19 00:27:40 +03:00
Patrick von Platen
03df3fbcb4
Update FINE_TUNE_XLSR_WAV2VEC2.md 2021-03-19 00:26:49 +03:00
Patrick von Platen
e84adbed40
Add XLSR-Wav2Vec2 Fine-Tuning README.md (#10786)
* upload

* upload fine-tuning script

* improve

* adapt

* Apply suggestions from code review

* correct

* upload

* finalize

* remove @

* correct typos
2021-03-19 00:22:43 +03:00
Stas Bekman
9352b5151a
[examples/seq2seq/README.md] fix t5 examples (#10734)
* [examples/seq2seq] fix t5 examples

This PR:
* fixes T5 examples to include `--source_prefix` - it's **not** optional. If you give it a try you will see that you get 10x worse bleu scores w/o it. w/ `27.6849`, w/o `2.374`
* added a normal translation example w/o the peculiarities of MBart and T5
* reduces the default max samples to 50 so it's much faster to test quickly

summarization seems to be broken for t5 score-wise: https://github.com/huggingface/transformers/issues/10733

@sgugger

* specify explicitly the t5 models requiring the special handling

* one more

* update the t5 summarization example to use cnn_dailymail

* move max*samples into the top level README.md

* better wording

* better wording
2021-03-18 09:55:39 -07:00
Julien Chaumond
4f3e93cfaf
[file_utils] do not gobble certain kinds of requests.ConnectionError (#10235)
* do not gobble certain kinds of requests.ConnectionError

* Apply review comments

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-18 12:37:45 -04:00
Suraj Patil
5f19c07a70
add run_common_voice script (#10767)
* add initial script

* finish script

* add shell script example

* accept chars_to_ignore as cl arg

* align the script with other example scripts

* add torchaudio dep
2021-03-18 17:21:16 +05:30
Mohamed El-Geish
af8afdc88d
wav2vec2: support datasets other than LibriSpeech (#10581)
* wav2vec2: support datasets other than LibriSpeech

* Formatting run_asr.py to pass code quality test

* bundled orthography options and added verbose logs

* fixing a typo in timit fine-tuning script

* update comment for clarity

* resize_lm_head and load custom vocab from file

* adding a max_duration_in_seconds filter

* do not assign `duration_filter` lambda, use a def

* log untransliterated text as well

* fix base model for arabic

* fix duration filter when target_sr is not set

* drop duration_in_seconds when unneeded

* script for wav2vec2-large-lv60-timit-asr

* fix for "tha" in arabic corpus (huggingface#10581)

* adding more options to work with common_voice

* PR feedback (huggingface#10581)

* small README change
2021-03-18 10:20:26 +03:00
Stas Bekman
393739194e
[examples] document resuming (#10776)
* document resuming in examples

* fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* put trainer code last, adjust notes

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-17 12:48:35 -07:00
Stas Bekman
cd8c93f701
[DeepSpeed] improve checkpoint loading code plus tests (#10760)
* deepspeed checkpoint loading code plus tests

* style

* style
2021-03-17 10:22:58 -07:00
Cheng Li
c83fbc5f2d
[Deepspeed] Allow HF optimizer and scheduler to be passed to deepspeed (#10464)
* pass hf optimizer and scheduler to deepspeed if not specified in ds config

* pass hf optimizer and scheduler to deepspeed if not specified in ds config

* update

* make init_deepspeed support config dict

* fix docstring formatting

* clean up trainer's comments

* add new tests

* fix type

* composite argparse doesn't work

* style

* add a new test, rename others

* document new functionality

* complete tests, add docs

* style

* correct level

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add new methods to the doc

* must tell DS we are using a non-native optimizer

* add protection against cpu_offload + HF optimizer combo

* fix the cli overrides

* sync docs + tests

* restore AdamW

* better docs

* need new version

* no longer needed

* remove outdated information

* refactor duplicated code

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-16 15:51:09 -07:00
Lysandre
1b5ce1e63b Development on v4.5.0dev0 2021-03-16 11:41:15 -04:00
Lysandre
c988db5af2 Release v4.4.0 2021-03-16 11:33:35 -04:00
Russell Klopfer
87d685b8a9
independent training / eval with local files (#10710)
* independent training / eval with local files

* remove redundant assert
2021-03-15 19:35:26 -04:00
Sylvain Gugger
4c379daf64
Add minimum version check in examples (#10724)
* Add minimum version check in examples

* Style

* No need for new line maybe?

* Add helpful comment
2021-03-15 19:29:54 -04:00
Joe Davison
966ba081c9
zero-shot pipeline multi_class -> multi_label (#10727) 2021-03-15 16:02:46 -06:00
Théo Matussière
6f840990a7
split seq2seq script into summarization & translation (#10611)
* split seq2seq script, update docs

* needless diff

* fix readme

* remove test diff

* s/summarization/translation

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* cr

* fix arguments & better mbart/t5 refs

* copyright

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* reword readme

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* s/summarization/translation

* short script names

* fix tests

* fix isort, include mbart doc

* delete old script, update tests

* automate source prefix

* automate source prefix for translation

* s/translation/trans

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* fix script name (short version)

* typos

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* exact parameter

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* remove superfluous source_prefix calls in docs

* rename scripts & warn for source prefix

* black

* flake8

Co-authored-by: theo <theo@matussie.re>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-15 09:11:42 -04:00
Stas Bekman
4c32f9f26e
AdamW is now supported by default (#9624) 2021-03-12 13:40:07 -08:00
Lysandre Debut
9fbb4cdc80
Specify minimum version for sacrebleu (#10662) 2021-03-11 13:45:06 -05:00
ArvidYin
27d9e05ce2
Update README.md (#10647)
correct spelling error: 'nether'
2021-03-11 08:58:04 -05:00
Sylvain Gugger
efb5c0a453
Add new GLUE example with no Trainer. (#10555)
* Add new GLUE example with no Trainer.

* Style

* Address review comments
2021-03-10 09:29:19 -05:00
Allen Wang
6f52fce673
Fixes an issue in text-classification where MNLI eval/test datasets are not being preprocessed. (#10621)
* Fix MNLI tests

* Linter fix
2021-03-09 22:13:45 -05:00
Sylvain Gugger
0d909f6bd8
Fairscale FSDP fix model save (#10596)
* Hotfix fairscale FSDP

* Evaluation works

* Save on process zero
2021-03-09 14:42:07 -05:00
Stas Bekman
f284089ec4
[examples tests on multigpu] resolving require_torch_non_multi_gpu_but_fix_me (#10561)
* batch 1

* this is tpu

* deebert attempt

* the rest
2021-03-08 11:11:40 -08:00
Bhadresh Savani
dfd16af832
Added max_sample_ arguments (#10551)
* reverted changes of logging and saving metrics

* added max_sample arguments

* fixed code

* white space diff

* reformatting code

* reformatted code
2021-03-08 13:57:10 -05:00
Stas Bekman
917f104502
[examples tests] various fixes (#10584)
* fix sharded ddp enum

* test fixes

* stronger validation + apex breaks other tests
2021-03-08 10:28:44 -08:00
Stas Bekman
e6ce636e02
fix nltk lookup (#10585) 2021-03-07 22:09:58 -08:00
Stas Bekman
88a951e3cc
offline mode for firewalled envs (#10407)
* offline mode start

* add specific values

* fix fallback

* add test

* better values check and range

* test that actually works

* document the offline mode

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more strict check

* cleaner test

* pt-only test

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-05 17:27:48 -08:00
Patrick von Platen
395ffcd757
fix run seq2seq (#10547) 2021-03-05 18:17:12 +03:00
Sylvain Gugger
a5bd40b75c
Not always consider a local model a checkpoint in run_glue (#10517) 2021-03-04 11:11:39 -05:00
Sylvain Gugger
745ea78dcc Revert "Not always consider a local model a checkpoint in run_glue"
This reverts commit f3660613bc.
2021-03-04 09:45:18 -05:00
Sylvain Gugger
f3660613bc Not always consider a local model a checkpoint in run_glue 2021-03-04 09:44:02 -05:00
Patrick von Platen
0234de8418
Add Fine-Tuning for Wav2Vec2 (#10145)
* add encode labels function to tokenizer

* start adding finetuning

* init dropout

* upload

* correct convert script

* apply changes

* fix second typo

* make first dummy training run

* adapt convert script

* push config for comparison

* remove conf

* finish training

* adapt data collator

* add research folder

* update according to fairseq feedback

* some minor corrections

* refactor masking indices a bit

* some minor changes

* clean tokenizer

* finish clean-up

* remove previous logic

* update run script

* correct training

* finish changes

* finish model

* correct bug

* fix training a bit more

* add some tests

* finish gradient checkpointing

* finish example

* correct gradient checkpointing

* improve tokenization method

* revert changes in tokenizer

* revert general change

* adapt fine-tuning

* update

* save intermediate test

* Update README.md

* finish finetuning

* delete conversion script

* Update src/transformers/models/wav2vec2/configuration_wav2vec2.py

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* finish wav2vec2 script

* finish wav2vec2 fine-tuning

* finalize test

* correct test

* adapt tests

* finish

* remove test file

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-01 12:13:17 +03:00
Bhadresh Savani
aca6288ff4
updated logging and saving metrics (#10436)
* updated logging and saving metrics

* space removal
2021-02-27 09:53:44 -08:00
Stas Bekman
f52a15897b
[run_seq2seq.py] restore functionality: saving to test_generations.txt (#10428)
This PR restores the original functionality that for some reason was modified.

Fixes: https://github.com/huggingface/transformers/issues/10381

@sgugger
2021-02-27 08:21:50 -08:00
Stas Bekman
ee04b69822
[examples] better model example (#10427)
* refactors

* typo
2021-02-26 17:01:01 -08:00
Sylvain Gugger
17b6e0d474
Fix run_glue evaluation when model has a label correspondence (#10401) 2021-02-25 15:30:38 -05:00
Sylvain Gugger
9d14be5c20
Add support for ZeRO-2/3 and ZeRO-offload in fairscale (#10354)
* Add support for ZeRO-2/3 and ZeRO-offload in fairscale

* Quality

* Rework from review comments

* Add doc

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-02-25 11:07:53 -05:00
Patrick von Platen
cb38ffcc5e
[PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer (#10324)
* push to show

* small improvement

* small improvement

* Update src/transformers/feature_extraction_utils.py

* Update src/transformers/feature_extraction_utils.py

* implement base

* add common tests

* make all tests pass for wav2vec2

* make padding work & add more tests

* finalize feature extractor utils

* add call method to feature extraction

* finalize feature processor

* finish tokenizer

* finish general processor design

* finish tests

* typo

* remove bogus file

* finish docstring

* add docs

* finish docs

* small fix

* correct docs

* save intermediate

* load changes

* apply changes

* apply changes to doc

* change tests

* apply Suraj's recommendations

* final changes

* Apply suggestions from code review

* fix typo

* fix import

* correct docstring
2021-02-25 17:42:46 +03:00
Stas Bekman
3437d12134
[Trainer/Deepspeed] handle get_last_lr() before first step() (#10362)
* handle get_last_lr() before first step()

* abstract away the lr getting logic

* cleanup

* add test

* move to utils
2021-02-23 17:42:25 -08:00
Akmal
23e87c27be
Fix broken examples/seq2seq/README.md markdown (#10344) 2021-02-23 10:49:25 -05:00
Stas Bekman
622a8c5995
[trainer] add Trainer methods for metrics logging and saving (#10266)
* make logging and saving trainer built-in

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-22 13:02:53 -08:00
Stas Bekman
eab0afc19c
[Trainer] implement gradient_accumulation_steps support in DeepSpeed integration (#10310)
* implement gradient_accumulation_steps support in DeepSpeed integration

* typo

* cleanup

* cleanup
2021-02-22 11:15:59 -08:00
Stas Bekman
f991daed18
defensive programming + expand/correct README (#10295) 2021-02-22 10:58:50 -08:00
Julien Plu
536aee99bb
Move the TF NER example (#10276) 2021-02-19 16:06:13 -05:00
Joe Davison
cbadb5243c
Zero shot distillation script cuda patch (#10284) 2021-02-19 14:06:57 -05:00
Joe Davison
c6fe17557e
Script for distilling zero-shot classifier to more efficient student (#10244)
* add zero-shot distillation script

* readme wordsmithing

* clean up code

* add multi-gpu teacher inference
plus tidying up more code

* add use_fast_tokenizer arg

* update results in readme

* more readme wordsmithing

* style

* Add handle to readme

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix code block

* add error+docs about distributed & tpu

* add @sgugger format requests

* xla -> tpu

* support fp16 for teacher preds

* no checkpoint by default

* add demo colab link

* add model sharing prompt + model link

* correct resulting acc of example

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-02-18 17:08:45 -05:00
Stas Bekman
97e688bc22
[Trainer] memory tracker metrics (#10225)
* memory tracker metrics

* go back to eval for somewhat consistency

* handle no-gpu case

* deal with stackable eval calls

* restore callback order

* style

* simplify the API

* add test

* docs

* consistently use eval_ prefix

* improve docs

* Update src/transformers/trainer_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename method

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 09:27:32 -08:00
Stas Bekman
d1eb88f42d
[CI] 2 fixes (#10248)
* fix invalid port

* missing requirements
2021-02-17 14:12:39 -08:00
Zhang Cheng
df1b0fb54d
set tgt_lang of MBart Tokenizer for summarization (#10205) 2021-02-16 09:39:37 -05:00
Suraj Patil
1c8c2d9ab3
[WIP][examples/seq2seq] move old s2s scripts to legacy (#10136)
* move old s2s scripts to legacy

* add the tests back

* proper rename

* restore

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-15 10:48:02 -08:00
Stas Bekman
0b1f552a24
fix run_seq2seq.py; porting trainer tests to it (#10162)
* fix run_seq2seq.py; porting DeepSpeed tests to it

* unrefactor

* defensive programming

* defensive programming 2

* port the rest of the trainer tests

* style

* a cleaner scripts dir finder

* cleanup
2021-02-15 09:12:17 -08:00
Suraj Patil
f51188cbe7
[examples/run_s2s] remove task_specific_params and update rouge computation (#10133)
* fix rouge metrics and task specific params

* fix typo

* round metrics

* typo

* remove task_specific_params
2021-02-12 17:18:21 +05:30
Stas Bekman
b54cb0bd82
[DeepSpeed in notebooks] Jupyter + Colab (#10130)
* init devices/setup explicitly

* docs + test

* simplify

* cleanup

* cleanup

* cleanup

* correct the required dist setup

* derive local_rank from env LOCAL_RANK
2021-02-11 14:02:05 -08:00
Qbiwan
8dcfaea08d
Update run_xnli.py to use Datasets library (#9829)
* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed,metric.compute,load_metric

* fix

* fix

* fix

* push

* fix

* everything works

* fix init

* fix

* special treatment for sepconv1d

* style

* 🙏🏽

* add doc and cleanup


* fix doc

* fix doc again

* fix doc again

* Apply suggestions from code review

* make style

* Proposal that should work

* Remove needless code

* Fix test

* Apply suggestions from code review

* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed,metric.compute,load_metric

* amend README

* removed data_args.task_name and replaced with task_name = "xnli"; use split function to load train and validation dataset separately; remove __post_init__; remove flag --task_name from README.

* removed dict task_to_keys, use str "xnli" instead of variable task_name, change preprocess_function to use examples["premise"], examples["hypothesis"] directly, remove sentence1_key and sentence2_key, change compute_metrics function to cater only to accuracy metric, add condition for train_language is None when using dataset.load_dataset()

* removed `torch.distributed.barrier()` and `import torch` as `from_pretrained` is able to do the work; amend README
2021-02-11 10:27:23 +05:30
Stas Bekman
77b862847b
[DeepSpeed] restore memory for evaluation (#10114)
* free up memory at the end of train

* rework tests

* consistent formatting

* correction
2021-02-10 09:09:48 -08:00
Lysandre Debut
0d8e554d42
Line endings should be LF across repo and not CRLF (#10119) 2021-02-10 10:50:00 -05:00
Boris Dayma
7c7962ba89
doc: update W&B related doc (#10086)
* doc: update W&B related doc

* doc(wandb): mention report_to

* doc(wandb): commit suggestion

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* doc(wandb): fix typo

* doc(wandb): remove WANDB_DISABLED

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-09 14:47:52 -05:00
Suraj Patil
63fddcf69c
[examples/s2s] add test set predictions (#10085)
* add do_predict, pass eval_beams during eval

* update help

* apply suggestions from code review
2021-02-09 20:41:41 +05:30
Stas Bekman
781220acab
transition to new tests dir (#10080) 2021-02-08 12:41:52 -08:00
Stas Bekman
322037e842
[trainer] deepspeed bug fixes and tests (#10039)
* deepspeed bug fixes and tests

* manual wrap?
2021-02-08 09:44:02 -08:00
Olivier
ece6c51458
[s2s examples] Replace -100 token ids with the tokenizer pad_id for compute_metrics (#10046)
* replace -100 token ids with the tokenizer pad_id for compute_metrics

* fixed typo for label_ids
2021-02-08 10:08:16 -05:00
Sylvain Gugger
b01483faa0
Truncate max length if needed in all examples (#10034) 2021-02-08 05:03:55 -05:00
Stas Bekman
24db8cc329
Can't mix --fp16 and --device cpu (#10041) 2021-02-07 17:54:20 -08:00
Stas Bekman
769948fad2
json to jsonlines, and doc, and typo (#10043) 2021-02-07 17:51:34 -08:00
Stas Bekman
8ea412a86f
[examples] make run scripts executable (#10037)
* make executable

* make executable

* same for the template

* cleanup
2021-02-05 15:51:18 -08:00
Suraj Patil
1cd16512dc
[examples/seq2seq] support label smoothing (#9844)
* add prepare_decoder_input_ids_from_labels in s2s models

* support lbl smoothing and enc/emb freezing

* fix freezing

* use pad_token_id from config

* remove embed freezing and add warning

* prepare decoder_input_ids inside DataCollatorForSeq2Seq
2021-02-05 23:21:57 +05:30
Suraj Patil
bca0dd5ee3
[run_clm.py] fix getting extension 2021-02-03 20:14:42 +05:30
Stas Bekman
d55e10beab
[research proj] [lxmert] rm bleach dependency (#9970)
Looks like a vulnerability and it's not really used anywhere in the code, so just as well remove it completely from deps.
https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/bleach/open
2021-02-03 05:24:40 -05:00
Patrick von Platen
538b3b4607
[Tokenizer Utils Base] Make pad function more flexible (#9928)
* change tokenizer requirement

* split line

* Correct typo from list to str

* improve style

* make other function pretty as well

* add comment

* correct typo

* add new test

* pass tests for tok without padding token

* Apply suggestions from code review
2021-02-02 10:35:27 +03:00
Sylvain Gugger
115d97dd2f
Remove subclass for sortish sampler (#9907)
* Remove subclass for sortish sampler

* Use old Seq2SeqTrainer in script

* Styling
2021-02-01 08:06:32 -05:00
wlhgtc
1682804ebd
Fit chinese wwm to new datasets (#9887)
* MOD: fit chinese wwm to new datasets

* MOD: move wwm to new folder

* MOD: format code

* Styling

* MOD add param and recover trainer

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2021-02-01 03:37:59 -05:00
Stas Bekman
6bab83683b
fix logger format for non-main process (#9911) 2021-02-01 03:08:12 -05:00
Stas Bekman
6bf94bc0b6
correctly handle mt5 (#9879) 2021-01-29 08:11:22 -08:00
Sylvain Gugger
b4e559cfa1
Deprecate model_path in Trainer.train (#9854) 2021-01-28 08:32:46 -05:00
Sylvain Gugger
f2fabedbab
Setup logging with a stdout handler (#9816) 2021-01-27 03:39:11 -05:00
Yusuke Mori
059bb25817
Fix a bug in run_glue.py (#9812) (#9815) 2021-01-26 14:32:19 -05:00
Magdalena Biesialska
8f6c12d306
Fix fine-tuning translation scripts (#9809) 2021-01-26 11:30:31 -05:00
Andrea Cappelli
10e5f28212
Improve pytorch examples for fp16 (#9796)
* Pad to 8x for fp16 multiple choice example (#9752)

* Pad to 8x for fp16 squad trainer example (#9752)

* Pad to 8x for fp16 ner example (#9752)

* Pad to 8x for fp16 swag example (#9752)

* Pad to 8x for fp16 qa beam search example (#9752)

* Pad to 8x for fp16 qa example (#9752)

* Pad to 8x for fp16 seq2seq example (#9752)

* Pad to 8x for fp16 glue example (#9752)

* Pad to 8x for fp16 new ner example (#9752)

* update script template #9752

* Update examples/multiple-choice/run_swag.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/question-answering/run_qa.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/question-answering/run_qa_beam_search.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* improve code quality #9752

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-26 04:47:07 -05:00
Sylvain Gugger
caf4abf768
Auto-resume training from checkpoint (#9776)
* Auto-resume training from checkpoint

* Update examples/text-classification/run_glue.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Roll out to other examples

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-25 12:03:51 -05:00
Wilfried L. Bounsi
9152f16023
Fix broken [Open in Colab] links (#9761) 2021-01-23 15:11:46 +05:30
Sylvain Gugger
411c582109
Fixes to run_seq2seq and instructions (#9734)
* Fixes to run_seq2seq and instructions

* Add more defaults for summarization
2021-01-22 10:03:57 -05:00
Stefan Schweter
08b22722c7
examples: fix XNLI url (#9741) 2021-01-22 18:13:52 +05:30
Sylvain Gugger
5f80c15ef5
Fix memory regression in Seq2Seq example (#9713)
* Fix memory regression in Seq2Seq example

* Fix test and properly deal with -100

* Easier condition with device safety

* Patch for MBartTokenizerFast
2021-01-21 12:05:46 -05:00
Sylvain Gugger
582f516adb
Use datasets squad_v2 metric in run_qa (#9677) 2021-01-20 04:52:13 -05:00
Sylvain Gugger
a1ad16a446
Restrain tokenizer.model_max_length default (#9681)
* Restrain tokenizer.model_max_length default

* Fix indent
2021-01-20 04:17:39 -05:00
Sylvain Gugger
e4c06ed664
New run_seq2seq script (#9605)
* New run_seq2seq script

* Add tests

* Mark as slow

* Update examples/seq2seq/run_seq2seq.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/data/data_collator.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/data/data_collator.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Address review comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-01-19 15:22:17 -05:00
Sylvain Gugger
97b787fb4e
Fix old Seq2SeqTrainer (#9675) 2021-01-19 09:56:25 -05:00
Stas Bekman
c60e0e1ee4
deepspeed + grad accum (#9622) 2021-01-15 10:12:26 -08:00
Sylvain Gugger
329fe2746a
Upstream (and rename) sortish sampler (#9574)
* Upstream (and rename) sortish sampler

* Use proper sampler

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-14 10:38:14 -05:00
Sylvain Gugger
46ed56cfd1
Switch metrics in run_ner to datasets (#9567)
* Switch metrics in run_ner to datasets

* Add flag to return all metrics

* Upstream (and rename) sortish_sampler

* Revert "Upstream (and rename) sortish_sampler"

This reverts commit e07d0dcf65.
2021-01-14 03:37:07 -05:00
Yusuke Mori
eabad8fd9c
Update run_glue for do_predict with local test data (#9442) (#9486)
* Update run_glue for do_predict with local test data (#9442)

* Update run_glue (#9442): fix comments ('files' to 'a file')

* Update run_glue (#9442): reflect the code review

* Update run_glue (#9442): auto format

* Update run_glue (#9442): reflect the code review
2021-01-13 07:48:35 -05:00
Pavel Tarashkevich
27d0e01d75
Fix classification script: enable dynamic padding with truncation (#9554)
Co-authored-by: Pavel Tarashkevich <Pavel.Tarashkievich@orange.com>
2021-01-13 07:46:48 -05:00
Stas Bekman
2df34f4aba
[trainer] deepspeed integration (#9211)
* deepspeed integration

* style

* add test

* ds wants to do its own backward

* fp16 assert

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

* for clarity extract what args are being passed to deepspeed

* introduce the concept of self.wrapped_model

* s/self.wrapped_model/self.model_wrapped/

* complete transition to self.wrapped_model / self.model

* fix

* doc

* give ds its own init

* add custom overrides, handle bs correctly

* fix test

* clean up model_init logic, fix small bug

* complete fix

* collapse --deepspeed_config into --deepspeed

* style

* start adding doc notes

* style

* implement hf2ds optimizer and scheduler configuration remapping

* oops

* call get_num_training_steps absolutely when needed

* workaround broken auto-formatter

* deepspeed_config arg is no longer needed - fixed in deepspeed master

* use hf's fp16 args in config

* clean

* start on the docs

* rebase cleanup

* finish up --fp16

* clarify the supported stages

* big refactor thanks to discovering deepspeed.init_distributed

* cleanup

* revert fp16 part

* add checkpoint-support

* more init ds into integrations

* extend docs

* cleanup

* unfix docs

* clean up old code

* imports

* move docs

* fix logic

* make it clear which file it's referring to

* document nodes/gpus

* style

* wrong format

* style

* deepspeed handles gradient clipping

* easier to read

* major doc rewrite

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* docs

* switch to AdamW optimizer

* style

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* clarify doc

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-12 19:05:18 -08:00
Sylvain Gugger
3ec40299c1
Remove nested lxmert (#9440) 2021-01-07 04:10:41 -05:00
Sylvain Gugger
453a70d4cb
Allow example to use a revision and work with private models (#9407)
* Allow example to use a revision and work with private models

* Copy to other examples and template

* Styling
2021-01-06 06:49:23 -05:00
Patrick von Platen
eef66035a2
[PyTorch Bart] Split Bart into different models (#9343)
* first try

* remove old template

* finish bart

* finish mbart

* delete unnecessary line

* init pegasus

* save intermediate

* correct pegasus

* finish pegasus

* remove cookie cutter leftover

* add marian

* finish blenderbot

* replace in file

* correctly split blenderbot

* delete "old" folder

* correct "add statement"

* adapt config for tf comp

* correct configs for tf

* remove ipdb

* fix more stuff

* fix mbart

* push pegasus fix

* fix mbart

* more fixes

* fix research projects code

* finish docs for bart, mbart, and marian

* delete unnecessary file

* correct attn typo

* correct configs

* remove pegasus for seq class

* correct peg docs

* correct peg docs

* finish configs

* further improve docs

* add copied from statements to mbart

* fix copied from in mbart

* add copy statements to marian

* add copied from to marian

* add pegasus copied from

* finish pegasus

* finish copied from

* Apply suggestions from code review

* make style

* backward comp blenderbot

* apply lysandres and sylvains suggestions

* apply suggestions

* push last fixes

* fix docs

* fix tok tests

* fix imports code style

* fix doc
2021-01-05 22:00:05 +01:00
Yusuke Mori
57a6626929
[examples/text-classification] Fix a bug for using one's own dataset of a regression task (#9411) 2021-01-05 08:15:06 -05:00
dependabot[bot]
5dd389d1c7
Bump notebook from 6.1.4 to 6.1.5 in /examples/research_projects/lxmert (#9402)
Bumps [notebook](https://github.com/jupyter/jupyterhub) from 6.1.4 to 6.1.5.
- [Release notes](https://github.com/jupyter/jupyterhub/releases)
- [Changelog](https://github.com/jupyterhub/jupyterhub/blob/master/CHECKLIST-Release.md)
- [Commits](https://github.com/jupyter/jupyterhub/commits)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-04 10:02:07 -05:00
Sylvain Gugger
23a71449c0
Put back LXMert example (#9401) 2021-01-04 09:59:07 -05:00
Sam Shleifer
8eb7f26d5d
simplify marian distillation script (#9394) 2021-01-04 11:21:24 +05:30