Patrick von Platen
68a3215949
Update FINE_TUNE_XLSR_WAV2VEC2.md
2021-03-19 00:27:40 +03:00
Patrick von Platen
03df3fbcb4
Update FINE_TUNE_XLSR_WAV2VEC2.md
2021-03-19 00:26:49 +03:00
Patrick von Platen
e84adbed40
Add XLSR-Wav2Vec2 Fine-Tuning README.md ( #10786 )
...
* upload
* upload fine-tuning script
* improve
* adapt
* Apply suggestions from code review
* correct
* upload
* finalize
* remove @
* correct typos
2021-03-19 00:22:43 +03:00
Stas Bekman
9352b5151a
[examples/seq2seq/README.md] fix t5 examples ( #10734 )
...
* [examples/seq2seq] fix t5 examples
This PR:
* fixes T5 examples to include `--source_prefix` - it's **not** optional. If you give it a try you will see that you get 10x worse BLEU scores w/o it: w/ `27.6849`, w/o `2.374`
* added a normal translation example w/o the peculiarities of MBart and T5
* reduces the default max samples to 50 so it's much faster to test quickly
summarization seems to be broken for t5 score-wise: https://github.com/huggingface/transformers/issues/10733
@sgugger
* specify explicitly the t5 models requiring the special handling
* one more
* update the t5 summarization example to use cnn_dailymail
* move max*samples into the top level README.md
* better wording
* better wording
2021-03-18 09:55:39 -07:00
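The `--source_prefix` requirement above comes from how T5 was pre-trained: it is a multi-task text-to-text model that expects a task prefix in the input. A minimal sketch of the effect (model name and prefix are illustrative; any T5 checkpoint behaves the same way):

```python
# T5 expects a task prefix; without it translation quality collapses,
# which is why --source_prefix is not optional in the example scripts.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "The house is wonderful."
inputs = tokenizer("translate English to German: " + text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```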
Julien Chaumond
4f3e93cfaf
[file_utils] do not gobble certain kinds of requests.ConnectionError ( #10235 )
...
* do not gobble certain kinds of requests.ConnectionError
* Apply review comments
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-18 12:37:45 -04:00
Suraj Patil
5f19c07a70
add run_common_voice script ( #10767 )
...
* add initial script
* finish script
* add shell script example
* accept chars_to_ignore as a command-line arg
* align the script with other example scripts
* add torchaudio dep
2021-03-18 17:21:16 +05:30
Mohamed El-Geish
af8afdc88d
wav2vec2: support datasets other than LibriSpeech ( #10581 )
...
* wav2vec2: support datasets other than LibriSpeech
* Formatting run_asr.py to pass code quality test
* bundled orthography options and added verbose logs
* fixing a typo in timit fine-tuning script
* update comment for clarity
* resize_lm_head and load custom vocab from file
* adding a max_duration_in_seconds filter
* do not assign `duration_filter` lambda, use a def
* log untransliterated text as well
* fix base model for Arabic
* fix duration filter when target_sr is not set
* drop duration_in_seconds when unneeded
* script for wav2vec2-large-lv60-timit-asr
* fix for "tha" in arabic corpus (huggingface#10581)
* adding more options to work with common_voice
* PR feedback (huggingface#10581)
* small README change
2021-03-18 10:20:26 +03:00
Stas Bekman
393739194e
[examples] document resuming ( #10776 )
...
* document resuming in examples
* fix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* put trainer code last, adjust notes
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-17 12:48:35 -07:00
Stas Bekman
cd8c93f701
[DeepSpeed] improve checkpoint loading code plus tests ( #10760 )
...
* deepspeed checkpoint loading code plus tests
* style
* style
2021-03-17 10:22:58 -07:00
Cheng Li
c83fbc5f2d
[Deepspeed] Allow HF optimizer and scheduler to be passed to deepspeed ( #10464 )
...
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* update
* make init_deepspeed support config dict
* fix docstring formatting
* clean up trainer's comments
* add new tests
* fix type
* composite argparse doesn't work
* style
* add a new test, rename others
* document new functionality
* complete tests, add docs
* style
* correct level
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add new methods to the doc
* must tell DS we are using a non-native optimizer
* add protection against cpu_offload + HF optimizer combo
* fix the cli overrides
* sync docs + tests
* restore AdamW
* better docs
* need new version
* no longer needed
* remove outdated information
* refactor duplicated code
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-16 15:51:09 -07:00
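Per the bullets above, when the DeepSpeed config contains no `optimizer`/`scheduler` sections the Trainer's own optimizer and scheduler are handed to DeepSpeed (and `cpu_offload` combined with an HF optimizer is rejected). A hedged sketch, assuming `deepspeed=` accepts a config dict as the "make init_deepspeed support config dict" bullet suggests:

```python
# DeepSpeed config with no "optimizer"/"scheduler" sections: the Trainer's
# own AdamW and LR scheduler are passed through to DeepSpeed instead.
from transformers import TrainingArguments

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    # intentionally no "optimizer" or "scheduler" keys
}

args = TrainingArguments(
    output_dir="out",
    deepspeed=ds_config,  # or a path to a ds_config.json file
)
```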
Lysandre
1b5ce1e63b
Development on v4.5.0dev0
2021-03-16 11:41:15 -04:00
Lysandre
c988db5af2
Release v4.4.0
2021-03-16 11:33:35 -04:00
Russell Klopfer
87d685b8a9
independent training / eval with local files ( #10710 )
...
* independent training / eval with local files
* remove redundant assert
2021-03-15 19:35:26 -04:00
Sylvain Gugger
4c379daf64
Add minimum version check in examples ( #10724 )
...
* Add minimum version check in examples
* Style
* No need for new line maybe?
* Add helpful comment
2021-03-15 19:29:54 -04:00
Joe Davison
966ba081c9
zero-shot pipeline multi_class -> multi_label ( #10727 )
2021-03-15 16:02:46 -06:00
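Usage sketch of the renamed argument (the default model for the pipeline is downloaded on first use):

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "I love playing football and watching tennis.",
    candidate_labels=["sports", "politics", "cooking"],
    multi_label=True,  # previously spelled multi_class=True
)
print(result["labels"], result["scores"])
```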
Théo Matussière
6f840990a7
split seq2seq script into summarization & translation ( #10611 )
...
* split seq2seq script, update docs
* needless diff
* fix readme
* remove test diff
* s/summarization/translation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* cr
* fix arguments & better mbart/t5 refs
* copyright
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* reword readme
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* s/summarization/translation
* short script names
* fix tests
* fix isort, include mbart doc
* delete old script, update tests
* automate source prefix
* automate source prefix for translation
* s/translation/trans
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* fix script name (short version)
* typos
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* exact parameter
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* remove superfluous source_prefix calls in docs
* rename scripts & warn for source prefix
* black
* flake8
Co-authored-by: theo <theo@matussie.re>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-15 09:11:42 -04:00
Stas Bekman
4c32f9f26e
AdamW is now supported by default ( #9624 )
2021-03-12 13:40:07 -08:00
Lysandre Debut
9fbb4cdc80
Specify minimum version for sacrebleu ( #10662 )
2021-03-11 13:45:06 -05:00
ArvidYin
27d9e05ce2
Update README.md ( #10647 )
...
correct spelling error: 'nether'
2021-03-11 08:58:04 -05:00
Sylvain Gugger
efb5c0a453
Add new GLUE example with no Trainer. ( #10555 )
...
* Add new GLUE example with no Trainer.
* Style
* Address review comments
2021-03-10 09:29:19 -05:00
Allen Wang
6f52fce673
Fixes an issue in text-classification where MNLI eval/test datasets are not being preprocessed. ( #10621 )
...
* Fix MNLI tests
* Linter fix
2021-03-09 22:13:45 -05:00
Sylvain Gugger
0d909f6bd8
Fairscale FSDP fix model save ( #10596 )
...
* Hotfix fairscale FSDP
* Evaluation works
* Save on process zero
2021-03-09 14:42:07 -05:00
Stas Bekman
f284089ec4
[examples tests on multigpu] resolving require_torch_non_multi_gpu_but_fix_me ( #10561 )
...
* batch 1
* this is tpu
* deebert attempt
* the rest
2021-03-08 11:11:40 -08:00
Bhadresh Savani
dfd16af832
Added max_sample_ arguments ( #10551 )
...
* reverted changes of logging and saving metrics
* added max_sample arguments
* fixed code
* white space diff
* reformatting code
* reformatted code
2021-03-08 13:57:10 -05:00
Stas Bekman
917f104502
[examples tests] various fixes ( #10584 )
...
* fix sharded ddp enum
* test fixes
* stronger validation + apex breaks other tests
2021-03-08 10:28:44 -08:00
Stas Bekman
e6ce636e02
fix nltk lookup ( #10585 )
2021-03-07 22:09:58 -08:00
Stas Bekman
88a951e3cc
offline mode for firewalled envs ( #10407 )
...
* offline mode start
* add specific values
* fix fallback
* add test
* better values check and range
* test that actually works
* document the offline mode
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* more strict check
* cleaner test
* pt-only test
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-05 17:27:48 -08:00
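The offline mode added here is driven by environment variables; a minimal sketch (the variables must be set before the libraries make any network call, and the model must already be in the local cache):

```python
import os

os.environ["TRANSFORMERS_OFFLINE"] = "1"   # transformers: no hub requests
os.environ["HF_DATASETS_OFFLINE"] = "1"    # datasets: local files/caches only

from transformers import AutoTokenizer

# Succeeds only if bert-base-uncased was previously downloaded to the cache.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
```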
Patrick von Platen
395ffcd757
fix run seq2seq ( #10547 )
2021-03-05 18:17:12 +03:00
Sylvain Gugger
a5bd40b75c
Not always consider a local model a checkpoint in run_glue ( #10517 )
2021-03-04 11:11:39 -05:00
Sylvain Gugger
745ea78dcc
Revert "Not always consider a local model a checkpoint in run_glue"
...
This reverts commit f3660613bc.
2021-03-04 09:45:18 -05:00
Sylvain Gugger
f3660613bc
Not always consider a local model a checkpoint in run_glue
2021-03-04 09:44:02 -05:00
Patrick von Platen
0234de8418
Add Fine-Tuning for Wav2Vec2 ( #10145 )
...
* add encode labels function to tokenizer
* start adding finetuning
* init dropout
* upload
* correct convert script
* apply changes
* fix second typo
* make first dummy training run
* adapt convert script
* push config for comparison
* remove conf
* finish training
* adapt data collator
* add research folder
* update according to fairseq feedback
* some minor corrections
* refactor masking indices a bit
* some minor changes
* clean tokenizer
* finish clean-up
* remove previous logic
* update run script
* correct training
* finish changes
* finish model
* correct bug
* fix training a bit more
* add some tests
* finish gradient checkpointing
* finish example
* correct gradient checkpointing
* improve tokenization method
* revert changes in tokenizer
* revert general change
* adapt fine-tuning
* update
* save intermediate test
* Update README.md
* finish finetuning
* delete conversion script
* Update src/transformers/models/wav2vec2/configuration_wav2vec2.py
* Update src/transformers/models/wav2vec2/processing_wav2vec2.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* finish wav2vec2 script
* finish wav2vec2 fine-tuning
* finalize test
* correct test
* adapt tests
* finish
* remove test file
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-01 12:13:17 +03:00
Bhadresh Savani
aca6288ff4
updated logging and saving metrics ( #10436 )
...
* updated logging and saving metrics
* space removal
2021-02-27 09:53:44 -08:00
Stas Bekman
f52a15897b
[run_seq2seq.py] restore functionality: saving to test_generations.txt ( #10428 )
...
This PR restores the original functionality that for some reason was modified.
Fixes: https://github.com/huggingface/transformers/issues/10381
@sgugger
2021-02-27 08:21:50 -08:00
Stas Bekman
ee04b69822
[examples] better model example ( #10427 )
...
* refactors
* typo
2021-02-26 17:01:01 -08:00
Sylvain Gugger
17b6e0d474
Fix run_glue evaluation when model has a label correspondence ( #10401 )
2021-02-25 15:30:38 -05:00
Sylvain Gugger
9d14be5c20
Add support for ZeRO-2/3 and ZeRO-offload in fairscale ( #10354 )
...
* Add support for ZeRO-2/3 and ZeRO-offload in fairscale
* Quality
* Rework from review comments
* Add doc
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-02-25 11:07:53 -05:00
Patrick von Platen
cb38ffcc5e
[PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer ( #10324 )
...
* push to show
* small improvement
* small improvement
* Update src/transformers/feature_extraction_utils.py
* Update src/transformers/feature_extraction_utils.py
* implement base
* add common tests
* make all tests pass for wav2vec2
* make padding work & add more tests
* finalize feature extractor utils
* add call method to feature extraction
* finalize feature processor
* finish tokenizer
* finish general processor design
* finish tests
* typo
* remove bogus file
* finish docstring
* add docs
* finish docs
* small fix
* correct docs
* save intermediate
* load changes
* apply changes
* apply changes to doc
* change tests
* apply Suraj's recommendations
* final changes
* Apply suggestions from code review
* fix typo
* fix import
* correct docstring
2021-02-25 17:42:46 +03:00
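A short sketch of the `Wav2Vec2Processor` this PR introduces: it bundles a feature extractor (raw waveform in) with a tokenizer (CTC ids out). The silent dummy waveform stands in for real 16 kHz speech:

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

speech = torch.zeros(16000).numpy()  # 1 second placeholder at 16 kHz
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids))
```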
Stas Bekman
3437d12134
[Trainer/Deepspeed] handle get_last_lr() before first step() ( #10362 )
...
* handle get_last_lr() before first step()
* abstract away the lr getting logic
* cleanup
* add test
* move to utils
2021-02-23 17:42:25 -08:00
Akmal
23e87c27be
Fix broken examples/seq2seq/README.md markdown ( #10344 )
2021-02-23 10:49:25 -05:00
Stas Bekman
622a8c5995
[trainer] add Trainer methods for metrics logging and saving ( #10266 )
...
* make logging and saving trainer built-in
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-22 13:02:53 -08:00
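Sketch of the built-in helpers this PR adds (names per the PR title; assumes an already-configured `Trainer`):

```python
from transformers import Trainer

def train_and_record(trainer: Trainer) -> None:
    """Run training and persist metrics with the Trainer's built-in helpers."""
    train_result = trainer.train()
    metrics = train_result.metrics
    trainer.log_metrics("train", metrics)   # pretty-print to the logger
    trainer.save_metrics("train", metrics)  # write train_results.json
    trainer.save_state()                    # write trainer_state.json
```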
Stas Bekman
eab0afc19c
[Trainer] implement gradient_accumulation_steps support in DeepSpeed integration ( #10310 )
...
* implement gradient_accumulation_steps support in DeepSpeed integration
* typo
* cleanup
* cleanup
2021-02-22 11:15:59 -08:00
Stas Bekman
f991daed18
defensive programming + expand/correct README ( #10295 )
2021-02-22 10:58:50 -08:00
Julien Plu
536aee99bb
Move the TF NER example ( #10276 )
2021-02-19 16:06:13 -05:00
Joe Davison
cbadb5243c
Zero shot distillation script cuda patch ( #10284 )
2021-02-19 14:06:57 -05:00
Joe Davison
c6fe17557e
Script for distilling zero-shot classifier to more efficient student ( #10244 )
...
* add zero-shot distillation script
* readme wordsmithing
* clean up code
* add multi-gpu teacher inference
plus tidying up more code
* add use_fast_tokenizer arg
* update results in readme
* more readme wordsmithing
* style
* Add handle to readme
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* fix code block
* add error+docs about distributed & tpu
* add @sgugger format requests
* xla -> tpu
* support fp16 for teacher preds
* no checkpoint by default
* add demo colab link
* add model sharing prompt + model link
* correct resulting acc of example
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-02-18 17:08:45 -05:00
Stas Bekman
97e688bc22
[Trainer] memory tracker metrics ( #10225 )
...
* memory tracker metrics
* go back to eval for some consistency
* handle no-gpu case
* deal with stackable eval calls
* restore callback order
* style
* simplify the API
* add test
* docs
* consistently use eval_ prefix
* improve docs
* Update src/transformers/trainer_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rename method
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 09:27:32 -08:00
Stas Bekman
d1eb88f42d
[CI] 2 fixes ( #10248 )
...
* fix invalid port
* missing requirements
2021-02-17 14:12:39 -08:00
Zhang Cheng
df1b0fb54d
set tgt_lang of MBart Tokenizer for summarization ( #10205 )
2021-02-16 09:39:37 -05:00
Suraj Patil
1c8c2d9ab3
[WIP][examples/seq2seq] move old s2s scripts to legacy ( #10136 )
...
* move old s2s scripts to legacy
* add the tests back
* proper rename
* restore
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-15 10:48:02 -08:00
Stas Bekman
0b1f552a24
fix run_seq2seq.py; porting trainer tests to it ( #10162 )
...
* fix run_seq2seq.py; porting DeepSpeed tests to it
* unrefactor
* defensive programming
* defensive programming 2
* port the rest of the trainer tests
* style
* a cleaner scripts dir finder
* cleanup
2021-02-15 09:12:17 -08:00
Suraj Patil
f51188cbe7
[examples/run_s2s] remove task_specific_params and update rouge computation ( #10133 )
...
* fix rouge metrics and task specific params
* fix typo
* round metrics
* typo
* remove task_specific_params
2021-02-12 17:18:21 +05:30
Stas Bekman
b54cb0bd82
[DeepSpeed in notebooks] Jupyter + Colab ( #10130 )
...
* init devices/setup explicitly
* docs + test
* simplify
* cleanup
* cleanup
* cleanup
* correct the required dist setup
* derive local_rank from env LOCAL_RANK
2021-02-11 14:02:05 -08:00
Qbiwan
8dcfaea08d
Update run_xnli.py to use Datasets library ( #9829 )
...
* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed, metric.compute, load_metric
* fix
* fix
* fix
* push
* fix
* everything works
* fix init
* fix
* special treatment for sepconv1d
* style
* 🙏🏽
* add doc and cleanup
* fix doc
* fix doc again
* fix doc again
* Apply suggestions from code review
* make style
* Proposal that should work
* Remove needless code
* Fix test
* Apply suggestions from code review
* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed, metric.compute, load_metric
* amend README
* removed data_args.task_name and replaced with task_name = "xnli"; use split function to load train and validation dataset separately; remove __post_init__; remove flag --task_name from README.
* removed dict task_to_keys, use str "xnli" instead of variable task_name, change preprocess_function to use examples["premise"], examples["hypothesis"] directly, remove sentence1_key and sentence2_key, change compute_metrics function to cater only to accuracy metric, add condition for train_language is None when using datasets.load_dataset()
* removed `torch.distributed.barrier()` and `import torch` as `from_pretrained` is able to do the work; amend README
2021-02-11 10:27:23 +05:30
Stas Bekman
77b862847b
[DeepSpeed] restore memory for evaluation ( #10114 )
...
* free up memory at the end of train
* rework tests
* consistent formatting
* correction
2021-02-10 09:09:48 -08:00
Lysandre Debut
0d8e554d42
Line endings should be LF across repo and not CRLF ( #10119 )
2021-02-10 10:50:00 -05:00
Boris Dayma
7c7962ba89
doc: update W&B related doc ( #10086 )
...
* doc: update W&B related doc
* doc(wandb): mention report_to
* doc(wandb): commit suggestion
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* doc(wandb): fix typo
* doc(wandb): remove WANDB_DISABLED
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-09 14:47:52 -05:00
Suraj Patil
63fddcf69c
[examples/s2s] add test set predictions ( #10085 )
...
* add do_predict, pass eval_beams during eval
* update help
* apply suggestions from code review
2021-02-09 20:41:41 +05:30
Stas Bekman
781220acab
transition to new tests dir ( #10080 )
2021-02-08 12:41:52 -08:00
Stas Bekman
322037e842
[trainer] deepspeed bug fixes and tests ( #10039 )
...
* deepspeed bug fixes and tests
* manual wrap?
2021-02-08 09:44:02 -08:00
Olivier
ece6c51458
[s2s examples] Replace -100 token ids with the tokenizer pad_id for compute_metrics ( #10046 )
...
* replace -100 token ids with the tokenizer pad_id for compute_metrics
* fixed typo for label_ids
2021-02-08 10:08:16 -05:00
Sylvain Gugger
b01483faa0
Truncate max length if needed in all examples ( #10034 )
2021-02-08 05:03:55 -05:00
Stas Bekman
24db8cc329
Can't mix --fp16 and --device cpu ( #10041 )
2021-02-07 17:54:20 -08:00
Stas Bekman
769948fad2
json to jsonlines, and doc, and typo ( #10043 )
2021-02-07 17:51:34 -08:00
Stas Bekman
8ea412a86f
[examples] make run scripts executable ( #10037 )
...
* make executable
* make executable
* same for the template
* cleanup
2021-02-05 15:51:18 -08:00
Suraj Patil
1cd16512dc
[examples/seq2seq] support label smoothing ( #9844 )
...
* add prepare_decoder_input_ids_from_labels in s2s models
* support lbl smoothing and enc/emb freezing
* fix freezing
* use pad_token_id from config
* remove embed freezing and add warning
* prepare decoder_input_ids inside DataCollatorForSeq2Seq
2021-02-05 23:21:57 +05:30
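The two pieces this PR wires together, sketched: loss-level label smoothing via the training arguments, and `DataCollatorForSeq2Seq` preparing `decoder_input_ids` from the labels when it is given the model:

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# model= lets the collator call prepare_decoder_input_ids_from_labels
collator = DataCollatorForSeq2Seq(tokenizer, model=model)

args = Seq2SeqTrainingArguments(
    output_dir="out",
    label_smoothing_factor=0.1,  # enables label-smoothed loss in the Trainer
)
```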
Suraj Patil
bca0dd5ee3
[run_clm.py] fix getting extension
2021-02-03 20:14:42 +05:30
Stas Bekman
d55e10beab
[research proj] [lxmert] rm bleach dependency ( #9970 )
...
Looks like a vulnerability, and it's not really used anywhere in the code, so we might as well remove it completely from the deps.
https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/bleach/open
2021-02-03 05:24:40 -05:00
Patrick von Platen
538b3b4607
[Tokenizer Utils Base] Make pad function more flexible ( #9928 )
...
* change tokenizer requirement
* split line
* Correct typo from list to str
* improve style
* make other function pretty as well
* add comment
* correct typo
* add new test
* pass tests for tok without padding token
* Apply suggestions from code review
2021-02-02 10:35:27 +03:00
Sylvain Gugger
115d97dd2f
Remove subclass for sortish sampler ( #9907 )
...
* Remove subclass for sortish sampler
* Use old Seq2SeqTrainer in script
* Styling
2021-02-01 08:06:32 -05:00
wlhgtc
1682804ebd
Fit Chinese wwm to new datasets ( #9887 )
...
* MOD: fit Chinese wwm to new datasets
* MOD: move wwm to new folder
* MOD: format code
* Styling
* MOD add param and recover trainer
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2021-02-01 03:37:59 -05:00
Stas Bekman
6bab83683b
fix logger format for non-main process ( #9911 )
2021-02-01 03:08:12 -05:00
Stas Bekman
6bf94bc0b6
correctly handle mt5 ( #9879 )
2021-01-29 08:11:22 -08:00
Sylvain Gugger
b4e559cfa1
Deprecate model_path in Trainer.train ( #9854 )
2021-01-28 08:32:46 -05:00
Sylvain Gugger
f2fabedbab
Setup logging with a stdout handler ( #9816 )
2021-01-27 03:39:11 -05:00
Yusuke Mori
059bb25817
Fix a bug in run_glue.py ( #9812 ) ( #9815 )
2021-01-26 14:32:19 -05:00
Magdalena Biesialska
8f6c12d306
Fix fine-tuning translation scripts ( #9809 )
2021-01-26 11:30:31 -05:00
Andrea Cappelli
10e5f28212
Improve pytorch examples for fp16 ( #9796 )
...
* Pad to 8x for fp16 multiple choice example (#9752 )
* Pad to 8x for fp16 squad trainer example (#9752 )
* Pad to 8x for fp16 ner example (#9752 )
* Pad to 8x for fp16 swag example (#9752 )
* Pad to 8x for fp16 qa beam search example (#9752 )
* Pad to 8x for fp16 qa example (#9752 )
* Pad to 8x for fp16 seq2seq example (#9752 )
* Pad to 8x for fp16 glue example (#9752 )
* Pad to 8x for fp16 new ner example (#9752 )
* update script template #9752
* Update examples/multiple-choice/run_swag.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update examples/question-answering/run_qa.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update examples/question-answering/run_qa_beam_search.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve code quality #9752
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-26 04:47:07 -05:00
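The whole series of commits above applies one idea: under fp16, pad sequence lengths to a multiple of 8 so the shapes map onto tensor cores. A minimal sketch with a padding collator:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorWithPadding(tokenizer, pad_to_multiple_of=8)

features = [tokenizer("a short sentence"),
            tokenizer("a slightly longer sentence than the first one")]
batch = collator(features)
print(batch["input_ids"].shape)  # padded length is a multiple of 8
```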
Sylvain Gugger
caf4abf768
Auto-resume training from checkpoint ( #9776 )
...
* Auto-resume training from checkpoint
* Update examples/text-classification/run_glue.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Roll out to other examples
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-25 12:03:51 -05:00
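Sketch of the auto-resume logic rolled out to the examples, assuming `trainer` and `training_args` are set up as in the scripts:

```python
import os
from transformers.trainer_utils import get_last_checkpoint

def resume_aware_train(trainer, training_args):
    last_checkpoint = None
    if os.path.isdir(training_args.output_dir):
        last_checkpoint = get_last_checkpoint(training_args.output_dir)
    # None starts fresh; a checkpoint path restores model + trainer state.
    return trainer.train(resume_from_checkpoint=last_checkpoint)
```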
Wilfried L. Bounsi
9152f16023
Fix broken [Open in Colab] links ( #9761 )
2021-01-23 15:11:46 +05:30
Sylvain Gugger
411c582109
Fixes to run_seq2seq and instructions ( #9734 )
...
* Fixes to run_seq2seq and instructions
* Add more defaults for summarization
2021-01-22 10:03:57 -05:00
Stefan Schweter
08b22722c7
examples: fix XNLI url ( #9741 )
2021-01-22 18:13:52 +05:30
Sylvain Gugger
5f80c15ef5
Fix memory regression in Seq2Seq example ( #9713 )
...
* Fix memory regression in Seq2Seq example
* Fix test and properly deal with -100
* Easier condition with device safety
* Patch for MBartTokenzierFast
2021-01-21 12:05:46 -05:00
Sylvain Gugger
582f516adb
Use datasets squad_v2 metric in run_qa ( #9677 )
2021-01-20 04:52:13 -05:00
Sylvain Gugger
a1ad16a446
Restrain tokenizer.model_max_length default ( #9681 )
...
* Restrain tokenizer.model_max_length default
* Fix indent
2021-01-20 04:17:39 -05:00
Sylvain Gugger
e4c06ed664
New run_seq2seq script ( #9605 )
...
* New run_seq2seq script
* Add tests
* Mark as slow
* Update examples/seq2seq/run_seq2seq.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-01-19 15:22:17 -05:00
Sylvain Gugger
97b787fb4e
Fix old Seq2SeqTrainer ( #9675 )
2021-01-19 09:56:25 -05:00
Stas Bekman
c60e0e1ee4
deepspeed + grad accum ( #9622 )
2021-01-15 10:12:26 -08:00
Sylvain Gugger
329fe2746a
Upstream (and rename) sortish sampler ( #9574 )
...
* Upstream (and rename) sortish sampler
* Use proper sampler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-14 10:38:14 -05:00
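A hedged sketch, assuming the upstreamed sampler is exposed through the `group_by_length` flag on `TrainingArguments` (batches group samples of similar length to reduce padding):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    group_by_length=True,  # length-grouped ("sortish") sampling
)
```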
Sylvain Gugger
46ed56cfd1
Switch metrics in run_ner to datasets ( #9567 )
...
* Switch metrics in run_ner to datasets
* Add flag to return all metrics
* Upstream (and rename) sortish_sampler
* Revert "Upstream (and rename) sortish_sampler"
This reverts commit e07d0dcf65.
2021-01-14 03:37:07 -05:00
Yusuke Mori
eabad8fd9c
Update run_glue for do_predict with local test data ( #9442 ) ( #9486 )
...
* Update run_glue for do_predict with local test data (#9442 )
* Update run_glue (#9442 ): fix comments ('files' to 'a file')
* Update run_glue (#9442 ): reflect the code review
* Update run_glue (#9442 ): auto format
* Update run_glue (#9442 ): reflect the code review
2021-01-13 07:48:35 -05:00
Pavel Tarashkevich
27d0e01d75
Fix classification script: enable dynamic padding with truncation ( #9554 )
...
Co-authored-by: Pavel Tarashkevich <Pavel.Tarashkievich@orange.com>
2021-01-13 07:46:48 -05:00
Stas Bekman
2df34f4aba
[trainer] deepspeed integration ( #9211 )
...
* deepspeed integration
* style
* add test
* ds wants to do its own backward
* fp16 assert
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* for clarity extract what args are being passed to deepspeed
* introduce the concept of self.wrapped_model
* s/self.wrapped_model/self.model_wrapped/
* complete transition to self.wrapped_model / self.model
* fix
* doc
* give ds its own init
* add custom overrides, handle bs correctly
* fix test
* clean up model_init logic, fix small bug
* complete fix
* collapse --deepspeed_config into --deepspeed
* style
* start adding doc notes
* style
* implement hf2ds optimizer and scheduler configuration remapping
* oops
* call get_num_training_steps absolutely when needed
* workaround broken auto-formatter
* deepspeed_config arg is no longer needed - fixed in deepspeed master
* use hf's fp16 args in config
* clean
* start on the docs
* rebase cleanup
* finish up --fp16
* clarify the supported stages
* big refactor thanks to discovering deepspeed.init_distributed
* cleanup
* revert fp16 part
* add checkpoint-support
* more init ds into integrations
* extend docs
* cleanup
* unfix docs
* clean up old code
* imports
* move docs
* fix logic
* make it clear which file it's referring to
* document nodes/gpus
* style
* wrong format
* style
* deepspeed handles gradient clipping
* easier to read
* major doc rewrite
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs
* switch to AdamW optimizer
* style
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-12 19:05:18 -08:00
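Minimal sketch of enabling the integration this PR lands: a single `--deepspeed` flag pointing at a DeepSpeed JSON config (the separate `--deepspeed_config` flag was collapsed into it, per the bullets above):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fp16=True,                   # HF fp16 args are mapped into the ds config
    deepspeed="ds_config.json",  # path to the DeepSpeed config file
)
# Launched under the DeepSpeed launcher, e.g.:
#   deepspeed run_seq2seq.py --deepspeed ds_config.json ...
```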
Sylvain Gugger
3ec40299c1
Remove nested lxmert ( #9440 )
2021-01-07 04:10:41 -05:00
Sylvain Gugger
453a70d4cb
Allow example to use a revision and work with private models ( #9407 )
...
* Allow example to use a revision and work with private models
* Copy to other examples and template
* Styling
2021-01-06 06:49:23 -05:00
Patrick von Platen
eef66035a2
[PyTorch Bart] Split Bart into different models ( #9343 )
...
* first try
* remove old template
* finish bart
* finish mbart
* delete unnecessary line
* init pegasus
* save intermediate
* correct pegasus
* finish pegasus
* remove cookie cutter leftover
* add marian
* finish blenderbot
* replace in file
* correctly split blenderbot
* delete "old" folder
* correct "add statement"
* adapt config for tf comp
* correct configs for tf
* remove ipdb
* fix more stuff
* fix mbart
* push pegasus fix
* fix mbart
* more fixes
* fix research projects code
* finish docs for bart, mbart, and marian
* delete unnecessary file
* correct attn typo
* correct configs
* remove pegasus for seq class
* correct peg docs
* correct peg docs
* finish configs
* further improve docs
* add copied from statements to mbart
* fix copied from in mbart
* add copy statements to marian
* add copied from to marian
* add pegasus copied from
* finish pegasus
* finish copied from
* Apply suggestions from code review
* make style
* backward comp blenderbot
* apply Lysandre's and Sylvain's suggestions
* apply suggestions
* push last fixes
* fix docs
* fix tok tests
* fix imports code style
* fix doc
2021-01-05 22:00:05 +01:00
Yusuke Mori
57a6626929
[examples/text-classification] Fix a bug when using one's own dataset for a regression task ( #9411 )
2021-01-05 08:15:06 -05:00
dependabot[bot]
5dd389d1c7
Bump notebook from 6.1.4 to 6.1.5 in /examples/research_projects/lxmert ( #9402 )
...
Bumps [notebook](https://github.com/jupyter/jupyterhub ) from 6.1.4 to 6.1.5.
- [Release notes](https://github.com/jupyter/jupyterhub/releases )
- [Changelog](https://github.com/jupyterhub/jupyterhub/blob/master/CHECKLIST-Release.md )
- [Commits](https://github.com/jupyter/jupyterhub/commits )
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-04 10:02:07 -05:00
Sylvain Gugger
23a71449c0
Put back LXMert example ( #9401 )
2021-01-04 09:59:07 -05:00
Sam Shleifer
8eb7f26d5d
simplify marian distillation script ( #9394 )
2021-01-04 11:21:24 +05:30