Daniel Stancl
1867d9a8d7
Add head_mask/decoder_head_mask for TF BART models ( #9639 )
...
* Add head_mask/decoder_head_mask for TF BART models
* Add head_mask and decoder_head_mask input arguments for TF BART-based
models as a TF counterpart to the PR #9569
* Add test_headmasking functionality to tests/test_modeling_tf_common.py
* TODO: Add a test to verify that we can get a gradient back for
importance score computation
* Remove redundant #TODO note
Remove redundant #TODO note from tests/test_modeling_tf_common.py
* Fix assertions
* Make style
* Fix ...Model input args and adjust one new test
* Add back head_mask and decoder_head_mask to BART-based ...Model
after the last commit
* Remove head_mask and decoder_head_mask from input_dict
in TF test_train_pipeline_custom_model as these two have different
shape than other input args (Necessary for passing this test)
* Revert adding global_rng in test_modeling_tf_common.py
2021-01-26 03:50:00 -05:00
Yusuke Mori
cb73ab5a38
Fix broken links in the converting tf ckpt document ( #9791 )
...
* Fix broken links in the converting tf ckpt document
* Update docs/source/converting_tensorflow_models.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Reflect the review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-26 03:37:57 -05:00
Patrick von Platen
d94cc2f904
[Flaky Generation Tests] Make sure that no early stopping is happening for beam search ( #9794 )
...
* fix ci
* fix ci
* renaming
* fix dup line
2021-01-26 03:21:44 -05:00
Stas Bekman
0fdbf0850a
[PR/Issue templates] normalize, group, sort + add myself for deepspeed ( #9706 )
...
* normalize, group, sort + add myself for deepspeed
* new structure
* add ray
* typo
* more suggestions
* more suggestions
* white space
* Update .github/ISSUE_TEMPLATE/bug-report.md
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* add bullets
* sync
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* sync
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-25 21:09:01 -08:00
Sylvain Gugger
af41da5097
Fix style
2021-01-25 12:40:58 -05:00
Sylvain Gugger
caf4abf768
Auto-resume training from checkpoint ( #9776 )
...
* Auto-resume training from checkpoint
* Update examples/text-classification/run_glue.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Roll out to other examples
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-25 12:03:51 -05:00
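
Note on the auto-resume commit above: a minimal sketch of how a script might locate the most recent checkpoint to resume from. It assumes checkpoints are saved in folders named `checkpoint-<step>` inside the output directory. The helper name `find_last_checkpoint` is hypothetical and is not taken from the commit itself.

```python
import os
import re

# Assumed folder naming: "checkpoint-<global step>", e.g. "checkpoint-500".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint folder, or None."""
    if not os.path.isdir(output_dir):
        return None
    found = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        # Only count real directories whose name matches the pattern.
        if match and os.path.isdir(os.path.join(output_dir, name)):
            found.append((int(match.group(1)), name))
    if not found:
        return None
    # Compare by step number, not lexicographically ("10" > "5").
    return os.path.join(output_dir, max(found)[1])
```

A script could call this once at start-up and, when the result is not None, pass it on as the checkpoint to resume from.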
Lysandre Debut
0f443436fb
Actual fix ( #9787 )
2021-01-25 11:12:07 -05:00
Stas Bekman
fac7cfb16a
[fsmt] onnx triu workaround ( #9738 )
...
* onnx triu workaround
* style
* working this time
* add test
* more efficient version
2021-01-25 08:57:37 -05:00
Sorami Hisamoto
626116b7d7
Fix a typo in Trainer.hyperparameter_search docstring ( #9762 )
...
`compute_objectie` => `compute_objective`
2021-01-25 06:40:03 -05:00
Kai Fricke
d63ab61525
Use object store to pass trainer object to Ray Tune ( #9749 )
2021-01-25 05:01:55 -05:00
Maria Janina Sarol
6312fed47d
Fix TFTrainer prediction output ( #9662 )
...
* Fix TFTrainer prediction output
* Update trainer_tf.py
* Fix TFTrainer prediction output
* Fix evaluation_loss update in TFTrainer
* Fix TFTrainer prediction output
2021-01-25 10:27:12 +01:00
Wilfried L. Bounsi
9152f16023
Fix broken [Open in Colab] links ( #9761 )
2021-01-23 15:11:46 +05:30
Stas Bekman
b7b7e5d049
token_type_ids isn't used ( #9736 )
2021-01-22 20:38:53 -08:00
Julien Plu
a449ffcbd2
Fix test ( #9755 )
2021-01-22 17:40:16 +01:00
Sylvain Gugger
82d46febeb
Add `report_to` training arguments to control the reporting integrations used ( #9735 )
2021-01-22 10:34:34 -05:00
Sylvain Gugger
411c582109
Fixes to run_seq2seq and instructions ( #9734 )
...
* Fixes to run_seq2seq and instructions
* Add more defaults for summarization
2021-01-22 10:03:57 -05:00
Julien Plu
d7c31abf38
Fix some TF slow tests ( #9728 )
...
* Fix saved model tests + fix a graph issue in longformer
* Apply style
2021-01-22 14:50:46 +01:00
Stefan Schweter
08b22722c7
examples: fix XNLI url ( #9741 )
2021-01-22 18:13:52 +05:30
Sylvain Gugger
5f80c15ef5
Fix memory regression in Seq2Seq example ( #9713 )
...
* Fix memory regression in Seq2Seq example
* Fix test and properly deal with -100
* Easier condition with device safety
* Patch for MBartTokenizerFast
2021-01-21 12:05:46 -05:00
Julien Plu
a7dabfb3d1
Fix TF s2s models ( #9478 )
...
* Fix Seq2Seq models for serving
* Apply style
* Fix Longformer
* Fix mBart/Pegasus/Blenderbot
* Apply style
* Add a main intermediate layer
* Apply style
* Remove import
* Apply tf.function to Longformer
* Fix utils check_copy
* Update S2S template
* Fix BART + Blenderbot
* Fix BlenderbotSmall
* Fix BlenderbotSmall
* Fix BlenderbotSmall
* Fix MBart
* Fix Marian
* Fix Pegasus + template
* Apply style
* Fix common attributes test
* Forgot to fix the LED test
* Apply Patrick's comment on LED Decoder
2021-01-21 17:03:29 +01:00
Nicolas Patry
23e5a36ee6
Changing model default for TableQuestionAnsweringPipeline. ( #9729 )
...
* Changing model default for TableQuestionAnsweringPipeline.
- Discussion: https://discuss.huggingface.co/t/table-question-answering-is-not-an-available-task-under-pipeline/3284/6
* Updating slow tests that were out of sync.
2021-01-21 14:31:51 +01:00
Julien Plu
3f290e6c84
Fix mixed precision in TF models ( #9163 )
...
* Fix Gelu precision
* Fix gelu_fast
* Naming
* Fix usage and apply style
* add TF gelu approximate version
* add TF gelu approximate version
* add TF gelu approximate version
* Apply style
* Fix albert
* Remove the usage of the Activation layer
2021-01-21 07:00:11 -05:00
Suraj Patil
248fa1ae72
fix T5 head mask in model_parallel ( #9726 )
...
* fix head mask in model_parallel
* pass correct head mask
2021-01-21 12:16:14 +01:00
Patrick von Platen
ca422e3d7d
finish ( #9721 )
2021-01-21 05:17:13 -05:00
Patrick von Platen
c8ea582ed6
reduce led memory ( #9723 )
2021-01-21 05:16:15 -05:00
guillaume-be
fb36c273a2
Allow text generation for ProphetNetForCausalLM ( #9707 )
...
* Moved ProphetNetForCausalLM's parent initialization after config update
* Added unit tests for generation for ProphetNetForCausalLM
2021-01-21 11:13:38 +01:00
Lysandre Debut
910aa89671
Temporarily deactivate TPU tests while we work on fixing them ( #9720 )
2021-01-21 04:17:39 -05:00
Muennighoff
6a346f0358
fix typo ( #9708 )
...
* fix typo
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-01-21 13:51:01 +05:30
Stas Bekman
4a20b7c450
[trainer] no --deepspeed and --sharded_ddp together ( #9712 )
...
* no --deepspeed and --sharded_ddp together
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-20 16:50:21 -08:00
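
Note on the commit above: a minimal sketch of a guard that rejects the two options when they are passed together. The function name, argument names and error text are hypothetical; the actual check in src/transformers/trainer.py may be written differently.

```python
def check_exclusive_flags(deepspeed=None, sharded_ddp=False):
    """Raise if both --deepspeed and --sharded_ddp were requested.

    `deepspeed` stands in for the value given to --deepspeed (assumed
    here to be a config path or None); `sharded_ddp` is a boolean flag.
    """
    if deepspeed and sharded_ddp:
        raise ValueError("--deepspeed and --sharded_ddp cannot be used together")
```

Running the check as early as possible, before any distributed set-up, keeps the failure cheap and the error message easy to find.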
Sylvain Gugger
7acfa95afb
Add missing new line
2021-01-20 14:13:16 -05:00
Darigov Research
5a307ece82
Adds flashcards to Glossary & makes small corrections ( #8949 )
...
* fix: Makes small typo corrections & standardises glossary
* feat: Adds introduction & links to transformer flashcards
* feat: Adds attribution & adjustments requested in #8949
* feat: Adds flashcards to community.md
* refactor: Removes flashcards from glossary
2021-01-20 13:28:40 -05:00
Sylvain Gugger
3cd91e8162
Fix WANDB_DISABLED test ( #9703 )
...
* Fix WANDB_DISABLED test
* Remove duplicate import
* Make a test that actually works...
* Fix style
2021-01-20 12:30:24 -05:00
Sylvain Gugger
2a703773aa
Fix style
2021-01-20 12:17:40 -05:00
Stas Bekman
cd5565bed3
fix the backward for deepspeed ( #9705 )
2021-01-20 09:07:07 -08:00
Gunjan Chhablani
538245b0c2
Fix Trainer and Args to mention AdamW, not Adam. ( #9685 )
...
* Fix Trainer and Args to mention AdamW, not Adam.
* Update the docs for Training Arguments.
* Change arguments adamw_* to adam_*
* Fixed links to AdamW in TrainerArguments docs
* Fix line length in Training Args docs.
2021-01-20 11:59:31 -05:00
NielsRogge
88583d4958
Add notebook ( #9696 )
2021-01-20 10:19:26 -05:00
NielsRogge
d1370d29b1
Add DeBERTa head models ( #9691 )
...
* Add DebertaForMaskedLM, DebertaForTokenClassification, DebertaForQuestionAnswering
* Add docs and fix quality
* Fix Deberta not having pooler
2021-01-20 10:18:50 -05:00
Sylvain Gugger
a7b62fece5
Fix Funnel Transformer conversion script ( #9683 )
2021-01-20 09:50:20 -05:00
acul3
8940c7662d
Add t5 convert to transformers-cli ( #9654 )
...
* Update run_mlm.py
* add t5 model to transformers-cli convert
* update run_mlm.py same as master
* update converting model docs
* update converting model docs
* Update convert.py
* Trigger notification
* update import sorted
* fix typo t5
2021-01-20 09:34:27 -05:00
Julien Plu
7251a4736d
Fix template ( #9697 )
2021-01-20 09:04:53 -05:00
Julien Plu
14042d560f
New TF embeddings (cleaner and faster) ( #9418 )
...
* Create new embeddings + add to BERT
* Add Albert
* Add DistilBert
* Add Albert + Electra + Funnel
* Add Longformer + Lxmert
* Add last models
* Apply style
* Update the template
* Remove unused imports
* Rename attribute
* Import embeddings in their own model file
* Replace word_embeddings with weight
* fix naming
* Fix Albert
* Fix Albert
* Fix Longformer
* Fix Lxmert Mobilebert and MPNet
* Fix copy
* Fix template
* Update the get weights function
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/electra/modeling_tf_electra.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* address Sylvain's comments
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-20 12:08:12 +01:00
Julien Plu
12f0d7e8e0
Fix label datatype in TF Trainer ( #9616 )
...
* Fix label datatype
* Apply style
2021-01-20 12:08:00 +01:00
Sylvain Gugger
76f36e183a
Add a community page to the docs ( #9682 )
2021-01-20 04:54:36 -05:00
Sylvain Gugger
582f516adb
Use datasets squad_v2 metric in run_qa ( #9677 )
2021-01-20 04:52:13 -05:00
LSinev
a98173cc45
make RepetitionPenaltyLogitsProcessor faster ( #9600 )
2021-01-20 10:23:01 +01:00
Sylvain Gugger
a1ad16a446
Restrain tokenizer.model_max_length default ( #9681 )
...
* Restrain tokenizer.model_max_length default
* Fix indent
2021-01-20 04:17:39 -05:00
Sylvain Gugger
7e662e6a3b
Fix model templates and use less than 119 chars ( #9684 )
...
* Fix model templates and use less than 119 chars
* Missing new line
2021-01-19 17:11:22 -05:00
Daniel Stancl
2ebbbf558c
Add separated decoder_head_mask for T5 Models ( #9634 )
...
* Add decoder_head_mask for PyTorch T5 model
* Add decoder_head_mask args into T5Model and T5ForConditionalGeneration
* Slightly change the order of input args to be in accordance
with the convention from BART-based models introduced within the PR #9569 .
* Make style for modeling_t5.py
* Add decoder_head_mask for TF T5 models
* Separate head_mask and decoder_head_mask args in TF T5 models
* Slightly change the order of input args to follow convention
of BART-based models updated in PR #9569
* Update test_forward_signature tests/test_modeling_tf_common.py
w.r.t. the changed order of input args
* Add FutureWarnings for T5 and TFT5 models
* Add FutureWarnings for T5 and TFT5 models warning a user that
input argument `head_mask` was split into two arguments -
`head_mask` and `decoder_head_mask`
* Add default behaviour - `decoder_head_mask` is set to copy
`head_mask`
* Fix T5 modeling and FutureWarning
* Make proper usage of head_mask and decoder_head_mask
in cross_attention
* Fix conditions for raising FutureWarning
* Reformat FutureWarning in T5 modeling
* Refactor the warning message
2021-01-19 22:50:25 +01:00
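
Note on the T5 commit above: a minimal sketch of the default behaviour it describes, where `decoder_head_mask` falls back to `head_mask` and a FutureWarning tells the user about the split. The helper name `resolve_head_masks` and the warning text are hypothetical, and the exact conditions used in the T5 modeling code may differ.

```python
import warnings

# Hypothetical wording; the real message in the T5 code may differ.
_HEAD_MASK_WARNING = (
    "The input argument `head_mask` was split into two arguments, "
    "`head_mask` and `decoder_head_mask`. For now `decoder_head_mask` "
    "is set to copy `head_mask`; pass it explicitly to avoid this warning."
)


def resolve_head_masks(head_mask=None, decoder_head_mask=None):
    """Return (head_mask, decoder_head_mask) after applying the fallback."""
    if head_mask is not None and decoder_head_mask is None:
        # Only an encoder mask was given: warn and reuse it for the decoder.
        warnings.warn(_HEAD_MASK_WARNING, FutureWarning)
        decoder_head_mask = head_mask
    return head_mask, decoder_head_mask
```

Callers that already pass both masks, or neither, go through unchanged and see no warning.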
Sylvain Gugger
e4c06ed664
New run_seq2seq script ( #9605 )
...
* New run_seq2seq script
* Add tests
* Mark as slow
* Update examples/seq2seq/run_seq2seq.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-01-19 15:22:17 -05:00
Julien Plu
fa876aee2a
Fix TF Flaubert and XLM ( #9661 )
...
* Fix Flaubert and XLM
* Fix Flaubert and XLM
* Apply style
2021-01-19 18:02:57 +01:00