Patrick von Platen
b39bd763e8
Update README.md
2021-01-19 12:25:51 +01:00
Sergey Mkrtchyan
917dbb15e0
Fix DPRReaderTokenizer's attention_mask ( #9663 )
...
* Fix the attention_mask in DPRReaderTokenizer
* Add an integration test for DPRReader inference
* Run make style
2021-01-19 05:43:11 -05:00
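For context on the fix above, a hedged sketch of the DPRReaderTokenizer call whose attention_mask changed; the checkpoint name and inputs are illustrative, not taken from the PR's integration test.
```
# Hedged sketch of DPRReaderTokenizer usage around the fixed attention_mask;
# checkpoint name and example passages are illustrative.
from transformers import DPRReaderTokenizer

tokenizer = DPRReaderTokenizer.from_pretrained("facebook/dpr-reader-single-nq-base")
encoded = tokenizer(
    "What is love?",  # a single question may be paired with several passages
    titles=["Haddaway", "Song history"],
    texts=[
        "'What Is Love' is a song recorded by Haddaway.",
        "The single was released in 1993.",
    ],
    padding=True,
    return_tensors="pt",
)
# After the fix, attention_mask is 0 exactly on the padding positions.
print(encoded["attention_mask"])
```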
Patrick von Platen
12c1b5b8f4
fix test ( #9669 )
2021-01-19 09:06:24 +01:00
Daniel Stancl
357fb1c5d8
Add head_mask/decoder_head_mask for BART ( #9569 )
...
* Add head_mask/decoder_head_mask for BART
This branch implements head_mask and decoder_head_mask
for BART-based models. Full list below:
- BART
- MBart
- Blenderbot
- BlenderbotSmall
- Marian
- Pegasus
Everything is accompanied by updated tests.
* Fix test_headmasking for BART models
* Fix test_headmasking for BART-like models
which have only 2 layers in each module.
The condition
```
self.assertNotEqual(attentions[1][..., 0, :, :].flatten().sum().item(), 0.0)
```
is, therefore, invalid for encoder-decoder models considering
the `head_mask`
```
head_mask = torch.ones(
    self.model_tester.num_hidden_layers,
    self.model_tester.num_attention_heads,
    device=torch_device,
)
head_mask[0, 0] = 0
head_mask[-1, :-1] = 0
```
specified in the `test_headmasking` test function (a hedged usage sketch follows this entry).
* Adjust test_modeling_common.py to reflect T5 input args
* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
* make fix-copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-18 13:35:22 +01:00
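To make the head_mask discussion above concrete, here is a hedged usage sketch of the API this PR adds; the checkpoint name and mask values are illustrative, not taken from the PR.
```
# Hedged sketch of the head_mask API added for BART-family models;
# checkpoint name and mask values are illustrative. decoder_head_mask
# works the same way for the decoder layers.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

inputs = tokenizer("Masking attention heads in BART.", return_tensors="pt")

# One row per encoder layer, one column per head; 0.0 disables a head.
head_mask = torch.ones(model.config.encoder_layers, model.config.encoder_attention_heads)
head_mask[0, 0] = 0.0  # mask the first head of the first encoder layer

outputs = model(**inputs, head_mask=head_mask, output_attentions=True)
# The masked head contributes zero attention weight in layer 0:
print(outputs.encoder_attentions[0][:, 0].abs().sum())  # tensor(0.)
```
This also illustrates the testing point made in the commit body: with only two layers, `head_mask[-1, :-1] = 0` already zeroes head 0 of layer 1, so the quoted `assertNotEqual` on `attentions[1][..., 0, :, :]` cannot hold for such models.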
Devrim
65eb5d9ac5
Fix: torch.utils.checkpoint import error. ( #9626 )
2021-01-18 04:33:39 -05:00
Anthony MOI
72fc9abf17
Remove duplicated extra["retrieval"] ( #9621 )
2021-01-18 04:24:21 -05:00
Stas Bekman
c60e0e1ee4
deepspeed + grad accum ( #9622 )
2021-01-15 10:12:26 -08:00
Lysandre Debut
6d3b688b04
Ignore lm_head decoder bias warning ( #9615 )
...
* Ignore lm_head decoder bias warning
* Revert "Ignore lm_head decoder bias warning"
This reverts commit f25177a9da.
* predictions -> lm_head
2021-01-15 09:40:21 -05:00
Julien Plu
8eba1f8ca8
Remove unused token_type_ids in MPNet ( #9564 )
...
* Add warning
* Remove unused import
* Fix missing call
* Fix missing call
* Completely remove token_type_ids
* Apply style
* Remove unused import
* Update src/transformers/models/mpnet/modeling_tf_mpnet.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-15 08:06:29 -05:00
Patrick von Platen
90ca8d36e9
[TF Led] Fix wrong decoder attention mask behavior ( #9601 )
...
* fix tf led
* remove loop file
2021-01-15 06:40:27 -05:00
Kiyoung Kim
85788bae5c
Revert "Gradient accumulation for TFTrainer ( #9585 )"
...
This reverts commit 3f40070c88.
2021-01-15 10:47:01 +01:00
Stas Bekman
82498cbc37
[deepspeed doc] install issues + 1-gpu deployment ( #9582 )
...
* [doc] install + 1-gpu deployment
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improvements
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-14 11:05:04 -08:00
Sylvain Gugger
329fe2746a
Upstream (and rename) sortish sampler ( #9574 )
...
* Upstream (and rename) sortish sampler
* Use proper sampler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-14 10:38:14 -05:00
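A hedged usage sketch of the upstreamed sampler: it is exposed through the Trainer's `group_by_length` option (the argument name follows this PR's rename of the "sortish" sampler; treat it as an assumption on other versions).
```
# Hedged sketch: enabling the renamed length-grouped ("sortish") sampling
# via TrainingArguments; the argument name is per this PR's rename.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    group_by_length=True,  # batch samples of similar length to reduce padding
)
```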
Kiyoung Kim
3f40070c88
Gradient accumulation for TFTrainer ( #9585 )
...
* gradient accumulation for tftrainer
* label naming
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* label naming
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-14 10:16:39 -05:00
Lysandre
e43f3b6190
v4.2.1 in docs
2021-01-14 14:25:30 +01:00
Lysandre Debut
280db79ac1
BatchEncoding.to with device with tests ( #9584 )
2021-01-14 07:57:58 -05:00
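A minimal sketch of the `BatchEncoding.to` behavior covered by the new tests; the checkpoint name is illustrative.
```
# Minimal sketch of BatchEncoding.to(device); checkpoint name is illustrative.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer("Move every tensor in one call.", return_tensors="pt")

device = "cuda" if torch.cuda.is_available() else "cpu"
batch = batch.to(device)  # moves input_ids, attention_mask, ... together
print(batch["input_ids"].device)
```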
Lysandre Debut
8bf27075a2
Fix conda build ( #9589 )
...
* conda build -> conda-build
* Syntax error
* conda build -> conda-build + 4.2.0
* Prepare to merge in `master`
2021-01-14 05:51:52 -05:00
Stas Bekman
c99751dd9d
[setup.py] note on how to get to transformers' exact dependencies from shell ( #9553 )
...
* note on how to get to deps from shell
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix text
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-14 05:04:08 -05:00
Julien Plu
a26536f0c8
Make logs tf compliant ( #9565 )
2021-01-14 04:56:53 -05:00
Julien Plu
14d677ca4a
Compliancy with tf-nightly ( #9570 )
...
* Compliancy with tf-nightly
* Add more version + restore min version check
2021-01-14 04:35:35 -05:00
Sylvain Gugger
46ed56cfd1
Switch metrics in run_ner to datasets ( #9567 )
...
* Switch metrics in run_ner to datasets
* Add flag to return all metrics
* Upstream (and rename) sortish_sampler
* Revert "Upstream (and rename) sortish_sampler"
This reverts commit e07d0dcf65.
2021-01-14 03:37:07 -05:00
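A hedged sketch of the datasets-based metric path run_ner switches to; the `seqeval` metric name is an assumption based on run_ner's standard NER evaluation.
```
# Hedged sketch of computing NER metrics through the datasets library;
# the seqeval metric name is assumed, not quoted from the PR diff.
from datasets import load_metric

metric = load_metric("seqeval")
results = metric.compute(
    predictions=[["O", "B-PER", "I-PER"]],
    references=[["O", "B-PER", "O"]],
)
print(results["overall_f1"])
```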
Sylvain Gugger
5e1bea4f16
Fix Trainer with a parallel model ( #9578 )
...
* Fix Trainer with a parallel model
* More clean up
2021-01-14 03:23:41 -05:00
Patrick von Platen
126fd281bc
Update README.md
2021-01-13 16:55:59 +01:00
Lysandre
e63cad7936
v4.3.0.dev0
2021-01-13 16:16:54 +01:00
Lysandre
33a8497db8
v4.2.0 documentation
2021-01-13 16:15:40 +01:00
Lysandre
7d9a9d0c72
Release: v4.2.0
2021-01-13 16:01:51 +01:00
Lysandre Debut
c949516695
Fix slow tests v4.2.0 ( #9561 )
...
* Fix conversational pipeline test
* LayoutLM
* ProphetNet
* BART
* Blenderbot & small
* Marian
* mBART
* Pegasus
* Tapas tokenizer
* BERT2BERT test
* Style
* Example requirements
* TF BERT2BERT test
2021-01-13 09:55:48 -05:00
Sylvain Gugger
04dc65e5c6
Fix data parallelism in Trainer ( #9566 )
...
* Fix data parallelism in Trainer
* Update src/transformers/training_args.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-13 09:54:41 -05:00
Stas Bekman
b2dfcc567b
use correct deps for torchhub ( #9552 )
2021-01-13 08:02:53 -05:00
Yusuke Mori
eabad8fd9c
Update run_glue for do_predict with local test data ( #9442 ) ( #9486 )
...
* Update run_glue for do_predict with local test data (#9442 )
* Update run_glue (#9442 ): fix comments ('files' to 'a file')
* Update run_glue (#9442 ): reflect the code review
* Update run_glue (#9442 ): auto format
* Update run_glue (#9442 ): reflect the code review
2021-01-13 07:48:35 -05:00
LSinev
0c9f01a8e5
Speed up TopKLogitsWarper and TopPLogitsWarper (pytorch) ( #9557 )
...
* make TopKLogitsWarper faster
* make TopPLogitsWarper faster
2021-01-13 07:47:47 -05:00
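For readers unfamiliar with the warpers being sped up, a standalone sketch of the top-k filtering idea behind `TopKLogitsWarper`; this is not the library's exact code, just the technique.
```
# Standalone sketch of top-k logit filtering; not the library's exact code.
import torch

def top_k_filter(logits: torch.Tensor, k: int) -> torch.Tensor:
    # Keep the k largest logits per row; everything else goes to -inf,
    # so softmax assigns those tokens zero probability.
    kth_largest = torch.topk(logits, k)[0][..., -1, None]
    return logits.masked_fill(logits < kth_largest, float("-inf"))

logits = torch.tensor([[1.0, 3.0, 2.0, 0.5]])
print(top_k_filter(logits, k=2))  # tensor([[-inf, 3., 2., -inf]])
```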
Pavel Tarashkevich
27d0e01d75
Fix classification script: enable dynamic padding with truncation ( #9554 )
...
Co-authored-by: Pavel Tarashkevich <Pavel.Tarashkievich@orange.com>
2021-01-13 07:46:48 -05:00
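A hedged sketch of the dynamic-padding-with-truncation pattern the fix enables; the checkpoint name and max_length are illustrative, not the script's values.
```
# Hedged sketch: truncate at tokenization time, pad per batch at collation time.
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Truncate each example, but do not pad yet ...
encoded = [
    tokenizer(text, truncation=True, max_length=128)
    for text in ["short", "a noticeably longer example sentence"]
]

# ... and let the collator pad each batch to its own longest member.
collator = DataCollatorWithPadding(tokenizer)
batch = collator(encoded)
print(batch["input_ids"].shape)
```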
Lysandre Debut
245cdb469d
Fix barthez tokenizer ( #9562 )
2021-01-13 06:24:10 -05:00
Julien Chaumond
247a7b2029
Doc: Update pretrained_models wording ( #9545 )
...
* Update pretrained_models.rst
To clarify things; see for instance this tweet: https://twitter.com/RTomMcCoy/status/1349094111505211395
* format
2021-01-13 05:58:05 -05:00
Suraj Patil
69ed36063a
fix BlenderbotSmallTokenizer ( #9538 )
...
* add model_input_names
* fix test
2021-01-13 10:53:43 +05:30
Stas Bekman
2df34f4aba
[trainer] deepspeed integration ( #9211 )
...
* deepspeed integration
* style
* add test
* ds wants to do its own backward
* fp16 assert
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* for clarity extract what args are being passed to deepspeed
* introduce the concept of self.wrapped_model
* s/self.wrapped_model/self.model_wrapped/
* complete transition to self.wrapped_model / self.model
* fix
* doc
* give ds its own init
* add custom overrides, handle bs correctly
* fix test
* clean up model_init logic, fix small bug
* complete fix
* collapse --deepspeed_config into --deepspeed
* style
* start adding doc notes
* style
* implement hf2ds optimizer and scheduler configuration remapping
* oops
* call get_num_training_steps absolutely when needed
* workaround broken auto-formatter
* deepspeed_config arg is no longer needed - fixed in deepspeed master
* use hf's fp16 args in config
* clean
* start on the docs
* rebase cleanup
* finish up --fp16
* clarify the supported stages
* big refactor thanks to discovering deepspeed.init_distributed
* cleanup
* revert fp16 part
* add checkpoint-support
* more init ds into integrations
* extend docs
* cleanup
* unfix docs
* clean up old code
* imports
* move docs
* fix logic
* make it clear which file it's referring to
* document nodes/gpus
* style
* wrong format
* style
* deepspeed handles gradient clipping
* easier to read
* major doc rewrite
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs
* switch to AdamW optimizer
* style
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-12 19:05:18 -08:00
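A minimal sketch of a DeepSpeed config for the `--deepspeed` flag this PR settles on; the values are illustrative, not the PR's recommended settings.
```
# Hedged sketch: write an illustrative DeepSpeed config for --deepspeed.
import json

ds_config = {
    "fp16": {"enabled": True},          # mirrors the Trainer's --fp16 handling
    "zero_optimization": {"stage": 2},
    "train_batch_size": 8,
    "gradient_accumulation_steps": 1,
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# A Trainer-based script can then be launched with, e.g.:
#   deepspeed your_script.py --deepspeed ds_config.json --fp16
```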
Sylvain Gugger
5f6721032a
Use the right version of tokenizers ( #9550 )
...
* Use the right version of tokenizers
* Try another way
* Try another way
* Deps are installed from there...
* Deps are installed from there...
* Revert last
* remove needless comment
2021-01-12 18:55:45 -05:00
Sylvain Gugger
063d8d27f4
Refactor prepare_seq2seq_batch ( #9524 )
...
* Add target contextmanager and rework prepare_seq2seq_batch
* Fix tests, treat BART and Barthez
* Add last tokenizers
* Fix test
* Set src token before calling the superclass
* Remove special behavior for T5
* Remove needless imports
* Remove needless asserts
2021-01-12 18:19:38 -05:00
Sylvain Gugger
e6ecef711e
Revert, it was not the issue.
2021-01-12 18:00:22 -05:00
Sylvain Gugger
250f27f207
Fix tokenizers install for now
2021-01-12 17:50:27 -05:00
Lysandre Debut
dfbf0f5598
topk -> top_k ( #9541 )
2021-01-12 16:21:29 -05:00
Lysandre Debut
a1100fac67
LayoutLM Config ( #9539 )
2021-01-12 10:03:50 -05:00
NielsRogge
e45eba3b1c
Improve LayoutLM ( #9476 )
...
* Add LayoutLMForSequenceClassification and integration tests
Improve docs
Add LayoutLM notebook to list of community notebooks
* Make style & quality
* Address comments by @sgugger, @patrickvonplaten and @LysandreJik
* Fix rebase with master
* Reformat in one line
* Improve code examples as requested by @patrickvonplaten
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-12 09:26:32 -05:00
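A hedged sketch of the new `LayoutLMForSequenceClassification` head added above; the checkpoint, words, and bounding boxes are illustrative.
```
# Hedged sketch of LayoutLMForSequenceClassification; inputs are illustrative.
import torch
from transformers import LayoutLMTokenizer, LayoutLMForSequenceClassification

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForSequenceClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=2
)

encoding = tokenizer("hello world", return_tensors="pt")
# One 0-1000-normalized box per token: [CLS], hello, world, [SEP]
bbox = torch.tensor(
    [[[0, 0, 0, 0], [63, 77, 69, 78], [69, 77, 73, 78], [1000, 1000, 1000, 1000]]]
)

outputs = model(
    input_ids=encoding["input_ids"],
    attention_mask=encoding["attention_mask"],
    bbox=bbox,
)
print(outputs.logits.shape)  # torch.Size([1, 2])
```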
Suraj Patil
ccd1923f46
[T5] enable T5 fp16 ( #9487 )
...
* fix t5 fp16
2021-01-12 17:12:33 +05:30
Patrick von Platen
2aa9c2f204
fix blenderbot tok ( #9532 )
2021-01-12 05:53:32 -05:00
Lysandre Debut
406cbf58b2
Shouldn't stale issues/PRs with feature request label ( #9511 )
2021-01-12 04:49:15 -05:00
Simon Brandeis
3b67c5abb0
Update 'Develop on Windows' guidelines ( #9519 )
2021-01-12 04:15:16 -05:00
Patrick von Platen
a051d8928a
[ProphetNet] Fix naming and wrong config ( #9514 )
...
* fix naming issues
* better names
2021-01-12 04:10:05 -05:00
Patrick von Platen
7f28613213
[TFBart] Split TF-Bart ( #9497 )
...
* make templates ready
* make add_new_model_command_ready
* finish tf bart
* prepare tf mbart
* finish tf bart
* add tf mbart
* add marian
* prep pegasus
* add tf pegasus
* push blenderbot tf
* add blenderbot
* add blenderbot small
* clean-up
* make fix copy
* define blend bot tok
* fix
* up
* make style
* add to docs
* add copy statements
* overwrite changes
* improve
* fix docs
* finish
* fix last slow test
* fix missing git conflict line
* fix blenderbot
* up
* fix blenderbot small
* load changes
* finish copied from
* upload fix
2021-01-12 02:06:32 +01:00
Stas Bekman
0ecbb69806
[make docs] parallel build ( #9522 )
...
After experimenting with different numbers of workers (https://github.com/huggingface/transformers/issues/9496#issuecomment-758145868), 4-5 workers seem to be optimal; let's go with 4, since we are unlikely to find a CPU with fewer cores these days.
Fixes part of https://github.com/huggingface/transformers/issues/9496
@sgugger
2021-01-11 13:00:08 -08:00
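For reference, Sphinx supports parallel builds through the `-j N` option of `sphinx-build`; presumably this PR wires `-j 4` into the docs Makefile, though the exact mechanism should be checked against the diff.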