Sylvain Gugger
9d8e8a8703
Avoid using no_sync on SageMaker DP ( #11229 )
2021-04-13 15:34:00 -04:00
Philipp Schmid
9fa2995993
added cache_dir=model_args.cache_dir to all example with cache_dir arg ( #11220 )
2021-04-13 18:35:18 +02:00
Sylvain Gugger
3312e96bfb
Doc check: a bit of clean up ( #11224 )
2021-04-13 12:14:25 -04:00
Suraj Patil
edca520d0f
Refactor GPT2 ( #11225 )
* refactor GPT2
* fix mlp and head pruning
* address Sylvain's comments
* apply suggestion from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-13 21:15:24 +05:30
Sylvain Gugger
893e51a53f
Document v4.5.1
2021-04-13 11:28:17 -04:00
Sylvain Gugger
81009b7a5c
Replace error by warning when loading an architecture in another ( #11207 )
* Replace error by warning when loading an architecture in another
* Style
* Style again
* Add a test
* Adapt old test
2021-04-13 10:33:52 -04:00
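A minimal sketch of the behavior this entry describes, assuming it means that a mismatch between a checkpoint's `model_type` and the class being instantiated is now logged instead of raised. The helper name `check_model_type`, the logger name and the message text are hypothetical, not the library's API.

```python
import logging

logger = logging.getLogger("hypothetical_config_loader")


def check_model_type(expected_model_type: str, checkpoint_model_type: str) -> bool:
    """Hypothetical helper: warn (instead of raising) on a model_type mismatch.

    Returns True when the types match, False when only a warning was emitted.
    """
    if checkpoint_model_type != expected_model_type:
        # This used to be an error; loading now continues and only a warning is logged.
        logger.warning(
            "Loading a checkpoint of type %s into a model of type %s; "
            "this may not work for all configurations.",
            checkpoint_model_type,
            expected_model_type,
        )
        return False
    return True
```

Callers that want the old strict behavior could still raise when the function returns False.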
Yusuke Mori
22fa0a6004
Add documentation for BertJapanese ( #11219 )
* Start writing BERT-Japanese doc
* Fix typo, Update toctree
* Modify model file to use comment for document, Add examples
* Clean bert_japanese by make style
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Split a big code block into two
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add prefix >>> to all lines in code blocks
* Clean bert_japanese by make fixup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-13 09:49:15 -04:00
Suraj Patil
896d7be974
fix docstrings ( #11221 )
2021-04-13 08:58:08 -04:00
Lysandre Debut
823df93955
Fix GPT-2 warnings ( #11213 )
* Fix GPT-2 warnings
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-04-13 08:53:03 -04:00
Lysandre Debut
0cd89d8c83
Add Matt as the TensorFlow reference ( #11212 )
2021-04-13 08:52:30 -04:00
Ceyda Cinarel
7c205bf40c
wav2vec2 converter: create the proper vocab.json while converting fairseq wav2vec2 finetuned model ( #11041 )
* add vocab while converting wav2vec2 original finetuned model
* check save directory exists
* return_attention_mask fix
* quality
2021-04-13 15:54:33 +05:30
calpt
d49d3cf6d6
Use MSELoss in (M)BartForSequenceClassification ( #11178 )
2021-04-13 15:24:46 +05:30
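In the PyTorch heads this branch is normally written with `torch.nn.MSELoss` and `torch.nn.CrossEntropyLoss`. The plain-Python sketch below only illustrates the convention assumed here: `num_labels == 1` signals a regression target and selects mean squared error, otherwise softmax cross-entropy is used. The function name is hypothetical.

```python
import math
from typing import Sequence


def sequence_classification_loss(
    logits: Sequence[Sequence[float]], labels: Sequence[float], num_labels: int
) -> float:
    """Mean batch loss: MSE when num_labels == 1 (regression), cross-entropy otherwise."""
    if num_labels == 1:
        # Regression: one predicted score per example, compared with a float target.
        errors = [(row[0] - target) ** 2 for row, target in zip(logits, labels)]
        return sum(errors) / len(errors)
    # Classification: numerically stable log-softmax cross-entropy against class ids.
    losses = []
    for row, target in zip(logits, labels):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum_exp - row[int(target)])
    return sum(losses) / len(losses)
```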
Philipp Schmid
f243a5ec0d
Sagemaker test docs update for framework upgrade ( #11206 )
* increased train_runtime for model parallelism
* added documentation for framework upgrade
2021-04-12 19:08:33 -04:00
Lysandre Debut
74d7c24d8d
Import torch.utils.checkpoint in ProphetNet ( #11214 )
2021-04-12 18:56:17 -04:00
cronoik
38a10c6b52
Replaced which with who ( #11183 )
2021-04-12 18:08:28 -04:00
NielsRogge
9f1260971f
Add DeiT (PyTorch) ( #11056 )
* First draft of deit
* More improvements
* Remove DeiTTokenizerFast from init
* Conversion script works
* Add DeiT to ViT conversion script
* Add tests, add head model, add support for deit in vit conversion script
* Update model checkpoint names
* Update image_mean and image_std, set resample to bicubic
* Improve docs
* Docs improvements
* Add DeiTForImageClassificationWithTeacher to init
* Address comments by @sgugger
* Improve feature extractors
* Make fix-copies
* Minor fixes
* Address comments by @patil-suraj
* All models uploaded
* Fix tests
* Remove labels argument from DeiTForImageClassificationWithTeacher
* Fix-copies, style and quality
* Fix tests
* Fix typo
* Multiple docs improvements
* More docs fixes
2021-04-12 18:07:10 -04:00
Takuya Makino
cb251ba619
Fix typo ( #11188 )
2021-04-12 17:35:32 -04:00
fghuman
0c6fcd3034
Added documentation for data collator. ( #10941 )
* Added documentation for data collator.
* Update docs/source/data_collator.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Added documentation for data collator.
* Added documentation for the data collator.
* Merge branch 'doc_DataCollator' of C:\Users\mahii\PycharmProjects\transformers with conflicts.
* Update documentation for the data collator.
* Update documentation for the data collator.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Amna <A.A.Ahmad@student.tudelft.nl>
2021-04-12 11:59:46 -04:00
Masatoshi TSUCHIYA
ef102c4886
model_path should be ignored as the checkpoint path ( #11157 )
* model_path is referred to as the path of the trainer and should be ignored as the checkpoint path.
* Improved according to Sgugger's comment.
2021-04-12 09:06:41 -04:00
Sylvain Gugger
623cd6aef9
Fix style
2021-04-12 08:14:29 -04:00
cronoik
a99f7f5c75
Minor typos fixed ( #11182 )
2021-04-12 07:55:40 -04:00
Sylvain Gugger
26212c14e5
Reactivate Megatron tests and use fewer workers
2021-04-09 18:09:53 -04:00
Lysandre
716120cbd6
Fix Typo
2021-04-09 17:46:52 -04:00
Philipp Schmid
6f90c29eaa
added json dump and extraction of train run time ( #11167 )
* added json dump and extraction of train run time
* make style happy
2021-04-09 15:18:00 -04:00
Stas Bekman
07f0bb691d
[examples run_clm] fix _LazyModule hasher error ( #11168 )
* fix _LazyModule hasher error
* reword
2021-04-09 11:39:12 -07:00
Suraj Patil
c161dd56df
[examples/translation] support mBART-50 and M2M100 fine-tuning ( #11170 )
* keep a list of multilingual tokenizers
* add forced_bos_token argument
2021-04-09 23:58:42 +05:30
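mBART-50 and M2M100 expect the target-language code as the first generated token, which `generate()` can enforce through its `forced_bos_token_id` argument. The sketch below shows the masking idea on a plain list of scores; the function name is hypothetical and it is not the library's logits processor.

```python
import math
from typing import List


def force_bos_token(scores: List[float], cur_len: int, forced_bos_token_id: int) -> List[float]:
    """Return next-token scores that only allow forced_bos_token_id at the first step.

    cur_len counts tokens already in the decoder input; it is 1 when only the
    decoder start token is present, i.e. when the first real token is chosen.
    """
    if cur_len != 1:
        return list(scores)  # later steps are left untouched
    # Every other token gets -inf, so greedy or sampled decoding must pick the forced id.
    return [0.0 if i == forced_bos_token_id else -math.inf for i in range(len(scores))]
```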
Kevin Canwen Xu
fb41f9f50c
Add a special tokenizer for CPM model ( #11068 )
* Add a special tokenizer for CPM model
* make style
* fix
* Add docs
* styles
* cpm doc
* fix ci
* fix the overview
* add test
* make style
* typo
* Custom tokenizer flag
* Add README.md
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-10 02:07:47 +08:00
Sylvain Gugger
45fc8c7951
Make get_special_tokens_mask consider all tokens ( #11163 )
2021-04-09 11:57:44 -04:00
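A minimal sketch of what "consider all tokens" is assumed to mean here: when the input already contains special tokens, every id in the tokenizer's full special-token set (padding, unknown, mask and so on) is flagged, not only the classification and separator tokens. The function is hypothetical and takes the special ids as a plain argument.

```python
from typing import Iterable, List


def special_tokens_mask(token_ids: List[int], all_special_ids: Iterable[int]) -> List[int]:
    """Return 1 for every token id in the special-token set, 0 otherwise."""
    special = set(all_special_ids)  # set lookup keeps this linear in len(token_ids)
    return [1 if token_id in special else 0 for token_id in token_ids]
```

For example, with special ids `[0, 100, 101, 102, 103]`, the sequence `[101, 7592, 0, 102]` yields `[1, 0, 1, 1]`.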
Saviour Owolabi
6060746570
Update README.md ( #11161 )
Corrected a typo ('Downlowd' to 'Download')
2021-04-09 11:52:21 -04:00
Keisuke Hirota
b9b60c1630
Fix LogitsProcessor documentation ( #11130 )
* Change duplicated LogitsProcessor to LogitsWarper in LogitsProcessorList document
* Write more detailed information about LogitsProcessor's scores argument
* apply suggestion from review
* style
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-04-09 12:39:44 +05:30
Niklas Muennighoff
8b78a32be1
[Community notebooks] Add Wav2Vec notebook for creating captions for YT Clips ( #11142 )
* Add Wav2Vec Inference notebook
* Update docs/source/community.md
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-04-09 12:10:37 +05:30
Stas Bekman
0311ba2153
typo ( #11152 )
* typo
* style
2021-04-08 19:47:31 -07:00
Sylvain Gugger
269c9638df
Merge branch 'master' of github.com:huggingface/transformers
2021-04-08 21:14:56 -04:00
Sylvain Gugger
d31c7b104e
Skip Megatron tests for now
2021-04-08 21:14:43 -04:00
Stas Bekman
c2e0fd5283
[setup] make fairscale and deepspeed setup extras ( #11151 )
* make fairscale and deepspeed setup extras
* fix default
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* no reason not to ask for the good version
* update the CIs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-08 15:46:54 -07:00
Sylvain Gugger
ba8b1f4754
Add support for multiple models for one config in auto classes ( #11150 )
* Add support for multiple models for one config in auto classes
* Use get_values everywhere
* Prettier doc
2021-04-08 18:41:36 -04:00
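Once one config class can map to several model classes, code that lists all models in an auto mapping has to flatten tuple values; the entry's "Use get_values everywhere" refers to a helper for this. The body below is a sketch of that idea, assuming values are either a single class or a tuple/list of classes; the toy mapping uses strings as stand-ins for classes.

```python
from typing import Any, Dict, List, Tuple, Union


def get_values(model_mapping: Dict[Any, Union[Any, Tuple[Any, ...]]]) -> List[Any]:
    """Flatten a config-to-model mapping whose values may be one class or several."""
    result: List[Any] = []
    for model in model_mapping.values():
        if isinstance(model, (list, tuple)):
            result.extend(model)  # one config, several candidate models
        else:
            result.append(model)
    return result
```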
Stas Bekman
97ccf67bb3
[setup] extras[docs] must include 'all' ( #11148 )
* extras[doc] must include 'all'
* fix
* better
* regroup
2021-04-08 18:10:44 -04:00
Stas Bekman
66446909b2
[tests] relocate core integration tests ( #11146 )
* relocate core integration tests
* add sys.path context manager
* cleanup
* try
* try2
* fix path
* doc
* style
* add dep
* add 2 more deps
2021-04-08 13:13:17 -07:00
Andrea Cappelli
6c40e49712
Run mlm pad to multiple for fp16 ( #11128 )
* Add mlm collator pad to multiple option (#10627 )
* Use padding to 8x in run mlm (#10627 )
2021-04-08 16:12:49 -04:00
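Padding batches to a multiple of 8 keeps sequence dimensions aligned with what fp16 Tensor Core kernels handle most efficiently, which is the motivation for the MLM collator's pad-to-multiple option. The helper below is a hypothetical plain-Python sketch of the rounding rule, not the collator itself.

```python
from typing import List


def pad_to_multiple(batch: List[List[int]], pad_token_id: int, multiple: int = 8) -> List[List[int]]:
    """Right-pad every sequence to the longest length, rounded up to a multiple of `multiple`."""
    longest = max(len(seq) for seq in batch)
    # Ceiling division, then scale back up: 9 -> 16, 8 -> 8, 3 -> 8.
    target = ((longest + multiple - 1) // multiple) * multiple
    return [seq + [pad_token_id] * (target - len(seq)) for seq in batch]
```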
Sylvain Gugger
dfed4ec263
Don't duplicate logs in TensorBoard and handle --use_env ( #11141 )
2021-04-08 16:12:36 -04:00
Philipp Schmid
9c9b8e707b
Updates SageMaker docs for updating DLCs ( #11140 )
2021-04-08 16:05:53 -04:00
Lysandre Debut
ba2cf5f90d
Add fairscale and deepspeed back to the CI ( #11147 )
* Add fairscale and deepspeed back to the CI
* Add deepspeed to single GPU tests
2021-04-08 11:36:45 -07:00
Stas Bekman
1ed24afe91
[trainer] solve "scheduler before optimizer step" warning ( #11144 )
* solve "scheduler before optimizer step" warning
* style
* correct the state evaluation test
2021-04-08 11:28:48 -07:00
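With dynamic loss scaling, an overflowing fp16 step skips `optimizer.step()` while the learning-rate scheduler may still be stepped, which is one way PyTorch's "scheduler before optimizer step" warning can appear. A sketch of one possible fix, assuming the skip can be detected by the loss scale shrinking: the scheduler is stepped only when the optimizer actually ran. `ToyGradScaler` and `ToyScheduler` are hypothetical stand-ins, not `torch.cuda.amp.GradScaler` or a real scheduler.

```python
from dataclasses import dataclass


@dataclass
class ToyGradScaler:
    """Hypothetical stand-in for a dynamic fp16 loss scaler."""
    scale: float = 65536.0
    backoff_factor: float = 0.5

    def step_and_update(self, found_inf: bool) -> None:
        # On overflow the optimizer step is skipped and the scale shrinks;
        # otherwise the step runs and this toy leaves the scale unchanged.
        if found_inf:
            self.scale *= self.backoff_factor


@dataclass
class ToyScheduler:
    steps: int = 0

    def step(self) -> None:
        self.steps += 1


def training_step(scaler: ToyGradScaler, scheduler: ToyScheduler, found_inf: bool) -> bool:
    scale_before = scaler.scale
    scaler.step_and_update(found_inf)
    # A smaller scale after the update means the optimizer step was skipped.
    optimizer_was_run = scale_before <= scaler.scale
    if optimizer_was_run:
        scheduler.step()
    return optimizer_was_run
```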
Julien Demouth
02ec02d6d3
Add nvidia megatron models ( #10911 )
* Add support for NVIDIA Megatron models
* Add support for NVIDIA Megatron GPT2 and BERT
Add the megatron_gpt2 model. That model reuses the existing GPT2 model. This
commit includes a script to convert a Megatron-GPT2 checkpoint downloaded
from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details.
Add the megatron_bert model. That model is implemented as a modification of
the existing BERT model in Transformers. This commit includes a script to
convert a Megatron-BERT checkpoint downloaded from NVIDIA GPU Cloud. See
examples/megatron-models/README.md for details.
* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Remove model.half in tests + add "# Copied ..."
Remove the model.half() instruction which makes tests fail on the CPU.
Add a comment "# Copied ..." before many classes in the model to enable automatic
tracking in CI between the new Megatron classes and the original Bert ones.
* Fix issues
* Fix Flax/TF tests
* Fix copyright
* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update docs/source/model_doc/megatron_bert.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/model_doc/megatron_gpt2.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Resolve most of sgugger's comments
* Fix conversion issue + Run make fix-copies/quality/docs
* Apply suggestions from code review
* Causal LM & merge
* Fix init
* Add CausalLM to last auto class
Co-authored-by: Julien Demouth <jdemouth@nvidia.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-08 14:09:11 -04:00
Stas Bekman
c6d664849b
[DeepSpeed] ZeRO Stage 3 ( #10753 )
* synced gpus
* fix
* fix
* need to use t5-small for quality tests
* notes
* complete merge
* fix a disappearing std stream problem
* start zero3 tests
* wip
* tune params
* sorting out the pre-trained model loading
* reworking generate loop wip
* wip
* style
* fix tests
* split the tests
* refactor tests
* wip
* parameterized
* fix
* workout the resume from non-ds checkpoint pass + test
* cleanup
* remove no longer needed code
* split getter/setter functions
* complete the docs
* suggestions
* gpus and their compute capabilities link
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* style
* remove invalid paramgd
* automatically configure zero3 params that rely on hidden size
* make _get_resized_embeddings zero3-aware
* add test exercising resize_token_embeddings()
* add docstring
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-08 09:53:01 -07:00
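The entry mentions automatically configuring ZeRO-3 parameters that depend on the model's hidden size. A sketch, assuming the formulas used by later transformers DeepSpeed integrations (`reduce_bucket_size = hidden_size**2`, `stage3_prefetch_bucket_size = 0.9 * hidden_size**2`, `stage3_param_persistence_threshold = 10 * hidden_size`). The helper name is hypothetical, and keeping any value the user already set is a design choice of this sketch.

```python
from typing import Any, Dict


def fill_zero3_size_params(ds_config: Dict[str, Any], hidden_size: int) -> Dict[str, Any]:
    """Hypothetical helper: derive size-dependent ZeRO-3 settings from hidden_size.

    Keys the user already set in ds_config["zero_optimization"] are left untouched.
    """
    zero = ds_config.setdefault("zero_optimization", {})
    derived = {
        "reduce_bucket_size": hidden_size * hidden_size,
        # Rounded down to an integer element count.
        "stage3_prefetch_bucket_size": int(0.9 * hidden_size * hidden_size),
        "stage3_param_persistence_threshold": 10 * hidden_size,
    }
    for key, value in derived.items():
        zero.setdefault(key, value)
    return ds_config
```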
Stas Bekman
acc851e1ff
[run_clm] clarify why we get the tokenizer warning on long input ( #11145 )
* clarify why we get the warning here
* Update examples/language-modeling/run_clm.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* wording
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-08 09:46:28 -07:00
Yusuke Mori
5bf5d50c8d
Typo fix of the name of BertLMHeadModel in BERT doc ( #11133 )
2021-04-08 08:22:58 -04:00
Jannis Born
f8e90d6fb9
Fix typing error in Trainer class (prediction_step) ( #11138 )
* fix: docstrings in prediction_step
* ci: Satisfy line length requirements
* ci: character length requirements
2021-04-08 08:22:25 -04:00
Sylvain Gugger
ffe0761777
Fix and refactor check_repo ( #11127 )
2021-04-07 17:56:21 -04:00
Philipp Schmid
3fd7eee18f
Adds use_auth_token with pipelines ( #11123 )
* added model_kwargs to infer_framework_from_model
* added model_kwargs to tokenizer
* added use_auth_token as named parameter
* added dynamic get for use_auth_token
2021-04-07 20:32:59 +02:00