Commit Graph

6940 Commits

Author SHA1 Message Date
Julien Demouth
02ec02d6d3
Add nvidia megatron models (#10911)
* Add support for NVIDIA Megatron models

* Add support for NVIDIA Megatron GPT2 and BERT

Add the megatron_gpt2 model. That model reuses the existing GPT2 model. This
commit includes a script to convert a Megatron-GPT2 checkpoint downloaded
from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details.

Add the megatron_bert model. That model is implemented as a modification of
the existing BERT model in Transformers. This commit includes a script to
convert a Megatron-BERT checkpoint downloaded from NVIDIA GPU Cloud. See
examples/megatron-models/README.md for details.

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove model.half in tests + add "# Copied ..."

Remove the model.half() instruction which makes tests fail on the CPU.

Add a comment "# Copied ..." before many classes in the model to enable automatic
tracking in CI between the new Megatron classes and the original Bert ones.

* Fix issues

* Fix Flax/TF tests

* Fix copyright

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/megatron_bert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/megatron_gpt2.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Resolve most of 'sgugger' comments

* Fix conversion issue + Run make fix-copies/quality/docs

* Apply suggestions from code review

* Causal LM & merge

* Fix init

* Add CausalLM to last auto class

Co-authored-by: Julien Demouth <jdemouth@nvidia.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-08 14:09:11 -04:00
Stas Bekman
c6d664849b
[DeepSpeed] ZeRO Stage 3 (#10753)
* synced gpus

* fix

* fix

* need to use t5-small for quality tests

* notes

* complete merge

* fix a disappearing std stream problem

* start zero3 tests

* wip

* tune params

* sorting out the pre-trained model loading

* reworking generate loop wip

* wip

* style

* fix tests

* split the tests

* refactor tests

* wip

* parameterized

* fix

* workout the resume from non-ds checkpoint pass + test

* cleanup

* remove no longer needed code

* split getter/setter functions

* complete the docs

* suggestions

* gpus and their compute capabilities link

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* style

* remove invalid paramgd

* automatically configure zero3 params that rely on hidden size

* make _get_resized_embeddings zero3-aware

* add test exercising resize_token_embeddings()

* add docstring

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-08 09:53:01 -07:00
Stas Bekman
acc851e1ff
[run_clm] clarify why we get the tokenizer warning on long input (#11145)
* clarify why we get the warning here

* Update examples/language-modeling/run_clm.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* wording

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-08 09:46:28 -07:00
Yusuke Mori
5bf5d50c8d
Typo fix of the name of BertLMHeadModel in BERT doc (#11133) 2021-04-08 08:22:58 -04:00
Jannis Born
f8e90d6fb9
Fix typing error in Trainer class (prediction_step) (#11138)
* fix: docstrings in prediction_step

* ci: Satisfy line length requirements

* ci: character length requirements
2021-04-08 08:22:25 -04:00
Sylvain Gugger
ffe0761777
Fix and refactor check_repo (#11127) 2021-04-07 17:56:21 -04:00
Philipp Schmid
3fd7eee18f
Adds use_auth_token with pipelines (#11123)
* added model_kwargs to infer_framework_from_model

* added model_kwargs to tokenizer

* added use_auth_token as named parameter

* added dynamic get for use_auth_token
2021-04-07 20:32:59 +02:00
Stas Bekman
1c15128312
[versions] handle version requirement ranges (#11110)
* handle version requirement ranges

* add mixed requirement test

* cleanup
2021-04-07 09:09:38 -07:00
Vasudev Gupta
7442801df5
fix tests (#11109) 2021-04-07 10:07:26 -04:00
Lysandre Debut
c0d97cee13
Adds a note to resize the token embedding matrix when adding special … (#11120)
* Adds a note to resize the token embedding matrix when adding special tokens

* Remove superfluous space
2021-04-07 10:06:45 -04:00
Sylvain Gugger
02f7c2fe66
Some styling of the training table in Notebooks (#11118) 2021-04-07 10:00:33 -04:00
Sylvain Gugger
11505fa139
Dummies multi backend (#11100)
* Replaces requires_xxx by one generic method

* Quality and update check_dummies

* Fix inits check

* Post-merge cleanup
2021-04-07 09:56:40 -04:00
Stas Bekman
424419f549
[examples] fix white space (#11099)
these get concatenated without whitespace, so fix it
2021-04-07 09:20:58 -04:00
Stas Bekman
c9035e4537
fix: The 'warn' method is deprecated (#11105)
* The 'warn' method is deprecated

* fix test
2021-04-07 09:20:06 -04:00
Leo Gao
247bed3857
GPTNeo: handle padded wte (#11079)
* GPTNeo: handle padded wte

* Switch to config.vocab_size

* apply review suggestion

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-04-07 17:35:20 +05:30
cronoik
083ad7d46c
dead link fixed (#11103) 2021-04-07 07:50:47 -04:00
Sylvain Gugger
fd338abdeb Style 2021-04-06 19:54:13 -04:00
SHYAM SUNDER KUMAR
aef4cf8c52
accelerate question answering examples with no trainer (#11091)
* accelerate question answering examples with no trainer

* removed train and eval flags also fixed fill np array function

* Update examples/question-answering/run_qa_beam_search_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/question-answering/run_qa_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-06 19:35:21 -04:00
Sylvain Gugger
403d530eec
Auto feature extractor (#11097)
* AutoFeatureExtractor

* Init and first tests

* Tests

* Damn you gitignore

* Quality

* Defensive test for when not all backends are here

* Use pattern for Speech2Text models
2021-04-06 19:20:08 -04:00
Stas Bekman
520198f56f
[doc] gpt-neo (#11098)
make the example work
2021-04-06 16:42:06 -04:00
Lysandre
9853c5dd58 Development on v4.6.0dev0 2021-04-06 12:53:25 -04:00
Lysandre
4906a29f7f Release v4.5.0 2021-04-06 12:37:47 -04:00
Suraj Patil
2a8115f083
[WIP] GPT Neo cleanup (#10985)
* better names

* add attention mixin

* all slow tests in one class

* make helper methods static so we can test

* add local attention tests

* better names

* doc

* apply review suggestions
2021-04-06 12:24:15 -04:00
Philipp Schmid
76800fb8e6
added new merged Trainer test (#11090) 2021-04-06 15:12:21 +02:00
Philipp Schmid
b219d6b5a5
added social thumbnail for docs (#11083) 2021-04-06 14:56:18 +02:00
Sylvain Gugger
6c1bee7d89 Link to new blog 2021-04-06 08:55:40 -04:00
Stas Bekman
f7328de46d
HF emoji unicode doesn't work in console (#11081)
It doesn't look like using 🤗 is a great idea for printing to console. See attachment.

This PR proposes to replace 🤗 with "HuggingFace" for an exception message.

@LysandreJik
2021-04-06 08:03:00 -04:00
Hemil Desai
6ab7d1a429
Add Readme for language modeling scripts with accelerate (#11073) 2021-04-05 20:56:12 -04:00
Sylvain Gugger
2199608ca6
Make a base init in FeatureExtractionMixin (#11074) 2021-04-05 18:02:28 -04:00
Sylvain Gugger
04ceee7d24
Fix distributed gather for tuples of tensors of varying sizes (#11071) 2021-04-05 16:21:49 -04:00
Sylvain Gugger
f05a8a0c5e
Document common config attributes (#11070) 2021-04-05 15:29:01 -04:00
Sylvain Gugger
090e3e6896
Add center_crop to ImageFeatureExtractoMixin (#11066) 2021-04-05 15:28:51 -04:00
konstin
abb7430003
Replace pkg_resources with importlib_metadata (#11061)
* Replace pkg_resources with importlib_metadata

Fixes #10964. The other reason for this change is that pkg_resources has been [deprecated](8fe85c22ce) in favor of importlib_metadata.

* Reduce to a single importlib_metadata import switch

* Trigger CI

Co-authored-by: Stas Bekman <stas@stason.org>
2021-04-05 12:12:19 -07:00
Hemil Desai
b51b87c41d
Add examples/language_modeling/run_clm_no_trainer.py (#11026)
* Initial draft for clm no trainer

* Remove unwanted args

* Fix bug

* Update examples/language-modeling/run_clm_no_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-05 12:27:52 -04:00
Amala Deshmukh
e1c02e018c
Add example for registering callbacks with trainers (#10928)
* Add example for callback registry

Resolves: #9036

* Update callback registry documentation

* Added comments for other ways to register callback
2021-04-05 12:27:23 -04:00
Lysandre Debut
9f4e0c23d6
Documentation about loading a fast tokenizer within Transformers (#11029)
* Documentation about loading a fast tokenizer within Transformers

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-05 10:51:16 -04:00
Sylvain Gugger
6c25f5228e
Refactor AutoModel classes and add Flax Auto classes (#11027)
* Refactor AutoModel classes and add Flax Auto classes

* Add new objects to the init

* Fix hubconf and sort models

* Fix TF tests

* Missing coma

* Update src/transformers/models/auto/auto_factory.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Fix init

* Fix dummies

* Other init to fix

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-05 10:11:28 -04:00
Lysandre Debut
eb3479e7cf
Some models have no tokenizers (#11064) 2021-04-05 09:37:49 -04:00
Lysandre Debut
773e4c7263
Remove unnecessary space (#11060) 2021-04-05 09:36:20 -04:00
Lysandre Debut
ef62f038fd
Pin docutils (#11062)
* Pin docutils

* Versions table
2021-04-05 09:35:21 -04:00
Eren Şahin
6e31014110
[doc] update code-block rendering (#11053)
double : prevents code-block section to be rendered, so made it single :
2021-04-05 09:06:07 -04:00
Stas Bekman
3d39226a51
s|Pretrained|PreTrained| (#11048) 2021-04-04 18:08:42 -07:00
Sylvain Gugger
b0d49fd536
Add a script to check inits are consistent (#11024) 2021-04-04 20:41:34 -04:00
versis
335c0ca35c
fixed typo: logging instead of logger (#11025) 2021-04-02 09:22:22 -04:00
Philipp Schmid
34e1bec649
added new notebook and merge of trainer (#11015)
* added new notebook and merge of trainer

* Update docs/source/sagemaker.md

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-01 23:13:47 +02:00
Julien Chaumond
e8da77d181
[doc] no more bucket 2021-04-01 14:25:47 -04:00
Joe Davison
f4ad3d8cea
minor typo fix
*negative* log-likelihood
2021-04-01 11:58:37 -06:00
cronoik
57c1749efa
DebertaTokenizer Rework closes #10258 (#10703)
* closes #10258

* typo

* reworked deberta test

* implemented the comments from BigBird01 regarding sequence pair encoding of deberta

* Update style

* VOCAB_FILES_NAMES is now a oneliner as suggested by @sgugger

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added #fmt: on as requested by @sgugger

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-01 13:53:53 -04:00
NielsRogge
30677dc743
Add Vision Transformer and ViTFeatureExtractor (#10950)
* Squash all commits into one

* Update ViTFeatureExtractor to use image_utils instead of torchvision

* Remove torchvision and add Pillow

* Small docs improvement

* Address most comments by @sgugger

* Fix tests

* Clean up conversion script

* Pooler first draft

* Fix quality

* Improve conversion script

* Make style and quality

* Make fix-copies

* Minor docs improvements

* Should use fix-copies instead of manual handling

* Revert "Should use fix-copies instead of manual handling"

This reverts commit fd4e591bce.

* Place ViT in alphabetical order

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-01 11:16:05 -04:00
cchen-dialpad
af6732225c
Improve the speed of adding tokens from added_tokens.json (#10780)
* use bisect to add one token to unique_no_split_tokens

* fix style
2021-04-01 08:56:12 -04:00