transformers/utils
Julien Demouth 02ec02d6d3
Add nvidia megatron models (#10911)
* Add support for NVIDIA Megatron models

* Add support for NVIDIA Megatron GPT2 and BERT

Add the megatron_gpt2 model. That model reuses the existing GPT2 model. This
commit includes a script to convert a Megatron-GPT2 checkpoint downloaded
from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details.

Add the megatron_bert model. That model is implemented as a modification of
the existing BERT model in Transformers. This commit includes a script to
convert a Megatron-BERT checkpoint downloaded from NVIDIA GPU Cloud. See
examples/megatron-models/README.md for details.

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove model.half in tests + add "# Copied ..."

Remove the model.half() instruction which makes tests fail on the CPU.

Add a comment "# Copied ..." before many classes in the model to enable automatic
tracking in CI between the new Megatron classes and the original Bert ones.

* Fix issues

* Fix Flax/TF tests

* Fix copyright

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/megatron_bert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/megatron_gpt2.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Resolve most of 'sgugger' comments

* Fix conversion issue + Run make fix-copies/quality/docs

* Apply suggestions from code review

* Causal LM & merge

* Fix init

* Add CausalLM to last auto class

Co-authored-by: Julien Demouth <jdemouth@nvidia.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-08 14:09:11 -04:00
..
tf_ops Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
check_copies.py Check copies blackify (#10775) 2021-03-17 18:11:20 -04:00
check_dummies.py Dummies multi backend (#11100) 2021-04-07 09:56:40 -04:00
check_inits.py Dummies multi backend (#11100) 2021-04-07 09:56:40 -04:00
check_repo.py Add nvidia megatron models (#10911) 2021-04-08 14:09:11 -04:00
check_table.py Check table as independent script (#8976) 2020-12-07 19:55:12 -05:00
check_tf_ops.py Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
class_mapping_update.py remap MODEL_FOR_QUESTION_ANSWERING_MAPPING classes to names auto-generated file (#10487) 2021-03-03 08:54:00 -08:00
custom_init_isort.py Sort init import (#10801) 2021-03-19 16:17:13 -04:00
download_glue_data.py Enforce string-formatting with f-strings (#10980) 2021-03-31 10:00:27 -04:00
get_modified_files.py a more reliable version of branching point discovery (#9449) 2021-01-07 04:47:50 -05:00
link_tester.py Enforce string-formatting with f-strings (#10980) 2021-03-31 10:00:27 -04:00
notification_service.py Tests run on Docker (#10681) 2021-03-15 17:28:01 -04:00
release.py Release utils (#10735) 2021-03-16 08:41:47 -04:00
style_doc.py Fix 9918 (#9932) 2021-02-02 05:22:20 -05:00