transformers/docs/source/model_doc
Julien Demouth 02ec02d6d3
Add nvidia megatron models (#10911)
* Add support for NVIDIA Megatron models

* Add support for NVIDIA Megatron GPT2 and BERT

Add the megatron_gpt2 model. That model reuses the existing GPT2 model. This
commit includes a script to convert a Megatron-GPT2 checkpoint downloaded
from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details.

Add the megatron_bert model. That model is implemented as a modification of
the existing BERT model in Transformers. This commit includes a script to
convert a Megatron-BERT checkpoint downloaded from NVIDIA GPU Cloud. See
examples/megatron-models/README.md for details.

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove model.half in tests + add "# Copied ..."

Remove the model.half() instruction which makes tests fail on the CPU.

Add a comment "# Copied ..." before many classes in the model to enable automatic
tracking in CI between the new Megatron classes and the original Bert ones.

* Fix issues

* Fix Flax/TF tests

* Fix copyright

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/megatron_bert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/megatron_gpt2.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Resolve most of 'sgugger' comments

* Fix conversion issue + Run make fix-copies/quality/docs

* Apply suggestions from code review

* Causal LM & merge

* Fix init

* Add CausalLM to last auto class

Co-authored-by: Julien Demouth <jdemouth@nvidia.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-08 14:09:11 -04:00
albert.rst Enforce all objects in the main init are documented (#9014) 2020-12-10 11:57:12 -05:00
auto.rst Auto feature extractor (#11097) 2021-04-06 19:20:08 -04:00
bart.rst BartForCausalLM analogs to ProphetNetForCausalLM (#9128) 2021-02-04 11:56:12 +03:00
barthez.rst Fix documentation links always pointing to master. (#9217) 2021-01-05 06:18:48 -05:00
bert.rst Typo fix of the name of BertLMHeadModel in BERT doc (#11133) 2021-04-08 08:22:58 -04:00
bertgeneration.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
bertweet.rst Improve documentation coverage for Bertweet (#9379) 2021-01-04 13:12:59 -05:00
bigbird.rst add blog to docs (#10997) 2021-03-31 18:36:00 +03:00
blenderbot_small.rst BartForCausalLM analogs to ProphetNetForCausalLM (#9128) 2021-02-04 11:56:12 +03:00
blenderbot.rst BartForCausalLM analogs to ProphetNetForCausalLM (#9128) 2021-02-04 11:56:12 +03:00
bort.rst ADD BORT (#9813) 2021-01-27 21:25:11 +03:00
camembert.rst Enforce all objects in the main init are documented (#9014) 2020-12-10 11:57:12 -05:00
convbert.rst Fix doc for TFConverBertModel 2021-02-04 10:14:46 -05:00
ctrl.rst Added TF CTRL Sequence Classification (#9151) 2020-12-17 18:10:57 -05:00
deberta_v2.rst Integrate DeBERTa v2(the 1.5B model surpassed human performance on Su… (#10018) 2021-02-19 18:34:44 -05:00
deberta.rst Integrate DeBERTa v2(the 1.5B model surpassed human performance on Su… (#10018) 2021-02-19 18:34:44 -05:00
dialogpt.rst ADD BORT (#9813) 2021-01-27 21:25:11 +03:00
distilbert.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
dpr.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
electra.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
encoderdecoder.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
flaubert.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
fsmt.rst Deprecate prepare_seq2seq_batch (#10287) 2021-02-22 12:36:16 -05:00
funnel.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
gpt_neo.rst [doc] gpt-neo (#11098) 2021-04-06 16:42:06 -04:00
gpt.rst [doc] update code-block rendering (#11053) 2021-04-05 09:06:07 -04:00
gpt2.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
herbert.rst Improve documentation coverage for Herbert (#9428) 2021-01-06 09:13:43 -05:00
ibert.rst Update ibert.rst (#10445) 2021-02-28 19:03:49 +03:00
layoutlm.rst Layout lm tf 2 (#10636) 2021-03-25 12:32:38 -04:00
led.rst Upgrade styler to better handle lists (#9423) 2021-01-06 07:46:17 -05:00
longformer.rst Add message to documentation that longformer doesn't support token_type_ids (#9152) 2020-12-16 11:06:14 -05:00
lxmert.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
m2m_100.rst fix M2M100 example (#10745) 2021-03-16 20:20:00 +05:30
marian.rst Deprecate prepare_seq2seq_batch (#10287) 2021-02-22 12:36:16 -05:00
mbart.rst Deprecate prepare_seq2seq_batch (#10287) 2021-02-22 12:36:16 -05:00
megatron_bert.rst Add nvidia megatron models (#10911) 2021-04-08 14:09:11 -04:00
megatron_gpt2.rst Add nvidia megatron models (#10911) 2021-04-08 14:09:11 -04:00
mobilebert.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
mpnet.rst MPNet copyright files (#9015) 2020-12-10 09:29:38 -05:00
mt5.rst Enforce all objects in the main init are documented (#9014) 2020-12-10 11:57:12 -05:00
pegasus.rst Fix broken link (#10656) 2021-03-11 14:29:02 -05:00
phobert.rst Improve documentation coverage for Phobert (#9427) 2021-01-06 10:04:32 -05:00
prophetnet.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
rag.rst Add TFRag (#9002) 2021-03-09 00:49:51 +03:00
reformer.rst Enforce all objects in the main init are documented (#9014) 2020-12-10 11:57:12 -05:00
retribert.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
roberta.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
speech_to_text.rst Fix S2T example (#10741) 2021-03-16 08:55:07 -04:00
squeezebert.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
t5.rst Deprecate prepare_seq2seq_batch (#10287) 2021-02-22 12:36:16 -05:00
tapas.rst Fix URLs to TAPAS notebooks (#9435) 2021-01-06 07:20:41 -05:00
transformerxl.rst Fix script that check objects are documented (#9259) 2020-12-22 11:12:58 -05:00
vit.rst Add Vision Transformer and ViTFeatureExtractor (#10950) 2021-04-01 11:16:05 -04:00
wav2vec2.rst Add Fine-Tuning for Wav2Vec2 (#10145) 2021-03-01 12:13:17 +03:00
xlm.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
xlmprophetnet.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
xlmroberta.rst Enforce all objects in the main init are documented (#9014) 2020-12-10 11:57:12 -05:00
xlnet.rst Enforce all objects in the main init are documented (#9014) 2020-12-10 11:57:12 -05:00
xlsr_wav2vec2.rst [XLSR-Wav2Vec2] Add multi-lingual Wav2Vec2 models (#10648) 2021-03-11 17:44:18 +03:00