transformers/docs/source/model_doc
Stella Biderman c02cd95c56
GPT-J-6B (#13022)
* Test GPTJ implementation

* Fixed conflicts

* Update __init__.py

* Update __init__.py

* change GPT_J to GPTJ

* fix missing imports and typos

* use einops for now
(need to change to torch ops later)

* Use torch ops instead of einsum

* remove einops deps
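The einops-to-torch conversion described in the bullets above can be illustrated with a minimal sketch. The function name `split_heads` and the shapes are assumptions for illustration, not the model's actual code; the pattern shown (`rearrange(x, "b s (h d) -> b h s d")` expressed as `view` + `permute`) is the generic technique that lets the einops dependency be dropped:

```python
import torch

def split_heads(x: torch.Tensor, num_heads: int) -> torch.Tensor:
    # Equivalent of einops.rearrange(x, "b s (h d) -> b h s d", h=num_heads)
    # written with plain torch ops, so einops is no longer required.
    b, s, hidden = x.shape
    head_dim = hidden // num_heads
    return x.view(b, s, num_heads, head_dim).permute(0, 2, 1, 3)

x = torch.randn(2, 5, 12)          # (batch, seq, hidden)
out = split_heads(x, num_heads=4)  # (batch, heads, seq, head_dim)
```

The `view` splits the hidden dimension into (heads, head_dim) pairs, and the `permute` moves the head axis next to the batch axis, which is what the einops pattern does in one call.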

* Update configuration_auto.py

* Added GPT J

* Update gptj.rst

* Update __init__.py

* Update test_modeling_gptj.py

* Added GPT J

* Changed configs to match GPT2 instead of GPT Neo

* Removed non-existent sequence model

* Update configuration_auto.py

* Update configuration_auto.py

* Update configuration_auto.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Progress on updating configs to agree with GPT2

* Update modeling_gptj.py

* num_layers -> n_layer

* layer_norm_eps -> layer_norm_epsilon

* attention_layers -> num_hidden_layers

* Update modeling_gptj.py

* attention_pdrop -> attn_pdrop

* hidden_act -> activation_function

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* fix layernorm and lm_head size
delete attn_type

* Update docs/source/model_doc/gptj.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* removed claim that GPT J uses local attention

* Removed GPTJForSequenceClassification

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Removed unsupported boilerplate

* Update tests/test_modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update tests/test_modeling_gptj.py

Co-authored-by: Eric Hallahan <eric@hallahans.name>

* Update tests/test_modeling_gptj.py

Co-authored-by: Eric Hallahan <eric@hallahans.name>

* Update tests/test_modeling_gptj.py

Co-authored-by: Eric Hallahan <eric@hallahans.name>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update __init__.py

* Update configuration_gptj.py

* Update modeling_gptj.py

* Corrected indentation

* Remove stray backslash

* Delete .DS_Store

* Delete .DS_Store

* Delete .DS_Store

* Delete .DS_Store

* Delete .DS_Store

* Update docs to match

* Remove tf loading

* Remove config.jax

* Remove stray `else:` statement

* Remove references to `load_tf_weights_in_gptj`

* Adapt tests to match output from GPT-J 6B

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Default `activation_function` to `gelu_new`

- Specify the approximate formulation of GELU to ensure parity with the default setting of `jax.nn.gelu()`
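The parity argument above rests on `gelu_new` being the tanh approximation of GELU, which is also what `jax.nn.gelu()` computes by default (`approximate=True`). A small sketch comparing the exact, erf-based GELU to the tanh formulation (the constants are the standard published approximation):

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_new(x: float) -> float:
    # Tanh approximation, matching jax.nn.gelu's default approximate=True.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

The two formulations agree to roughly 1e-3 over typical activation ranges, but a checkpoint trained with one will not bit-match the other, which is why the default matters.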

* Fix part of the config documentation

* Revert "Update configuration_auto.py"

This reverts commit e9860e9c04.

* Revert "Update configuration_auto.py"

This reverts commit cfaaae4c4d.

* Revert "Update configuration_auto.py"

This reverts commit 687788954f.

* Revert "Update configuration_auto.py"

This reverts commit 194d024ea8.

* Hyphenate GPT-J

* Undid sorting of the models alphabetically

* Reverting previous commit

* fix style and quality issues

* Update docs/source/model_doc/gptj.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Replaced GPTJ-specific code with generic code

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Made the code always use rotary positional encodings
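Rotary positional encodings, which the commit above makes unconditional, can be sketched as follows. This is a generic illustration of the technique, not the model's actual implementation: the function names, the `base=10000.0` frequency convention, and the interleaved (even, odd) pairing are assumptions here:

```python
import torch

def fixed_pos_embedding(seq_len: int, dim: int, base: float = 10000.0):
    # Sinusoidal rotation angles per position; one frequency per dim pair.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq, dim/2)
    return torch.sin(freqs), torch.cos(freqs)

def rotate_every_two(x: torch.Tensor) -> torch.Tensor:
    # (x0, x1, x2, x3, ...) -> (-x1, x0, -x3, x2, ...)
    x1, x2 = x[..., ::2], x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)

def apply_rotary(x: torch.Tensor, sin: torch.Tensor, cos: torch.Tensor) -> torch.Tensor:
    # Expand angles so each (even, odd) pair shares one angle, then rotate.
    sin = torch.repeat_interleave(sin, 2, dim=-1)
    cos = torch.repeat_interleave(cos, 2, dim=-1)
    return x * cos + rotate_every_two(x) * sin

q = torch.randn(1, 6, 16)
sin, cos = fixed_pos_embedding(seq_len=6, dim=16)
q_rot = apply_rotary(q, sin, cos)  # same shape; position encoded as a rotation
```

Because each pair is rotated by an angle proportional to its position, relative offsets fall out of the query-key dot product without any learned position table.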

* Update index.rst

* Fix documentation

* Combine attention classes

- Condense all attention operations into `GPTJAttention`
- Replicate GPT-2 and improve code clarity by renaming `GPTJAttention.attn_pdrop` and `GPTJAttention.resid_pdrop` to `GPTJAttention.attn_dropout` and `GPTJAttention.resid_dropout`

* Removed `config.rotary_dim` from tests

* Update test_modeling_gptj.py

* Update test_modeling_gptj.py

* Fix formatting

* Removed deprecated argument `layer_id` to `GPTJAttention`


* Update modeling_gptj.py

* Update modeling_gptj.py

* Fix code quality

* Restore model functionality

* Save `lm_head.weight` in checkpoints

* Fix crashes when loading with reduced precision

* refactor `self._attn(...)` and rename layer weights

* make sure logits are in fp32 for sampling
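The fp32-logits fix above follows a general pattern: when a model runs in half precision, softmax over fp16 logits can lose probability mass to rounding, so the logits are upcast before sampling. A minimal sketch of that pattern (the helper name is hypothetical):

```python
import torch

def sample_next_token(logits: torch.Tensor) -> int:
    # Upcast to fp32 before softmax so half-precision logits do not
    # underflow or lose mass to rounding during sampling.
    probs = torch.softmax(logits.float(), dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

half_logits = torch.tensor([2.0, 1.0, 0.1], dtype=torch.float16)
token = sample_next_token(half_logits)
```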

* improve docs

* Add `GPTJForCausalLM` to `TextGenerationPipeline` whitelist

* Added GPT-J to the README

* Fix doc/readme consistency

* Add rough parallelization support

- Remove unused imports and variables
- Clean up docstrings
- Port experimental parallelization code from GPT-2 into GPT-J
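The ported GPT-2-style experimental parallelization works by assigning contiguous runs of transformer blocks to devices via a `device_map` of the form `{device_id: [layer indices]}`. A hedged sketch of building such a map; the helper name is hypothetical, but the map shape mirrors the experimental `parallelize(device_map)` API:

```python
def make_device_map(n_layer: int, n_devices: int) -> dict:
    # Split blocks as evenly as possible; earlier devices take the remainder.
    per_device, extra = divmod(n_layer, n_devices)
    device_map, start = {}, 0
    for d in range(n_devices):
        count = per_device + (1 if d < extra else 0)
        device_map[d] = list(range(start, start + count))
        start += count
    return device_map

# GPT-J 6B has 28 blocks; splitting across 4 GPUs gives 7 blocks each.
device_map = make_device_map(28, 4)
```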

* Clean up loose ends

* Fix index.rst

Co-authored-by: kurumuz <kurumuz1@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Eric Hallahan <eric@hallahans.name>
Co-authored-by: Leo Gao <54557097+leogao2@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-08-31 17:53:02 +02:00
albert.rst albert flax (#13294) 2021-08-30 17:29:27 +02:00
auto.rst add FlaxAutoModelForImageClassification in main init (#12298) 2021-06-22 18:26:05 +05:30
bart.rst FlaxBart (#11537) 2021-06-14 15:16:08 +05:30
barthez.rst Examples reorg (#11350) 2021-04-21 11:11:20 -04:00
beit.rst Add BEiT (#12994) 2021-08-04 18:29:23 +02:00
bert_japanese.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
bert.rst [Flax] Correct flax docs (#12782) 2021-08-04 16:31:23 +02:00
bertgeneration.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
bertweet.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
bigbird_pegasus.rst Add BigBirdPegasus (#10991) 2021-05-07 09:27:43 +02:00
bigbird.rst Flax Big Bird (#11967) 2021-06-14 20:01:03 +01:00
blenderbot_small.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
blenderbot.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
bort.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
byt5.rst ByT5 model (#11971) 2021-06-01 19:07:37 +01:00
camembert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
canine.rst Wrong model is used in example, should be character instead of subword model (#12676) 2021-07-13 08:40:27 -04:00
clip.rst add and fix examples (#12810) 2021-07-20 09:28:50 -04:00
convbert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
cpm.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
ctrl.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
deberta_v2.rst Deberta_v2 tf (#13120) 2021-08-31 06:32:47 -04:00
deberta.rst Deberta tf (#12972) 2021-08-12 05:01:26 -04:00
deit.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
detr.rst Improve detr (#12147) 2021-06-17 10:37:54 -04:00
dialogpt.rst ADD BORT (#9813) 2021-01-27 21:25:11 +03:00
distilbert.rst distilbert-flax (#13324) 2021-08-30 14:16:18 +02:00
dpr.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
electra.rst [Flax] Add Electra models (#11426) 2021-05-04 20:56:09 +02:00
encoderdecoder.rst Make Flax GPT2 working with cross attention (#13008) 2021-08-23 17:57:29 +02:00
flaubert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
fsmt.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
funnel.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
gpt_neo.rst FlaxGPTNeo (#12493) 2021-07-06 18:55:18 +05:30
gpt.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
gpt2.rst Add GPT2ForTokenClassification (#13290) 2021-08-31 12:19:04 +02:00
gptj.rst GPT-J-6B (#13022) 2021-08-31 17:53:02 +02:00
herbert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
hubert.rst Add Wav2Vec2 & Hubert ForSequenceClassification (#13153) 2021-08-27 20:52:51 +03:00
ibert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
layoutlm.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
layoutlmv2.rst Add LayoutLMv2 + LayoutXLM (#12604) 2021-08-30 12:35:42 +02:00
layoutxlm.rst Add LayoutLMv2 + LayoutXLM (#12604) 2021-08-30 12:35:42 +02:00
led.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
longformer.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
luke.rst Add LUKE (#11223) 2021-05-03 09:07:29 -04:00
lxmert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
m2m_100.rst replace tgt_lang by tgt_text (#13061) 2021-08-09 22:47:05 +05:30
marian.rst Rely on huggingface_hub for common tools (#13100) 2021-08-12 14:59:02 +02:00
mbart.rst [Flax] Add FlaxMBart (#12236) 2021-07-07 12:20:38 +05:30
megatron_bert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
megatron_gpt2.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
mobilebert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
mpnet.rst MPNet copyright files (#9015) 2020-12-10 09:29:38 -05:00
mt5.rst [Flax] Correctly Add MT5 (#12988) 2021-08-04 16:03:13 +02:00
pegasus.rst Typo in usage example, changed to device instead of torch_device (#11979) 2021-06-01 14:58:49 -04:00
phobert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
prophetnet.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
rag.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
reformer.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
rembert.rst Add RemBERT model code to huggingface (#10692) 2021-07-24 11:31:42 -04:00
retribert.rst Examples reorg (#11350) 2021-04-21 11:11:20 -04:00
roberta.rst [FlaxRoberta] Add FlaxRobertaModels & adapt run_mlm_flax.py (#11470) 2021-05-04 19:57:59 +02:00
roformer.rst [RoFormer] Fix some issues (#12397) 2021-07-06 03:31:57 -04:00
speech_to_text.rst fix: typo spelling grammar (#13212) 2021-08-30 08:09:14 -04:00
splinter.rst Add splinter (#12955) 2021-08-17 08:29:01 -04:00
squeezebert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
t5.rst Flax T5 (#12150) 2021-06-23 13:13:32 +01:00
tapas.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
transformerxl.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
visual_bert.rst Fix VisualBERT docs (#13106) 2021-08-13 11:44:04 +05:30
vit.rst Add DINO conversion script (#13265) 2021-08-26 17:25:20 +02:00
wav2vec2.rst Add Wav2Vec2 & Hubert ForSequenceClassification (#13153) 2021-08-27 20:52:51 +03:00
xlm.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
xlmprophetnet.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
xlmroberta.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
xlnet.rst Examples reorg (#11350) 2021-04-21 11:11:20 -04:00
xlsr_wav2vec2.rst [XLSR-Wav2Vec2] Add multi-lingual Wav2Vec2 models (#10648) 2021-03-11 17:44:18 +03:00