transformers/docs/source
Stella Biderman c02cd95c56
GPT-J-6B (#13022)
* Test GPTJ implementation

* Fixed conflicts

* Update __init__.py

* Update __init__.py

* change GPT_J to GPTJ

* fix missing imports and typos

* use einops for now
(need to change to torch ops later)

* Use torch ops instead of einsum

* remove einops deps

* Update configuration_auto.py

* Added GPT J

* Update gptj.rst

* Update __init__.py

* Update test_modeling_gptj.py

* Added GPT J

* Changed configs to match GPT2 instead of GPT Neo

* Removed non-existent sequence model

* Update configuration_auto.py

* Update configuration_auto.py

* Update configuration_auto.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Progress on updating configs to agree with GPT2

* Update modeling_gptj.py

* num_layers -> n_layer

* layer_norm_eps -> layer_norm_epsilon

* attention_layers -> num_hidden_layers

* Update modeling_gptj.py

* attention_pdrop -> attn_pdrop

* hidden_act -> activation_function

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* fix layernorm and lm_head size
delete attn_type

* Update docs/source/model_doc/gptj.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* removed claim that GPT J uses local attention

* Removed GPTJForSequenceClassification

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Removed unsupported boilerplate

* Update tests/test_modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update tests/test_modeling_gptj.py

Co-authored-by: Eric Hallahan <eric@hallahans.name>

* Update tests/test_modeling_gptj.py

Co-authored-by: Eric Hallahan <eric@hallahans.name>

* Update tests/test_modeling_gptj.py

Co-authored-by: Eric Hallahan <eric@hallahans.name>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update __init__.py

* Update configuration_gptj.py

* Update modeling_gptj.py

* Corrected indentation

* Remove stray backslash

* Delete .DS_Store

* Delete .DS_Store

* Delete .DS_Store

* Delete .DS_Store

* Delete .DS_Store

* Update docs to match

* Remove tf loading

* Remove config.jax

* Remove stray `else:` statement

* Remove references to `load_tf_weights_in_gptj`

* Adapt tests to match output from GPT-J 6B

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Default `activation_function` to `gelu_new`

- Specify the approximate formulation of GELU to ensure parity with the default setting of `jax.nn.gelu()`
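
A minimal sketch of the distinction this item refers to, assuming `gelu_new` denotes the tanh approximation of GELU (which is also what `jax.nn.gelu()` computes with its default `approximate=True`). The function names below are illustrative and are not taken from the GPT-J code:

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (assumed here to be what `gelu_new` means)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def max_abs_gap(lo: float = -6.0, hi: float = 6.0, steps: int = 1201) -> float:
    """Largest absolute difference between the two forms on an even grid."""
    width = (hi - lo) / (steps - 1)
    return max(
        abs(gelu_exact(lo + i * width) - gelu_tanh(lo + i * width))
        for i in range(steps)
    )


if __name__ == "__main__":
    # The two formulations are close but not identical, which is why the
    # default has to match the one used by the original implementation.
    print(f"max |exact - tanh| on [-6, 6]: {max_abs_gap():.2e}")
```

The gap is small per activation, but it is nonzero, so a port that used the exact form would not reproduce the reference outputs bit for bit.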

* Fix part of the config documentation

* Revert "Update configuration_auto.py"

This reverts commit e9860e9c04.

* Revert "Update configuration_auto.py"

This reverts commit cfaaae4c4d.

* Revert "Update configuration_auto.py"

This reverts commit 687788954f.

* Revert "Update configuration_auto.py"

This reverts commit 194d024ea8.

* Hyphenate GPT-J

* Undid sorting of the models alphabetically

* Reverting previous commit

* fix style and quality issues

* Update docs/source/model_doc/gptj.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Replaced GPTJ-specific code with generic code

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Made the code always use rotary positional encodings

* Update index.rst

* Fix documentation

* Combine attention classes

- Condense all attention operations into `GPTJAttention`
- Replicate GPT-2 and improve code clarity by renaming `GPTJAttention.attn_pdrop` and `GPTJAttention.resid_pdrop` to `GPTJAttention.attn_dropout` and `GPTJAttention.resid_dropout`

* Removed `config.rotary_dim` from tests

* Update test_modeling_gptj.py

* Update test_modeling_gptj.py

* Fix formatting

* Removed deprecated argument `layer_id` to `GPTJAttention`

* Update modeling_gptj.py

* Update modeling_gptj.py

* Fix code quality

* Restore model functionality

* Save `lm_head.weight` in checkpoints

* Fix crashes when loading with reduced precision

* refactor `self._attn(...)` and rename layer weights

* make sure logits are in fp32 for sampling

* improve docs

* Add `GPTJForCausalLM` to `TextGenerationPipeline` whitelist

* Added GPT-J to the README

* Fix doc/readme consistency

* Add rough parallelization support

- Remove unused imports and variables
- Clean up docstrings
- Port experimental parallelization code from GPT-2 into GPT-J

* Clean up loose ends

* Fix index.rst

Co-authored-by: kurumuz <kurumuz1@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Eric Hallahan <eric@hallahans.name>
Co-authored-by: Leo Gao <54557097+leogao2@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: your_github_username <your_github_email>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-08-31 17:53:02 +02:00
Name | Last commit | Date
_static | Docs for v4.10.0 | 2021-08-31 16:02:31 +02:00
imgs | [doc] DP/PP/TP/etc parallelism (#12524) | 2021-07-09 17:39:09 -07:00
internal | Fix doc building error | 2021-08-12 05:49:02 -04:00
main_classes | TF/Numpy variants for all DataCollator classes (#13105) | 2021-08-31 13:06:48 +01:00
model_doc | GPT-J-6B (#13022) | 2021-08-31 17:53:02 +02:00
add_new_model.rst | consistent nn. and nn.functional: part 5 docs (#12161) | 2021-06-14 13:34:32 -07:00
benchmarks.rst | [Docs] fixed broken link (#12205) | 2021-06-16 15:14:53 -04:00
bertology.rst | Fix documentation links always pointing to master. (#9217) | 2021-01-05 06:18:48 -05:00
community.md | docs: add HuggingArtists to community notebooks (#13050) | 2021-08-10 09:36:44 +02:00
conf.py | Docs for v4.10.0 | 2021-08-31 16:02:31 +02:00
contributing.md | Update installation page and add contributing to the doc (#5084) | 2020-06-17 14:01:10 -04:00
converting_tensorflow_models.rst | Examples reorg (#11350) | 2021-04-21 11:11:20 -04:00
custom_datasets.rst | Rename NLP library to Datasets library (#10920) | 2021-03-26 08:07:59 -04:00
debugging.rst | [debug] DebugUnderflowOverflow doesn't work with DP (#12816) | 2021-07-21 09:36:02 -07:00
examples.md | per_device instead of per_gpu/error thrown when argument unknown (#4618) | 2020-05-27 11:36:55 -04:00
fast_tokenizers.rst | Documentation about loading a fast tokenizer within Transformers (#11029) | 2021-04-05 10:51:16 -04:00
favicon.ico | Adding usage examples for common tasks (#2850) | 2020-02-25 13:48:24 -05:00
glossary.rst | Add video links to the documentation (#12162) | 2021-06-15 06:37:37 -04:00
index.rst | GPT-J-6B (#13022) | 2021-08-31 17:53:02 +02:00
installation.md | Add mention of the huggingface_hub methods for offline mode (#12320) | 2021-06-23 09:45:30 -04:00
migration.md | consistent nn. and nn.functional: part 5 docs (#12161) | 2021-06-14 13:34:32 -07:00
model_sharing.rst | Add video links to the documentation (#12162) | 2021-06-15 06:37:37 -04:00
model_summary.rst | Add video links to the documentation (#12162) | 2021-06-15 06:37:37 -04:00
multilingual.rst | Examples reorg (#11350) | 2021-04-21 11:11:20 -04:00
notebooks.md | Update notebooks (#3620) | 2020-04-06 14:32:39 -04:00
parallelism.md | docs: fix minor typo (#13289) | 2021-08-31 06:49:05 -04:00
performance.md | [doc] performance: batch sizes (#12725) | 2021-07-15 09:39:34 -07:00
perplexity.rst | Create perplexity.rst (#13004) | 2021-08-05 02:56:13 -04:00
philosophy.rst | Minor documentation revisions from copyediting (#9266) | 2020-12-23 10:15:49 -05:00
preprocessing.rst | doc mismatch fixed (#13345) | 2021-08-31 06:28:37 -04:00
pretrained_models.rst | GPT Neo few fixes (#10968) | 2021-03-30 11:15:55 -04:00
quicktour.rst | Doctests job (#13088) | 2021-08-12 03:42:25 -04:00
sagemaker.md | remove documentation (#12657) | 2021-07-12 18:02:51 +02:00
serialization.rst | Add to ONNX docs (#13048) | 2021-08-09 09:51:49 -04:00
task_summary.rst | Doctests job (#13088) | 2021-08-12 03:42:25 -04:00
testing.rst | [doc] testing: how to trigger a self-push workflow (#12724) | 2021-07-15 16:18:56 -07:00
tokenizer_summary.rst | Add video links to the documentation (#12162) | 2021-06-15 06:37:37 -04:00
training.rst | fix: typo spelling grammar (#13212) | 2021-08-30 08:09:14 -04:00
troubleshooting.md | [troubleshooting] add 2 points of reference to the offline mode (#11236) | 2021-04-14 08:39:23 -07:00