Stella Biderman | c02cd95c56 | GPT-J-6B (#13022)
* Test GPTJ implementation
* Fixed conflicts
* Update __init__.py
* Update __init__.py
* change GPT_J to GPTJ
* fix missing imports and typos
* use einops for now (need to change to torch ops later)
* Use torch ops instead of einsum
* remove einops deps
* Update configuration_auto.py
* Added GPT J
* Update gptj.rst
* Update __init__.py
* Update test_modeling_gptj.py
* Added GPT J
* Changed configs to match GPT2 instead of GPT Neo
* Removed non-existent sequence model
* Update configuration_auto.py
* Update configuration_auto.py
* Update configuration_auto.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Progress on updating configs to agree with GPT2
* Update modeling_gptj.py
* num_layers -> n_layer
* layer_norm_eps -> layer_norm_epsilon
* attention_layers -> num_hidden_layers
* Update modeling_gptj.py
* attention_pdrop -> attn_pdrop
* hidden_act -> activation_function
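Collecting the renames in the bullets above, a hypothetical migration helper for old-style config dicts might look like the following. The key mapping is taken verbatim from the commit messages; the helper itself is illustrative and not part of the PR:

```python
# Old GPT-J config keys -> GPT-2-style keys, as listed in the commits above
CONFIG_RENAMES = {
    "num_layers": "n_layer",
    "layer_norm_eps": "layer_norm_epsilon",
    "attention_layers": "num_hidden_layers",
    "attention_pdrop": "attn_pdrop",
    "hidden_act": "activation_function",
}

def remap_config(old_config: dict) -> dict:
    # Rename known keys, pass everything else through unchanged
    return {CONFIG_RENAMES.get(k, k): v for k, v in old_config.items()}
```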
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* fix layernorm and lm_head size
delete attn_type
* Update docs/source/model_doc/gptj.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* removed claim that GPT J uses local attention
* Removed GPTJForSequenceClassification
* Update src/transformers/models/gptj/configuration_gptj.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Removed unsupported boilerplate
* Update tests/test_modeling_gptj.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update tests/test_modeling_gptj.py
Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update tests/test_modeling_gptj.py
Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update tests/test_modeling_gptj.py
Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update __init__.py
* Update configuration_gptj.py
* Update modeling_gptj.py
* Corrected indentation
* Remove stray backslash
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Update docs to match
* Remove tf loading
* Remove config.jax
* Remove stray `else:` statement
* Remove references to `load_tf_weights_in_gptj`
* Adapt tests to match output from GPT-J 6B
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Default `activation_function` to `gelu_new`
- Specify the approximate formulation of GELU to ensure parity with the default setting of `jax.nn.gelu()`
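The approximate (tanh) formulation referenced here is the standard `gelu_new` variant; a minimal reimplementation for comparison against the exact GELU (illustrative only, not the code added in the PR):

```python
import math

def gelu_new(x):
    # Tanh approximation of GELU -- the "approximate formulation" the commit
    # refers to, matching jax.nn.gelu's default approximate=True behaviour
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def gelu_exact(x):
    # Exact GELU via the Gaussian CDF, for comparison
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

The two agree closely over typical activation ranges, but the released checkpoint was trained with the approximate form, hence the default.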
* Fix part of the config documentation
* Revert "Update configuration_auto.py"
This reverts commit e9860e9c04.
* Revert "Update configuration_auto.py"
This reverts commit cfaaae4c4d.
* Revert "Update configuration_auto.py"
This reverts commit 687788954f.
* Revert "Update configuration_auto.py"
This reverts commit 194d024ea8.
* Hyphenate GPT-J
* Undid sorting of the models alphabetically
* Reverting previous commit
* fix style and quality issues
* Update docs/source/model_doc/gptj.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Replaced GPTJ-specific code with generic code
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Made the code always use rotary positional encodings
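A NumPy sketch of the rotary scheme in question, mirroring the pairwise (`rotate_every_two`-style) interleaving GPT-J uses; function names and the choice to rotate the full head dimension are simplifications for illustration, not the PR's code:

```python
import numpy as np

def rotate_every_two(x):
    # (x1, x2, x3, x4, ...) -> (-x2, x1, -x4, x3, ...): rotate each adjacent
    # pair of dimensions by 90 degrees
    x1 = x[..., ::2]
    x2 = x[..., 1::2]
    return np.stack((-x2, x1), axis=-1).reshape(x.shape)

def apply_rotary(x, positions, base=10000):
    # x: (seq_len, dim). Each dimension pair d gets angle pos / base**(2d/dim),
    # so relative position is encoded directly in the query/key rotation.
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    angles = np.einsum("s,d->sd", positions, inv_freq)   # (seq_len, dim/2)
    sin = np.repeat(np.sin(angles), 2, axis=-1)          # interleave to (seq_len, dim)
    cos = np.repeat(np.cos(angles), 2, axis=-1)
    return x * cos + rotate_every_two(x) * sin
```

Because each pair is a pure rotation, vector norms are preserved and position 0 is the identity.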
* Update index.rst
* Fix documentation
* Combine attention classes
- Condense all attention operations into `GPTJAttention`
- Replicate GPT-2 and improve code clarity by renaming `GPTJAttention.attn_pdrop` and `GPTJAttention.resid_pdrop` to `GPTJAttention.attn_dropout` and `GPTJAttention.resid_dropout`
* Removed `config.rotary_dim` from tests
* Update test_modeling_gptj.py
* Update test_modeling_gptj.py
* Fix formatting
* Removed deprecated argument `layer_id` from `GPTJAttention`
* Update modeling_gptj.py
* Update modeling_gptj.py
* Fix code quality
* Restore model functionality
* Save `lm_head.weight` in checkpoints
* Fix crashes when loading with reduced precision
* Refactor `self._attn(...)` and rename layer weights
* make sure logits are in fp32 for sampling
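The idea behind this commit, sketched in NumPy (the real change operates on the torch `lm_head` output; names here are illustrative): project in the model's working precision, but upcast the logits before temperature scaling and softmax so small probability differences are not rounded away in fp16.

```python
import numpy as np

def sampling_probs(hidden, lm_head_weight, temperature=1.0):
    # Projection runs in the model's precision (possibly fp16); the logits
    # are then upcast to fp32 before softmax for sampling
    logits = (hidden @ lm_head_weight).astype(np.float32) / temperature
    logits -= logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)
```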
* improve docs
* Add `GPTJForCausalLM` to `TextGenerationPipeline` whitelist
* Added GPT-J to the README
* Fix doc/readme consistency
* Add rough parallelization support
- Remove unused imports and variables
- Clean up docstrings
- Port experimental parallelization code from GPT-2 into GPT-J
* Clean up loose ends
* Fix index.rst
Co-authored-by: kurumuz <kurumuz1@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Eric Hallahan <eric@hallahans.name>
Co-authored-by: Leo Gao <54557097+leogao2@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: your_github_username <your_github_email>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-08-31 17:53:02 +02:00