transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-06 22:30:09 +06:00


Stella Biderman c02cd95c56 GPT-J-6B (#13022)

* Test GPTJ implementation
* Fixed conflicts
* Update __init__.py
* Update __init__.py
* change GPT_J to GPTJ
* fix missing imports and typos
* use einops for now (need to change to torch ops later)
* Use torch ops instead of einsum
* remove einops deps
* Update configuration_auto.py
* Added GPT J
* Update gptj.rst
* Update __init__.py
* Update test_modeling_gptj.py
* Added GPT J
* Changed configs to match GPT2 instead of GPT Neo
* Removed non-existent sequence model
* Update configuration_auto.py
* Update configuration_auto.py
* Update configuration_auto.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Progress on updating configs to agree with GPT2
* Update modeling_gptj.py
* num_layers -> n_layer
* layer_norm_eps -> layer_norm_epsilon
* attention_layers -> num_hidden_layers
* Update modeling_gptj.py
* attention_pdrop -> attn_pdrop
* hidden_act -> activation_function
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* fix layernorm and lm_head size delete attn_type
* Update docs/source/model_doc/gptj.rst
  Co-authored-by: Suraj Patil <surajp815@gmail.com>
* removed claim that GPT J uses local attention
* Removed GPTJForSequenceClassification
* Update src/transformers/models/gptj/configuration_gptj.py
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Removed unsupported boilerplate
* Update tests/test_modeling_gptj.py
  Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update tests/test_modeling_gptj.py
  Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update tests/test_modeling_gptj.py
  Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update tests/test_modeling_gptj.py
  Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update __init__.py
* Update configuration_gptj.py
* Update modeling_gptj.py
* Corrected indentation
* Remove stray backslash
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Update docs to match
* Remove tf loading
* Remove config.jax
* Remove stray `else:` statement
* Remove references to `load_tf_weights_in_gptj`
* Adapt tests to match output from GPT-J 6B
* Apply suggestions from code review
  Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Default `activation_function` to `gelu_new`
  - Specify the approximate formulation of GELU to ensure parity with the default setting of `jax.nn.gelu()`
* Fix part of the config documentation
* Revert "Update configuration_auto.py"
  This reverts commit `e9860e9c04`.
* Revert "Update configuration_auto.py"
  This reverts commit `cfaaae4c4d`.
* Revert "Update configuration_auto.py"
  This reverts commit `687788954f`.
* Revert "Update configuration_auto.py"
  This reverts commit `194d024ea8`.
* Hyphenate GPT-J
* Undid sorting of the models alphabetically
* Reverting previous commit
* fix style and quality issues
* Update docs/source/model_doc/gptj.rst
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Replaced GPTJ-specific code with generic code
* Update src/transformers/models/gptj/modeling_gptj.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Made the code always use rotary positional encodings
* Update index.rst
* Fix documentation
* Combine attention classes
  - Condense all attention operations into `GPTJAttention`
  - Replicate GPT-2 and improve code clarity by renaming `GPTJAttention.attn_pdrop` and `GPTJAttention.resid_pdrop` to `GPTJAttention.attn_dropout` and `GPTJAttention.resid_dropout`
* Removed `config.rotary_dim` from tests
* Update test_modeling_gptj.py
* Update test_modeling_gptj.py
* Fix formatting
* Removed depreciated argument `layer_id` to `GPTJAttention`
* Update modeling_gptj.py
* Update modeling_gptj.py
* Fix code quality
* Restore model functionality
* Save `lm_head.weight` in checkpoints
* Fix crashes when loading with reduced precision
* refactor self._attn(...)` and rename layer weights"
* make sure logits are in fp32 for sampling
* improve docs
* Add `GPTJForCausalLM` to `TextGenerationPipeline` whitelist
* Added GPT-J to the README
* Fix doc/readme consistency
* Add rough parallelization support
  - Remove unused imports and variables
  - Clean up docstrings
  - Port experimental parallelization code from GPT-2 into GPT-J
* Clean up loose ends
* Fix index.rst

Co-authored-by: kurumuz <kurumuz1@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Eric Hallahan <eric@hallahans.name>
Co-authored-by: Leo Gao <54557097+leogao2@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: your_github_username <your_github_email>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

2021-08-31 17:53:02 +02:00
_static	Docs for v4.10.0	2021-08-31 16:02:31 +02:00
imgs	[doc] DP/PP/TP/etc parallelism (#12524)	2021-07-09 17:39:09 -07:00
internal	Fix doc building error	2021-08-12 05:49:02 -04:00
main_classes	TF/Numpy variants for all DataCollator classes (#13105)	2021-08-31 13:06:48 +01:00
model_doc	GPT-J-6B (#13022)	2021-08-31 17:53:02 +02:00
add_new_model.rst	consistent nn. and nn.functional: part 5 docs (#12161)	2021-06-14 13:34:32 -07:00
benchmarks.rst	[Docs] fixed broken link (#12205)	2021-06-16 15:14:53 -04:00
bertology.rst	Fix documentation links always pointing to master. (#9217)	2021-01-05 06:18:48 -05:00
community.md	docs: add HuggingArtists to community notebooks (#13050)	2021-08-10 09:36:44 +02:00
conf.py	Docs for v4.10.0	2021-08-31 16:02:31 +02:00
contributing.md	Update installation page and add contributing to the doc (#5084)	2020-06-17 14:01:10 -04:00
converting_tensorflow_models.rst	Examples reorg (#11350)	2021-04-21 11:11:20 -04:00
custom_datasets.rst	Rename NLP library to Datasets library (#10920)	2021-03-26 08:07:59 -04:00
debugging.rst	[debug] DebugUnderflowOverflow doesn't work with DP (#12816)	2021-07-21 09:36:02 -07:00
examples.md	per_device instead of per_gpu/error thrown when argument unknown (#4618)	2020-05-27 11:36:55 -04:00
fast_tokenizers.rst	Documentation about loading a fast tokenizer within Transformers (#11029)	2021-04-05 10:51:16 -04:00
favicon.ico	Adding usage examples for common tasks (#2850)	2020-02-25 13:48:24 -05:00
glossary.rst	Add video links to the documentation (#12162)	2021-06-15 06:37:37 -04:00
index.rst	GPT-J-6B (#13022)	2021-08-31 17:53:02 +02:00
installation.md	Add mention of the huggingface_hub methods for offline mode (#12320)	2021-06-23 09:45:30 -04:00
migration.md	consistent nn. and nn.functional: part 5 docs (#12161)	2021-06-14 13:34:32 -07:00
model_sharing.rst	Add video links to the documentation (#12162)	2021-06-15 06:37:37 -04:00
model_summary.rst	Add video links to the documentation (#12162)	2021-06-15 06:37:37 -04:00
multilingual.rst	Examples reorg (#11350)	2021-04-21 11:11:20 -04:00
notebooks.md	Update notebooks (#3620)	2020-04-06 14:32:39 -04:00
parallelism.md	docs: fix minor typo (#13289)	2021-08-31 06:49:05 -04:00
performance.md	[doc] performance: batch sizes (#12725)	2021-07-15 09:39:34 -07:00
perplexity.rst	Create perplexity.rst (#13004)	2021-08-05 02:56:13 -04:00
philosophy.rst	Minor documentation revisions from copyediting (#9266)	2020-12-23 10:15:49 -05:00
preprocessing.rst	doc mismatch fixed (#13345)	2021-08-31 06:28:37 -04:00
pretrained_models.rst	GPT Neo few fixes (#10968)	2021-03-30 11:15:55 -04:00
quicktour.rst	Doctests job (#13088)	2021-08-12 03:42:25 -04:00
sagemaker.md	remove documentation (#12657)	2021-07-12 18:02:51 +02:00
serialization.rst	Add to ONNX docs (#13048)	2021-08-09 09:51:49 -04:00
task_summary.rst	Doctests job (#13088)	2021-08-12 03:42:25 -04:00
testing.rst	[doc] testing: how to trigger a self-push workflow (#12724)	2021-07-15 16:18:56 -07:00
tokenizer_summary.rst	Add video links to the documentation (#12162)	2021-06-15 06:37:37 -04:00
training.rst	fix: typo spelling grammar (#13212)	2021-08-30 08:09:14 -04:00
troubleshooting.md	[troubleshooting] add 2 points of reference to the offline mode (#11236)	2021-04-14 08:39:23 -07:00