transformers/docs/source/model_doc
Stella Biderman c02cd95c56
GPT-J-6B (#13022)
* Test GPTJ implementation

* Fixed conflicts

* Update __init__.py

* Update __init__.py

* change GPT_J to GPTJ

* fix missing imports and typos

* use einops for now
(need to change to torch ops later)

* Use torch ops instead of einsum

* remove einops deps
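The einops-to-torch conversion described in the bullets above can be illustrated with a minimal sketch. The function name `split_heads` and the shapes are assumptions for illustration, not the model's actual code; the pattern shown (`rearrange(x, "b s (h d) -> b h s d")` expressed as `view` + `permute`) is the generic technique that lets the einops dependency be dropped:

```python
import torch

def split_heads(x: torch.Tensor, num_heads: int) -> torch.Tensor:
    # Equivalent of einops.rearrange(x, "b s (h d) -> b h s d", h=num_heads)
    # written with plain torch ops, so einops is no longer required.
    b, s, hidden = x.shape
    head_dim = hidden // num_heads
    return x.view(b, s, num_heads, head_dim).permute(0, 2, 1, 3)

x = torch.randn(2, 5, 12)          # (batch, seq, hidden)
out = split_heads(x, num_heads=4)  # (batch, heads, seq, head_dim)
```

The `view` splits the hidden dimension into (heads, head_dim) pairs, and the `permute` moves the head axis next to the batch axis, which is what the einops pattern does in one call.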

* Update configuration_auto.py

* Added GPT J

* Update gptj.rst

* Update __init__.py

* Update test_modeling_gptj.py

* Added GPT J

* Changed configs to match GPT2 instead of GPT Neo

* Removed non-existent sequence model

* Update configuration_auto.py

* Update configuration_auto.py

* Update configuration_auto.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Progress on updating configs to agree with GPT2

* Update modeling_gptj.py

* num_layers -> n_layer

* layer_norm_eps -> layer_norm_epsilon

* attention_layers -> num_hidden_layers

* Update modeling_gptj.py

* attention_pdrop -> attn_pdrop

* hidden_act -> activation_function

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update configuration_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* Update modeling_gptj.py

* fix layernorm and lm_head size
delete attn_type

* Update docs/source/model_doc/gptj.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* removed claim that GPT J uses local attention

* Removed GPTJForSequenceClassification

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Removed unsupported boilerplate

* Update tests/test_modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update tests/test_modeling_gptj.py

Co-authored-by: Eric Hallahan <eric@hallahans.name>

* Update tests/test_modeling_gptj.py

Co-authored-by: Eric Hallahan <eric@hallahans.name>

* Update tests/test_modeling_gptj.py

Co-authored-by: Eric Hallahan <eric@hallahans.name>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update __init__.py

* Update configuration_gptj.py

* Update modeling_gptj.py

* Corrected indentation

* Remove stray backslash

* Delete .DS_Store

* Delete .DS_Store

* Delete .DS_Store

* Delete .DS_Store

* Delete .DS_Store

* Update docs to match

* Remove tf loading

* Remove config.jax

* Remove stray `else:` statement

* Remove references to `load_tf_weights_in_gptj`

* Adapt tests to match output from GPT-J 6B

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Default `activation_function` to `gelu_new`

- Specify the approximate formulation of GELU to ensure parity with the default setting of `jax.nn.gelu()`
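The parity argument above rests on `gelu_new` being the tanh approximation of GELU, which is also what `jax.nn.gelu()` computes by default (`approximate=True`). A small sketch comparing the exact, erf-based GELU to the tanh formulation (the constants are the standard published approximation):

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_new(x: float) -> float:
    # Tanh approximation, matching jax.nn.gelu's default approximate=True.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

The two formulations agree to roughly 1e-3 over typical activation ranges, but a checkpoint trained with one will not bit-match the other, which is why the default matters.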

* Fix part of the config documentation

* Revert "Update configuration_auto.py"

This reverts commit e9860e9c04.

* Revert "Update configuration_auto.py"

This reverts commit cfaaae4c4d.

* Revert "Update configuration_auto.py"

This reverts commit 687788954f.

* Revert "Update configuration_auto.py"

This reverts commit 194d024ea8.

* Hyphenate GPT-J

* Undid sorting of the models alphabetically

* Reverting previous commit

* fix style and quality issues

* Update docs/source/model_doc/gptj.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/configuration_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Replaced GPTJ-specific code with generic code

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Made the code always use rotary positional encodings
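Rotary positional encodings, which the commit above makes unconditional, can be sketched as follows. This is a generic illustration of the technique, not the model's actual implementation: the function names, the `base=10000.0` frequency convention, and the interleaved (even, odd) pairing are assumptions here:

```python
import torch

def fixed_pos_embedding(seq_len: int, dim: int, base: float = 10000.0):
    # Sinusoidal rotation angles per position; one frequency per dim pair.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq, dim/2)
    return torch.sin(freqs), torch.cos(freqs)

def rotate_every_two(x: torch.Tensor) -> torch.Tensor:
    # (x0, x1, x2, x3, ...) -> (-x1, x0, -x3, x2, ...)
    x1, x2 = x[..., ::2], x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)

def apply_rotary(x: torch.Tensor, sin: torch.Tensor, cos: torch.Tensor) -> torch.Tensor:
    # Expand angles so each (even, odd) pair shares one angle, then rotate.
    sin = torch.repeat_interleave(sin, 2, dim=-1)
    cos = torch.repeat_interleave(cos, 2, dim=-1)
    return x * cos + rotate_every_two(x) * sin

q = torch.randn(1, 6, 16)
sin, cos = fixed_pos_embedding(seq_len=6, dim=16)
q_rot = apply_rotary(q, sin, cos)  # same shape; position encoded as a rotation
```

Because each pair is rotated by an angle proportional to its position, relative offsets fall out of the query-key dot product without any learned position table.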

* Update index.rst

* Fix documentation

* Combine attention classes

- Condense all attention operations into `GPTJAttention`
- Replicate GPT-2 and improve code clarity by renaming `GPTJAttention.attn_pdrop` and `GPTJAttention.resid_pdrop` to `GPTJAttention.attn_dropout` and `GPTJAttention.resid_dropout`

* Removed `config.rotary_dim` from tests

* Update test_modeling_gptj.py

* Update test_modeling_gptj.py

* Fix formatting

* Removed deprecated argument `layer_id` to `GPTJAttention`


* Update modeling_gptj.py

* Update modeling_gptj.py

* Fix code quality

* Restore model functionality

* Save `lm_head.weight` in checkpoints

* Fix crashes when loading with reduced precision

* refactor `self._attn(...)` and rename layer weights

* make sure logits are in fp32 for sampling
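The fp32-logits fix above follows a general pattern: when a model runs in half precision, softmax over fp16 logits can lose probability mass to rounding, so the logits are upcast before sampling. A minimal sketch of that pattern (the helper name is hypothetical):

```python
import torch

def sample_next_token(logits: torch.Tensor) -> int:
    # Upcast to fp32 before softmax so half-precision logits do not
    # underflow or lose mass to rounding during sampling.
    probs = torch.softmax(logits.float(), dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

half_logits = torch.tensor([2.0, 1.0, 0.1], dtype=torch.float16)
token = sample_next_token(half_logits)
```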

* improve docs

* Add `GPTJForCausalLM` to `TextGenerationPipeline` whitelist

* Added GPT-J to the README

* Fix doc/readme consistency

* Add rough parallelization support

- Remove unused imports and variables
- Clean up docstrings
- Port experimental parallelization code from GPT-2 into GPT-J
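The ported GPT-2-style experimental parallelization works by assigning contiguous runs of transformer blocks to devices via a `device_map` of the form `{device_id: [layer indices]}`. A hedged sketch of building such a map; the helper name is hypothetical, but the map shape mirrors the experimental `parallelize(device_map)` API:

```python
def make_device_map(n_layer: int, n_devices: int) -> dict:
    # Split blocks as evenly as possible; earlier devices take the remainder.
    per_device, extra = divmod(n_layer, n_devices)
    device_map, start = {}, 0
    for d in range(n_devices):
        count = per_device + (1 if d < extra else 0)
        device_map[d] = list(range(start, start + count))
        start += count
    return device_map

# GPT-J 6B has 28 blocks; splitting across 4 GPUs gives 7 blocks each.
device_map = make_device_map(28, 4)
```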

* Clean up loose ends

* Fix index.rst

Co-authored-by: kurumuz <kurumuz1@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Eric Hallahan <eric@hallahans.name>
Co-authored-by: Leo Gao <54557097+leogao2@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-08-31 17:53:02 +02:00
albert.rst albert flax (#13294) 2021-08-30 17:29:27 +02:00
auto.rst add FlaxAutoModelForImageClassification in main init (#12298) 2021-06-22 18:26:05 +05:30
bart.rst FlaxBart (#11537) 2021-06-14 15:16:08 +05:30
barthez.rst Examples reorg (#11350) 2021-04-21 11:11:20 -04:00
beit.rst Add BEiT (#12994) 2021-08-04 18:29:23 +02:00
bert_japanese.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
bert.rst [Flax] Correct flax docs (#12782) 2021-08-04 16:31:23 +02:00
bertgeneration.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
bertweet.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
bigbird_pegasus.rst Add BigBirdPegasus (#10991) 2021-05-07 09:27:43 +02:00
bigbird.rst Flax Big Bird (#11967) 2021-06-14 20:01:03 +01:00
blenderbot_small.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
blenderbot.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
bort.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
byt5.rst ByT5 model (#11971) 2021-06-01 19:07:37 +01:00
camembert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
canine.rst Wrong model is used in example, should be character instead of subword model (#12676) 2021-07-13 08:40:27 -04:00
clip.rst add and fix examples (#12810) 2021-07-20 09:28:50 -04:00
convbert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
cpm.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
ctrl.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
deberta_v2.rst Deberta_v2 tf (#13120) 2021-08-31 06:32:47 -04:00
deberta.rst Deberta tf (#12972) 2021-08-12 05:01:26 -04:00
deit.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
detr.rst Improve detr (#12147) 2021-06-17 10:37:54 -04:00
dialogpt.rst ADD BORT (#9813) 2021-01-27 21:25:11 +03:00
distilbert.rst distilbert-flax (#13324) 2021-08-30 14:16:18 +02:00
dpr.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
electra.rst [Flax] Add Electra models (#11426) 2021-05-04 20:56:09 +02:00
encoderdecoder.rst Make Flax GPT2 working with cross attention (#13008) 2021-08-23 17:57:29 +02:00
flaubert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
fsmt.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
funnel.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
gpt_neo.rst FlaxGPTNeo (#12493) 2021-07-06 18:55:18 +05:30
gpt.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
gpt2.rst Add GPT2ForTokenClassification (#13290) 2021-08-31 12:19:04 +02:00
gptj.rst GPT-J-6B (#13022) 2021-08-31 17:53:02 +02:00
herbert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
hubert.rst Add Wav2Vec2 & Hubert ForSequenceClassification (#13153) 2021-08-27 20:52:51 +03:00
ibert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
layoutlm.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
layoutlmv2.rst Add LayoutLMv2 + LayoutXLM (#12604) 2021-08-30 12:35:42 +02:00
layoutxlm.rst Add LayoutLMv2 + LayoutXLM (#12604) 2021-08-30 12:35:42 +02:00
led.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
longformer.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
luke.rst Add LUKE (#11223) 2021-05-03 09:07:29 -04:00
lxmert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
m2m_100.rst replace tgt_lang by tgt_text (#13061) 2021-08-09 22:47:05 +05:30
marian.rst Rely on huggingface_hub for common tools (#13100) 2021-08-12 14:59:02 +02:00
mbart.rst [Flax] Add FlaxMBart (#12236) 2021-07-07 12:20:38 +05:30
megatron_bert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
megatron_gpt2.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
mobilebert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
mpnet.rst MPNet copyright files (#9015) 2020-12-10 09:29:38 -05:00
mt5.rst [Flax] Correctly Add MT5 (#12988) 2021-08-04 16:03:13 +02:00
pegasus.rst Typo in usage example, changed to device instead of torch_device (#11979) 2021-06-01 14:58:49 -04:00
phobert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
prophetnet.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
rag.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
reformer.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
rembert.rst Add RemBERT model code to huggingface (#10692) 2021-07-24 11:31:42 -04:00
retribert.rst Examples reorg (#11350) 2021-04-21 11:11:20 -04:00
roberta.rst [FlaxRoberta] Add FlaxRobertaModels & adapt run_mlm_flax.py (#11470) 2021-05-04 19:57:59 +02:00
roformer.rst [RoFormer] Fix some issues (#12397) 2021-07-06 03:31:57 -04:00
speech_to_text.rst fix: typo spelling grammar (#13212) 2021-08-30 08:09:14 -04:00
splinter.rst Add splinter (#12955) 2021-08-17 08:29:01 -04:00
squeezebert.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
t5.rst Flax T5 (#12150) 2021-06-23 13:13:32 +01:00
tapas.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
transformerxl.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
visual_bert.rst Fix VisualBERT docs (#13106) 2021-08-13 11:44:04 +05:30
vit.rst Add DINO conversion script (#13265) 2021-08-26 17:25:20 +02:00
wav2vec2.rst Add Wav2Vec2 & Hubert ForSequenceClassification (#13153) 2021-08-27 20:52:51 +03:00
xlm.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
xlmprophetnet.rst Copyright (#8970) 2020-12-07 18:36:34 -05:00
xlmroberta.rst Honor contributors to models (#11329) 2021-04-21 09:47:27 -04:00
xlnet.rst Examples reorg (#11350) 2021-04-21 11:11:20 -04:00
xlsr_wav2vec2.rst [XLSR-Wav2Vec2] Add multi-lingual Wav2Vec2 models (#10648) 2021-03-11 17:44:18 +03:00