transformers/docs/source/en/model_doc
Ariel Ekgren 5f94855dc3
Add gpt-sw3 model to transformers (#20209)
* Add templates for gpt-sw3

* Add templates for gpt-sw3

* Added sentencepiece tokenizer

* intermediate commit with many changes

* fixed conflicts

* Init commit for tokenization port

* Tokenization progress

* Remove fast tokenizer

* Clean up and rename spm.model -> spiece.model

* Remove TF -> PT conversion script template, Clean up Megatron -> PT script

* Optimize encode & decode performance

* added new attention

* added new attention

* attention for gpt-sw3 working

* attention good

* Cache is now working

* fixed attention mask so that it works with causal attention

* fixed baddbmm bug for cpu and caching

* updated config with correct parameters

* Refactor and leave optimizations as separate functions to avoid breaking expected functionality

* Fix special tokens mapping for both tokenizers

* cleaning up of code and comments

* HF compatible attention outputs

* Tokenizer now passing tests, add documentation

* Update documentation

* reverted back to base implementation after checking that it is identical to pretrained model

* updated gpt-sw3 config

* updated conversion script

* aligned parameters with gpt-sw3 config

* changed default scale_attn_by_inverse_layer_idx to true

* removed flag from conversion script

* added temporary model path

* reverted back to functioning convert script

* small changes to default config

* updated tests for gpt-sw3

* make style, make quality, minor cleanup

* Change local paths to testing online repository

* Change name: GptSw3 -> GPTSw3

* Remove GPTSw3TokenizerFast references

* Use official model repository and add more model sizes

* Added reference to 6.7b model

* Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2DoubleHeadsModel

* Remove pointers to non-existing TFGPTSw3

* Add GPTSw3 to docs/_toctree.yml

* Remove TF artifacts from GPTSw3 in __init__ files

* Update READMEs with 'make fix-copies'

* Add 20b model to archive list

* Add documentation for GPT-Sw3

* Fix typo in documentation for GPT-Sw3

* Do 'make fix-copies' again after having updated docs

* Fix some typos in docs

* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Resolve comments from PR feedback

* Resolve more comments from PR feedback, also set use_cache=True in convert script

* Add '# Copied from' comments for GPTSw3 modeling

* Set 'is_parallelizable = False'

* Remove '# Copied from' where code was modified and add 'with x->y' when appropriate

* Remove parallelize in mdx

* make style, make quality

* Update GPTSw3Config default values and corresponding documentation

* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_available

* Make style, make quality

* Add dummy object for GPTSw3Tokenizer via 'make fix-copies'

* make fix-copies

* Remove GPTSw3 modeling classes

* make style, make quality

* Add GPTSw3 auto-mappings for other GPT2 heads

* Update docs/source/en/model_doc/gpt-sw3.mdx

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove old TODO-comment

* Add example usage to GPTSw3Tokenizer docstring

* make style, make quality

* Add implementation details and example usage to gpt-sw3.mdx

Co-authored-by: JoeyOhman <joeyoh@kth.se>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-12-12 13:12:13 -05:00
albert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
audio-spectrogram-transformer.mdx Add Audio Spectogram Transformer (#19981) 2022-11-21 18:58:54 +01:00
auto.mdx Split autoclasses on modality (#20559) 2022-12-05 12:28:44 -08:00
bart.mdx Add TFBartForSequenceClassification (#20570) 2022-12-07 18:05:39 +01:00
barthez.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
bartpho.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
beit.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
bert-generation.mdx Result of new doc style with fixes (#17015) 2022-04-29 17:42:15 -04:00
bert-japanese.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
bert.mdx Add BERT resources (#19852) 2022-11-01 11:09:53 -07:00
bertweet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
big_bird.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
bigbird_pegasus.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
biogpt.mdx Add BioGPT (#20420) 2022-12-05 10:12:03 -05:00
bit.mdx Add BiT + ViT hybrid (#20550) 2022-12-07 11:03:39 +01:00
blenderbot-small.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
blenderbot.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
bloom.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
bort.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
byt5.mdx [Doctests] Fix all T5 doc tests (#16646) 2022-04-13 11:36:54 +02:00
camembert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
canine.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
chinese_clip.mdx Add Chinese-CLIP implementation (#20368) 2022-11-30 19:22:23 +01:00
clip.mdx Add clip resources to the transformers documentation (#20190) 2022-11-15 13:26:46 -05:00
clipseg.mdx [CLIPSeg] Add resources (#20118) 2022-11-09 18:31:22 +01:00
codegen.mdx Add CodeGen model (#17443) 2022-06-24 17:10:38 +02:00
conditional_detr.mdx Add segmentation + object detection image processors (#20160) 2022-11-30 10:24:03 +00:00
convbert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
convnext.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
cpm.mdx Allow all imports from transformers (#17050) 2022-05-02 12:47:39 -04:00
ctrl.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
cvt.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
data2vec.mdx Add TFData2VecVision for semantic segmentation (#17271) 2022-06-08 14:03:18 +01:00
deberta-v2.mdx Add DebertaV2ForMultipleChoice (#17135) 2022-05-10 16:21:44 -04:00
deberta.mdx Add to DeBERTa resources (#20155) 2022-11-15 13:26:07 -05:00
decision_transformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
deformable_detr.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
deit.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
detr.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
dialogpt.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
dinat.mdx Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models (#20219) 2022-11-18 13:08:26 -05:00
distilbert.mdx add resources for distilbert (#19930) 2022-10-28 13:16:07 -07:00
dit.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
donut.mdx Add Donut image processor (#20425) 2022-11-29 10:38:01 +00:00
dpr.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
dpt.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
electra.mdx [FlaxBert] Add ForCausalLM (#16995) 2022-05-03 11:26:19 +02:00
encoder-decoder.mdx [EncoderDecoder] Improve docs (#18271) 2022-07-27 10:08:59 +02:00
ernie.mdx add task_type_id to BERT to support ERNIE-2.0 and ERNIE-3.0 models (#18686) 2022-09-09 07:36:46 -04:00
esm.mdx Add ESMFold (#19977) 2022-10-31 21:32:58 -04:00
flan-t5.mdx flan-t5.mdx: fix link to large model (#20555) 2022-12-02 19:27:46 +01:00
flaubert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
flava.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
fnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
fsmt.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
funnel.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
glpn.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
gpt_neo.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
gpt_neox_japanese.mdx Add support for Japanese GPT-NeoX-based model by ABEJA, Inc. (#18814) 2022-09-14 10:17:40 -04:00
gpt_neox.mdx [WIP] Adding GPT-NeoX-20B (#16659) 2022-05-24 09:31:10 -04:00
gpt-sw3.mdx Add gpt-sw3 model to transformers (#20209) 2022-12-12 13:12:13 -05:00
gpt2.mdx add in layer gpt2 tokenizer (#20421) 2022-11-29 10:02:40 -05:00
gptj.mdx Generate: move generation_*.py src files into generation/*.py (#20096) 2022-11-09 15:34:08 +00:00
groupvit.mdx [TensorFlow] Adding GroupViT (#18020) 2022-09-29 10:48:04 +01:00
herbert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
hubert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
ibert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
imagegpt.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
jukebox.mdx Add Jukebox model (replaces #16875) (#17826) 2022-11-10 21:05:27 +01:00
layoutlm.mdx [LayoutLM] Add clarification to docs (#18716) 2022-09-02 14:48:19 +02:00
layoutlmv2.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
layoutlmv3.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
layoutxlm.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
led.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
levit.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
lilt.mdx Add docs (#19729) 2022-10-18 17:42:46 +02:00
longformer.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
longt5.mdx Update longt5.mdx (#18634) 2022-08-16 10:20:46 -05:00
luke.mdx Adding fine-tuning models to LUKE (#18353) 2022-08-01 11:09:47 -04:00
lxmert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
m2m_100.mdx Fix m2m_100.mdx doc example missing labels (#19149) 2022-09-29 13:27:58 +02:00
marian.mdx Replace as_target context managers by direct calls (#18325) 2022-07-29 08:09:09 -04:00
markuplm.mdx Fix doctest for MarkupLM (#19845) 2022-10-24 17:54:23 +02:00
maskformer.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
mbart.mdx Replace as_target context managers by direct calls (#18325) 2022-07-29 08:09:09 -04:00
mctct.mdx Replace as_target context managers by direct calls (#18325) 2022-07-29 08:09:09 -04:00
megatron_gpt2.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
megatron-bert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mluke.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mobilebert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mobilenet_v1.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
mobilenet_v2.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
mobilevit.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
mpnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mt5.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
mvp.mdx Add MVP model (#17787) 2022-06-29 09:30:55 -04:00
nat.mdx Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models (#20219) 2022-11-18 13:08:26 -05:00
nezha.mdx Nezha Pytorch implementation (#17776) 2022-06-23 12:36:22 -04:00
nllb.mdx Replace as_target context managers by direct calls (#18325) 2022-07-29 08:09:09 -04:00
nystromformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
openai-gpt.mdx Very small edit to change name to OpenAI GPT (#20722) 2022-12-12 09:43:43 -05:00
opt.mdx Add OPTForQuestionAnswering (#19402) 2022-10-10 09:30:59 -04:00
owlvit.mdx Add segmentation + object detection image processors (#20160) 2022-11-30 10:24:03 +00:00
pegasus_x.mdx PEGASUS-X (#18551) 2022-09-02 19:54:02 +02:00
pegasus.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
perceiver.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
phobert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
plbart.mdx Replace as_target context managers by direct calls (#18325) 2022-07-29 08:09:09 -04:00
poolformer.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
prophetnet.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
qdqbert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
rag.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
realm.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
reformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
regnet.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
rembert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
resnet.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
retribert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
roberta.mdx Add RoBERTa resources (#19911) 2022-10-27 11:33:15 -07:00
roc_bert.mdx Add RocBert (#20013) 2022-11-08 10:03:43 -05:00
roformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
segformer.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
sew-d.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
sew.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
speech_to_text_2.mdx Generate: move generation_*.py src files into generation/*.py (#20096) 2022-11-09 15:34:08 +00:00
speech_to_text.mdx Fix some doctests after PR 15775 (#20036) 2022-11-03 14:18:45 +01:00
speech-encoder-decoder.mdx Replace as_target context managers by direct calls (#18325) 2022-07-29 08:09:09 -04:00
splinter.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
squeezebert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
swin.mdx Fix link to Swin Model contributor novice03 (#20557) 2022-12-05 11:42:29 -05:00
swinv2.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
switch_transformers.mdx Add Switch transformers (#19323) 2022-11-15 13:06:45 +01:00
t5.mdx Generate: move generation_*.py src files into generation/*.py (#20096) 2022-11-09 15:34:08 +00:00
t5v1.1.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
table-transformer.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
tapas.mdx Fix tapas scatter (#20149) 2022-11-14 01:04:26 -05:00
tapex.mdx Add TAPEX (#16473) 2022-04-08 10:57:51 +02:00
time_series_transformer.mdx time series forecasting model (#17965) 2022-09-30 15:32:59 -04:00
timesformer.mdx [New Model] Add TimeSformer model (#18908) 2022-12-02 09:13:25 +01:00
trajectory_transformer.mdx Add trajectory transformer (#17141) 2022-05-17 19:07:43 -04:00
transfo-xl.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
trocr.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
ul2.mdx Add UL2 (just docs) (#17740) 2022-06-21 10:24:50 +02:00
unispeech-sat.mdx Add Wav2Vec2Conformer (#16812) 2022-05-17 00:43:16 +02:00
unispeech.mdx Add Wav2Vec2Conformer (#16812) 2022-05-17 00:43:16 +02:00
van.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
videomae.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
vilt.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
vision-encoder-decoder.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
vision-text-dual-encoder.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
visual_bert.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
vit_hybrid.mdx Add BiT + ViT hybrid (#20550) 2022-12-07 11:03:39 +01:00
vit_mae.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
vit_msn.mdx MSN (Masked Siamese Networks) for ViT (#18815) 2022-09-22 07:15:03 -04:00
vit.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
wav2vec2_phoneme.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
wav2vec2-conformer.mdx [Wav2Vec2Conformer] Official release (#17709) 2022-06-15 18:34:15 +02:00
wav2vec2.mdx Add wav2vec2 resources (#19931) 2022-10-28 13:28:18 -07:00
wavlm.mdx Add Wav2Vec2Conformer (#16812) 2022-05-17 00:43:16 +02:00
whisper.mdx Generate: move generation_*.py src files into generation/*.py (#20096) 2022-11-09 15:34:08 +00:00
xclip.mdx Improve vision models docs (#19103) 2022-09-19 19:22:34 +02:00
xglm.mdx Add TF implementation of XGLMModel (#16543) 2022-08-24 10:51:05 +01:00
xlm-prophetnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlm-roberta-xl.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlm-roberta.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlm.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xls_r.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlsr_wav2vec2.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
yolos.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
yoso.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00