Mirror of https://github.com/huggingface/transformers.git (synced 2025-07-16 11:08:23 +06:00)
* Add templates for gpt-sw3
* Add templates for gpt-sw3
* Added sentencepiece tokenizer
* intermediate commit with many changes
* fixed conflicts
* Init commit for tokenization port
* Tokenization progress
* Remove fast tokenizer
* Clean up and rename spm.model -> spiece.model
* Remove TF -> PT conversion script template, Clean up Megatron -> PT script
* Optimize encode & decode performance
* added new attention
* added new attention
* attention for gpt-sw3 working
* attention good
* Cache is now working
* fixed attention mask so that it works with causal attention
* fixed badbmm bug for cpu and caching
* updated config with correct parameters
* Refactor and leave optimizations as separate functions to avoid breaking expected functionality
* Fix special tokens mapping for both tokenizers
* cleaning up of code and comments
* HF compatible attention outputs
* Tokenizer now passing tests, add documentation
* Update documentation
* reverted back to base implementation after checking that it is identical to pretrained model
* updated gpt-sw3 config
* updated conversion script
* aligned parameters with gpt-sw3 config
* changed default scale_attn_by_inverse_layer_idx to true
* removed flag from conversion script
* added temporary model path
* reverted back to functioning convert script
* small changes to default config
* updated tests for gpt-sw3
* make style, make quality, minor cleanup
* Change local paths to testing online repository
* Change name: GptSw3 -> GPTSw3
* Remove GPTSw3TokenizerFast references
* Use official model repository and add more model sizes
* Added reference to 6.7b model
* Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2DoubleHeadsModel
* Remove pointers to non-existing TFGPTSw3
* Add GPTSw3 to docs/_toctree.yml
* Remove TF artifacts from GPTSw3 in __init__ files
* Update READMEs with 'make fix-copies'
* Add 20b model to archive list
* Add documentation for GPT-Sw3
* Fix typo in documentation for GPT-Sw3
* Do 'make fix-copies' again after having updated docs
* Fix some typos in docs
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/gpt_sw3/__init__.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/gpt_sw3/__init__.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Resolve comments from PR feedback
* Resolve more comments from PR feedback, also set use_cache=True in convert script
* Add '# Copied from' comments for GPTSw3 modeling
* Set 'is_parallelizable = False'
* Remove '# Copied from' where code was modified and add 'with x->y' when appropriate
* Remove parallelize in mdx
* make style, make quality
* Update GPTSw3Config default values and corresponding documentation
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Update src/transformers/models/gpt_sw3/__init__.py (Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>)
* Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_available
* Make style, make quality
* Add dummy object for GPTSw3Tokenizer via 'make fix-copies'
* make fix-copies
* Remove GPTSw3 modeling classes
* make style, make quality
* Add GPTSw3 auto-mappings for other GPT2 heads
* Update docs/source/en/model_doc/gpt-sw3.mdx (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Remove old TODO-comment
* Add example usage to GPTSw3Tokenizer docstring
* make style, make quality
* Add implementation details and example usage to gpt-sw3.mdx

Co-authored-by: JoeyOhman <joeyoh@kth.se>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
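The commit log above references a Megatron -> PyTorch conversion script (convert_megatron_to_pytorch.py), whose core job is renaming Megatron-LM checkpoint parameter names into the GPT-2-style layout that transformers expects. The sketch below illustrates that kind of key renaming; the substitution table is an assumption for demonstration and does not reproduce the actual mapping used in the script.

```python
# Illustrative sketch of Megatron -> Hugging Face GPT-2 key renaming.
# The real convert_megatron_to_pytorch.py may use a different mapping;
# the substitutions below are assumptions for demonstration only.

MEGATRON_TO_HF = {
    "self_attention.query_key_value": "attn.c_attn",
    "self_attention.dense": "attn.c_proj",
    "mlp.dense_h_to_4h": "mlp.c_fc",
    "mlp.dense_4h_to_h": "mlp.c_proj",
    "input_layernorm": "ln_1",
    "post_attention_layernorm": "ln_2",
}


def rename_megatron_key(key: str) -> str:
    """Map one Megatron parameter name onto the GPT-2 naming scheme."""
    # Megatron prefixes transformer layers with
    # "language_model.encoder.layers.<i>."; transformers' GPT-2 uses
    # "transformer.h.<i>.".
    key = key.replace("language_model.encoder.layers.", "transformer.h.")
    for megatron_name, hf_name in MEGATRON_TO_HF.items():
        key = key.replace(megatron_name, hf_name)
    return key


print(rename_megatron_key(
    "language_model.encoder.layers.0.self_attention.query_key_value.weight"
))  # transformer.h.0.attn.c_attn.weight
```

A full converter would also transpose or reshape some tensors (GPT-2's `Conv1D` layers store weights transposed relative to Megatron's linear layers), which this name-only sketch omits.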
* albert.mdx
* audio-spectrogram-transformer.mdx
* auto.mdx
* bart.mdx
* barthez.mdx
* bartpho.mdx
* beit.mdx
* bert-generation.mdx
* bert-japanese.mdx
* bert.mdx
* bertweet.mdx
* big_bird.mdx
* bigbird_pegasus.mdx
* biogpt.mdx
* bit.mdx
* blenderbot-small.mdx
* blenderbot.mdx
* bloom.mdx
* bort.mdx
* byt5.mdx
* camembert.mdx
* canine.mdx
* chinese_clip.mdx
* clip.mdx
* clipseg.mdx
* codegen.mdx
* conditional_detr.mdx
* convbert.mdx
* convnext.mdx
* cpm.mdx
* ctrl.mdx
* cvt.mdx
* data2vec.mdx
* deberta-v2.mdx
* deberta.mdx
* decision_transformer.mdx
* deformable_detr.mdx
* deit.mdx
* detr.mdx
* dialogpt.mdx
* dinat.mdx
* distilbert.mdx
* dit.mdx
* donut.mdx
* dpr.mdx
* dpt.mdx
* electra.mdx
* encoder-decoder.mdx
* ernie.mdx
* esm.mdx
* flan-t5.mdx
* flaubert.mdx
* flava.mdx
* fnet.mdx
* fsmt.mdx
* funnel.mdx
* glpn.mdx
* gpt_neo.mdx
* gpt_neox_japanese.mdx
* gpt_neox.mdx
* gpt-sw3.mdx
* gpt2.mdx
* gptj.mdx
* groupvit.mdx
* herbert.mdx
* hubert.mdx
* ibert.mdx
* imagegpt.mdx
* jukebox.mdx
* layoutlm.mdx
* layoutlmv2.mdx
* layoutlmv3.mdx
* layoutxlm.mdx
* led.mdx
* levit.mdx
* lilt.mdx
* longformer.mdx
* longt5.mdx
* luke.mdx
* lxmert.mdx
* m2m_100.mdx
* marian.mdx
* markuplm.mdx
* maskformer.mdx
* mbart.mdx
* mctct.mdx
* megatron_gpt2.mdx
* megatron-bert.mdx
* mluke.mdx
* mobilebert.mdx
* mobilenet_v1.mdx
* mobilenet_v2.mdx
* mobilevit.mdx
* mpnet.mdx
* mt5.mdx
* mvp.mdx
* nat.mdx
* nezha.mdx
* nllb.mdx
* nystromformer.mdx
* openai-gpt.mdx
* opt.mdx
* owlvit.mdx
* pegasus_x.mdx
* pegasus.mdx
* perceiver.mdx
* phobert.mdx
* plbart.mdx
* poolformer.mdx
* prophetnet.mdx
* qdqbert.mdx
* rag.mdx
* realm.mdx
* reformer.mdx
* regnet.mdx
* rembert.mdx
* resnet.mdx
* retribert.mdx
* roberta.mdx
* roc_bert.mdx
* roformer.mdx
* segformer.mdx
* sew-d.mdx
* sew.mdx
* speech_to_text_2.mdx
* speech_to_text.mdx
* speech-encoder-decoder.mdx
* splinter.mdx
* squeezebert.mdx
* swin.mdx
* swinv2.mdx
* switch_transformers.mdx
* t5.mdx
* t5v1.1.mdx
* table-transformer.mdx
* tapas.mdx
* tapex.mdx
* time_series_transformer.mdx
* timesformer.mdx
* trajectory_transformer.mdx
* transfo-xl.mdx
* trocr.mdx
* ul2.mdx
* unispeech-sat.mdx
* unispeech.mdx
* van.mdx
* videomae.mdx
* vilt.mdx
* vision-encoder-decoder.mdx
* vision-text-dual-encoder.mdx
* visual_bert.mdx
* vit_hybrid.mdx
* vit_mae.mdx
* vit_msn.mdx
* vit.mdx
* wav2vec2_phoneme.mdx
* wav2vec2-conformer.mdx
* wav2vec2.mdx
* wavlm.mdx
* whisper.mdx
* xclip.mdx
* xglm.mdx
* xlm-prophetnet.mdx
* xlm-roberta-xl.mdx
* xlm-roberta.mdx
* xlm.mdx
* xlnet.mdx
* xls_r.mdx
* xlsr_wav2vec2.mdx
* yolos.mdx
* yoso.mdx