transformers/docs/source/en/model_doc
Matthijs Hollemans e4bacf6614
[WIP] add SpeechT5 model (#18922)
* make SpeechT5 model by copying Wav2Vec2

* add paper to docs

* whoops added docs in wrong file

* remove SpeechT5Tokenizer + put CTC back in the name

* remove deprecated class

* remove unused docstring

* delete SpeechT5FeatureExtractor, use Wav2Vec2FeatureExtractor instead

* remove classes we don't need right now

* initial stab at speech encoder prenet

* add more speech encoder prenet stuff

* improve SpeechEncoderPrenet

* add encoder (not finished yet)

* add relative position bias to self-attention

* add encoder CTC layers

* fix formatting

* add decoder from BART, doesn't work yet

* make it work with generate loop

* wrap the encoder into a speech encoder class

* wrap the decoder in a text decoder class

* changed my mind

* changed my mind again ;-)

* load decoder weights, make it work

* add weights for text decoder postnet

* add SpeechT5ForCTC model that uses only the encoder

* clean up EncoderLayer and DecoderLayer

* implement _init_weights in SpeechT5PreTrainedModel

* cleanup config + Encoder and Decoder

* add head + cross attention masks

* improve doc comments

* fixup

* more cleanup

* more fixup

* TextDecoderPrenet works now, thanks Kendall

* add CTC loss

* add placeholders for other pre/postnets

* add type annotation

* fix freeze_feature_encoder

* set padding tokens to 0 in decoder attention mask

* encoder attention mask downsampling

* remove features_pen calculation

* disable the padding tokens thing again

* fixup

* more fixup

* code review fixes

* rename encoder/decoder wrapper classes

* allow checkpoints to be loaded into SpeechT5Model

* put encoder into wrapper for CTC model

* clean up conversion script

* add encoder for TTS model

* add speech decoder prenet

* add speech decoder post-net

* attempt to reconstruct the generation loop

* add speech generation loop

* clean up generate_speech

* small tweaks

* fix forward pass

* enable always dropout on speech decoder prenet

* sort declaration

* rename models

* fixup

* fix copies

* more fixup

* make consistency checker happy

* add Seq2SeqSpectrogramOutput class

* doc comments

* quick note about loss and labels

* add HiFi-GAN implementation (from Speech2Speech PR)

* rename file

* add vocoder to TTS model

* improve vocoder

* working on tokenizer

* more better tokenizer

* add CTC tokenizer

* fix decode and batch_code in CTC tokenizer

* fix processor

* two processors and feature extractors

* use SpeechT5WaveformFeatureExtractor instead of Wav2Vec2

* cleanup

* more cleanup

* even more fixup

* notebooks

* fix log-mel spectrograms

* support reduction factor

* fixup

* shift spectrograms to right to create decoder inputs

* return correct labels

* add labels for stop token prediction

* fix doc comments

* fixup

* remove SpeechT5ForPreTraining

* more fixup

* update copyright headers

* add usage examples

* add SpeechT5ProcessorForCTC

* fixup

* push unofficial checkpoints to hub

* initial version of tokenizer unit tests

* add slow test

* fix failing tests

* tests for CTC tokenizer

* finish CTC tokenizer tests

* processor tests

* initial test for feature extractors

* tests for spectrogram feature extractor

* fixup

* more fixup

* add decorators

* require speech for tests

* modeling tests

* more tests for ASR model

* fix imports

* add fake tests for the other models

* fixup

* remove jupyter notebooks

* add missing SpeechT5Model tests

* add missing tests for SpeechT5ForCTC

* add missing tests for SpeechT5ForTextToSpeech

* sort tests by name

* fix Hi-Fi GAN tests

* fixup

* add speech-to-speech model

* refactor duplicate speech generation code

* add processor for SpeechToSpeech model

* add usage example

* add tests for speech-to-speech model

* fixup

* enable gradient checkpointing for SpeechT5FeatureEncoder

* code review

* push_to_hub now takes repo_id

* improve doc comments for HiFi-GAN config

* add missing test

* add integration tests

* make number of layers in speech decoder prenet configurable

* rename variable

* rename variables

* add auto classes for TTS and S2S

* REMOVE CTC!!!

* S2S processor does not support save/load_pretrained

* fixup

* these models are now in an auto mapping

* fix doc links

* rename HiFiGAN to HifiGan, remove separate config file

* REMOVE auto classes

* there can be only one

* fixup

* replace assert

* reformat

* feature extractor can process input and target at same time

* update checkpoint names

* fix commit hash
2023-02-03 12:43:46 -05:00
albert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
altclip.mdx Add AltCLIP (#20446) 2023-01-04 09:18:57 +01:00
audio-spectrogram-transformer.mdx Add resources (#20872) 2023-01-17 17:42:33 +01:00
auto.mdx Add Universal Segmentation class + mapping (#20766) 2022-12-16 14:22:46 +01:00
bart.mdx Add TFBartForSequenceClassification (#20570) 2022-12-07 18:05:39 +01:00
barthez.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
bartpho.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
beit.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
bert-generation.mdx Result of new doc style with fixes (#17015) 2022-04-29 17:42:15 -04:00
bert-japanese.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
bert.mdx Add BERT resources (#19852) 2022-11-01 11:09:53 -07:00
bertweet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
big_bird.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
bigbird_pegasus.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
biogpt.mdx Add BioGPT (#20420) 2022-12-05 10:12:03 -05:00
bit.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
blenderbot-small.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
blenderbot.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
blip.mdx blip support for training (#21021) 2023-01-18 11:24:37 +01:00
bloom.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
bort.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
bridgetower.mdx Add BridgeTower model (#20775) 2023-01-25 14:04:32 -05:00
byt5.mdx [Doctests] Fix all T5 doc tests (#16646) 2022-04-13 11:36:54 +02:00
camembert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
canine.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
chinese_clip.mdx Add Chinese-CLIP implementation (#20368) 2022-11-30 19:22:23 +01:00
clip.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
clipseg.mdx [CLIPSeg] Add resources (#20118) 2022-11-09 18:31:22 +01:00
codegen.mdx Add CodeGen model (#17443) 2022-06-24 17:10:38 +02:00
conditional_detr.mdx Add segmentation + object detection image processors (#20160) 2022-11-30 10:24:03 +00:00
convbert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
convnext.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
cpm.mdx Allow all imports from transformers (#17050) 2022-05-02 12:47:39 -04:00
ctrl.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
cvt.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
data2vec.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
deberta-v2.mdx Add DebertaV2ForMultipleChoice (#17135) 2022-05-10 16:21:44 -04:00
deberta.mdx Add to DeBERTa resources (#20155) 2022-11-15 13:26:07 -05:00
decision_transformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
deformable_detr.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
deit.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
deta.mdx [Docs] Minor fixes (#21383) 2023-01-31 15:13:12 +01:00
detr.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
dialogpt.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
dinat.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
distilbert.mdx add resources for distilbert (#19930) 2022-10-28 13:16:07 -07:00
dit.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
donut.mdx Add Donut image processor (#20425) 2022-11-29 10:38:01 +00:00
dpr.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
dpt.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
efficientformer.mdx Efficientformer (#20459) 2023-01-20 11:35:42 +03:00
electra.mdx [FlaxBert] Add ForCausalLM (#16995) 2022-05-03 11:26:19 +02:00
encoder-decoder.mdx [EncoderDecoder] Improve docs (#18271) 2022-07-27 10:08:59 +02:00
ernie.mdx add task_type_id to BERT to support ERNIE-2.0 and ERNIE-3.0 models (#18686) 2022-09-09 07:36:46 -04:00
esm.mdx Add ESMFold (#19977) 2022-10-31 21:32:58 -04:00
flan-t5.mdx Update flan-t5 original model link (#20897) 2022-12-27 02:26:14 -05:00
flaubert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
flava.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
fnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
fsmt.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
funnel.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
git.mdx Add resources (#20872) 2023-01-17 17:42:33 +01:00
glpn.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
gpt_neo.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
gpt_neox_japanese.mdx Add support for Japanese GPT-NeoX-based model by ABEJA, Inc. (#18814) 2022-09-14 10:17:40 -04:00
gpt_neox.mdx [WIP] Adding GPT-NeoX-20B (#16659) 2022-05-24 09:31:10 -04:00
gpt-sw3.mdx Add gpt-sw3 model to transformers (#20209) 2022-12-12 13:12:13 -05:00
gpt2.mdx add in layer gpt2 tokenizer (#20421) 2022-11-29 10:02:40 -05:00
gptj.mdx Adding resource section to GPT-J docs (#21270) 2023-01-30 16:48:04 -05:00
graphormer.mdx Graphormer model for Graph Classification (#20968) 2023-01-19 13:05:59 -05:00
groupvit.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
herbert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
hubert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
ibert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
imagegpt.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
jukebox.mdx Add Jukebox model (replaces #16875) (#17826) 2022-11-10 21:05:27 +01:00
layoutlm.mdx Added model resources for LayoutLM Issue#19848 (#21377) 2023-02-03 08:53:16 -05:00
layoutlmv2.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
layoutlmv3.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
layoutxlm.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
led.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
levit.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
lilt.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
longformer.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
longt5.mdx Update longt5.mdx (#18634) 2022-08-16 10:20:46 -05:00
luke.mdx Adding fine-tuning models to LUKE (#18353) 2022-08-01 11:09:47 -04:00
lxmert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
m2m_100.mdx Fix m2m_100.mdx doc example missing labels (#19149) 2022-09-29 13:27:58 +02:00
marian.mdx Replace as_target context managers by direct calls (#18325) 2022-07-29 08:09:09 -04:00
markuplm.mdx Fix doctest for MarkupLM (#19845) 2022-10-24 17:54:23 +02:00
mask2former.mdx [Mask2Former] Add doc tests (#21232) 2023-01-25 12:34:43 +01:00
maskformer.mdx Add Mask2Former (#20792) 2023-01-16 20:37:07 +03:00
mbart.mdx Replace as_target context managers by direct calls (#18325) 2022-07-29 08:09:09 -04:00
mctct.mdx [Past CI] 🔥 Leave Past CI failures in the past 🔥 (#20861) 2022-12-27 18:37:25 +01:00
megatron_gpt2.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
megatron-bert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mluke.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mobilebert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mobilenet_v1.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
mobilenet_v2.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
mobilevit.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
mpnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mt5.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
mvp.mdx Add MVP model (#17787) 2022-06-29 09:30:55 -04:00
nat.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
nezha.mdx Nezha Pytorch implementation (#17776) 2022-06-23 12:36:22 -04:00
nllb.mdx Replace as_target context managers by direct calls (#18325) 2022-07-29 08:09:09 -04:00
nystromformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
oneformer.mdx [Mask2Former] Add doc tests (#21232) 2023-01-25 12:34:43 +01:00
openai-gpt.mdx Very small edit to change name to OpenAI GPT (#20722) 2022-12-12 09:43:43 -05:00
opt.mdx Add OPTForQuestionAnswering (#19402) 2022-10-10 09:30:59 -04:00
owlvit.mdx Improve OWL-ViT postprocessing (#20980) 2023-01-03 19:25:09 +03:00
pegasus_x.mdx PEGASUS-X (#18551) 2022-09-02 19:54:02 +02:00
pegasus.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
perceiver.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
phobert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
plbart.mdx Replace as_target context managers by direct calls (#18325) 2022-07-29 08:09:09 -04:00
poolformer.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
prophetnet.mdx Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068) 2022-11-04 11:32:44 -04:00
qdqbert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
rag.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
realm.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
reformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
regnet.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
rembert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
resnet.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
retribert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
roberta-prelayernorm.mdx Implement Roberta PreLayerNorm (#20305) 2022-12-19 09:30:17 +01:00
roberta.mdx Add RoBERTa resources (#19911) 2022-10-27 11:33:15 -07:00
roc_bert.mdx Add RocBert (#20013) 2022-11-08 10:03:43 -05:00
roformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
segformer.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
sew-d.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
sew.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
speech_to_text_2.mdx Generate: move generation_*.py src files into generation/*.py (#20096) 2022-11-09 15:34:08 +00:00
speech_to_text.mdx Fix some doctests after PR 15775 (#20036) 2022-11-03 14:18:45 +01:00
speech-encoder-decoder.mdx Replace as_target context managers by direct calls (#18325) 2022-07-29 08:09:09 -04:00
speecht5.mdx [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
splinter.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
squeezebert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
swin.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
swin2sr.mdx Add Swin2SR (#19784) 2022-12-16 16:24:01 +01:00
swinv2.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
switch_transformers.mdx Add Switch transformers (#19323) 2022-11-15 13:06:45 +01:00
t5.mdx Generate: move generation_*.py src files into generation/*.py (#20096) 2022-11-09 15:34:08 +00:00
t5v1.1.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
table-transformer.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
tapas.mdx Fix tapas scatter (#20149) 2022-11-14 01:04:26 -05:00
tapex.mdx Add TAPEX (#16473) 2022-04-08 10:57:51 +02:00
time_series_transformer.mdx time series forecasting model (#17965) 2022-09-30 15:32:59 -04:00
timesformer.mdx [New Model] Add TimeSformer model (#18908) 2022-12-02 09:13:25 +01:00
trajectory_transformer.mdx Add trajectory transformer (#17141) 2022-05-17 19:07:43 -04:00
transfo-xl.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
trocr.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
ul2.mdx Add UL2 (just docs) (#17740) 2022-06-21 10:24:50 +02:00
unispeech-sat.mdx Add Wav2Vec2Conformer (#16812) 2022-05-17 00:43:16 +02:00
unispeech.mdx Add Wav2Vec2Conformer (#16812) 2022-05-17 00:43:16 +02:00
upernet.mdx [Docs] Minor fixes (#21383) 2023-01-31 15:13:12 +01:00
van.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
videomae.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
vilt.mdx AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
vision-encoder-decoder.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
vision-text-dual-encoder.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
visual_bert.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
vit_hybrid.mdx Add BiT + ViT hybrid (#20550) 2022-12-07 11:03:39 +01:00
vit_mae.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
vit_msn.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
vit.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
wav2vec2_phoneme.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
wav2vec2-conformer.mdx [Wav2Vec2Conformer] Official release (#17709) 2022-06-15 18:34:15 +02:00
wav2vec2.mdx Add wav2vec2 resources (#19931) 2022-10-28 13:28:18 -07:00
wavlm.mdx Add Wav2Vec2Conformer (#16812) 2022-05-17 00:43:16 +02:00
whisper.mdx Generate: move generation_*.py src files into generation/*.py (#20096) 2022-11-09 15:34:08 +00:00
xclip.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
xglm.mdx Add TF implementation of XGLMModel (#16543) 2022-08-24 10:51:05 +01:00
xlm-prophetnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlm-roberta-xl.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlm-roberta.mdx Remove Roberta Dependencies from XLM Roberta Flax and Tensorflow models (#21047) 2023-01-18 07:49:39 -05:00
xlm.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xls_r.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlsr_wav2vec2.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
yolos.mdx Add batch of resources (#20647) 2023-01-17 17:18:56 +01:00
yoso.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00