transformers/docs/source/en
Matthijs Hollemans e4bacf6614
[WIP] add SpeechT5 model (#18922)
* make SpeechT5 model by copying Wav2Vec2

* add paper to docs

* whoops added docs in wrong file

* remove SpeechT5Tokenizer + put CTC back in the name

* remove deprecated class

* remove unused docstring

* delete SpeechT5FeatureExtractor, use Wav2Vec2FeatureExtractor instead

* remove classes we don't need right now

* initial stab at speech encoder prenet

* add more speech encoder prenet stuff

* improve SpeechEncoderPrenet

* add encoder (not finished yet)

* add relative position bias to self-attention

* add encoder CTC layers

* fix formatting

* add decoder from BART, doesn't work yet

* make it work with generate loop

* wrap the encoder into a speech encoder class

* wrap the decoder in a text decoder class

* changed my mind

* changed my mind again ;-)

* load decoder weights, make it work

* add weights for text decoder postnet

* add SpeechT5ForCTC model that uses only the encoder

* clean up EncoderLayer and DecoderLayer

* implement _init_weights in SpeechT5PreTrainedModel

* cleanup config + Encoder and Decoder

* add head + cross attention masks

* improve doc comments

* fixup

* more cleanup

* more fixup

* TextDecoderPrenet works now, thanks Kendall

* add CTC loss

* add placeholders for other pre/postnets

* add type annotation

* fix freeze_feature_encoder

* set padding tokens to 0 in decoder attention mask

* encoder attention mask downsampling

* remove features_pen calculation

* disable the padding tokens thing again

* fixup

* more fixup

* code review fixes

* rename encoder/decoder wrapper classes

* allow checkpoints to be loaded into SpeechT5Model

* put encoder into wrapper for CTC model

* clean up conversion script

* add encoder for TTS model

* add speech decoder prenet

* add speech decoder post-net

* attempt to reconstruct the generation loop

* add speech generation loop

* clean up generate_speech

* small tweaks

* fix forward pass

* enable always dropout on speech decoder prenet

* sort declaration

* rename models

* fixup

* fix copies

* more fixup

* make consistency checker happy

* add Seq2SeqSpectrogramOutput class

* doc comments

* quick note about loss and labels

* add HiFi-GAN implementation (from Speech2Speech PR)

* rename file

* add vocoder to TTS model

* improve vocoder

* working on tokenizer

* more better tokenizer

* add CTC tokenizer

* fix decode and batch_code in CTC tokenizer

* fix processor

* two processors and feature extractors

* use SpeechT5WaveformFeatureExtractor instead of Wav2Vec2

* cleanup

* more cleanup

* even more fixup

* notebooks

* fix log-mel spectrograms

* support reduction factor

* fixup

* shift spectrograms to right to create decoder inputs

* return correct labels

* add labels for stop token prediction

* fix doc comments

* fixup

* remove SpeechT5ForPreTraining

* more fixup

* update copyright headers

* add usage examples

* add SpeechT5ProcessorForCTC

* fixup

* push unofficial checkpoints to hub

* initial version of tokenizer unit tests

* add slow test

* fix failing tests

* tests for CTC tokenizer

* finish CTC tokenizer tests

* processor tests

* initial test for feature extractors

* tests for spectrogram feature extractor

* fixup

* more fixup

* add decorators

* require speech for tests

* modeling tests

* more tests for ASR model

* fix imports

* add fake tests for the other models

* fixup

* remove jupyter notebooks

* add missing SpeechT5Model tests

* add missing tests for SpeechT5ForCTC

* add missing tests for SpeechT5ForTextToSpeech

* sort tests by name

* fix Hi-Fi GAN tests

* fixup

* add speech-to-speech model

* refactor duplicate speech generation code

* add processor for SpeechToSpeech model

* add usage example

* add tests for speech-to-speech model

* fixup

* enable gradient checkpointing for SpeechT5FeatureEncoder

* code review

* push_to_hub now takes repo_id

* improve doc comments for HiFi-GAN config

* add missing test

* add integration tests

* make number of layers in speech decoder prenet configurable

* rename variable

* rename variables

* add auto classes for TTS and S2S

* REMOVE CTC!!!

* S2S processor does not support save/load_pretrained

* fixup

* these models are now in an auto mapping

* fix doc links

* rename HiFiGAN to HifiGan, remove separate config file

* REMOVE auto classes

* there can be only one

* fixup

* replace assert

* reformat

* feature extractor can process input and target at same time

* update checkpoint names

* fix commit hash
2023-02-03 12:43:46 -05:00
internal MinNewTokensLengthLogitsProcessor for .generate method #20814 (#20892) 2023-01-03 06:29:02 -05:00
main_classes [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
model_doc [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
tasks Fix task guide formatting (#21409) 2023-02-02 10:06:26 -08:00
_config.py Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
_toctree.yml [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
accelerate.mdx update to use interlibrary links instead of Markdown (#18500) 2022-08-08 10:53:52 -05:00
add_new_model.mdx fix docs typos in "add_new_model" (#20900) 2022-12-27 02:49:15 -05:00
add_new_pipeline.mdx Spanish translation of asr.mdx and add_new_pipeline.mdx (#20569) 2022-12-12 09:23:23 -05:00
add_tensorflow_model.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
autoclass_tutorial.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
benchmarks.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
bertology.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
big_models.mdx docs: Resolve many typos in the English docs (#20088) 2022-11-07 09:19:04 -05:00
community.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
contributing.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
converting_tensorflow_models.mdx Docs - Guide to add a new TensorFlow model (#19256) 2022-09-30 20:30:38 +01:00
create_a_model.mdx Documentation code sample fixes (#21302) 2023-01-25 11:33:39 -05:00
custom_models.mdx Replace awkward timm link with the expected one (#20109) 2022-11-07 13:57:39 -05:00
debugging.mdx Spanish translation of the file debugging.mdx (#20566) 2022-12-12 10:38:56 -05:00
fast_tokenizers.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
generation_strategies.mdx Add: An introductory guide for text generation (#21090) 2023-01-17 12:23:22 -05:00
glossary.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
hpo_train.mdx update doc for perf_train_cpu_many (#19506) 2022-10-11 22:54:19 -04:00
index.mdx [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
installation.mdx Move cache folder to huggingface/hub for consistency with hf_hub (#18492) 2022-08-05 13:14:00 -04:00
migration.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
model_sharing.mdx Just re-reading the whole doc every couple of months 😬 (#18489) 2022-08-06 09:38:55 +02:00
model_summary.mdx Embed circle packing chart for model summary (#20791) 2022-12-20 10:26:52 -08:00
multilingual.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
notebooks.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
pad_truncation.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
perf_hardware.mdx [WIP] [doc] performance/scalability revamp (#15723) 2022-05-16 13:36:41 +02:00
perf_infer_cpu.mdx add doc for (#20525) 2022-12-01 16:52:13 +01:00
perf_infer_gpu_many.mdx add doc for (#20525) 2022-12-01 16:52:13 +01:00
perf_infer_gpu_one.mdx add doc for (#20525) 2022-12-01 16:52:13 +01:00
perf_infer_special.mdx Improve performance docs (#17750) 2022-06-23 14:51:54 +02:00
perf_train_cpu_many.mdx update cpu related doc (#20444) 2022-11-28 08:54:35 -05:00
perf_train_cpu.mdx update cpu related doc (#20444) 2022-11-28 08:54:35 -05:00
perf_train_gpu_many.mdx Fix Typo in Docs for GPU (#20509) 2022-11-30 10:41:18 -05:00
perf_train_gpu_one.mdx Migrate torchdynamo to torch.compile (#20634) 2022-12-08 11:18:52 -05:00
perf_train_special.mdx Fix Typo in Docs for GPU (#20509) 2022-11-30 10:41:18 -05:00
perf_train_tpu.mdx Fix Typo in Docs for GPU (#20509) 2022-11-30 10:41:18 -05:00
performance.mdx Fix Typo in Docs for GPU (#20509) 2022-11-30 10:41:18 -05:00
perplexity.mdx Fix incorrect size of input for 1st strided window length in Perplexity of fixed-length models (#18906) 2022-09-06 15:20:12 -04:00
philosophy.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00
pipeline_tutorial.mdx Documentation code sample fixes (#21302) 2023-01-25 11:33:39 -05:00
pipeline_webserver.mdx Rework the pipeline tutorial (#20437) 2022-12-06 10:47:31 +01:00
pr_checks.mdx 📝 update documentation build section (#18548) 2022-08-09 18:22:55 -05:00
preprocessing.mdx Updates to computer vision section of the Preprocess doc (#21181) 2023-01-19 08:43:36 -05:00
quicktour.mdx typo fix (#20891) 2022-12-26 02:06:23 -05:00
run_scripts.mdx Just re-reading the whole doc every couple of months 😬 (#18489) 2022-08-06 09:38:55 +02:00
sagemaker.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
serialization.mdx Add Onnx Config for PoolFormer (#20868) 2022-12-23 01:30:57 -05:00
task_summary.mdx Update expected values for doctest (#21284) 2023-01-24 13:32:31 -08:00
tasks_explained.mdx Update task summary (#21067) 2023-02-02 11:41:27 -08:00
testing.mdx fixed spelling error in testing.mdx (#20220) 2022-11-15 09:40:06 -05:00
tf_xla.mdx Rewrite a couple of lines in the TF XLA doc (#21177) 2023-01-18 17:53:05 +00:00
tokenizer_summary.mdx Update tokenizer_summary.mdx (#20135) 2022-11-15 01:18:13 +01:00
torchscript.mdx Breakup export guide (#19271) 2022-10-03 13:18:29 -07:00
training.mdx Fix code example in training tutorial (#21201) 2023-01-20 07:38:15 -08:00
troubleshooting.mdx Update doc examples feature extractor -> image processor (#20501) 2022-11-30 14:50:55 +00:00