transformers/utils
Matthijs Hollemans e4bacf6614
[WIP] add SpeechT5 model (#18922)
* make SpeechT5 model by copying Wav2Vec2

* add paper to docs

* whoops added docs in wrong file

* remove SpeechT5Tokenizer + put CTC back in the name

* remove deprecated class

* remove unused docstring

* delete SpeechT5FeatureExtractor, use Wav2Vec2FeatureExtractor instead

* remove classes we don't need right now

* initial stab at speech encoder prenet

* add more speech encoder prenet stuff

* improve SpeechEncoderPrenet

* add encoder (not finished yet)

* add relative position bias to self-attention

* add encoder CTC layers

* fix formatting

* add decoder from BART, doesn't work yet

* make it work with generate loop

* wrap the encoder into a speech encoder class

* wrap the decoder in a text decoder class

* changed my mind

* changed my mind again ;-)

* load decoder weights, make it work

* add weights for text decoder postnet

* add SpeechT5ForCTC model that uses only the encoder

* clean up EncoderLayer and DecoderLayer

* implement _init_weights in SpeechT5PreTrainedModel

* cleanup config + Encoder and Decoder

* add head + cross attention masks

* improve doc comments

* fixup

* more cleanup

* more fixup

* TextDecoderPrenet works now, thanks Kendall

* add CTC loss

* add placeholders for other pre/postnets

* add type annotation

* fix freeze_feature_encoder

* set padding tokens to 0 in decoder attention mask

* encoder attention mask downsampling

* remove features_pen calculation

* disable the padding tokens thing again

* fixup

* more fixup

* code review fixes

* rename encoder/decoder wrapper classes

* allow checkpoints to be loaded into SpeechT5Model

* put encoder into wrapper for CTC model

* clean up conversion script

* add encoder for TTS model

* add speech decoder prenet

* add speech decoder post-net

* attempt to reconstruct the generation loop

* add speech generation loop

* clean up generate_speech

* small tweaks

* fix forward pass

* enable always dropout on speech decoder prenet

* sort declaration

* rename models

* fixup

* fix copies

* more fixup

* make consistency checker happy

* add Seq2SeqSpectrogramOutput class

* doc comments

* quick note about loss and labels

* add HiFi-GAN implementation (from Speech2Speech PR)

* rename file

* add vocoder to TTS model

* improve vocoder

* working on tokenizer

* more better tokenizer

* add CTC tokenizer

* fix decode and batch_decode in CTC tokenizer

* fix processor

* two processors and feature extractors

* use SpeechT5WaveformFeatureExtractor instead of Wav2Vec2

* cleanup

* more cleanup

* even more fixup

* notebooks

* fix log-mel spectrograms

* support reduction factor

* fixup

* shift spectrograms to right to create decoder inputs

* return correct labels

* add labels for stop token prediction

* fix doc comments

* fixup

* remove SpeechT5ForPreTraining

* more fixup

* update copyright headers

* add usage examples

* add SpeechT5ProcessorForCTC

* fixup

* push unofficial checkpoints to hub

* initial version of tokenizer unit tests

* add slow test

* fix failing tests

* tests for CTC tokenizer

* finish CTC tokenizer tests

* processor tests

* initial test for feature extractors

* tests for spectrogram feature extractor

* fixup

* more fixup

* add decorators

* require speech for tests

* modeling tests

* more tests for ASR model

* fix imports

* add fake tests for the other models

* fixup

* remove jupyter notebooks

* add missing SpeechT5Model tests

* add missing tests for SpeechT5ForCTC

* add missing tests for SpeechT5ForTextToSpeech

* sort tests by name

* fix Hi-Fi GAN tests

* fixup

* add speech-to-speech model

* refactor duplicate speech generation code

* add processor for SpeechToSpeech model

* add usage example

* add tests for speech-to-speech model

* fixup

* enable gradient checkpointing for SpeechT5FeatureEncoder

* code review

* push_to_hub now takes repo_id

* improve doc comments for HiFi-GAN config

* add missing test

* add integration tests

* make number of layers in speech decoder prenet configurable

* rename variable

* rename variables

* add auto classes for TTS and S2S

* REMOVE CTC!!!

* S2S processor does not support save/load_pretrained

* fixup

* these models are now in an auto mapping

* fix doc links

* rename HiFiGAN to HifiGan, remove separate config file

* REMOVE auto classes

* there can be only one

* fixup

* replace assert

* reformat

* feature extractor can process input and target at same time

* update checkpoint names

* fix commit hash
2023-02-03 12:43:46 -05:00
test_module AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
tf_ops Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
check_config_docstrings.py Create dummy models (#19901) 2022-10-28 13:05:41 +02:00
check_copies.py update template (#20885) 2023-01-04 10:15:45 +01:00
check_doc_toc.py Split model list on modality (#18328) 2022-08-01 11:10:20 -05:00
check_doctest_list.py check paths in utils/documentation_tests.txt (#21315) 2023-01-26 15:33:47 +01:00
check_dummies.py Add some tests for check_dummies (#19146) 2022-09-21 14:54:09 -04:00
check_inits.py Add ESMFold (#19977) 2022-10-31 21:32:58 -04:00
check_repo.py [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
check_self_hosted_runner.py Add offline runners info in the Slack report (#19169) 2022-09-23 19:23:05 +02:00
check_table.py Fix some typos. (#17560) 2022-07-11 05:00:13 -04:00
check_task_guides.py Automated compatible models list for task guides (#21338) 2023-01-27 13:19:28 -05:00
check_tf_ops.py Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
create_dummy_models.py Pipeline testing - using tiny models on Hub (#20426) 2023-01-30 10:39:43 +01:00
custom_init_isort.py Fix init import_structure sorting (#20477) 2022-11-29 09:46:10 -05:00
documentation_tests.txt [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
download_glue_data.py Raise exceptions instead of asserts (#13907) 2021-10-07 12:44:23 +05:30
extract_warnings.py Update some GH action versions (#20537) 2022-12-06 16:54:40 +01:00
get_ci_error_statistics.py Update Past CI report script (#19228) 2022-09-29 19:22:23 +02:00
get_github_job_time.py add a script to get time info. from GA workflow jobs (#18822) 2022-09-01 12:02:52 +02:00
get_modified_files.py Updates the default branch from master to main (#16326) 2022-03-23 03:46:59 -04:00
notification_service_doc_tests.py fix missing block when there is no failure (#18775) 2022-08-29 09:10:13 +02:00
notification_service.py extract warnings in GH workflows (#20487) 2022-11-29 15:58:54 +01:00
past_ci_versions.py Fix past CI (#20967) 2023-01-12 18:04:21 +01:00
prepare_for_doc_test.py Add a check regarding the number of occurrences of ``` (#18389) 2022-08-01 14:23:02 +02:00
print_env.py Print more library versions in CI (#17384) 2022-06-02 10:24:16 +02:00
release.py Clean README in post release job as well. (#17519) 2022-06-02 07:44:03 -04:00
sort_auto_mappings.py Automatically sort auto mappings (#17250) 2022-05-16 13:24:20 -04:00
tests_fetcher.py Add TF image classification example script (#19956) 2023-02-01 19:09:36 +00:00
update_metadata.py Adapt repository creation to latest hf_hub (#21158) 2023-01-18 11:14:00 -05:00