transformers/tests
Matthijs Hollemans 4ece3b9433
add VITS model (#24085)
* add VITS model

* let's vits

* finish TextEncoder (mostly)

* rename VITS to Vits

* add StochasticDurationPredictor

* add flow model

* add generator

* correctly set vocab size

* add tokenizer

* remove processor & feature extractor

* add PosteriorEncoder

* add missing weights to SDP

* also convert LJSpeech and VCTK checkpoints

* add training stuff in forward

* add placeholder tests for tokenizer

* add placeholder tests for model

* starting cleanup

* let the great renaming begin!

* use config

* global_conditioning

* more cleaning

* renaming variables

* more renaming

* more renaming

* it never ends

* reticulating the splines

* more renaming

* HiFi-GAN

* doc strings for main model

* fixup

* fix-copies

* don't make it a PreTrainedModel

* fixup

* rename config options

* remove training logic from forward pass

* simplify relative position

* use actual checkpoint

* style

* PR review fixes

* more review changes

* fixup

* more unit tests

* fixup

* fix doc test

* add integration test

* improve tokenizer tests

* add tokenizer integration test

* fix tests on GPU (gave OOM)

* conversion script can handle repos from hub

* add conversion script for all MMS-TTS checkpoints

* automatically create a README for the converted checkpoint

* small changes to config

* push README to hub

* only show uroman note for checkpoints that need it

* remove conversion script because code formatting breaks the readme

* make WaveNet layers configurable

* rename variables

* simplifying the math

* output attentions and hidden states

* remove VitsFlip in flow model

* also got rid of the other flip

* fix tests

* rename more variables

* rename tokenizer, add phonemization

* raise error when phonemizer missing

* re-order config docstrings to match method

* change config naming

* remove redundant str -> list

* fix copyright: vits authors -> kakao enterprise

* (mean, log_variances) -> (prior_mean, prior_log_variances)

* if return dict -> if not return dict

* speed -> speaking rate

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update fused tanh sigmoid

* reduce dims in tester

* audio -> output_values

* audio -> output_values in tuple out

* fix return type

* fix return type

* make _unconstrained_rational_quadratic_spline a function

* all nn's to accept a config

* add spectro to output

* move {speaking rate, noise scale, noise scale duration} to config

* path -> attn_path

* idxs -> valid idxs -> padded idxs

* output values -> waveform

* use config for attention

* make generation work

* harden integration test

* add spectrogram to dict output

* tokenizer refactor

* make style

* remove 'fake' padding token

* harden tokenizer tests

* run norm test

* fprop / save tests deterministic

* move uroman to tokenizer as much as possible

* better logger message

* fix vivit imports

* add uroman integration test

* make style

* up

* matthijs -> sanchit-gandhi

* fix tokenizer test

* make fix-copies

* fix dict comprehension

* fix config tests

* fix model tests

* make outputs consistent with reverse/not reverse

* fix key concat

* more model details

* add author

* return dict

* speaker error

* labels error

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vits/convert_original_checkpoint.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove uromanize

* add docstrings

* add docstrings for tokenizer

* upper-case skip messages

* fix return dict

* style

* finish tests

* update checkpoints

* make style

* remove doctest file

* revert

* fix docstring

* fix tokenizer

* remove uroman integration test

* add sampling rate

* fix docs / docstrings

* style

* add sr to model output

* fix outputs

* style / copies

* fix docstring

* fix copies

* remove sr from model outputs

* Update utils/documentation_tests.txt

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add sr as allowed attr

---------

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-09-01 10:50:06 +01:00
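Taken together, the commits above add a VITS text-to-speech model (plus a tokenizer with optional phonemization and converted MMS-TTS / LJSpeech / VCTK checkpoints) whose forward pass returns a waveform and a spectrogram, with speaking rate and noise scales read from the config. A minimal usage sketch, assuming the class names VitsModel / VitsTokenizer and a converted MMS-TTS checkpoint published as facebook/mms-tts-eng:

```python
# Sketch only: class and checkpoint names are assumptions inferred from the
# commit log above, not taken from the PR diff itself.
import torch
from transformers import VitsModel, VitsTokenizer

tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")
model = VitsModel.from_pretrained("facebook/mms-tts-eng")

inputs = tokenizer("Hello from the VITS test suite.", return_tensors="pt")

with torch.no_grad():
    # Generation is stochastic; noise_scale, noise_scale_duration and
    # speaking_rate live in model.config rather than the forward signature.
    outputs = model(**inputs)

waveform = outputs.waveform  # audio samples at model.config.sampling_rate
print(waveform.shape, model.config.sampling_rate)
```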
benchmark [Test refactor 1/5] Per-folder tests reorganization (#15725) 2022-02-23 15:46:28 -05:00
bettertransformer Add methods to PreTrainedModel to use PyTorch's BetterTransformer (#21259) 2023-04-27 11:03:42 +02:00
deepspeed 🚨🚨🚨 [Refactor] Move third-party related utility files into integrations/ folder 🚨🚨🚨 (#25599) 2023-08-25 17:13:34 +02:00
extended [tests] switch to torchrun (#22712) 2023-04-12 08:25:45 -07:00
fixtures [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
generation Generate: general test for decoder-only generation from inputs_embeds (#25687) 2023-08-23 19:17:01 +01:00
models add VITS model (#24085) 2023-09-01 10:50:06 +01:00
optimization Make schedulers picklable by making lr_lambda fns global (#21768) 2023-03-02 12:08:43 -05:00
peft_integration [PEFT] Fix PeftConfig save pretrained when calling add_adapter (#25738) 2023-08-25 08:19:11 +02:00
pipelines Save image_processor while saving pipeline (ImageSegmentationPipeline) (#25884) 2023-08-31 16:08:20 +02:00
quantization 🚨🚨🚨 [Refactor] Move third-party related utility files into integrations/ folder 🚨🚨🚨 (#25599) 2023-08-25 17:13:34 +02:00
repo_utils Document check copies (#25291) 2023-08-04 14:56:29 +02:00
sagemaker Avoid invalid escape sequences, use raw strings (#22936) 2023-04-25 09:17:56 -04:00
tokenization [PreTrainedTokenizerFast] Keep properties from fast tokenizer (#25053) 2023-07-25 18:45:01 +02:00
tools Add support for for loops in python interpreter (#24429) 2023-06-26 09:58:14 -04:00
trainer Make training args fully immutable (#25435) 2023-08-15 11:47:47 -04:00
utils Support loading base64 images in pipelines (#25633) 2023-08-29 19:24:24 +01:00
__init__.py GPU text generation: Moved the encoded_prompt to correct device 2020-01-06 15:11:12 +01:00
test_backbone_common.py Add ViTDet (#25524) 2023-08-29 10:03:52 +01:00
test_configuration_common.py Deal with nested configs better in base class (#25237) 2023-08-04 14:56:09 +02:00
test_configuration_utils.py Deal with nested configs better in base class (#25237) 2023-08-04 14:56:09 +02:00
test_feature_extraction_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_feature_extraction_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_image_processing_common.py Input data format (#25464) 2023-08-16 17:45:02 +01:00
test_image_processing_utils.py Run hub tests (#24807) 2023-07-13 15:25:45 -04:00
test_image_transforms.py Add input_data_format argument, image transforms (#25462) 2023-08-11 15:09:31 +01:00
test_modeling_common.py [resize_embedding] Introduce pad_to_multiple_of and guidance (#25088) 2023-08-17 17:00:32 +02:00
test_modeling_flax_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_flax_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_tf_common.py Skip test_onnx_runtime_optimize for now (#25560) 2023-08-17 11:23:16 +02:00
test_modeling_tf_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_utils.py Generate: Load generation config when device_map is passed (#25413) 2023-08-10 10:54:26 +01:00
test_pipeline_mixin.py Add Text-To-Speech pipeline (#24952) 2023-08-17 17:34:47 +01:00
test_sequence_feature_extraction_common.py Apply ruff flake8-comprehensions (#21694) 2023-02-22 09:14:54 +01:00
test_tokenization_common.py [split_special_tokens] Add support for split_special_tokens argument to encode (#25081) 2023-08-18 13:26:27 +02:00
test_tokenization_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00