mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-12 17:20:03 +06:00
* start - docs, SpeechT5 copy and rename * add relevant code from FastSpeech2 draft, have tests pass * make it an actual conformer, demo ex. * matching inference with original repo, includes debug code * refactor nn.Sequentials, start more desc. var names * more renaming * more renaming * vocoder scratchwork * matching vocoder outputs * hifigan vocoder conversion script * convert model script, rename some config vars * replace postnet with speecht5's implementation * passing common tests, file cleanup * expand testing, add output hidden states and attention * tokenizer + passing tokenizer tests * variety of updates and tests * g2p_en pckg setup * import structure edits * docstrings and cleanup * repo consistency * deps * small cleanup * forward signature param order * address comments except for masks and labels * address comments on attention_mask and labels * address second round of comments * remove old unneeded line * address comments part 1 * address comments pt 2 * rename auto mapping * fixes for failing tests * address comments part 3 (bart-like, train loss) * make style * pass config where possible * add forward method + tests to WithHifiGan model * make style * address arg passing and generate_speech comments * address Arthur comments * address Arthur comments pt2 * lint changes * Sanchit comment * add g2p-en to doctest deps * move up self.encoder * onnx compatible tensor method * fix is symbolic * fix paper url * move models to espnet org * make style * make fix-copies * update docstring * Arthur comments * update docstring w/ new updates * add model architecture images * header size * md wording update * make style |
||
---|---|---|
asr.md | ||
audio_classification.md | ||
document_question_answering.md | ||
idefics.md | ||
image_captioning.md | ||
image_classification.md | ||
image_to_image.md | ||
knowledge_distillation_for_image_classification.md | ||
language_modeling.md | ||
masked_language_modeling.md | ||
monocular_depth_estimation.md | ||
multiple_choice.md | ||
object_detection.md | ||
prompting.md | ||
question_answering.md | ||
semantic_segmentation.md | ||
sequence_classification.md | ||
summarization.md | ||
text-to-speech.md | ||
token_classification.md | ||
translation.md | ||
video_classification.md | ||
visual_question_answering.md | ||
zero_shot_image_classification.md | ||
zero_shot_object_detection.md |