<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Model outputs

All models have outputs that are instances of subclasses of [`~utils.ModelOutput`]. These are data structures that
contain all the information returned by the model, but they can also be used as tuples or dictionaries.

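As a rough mental model of that tuple/dictionary duality, here is a minimal, standard-library-only sketch. It is not the actual `ModelOutput` implementation: `MiniModelOutput` and `ToyClassifierOutput` are hypothetical names, and the sketch only assumes the behaviour this page describes, namely attribute access plus tuple and dictionary views that skip fields set to `None`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Iterator, Optional, Tuple


@dataclass
class MiniModelOutput:
    """Hypothetical base class mimicking the documented ModelOutput behaviour."""

    def _present(self) -> Dict[str, Any]:
        # Only fields whose value is not None take part in the tuple/dict views.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }

    def keys(self):
        return self._present().keys()

    def __iter__(self) -> Iterator[str]:
        # Iterating yields keys, as with a regular dict.
        return iter(self._present())

    def to_tuple(self) -> Tuple[Any, ...]:
        return tuple(self._present().values())

    def __getitem__(self, key):
        # String keys act like a dict lookup; ints and slices act like a tuple lookup.
        if isinstance(key, str):
            return self._present()[key]
        return self.to_tuple()[key]


@dataclass
class ToyClassifierOutput(MiniModelOutput):
    # Field names and order follow SequenceClassifierOutput; values are toy data.
    loss: Optional[float] = None
    logits: Optional[list] = None
    hidden_states: Optional[tuple] = None
    attentions: Optional[tuple] = None


out = ToyClassifierOutput(loss=0.25, logits=[-1.0, 2.0])
print(out.loss)        # 0.25
print(out.attentions)  # None
print(out["logits"])   # [-1.0, 2.0]
print(out[:2])         # (0.25, [-1.0, 2.0])
print(dict(out))       # {'loss': 0.25, 'logits': [-1.0, 2.0]}
```

Recomputing the views from the non-`None` fields on every access keeps the sketch short at the cost of some speed.
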
Let's see how this looks in an example:

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
labels = torch.tensor([1])  # one class index per example; batch size 1
# Passing `labels` makes the model also compute and return a loss
outputs = model(**inputs, labels=labels)
```

The `outputs` object is a [`~modeling_outputs.SequenceClassifierOutput`]. As the documentation of that class below
shows, this means it has an optional `loss`, a `logits`, an optional `hidden_states`, and an optional `attentions`
attribute. Here we have the `loss` because we passed along `labels`, but we don't have `hidden_states` and
`attentions` because we didn't pass `output_hidden_states=True` or `output_attentions=True`.

You can access each attribute as you usually would; if the model did not return that attribute, you get `None`.
Here, for instance, `outputs.loss` is the loss computed by the model, and `outputs.attentions` is `None`.

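To illustrate where those `None` values come from, the sketch below uses a hypothetical `toy_forward` function (not a real model or library call) that fills in `loss` only when labels are given, and fills `hidden_states` or `attentions` only when the matching `output_*` flag is set, mirroring the BERT example above. The loss is a toy 0/1 value and all numbers are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToySequenceClassifierOutput:
    # Mirrors the field names of SequenceClassifierOutput; values are toy data.
    loss: Optional[float] = None
    logits: Optional[List[float]] = None
    hidden_states: Optional[Tuple[List[float], ...]] = None
    attentions: Optional[Tuple[List[float], ...]] = None


def toy_forward(labels: Optional[int] = None,
                output_hidden_states: bool = False,
                output_attentions: bool = False) -> ToySequenceClassifierOutput:
    """Hypothetical forward pass: optional outputs are computed only on request."""
    logits = [-1.0, 2.0]  # always returned
    loss = None
    if labels is not None:
        # Toy 0/1 loss standing in for a real loss such as cross-entropy.
        predicted = max(range(len(logits)), key=logits.__getitem__)
        loss = 0.0 if predicted == labels else 1.0
    hidden_states = ([0.1, 0.2],) if output_hidden_states else None
    attentions = ([1.0],) if output_attentions else None
    return ToySequenceClassifierOutput(loss, logits, hidden_states, attentions)


outputs = toy_forward(labels=1)
print(outputs.loss)        # 0.0  (labels were passed)
print(outputs.attentions)  # None (output_attentions was not set)
```
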
When you treat the `outputs` object as a tuple, only the attributes whose values are not `None` are included. Here,
for instance, it has two elements, `loss` then `logits`, so

```python
outputs[:2]
```

returns the tuple `(outputs.loss, outputs.logits)`.

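One consequence of skipping `None` fields is that positions are not fixed: with `labels`, index 0 of the tuple is the loss, while without them index 0 is the logits. The sketch below reproduces this with a plain dataclass and a hypothetical `to_tuple_skipping_none` helper (an illustration, not library code); it is one reason to prefer named access such as `outputs.logits` in code that may run with or without labels.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Tuple


@dataclass
class ToyOutput:
    # Same field order as SequenceClassifierOutput; values are placeholders.
    loss: Optional[float] = None
    logits: Optional[Tuple[float, ...]] = None
    hidden_states: Optional[tuple] = None
    attentions: Optional[tuple] = None


def to_tuple_skipping_none(output: Any) -> Tuple[Any, ...]:
    """Hypothetical helper: keep field order, drop fields whose value is None."""
    values = (getattr(output, f.name) for f in fields(output))
    return tuple(v for v in values if v is not None)


with_labels = to_tuple_skipping_none(ToyOutput(loss=0.3, logits=(-1.0, 2.0)))
without_labels = to_tuple_skipping_none(ToyOutput(logits=(-1.0, 2.0)))

print(with_labels[0])     # 0.3          -> the loss
print(without_labels[0])  # (-1.0, 2.0)  -> the logits, because loss was None
```
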
When you treat the `outputs` object as a dictionary, only the attributes whose values are not `None` are included.
Here, for instance, it has two keys: `loss` and `logits`.

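Since `None` fields are absent from the dictionary view, a membership test is a simple way to check whether an optional output was returned. The sketch below builds such a view by hand with a hypothetical `to_dict_skipping_none` helper (not library code); field names follow `SequenceClassifierOutput` and the values are placeholders.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class ToyOutput:
    loss: Optional[float] = None
    logits: Optional[list] = None
    hidden_states: Optional[tuple] = None
    attentions: Optional[tuple] = None


def to_dict_skipping_none(output: Any) -> Dict[str, Any]:
    """Hypothetical helper: field name -> value, omitting None-valued fields."""
    return {
        f.name: getattr(output, f.name)
        for f in fields(output)
        if getattr(output, f.name) is not None
    }


view = to_dict_skipping_none(ToyOutput(loss=0.3, logits=[-1.0, 2.0]))
print(list(view))            # ['loss', 'logits']
print("attentions" in view)  # False: attentions was not returned
```
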
We document here the generic model outputs that are used by more than one model type. Specific output types are
documented on their corresponding model page.

## ModelOutput

[[autodoc]] utils.ModelOutput
    - to_tuple

## BaseModelOutput

[[autodoc]] modeling_outputs.BaseModelOutput

## BaseModelOutputWithPooling

[[autodoc]] modeling_outputs.BaseModelOutputWithPooling

## BaseModelOutputWithCrossAttentions

[[autodoc]] modeling_outputs.BaseModelOutputWithCrossAttentions

## BaseModelOutputWithPoolingAndCrossAttentions

[[autodoc]] modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions

## BaseModelOutputWithPast

[[autodoc]] modeling_outputs.BaseModelOutputWithPast

## BaseModelOutputWithPastAndCrossAttentions

[[autodoc]] modeling_outputs.BaseModelOutputWithPastAndCrossAttentions

## Seq2SeqModelOutput

[[autodoc]] modeling_outputs.Seq2SeqModelOutput

## CausalLMOutput

[[autodoc]] modeling_outputs.CausalLMOutput

## CausalLMOutputWithCrossAttentions

[[autodoc]] modeling_outputs.CausalLMOutputWithCrossAttentions

## CausalLMOutputWithPast

[[autodoc]] modeling_outputs.CausalLMOutputWithPast

## MaskedLMOutput

[[autodoc]] modeling_outputs.MaskedLMOutput

## Seq2SeqLMOutput

[[autodoc]] modeling_outputs.Seq2SeqLMOutput

## NextSentencePredictorOutput

[[autodoc]] modeling_outputs.NextSentencePredictorOutput

## SequenceClassifierOutput

[[autodoc]] modeling_outputs.SequenceClassifierOutput

## Seq2SeqSequenceClassifierOutput

[[autodoc]] modeling_outputs.Seq2SeqSequenceClassifierOutput

## MultipleChoiceModelOutput

[[autodoc]] modeling_outputs.MultipleChoiceModelOutput

## TokenClassifierOutput

[[autodoc]] modeling_outputs.TokenClassifierOutput

## QuestionAnsweringModelOutput

[[autodoc]] modeling_outputs.QuestionAnsweringModelOutput

## Seq2SeqQuestionAnsweringModelOutput

[[autodoc]] modeling_outputs.Seq2SeqQuestionAnsweringModelOutput

## Seq2SeqSpectrogramOutput

[[autodoc]] modeling_outputs.Seq2SeqSpectrogramOutput

## SemanticSegmenterOutput

[[autodoc]] modeling_outputs.SemanticSegmenterOutput

## ImageClassifierOutput

[[autodoc]] modeling_outputs.ImageClassifierOutput

## ImageClassifierOutputWithNoAttention

[[autodoc]] modeling_outputs.ImageClassifierOutputWithNoAttention

## DepthEstimatorOutput

[[autodoc]] modeling_outputs.DepthEstimatorOutput

## Wav2Vec2BaseModelOutput

[[autodoc]] modeling_outputs.Wav2Vec2BaseModelOutput

## XVectorOutput

[[autodoc]] modeling_outputs.XVectorOutput

## TFBaseModelOutput

[[autodoc]] modeling_tf_outputs.TFBaseModelOutput

## TFBaseModelOutputWithPooling

[[autodoc]] modeling_tf_outputs.TFBaseModelOutputWithPooling

## TFBaseModelOutputWithPoolingAndCrossAttentions

[[autodoc]] modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions

## TFBaseModelOutputWithPast

[[autodoc]] modeling_tf_outputs.TFBaseModelOutputWithPast

## TFBaseModelOutputWithPastAndCrossAttentions

[[autodoc]] modeling_tf_outputs.TFBaseModelOutputWithPastAndCrossAttentions

## TFSeq2SeqModelOutput

[[autodoc]] modeling_tf_outputs.TFSeq2SeqModelOutput

## TFCausalLMOutput

[[autodoc]] modeling_tf_outputs.TFCausalLMOutput

## TFCausalLMOutputWithCrossAttentions

[[autodoc]] modeling_tf_outputs.TFCausalLMOutputWithCrossAttentions

## TFCausalLMOutputWithPast

[[autodoc]] modeling_tf_outputs.TFCausalLMOutputWithPast

## TFMaskedLMOutput

[[autodoc]] modeling_tf_outputs.TFMaskedLMOutput

## TFSeq2SeqLMOutput

[[autodoc]] modeling_tf_outputs.TFSeq2SeqLMOutput

## TFNextSentencePredictorOutput

[[autodoc]] modeling_tf_outputs.TFNextSentencePredictorOutput

## TFSequenceClassifierOutput

[[autodoc]] modeling_tf_outputs.TFSequenceClassifierOutput

## TFSeq2SeqSequenceClassifierOutput

[[autodoc]] modeling_tf_outputs.TFSeq2SeqSequenceClassifierOutput

## TFMultipleChoiceModelOutput

[[autodoc]] modeling_tf_outputs.TFMultipleChoiceModelOutput

## TFTokenClassifierOutput

[[autodoc]] modeling_tf_outputs.TFTokenClassifierOutput

## TFQuestionAnsweringModelOutput

[[autodoc]] modeling_tf_outputs.TFQuestionAnsweringModelOutput

## TFSeq2SeqQuestionAnsweringModelOutput

[[autodoc]] modeling_tf_outputs.TFSeq2SeqQuestionAnsweringModelOutput

## FlaxBaseModelOutput

[[autodoc]] modeling_flax_outputs.FlaxBaseModelOutput

## FlaxBaseModelOutputWithPast

[[autodoc]] modeling_flax_outputs.FlaxBaseModelOutputWithPast

## FlaxBaseModelOutputWithPooling

[[autodoc]] modeling_flax_outputs.FlaxBaseModelOutputWithPooling

## FlaxBaseModelOutputWithPastAndCrossAttentions

[[autodoc]] modeling_flax_outputs.FlaxBaseModelOutputWithPastAndCrossAttentions

## FlaxSeq2SeqModelOutput

[[autodoc]] modeling_flax_outputs.FlaxSeq2SeqModelOutput

## FlaxCausalLMOutputWithCrossAttentions

[[autodoc]] modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions

## FlaxMaskedLMOutput

[[autodoc]] modeling_flax_outputs.FlaxMaskedLMOutput

## FlaxSeq2SeqLMOutput

[[autodoc]] modeling_flax_outputs.FlaxSeq2SeqLMOutput

## FlaxNextSentencePredictorOutput

[[autodoc]] modeling_flax_outputs.FlaxNextSentencePredictorOutput

## FlaxSequenceClassifierOutput

[[autodoc]] modeling_flax_outputs.FlaxSequenceClassifierOutput

## FlaxSeq2SeqSequenceClassifierOutput

[[autodoc]] modeling_flax_outputs.FlaxSeq2SeqSequenceClassifierOutput

## FlaxMultipleChoiceModelOutput

[[autodoc]] modeling_flax_outputs.FlaxMultipleChoiceModelOutput

## FlaxTokenClassifierOutput

[[autodoc]] modeling_flax_outputs.FlaxTokenClassifierOutput

## FlaxQuestionAnsweringModelOutput

[[autodoc]] modeling_flax_outputs.FlaxQuestionAnsweringModelOutput

## FlaxSeq2SeqQuestionAnsweringModelOutput

[[autodoc]] modeling_flax_outputs.FlaxSeq2SeqQuestionAnsweringModelOutput