add a note to whisper docs clarifying support of long-form decoding (#19497)
parent 5dcb10d82a
commit 504cd71a6b
@@ -25,6 +25,7 @@ Tips:

 - The model usually performs well without requiring any finetuning.
 - The model follows a classic encoder-decoder architecture, which means that it relies on the [`~generation_utils.GenerationMixin.generate`] function for inference.
+- Inference is currently only implemented for short-form, i.e. audio is pre-segmented into <=30s segments. Long-form decoding (including timestamps) will be implemented in a future release.
 - One can use [`WhisperProcessor`] to prepare audio for the model and to decode the predicted IDs back into text.

 This model was contributed by [Arthur Zucker](https://huggingface.co/ArthurZ). The TensorFlow version of this model was contributed by [amyeroberts](https://huggingface.co/amyeroberts).
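The tips above describe the full short-form pipeline: [`WhisperProcessor`] turns raw audio into input features, [`~generation_utils.GenerationMixin.generate`] produces token IDs, and the processor maps those IDs back to text. Below is a minimal sketch of that flow; the `openai/whisper-tiny` checkpoint, the 16 kHz sampling rate, and the dummy 5-second waveform are illustrative choices, and any recording of at most 30 seconds can be substituted.

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Illustrative checkpoint; any Whisper checkpoint works the same way.
processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Dummy 5-second, 16 kHz mono waveform standing in for a real (<=30 s) recording.
waveform = np.zeros(5 * 16_000, dtype=np.float32)

# The processor converts raw audio into log-Mel input features for the encoder.
input_features = processor(waveform, sampling_rate=16_000, return_tensors="pt").input_features

# Decoding runs through the standard generate() loop of the encoder-decoder model.
predicted_ids = model.generate(input_features)

# The processor also decodes the predicted token IDs back into text.
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```

Until long-form decoding lands, recordings longer than 30 seconds have to be split into <=30s segments before being passed through this pipeline.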