* Enable large-v3 downloading and update language list
* Fix type annotation
* Run `make fixup`
* Export Whisper feature extractor
* Fix error after extractor loading
* Do not use pre-computed mel filters
* Save the full preprocessor properly
* Update docs
* Remove comment
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add alignment heads consistent with each Whisper version
* Remove alignment heads calculation
* Save fast tokenizer format as well
* Fix slow to fast conversion
* Fix bos/eos/pad token IDs in the model config
* Add decoder_start_token_id to config
---------
* Fix error in convert_openai_to_hf.py: "_download() missing 1 required positional argument: root"
* Fix error in convert_openai_to_hf.py: "TypeError: byte indices must be integers or slices, not str"
* Fix decoder_attention_heads value in convert_openai_to_hf.py.
  Correct the assignment for `decoder_attention_heads` in the conversion script for the Whisper model.
* Reformat convert_openai_to_hf.py with Black.
* Fix Whisper model configuration defaults (for Tiny).
  - Correct the encoder/decoder layer and attention head counts.
  - Update model width (`d_model`) to 384.
* Add docstring to the convert_openai_to_hf.py script with a doctest
* Add shebang and +x permission to convert_openai_to_hf.py
* convert_openai_to_hf.py: reuse the already-read model_bytes in the _download() function
* Move convert_openai_to_hf.py doctest example to whisper.md
* whisper.md: Add an inference example to the Conversion section.
* whisper.md: remove `model.config.forced_decoder_ids` from examples (deprecated)
* whisper.md: Remove "## Format Conversion" section; not used by users
* whisper.md: Use librispeech_asr_dummy dataset and load_dataset()
* first batch of structure improvements for model_docs
* second batch of structure improvements for model_docs
* more structure improvements for model_docs
* more structure improvements for model_docs
* structure improvements for cv model_docs
* more structural refactoring
* addressed feedback about image processors