While using `run_clm.py`,[^1] I noticed that some files were being added
to my global cache, not the local cache. I set the `cache_dir` parameter
for the one call to `evaluate.load()`, which partially solved the
problem. I figured that while I was fixing the one script upstream, I
might as well fix the problem in all other example scripts that I could.
There are still some files being added to my global cache, but this
appears to be a bug in `evaluate` itself. This commit at least moves
some of the files into the local cache, which is better than before.
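
For illustration, the change in each affected script has roughly the following shape; the metric name and the placeholder `ModelArguments` class below are mine, since the real scripts parse `cache_dir` from their command-line arguments:

```python
import evaluate

# Hypothetical stand-in for the ModelArguments dataclass that the example
# scripts parse from the command line; only the cache_dir field matters here.
class ModelArguments:
    cache_dir = "./hf_cache"

model_args = ModelArguments()

# Before: evaluate.load("accuracy") writes its files to the global cache.
# After: files go under the user-supplied cache directory instead
# (to the extent that evaluate honours cache_dir; see the caveat above).
metric = evaluate.load("accuracy", cache_dir=model_args.cache_dir)
```
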
To create this PR, I made the following regex-based transformation:
`evaluate\.load\((.*?)\)` -> `evaluate\.load\($1,
cache_dir=model_args.cache_dir\)`. After applying that transformation, I
manually fixed all modified files, with `ruff` serving as useful
guidance. During the process, I removed one existing usage of the
`cache_dir` parameter in a script that did not have a corresponding
`--cache_dir` argument declared.
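
As a rough sketch of the mechanical step (assuming Python's `re.sub` and a hypothetical glob over the example scripts; the actual edit used `$1`-style replacement in an editor), the transformation amounts to:

```python
import re
from pathlib import Path

# Non-greedy match on the existing argument list of evaluate.load(...).
pattern = re.compile(r"evaluate\.load\((.*?)\)")

# Illustrative glob over the example scripts; calls that span multiple lines
# or already pass cache_dir still need the manual cleanup described above.
for path in Path("examples").rglob("run_*.py"):
    text = path.read_text()
    new_text = pattern.sub(
        r"evaluate.load(\1, cache_dir=model_args.cache_dir)", text
    )
    if new_text != text:
        path.write_text(new_text)
```
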
[^1]: I specifically used `pytorch/language-modeling/run_clm.py` from
v4.34.1 of the library. For the original code, see the following URL:
<https://github.com/huggingface/transformers/blob/v4.34.1/examples/pytorch/language-modeling/run_clm.py>.
# Automatic Speech Recognition - Flax Examples

## Sequence to Sequence

The script `run_flax_speech_recognition_seq2seq.py` can be used to fine-tune any Flax Speech Sequence-to-Sequence Model for automatic speech recognition on one of the official speech recognition datasets or a custom dataset. This includes the Whisper model from OpenAI or a warm-started Speech-Encoder-Decoder Model, an example for which is included below.
### Whisper Model

We can load all components of the Whisper model directly from the pretrained checkpoint, including the pretrained model weights, feature extractor and tokenizer. We simply have to specify the id of the fine-tuning dataset and the necessary training hyperparameters.
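
As a minimal sketch (using the `openai/whisper-small` checkpoint from the command below; this is just the standard `transformers` loading API rather than an excerpt from the script), that looks like:

```python
from transformers import FlaxWhisperForConditionalGeneration, WhisperProcessor

checkpoint = "openai/whisper-small"

# Pretrained Flax model weights.
model = FlaxWhisperForConditionalGeneration.from_pretrained(checkpoint)

# Feature extractor and tokenizer, bundled together as a processor.
processor = WhisperProcessor.from_pretrained(checkpoint)
```
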
The following example shows how to fine-tune the Whisper small checkpoint on the Hindi subset of the Common Voice 13 dataset. Note that before running this script you must accept the dataset's terms of use and register your Hugging Face Hub token on your device by running `huggingface-cli login`.
```bash
python run_flax_speech_recognition_seq2seq.py \
    --model_name_or_path="openai/whisper-small" \
    --dataset_name="mozilla-foundation/common_voice_13_0" \
    --dataset_config_name="hi" \
    --language="hindi" \
    --train_split_name="train+validation" \
    --eval_split_name="test" \
    --output_dir="./whisper-small-hi-flax" \
    --per_device_train_batch_size="16" \
    --per_device_eval_batch_size="16" \
    --num_train_epochs="10" \
    --learning_rate="1e-4" \
    --warmup_steps="500" \
    --logging_steps="25" \
    --generation_max_length="40" \
    --preprocessing_num_workers="32" \
    --dataloader_num_workers="32" \
    --max_duration_in_seconds="30" \
    --text_column_name="sentence" \
    --overwrite_output_dir \
    --do_train \
    --do_eval \
    --predict_with_generate \
    --push_to_hub \
    --use_auth_token
```
On a TPU v4-8, training should take approximately 25 minutes, with a final cross-entropy loss of 0.02 and word error rate of 34%. See the checkpoint `sanchit-gandhi/whisper-small-hi-flax` for an example training run.