transformers/examples/research_projects/wav2vec2/vocab/buckwalter.json
Mohamed El-Geish af8afdc88d
wav2vec2: support datasets other than LibriSpeech (#10581)
* wav2vec2: support datasets other than LibriSpeech

* Formatting run_asr.py to pass code quality test

* bundled orthography options and added verbose logs

* fixing a typo in timit fine-tuning script

* update comment for clarity

* resize_lm_head and load custom vocab from file

* adding a max_duration_in_seconds filter

* do not assign `duration_filter` lambda, use a def

* log untransliterated text as well

* fix base model for arabic

* fix duration filter when target_sr is not set

* drop duration_in_seconds when unneeded

* script for wav2vec2-large-lv60-timit-asr

* fix for "tha" in arabic corpus (huggingface#10581)

* adding more options to work with common_voice

* PR feedback (huggingface#10581)

* small README change
2021-03-18 10:20:26 +03:00

{
"<pad>": 0,
"<s>": 1,
"</s>": 2,
"<unk>": 3,
"/": 4,
"'": 5,
"|": 6,
">": 7,
"&": 8,
"<": 9,
"}": 10,
"A": 11,
"b": 12,
"p": 13,
"t": 14,
"v": 15,
"j": 16,
"H": 17,
"x": 18,
"d": 19,
"*": 20,
"r": 21,
"z": 22,
"s": 23,
"$": 24,
"S": 25,
"D": 26,
"T": 27,
"Z": 28,
"E": 29,
"g": 30,
"_": 31,
"f": 32,
"q": 33,
"k": 34,
"l": 35,
"m": 36,
"n": 37,
"h": 38,
"w": 39,
"Y": 40,
"y": 41,
"F": 42,
"N": 43,
"K": 44,
"a": 45,
"u": 46,
"i": 47,
"~": 48,
"o": 49,
"`": 50,
"{": 51,
"P": 52,
"J": 53,
"V": 54,
"G": 55
}
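The mapping above assigns one ID to each Buckwalter transliteration symbol, after the four special tokens. Below is a minimal Python sketch of how such a character-level vocabulary might be applied to text. It is not the code from the training script. The helper names `load_vocab`, `encode`, and `decode` and the path `buckwalter.json` are hypothetical. The choice of "/" as the word delimiter is an assumption: "|" cannot serve that role here because it is the Buckwalter symbol for alef with madda above.

```python
import json
from typing import Dict, List

# Assumption: "/" stands in for spaces, since "|" is a letter in Buckwalter.
WORD_DELIMITER = "/"
UNK_TOKEN = "<unk>"


def load_vocab(path: str) -> Dict[str, int]:
    """Read a token-to-ID mapping such as this file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map Buckwalter text to IDs one character at a time.

    Spaces are replaced by the assumed word delimiter; characters missing
    from the vocabulary fall back to the <unk> ID. No lowercasing is done,
    because Buckwalter is case-sensitive (e.g. "A" is alef, "a" is fatha).
    """
    unk_id = vocab[UNK_TOKEN]
    return [vocab.get(ch, unk_id) for ch in text.replace(" ", WORD_DELIMITER)]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Invert encode() for known IDs; this is plain lookup, not CTC decoding."""
    id_to_token = {i: tok for tok, i in vocab.items()}
    return "".join(id_to_token[i] for i in ids).replace(WORD_DELIMITER, " ")
```

With the full mapping loaded via `load_vocab("buckwalter.json")`, `encode("ktb Alwld", vocab)` gives `[34, 14, 12, 4, 11, 35, 39, 35, 19]`, and `decode` turns those IDs back into `"ktb Alwld"`.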