Speech-to-Text Seq2Seq Whisper#

Finetuned on hyperlocal languages using pretrained HuggingFace models, https://huggingface.co/mesolitica

This tutorial is available as an IPython notebook at malaya-speech/example/stt-seq2seq-whisper.

This module is not language independent, so it is not safe to use on other languages. The pretrained models were trained on hyperlocal languages.

This is an application of malaya-speech Pipeline, read more about malaya-speech Pipeline at malaya-speech/example/pipeline.

Why official OpenAI Whisper instead of HuggingFace?#

Some implementations built on the official repository are more mature and have evolved better features, eg, https://github.com/m-bain/whisperX

Install OpenAI Whisper#

Simply,

pip install openai-whisper
[1]:
import malaya_speech
import numpy as np
from malaya_speech import Pipeline
`pyaudio` is not available, `malaya_speech.streaming.stream` is not able to use.
[2]:
import logging

logging.basicConfig(level=logging.INFO)

List available Whisper model#

[3]:
malaya_speech.stt.seq2seq.available_whisper()
INFO:malaya_speech.stt:for `malay-fleur102` language, tested on FLEURS102 `ms_my` test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt
INFO:malaya_speech.stt:for `malay-malaya` language, tested on malaya-speech test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt
INFO:malaya_speech.stt:for `singlish` language, tested on IMDA malaya-speech test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt
[3]:
Size (MB) malay-malaya malay-fleur102 singlish Language
mesolitica/finetune-whisper-tiny-ms-singlish 151 {'WER': 0.20141585, 'CER': 0.071964908} {'WER': 0.235680975, 'CER': 0.0986880877} {'WER': 0.09045121, 'CER': 0.0481965} [malay, singlish]
mesolitica/finetune-whisper-tiny-ms-singlish-v2 151 {'WER': 0.20141585, 'CER': 0.071964908} {'WER': 0.22459602, 'CER': 0.089406469} {'WER': 0.138882971, 'CER': 0.074929807} [malay, singlish]
mesolitica/finetune-whisper-base-ms-singlish-v2 290 {'WER': 0.172632664, 'CER': 0.0680027682} {'WER': 0.1837319118, 'CER': 0.0599804251} {'WER': 0.111506313, 'CER': 0.05852830724} [malay, singlish]
mesolitica/finetune-whisper-small-ms-singlish-v2 967 {'WER': 0.13189875561, 'CER': 0.0434602169} {'WER': 0.13277694, 'CER': 0.0478108612} {'WER': 0.09489335668, 'CER': 0.05045327551} [malay, singlish]
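The WER and CER columns above are word and character error rates on each test set (lower is better). As a reference point, WER is the word-level edit distance between the reference transcript and the hypothesis, divided by the reference length. A minimal sketch, not the exact scoring script used to produce the table:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dynamic-programming edit distance table over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer('makan nasi lemak', 'makan nasi ayam')` gives one substitution over three words, ie 0.333.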

Load Whisper model#

def whisper(
    model: str = 'mesolitica/finetune-whisper-base-ms-singlish-v2',
    force_check: bool = True,
    **kwargs,
):
    """
    Load Finetuned models from HuggingFace.

    Parameters
    ----------
    model : str, optional (default='mesolitica/finetune-whisper-base-ms-singlish-v2')
        Check available models at `malaya_speech.stt.seq2seq.available_whisper()`.
    force_check: bool, optional (default=True)
        Force check that the model is one of the malaya models.
        Set to False if you have your own HuggingFace model.

    Returns
    -------
    result : whisper.model.Whisper class
    """
[9]:
model = malaya_speech.stt.seq2seq.whisper(model = 'mesolitica/finetune-whisper-base-ms-singlish-v2')

Generate#

You can read more at official repository, https://github.com/openai/whisper

[11]:
model = model.to('cpu')
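The model is moved to CPU here; if a CUDA GPU is available, decoding is considerably faster on it. A small device-selection sketch, assuming PyTorch is installed:

```python
import torch

# pick a GPU when one is available, otherwise fall back to CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# model = model.to(device)
# mel tensors should then follow with .to(model.device), as in the cells below
```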
[13]:
import whisper
[14]:
audio = whisper.load_audio('speech/khutbah/wadi-annuar.wav')
audio = whisper.pad_or_trim(audio)

mel = whisper.log_mel_spectrogram(audio).to(model.device)
options = whisper.DecodingOptions(fp16 = False)
result = whisper.decode(model, mel, options)
result.text
[14]:
'dalam perjalanan ini dunia yang susah ini ketika nabi mengajar muaz bin jabal tadi ni alah maha'
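`whisper.pad_or_trim` above fixes every input to Whisper's 30-second decoding window at 16 kHz, ie 480,000 samples: short clips are zero-padded, long ones truncated. A rough numpy re-implementation of that behavior for illustration only; use the library function in practice:

```python
import numpy as np

SAMPLE_RATE = 16000           # Whisper expects 16 kHz audio
N_SAMPLES = 30 * SAMPLE_RATE  # one 30-second window = 480000 samples

def pad_or_trim(array: np.ndarray, length: int = N_SAMPLES) -> np.ndarray:
    # truncate anything longer than the window ...
    if array.shape[-1] > length:
        return array[..., :length]
    # ... and zero-pad anything shorter up to exactly `length` samples
    return np.pad(array, (0, length - array.shape[-1]))
```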
[16]:
audio = whisper.load_audio('speech/singlish/singlish0.wav')
audio = whisper.pad_or_trim(audio)

mel = whisper.log_mel_spectrogram(audio).to(model.device)
options = whisper.DecodingOptions(fp16 = False)
result = whisper.decode(model, mel, options)
result.text
[16]:
'how they roll it in film okay actually'
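Because `pad_or_trim` caps each input at 30 seconds, the single-window decoding above truncates longer recordings. The openai-whisper package also provides `model.transcribe`, which slides the 30-second window across the whole file. A hedged sketch of a small wrapper (not executed in this notebook):

```python
def transcribe_file(model, path: str) -> str:
    """Transcribe audio of arbitrary length with Whisper's built-in
    chunked decoding instead of a single 30-second window."""
    # model.transcribe handles loading, resampling to 16 kHz, and
    # sliding the decoding window across the full recording
    result = model.transcribe(path, fp16=False)
    return result['text']

# usage: transcribe_file(model, 'speech/khutbah/wadi-annuar.wav')
```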