Speech-to-Text RNNT web inference using Gradio#

Encoder model + RNNT loss web inference using Gradio

This tutorial is available as an IPython notebook at malaya-speech/example/stt-transducer-gradio.

This module is not language independent, so it not save to use on different languages. Pretrained models trained on hyperlocal languages.

[1]:
import malaya_speech
import numpy as np
from malaya_speech import Pipeline
`pyaudio` is not available, `malaya_speech.streaming.stream` is not able to use.
[2]:
import logging

logging.basicConfig(level=logging.INFO)

List available RNNT model#

[4]:
malaya_speech.stt.transducer.available_transformer()
INFO:malaya_speech.stt:for `malay-fleur102` language, tested on FLEURS102 `ms_my` test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt
INFO:malaya_speech.stt:for `malay-malaya` language, tested on malaya-speech test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt
INFO:malaya_speech.stt:for `singlish` language, tested on IMDA malaya-speech test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt
[4]:
Size (MB) Quantized Size (MB) malay-malaya malay-fleur102 Language singlish
tiny-conformer 24.4 9.14 {'WER': 0.2128108, 'CER': 0.08136871, 'WER-LM'... {'WER': 0.2682816, 'CER': 0.13052725, 'WER-LM'... [malay] NaN
small-conformer 49.2 18.1 {'WER': 0.19853302, 'CER': 0.07449528, 'WER-LM... {'WER': 0.23412149, 'CER': 0.1138314813, 'WER-... [malay] NaN
conformer 125 37.1 {'WER': 0.16340855635999124, 'CER': 0.05897205... {'WER': 0.20090442596, 'CER': 0.09616901, 'WER... [malay] NaN
large-conformer 404 107 {'WER': 0.1566839, 'CER': 0.0619715, 'WER-LM':... {'WER': 0.1711028238, 'CER': 0.077953559, 'WER... [malay] NaN
conformer-stack-2mixed 130 38.5 {'WER': 0.1889883954, 'CER': 0.0726845531, 'WE... {'WER': 0.244836948, 'CER': 0.117409327, 'WER-... [malay, singlish] {'WER': 0.08535878149, 'CER': 0.0452357273822,...
small-conformer-singlish 49.2 18.1 NaN NaN [singlish] {'WER': 0.087831, 'CER': 0.0456859, 'WER-LM': ...
conformer-singlish 125 37.1 NaN NaN [singlish] {'WER': 0.07779246, 'CER': 0.0403616, 'WER-LM'...
large-conformer-singlish 404 107 NaN NaN [singlish] {'WER': 0.07014733, 'CER': 0.03587201, 'WER-LM...

Load RNNT model#

def transformer(
    model: str = 'conformer',
    quantized: bool = False,
    **kwargs,
):
    """
    Load Encoder-Transducer ASR model.

    Parameters
    ----------
    model : str, optional (default='conformer')
        Check available models at `malaya_speech.stt.transducer.available_transformer()`.
    quantized : bool, optional (default=False)
        if True, will load 8-bit quantized model.
        Quantized model not necessary faster, totally depends on the machine.

    Returns
    -------
    result : malaya_speech.model.transducer.Transducer class
    """
[5]:
model = malaya_speech.stt.transducer.transformer(model = 'conformer')
2023-02-01 12:00:10.062604: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-01 12:00:10.177269: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.177874: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.192428: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.193014: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.193230: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.193560: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.759365: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.759985: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.760200: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.760713: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.760905: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.761296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:0 with 12845 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-02-01 12:00:10.761541: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.761729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:1 with 1119 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6
2023-02-01 12:00:20.326030: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.326701: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.327002: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.327551: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.327780: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.328210: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.328496: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.328834: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.329036: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.329358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12845 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-02-01 12:00:20.329384: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.329572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 1119 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6

web inference using Gradio#

def gradio(self, record_mode: bool = True, **kwargs):
    """
    Transcribe an input using beam decoder on Gradio interface.

    Parameters
    ----------
    record_mode: bool, optional (default=True)
        if True, Gradio will use record mode, else, file upload mode.

    **kwargs: keyword arguments for beam decoder and `iface.launch`.
    """

record mode#

[7]:
model.gradio(record_mode = True)
[10]:
from IPython.core.display import Image, display

display(Image('record-mode.png', width=800))
_images/stt-transducer-gradio_13_0.png

upload mode#

[9]:
model.gradio(record_mode = False)
[11]:
from IPython.core.display import Image, display

display(Image('upload-mode.png', width=800))
_images/stt-transducer-gradio_16_0.png
[ ]: