Speech-to-Text RNNT web inference using Gradio#

Encoder model + RNNT loss web inference using Gradio

This tutorial is available as an IPython notebook at malaya-speech/example/stt-transducer-gradio.

This module is not language independent, so it is not safe to use on other languages. The pretrained models are trained on hyperlocal languages.

[1]:

import malaya_speech
import numpy as np
from malaya_speech import Pipeline

`pyaudio` is not available, `malaya_speech.streaming.stream` is not able to use.

[2]:

import logging

logging.basicConfig(level=logging.INFO)

List available RNNT model#

[4]:

malaya_speech.stt.transducer.available_transformer()

INFO:malaya_speech.stt:for `malay-fleur102` language, tested on FLEURS102 `ms_my` test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt
INFO:malaya_speech.stt:for `malay-malaya` language, tested on malaya-speech test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt
INFO:malaya_speech.stt:for `singlish` language, tested on IMDA malaya-speech test set, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt

[4]:

	Size (MB)	Quantized Size (MB)	malay-malaya	malay-fleur102	Language	singlish
tiny-conformer	24.4	9.14	{'WER': 0.2128108, 'CER': 0.08136871, 'WER-LM'...	{'WER': 0.2682816, 'CER': 0.13052725, 'WER-LM'...	[malay]	NaN
small-conformer	49.2	18.1	{'WER': 0.19853302, 'CER': 0.07449528, 'WER-LM...	{'WER': 0.23412149, 'CER': 0.1138314813, 'WER-...	[malay]	NaN
conformer	125	37.1	{'WER': 0.16340855635999124, 'CER': 0.05897205...	{'WER': 0.20090442596, 'CER': 0.09616901, 'WER...	[malay]	NaN
large-conformer	404	107	{'WER': 0.1566839, 'CER': 0.0619715, 'WER-LM':...	{'WER': 0.1711028238, 'CER': 0.077953559, 'WER...	[malay]	NaN
conformer-stack-2mixed	130	38.5	{'WER': 0.1889883954, 'CER': 0.0726845531, 'WE...	{'WER': 0.244836948, 'CER': 0.117409327, 'WER-...	[malay, singlish]	{'WER': 0.08535878149, 'CER': 0.0452357273822,...
small-conformer-singlish	49.2	18.1	NaN	NaN	[singlish]	{'WER': 0.087831, 'CER': 0.0456859, 'WER-LM': ...
conformer-singlish	125	37.1	NaN	NaN	[singlish]	{'WER': 0.07779246, 'CER': 0.0403616, 'WER-LM'...
large-conformer-singlish	404	107	NaN	NaN	[singlish]	{'WER': 0.07014733, 'CER': 0.03587201, 'WER-LM...
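The table above trades model size against error rate, so one reasonable way to choose a model is to filter by a size budget and take the lowest WER. The sketch below hardcodes the sizes and `malay-malaya` WER values copied from the table rather than calling the library; the `pick_model` helper and the budgets passed to it are hypothetical.

```python
# Sizes (MB) and WER on the malay-malaya test set, copied from the table above.
# Singlish-only models are left out because they have no malay-malaya score.
MALAY_MODELS = {
    'tiny-conformer': {'size_mb': 24.4, 'wer': 0.2128108},
    'small-conformer': {'size_mb': 49.2, 'wer': 0.19853302},
    'conformer': {'size_mb': 125, 'wer': 0.16340855635999124},
    'large-conformer': {'size_mb': 404, 'wer': 0.1566839},
    'conformer-stack-2mixed': {'size_mb': 130, 'wer': 0.1889883954},
}


def pick_model(max_size_mb):
    """Return the lowest-WER model that fits the size budget (hypothetical helper)."""
    candidates = {
        name: stats
        for name, stats in MALAY_MODELS.items()
        if stats['size_mb'] <= max_size_mb
    }
    if not candidates:
        raise ValueError(f'no model fits within {max_size_mb} MB')
    return min(candidates, key=lambda name: candidates[name]['wer'])


print(pick_model(150))
```

With a 150 MB budget this selects `conformer`, which is also the default model used in the rest of this tutorial.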

Load RNNT model#

def transformer(
    model: str = 'conformer',
    quantized: bool = False,
    **kwargs,
):
    """
    Load Encoder-Transducer ASR model.

    Parameters
    ----------
    model : str, optional (default='conformer')
        Check available models at `malaya_speech.stt.transducer.available_transformer()`.
    quantized : bool, optional (default=False)
        if True, will load the 8-bit quantized model.
        A quantized model is not necessarily faster; it depends entirely on the machine.

    Returns
    -------
    result : malaya_speech.model.transducer.Transducer class
    """
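As a minimal sketch of how the two documented parameters might be used together, the wrapper below checks the model name before loading. `load_transducer` and `KNOWN_MODELS` are hypothetical names introduced here; the model names are copied from the table in this tutorial, and the live output of `available_transformer()` remains the authoritative list.

```python
# Model names copied from the `available_transformer()` table in this tutorial.
# This hardcoded set is an assumption and may go stale between releases.
KNOWN_MODELS = {
    'tiny-conformer',
    'small-conformer',
    'conformer',
    'large-conformer',
    'conformer-stack-2mixed',
    'small-conformer-singlish',
    'conformer-singlish',
    'large-conformer-singlish',
}


def load_transducer(name='conformer', quantized=False):
    """Load an RNNT model after validating its name (hypothetical wrapper)."""
    if name not in KNOWN_MODELS:
        raise ValueError(
            f'unknown model {name!r}; choose one of {sorted(KNOWN_MODELS)}'
        )
    # Imported lazily so that a typo fails fast, before TensorFlow starts up.
    import malaya_speech

    return malaya_speech.stt.transducer.transformer(model=name, quantized=quantized)
```

For example, `load_transducer('small-conformer', quantized=True)` would request the quantized weights, listed in the table as 18.1 MB instead of 49.2 MB.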

[5]:

model = malaya_speech.stt.transducer.transformer(model = 'conformer')

2023-02-01 12:00:10.062604: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-01 12:00:10.177269: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.177874: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.192428: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.193014: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.193230: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.193560: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.759365: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.759985: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.760200: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.760713: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.760905: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.761296: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:0 with 12845 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-02-01 12:00:10.761541: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:10.761729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:1 with 1119 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6
2023-02-01 12:00:20.326030: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.326701: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.327002: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.327551: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.327780: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.328210: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.328496: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.328834: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.329036: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.329358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12845 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-02-01 12:00:20.329384: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 12:00:20.329572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 1119 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3090 Ti, pci bus id: 0000:07:00.0, compute capability: 8.6

web inference using Gradio#

def gradio(self, record_mode: bool = True, **kwargs):
    """
    Transcribe an input using a beam decoder on a Gradio interface.

    Parameters
    ----------
    record_mode: bool, optional (default=True)
        if True, Gradio will use record mode, else, file upload mode.

    **kwargs: keyword arguments for beam decoder and `iface.launch`.
    """
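The internals of `model.gradio` are not shown here, so the following is only a sketch of the kind of preprocessing a Gradio audio callback typically needs. When a Gradio audio input is configured to return NumPy data, the callback receives a `(sample_rate, samples)` tuple, usually with integer samples. The helper name `gradio_audio_to_float` is hypothetical, and resampling to the rate the model expects is deliberately left out.

```python
import numpy as np


def gradio_audio_to_float(audio):
    """Convert a (sample_rate, samples) tuple to mono float32 audio (hypothetical helper)."""
    sample_rate, samples = audio
    samples = np.asarray(samples)

    if np.issubdtype(samples.dtype, np.integer):
        # Scale signed integer PCM (for example int16) into the range [-1, 1).
        scale = np.iinfo(samples.dtype).max + 1.0
        samples = samples.astype(np.float32) / scale
    else:
        samples = samples.astype(np.float32)

    if samples.ndim == 2:
        # Shape is (n_samples, n_channels): average the channels to get mono.
        samples = samples.mean(axis=1)

    return sample_rate, samples


# One second of silence at 16 kHz, as int16 PCM.
sr, y = gradio_audio_to_float((16000, np.zeros(16000, dtype=np.int16)))
print(sr, y.dtype, y.shape)
```

The returned float array could then be passed to the model's own transcription method inside the callback, provided the sample rate matches what the model expects.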

record mode#

[7]:

model.gradio(record_mode = True)

[10]:

from IPython.display import Image, display

display(Image('record-mode.png', width=800))

upload mode#

[9]:

model.gradio(record_mode = False)

[11]:

from IPython.display import Image, display

display(Image('upload-mode.png', width=800))
