Text-to-Speech web inference using Gradio#

TTS Mel synthesis + Vocoder

This tutorial is available as an IPython notebook at malaya-speech/example/tts-gradio.

This module is not language independent, so it is not safe to use on different languages. The pretrained models were trained on hyperlocal languages.

[1]:
import malaya_speech
import numpy as np
from malaya_speech import Pipeline

For this example, I am going to use GlowTTS. Feel free to use any TTS model.

List available GlowTTS#

[2]:
malaya_speech.tts.available_glowtts()
[2]:
Model            Size (MB)  Quantized Size (MB)  Understand punctuation  Is lowercase
male             119        27.6                 True                    True
female           119        27.6                 True                    True
haqkiem          119        27.6                 True                    True
female-singlish  119        27.6                 True                    True
yasmin           119        27.6                 True                    False
osman            119        27.6                 True                    False
multispeaker     404        79.9                 True                    True
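As a quick reading of the table above, the following sketch computes how much smaller each quantized checkpoint is than its full-size counterpart. The sizes are copied from the table; the names `sizes_mb` and `ratios` are only illustrative.

```python
# Sizes in MB copied from the table above: (full size, quantized size).
sizes_mb = {
    'male': (119, 27.6),
    'female': (119, 27.6),
    'haqkiem': (119, 27.6),
    'female-singlish': (119, 27.6),
    'yasmin': (119, 27.6),
    'osman': (119, 27.6),
    'multispeaker': (404, 79.9),
}

# Ratio of full size to quantized size, rounded to two decimals.
ratios = {
    name: round(full / quantized, 2)
    for name, (full, quantized) in sizes_mb.items()
}

for name, ratio in ratios.items():
    print(f'{name}: quantized checkpoint is about {ratio}x smaller')
```

The single-speaker models shrink by roughly 4.3x and the multispeaker model by roughly 5.1x, which matters mostly for download size and memory, not necessarily for speed.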

Load GlowTTS model#

GlowTTS uses the text normalizer from Malaya, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Load-normalizer

Make sure you install Malaya version > 4.0 for this to work. For better speech synthesis, use Malaya version > 4.9.1:

pip install malaya -U
def glowtts(model: str = 'yasmin',
            quantized: bool = False,
            pad_to: int = 2,
            **kwargs):
    """
    Load GlowTTS TTS model.

    Parameters
    ----------
    model : str, optional (default='yasmin')
        Model architecture supported. Allowed values:

        * ``'female'`` - GlowTTS trained on female voice.
        * ``'male'`` - GlowTTS trained on male voice.
        * ``'haqkiem'`` - GlowTTS trained on Haqkiem voice, https://www.linkedin.com/in/haqkiem-daim/
        * ``'female-singlish'`` - GlowTTS trained on female Singlish voice, https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus
        * ``'yasmin'`` - GlowTTS trained on female Yasmin voice.
        * ``'osman'`` - GlowTTS trained on male Osman voice.
        * ``'multispeaker'`` - Multispeaker GlowTTS trained on male, female, husein and haqkiem voices, also able to do voice conversion.

    quantized : bool, optional (default=False)
        If True, will load the 8-bit quantized model.
        A quantized model is not necessarily faster; it depends entirely on the machine.
    pad_to : int, optional (default=2)
        Size of the padding added with character 0. Increasing it can stabilize prediction on short sentences; the models were trained with 2.

    Returns
    -------
    result : malaya_speech.model.synthesis.GlowTTS class
    """
[3]:
male = malaya_speech.tts.glowtts(model = 'male')
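The parameters documented above can be combined in a small wrapper that checks the model name before anything is downloaded. This is only a sketch: `load_glowtts` and `GLOWTTS_MODELS` are hypothetical names, and the only library call used is `malaya_speech.tts.glowtts` with the `model`, `quantized` and `pad_to` arguments shown in the docstring.

```python
# Model names copied from the table and docstring above.
GLOWTTS_MODELS = (
    'male', 'female', 'haqkiem', 'female-singlish',
    'yasmin', 'osman', 'multispeaker',
)


def load_glowtts(model='yasmin', quantized=False, pad_to=2):
    """Hypothetical helper: validate the name, then load the GlowTTS model."""
    if model not in GLOWTTS_MODELS:
        raise ValueError(
            f'unknown GlowTTS model {model!r}, expected one of {GLOWTTS_MODELS}'
        )
    # Imported lazily so the name check works even without malaya_speech installed.
    import malaya_speech
    return malaya_speech.tts.glowtts(model=model, quantized=quantized, pad_to=pad_to)
```

For example, `load_glowtts('yasmin', quantized=True)` would request the 8-bit quantized Yasmin model.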
[4]:
universal_melgan = malaya_speech.vocoder.melgan(model = 'universal-1024')
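The two objects loaded above form the "Mel synthesis + Vocoder" chain: the TTS model turns text into a mel spectrogram, and the vocoder turns that mel into a waveform. Below is a minimal sketch of the chain, assuming the TTS model's `predict` returns a dictionary with a `'universal-output'` mel entry and that the vocoder's `predict` accepts a list of mels. Both are assumptions about the malaya_speech API that this page does not show, and `synthesize` and `mel_key` are hypothetical names.

```python
def synthesize(tts, vocoder, text, mel_key='universal-output'):
    """Text -> mel spectrogram -> waveform (sketch, see assumptions above)."""
    # Assumed: predict returns a dict that holds the mel under `mel_key`.
    result = tts.predict(text)
    mel = result[mel_key]
    # Assumed: vocoder.predict takes a batch (list) of mels
    # and returns a batch of waveforms in the same order.
    return vocoder.predict([mel])[0]
```

With the objects above, this would be called as `synthesize(male, universal_melgan, 'some text')`.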

Web inference using Gradio#

def gradio(self, vocoder: Callable, **kwargs):
    """
    Text-to-Speech on Gradio interface.

    Parameters
    ----------
    vocoder: bool, Callable
        Vocoder object that has a `predict` method, preferably one from malaya_speech itself.

    **kwargs: keyword arguments for `predict` and `iface.launch`.
    """
[6]:
male.gradio(universal_melgan)
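For readers curious what a method like this might do internally, here is a rough sketch of wrapping a TTS model and a vocoder in a Gradio interface. It is not the library's implementation: `make_tts_fn`, `launch_demo`, the `'universal-output'` key and the 22050 Hz sample rate are all assumptions made for illustration.

```python
def make_tts_fn(tts, vocoder, sample_rate=22050,
                mel_key='universal-output', **predict_kwargs):
    """Build a text -> (sample_rate, waveform) function for a Gradio audio output."""
    def tts_fn(text):
        # Assumed: predict returns a dict holding the mel under `mel_key`.
        mel = tts.predict(text, **predict_kwargs)[mel_key]
        # Assumed: the vocoder takes a batch of mels and returns a batch of waveforms.
        waveform = vocoder.predict([mel])[0]
        # Gradio's audio output accepts a (sample_rate, numpy array) tuple.
        return sample_rate, waveform
    return tts_fn


def launch_demo(tts, vocoder, **launch_kwargs):
    """Launch a text-in, audio-out web demo (requires `pip install gradio`)."""
    import gradio as gr  # imported lazily so make_tts_fn works without gradio
    iface = gr.Interface(
        fn=make_tts_fn(tts, vocoder),
        inputs='text',
        outputs='audio',
    )
    return iface.launch(**launch_kwargs)
```

Keeping the prediction function separate from the interface makes it possible to check the text-to-audio step without starting a web server.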
[7]:
from IPython.display import Image, display

display(Image('tts.png', width=800))
[Image: tts.png]