Text-to-Speech web inference using Gradio

TTS Mel synthesis + Vocoder

This tutorial is available as an IPython notebook at malaya-speech/example/tts-gradio.

This module is not language independent, so it is not safe to use on other languages. The pretrained models were trained on hyperlocal languages.

[1]:

import malaya_speech
import numpy as np
from malaya_speech import Pipeline

For this example, I am going to use GlowTTS. Feel free to use any TTS model.

List available GlowTTS#

[2]:

malaya_speech.tts.available_glowtts()

[2]:

Model	Size (MB)	Quantized Size (MB)	Understand punctuation	Is lowercase
male	119	27.6	True	True
female	119	27.6	True	True
haqkiem	119	27.6	True	True
female-singlish	119	27.6	True	True
yasmin	119	27.6	True	False
osman	119	27.6	True	False
multispeaker	404	79.9	True	True
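The sizes in the table can be compared directly to see how much disk space the quantized variants save. The following is a minimal sketch that only redoes the arithmetic on the numbers listed above; the dictionary and the helper name `compression_ratio` are illustrative and not part of malaya_speech.

```python
# Sizes copied from the table above: name -> (size MB, quantized size MB).
SIZES_MB = {
    "male": (119, 27.6),
    "female": (119, 27.6),
    "haqkiem": (119, 27.6),
    "female-singlish": (119, 27.6),
    "yasmin": (119, 27.6),
    "osman": (119, 27.6),
    "multispeaker": (404, 79.9),
}


def compression_ratio(name: str) -> float:
    """Return the full size divided by the quantized size for one model."""
    full, quantized = SIZES_MB[name]
    return full / quantized


for name in ("male", "multispeaker"):
    print(f"{name}: {compression_ratio(name):.2f}x smaller when quantized")
# male: 4.31x smaller when quantized
# multispeaker: 5.06x smaller when quantized
```

A smaller file does not imply faster inference, so it is worth timing both variants on the target machine before choosing one.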

Load GlowTTS model#

GlowTTS uses the text normalizer from Malaya, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Load-normalizer.

Make sure you install Malaya version > 4.0 for this to work. For better speech synthesis, use Malaya version > 4.9.1:

pip install malaya -U

def glowtts(model: str = 'yasmin',
            quantized: bool = False,
            pad_to: int = 2,
            **kwargs):
    """
    Load GlowTTS TTS model.

    Parameters
    ----------
    model : str, optional (default='yasmin')
        Model architecture supported. Allowed values:

        * ``'female'`` - GlowTTS trained on female voice.
        * ``'male'`` - GlowTTS trained on male voice.
        * ``'haqkiem'`` - GlowTTS trained on Haqkiem voice, https://www.linkedin.com/in/haqkiem-daim/
        * ``'female-singlish'`` - GlowTTS trained on female Singlish voice, https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus
        * ``'yasmin'`` - GlowTTS trained on female Yasmin voice.
        * ``'osman'`` - GlowTTS trained on male Osman voice.
        * ``'multispeaker'`` - Multispeaker GlowTTS trained on male, female, husein and haqkiem voices, also able to do voice conversion.

    quantized : bool, optional (default=False)
        if True, will load 8-bit quantized model.
        A quantized model is not necessarily faster; it depends entirely on the machine.
    pad_to : int, optional (default=2)
        size of the padding with character 0. Increasing it can stabilize prediction on short sentences; we trained on 2.

    Returns
    -------
    result : malaya_speech.model.synthesis.GlowTTS class
    """

[3]:

male = malaya_speech.tts.glowtts(model = 'male')

[4]:

universal_melgan = malaya_speech.vocoder.melgan(model = 'universal-1024')
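The two objects loaded above form a two-stage pipeline: the TTS model turns text into a mel-spectrogram, and the vocoder turns that mel-spectrogram into a waveform. The sketch below shows only the shape of that composition using stand-in classes. `StandInTTS`, `StandInVocoder`, `synthesize` and the `"mel"` key are hypothetical names chosen for illustration, not the malaya_speech API, and the numbers they produce are fake.

```python
from typing import Dict, List

Mel = List[List[float]]  # frames x mel bins, plain lists to avoid dependencies


class StandInTTS:
    """Hypothetical stand-in for a TTS model: text -> mel-spectrogram."""

    def predict(self, text: str) -> Dict[str, Mel]:
        # One fake 4-bin frame per character; a real model predicts learned features.
        return {"mel": [[float(ord(c) % 7)] * 4 for c in text]}


class StandInVocoder:
    """Hypothetical stand-in for a vocoder: mel-spectrogram -> waveform."""

    hop = 3  # samples generated per mel frame (illustrative value)

    def predict(self, mel: Mel) -> List[float]:
        wave: List[float] = []
        for frame in mel:
            level = sum(frame) / len(frame) / 7.0  # scale the frame mean into [0, 1)
            wave.extend([level] * self.hop)
        return wave


def synthesize(text: str, tts, vocoder) -> List[float]:
    """Stage 1: text -> mel. Stage 2: mel -> waveform."""
    mel = tts.predict(text)["mel"]
    return vocoder.predict(mel)


wave = synthesize("hai", StandInTTS(), StandInVocoder())
print(len(wave))  # 3 characters * 3 samples per frame = 9
```

Because the two stages only meet at the mel-spectrogram, either side can be swapped out as long as both agree on the mel format.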

Web inference using Gradio#

def gradio(self, vocoder: Callable, **kwargs):
    """
    Text-to-Speech on Gradio interface.

    Parameters
    ----------
    vocoder: bool, Callable
        vocoder object that has `predict` method, prefer from malaya_speech itself.

    **kwargs: keyword arguments for `predict` and `iface.launch`.
    """

[6]:

male.gradio(universal_melgan)

[7]:

from IPython.display import Image, display

display(Image('tts.png', width=800))
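For readers who want to see what a text-to-audio Gradio app looks like in general, here is a minimal sketch. It does not reproduce how the `gradio` method is implemented in malaya_speech. `text_to_audio` is a hypothetical placeholder that emits a short tone instead of speech, `launch_demo` is an illustrative name, and the 22050 Hz sample rate is an assumption, so check the rate of the vocoder you load. `gr.Interface` and `launch` are the standard Gradio entry points, and an audio output accepts a `(sample_rate, numpy_array)` tuple.

```python
import math
from typing import List, Tuple

SAMPLE_RATE = 22050  # assumed output rate, not taken from the library


def text_to_audio(text: str) -> Tuple[int, List[float]]:
    """Placeholder callback: 0.1 s of a 440 Hz tone per input character.

    In a real app this is where the TTS model and the vocoder would be called.
    """
    n = (SAMPLE_RATE // 10) * len(text)
    samples = [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]
    return SAMPLE_RATE, samples


def launch_demo(**launch_kwargs) -> None:
    """Serve text_to_audio in the browser. Requires gradio and numpy."""
    import gradio as gr  # imported lazily so the rest of the file runs without it
    import numpy as np

    def fn(text: str):
        rate, samples = text_to_audio(text)
        return rate, np.asarray(samples, dtype=np.float32)

    iface = gr.Interface(fn=fn, inputs="text", outputs="audio")
    iface.launch(**launch_kwargs)


rate, samples = text_to_audio("hi")
print(rate, len(samples))  # 22050 4410
```

Calling `launch_demo()` starts the local web server; keyword arguments are forwarded to `launch` unchanged.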
