Text-to-Speech web inference using Gradio
TTS mel synthesis + vocoder
This tutorial is available as an IPython notebook at malaya-speech/example/tts-gradio.
This module is not language-independent, so it is not safe to use on other languages. The pretrained models were trained on hyperlocal languages.
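As the subtitle says, synthesis runs in two stages. A TTS model such as GlowTTS turns text into a mel spectrogram, and a vocoder such as MelGAN turns that spectrogram into a waveform. The sketch below only illustrates that data flow with stand-in functions. The names `fake_mel_synthesis` and `fake_vocoder`, the 80 mel bins, the fixed 5 frames per character, and the hop length of 256 samples per frame are all hypothetical assumptions, not malaya-speech internals.

```python
from typing import List

N_MELS = 80          # assumed number of mel bins (hypothetical)
HOP_LENGTH = 256     # assumed audio samples generated per mel frame (hypothetical)
FRAMES_PER_CHAR = 5  # hypothetical fixed duration per input character


def fake_mel_synthesis(text: str) -> List[List[float]]:
    """Stage 1 stand-in: text -> mel spectrogram of shape (frames, N_MELS)."""
    n_frames = len(text) * FRAMES_PER_CHAR
    # A real TTS model predicts these values; here every bin is zero.
    return [[0.0] * N_MELS for _ in range(n_frames)]


def fake_vocoder(mel: List[List[float]]) -> List[float]:
    """Stage 2 stand-in: mel spectrogram -> waveform, HOP_LENGTH samples per frame."""
    return [0.0] * (len(mel) * HOP_LENGTH)


def text_to_speech(text: str) -> List[float]:
    # The stages are simply composed: the vocoder consumes the mel output.
    mel = fake_mel_synthesis(text)
    return fake_vocoder(mel)


audio = text_to_speech("hello")
print(len(audio))  # 5 chars * 5 frames * 256 samples = 6400
```

Because the only contract between the stages is the mel spectrogram, either stand-in could in principle be swapped for a real model with a compatible mel format.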
[1]:
```python
import malaya_speech
import numpy as np
from malaya_speech import Pipeline
```
For this example, I am going to use GlowTTS. Feel free to use any TTS model.
List available GlowTTS models
[2]:
```python
malaya_speech.tts.available_glowtts()
```
[2]:
| Model | Size (MB) | Quantized Size (MB) | Understand punctuation | Is lowercase |
|---|---|---|---|---|
| male | 119 | 27.6 | True | True |
| female | 119 | 27.6 | True | True |
| haqkiem | 119 | 27.6 | True | True |
| female-singlish | 119 | 27.6 | True | True |
| yasmin | 119 | 27.6 | True | False |
| osman | 119 | 27.6 | True | False |
| multispeaker | 404 | 79.9 | True | True |
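To pick a model programmatically, for example under a download-size budget or by whether it keeps letter case, the rows of the table can be treated as plain data. This sketch copies the values above into a hand-written dict; `GLOWTTS_MODELS` and `pick_models` are hypothetical helpers and do not call malaya-speech.

```python
from typing import Dict, List, Optional, Tuple

# (size_mb, quantized_size_mb, understands_punctuation, is_lowercase),
# copied by hand from the table above.
Row = Tuple[int, float, bool, bool]
GLOWTTS_MODELS: Dict[str, Row] = {
    "male": (119, 27.6, True, True),
    "female": (119, 27.6, True, True),
    "haqkiem": (119, 27.6, True, True),
    "female-singlish": (119, 27.6, True, True),
    "yasmin": (119, 27.6, True, False),
    "osman": (119, 27.6, True, False),
    "multispeaker": (404, 79.9, True, True),
}


def pick_models(
    max_size_mb: float, quantized: bool = False, lowercase: Optional[bool] = None
) -> List[str]:
    """Return model names that fit the size budget and, optionally, the lowercase flag."""
    picked = []
    for name, (size, q_size, _punct, is_lower) in GLOWTTS_MODELS.items():
        effective_size = q_size if quantized else size
        if effective_size > max_size_mb:
            continue
        if lowercase is not None and is_lower != lowercase:
            continue
        picked.append(name)
    return picked


# Quantized models under 50 MB that do not lowercase their input.
print(pick_models(50, quantized=True, lowercase=False))  # ['yasmin', 'osman']
```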
Load GlowTTS model
Like FastSpeech2, GlowTTS uses the text normalizer from Malaya, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Load-normalizer.
Make sure Malaya version > 4.0 is installed for it to work; for better speech synthesis, use Malaya version > 4.9.1:
```bash
pip install malaya -U
```
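Before loading a model, you can confirm that the installed Malaya release meets the requirement above. This is a minimal sketch using the standard library's `importlib.metadata`. The `parse_version`, `installed_malaya` and `check_malaya` helpers are hypothetical; `parse_version` only handles plain numeric versions such as `4.9.1`, and `packaging.version.Version` is the more robust choice if that package is available.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.9.1' into (4, 9, 1); stops at the first non-numeric part (e.g. '1rc1')."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_malaya() -> Optional[str]:
    """Return the installed Malaya version string, or None if Malaya is missing."""
    try:
        return version("malaya")
    except PackageNotFoundError:
        return None


def check_malaya(minimum: str = "4.9.1") -> str:
    current = installed_malaya()
    if current is None:
        return "Malaya is not installed: pip install malaya -U"
    # Strict '>' mirrors the "version > 4.9.1" requirement.
    if parse_version(current) > parse_version(minimum):
        return f"Malaya {current} is newer than {minimum}"
    return f"Malaya {current} is not newer than {minimum}: pip install malaya -U"


print(check_malaya())
```

Tuples compare element by element, so `(4, 10, 0)` correctly ranks above `(4, 9, 1)`, which a plain string comparison would get wrong.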
```python
def glowtts(model: str = 'yasmin',
            quantized: bool = False,
            pad_to: int = 2,
            **kwargs):
    """
    Load a GlowTTS model.

    Parameters
    ----------
    model : str, optional (default='yasmin')
        Model architecture supported. Allowed values:

        * ``'female'`` - GlowTTS trained on female voice.
        * ``'male'`` - GlowTTS trained on male voice.
        * ``'haqkiem'`` - GlowTTS trained on Haqkiem voice, https://www.linkedin.com/in/haqkiem-daim/
        * ``'female-singlish'`` - GlowTTS trained on female Singlish voice, https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus
        * ``'yasmin'`` - GlowTTS trained on female Yasmin voice.
        * ``'osman'`` - GlowTTS trained on male Osman voice.
        * ``'multispeaker'`` - Multispeaker GlowTTS trained on male, female, husein and haqkiem voices, also able to do voice conversion.

    quantized : bool, optional (default=False)
        If True, load the 8-bit quantized model.
        A quantized model is not necessarily faster; it depends entirely on the machine.

    pad_to : int, optional (default=2)
        Size of the padding, filled with the 0 character. Increasing it can
        stabilize predictions on short sentences; we trained with 2.

    Returns
    -------
    result : malaya_speech.model.synthesis.GlowTTS class
    """
```
[3]:
```python
male = malaya_speech.tts.glowtts(model='male')
```
[4]:
```python
universal_melgan = malaya_speech.vocoder.melgan(model='universal-1024')
```
Web inference using Gradio
```python
from typing import Callable


def gradio(self, vocoder: Callable, **kwargs):
    """
    Text-to-Speech on a Gradio interface.

    Parameters
    ----------
    vocoder : Callable
        Vocoder object that has a `predict` method, preferably one from malaya_speech itself.
    **kwargs : keyword arguments for `predict` and `iface.launch`.
    """
```
[6]:
```python
male.gradio(universal_melgan)
```
[7]:
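```python
# IPython.display is the public module for these helpers.
from IPython.display import Image, display

# Display the saved image tts.png at 800 px wide.
display(Image('tts.png', width=800))
```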