FastSpeech2 long text

This tutorial is available as an IPython notebook at malaya-speech/example/fastspeech2-long-text.

This module is not language independent, so it is not safe to use on other languages. The pretrained models were trained on hyperlocal languages.

This is an application of the malaya-speech Pipeline; read more about it at malaya-speech/example/pipeline.

[1]:

import malaya_speech
import numpy as np
from malaya_speech import Pipeline
import matplotlib.pyplot as plt
import IPython.display as ipd

TTS model

We are going to use FastSpeech2, which is fast and accurate.

List available FastSpeech2

[2]:

malaya_speech.tts.available_fastspeech2()

[2]:

	Size (MB)	Quantized Size (MB)	Understand punctuation	Is lowercase
male	125	31.7	True	True
female	125	31.7	True	True
husein	125	31.7	True	True
haqkiem	125	31.7	True	True
female-singlish	125	31.7	True	True
osman	125	31.7	True	False
yasmin	125	31.7	True	False
yasmin-sdp	128	33.1	True	False
osman-sdp	128	33.1	True	False

The husein voice was contributed by Husein-Zolkepli, recorded using a low-end microphone in a small room with no reverberation absorber.

The haqkiem voice was contributed by Haqkiem Hamdan, recorded using a high-end microphone in an audio studio.

The female-singlish voice was contributed by the SG National Speech Corpus, recorded using a high-end microphone in an audio studio.

Load FastSpeech2 model

FastSpeech2 uses the text normalizer from Malaya, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Load-normalizer

Make sure you install Malaya version > 4.0 for it to work. For better speech synthesis, make sure the Malaya version is > 4.9.1:

pip install malaya -U

Read more about FastSpeech2 Text-to-Speech at https://malaya-speech.readthedocs.io/en/latest/tts-fastspeech2-model.html

[12]:

yasmin = malaya_speech.tts.fastspeech2(model = 'yasmin')
osman = malaya_speech.tts.fastspeech2(model = 'osman')

yasmin and osman are the best speakers for the long-text TTS task.

Load Vocoder model

I will use Universal MelGAN in this example. It gives better results, but it is slower than the individual vocoders.

[4]:

universal_melgan = malaya_speech.vocoder.melgan(model = 'universal-1024')

Load sentence tokenizer

We are going to use the Malaya sentence tokenizer, which can split sentences even in very complex patterns, https://malaya.readthedocs.io/en/latest/load-tokenizer.html#Sentence-tokenizer

Make sure Malaya is installed first:

pip3 install malaya

[5]:

import malaya
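
To make the role of the sentence tokenizer concrete, here is a minimal sketch of sentence splitting using only a regular expression. `naive_split_into_sentences` is a hypothetical stand-in written for illustration; it is not Malaya's implementation and will mis-split abbreviations such as "Dr.", which is one reason to prefer the Malaya tokenizer for real text.

```python
import re

def naive_split_into_sentences(text):
    """Hypothetical stand-in for a sentence tokenizer (not Malaya's)."""
    # Split on whitespace that follows '.', '!' or '?'.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # Drop empty pieces left over from blank input.
    return [p for p in parts if p]

sample = "Saya suka makan nasi. Awak pula bagaimana? Jom pergi!"
sentences = naive_split_into_sentences(sample)
```

Each resulting sentence is short enough to be synthesized on its own, which is the idea the pipeline in this tutorial relies on.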

[6]:

long_text = """
SHAH ALAM - Pertubuhan Kebajikan Anak Bersatu Selangor (PKABS) bersetuju pihak kerajaan mewujudkan Suruhanjaya Siasatan Diraja (RCI) untuk menyiasat isu kartel daging.

Pengerusinya, Rahmadin Alimuddin berkata, perkara tersebut perlu disiasat sebaiknya kerana isu logo halal palsu membimbangkan umat Islam di negara ini.

Menurutnya, siasatan juga dapat memastikan pembekal daging beku mengikut piawaian yang ditetapkan oleh kerajaan Malaysia sebelum menjualnya di pasaran.

“Saya berharap pihak yang menyiasat isu daging kartel dapat menyelesaikan isu ini dengan adil supaya rakyat Malaysia tidak ragu dengan daging beku yang berada di pasaran,” katanya ketika dihubungi Sinar Harian pada Sabtu.

Terdahulu, Rahmadin dan Presiden Gagasan Baru Harapan Malaysia (GBHM), Mohd Zulfitri Mohd Basir telah menghantar memorandum kepada Suruhanjaya Pencegahan Rasuah Malaysia (SPRM) bagi meminta pihak tersebut menjalankan siasatan berkenaan kes kartel daging import haram di Senai, Johor awal Disember lalu.

Penyerahan memorandum tersebut meminta pihak berkuasa mendedahkan kartel yang terlibat dalam sindiket itu selain meminta penjelasan daging tersebut boleh terlepas daripada pihak berkuasa.
"""

Use Pipeline

[30]:

p = Pipeline()
pipeline = (
    p.map(malaya.text.function.split_into_sentences)
    .foreach_map(yasmin)
    .foreach_map(lambda x: x['universal-output'])
    .foreach_map(universal_melgan)
)
p.visualize()
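
Conceptually, the pipeline above splits the text once and then applies each remaining step to every sentence. The plain-Python sketch below mirrors that data flow with stand-in functions in place of the real models. It is an illustration under stated assumptions, not how Pipeline is implemented: `fake_split`, `fake_tts`, `fake_vocoder` and `run_sketch` are hypothetical names, and the 80 mel bins and 256 samples per frame are arbitrary values chosen for the example.

```python
import re
import numpy as np

def fake_split(text):
    # Stand-in for malaya.text.function.split_into_sentences.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def fake_tts(sentence):
    # Stand-in for the FastSpeech2 model: returns a dict holding a
    # mel-spectrogram-like array under 'universal-output'.
    # One frame per character and 80 mel bins are arbitrary choices.
    return {'universal-output': np.zeros((len(sentence), 80))}

def fake_vocoder(mel):
    # Stand-in for the vocoder: 256 samples per frame is an arbitrary hop size.
    return np.zeros(mel.shape[0] * 256)

def run_sketch(text):
    sentences = fake_split(text)                         # p.map(split_into_sentences)
    tts_outputs = [fake_tts(s) for s in sentences]       # .foreach_map(yasmin)
    mels = [o['universal-output'] for o in tts_outputs]  # .foreach_map(lambda x: ...)
    return [fake_vocoder(m) for m in mels]               # .foreach_map(universal_melgan)

waveforms = run_sketch("Satu ayat. Dua ayat.")
```

The result is a list with one waveform per sentence, which is why the sentences have to be joined back together before playback.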

[31]:

%%time

r = p(long_text)

CPU times: user 2min 23s, sys: 30 s, total: 2min 53s
Wall time: 29.2 s
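
The wall time is easier to interpret relative to the length of the audio produced. The sketch below computes a real-time factor (processing time divided by audio duration), assuming the per-sentence waveforms are 1-D arrays sampled at 22050 Hz, the rate used for playback in this tutorial. `real_time_factor` is a hypothetical helper, not part of malaya-speech, and the waveforms here are synthetic stand-ins rather than real model output.

```python
import numpy as np

def real_time_factor(wall_seconds, waveforms, sample_rate=22050):
    """Hypothetical helper: processing time divided by audio duration."""
    total_samples = sum(len(w) for w in waveforms)
    audio_seconds = total_samples / sample_rate
    return wall_seconds / audio_seconds

# Synthetic stand-ins for the per-sentence waveforms: clips of 2 s and 3 s.
fake_waveforms = [np.zeros(2 * 22050), np.zeros(3 * 22050)]
rtf = real_time_factor(2.5, fake_waveforms)
```

A value below 1 means synthesis ran faster than the audio plays back.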

[32]:

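# 0.5 s of silence at 22050 Hz, inserted between consecutive sentences
silent_period = np.zeros(int(22050 * 0.5))
concated = []
for i in range(len(r['vocoder-melgan'])):
    c = [r['vocoder-melgan'][i]]
    # no trailing silence after the last sentence
    if i < len(r['vocoder-melgan']) - 1:
        c.append(silent_period)
    concated.extend(c)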
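
Because the same joining loop is needed for every speaker, it can be wrapped in a small function. The sketch below is one way to do that, assuming each waveform is a 1-D array; `join_with_silence` is a hypothetical helper, not part of malaya-speech.

```python
import numpy as np

def join_with_silence(waveforms, sample_rate=22050, silence_seconds=0.5):
    """Hypothetical helper: concatenate waveforms with silence in between."""
    if len(waveforms) == 0:
        # Nothing to join; return an empty waveform.
        return np.zeros(0)
    silence = np.zeros(int(sample_rate * silence_seconds))
    pieces = []
    for i, w in enumerate(waveforms):
        pieces.append(np.asarray(w, dtype=float))
        # No trailing silence after the last waveform.
        if i < len(waveforms) - 1:
            pieces.append(silence)
    return np.concatenate(pieces)

# Small demo at 1000 Hz so the sample counts are easy to check.
demo = join_with_silence([np.ones(100), np.ones(200)], sample_rate=1000)
```

With a pipeline result, this would presumably be called as `join_with_silence(r['vocoder-melgan'])` and the returned array passed to `ipd.Audio` with `rate = 22050`.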

[33]:

ipd.Audio(np.concatenate(concated), rate = 22050)

[23]:

p = Pipeline()
pipeline = (
    p.map(malaya.text.function.split_into_sentences)
    .foreach_map(osman)
    .foreach_map(lambda x: x['universal-output'])
    .foreach_map(universal_melgan)
)
p.visualize()

[24]:

%%time

r = p(long_text)

CPU times: user 2min 10s, sys: 28.3 s, total: 2min 38s
Wall time: 34.1 s

[28]:

concated = []
for i in range(len(r['vocoder-melgan'])):
    c = [r['vocoder-melgan'][i]]
    if i < len(r['vocoder-melgan']) - 1:
        c.append(silent_period)
    concated.extend(c)

[29]:

ipd.Audio(np.concatenate(concated), rate = 22050)
