Remove silents using VAD#

Removing silence is actually pretty hard. The traditional approach picks a dB threshold and treats any window of a certain size whose level falls below it as silence. The problem is that a threshold that works for one audio sample, say -20 dB, will not necessarily work for other samples.
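
To make the threshold problem concrete, here is a minimal sketch of window-based thresholding using only NumPy. The helper name `silent_frame_mask`, the 400-sample window and the synthetic tone are assumptions made for illustration; they are not part of malaya-speech.

```python
import numpy as np

def silent_frame_mask(y, threshold_db, frame_length=400):
    """Return one boolean per frame, True where the frame RMS level is below threshold_db."""
    n_frames = len(y) // frame_length
    frames = y[: n_frames * frame_length].reshape(n_frames, frame_length)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Floor the RMS so that digital silence does not produce log10(0).
    level_db = 20 * np.log10(np.maximum(rms, 1e-10))
    return level_db < threshold_db

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)

# One second of tone, one second of silence, one second of tone.
loud = np.concatenate([0.5 * tone, np.zeros(sr), 0.5 * tone])
# The same recording, just captured at a much lower gain.
quiet = 0.05 * loud

loud_mask = silent_frame_mask(loud, threshold_db=-20)
quiet_mask = silent_frame_mask(quiet, threshold_db=-20)

print('loud recording, fraction flagged silent :', loud_mask.mean())
print('quiet recording, fraction flagged silent:', quiet_mask.mean())
```

With the same -20 dB threshold, only the real gap is flagged in the loud recording, while every frame of the quiet recording is flagged, including the parts that contain the tone.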

This tutorial is available as an IPython notebook at malaya-speech/example/remove-silents-vad.

This module is language independent, so it is safe to use on different languages.

This is an application of malaya-speech Pipeline. Read more about malaya-speech Pipeline at malaya-speech/example/pipeline.

import malaya_speech
import numpy as np
import librosa
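from malaya_speech import Pipeline

def norm_mel(y, sr):
    # Log-scaled mel spectrogram, transposed to (frames, n_mels).
    # Recent librosa versions require y to be passed as a keyword argument.
    mel = librosa.feature.melspectrogram(y = y, sr = sr, n_mels = 80)
    return np.log10(np.maximum(mel, 1e-10)).T

def plot(y, sr):
    mel = norm_mel(y, sr)
    fig, axs = plt.subplots(2, figsize=(10, 8))
    # Top panel: waveform. Bottom panel: mel spectrogram.
    axs[0].plot(y)
    im = axs[1].imshow(np.rot90(mel), aspect='auto', interpolation='none')
    fig.colorbar(mappable=im, shrink=0.65, orientation='horizontal', ax=axs[1])
    plt.show()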

Load easy example#

y, sr = malaya_speech.load('speech/podcast/nusantara.wav')
len(y) / sr
import matplotlib.pyplot as plt
import IPython.display as ipd
ipd.Audio(y, rate = sr)
plot(y, sr)

If you look at the waveform graph or the mel graph, you can see silent periods at the start, middle and end.

Use librosa.effects.trim#

y_ = librosa.effects.trim(y, top_db = 20)[0]
ipd.Audio(y_, rate = sr)
plot(y_, sr)

Looks good, but it missed the silence in the middle, because librosa.effects.trim only trims leading and trailing silence.

Use pydub.silence.split_on_silence#

from pydub import AudioSegment
from pydub.silence import split_on_silence

Before converting the float np.array into an AudioSegment, we need to cast it to int.
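
As a rough illustration of what such a cast involves, the sketch below scales float samples in [-1, 1] to 16-bit PCM and back. The helpers `float_to_int16` and `int16_to_float` are hypothetical names written for this example; they are not the implementation of `malaya_speech.astype.float_to_int`, which may scale or round differently.

```python
import numpy as np

INT16_MAX = np.iinfo(np.int16).max  # 32767

def float_to_int16(y):
    """Scale float audio in [-1, 1] to 16-bit signed PCM (hypothetical helper)."""
    y = np.clip(y, -1.0, 1.0)
    return (y * INT16_MAX).astype(np.int16)

def int16_to_float(y_int):
    """Inverse of float_to_int16, up to quantisation error (hypothetical helper)."""
    return y_int.astype(np.float32) / INT16_MAX

y_float = np.array([0.0, 0.5, -1.0, 1.0], dtype=np.float32)
y_pcm = float_to_int16(y_float)

print(y_pcm.dtype, y_pcm.dtype.itemsize)  # itemsize is the sample width in bytes
print(int16_to_float(y_pcm))
```

The `itemsize` of the integer array is what gets passed as `sample_width` when building the AudioSegment.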

y_int = malaya_speech.astype.float_to_int(y)
audio = AudioSegment(
    y_int.tobytes(),
    frame_rate = sr,
    sample_width = y_int.dtype.itemsize,
    channels = 1
)
audio_chunks = split_on_silence(
    audio,
    min_silence_len = 200,
    silence_thresh = -30,
    keep_silence = 100,
)
audio_chunks
[<pydub.audio_segment.AudioSegment at 0x14fb01810>,
 <pydub.audio_segment.AudioSegment at 0x14fb01950>,
 <pydub.audio_segment.AudioSegment at 0x14fb01990>,
 <pydub.audio_segment.AudioSegment at 0x14fb01dd0>,
 <pydub.audio_segment.AudioSegment at 0x14fb07490>]
y_ = sum(audio_chunks)
y_ = np.array(y_.get_array_of_samples())
y_ = malaya_speech.astype.int_to_float(y_)
ipd.Audio(y_, rate = sr)