{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Text-to-Speech VITS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "VITS, End-to-End." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This tutorial is available as an IPython notebook at [malaya-speech/example/tts-vits](https://github.com/huseinzol05/malaya-speech/tree/master/example/tts-vits).\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This module is not language independent, so it is not safe to use on different languages. The pretrained models were trained on hyperlocal languages.\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This is an application of the malaya-speech Pipeline; read more about it at [malaya-speech/example/pipeline](https://github.com/huseinzol05/malaya-speech/tree/master/example/pipeline).\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Requires PyTorch >= 1.10.\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "os.environ['CUDA_VISIBLE_DEVICES'] = ''" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import malaya_speech\n", "import numpy as np\n", "from malaya_speech import Pipeline\n", "import matplotlib.pyplot as plt\n", "import IPython.display as ipd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### VITS description\n", "\n", "1. Malaya-speech VITS generate End-to-End, from text input into waveforms with 22050 sample rate.\n", "2. No length limit, but to get better results, split the text." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List available VITS" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>Size (MB)</th><th>Understand punctuation</th><th>Is lowercase</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>mesolitica/VITS-osman</th><td>145</td><td>True</td><td>False</td></tr>\n",
"    <tr><th>mesolitica/VITS-yasmin</th><td>145</td><td>True</td><td>False</td></tr>\n",
"    <tr><th>mesolitica/VITS-female-singlish</th><td>145</td><td>True</td><td>True</td></tr>\n",
"    <tr><th>mesolitica/VITS-haqkiem</th><td>145</td><td>True</td><td>True</td></tr>\n",
"  </tbody>\n",
"</table>
" ], "text/plain": [ " Size (MB) Understand punctuation Is lowercase\n", "mesolitica/VITS-osman 145 True False\n", "mesolitica/VITS-yasmin 145 True False\n", "mesolitica/VITS-female-singlish 145 True True\n", "mesolitica/VITS-haqkiem 145 True True" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya_speech.tts.available_vits()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load VITS model\n", "\n", "Fastspeech2 use text normalizer from Malaya, https://malaya.readthedocs.io/en/latest/load-normalizer.html#Load-normalizer,\n", "\n", "Make sure you install Malaya version > 4.0 to make it works, **to get better speech synthesis, make sure Malaya version > 4.9.1**,\n", "\n", "```bash\n", "pip install malaya -U\n", "```\n", "\n", "```python\n", "def vits(model: str = 'mesolitica/VITS-osman', **kwargs):\n", " \"\"\"\n", " Load VITS End-to-End TTS model.\n", "\n", " Parameters\n", " ----------\n", " model : str, optional (default='male')\n", " Model architecture supported. 
Allowed values:\n", "\n", " * ``'mesolitica/VITS-osman'`` - VITS trained on male Osman voice.\n", " * ``'mesolitica/VITS-yasmin'`` - VITS trained on female Yasmin voice.\n", " * ``'mesolitica/VITS-female-singlish'`` - VITS trained on female Singlish voice.\n", " * ``'mesolitica/VITS-haqkiem'`` - VITS trained on Haqkiem voice.\n", "\n", " Returns\n", " -------\n", " result : malaya_speech.torch_model.synthesis.VITS class\n", " \"\"\"\n", "```" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "osman = malaya_speech.tts.vits(model = 'mesolitica/VITS-osman')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# https://www.sinarharian.com.my/article/115216/BERITA/Politik/Syed-Saddiq-pertahan-Dr-Mahathir\n", "string1 = 'Syed Saddiq berkata, mereka seharusnya mengingati bahawa semasa menjadi Perdana Menteri Pakatan Harapan'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Predict\n", "\n", "```python\n", "def predict(\n", " self,\n", " string,\n", " temperature: float = 0.6666,\n", " temperature_durator: float = 0.6666,\n", " length_ratio: float = 1.0,\n", " **kwargs,\n", "):\n", " \"\"\"\n", " Change string to waveform.\n", "\n", " Parameters\n", " ----------\n", " string: str\n", " temperature: float, optional (default=0.6666)\n", " Decoder model decodes with encoder(text) + random.normal() * temperature.\n", " temperature_durator: float, optional (default=0.6666)\n", " Durator predicts alignment with random.normal() * temperature_durator.\n", " length_ratio: float, optional (default=1.0)\n", " Increasing this value lengthens the generated speech.\n", "\n", " Returns\n", " -------\n", " result: Dict[string, ids, alignment, y]\n", " \"\"\"\n", "```\n", "\n", "It can only predict one text per forward pass." 
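, "

Since `predict` handles one text per call and splitting the text is said to give better results, long input can be split at sentence boundaries first. The following is a minimal sketch using only the Python standard library; `split_text` and `max_chars` are hypothetical names chosen for illustration and are not part of malaya-speech, and the 200-character default is an arbitrary assumption.

```python
import re


def split_text(text, max_chars=200):
    '''Group whole sentences into chunks of at most max_chars characters.

    Hypothetical helper, not part of malaya-speech. A single sentence
    longer than max_chars is kept intact rather than cut mid-sentence.
    '''
    # Break after sentence-ending punctuation that is followed by spaces.
    sentences = [s.strip() for s in re.split('(?<=[.!?]) +', text) if s.strip()]
    chunks, current = [], ''
    for sentence in sentences:
        candidate = (current + ' ' + sentence).strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


example = 'Ayat pertama. Ayat kedua! Ayat ketiga?'
print(split_text(example, max_chars=20))
```

Each chunk can then be passed to `osman.predict` in turn. Assuming each returned `y` is a 1-D NumPy array, the pieces could be joined with `np.concatenate` before playback with `ipd.Audio` at the 22050 sample rate.
"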
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['string', 'ids', 'alignment', 'y'])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r_osman = osman.predict(string1)\n", "r_osman.keys()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig = plt.figure(figsize=(8, 6))\n", "ax = fig.add_subplot(111)\n", "ax.set_title('Alignment steps')\n", "im = ax.imshow(\n", " r_osman['alignment'],\n", " aspect='auto',\n", " origin='lower',\n", " interpolation='none')\n", "fig.colorbar(im, ax=ax)\n", "xlabel = 'Decoder timestep'\n", "plt.xlabel(xlabel)\n", "plt.ylabel('Encoder timestep')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ipd.Audio(r_osman['y'], rate = 22050)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "string2 = 'Haqkiem adalah pelajar tahun akhir yang mengambil Ijazah Sarjana Muda Sains Komputer Kecerdasan Buatan utama dari Universiti Teknikal Malaysia Melaka (UTeM) yang kini berusaha untuk latihan industri di mana dia secara praktikal dapat menerapkan pengetahuannya dalam Perisikan Perisian dan Pengaturcaraan ke arah organisasi atau industri yang berkaitan.'" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['string', 'ids', 'alignment', 'y'])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r_osman = osman.predict(string2)\n", "r_osman.keys()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig = plt.figure(figsize=(8, 6))\n", "ax = fig.add_subplot(111)\n", "ax.set_title('Alignment steps')\n", "im = ax.imshow(\n", " r_osman['alignment'],\n", " aspect='auto',\n", " origin='lower',\n", " interpolation='none')\n", "fig.colorbar(im, ax=ax)\n", "xlabel = 'Decoder timestep'\n", "plt.xlabel(xlabel)\n", "plt.ylabel('Encoder timestep')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ipd.Audio(r_osman['y'], rate = 22050)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }