{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Speaker Vector Nemo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This tutorial is available as an IPython notebook at [malaya-speech/example/speaker-vector-nemo](https://github.com/huseinzol05/malaya-speech/tree/master/example/speaker-vector-nemo).\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This module is language independent, so it is safe to use on different languages. The pretrained models were trained on multiple languages.\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Installing Nvidia Nemo is not required; Malaya-Speech already exports only the necessary code and models.\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This is an application of the malaya-speech Pipeline; read more about it at [malaya-speech/example/pipeline](https://github.com/huseinzol05/malaya-speech/tree/master/example/pipeline).\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "os.environ['CUDA_VISIBLE_DEVICES'] = ''" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`pyaudio` is not available, `malaya_speech.streaming.pyaudio_vad.stream` is not able to use.\n" ] } ], "source": [ "from malaya_speech import Pipeline\n", "import malaya_speech\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import logging\n", "\n", "logging.basicConfig(level=logging.INFO)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List available Nemo models" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:malaya_speech.speaker_vector:tested on VoxCeleb2 test set. Lower EER is better.\n", "INFO:malaya_speech.speaker_vector:download the test set at https://github.com/huseinzol05/malaya-speech/tree/master/data/voxceleb\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Model                           Size (MB)  Embedding Size  EER      original from
huseinzol05/nemo-ecapa-tdnn     96.8       192             0.02492  https://catalog.ngc.nvidia.com/orgs/nvidia/tea...
huseinzol05/nemo-speakernet     23.6       192             0.04279  https://catalog.ngc.nvidia.com/orgs/nvidia/tea...
huseinzol05/nemo-titanet_large  101.6      192             0.02278  https://catalog.ngc.nvidia.com/orgs/nvidia/tea...
\n", "
" ], "text/plain": [ " Size (MB) Embedding Size EER \\\n", "huseinzol05/nemo-ecapa-tdnn 96.8 192 0.02492 \n", "huseinzol05/nemo-speakernet 23.6 192 0.04279 \n", "huseinzol05/nemo-titanet_large 101.6 192 0.02278 \n", "\n", " original from \n", "huseinzol05/nemo-ecapa-tdnn https://catalog.ngc.nvidia.com/orgs/nvidia/tea... \n", "huseinzol05/nemo-speakernet https://catalog.ngc.nvidia.com/orgs/nvidia/tea... \n", "huseinzol05/nemo-titanet_large https://catalog.ngc.nvidia.com/orgs/nvidia/tea... " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya_speech.speaker_vector.available_nemo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The smaller the EER, the better the model**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Nemo model\n", "\n", "```python\n", "def nemo(\n", "    model: str = 'huseinzol05/nemo-ecapa-tdnn',\n", "    **kwargs,\n", "):\n", "    \"\"\"\n", "    Load Nemo Speaker verification model.\n", "\n", "    Parameters\n", "    ----------\n", "    model : str, optional (default='huseinzol05/nemo-ecapa-tdnn')\n", "        Check available models at `malaya_speech.speaker_vector.available_nemo()`.\n", "\n", "    Returns\n", "    -------\n", "    result : malaya_speech.torch_model.nemo.Model class\n", "    \"\"\"\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:malaya_boilerplate.huggingface:downloading frozen huseinzol05/nemo-ecapa-tdnn/model_config.yaml\n", "INFO:malaya_boilerplate.huggingface:downloading frozen huseinzol05/nemo-ecapa-tdnn/model_weights.ckpt\n", "INFO:malaya_speech.utils.nemo_featurization:PADDING: 16\n" ] } ], "source": [ "model = malaya_speech.speaker_vector.nemo('huseinzol05/nemo-ecapa-tdnn')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from glob import glob\n", "\n", "speakers = 
['speech/example-speaker/khalil-nooh.wav',\n", "'speech/example-speaker/mas-aisyah.wav',\n", "'speech/example-speaker/shafiqah-idayu.wav',\n", "'speech/example-speaker/husein-zolkepli.wav'\n", " ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pipeline" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def load_wav(file):\n", " return malaya_speech.load(file)[0]\n", "\n", "p = Pipeline()\n", "frame = p.foreach_map(load_wav).map(model)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p.visualize()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "r = p(speakers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculate similarity" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1. , -0.30169573, -0.33176271, -0.24950222],\n", " [-0.30169573, 1. , -0.39778761, -0.39475821],\n", " [-0.33176271, -0.39778761, 1. , -0.30796176],\n", " [-0.24950222, -0.39475821, -0.30796176, 1. 
]])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from scipy.spatial.distance import cdist\n", "\n", "1 - cdist(r['speaker-vector-nemo'], r['speaker-vector-nemo'], metric = 'cosine')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember, our files are,\n", "\n", "```python\n", "['speech/example-speaker/khalil-nooh.wav',\n", " 'speech/example-speaker/mas-aisyah.wav',\n", " 'speech/example-speaker/shafiqah-idayu.wav',\n", " 'speech/example-speaker/husein-zolkepli.wav']\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }
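
To make the similarity step concrete, here is a minimal plain-Python sketch of the quantity that `1 - cdist(X, X, metric = 'cosine')` produces: the cosine similarity between every pair of vectors. The helper names `cosine_similarity` and `similarity_matrix` and the toy 3-dimensional vectors are assumptions invented for illustration. They are not part of malaya-speech, and the real speaker vectors listed above have 192 dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarity, i.e. 1 minus the pairwise cosine distance."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy vectors standing in for speaker vectors (hypothetical values).
toy_vectors = [
    [1.0, 0.0, 0.0],   # reference direction
    [0.0, 1.0, 0.0],   # orthogonal to the reference
    [-1.0, 0.0, 0.0],  # opposite to the reference
]

matrix = similarity_matrix(toy_vectors)
for row in matrix:
    print([round(value, 3) for value in row])
```

The diagonal is always 1 because each vector is compared with itself, and the matrix is symmetric. In the notebook's result every off-diagonal entry is negative, which is consistent with the four files coming from four different speakers.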