{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Mock Tensorflow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "This tutorial is available as an IPython notebook at [Malaya/example/mock-tensorflow](https://github.com/huseinzol05/Malaya/tree/master/example/mock-tensorflow).\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting with Malaya-Speech 1.4.0 and malaya-boilerplate 0.0.24, Tensorflow is no longer necessary to install and if Tensorflow absent, it will be replaced with mock object.\n", "\n", "Let say you installed Malaya on a fresh machine or using a virtual environment," ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "!~/huggingface/bin/pip3 freeze | grep 'tensorflow'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This virtual environment does not have Tensorflow installed." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/husein/dev/malaya-boilerplate/malaya_boilerplate/frozen_graph.py:46: UserWarning: Cannot import beam_search_ops from Tensorflow 1, ['malaya.jawi_rumi.deep_model', 'malaya.phoneme.deep_model', 'malaya.rumi_jawi.deep_model', 'malaya.stem.deep_model'] for stemmer will not available to use, make sure Tensorflow 1 version >= 1.15\n", " warnings.warn(\n", "/home/husein/huggingface/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n", "`openai-whisper` is not available, native whisper processor is not available, will use huggingface processor instead.\n", "`pyaudio` is not available, `malaya_speech.streaming.stream` is not able to use.\n" ] } ], "source": [ "import malaya_speech" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import tensorflow as tf\n", "\n", "tf.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, everything is a Mock object, what happened if you tried to call a model using Tensorflow as backend?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Call Tensorflow model" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Size (MB)Quantized Size (MB)malay-malayamalay-fleur102Languagesinglish
tiny-conformer24.49.14{'WER': 0.2128108, 'CER': 0.08136871, 'WER-LM'...{'WER': 0.2682816, 'CER': 0.13052725, 'WER-LM'...[malay]NaN
small-conformer49.218.1{'WER': 0.19853302, 'CER': 0.07449528, 'WER-LM...{'WER': 0.23412149, 'CER': 0.1138314813, 'WER-...[malay]NaN
conformer12537.1{'WER': 0.16340855635999124, 'CER': 0.05897205...{'WER': 0.20090442596, 'CER': 0.09616901, 'WER...[malay]NaN
large-conformer404107{'WER': 0.1566839, 'CER': 0.0619715, 'WER-LM':...{'WER': 0.1711028238, 'CER': 0.077953559, 'WER...[malay]NaN
conformer-stack-2mixed13038.5{'WER': 0.1889883954, 'CER': 0.0726845531, 'WE...{'WER': 0.244836948, 'CER': 0.117409327, 'WER-...[malay, singlish]{'WER': 0.08535878149, 'CER': 0.0452357273822,...
small-conformer-singlish49.218.1NaNNaN[singlish]{'WER': 0.087831, 'CER': 0.0456859, 'WER-LM': ...
conformer-singlish12537.1NaNNaN[singlish]{'WER': 0.07779246, 'CER': 0.0403616, 'WER-LM'...
large-conformer-singlish404107NaNNaN[singlish]{'WER': 0.07014733, 'CER': 0.03587201, 'WER-LM...
\n", "
" ], "text/plain": [ " Size (MB) Quantized Size (MB) \\\n", "tiny-conformer 24.4 9.14 \n", "small-conformer 49.2 18.1 \n", "conformer 125 37.1 \n", "large-conformer 404 107 \n", "conformer-stack-2mixed 130 38.5 \n", "small-conformer-singlish 49.2 18.1 \n", "conformer-singlish 125 37.1 \n", "large-conformer-singlish 404 107 \n", "\n", " malay-malaya \\\n", "tiny-conformer {'WER': 0.2128108, 'CER': 0.08136871, 'WER-LM'... \n", "small-conformer {'WER': 0.19853302, 'CER': 0.07449528, 'WER-LM... \n", "conformer {'WER': 0.16340855635999124, 'CER': 0.05897205... \n", "large-conformer {'WER': 0.1566839, 'CER': 0.0619715, 'WER-LM':... \n", "conformer-stack-2mixed {'WER': 0.1889883954, 'CER': 0.0726845531, 'WE... \n", "small-conformer-singlish NaN \n", "conformer-singlish NaN \n", "large-conformer-singlish NaN \n", "\n", " malay-fleur102 \\\n", "tiny-conformer {'WER': 0.2682816, 'CER': 0.13052725, 'WER-LM'... \n", "small-conformer {'WER': 0.23412149, 'CER': 0.1138314813, 'WER-... \n", "conformer {'WER': 0.20090442596, 'CER': 0.09616901, 'WER... \n", "large-conformer {'WER': 0.1711028238, 'CER': 0.077953559, 'WER... \n", "conformer-stack-2mixed {'WER': 0.244836948, 'CER': 0.117409327, 'WER-... \n", "small-conformer-singlish NaN \n", "conformer-singlish NaN \n", "large-conformer-singlish NaN \n", "\n", " Language \\\n", "tiny-conformer [malay] \n", "small-conformer [malay] \n", "conformer [malay] \n", "large-conformer [malay] \n", "conformer-stack-2mixed [malay, singlish] \n", "small-conformer-singlish [singlish] \n", "conformer-singlish [singlish] \n", "large-conformer-singlish [singlish] \n", "\n", " singlish \n", "tiny-conformer NaN \n", "small-conformer NaN \n", "conformer NaN \n", "large-conformer NaN \n", "conformer-stack-2mixed {'WER': 0.08535878149, 'CER': 0.0452357273822,... \n", "small-conformer-singlish {'WER': 0.087831, 'CER': 0.0456859, 'WER-LM': ... \n", "conformer-singlish {'WER': 0.07779246, 'CER': 0.0403616, 'WER-LM'... \n", "large-conformer-singlish {'WER': 0.07014733, 'CER': 0.03587201, 'WER-LM... " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya_speech.stt.transducer.available_transformer()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "tensorflow is not installed. 
Please install it and try again.", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[6], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m model \u001b[38;5;241m=\u001b[39m \u001b[43mmalaya_speech\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mstt\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtransducer\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtransformer\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mtiny-conformer\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/huggingface/lib/python3.8/site-packages/herpetologist/__init__.py:100\u001b[0m, in \u001b[0;36mcheck_type..check\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 97\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m p, v \u001b[38;5;129;01min\u001b[39;00m kwargs\u001b[38;5;241m.\u001b[39mitems():\n\u001b[1;32m 98\u001b[0m nested_check(v, p)\n\u001b[0;32m--> 100\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/dev/malaya-speech/malaya_speech/stt/transducer.py:220\u001b[0m, in \u001b[0;36mtransformer\u001b[0;34m(model, quantized, **kwargs)\u001b[0m\n\u001b[1;32m 215\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m model \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m _transformer_availability:\n\u001b[1;32m 216\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m 217\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mmodel not supported, please check supported models from `malaya_speech.stt.transducer.available_transformer()`.\u001b[39m\u001b[38;5;124m'\u001b[39m\n\u001b[1;32m 218\u001b[0m )\n\u001b[0;32m--> 220\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mstt\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtransducer_load\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 221\u001b[0m \u001b[43m \u001b[49m\u001b[43mmodel\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmodel\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 222\u001b[0m \u001b[43m \u001b[49m\u001b[43mmodule\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mspeech-to-text-transducer\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 223\u001b[0m \u001b[43m \u001b[49m\u001b[43mlanguages\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m_transformer_availability\u001b[49m\u001b[43m[\u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m]\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mLanguage\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 224\u001b[0m \u001b[43m \u001b[49m\u001b[43mquantized\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mquantized\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 225\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\n\u001b[1;32m 226\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n", "File 
\u001b[0;32m~/dev/malaya-speech/malaya_speech/supervised/stt.py:80\u001b[0m, in \u001b[0;36mtransducer_load\u001b[0;34m(model, module, languages, quantized, stt, **kwargs)\u001b[0m\n\u001b[1;32m 72\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 73\u001b[0m path \u001b[38;5;241m=\u001b[39m check_file(\n\u001b[1;32m 74\u001b[0m file\u001b[38;5;241m=\u001b[39mmodel,\n\u001b[1;32m 75\u001b[0m module\u001b[38;5;241m=\u001b[39mmodule,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 78\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[1;32m 79\u001b[0m )\n\u001b[0;32m---> 80\u001b[0m vocab \u001b[38;5;241m=\u001b[39m \u001b[43msubword_load\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mvocab\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 81\u001b[0m g \u001b[38;5;241m=\u001b[39m load_graph(path[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mmodel\u001b[39m\u001b[38;5;124m'\u001b[39m], \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n\u001b[1;32m 82\u001b[0m featurizer \u001b[38;5;241m=\u001b[39m STTFeaturizer(normalize_per_feature\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n", "File \u001b[0;32m~/dev/malaya-speech/malaya_speech/utils/subword.py:47\u001b[0m, in \u001b[0;36mload\u001b[0;34m(path)\u001b[0m\n\u001b[1;32m 43\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mload\u001b[39m(path: \u001b[38;5;28mstr\u001b[39m):\n\u001b[1;32m 44\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 45\u001b[0m \u001b[38;5;124;03m Load text file into subword dictionary.\u001b[39;00m\n\u001b[1;32m 46\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m---> 47\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mSubwordTextEncoder\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload_from_file\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/dev/malaya-speech/malaya_speech/utils/text_encoder/subword_encoder.py:266\u001b[0m, in \u001b[0;36mSubwordTextEncoder.load_from_file\u001b[0;34m(cls, filename_prefix)\u001b[0m\n\u001b[1;32m 264\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Extracts list of subwords from file.\"\"\"\u001b[39;00m\n\u001b[1;32m 265\u001b[0m filename \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mcls\u001b[39m\u001b[38;5;241m.\u001b[39m_filename(filename_prefix)\n\u001b[0;32m--> 266\u001b[0m lines, _ \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mcls\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_read_lines_from_file\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfilename\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 267\u001b[0m \u001b[38;5;66;03m# Strip wrapping single quotes\u001b[39;00m\n\u001b[1;32m 268\u001b[0m vocab_list \u001b[38;5;241m=\u001b[39m [line[\u001b[38;5;241m1\u001b[39m:\u001b[38;5;241m-\u001b[39m\u001b[38;5;241m1\u001b[39m] \u001b[38;5;28;01mfor\u001b[39;00m line \u001b[38;5;129;01min\u001b[39;00m lines]\n", "File \u001b[0;32m~/dev/malaya-speech/malaya_speech/utils/text_encoder/__init__.py:117\u001b[0m, in \u001b[0;36mTextEncoder._read_lines_from_file\u001b[0;34m(cls, filename)\u001b[0m\n\u001b[1;32m 115\u001b[0m \u001b[38;5;129m@classmethod\u001b[39m\n\u001b[1;32m 116\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_read_lines_from_file\u001b[39m(\u001b[38;5;28mcls\u001b[39m, filename):\n\u001b[0;32m--> 117\u001b[0m 
\u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mread_lines_from_file\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mcls\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;18;43m__name__\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mfilename\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/dev/malaya-speech/malaya_speech/utils/text_encoder/__init__.py:537\u001b[0m, in \u001b[0;36mread_lines_from_file\u001b[0;34m(cls_name, filename)\u001b[0m\n\u001b[1;32m 535\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mread_lines_from_file\u001b[39m(cls_name, filename):\n\u001b[1;32m 536\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"Read lines from file, parsing out header and metadata.\"\"\"\u001b[39;00m\n\u001b[0;32m--> 537\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[43mGFile\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfilename\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mrb\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m \u001b[38;5;28;01mas\u001b[39;00m f:\n\u001b[1;32m 538\u001b[0m lines \u001b[38;5;241m=\u001b[39m [tf\u001b[38;5;241m.\u001b[39mcompat\u001b[38;5;241m.\u001b[39mas_text(line)[:\u001b[38;5;241m-\u001b[39m\u001b[38;5;241m1\u001b[39m] \u001b[38;5;28;01mfor\u001b[39;00m line \u001b[38;5;129;01min\u001b[39;00m f]\n\u001b[1;32m 539\u001b[0m header_line \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m'\u001b[39m \u001b[38;5;241m%\u001b[39m (_HEADER_PREFIX, cls_name)\n", "File \u001b[0;32m~/dev/malaya-boilerplate/malaya_boilerplate/__init__.py:48\u001b[0m, in \u001b[0;36mMock.__call__\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 47\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m__call__\u001b[39m(\u001b[38;5;28mself\u001b[39m, \u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[0;32m---> 48\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mparent_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m is not installed. Please install it and try again.\u001b[39m\u001b[38;5;124m'\u001b[39m)\n", "\u001b[0;31mValueError\u001b[0m: tensorflow is not installed. Please install it and try again." ] } ], "source": [ "model = malaya_speech.stt.transducer.transformer(model = 'tiny-conformer')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Call PyTorch model\n", "\n", "Starting from Malaya 1.3.0, we are going to focus on PyTorch as the main deep learning backend."
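] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the PyTorch models depend on `torch` rather than Tensorflow, the mocked `tf` above is never called. As a quick sanity check (a minimal sketch, nothing Malaya-specific), confirm that `torch` is a real module with an actual version string:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# unlike the mocked tensorflow, torch is a real module here\n", "import torch\n", "\n", "torch.__version__"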
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": false }, "outputs": [], "source": [ "model = malaya_speech.stt.transducer.pt_transformer(model = 'mesolitica/conformer-tiny')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "y, _ = malaya_speech.load('speech/example-speaker/husein-zolkepli.wav')" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['testing nama saya husin bin zulkifli']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.beam_decoder([y])" ] } ], "metadata": { "kernelspec": { "display_name": "huggingface", "language": "python", "name": "huggingface" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }