Mock TensorFlow

This tutorial is available as an IPython notebook at Malaya/example/mock-tensorflow.

Starting with Malaya-Speech 1.4.0 and malaya-boilerplate 0.0.24, installing TensorFlow is no longer required. If TensorFlow is absent, it is replaced with a mock object.
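To make the idea concrete, here is a minimal sketch of how such a mock object could behave. It is not the actual `malaya_boilerplate.Mock` implementation. The class name `MockModule` and the file name `vocab.subwords` are hypothetical; only the `parent_name` attribute and the error message are taken from the traceback shown later in this tutorial.

```python
class MockModule:
    """Hypothetical stand-in for a missing package (a sketch, not malaya-boilerplate's class)."""

    def __init__(self, parent_name):
        # Name of the package this object pretends to be, e.g. 'tensorflow'
        self.parent_name = parent_name

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so any unknown attribute
        # (tf.__version__, tf.io.gfile, ...) yields another mock
        return MockModule(self.parent_name)

    def __call__(self, *args, **kwargs):
        # Attribute access is harmless; the error only appears on a call
        raise ValueError(
            f'{self.parent_name} is not installed. Please install it and try again.'
        )


tf = MockModule('tensorflow')
print(type(tf.__version__).__name__)  # attribute access returns a mock, no error

try:
    # 'vocab.subwords' is a made-up file name for illustration
    tf.io.gfile.GFile('vocab.subwords', 'rb')
except ValueError as error:
    print(error)
```

With this design, importing the library stays cheap and silent, and the missing dependency is reported only when code actually calls into it.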

Suppose you installed Malaya on a fresh machine or in a virtual environment:

[1]:
!~/huggingface/bin/pip3 freeze | grep 'tensorflow'

The command prints nothing, so this virtual environment does not have TensorFlow installed.
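The same check can be done from Python with the standard library. The helper name `is_installed` below is an assumption made for illustration and is not part of Malaya-Speech. It is probably safest to run it before importing `malaya_speech`, because afterwards `import tensorflow` resolves to the mock object.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can locate the top-level `package_name`."""
    # find_spec returns None when no such top-level module can be found
    return importlib.util.find_spec(package_name) is not None


# Prints True or False depending on the current environment
print(is_installed('tensorflow'))
```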

[2]:
import malaya_speech
/home/husein/dev/malaya-boilerplate/malaya_boilerplate/frozen_graph.py:46: UserWarning: Cannot import beam_search_ops from Tensorflow 1, ['malaya.jawi_rumi.deep_model', 'malaya.phoneme.deep_model', 'malaya.rumi_jawi.deep_model', 'malaya.stem.deep_model'] for stemmer will not available to use, make sure Tensorflow 1 version >= 1.15
  warnings.warn(
/home/husein/huggingface/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
`openai-whisper` is not available, native whisper processor is not available, will use huggingface processor instead.
`pyaudio` is not available, `malaya_speech.streaming.stream` is not able to use.
[3]:
import tensorflow as tf

tf.__version__
[3]:
<malaya_boilerplate.Mock at 0x7f9a659380a0>

As you can see, `tensorflow` and its attributes are now Mock objects. What happens if you try to load a model that uses TensorFlow as its backend?

Call TensorFlow model

[5]:
malaya_speech.stt.transducer.available_transformer()
[5]:
Model | Size (MB) | Quantized Size (MB) | malay-malaya | malay-fleur102 | Language | singlish
tiny-conformer | 24.4 | 9.14 | {'WER': 0.2128108, 'CER': 0.08136871, 'WER-LM'... | {'WER': 0.2682816, 'CER': 0.13052725, 'WER-LM'... | [malay] | NaN
small-conformer | 49.2 | 18.1 | {'WER': 0.19853302, 'CER': 0.07449528, 'WER-LM... | {'WER': 0.23412149, 'CER': 0.1138314813, 'WER-... | [malay] | NaN
conformer | 125 | 37.1 | {'WER': 0.16340855635999124, 'CER': 0.05897205... | {'WER': 0.20090442596, 'CER': 0.09616901, 'WER... | [malay] | NaN
large-conformer | 404 | 107 | {'WER': 0.1566839, 'CER': 0.0619715, 'WER-LM':... | {'WER': 0.1711028238, 'CER': 0.077953559, 'WER... | [malay] | NaN
conformer-stack-2mixed | 130 | 38.5 | {'WER': 0.1889883954, 'CER': 0.0726845531, 'WE... | {'WER': 0.244836948, 'CER': 0.117409327, 'WER-... | [malay, singlish] | {'WER': 0.08535878149, 'CER': 0.0452357273822,...
small-conformer-singlish | 49.2 | 18.1 | NaN | NaN | [singlish] | {'WER': 0.087831, 'CER': 0.0456859, 'WER-LM': ...
conformer-singlish | 125 | 37.1 | NaN | NaN | [singlish] | {'WER': 0.07779246, 'CER': 0.0403616, 'WER-LM'...
large-conformer-singlish | 404 | 107 | NaN | NaN | [singlish] | {'WER': 0.07014733, 'CER': 0.03587201, 'WER-LM...
[6]:
model = malaya_speech.stt.transducer.transformer(model = 'tiny-conformer')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 model = malaya_speech.stt.transducer.transformer(model = 'tiny-conformer')

File ~/huggingface/lib/python3.8/site-packages/herpetologist/__init__.py:100, in check_type.<locals>.check(*args, **kwargs)
     97     for p, v in kwargs.items():
     98         nested_check(v, p)
--> 100 return func(*args, **kwargs)

File ~/dev/malaya-speech/malaya_speech/stt/transducer.py:220, in transformer(model, quantized, **kwargs)
    215 if model not in _transformer_availability:
    216     raise ValueError(
    217         'model not supported, please check supported models from `malaya_speech.stt.transducer.available_transformer()`.'
    218     )
--> 220 return stt.transducer_load(
    221     model=model,
    222     module='speech-to-text-transducer',
    223     languages=_transformer_availability[model]['Language'],
    224     quantized=quantized,
    225     **kwargs
    226 )

File ~/dev/malaya-speech/malaya_speech/supervised/stt.py:80, in transducer_load(model, module, languages, quantized, stt, **kwargs)
     72 else:
     73     path = check_file(
     74         file=model,
     75         module=module,
   (...)
     78         **kwargs,
     79     )
---> 80     vocab = subword_load(path['vocab'])
     81 g = load_graph(path['model'], **kwargs)
     82 featurizer = STTFeaturizer(normalize_per_feature=True)

File ~/dev/malaya-speech/malaya_speech/utils/subword.py:47, in load(path)
     43 def load(path: str):
     44     """
     45     Load text file into subword dictionary.
     46     """
---> 47     return SubwordTextEncoder.load_from_file(path)

File ~/dev/malaya-speech/malaya_speech/utils/text_encoder/subword_encoder.py:266, in SubwordTextEncoder.load_from_file(cls, filename_prefix)
    264 """Extracts list of subwords from file."""
    265 filename = cls._filename(filename_prefix)
--> 266 lines, _ = cls._read_lines_from_file(filename)
    267 # Strip wrapping single quotes
    268 vocab_list = [line[1:-1] for line in lines]

File ~/dev/malaya-speech/malaya_speech/utils/text_encoder/__init__.py:117, in TextEncoder._read_lines_from_file(cls, filename)
    115 @classmethod
    116 def _read_lines_from_file(cls, filename):
--> 117     return read_lines_from_file(cls.__name__, filename)

File ~/dev/malaya-speech/malaya_speech/utils/text_encoder/__init__.py:537, in read_lines_from_file(cls_name, filename)
    535 def read_lines_from_file(cls_name, filename):
    536     """Read lines from file, parsing out header and metadata."""
--> 537     with GFile(filename, 'rb') as f:
    538         lines = [tf.compat.as_text(line)[:-1] for line in f]
    539     header_line = '%s%s' % (_HEADER_PREFIX, cls_name)

File ~/dev/malaya-boilerplate/malaya_boilerplate/__init__.py:48, in Mock.__call__(self, *args, **kwargs)
     47 def __call__(self, *args, **kwargs):
---> 48     raise ValueError(f'{self.parent_name} is not installed. Please install it and try again.')

ValueError: tensorflow is not installed. Please install it and try again.

Call PyTorch model

Starting from Malaya 1.3.0, we are focusing on PyTorch as the main deep learning backend.

[7]:
model = malaya_speech.stt.transducer.pt_transformer(model = 'mesolitica/conformer-tiny')
[9]:
y, _ = malaya_speech.load('speech/example-speaker/husein-zolkepli.wav')
[12]:
model.beam_decoder([y])
[12]:
['testing nama saya husin bin zulkifli']
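If your code has to run both with and without TensorFlow, one possible pattern is to try the TensorFlow loader first and fall back to the PyTorch loader when the mock raises its "is not installed" error. The helper `load_with_fallback` below is a hypothetical sketch, not part of Malaya-Speech. It assumes the mock always raises a `ValueError` whose message contains "is not installed", as in the traceback above.

```python
def load_with_fallback(primary_loader, fallback_loader):
    """Call `primary_loader`; if a backend is reported missing, call `fallback_loader`."""
    try:
        return primary_loader()
    except ValueError as error:
        # Only swallow the mock's "is not installed" error;
        # other ValueErrors (e.g. unsupported model name) are re-raised
        if 'is not installed' not in str(error):
            raise
        return fallback_loader()


# Usage with the loaders from this tutorial (commented out because it
# needs malaya_speech and downloads model files):
#
# import malaya_speech
# model = load_with_fallback(
#     lambda: malaya_speech.stt.transducer.transformer(model='tiny-conformer'),
#     lambda: malaya_speech.stt.transducer.pt_transformer(model='mesolitica/conformer-tiny'),
# )
```

Keep in mind that the two loaders return different model objects, so methods such as `beam_decoder` may not be available on both.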