{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Speech-to-Text CTC HuggingFace + pyctcdecode" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pretrained HuggingFace CTC models fine-tuned on hyperlocal languages, decoded with pyctcdecode and a KenLM language model. The models are published at https://huggingface.co/mesolitica" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Model | Size (MB) | malay-malaya | malay-fleur102 | singlish | Language |
---|---|---|---|---|---|
mesolitica/wav2vec2-xls-r-300m-mixed | 1180 | {'WER': 0.194655128, 'CER': 0.04775798, 'WER-L... | {'WER': 0.2373861259, 'CER': 0.07055478, 'WER-... | {'WER': 0.127588595, 'CER': 0.0494924979, 'WER... | [malay, singlish] |
mesolitica/wav2vec2-xls-r-300m-mixed-v2 | 1180 | {'WER': 0.154782923, 'CER': 0.035164031, 'WER-... | {'WER': 0.2013994374, 'CER': 0.0518170369, 'WE... | {'WER': 0.2258822139, 'CER': 0.082982312, 'WER... | [malay, singlish] |
mesolitica/wav2vec2-xls-r-300m-12layers-ms | 657 | {'WER': 0.1494983789, 'CER': 0.0342059992, 'WE... | {'WER': 0.217107489, 'CER': 0.0546614199, 'WER... | NaN | [malay] |
mesolitica/wav2vec2-xls-r-300m-6layers-ms | 339 | {'WER': 0.1494983789, 'CER': 0.0342059992, 'WE... | {'WER': 0.217107489, 'CER': 0.0546614199, 'WER... | NaN | [malay] |
mesolitica/wav2vec2-xls-r-300m-3layers-ms | 195 | {'WER': 0.1494983789, 'CER': 0.0342059992, 'WE... | {'WER': 0.217107489, 'CER': 0.0546614199, 'WER... | NaN | [malay] |