{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Language Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
| Model | Size (MB) | LM order | Description | Command |
|---|---|---|---|---|
| bahasa | 17 | 3 | Gathered from malaya-speech ASR bahasa transcript | [./lmplz --text text.txt --arpa out.arpa -o 3 ... |
| bahasa-news | 24 | 3 | Gathered from malaya-speech bahasa ASR transcr... | [./lmplz --text text.txt --arpa out.arpa -o 3 ... |
| bahasa-combined | 29 | 3 | Gathered from malaya-speech ASR bahasa transcr... | [./lmplz --text text.txt --arpa out.arpa -o 3 ... |
| redape-community | 887.1 | 4 | Mirror for https://github.com/redapesolutions/... | [./lmplz --text text.txt --arpa out.arpa -o 4 ... |
| dump-combined | 310 | 3 | Academia + News + IIUM + Parliament + Wattpad ... | [./lmplz --text text.txt --arpa out.arpa -o 3 ... |
| manglish | 202 | 3 | Manglish News + Manglish Reddit + Manglish for... | [./lmplz --text text.txt --arpa out.arpa -o 3 ... |
| bahasa-manglish-combined | 608 | 3 | Combined `dump-combined` and `manglish`. | [./lmplz --text text.txt --arpa out.arpa -o 3 ... |