{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# KenLM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
| Model | Size (MB) | LM order | Description | Command |
|---|---|---|---|---|
| bahasa-news | 107 | 3 | local news. | [./lmplz --text text.txt --arpa out.arpa -o 3 ... |
| bahasa-wiki | 70.5 | 3 | MS wikipedia. | [./lmplz --text text.txt --arpa out.arpa -o 3 ... |
| redape-community | 887.1 | 4 | Mirror for https://github.com/redapesolutions/... | [./lmplz --text text.txt --arpa out.arpa -o 4 ... |
| dump-combined | 310 | 3 | Academia + News + IIUM + Parliament + Watpadd ... | [./lmplz --text text.txt --arpa out.arpa -o 3 ... |
| manglish | 202 | 3 | Manglish News + Manglish Reddit + Manglish for... | [./lmplz --text text.txt --arpa out.arpa -o 3 ... |
| bahasa-manglish-combined | 608 | 3 | Combined `dump-combined` and `manglish`. | [./lmplz --text text.txt --arpa out.arpa -o 3 ... |
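
The `Command` column is truncated, so the full training invocations are not recoverable here. As a minimal sketch, the visible part of each command can be assembled as below. It assumes only the flags shown in the table (`--text`, `--arpa`, `-o`). The helper names `build_lmplz_command` and `train` are hypothetical and are not part of KenLM, and any flags hidden behind the `...` are deliberately left out.

```python
import shlex
import subprocess
from pathlib import Path


def build_lmplz_command(text_path, arpa_path, order=3, lmplz="./lmplz"):
    """Build the argument list for the visible part of the lmplz command.

    Hypothetical helper: it covers only the flags shown in the table
    (--text, --arpa, -o), not whatever follows the truncated "...".
    """
    if order < 1:
        raise ValueError("order must be a positive integer")
    return [
        lmplz,
        "--text", str(text_path),
        "--arpa", str(arpa_path),
        "-o", str(order),
    ]


def train(text_path, arpa_path, order=3, lmplz="./lmplz"):
    """Run lmplz and return the path of the ARPA file it writes.

    Assumes a compiled lmplz binary exists at the given path.
    """
    cmd = build_lmplz_command(text_path, arpa_path, order, lmplz)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(arpa_path)


# Print the command instead of running it, so this works without KenLM installed.
print(shlex.join(build_lmplz_command("text.txt", "out.arpa", order=3)))
```

Calling `train("text.txt", "out.arpa", order=4)` would correspond to the visible part of the `redape-community` row, provided `./lmplz` has been built locally.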