Description
Command-line utility that produces a symbol table from an input text corpus. It creates a symbol entry for every type (distinct token) in the corpus, plus one for <epsilon> (index 0) and one for an out-of-vocabulary symbol (last in the symbol table). The command-line options --epsilon_symbol and --OOV_symbol set the labels used for those two special symbols.
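To make the layout described above concrete, here is a rough sketch of a stand-in that builds a table of the same shape with awk. It is not the ngramsymbols implementation: the function name make_symbols is hypothetical, and two details are assumptions rather than documented behavior, namely that tokens are split on whitespace and that types are numbered in order of first occurrence. The tab-separated "symbol, index" output is also assumed for illustration.

```shell
# Hypothetical stand-in, NOT the real ngramsymbols.
# Assumptions: whitespace tokenization, indices assigned in
# first-occurrence order, one "symbol<TAB>index" pair per line.
make_symbols() {
  eps=${1:-"<epsilon>"}   # label for index 0
  oov=${2:-"<UNK>"}       # label for the last index
  awk -v eps="$eps" -v oov="$oov" '
    BEGIN { printf "%s\t%d\n", eps, 0; n = 1 }
    {
      # Give each previously unseen type the next free index.
      for (i = 1; i <= NF; i++)
        if (!($i in seen)) { seen[$i] = 1; printf "%s\t%d\n", $i, n++ }
    }
    END { printf "%s\t%d\n", oov, n }   # OOV symbol comes last
  '
}

# Reads the corpus on stdin; labels are optional arguments.
printf 'the cat the\nhat\n' | make_symbols
```

Passing two arguments, for example make_symbols '<eps>' '<unk>', mirrors what --epsilon_symbol and --OOV_symbol do for the real tool: only the labels of the first and last entries change.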
Usage
ngramsymbols [--options] [in.txt [out.txt]]
--epsilon_symbol: type = string, default = <epsilon>
--OOV_symbol: type = string, default = <UNK>
Examples
$ ngramsymbols <earnest.txt >earnest.syms
Caveats