OpenGrm NGram Forum

Estimating the perplexity of a language model with OOV

GuillaumeWisniewski - 2019-02-20 - 10:54

I am trying to compute the perplexity of a language model on a test set, but I do not understand how I should handle OOV words.

Right now I am using the following set of commands:
<verbatim>
cat train.txt test.txt > tmp
ngramsymbols < tmp > voc.syms

farcompilestrings -symbols=voc.syms -keep_symbols=1 train.fr > train.far
ngramcount -order=3 train.far > train.cnt
ngrammake train.cnt > train.mod

farcompilestrings -symbols=voc.syms -keep_symbols=1 test.fr > test.far

../bin/ngramapply en3gram.mod test.far
</verbatim>
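A quick way to see which test words are the OOVs is to list the test tokens that never occur in the training text. Because `voc.syms` is built from both files, these words do get symbol IDs, but the counts and the model come from the training FAR only. The helper below is a hypothetical sketch (the name `list_oovs` is made up, not an OpenGrm tool). It assumes whitespace-separated tokens with one sentence per line and uses only POSIX `awk`.

```shell
# list_oovs TRAIN TEST: print each token of TEST that never occurs in TRAIN.
# Hypothetical helper; assumes whitespace-separated tokens, one sentence per line.
list_oovs() {
  awk '
    # First file (training text): remember every token.
    FILENAME == ARGV[1] { for (i = 1; i <= NF; i++) seen[$i] = 1; next }
    # Second file (test text): print unseen tokens, each only once.
    { for (i = 1; i <= NF; i++) if (!($i in seen)) { print $i; seen[$i] = 1 } }
  ' "$1" "$2"
}
```

For example, `list_oovs train.txt test.txt` prints each unseen test token once, in order of first appearance. Testing `FILENAME == ARGV[1]` rather than the usual `NR == FNR` idiom keeps the helper correct when the training file is empty.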

but as soon as the test corpus contains an OOV word, the FSTs returned by the ngramapply command are empty.

What did I do wrong?
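For reference, the usual per-token perplexity of a trigram model (order 3, as in the commands above) over test tokens w_1, ..., w_N is given below. It shows why a single OOV is fatal. If the model has no entry for a token, that token gets probability zero, so the log-probability sum is minus infinity and the perplexity is infinite. This is consistent with the empty FSTs described above. A common, admittedly approximate workaround gives every OOV token a fixed probability p_oov. It is sketched here as an assumption, not as OpenGrm's documented behaviour.

```latex
\mathrm{PPL} = \exp\!\Big(-\frac{1}{N}\sum_{i=1}^{N} \ln \tilde p(w_i \mid w_{i-2}, w_{i-1})\Big),
\qquad
\tilde p(w_i \mid w_{i-2}, w_{i-1}) =
\begin{cases}
p(w_i \mid w_{i-2}, w_{i-1}) & \text{if } w_i \text{ is in the model's vocabulary},\\
p_{\mathrm{oov}} & \text{otherwise.}
\end{cases}
```

With p_oov = 0, the second case reproduces the failure: a single OOV drives the sum to minus infinity and the perplexity to infinity.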


Compiling 1.3.4 with OpenFst 1.6.9 in a non-standard location

EstherJudd - 2018-10-25 - 17:23

