Difference: GrmNGramForum (148 vs. 149)

Revision 149 - 2019-02-20 - GuillaumeWisniewski

Line: 1 to 1

OpenGrm NGram Forum

Line: 18 to 18

Estimating the perplexity of a language model with OOV

GuillaumeWisniewski - 2019-02-20 - 10:54

I am trying to compute the perplexity of a language model on a test set, but I do not understand how I should handle out-of-vocabulary (OOV) words.

Right now I am using the following set of commands:
<verbatim>
cat train.txt test.txt > tmp

ngramsymbols < tmp > voc.syms

farcompilestrings -symbols=voc.syms -keep_symbols=1 train.fr > train.far
ngramcount -order=3 train.far > train.cnt
ngrammake train.cnt > train.mod

farcompilestrings -symbols=voc.syms -keep_symbols=1 test.fr > test.far

../bin/ngramapply en3gram.mod test.far
</verbatim>

but as soon as the test corpus contains an OOV word, the FSTs returned by the ngramapply command are empty.

What did I do wrong?
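
To check whether OOV tokens really are what triggers the empty output, it may help to list the test words that never occur in the training text. The following is only a minimal sketch in plain shell, assuming whitespace-tokenized text files with a non-empty training file; the function name `list_oov` is hypothetical and is not part of OpenGrm.

```shell
# list_oov TRAIN TEST
# Print, one per line and without duplicates, every whitespace-separated
# token of TEST that does not occur anywhere in TRAIN.
# Assumption: TRAIN is not empty (the NR == FNR idiom relies on that).
list_oov() {
  awk '
    # First file (training text): record every token in the vocabulary.
    NR == FNR { for (i = 1; i <= NF; i++) vocab[$i] = 1; next }
    # Second file (test text): print the tokens missing from the vocabulary.
    { for (i = 1; i <= NF; i++) if (!($i in vocab)) print $i }
  ' "$1" "$2" | LC_ALL=C sort -u
}
```

With the file names from the commands above, this would be called as `list_oov train.txt test.txt`. If it prints nothing, the test set has no OOV tokens and the empty FSTs must have another cause.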




Compiling 1.3.4 with openfst 1.6.9 in non-standard location

EstherJudd - 2018-10-25 - 17:23
