Difference: GrmNGramForum (148 vs. 149)

Revision 149 - 2019-02-20 - GuillaumeWisniewski


OpenGrm NGram Forum


Estimating the perplexity of a language model with OOV

GuillaumeWisniewski - 2019-02-20 - 10:54

I am trying to compute the perplexity of a language model on a test set but I do not understand how I should handle OOV words.

Right now I am using the following set of commands:

<verbatim>
cat train.txt test.txt > tmp
ngramsymbols < tmp > voc.syms

farcompilestrings -symbols=voc.syms -keep_symbols=1 train.fr > train.far
ngramcount -order=3 train.far > train.cnt
ngrammake train.cnt > train.mod

farcompilestrings -symbols=voc.syms -keep_symbols=1 test.fr > test.far

../bin/ngramapply en3gram.mod test.far
</verbatim>

However, as soon as the test corpus contains an OOV word, the FSTs returned by the ngramapply command are empty.

What did I do wrong?

Thanks
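
One possible way to approach this, as a sketch under stated assumptions rather than a confirmed fix: since voc.syms is built from train and test together, test-only words get a symbol but the model trained on train.far has likely never seen them, which could explain the empty results. A common workaround is to replace every test word that does not occur in the training text with a single unknown token before compiling. The `map_oov` helper, the `<unk>` token choice, and the demo file names below are hypothetical and not part of OpenGrm. OpenGrm also seems to provide its own OOV handling; I believe the relevant options are `--unknown_symbol` for farcompilestrings and `--OOV_symbol` for ngramperplexity, but treat those names as assumptions to verify with `--help`.

```shell
#!/bin/sh
# Sketch: replace test-set words unseen in training with an <unk> token.
# map_oov and the demo file names are hypothetical, not part of OpenGrm.

# map_oov TRAIN TEST
# Prints TEST with every word that is absent from TRAIN replaced by <unk>.
map_oov() {
  awk '
    # First file (training text): record every word as in-vocabulary.
    NR == FNR { for (i = 1; i <= NF; i++) vocab[$i] = 1; next }
    # Second file (test text): rewrite unknown words, then print the line.
    { for (i = 1; i <= NF; i++) if (!($i in vocab)) $i = "<unk>"; print }
  ' "$1" "$2"
}

# Tiny demo corpus in a temporary directory.
demo_dir=$(mktemp -d)
printf 'le chat dort\nle chien mange\n' > "$demo_dir/train.txt"
printf 'le chat mange\nle lapin dort\n' > "$demo_dir/test.txt"

# "lapin" never occurs in the training text, so it is mapped to <unk>.
mapped=$(map_oov "$demo_dir/train.txt" "$demo_dir/test.txt")
printf '%s\n' "$mapped"

rm -rf "$demo_dir"
```

The mapped test text could then be compiled with farcompilestrings in place of the raw test file. Note that the model itself must also assign some probability to `<unk>` for this to help, which this sketch does not address.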


 

Compiling 1.3.4 with openfst 1.6.9 in non-standard location

EstherJudd - 2018-10-25 - 17:23

 