Difference: GrmNGramForum (37 vs. 38)

Revision 38 2013-07-16 - BrianRoark

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

OpenGrm NGram Forum

Line: 27 to 27
 Any insight or ideas would be greatly appreciated. Thanks!
Added:
>
>

BrianRoark - 15 Jul 2013 - 22:37

Hi,

The model will only have probability mass for the OOV symbol if there are counts for it in the training corpus. ngramperplexity does have a utility for including an OOV probability, but this is applied on the fly and is not part of the model structure itself. If you want to provide probability mass at the unigram state for the OOV symbol, you could create a corpus consisting of just that symbol, train a unigram model on it, then use ngrammerge to mix it (either counts or models) with your main model. There would then be explicit probability mass allocated to that symbol, and you can use the merge parameters to dictate how much probability it receives. Hope that helps.
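
To make the effect of the merge weights concrete, here is a minimal Python sketch of the arithmetic only, assuming the mix is a plain linear interpolation of two unigram distributions whose weights sum to one. It does not call ngrammerge, and ngrammerge's own merge methods and normalization may differ in detail. The function name mix_unigrams, the toy probabilities, and the "<unk>" label are hypothetical placeholders.

```python
def mix_unigrams(main, oov_only, oov_weight):
    """Linearly interpolate two unigram distributions.

    main and oov_only map symbols to probabilities; oov_weight is the
    share of probability mass given to the OOV-only model (assumed to
    lie in [0, 1]).
    """
    if not 0.0 <= oov_weight <= 1.0:
        raise ValueError("oov_weight must be between 0 and 1")
    vocab = set(main) | set(oov_only)
    return {
        w: (1.0 - oov_weight) * main.get(w, 0.0)
        + oov_weight * oov_only.get(w, 0.0)
        for w in vocab
    }


# Toy main model with no OOV mass (hypothetical numbers).
main_model = {"a": 0.5, "b": 0.3, "c": 0.2}
# Model trained on a corpus containing only the OOV symbol.
oov_model = {"<unk>": 1.0}

# Reserve 1% of the unigram mass for the OOV symbol.
mixed = mix_unigrams(main_model, oov_model, oov_weight=0.01)
```

Under this assumption the OOV symbol receives exactly the chosen weight, and every in-vocabulary probability is scaled down by the remainder, so the distribution still sums to one.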

brian

 
Line: 40 to 49
 I would like to use the 3-gram model to score the upcoming character given its history context. What example can I start with? Is it possible not to convert to farstrings each time?
Added:
>
>

BrianRoark - 15 Jul 2013 - 22:32

Hi,

This is one of the benefits of having the open-source library interface in C++: you can write functions of your own. We chose to score strings (when calculating perplexity, for example) by first encoding them as FARs, but you could perform a similar function in your own C++ code. I would look at the code for ngramperplexity as a starting point and learn how to use the arc iterators. Once you understand the structure of the model, you should be able to make that work. Alternatively, print out the model using fstprint and read it into your own data structures. Good luck!
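
As a rough illustration of the fstprint route, the Python sketch below reads fstprint-style text into dictionaries and scores the next symbol given a history. It assumes the usual layout of an OpenGrm model: arc weights are negative log probabilities, backoff arcs carry the label <epsilon>, and the first state printed is the start state. The helper names (read_fstprint, step, score_next) and the toy costs are hypothetical, and the toy model is not normalized.

```python
import math

EPSILON = "<epsilon>"  # assumed label of backoff arcs


def read_fstprint(text):
    """Parse fstprint text into (start, arcs, finals).

    arcs maps state -> {label: (next_state, cost)};
    finals maps state -> final cost.
    """
    arcs, finals, start = {}, {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        src = int(fields[0])
        if start is None:
            start = src  # fstprint lists the start state first
        if len(fields) >= 4:  # arc: src dst ilabel olabel [weight]
            cost = float(fields[4]) if len(fields) > 4 else 0.0
            arcs.setdefault(src, {})[fields[2]] = (int(fields[1]), cost)
        else:  # final state: state [weight]
            finals[src] = float(fields[1]) if len(fields) > 1 else 0.0
    return start, arcs, finals


def step(arcs, state, symbol):
    """Return (next_state, cost) for symbol, following backoff arcs."""
    cost = 0.0
    while True:
        out = arcs.get(state, {})
        if symbol in out:
            nxt, arc_cost = out[symbol]
            return nxt, cost + arc_cost
        if EPSILON not in out:
            raise KeyError(symbol)  # symbol unseen even at the lowest order
        state, backoff_cost = out[EPSILON]
        cost += backoff_cost


def score_next(start, arcs, history, symbol):
    """Cost (negative log probability) of symbol after history."""
    state = start
    for h in history:
        state, _ = step(arcs, state, h)
    return step(arcs, state, symbol)[1]


# Toy bigram-style model over characters (real output is tab-separated).
TOY = """\
0 1 a a 0.5
0 2 <epsilon> <epsilon> 0.7
1 2 <epsilon> <epsilon> 0.4
1 1 a a 1.2
2 1 a a 1.0
2 3 b b 0.9
3 2 <epsilon> <epsilon> 0.3
3 1 a a 0.2
2 2.0
1 1.5
3 1.1
"""

start, arcs, finals = read_fstprint(TOY)
cost = score_next(start, arcs, "a", "b")  # backs off, then takes the b arc
prob = math.exp(-cost)
```

The model is parsed once and then queried as often as needed, so no string has to be compiled into a FAR for each lookup. A C++ version built on the library's arc iterators would follow the same walk over states and backoff arcs.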

brian

 