OpenGrm NGram Forum


Build failure on Fedora 17

JerryJames - 18 Dec 2012 - 10:42

Hi. I maintain several voice-recognition-related packages, including openfst, for the Fedora Linux distribution. I am working on an OpenGrm NGram package. My first attempt at building version 1.0.3 (with GCC 4.7.2 and glibc 2.15) failed:

In file included from ngramrandgen.cc:32:0:
./../include/ngram/ngram-randgen.h:55:48: error: there are no arguments to 'getpid' that depend on a template parameter, so a declaration of 'getpid' must be available [-fpermissive]
./../include/ngram/ngram-randgen.h:55:48: note: (if you use '-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
ngramrandgen.cc:39:1: error: 'getpid' was not declared in this scope
ngramrandgen.cc:39:1: error: 'getpid' was not declared in this scope

It appears that an explicit #include <unistd.h> is needed in ngram-randgen.h. That header was probably pulled in through some other header in previous versions of either gcc or glibc.
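Here is a minimal illustration of why the missing include matters. It uses a hypothetical function template (DefaultRandomSeed is an illustrative name, not the actual code at ngram-randgen.h:55). The error message itself describes the rule. A call such as getpid(), whose arguments do not depend on a template parameter, is looked up where the template is defined. So <unistd.h> must already be included at that point.

```cpp
#include <sys/types.h>
#include <unistd.h>  // declares getpid(); must be visible before the template

// Hypothetical helper, not the real ngram-randgen.h code. getpid() takes no
// template-dependent arguments, so under two-phase name lookup it is resolved
// at the point of definition. Without <unistd.h> above, G++ 4.7 reports
// "there are no arguments to 'getpid' that depend on a template parameter".
template <class Arc>
unsigned int DefaultRandomSeed() {
  return static_cast<unsigned int>(getpid());
}
```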

BrianRoark - 19 Dec 2012 - 17:22

OK, that header file will be included in the next version. Thanks for the heads-up.

brian

Expected result when using a lattice with ngram perplexity?

JosefNovak - 28 Nov 2012 - 06:40

I was wondering what the expected result is when feeding a lattice, rather than a string/sentence, to the ngramperplexity utility. Is this supported? It seems to report the perplexity of an arbitrary path through the lattice.

BrianRoark - 28 Nov 2012 - 20:46

Hi Josef,

ngramperplexity reports the perplexity of the path through the lattice that you get by taking the first arc out of each state that you reach. (Note that this is what you want for strings encoded as single-path automata.) Not sure what the preferred functionality should be for general lattices. Could make sense to show a warning or an error there; but at this point the onus is on the user to ensure that what is being scored is the same as what you get from farcompilestrings - unweighted, single-path automata. If you have an idea of what preferred functionality would be for non-string lattices, email me.

brian

Is there a way to use NGramApply in C++?

MarkusFreitag - 19 Oct 2012 - 12:55

Hi,

I do not want to write out my FST and run ngramapply in bash before reading the resulting FST back into C++.

Is there a way to call NGramApply directly from C++?

Thanks

BrianRoark - 21 Oct 2012 - 21:32

Hi Markus,

there is no single method; rather there are several ways to perform composition with the model, depending on how you want to interpret the backoff arcs. The most straightforward way to do this in your own code is to look at src/bin/ngramapply.cc and use the composition method for the particular kind of backoff arc, e.g., ngram.FailLMCompose() when interpreting the backoff as a failure transition. In other words, write your own ngramapply method based on inspection of the ngramapply code.

Hope that helps,

brian
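The failure-transition interpretation can be sketched with a hypothetical toy backoff model. ToyBackoffModel and FailureLookup are illustrative names, not OpenGrm's API. When the current state has no arc for a word, the backoff arc is followed and its weight is added. This repeats until a matching arc is found. The sketch covers only the per-word lookup. FailLMCompose applies the same interpretation while composing with a whole FST.

```cpp
#include <limits>
#include <map>
#include <utility>

// Hypothetical toy backoff model, not the OpenGrm API. Costs are negative
// log probabilities, as in the tropical semiring.
struct ToyBackoffModel {
  // word_arcs[{state, word}] = {cost, next state}
  std::map<std::pair<int, int>, std::pair<double, int>> word_arcs;
  // backoff_arcs[state] = {backoff cost, lower-order state}
  std::map<int, std::pair<double, int>> backoff_arcs;
};

// Reads `word` from `state`, treating backoff arcs as failure transitions:
// a backoff arc is taken only when the current state has no arc for `word`.
// Returns the accumulated cost and stores the destination in *next, or
// returns +infinity (setting *next to -1) if no state on the backoff chain
// has an arc for `word`. Assumes the backoff chain is acyclic.
double FailureLookup(const ToyBackoffModel &m, int state, int word, int *next) {
  double cost = 0.0;
  while (true) {
    auto arc = m.word_arcs.find(std::make_pair(state, word));
    if (arc != m.word_arcs.end()) {
      *next = arc->second.second;
      return cost + arc->second.first;
    }
    auto bo = m.backoff_arcs.find(state);
    if (bo == m.backoff_arcs.end()) {  // end of the backoff chain
      *next = -1;
      return std::numeric_limits<double>::infinity();
    }
    cost += bo->second.first;   // pay the backoff weight
    state = bo->second.second;  // continue from the lower-order state
  }
}
```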

MarkusFreitag - 22 Oct 2012 - 09:58

Hi,

Thanks, I think that should work. I am using failure arcs and my LM FST already exists, so I do not need to build an LM FST from strings or an ARPA LM.

First I just need to read the LM FST from disk:

#include <ngram/ngram.h>

// Read() is a static method that returns a newly allocated FST (NULL on failure).
fst::StdMutableFst *fstforNGram = fst::StdVectorFst::Read($MYNGRAMFST);
ngram::NGramModel ngram(fstforNGram);  // fails to link: undefined reference to `ngram::NGramModel::InitModel()'

Once the LM is read, I could then just add:

ngram.FailLMCompose(*lattice, &cfst, kSpecialLabel);

and the composed fst should be ready, right?

Thanks for helping

BrianRoark - 23 Oct 2012 - 09:05

Correct, that is the method for composing with failure arcs.

MarkusFreitag - 23 Oct 2012 - 09:30

Yes, but I have a problem reading the LM FST in C++:

fst::StdMutableFst *fstforNGram = fst::StdVectorFst::Read($MYNGRAMFST);  // Read() is static and returns a new FST (NULL on failure)

Up to that point it works.

ngram::NGramModel ngram(fstforNGram);

That does not work; linking fails with: undefined reference to `ngram::NGramModel::InitModel()'

Thanks

Fractional Kneser Ney

JosefNovak - 09 Oct 2012 - 04:31

Hi, I have been using OpenGrm with my Grapheme-to-Phoneme conversion tools for a while now and recently added some functionality to output weighted alignment lattices in .far format.

It is my understanding that these weighted lattices can currently only be used with Witten-Bell smoothing; is this correct?

Is there any plan to support fractional counts with Kneser-Ney smoothing, for instance along the lines of,

"Correlated Bigram LSA for Unsupervised Language Model Adaptation", Tam and Schultz.

or would I be best advised to implement this myself?

BrianRoark - 09 Oct 2012 - 09:32

Hi Josef,

Witten-Bell generalizes straightforwardly to fractional counts, as you point out. No immediate plans for new versions of other smoothing methods along those lines, so if that's something that you need urgently, you would need to implement it.

brian
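One way to see why Witten-Bell carries over to fractional counts is to look at the quantities it uses. The textbook interpolated form needs only two values: the total count of a history and the number of distinct words seen after it. Both remain well defined when counts are expected (fractional) values. Kneser-Ney discounts, by contrast, are usually estimated from how many n-grams were seen exactly once, twice, and so on, and that has no direct analogue for fractional counts. The sketch below is not the library's implementation. It assumes a hypothetical per-history count map, and it omits the --witten_bell_k hyperparameter.

```cpp
#include <map>
#include <string>

// Hypothetical container: expected (possibly fractional) counts c(h, w) for
// a single history h, e.g. accumulated from weighted lattice arcs.
using ContinuationCounts = std::map<std::string, double>;

// Textbook interpolated Witten-Bell estimate
//   P(w | h) = (c(h, w) + T(h) * P_lower(w)) / (c(h) + T(h)),
// where c(h) is the total count of h and T(h) is the number of distinct
// words with nonzero count after h.
double WittenBellProb(const ContinuationCounts &counts, const std::string &w,
                      double p_lower) {
  double total = 0.0;  // c(h)
  double types = 0.0;  // T(h)
  for (const auto &kv : counts) {
    if (kv.second > 0.0) {
      total += kv.second;
      types += 1.0;
    }
  }
  if (total == 0.0) return p_lower;  // unseen history: use the lower order
  auto it = counts.find(w);
  const double c_hw = (it == counts.end()) ? 0.0 : it->second;
  return (c_hw + types * p_lower) / (total + types);
}
```

With counts {a: 1.5, b: 0.5} and a uniform lower-order distribution over four words, the four estimates are 0.5, 0.25, 0.125 and 0.125, which sum to one.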

JosefNovak - 09 Oct 2012 - 18:21

Understood, and thanks very much for your response!

FATAL: NegLogDiff

LukeCarmichael - 25 Sep 2012 - 17:21

Hello, I ran the following sequence of commands and got this output:

home$ ngramcounts a.far > a.cnts
home$ ngrammake --v=4 --method=katz a.cnts > katz.mod
INFO: FstImpl::ReadHeader: source: a.cnts, fst_type: vector, arc_type: standard, version: 2, flags: 3
Count bin   Katz Discounts (1-grams/2-grams/3-grams)
Count = 1   -nan/0.253709/-0.343723
Count = 2   -nan/1.26571/1.19095
Count = 3   -nan/0.467797/-0.532465
Count = 4   -nan/1.29438/1.18879
Count = 5   -nan/0.0740557/-0.489831
Count > 5   1/1/1
FATAL: NegLogDiff: undefined -10.2649 -10.2651

Other methods work fine.

How can I diagnose this problem?

Thanks, Luke

BrianRoark - 26 Sep 2012 - 13:12

Hi Luke,

this is basically a floating-point precision issue: the system is trying to subtract two approximately equal numbers (while calculating backoff weights). The new version of the library, coming out in a month or so, has much improved floating-point precision, which will help. In the meantime, you can get this to work by modifying a constant in src/include/ngram/ngram-model.h so that these two numbers are judged to be approximately equal. Look for static const double kNormEps = 0.000001; near the top of that file, change it to 0.0001, and then recompile.
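For reference, the failing operation subtracts two probabilities that are stored as negative logs. Below is a minimal sketch, not the library's code. It assumes an absolute tolerance eps in the log domain, and the library's actual check against kNormEps may be formulated differently. The result -log(exp(-a) - exp(-b)) is undefined when exp(-a) < exp(-b), unless the two values are close enough to be treated as equal.

```cpp
#include <cmath>
#include <limits>
#include <stdexcept>

// Hypothetical sketch of a negative-log difference: returns
// -log(exp(-a) - exp(-b)), which is defined only when a <= b.
// Values within `eps` of each other (an absolute log-domain tolerance,
// an assumption of this sketch) are treated as equal, giving a zero
// probability, i.e. +infinity as a negative log.
double NegLogDiff(double a, double b, double eps) {
  const double kInf = std::numeric_limits<double>::infinity();
  if (b == kInf) return a;                  // subtracting zero probability
  const double diff = b - a;                // >= 0 when the result is defined
  if (std::fabs(diff) <= eps) return kInf;  // approximately equal
  if (diff < 0) throw std::domain_error("NegLogDiff: undefined");
  // exp(-a) - exp(-b) = exp(-a) * (1 - exp(-diff)); expm1 keeps precision
  // when diff is small.
  return a - std::log(-std::expm1(-diff));
}
```

With eps = 1e-6, the pair from the log above (-10.2649, -10.2651) throws. With a looser tolerance it is accepted as "equal". Loosening the tolerance is what trades strictness for robustness here.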

This sort of problem usually comes up when you train a model with a relatively small vocabulary (like a phone or POS-tag model) on a relatively large corpus. The n-gram counts end up not following the Good-Turing assumptions about what the distribution should look like (hence the odd discount values). In those cases, you're probably better off using Witten-Bell smoothing with --witten_bell_k=15 or something like that, or even trying an unsmoothed model.
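Values like those in the discount table can arise from the standard Katz (1987) Good-Turing formula. The sketch below assumes the table reports these per-count ratios, and ngrammake's exact computation may differ. Nothing in the formula keeps d_r between 0 and 1. When the count-of-counts n_r do not decrease smoothly with r, the ratios can exceed 1 or go negative. If n_1 is zero, the division is undefined, which is one possible source of the -nan entries. Counts above k are left undiscounted, matching the "Count > 5" row.

```cpp
#include <cassert>
#include <vector>

// Katz Good-Turing discount ratios d_r for r = 1..k, computed from
// count-of-counts: n[r] is the number of n-grams (of one order) seen
// exactly r times; n[0] is unused and n needs at least k + 2 entries.
//   r*  = (r + 1) * n[r + 1] / n[r]
//   A   = (k + 1) * n[k + 1] / n[1]
//   d_r = (r* / r - A) / (1 - A)
// Counts above k are not discounted (ratio 1).
std::vector<double> KatzDiscounts(const std::vector<double> &n, int k) {
  assert(k >= 1 && static_cast<int>(n.size()) >= k + 2);
  std::vector<double> d(k + 1, 1.0);  // d[0] unused
  const double a = (k + 1) * n[k + 1] / n[1];
  for (int r = 1; r <= k; ++r) {
    const double r_star = (r + 1) * n[r + 1] / n[r];
    d[r] = (r_star / r - a) / (1.0 - a);
  }
  return d;
}
```

Smoothly decaying counts such as {100, 40, 20, 10, 5, 2} give ratios strictly between 0 and 1. An irregular profile such as {10, 20, 5, 8, 2, 3} already gives a negative d_1.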

And stay tuned for the next release, which deals more gracefully with some of these small vocabulary scenarios.

Brian

FATAL: NGramModel: bad ngram model topology

BenoitFavre - 10 Sep 2012 - 09:18

I generated an ngram model from a .arpa file with the following command:

ngramread --ARPA lm.arpa > lm.model

ngramread does not complain, but both ngraminfo and loading the model from C++ code produce the following error:

FATAL: NGramModel: bad ngram model topology

How can I troubleshoot the problem?

BenoitFavre - 10 Sep 2012 - 09:27

Adding verbosity results in more mystery...

ngraminfo --v=2 lm.model
INFO: FstImpl::ReadHeader: source: lm.model, fst_type: vector, arc_type: standard, version: 2, flags: 3
INFO: Incomplete # of ascending n-grams: 1377525
FATAL: NGramModel: bad ngram model topology

BrianRoark - 11 Sep 2012 - 10:33

Hi,

that error is coming from a sanity check that verifies that every state in the language model (other than the start and unigram states) is reached by exactly one 'ascending' arc, i.e., an arc that goes from a lower-order state to a higher-order state. ARPA format models can diverge from this, for example by having 'holes' (e.g., bigrams pruned but trigrams with that bigram as a suffix retained). But ngramread should plug all of those. Maybe duplication? I'll email you about this.
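The sanity check described above can be sketched with a hypothetical toy representation, in which each state has a known n-gram order and arcs are listed as (from, to) pairs. OpenGrm's own check works on the model FST directly. In a typical backoff topology, backoff arcs descend in order and highest-order word arcs stay at the same order, so only arcs that raise the order are counted. A state with zero such arcs corresponds to a hole, and a state with two or more suggests duplication.

```cpp
#include <vector>

// Hypothetical toy representation, not OpenGrm's: order[s] is the n-gram
// order of state s (1 for the unigram state).
struct Transition {
  int from;
  int to;
};

// Counts the incoming "ascending" arcs (lower order -> higher order) of each
// state and returns every state other than `start` and `unigram` that is not
// reached by exactly one of them.
std::vector<int> BadAscendingStates(const std::vector<int> &order,
                                    const std::vector<Transition> &arcs,
                                    int start, int unigram) {
  std::vector<int> ascending(order.size(), 0);
  for (const Transition &arc : arcs) {
    if (order[arc.from] < order[arc.to]) ++ascending[arc.to];
  }
  std::vector<int> bad;
  for (int s = 0; s < static_cast<int>(order.size()); ++s) {
    if (s == start || s == unigram) continue;
    if (ascending[s] != 1) bad.push_back(s);
  }
  return bad;
}
```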

BrianRoark - 18 Sep 2012 - 10:50

Benoit found a case where certain 'holes' from a pruned ARPA model were not being filled appropriately in the conversion. The sanity check routines on loading the model ensured that this anomaly was caught (causing the errors he mentioned), and we were able to find the cases where this was occurring and update the code. The updated conversion functions will be in the forthcoming release of the library, within the next month or two. In the meantime, if anyone encounters this problem, let me know and I can provide a workaround.

-- CyrilAllauzen - 09 Aug 2012

Topic revision: r20 - 2012-12-19 - BrianRoark