OpenGrm NGram Forum

  Strangely, the kneser_ney method doesn't generate an error.

DanielRenshaw - 2016-07-13 - 07:33

The problem occurs only when the counts are printed to a text format and read back in (following the answer to the "Can ngramcount ignore OOVs?" question); it occurs even if the <unk>s are not removed.

Given counts1.grm, an FST produced by ngramcount, I would have expected "ngramprint counts1.grm | ngramread - counts2.grm" to produce a counts2.grm identical to counts1.grm, but this isn't true in general. With the earnest.cnts example, the new file differs only slightly in size. For part of my own corpus, the difference in file size is much more substantial. Could this difference be due to symbol table changes only, or could the procedure change the FST in other ways?
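One way to narrow this down might be to compare the two FSTs at the level of their printed n-grams rather than their file sizes. The sketch below is only a diagnostic under stated assumptions: same_ngrams and roundtrip_check are hypothetical helper names, and the script assumes that ngramprint with default options writes one n-gram per line, so that sorting the lines removes any ordering differences. It does not explain where a size difference comes from; it only checks whether the counts themselves survive the round trip.

```shell
#!/bin/sh
# Sketch: check whether a print/read round trip preserves the n-gram counts.

# same_ngrams A B (hypothetical helper): succeeds if the two text dumps
# contain exactly the same lines, ignoring line order.
same_ngrams() {
  a_sorted=$(mktemp) && b_sorted=$(mktemp) || return 2
  LC_ALL=C sort "$1" > "$a_sorted"
  LC_ALL=C sort "$2" > "$b_sorted"
  cmp -s "$a_sorted" "$b_sorted"
  status=$?
  rm -f "$a_sorted" "$b_sorted"
  return "$status"
}

# roundtrip_check COUNTS.grm (hypothetical helper): requires the OpenGrm
# NGram tools on PATH. Prints the counts, reads them back into a new FST,
# prints that FST again, and compares the two text dumps.
roundtrip_check() {
  tmp=$(mktemp -d) || return 2
  ngramprint "$1" > "$tmp/counts1.txt" &&
  ngramread "$tmp/counts1.txt" "$tmp/counts2.grm" &&
  ngramprint "$tmp/counts2.grm" > "$tmp/counts2.txt" &&
  same_ngrams "$tmp/counts1.txt" "$tmp/counts2.txt"
  status=$?
  rm -rf "$tmp"
  return "$status"
}
```

Calling "roundtrip_check counts1.grm" would return 0 if the printed n-grams match after the round trip. If they do match while the files still differ in size, running OpenFst's fstinfo on both files to compare state and arc counts might show whether the topology changed as well.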

