This utility counts ngrams from an input FST archive (FAR), producing a count FST with the same topology as the eventual normalized model, complete with backoff transitions. The order option specifies the maximum ngram order to count; the utility counts all ngrams of that order and below. The epsilon_as_backoff option causes the counter to interpret <epsilon> as a backoff transition while counting, which is appropriate only in very specialized circumstances (see the caveats below).
ngramcount [options] [in.far [out.fst]]
  order: type = int64, default = 3
  epsilon_as_backoff: type = bool, default = false

class NGramCounter(size_t order); 
By default, ngramcount counts trigrams, bigrams and unigrams from an input corpus:
ngramcount earnest.far >earnest.3g.cnts
To count trigrams, bigrams and unigrams from a single FST using the library functions:
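#include <memory>
#include <fst/fstlib.h>
#include <ngram/ngram-count.h>
int main() {
  using namespace fst;
  ngram::NGramCounter<Log64Weight> ngram_counter(3);  // count up to trigrams
  std::unique_ptr<StdMutableFst> input(StdMutableFst::Read("in.fst", true));
  if (!input) return 1;  // the input FST could not be read
  ngram_counter.Count(*input);
  VectorFst<StdArc> counts;  // output count FST in the tropical semiring
  ngram_counter.GetFst(&counts);
  counts.Write("out.fst");
}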
Backoff transitions, labeled with <epsilon>, have weight One() in the semiring. By default, the count FSTs are in the tropical semiring, where weights are stored as negative logs; hence the backoff weight is 0 and ngram transitions have weight -log(count).
The epsilon_as_backoff option interprets <epsilon> in the input FST archive as a backoff transition. This is appropriate only when the corpus is randomly sampled from a model and the sampled strings mark where backoff transitions were taken. It allows for the use of the presmoothed method in ngrammake. This is not a typical scenario, so this option should be used with care.
 MichaelRiley  09 Dec 2011