NGramPrint

Description

By default, only n-grams are printed (without backoff <epsilon> transitions), in the same format as discussed above for reading in n-gram counts: w₁ ... w_k score. The score is the n-gram count if the model has not been normalized and the n-gram probability if it has. Also by default, scores are converted from the internal negative log representation to real semiring counts or probabilities.

The following flags change the output:

* --ARPA: the n-gram model is printed in the well-known ARPA format.
* --backoff: backoff <epsilon> transitions are printed along with the n-grams.
* --negativelogs: scores are shown as negative logs, rather than being converted to the real semiring.
* --integers: scores are converted to the real semiring and rounded to integers.
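
The relationship between the three score presentations can be illustrated with a small Python sketch. It is not the tool's implementation: the function names are hypothetical, the negative logs are assumed to be natural logs, and the rounding rule used for the integer case (Python's built-in `round`) is an assumption, since the page does not specify how ties are handled.

```python
import math


def to_real(neg_log_score: float) -> float:
    """Convert a negative natural-log score to a real-semiring value.

    Assumption: the internal representation is -ln(value).
    """
    return math.exp(-neg_log_score)


def format_score(neg_log_score: float,
                 negativelogs: bool = False,
                 integers: bool = False) -> str:
    """Render a score the way each output mode is described on this page.

    negativelogs=True mimics --negativelogs (no conversion).
    integers=True mimics --integers (convert, then round).
    With neither set, the score is converted to a real value (the default).
    """
    if negativelogs:
        return f"{neg_log_score:g}"
    value = to_real(neg_log_score)
    if integers:
        # Assumption: ordinary rounding to the nearest integer.
        return str(int(round(value)))
    return f"{value:g}"
```

For instance, a stored score of `-math.log(3.0)` corresponds to a count of 3, so `format_score(-math.log(3.0), integers=True)` yields the string `3`, while passing `negativelogs=True` leaves the stored value unconverted.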

For writing n-gram counts and ARPA format models, the tokens <s> and </s> are used to represent start-of-sequence and end-of-sequence, respectively. Neither of these symbols is used in our automaton format (see above).
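
As an illustration of the default text format, the following Python sketch reads one printed line of the form w₁ ... w_k score and flags the boundary tokens. The helper names are hypothetical, and the sketch assumes that fields are whitespace-separated, that words contain no whitespace, and that the boundary tokens appear as the ASCII strings `<s>` and `</s>`.

```python
from typing import List, Tuple

# Assumption: boundary tokens are written as these ASCII strings.
START_TOKEN = "<s>"
END_TOKEN = "</s>"


def parse_ngram_line(line: str) -> Tuple[List[str], float]:
    """Split a 'w1 ... wk score' line into its words and numeric score."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected at least one word and a score: {line!r}")
    *words, score = fields
    return words, float(score)


def has_boundary_token(words: List[str]) -> bool:
    """Return True if the n-gram contains a start or end-of-sequence token."""
    return any(w in (START_TOKEN, END_TOKEN) for w in words)
```

A caller converting printed n-grams back into another representation could use `has_boundary_token` to decide which entries need special handling, since the automaton format does not carry these symbols.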

Usage

Complexity

Caveats

References

-- MichaelRiley - 09 Dec 2011
