NGramPrint

Description

* By default, only n-grams are printed (without backoff ⟨epsilon⟩ transitions), in the same format as discussed above for reading in n-gram counts: w1 ... wk score. The score is the n-gram count if the model has not been normalized, and the n-gram probability if it has. By default, scores are converted from the internal negative log representation to real semiring counts or probabilities.
  • By using the flag --ARPA, the n-gram model is printed in the well-known ARPA format.
  • By using the flag --backoff, backoff ⟨epsilon⟩ transitions are printed along with the n-grams.
  • By using the flag --negativelogs, scores are shown as negative logs, rather than being converted to the real semiring.
  • By using the flag --integers, scores are converted to the real semiring and rounded to integers.
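The score conversions described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the library's code: the function names are hypothetical, and the sketch assumes the internal negative logs are natural logarithms (the usual OpenFst weight convention). How ngramprint itself breaks rounding ties under --integers is not stated here, so the rounding shown is likewise an assumption.

```python
import math


def neglog_to_real(neglog_score: float) -> float:
    """Convert a negative natural-log score to a real-semiring value.

    Assumption: the score is stored as -ln(x), so x = exp(-score).
    """
    return math.exp(-neglog_score)


def neglog_to_integer(neglog_score: float) -> int:
    """Convert to the real semiring, then round to the nearest integer.

    Assumption: Python's built-in round() stands in for whatever
    rounding the --integers flag actually applies.
    """
    return round(neglog_to_real(neglog_score))


if __name__ == "__main__":
    # A probability of 0.5 is stored internally as -ln(0.5) = ln(2).
    print(neglog_to_real(math.log(2)))
    # A count of 3 is stored internally as -ln(3).
    print(neglog_to_integer(-math.log(3)))
```

With --negativelogs, the value printed would correspond to the input of these functions rather than their output.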

When writing n-gram counts and ARPA format models, the tokens ⟨s⟩ and ⟨/s⟩ represent start-of-sequence and end-of-sequence, respectively. Neither of these symbols is used in our automaton format (see above).
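As a rough illustration of consuming the default printed output, the following Python sketch splits one "w1 ... wk score" line into its tokens and score, and checks for the boundary tokens. The function names are hypothetical. The sketch assumes that the score is the last whitespace-separated field and that the boundary tokens appear in the text as the ASCII strings <s> and </s>.

```python
from typing import List, Tuple

# Assumption: boundary tokens are written in ASCII as below.
START_TOKEN = "<s>"
END_TOKEN = "</s>"


def parse_ngram_line(line: str) -> Tuple[List[str], float]:
    """Split a printed 'w1 ... wk score' line into (tokens, score).

    Assumption: the score is the final whitespace-separated field.
    """
    ngram_text, score_text = line.rstrip("\n").rsplit(None, 1)
    return ngram_text.split(), float(score_text)


def starts_sequence(tokens: List[str]) -> bool:
    """True if the n-gram begins at the start of a sequence."""
    return bool(tokens) and tokens[0] == START_TOKEN


def ends_sequence(tokens: List[str]) -> bool:
    """True if the n-gram ends at the end of a sequence."""
    return bool(tokens) and tokens[-1] == END_TOKEN


if __name__ == "__main__":
    tokens, score = parse_ngram_line("<s> the cat 3\n")
    print(tokens, score, starts_sequence(tokens), ends_sequence(tokens))
```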

Usage

Complexity

Caveats

References

-- MichaelRiley - 09 Dec 2011


Topic revision: r1 - 2011-12-09 - MichaelRiley
 