TWiki> GRM Web>SFstLibrary (revision 16)EditAttach

# OpenGrm SFst Library: Stochastic Finite-State Transducer Library

SFst is a library for normalizing, sampling, combining, and approximating stochastic (or probabilistic) finite-state transducers. These are weighted finite-state transducers, represented in OpenFst library format, that have the following two properties:

1. normalized: the weights of the paths into the future from each state sum to Weight::One()1
2. canonical topology:

For example, an n-gram model produced by the OpenGrm NGram Library is a stochastic FST (provided the `phi_label` is specified to match the backoff label, typically 0, of the n-gram model), but many other topologies are possible.

The following operations are provided for SFSTs:

Operation Usage Description Complexity
Condition Condition(fst, phi_label, delta) modifies input moving towards a globally normalizable FST, controlled by delta >= 0 Time, Space: O(V + E)
Approx Approx(ifst, &ofst, phi_label, delta) approximates a normalized stochastic FST wrt a provided backoff model same as ShortestDistance on the intersection of the input and output FSTs
sfstapprox[--phi_label=\$l][--delta=\$d] in.fst backoff.fst out.fst
IsCanonical IsCanonical(fst, phi_label) checks the second property above holds for a weighted FST Time, Space: O(V + E)
IsNormalized IsNormalized(fst, phi_label, delta) checks the above two properties hold for a weighted FST Time, Space: O(V + E)
GlobalNormalize GlobalNormalize(&fst, phi_label, delta) globally normalizes, when possible2, a canonical weighted FST preserving total path weights (up to a global constant) same as ShortestDistance
sfstnormalize [--method=global] [--phi_label=\$l][--delta=\$d] in.fst out.fst
LocalNormalize LocalNormalize(&fst) locally normalizes, when possible, a canonical weighted FST preserving each state's out-going arc weights up to a state-specific constant Time, Space: O(V + E)
sfstnormalize -method=local in.fst out.fst
NGramApprox NGramApprox(ifst, &ofst, order, phi_label, delta) approximates a normalized stochastic FST as an n-gram model (having `phi_labels` in OpenGrm NGram format) same as ShortestDistance on the intersection of the input and output FSTs
sfstngramapprox [--order=\$o][--phi_label=\$l][--delta=\$d] in.fst out.fst
Perplexity Perplexity(fst, phi_label, unknown_label, unknown_class_size) computes perplexity for a stochastic FST
sfstperplexity [--phi_label=\$l] [-unknown_label=\$u][--unknown_class_size=\$s] in.fst test.far (test sentences are in FST archive format)
PhiNormalize PhiNormalize(&fst, phi_label) normalizes, when possible, a canonical weighted FST by only modifying the failure transitions Time, Space: O(V + E)
sfstnormalize --method=phi [-phi_label=\$l][--delta=\$d] in.fst out.fst
RandGen fst::RandGen(ifst, &ofst, fst::RandGenOptions<SFstArcSelector>(...)) randomly generates paths in a stochastic FST (correctly dealing with failure transitions) see RandGen
sfstrandgen [--phi_label=\$l] [--max_length=\$l] [--npath=\$n] [--seed=\$s] in.fst out.fst
Trim Trim(&fst, phi_label) Removes useless states and transitions in stochastic automata (irrespective of weights) Time, Space: O(V + E * max-phi-order)
sfsttrim -phi_label=\$l in.fst out.fst

1Computation is done internally with Log64Weight. Conversion from the input weight type is done using a `WeightConvert` functor, pre-defined for common weight types like `TropicalWeight` and `LogWeight`.
2Possible when the sum of weight of all successful paths from the initial state is finite (and the input is trim).

Edit | Attach | Watch | Print version |  | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r16 - 2018-05-10 - MichaelRiley

Copyright © 2008-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback