TWiki
>
GRM Web
>
SFstLibrary
(revision 16) (raw view)
Edit
Attach
---+ <nop>OpenGrm SFst Library: Stochastic Finite-State Transducer Library _SFst_ is a library for normalizing, sampling, combining, and approximating _stochastic_ (or _probabilistic_) finite-state transducers. These are weighted finite-state transducers, represented in [[http://www.openfst.org][OpenFst library]] format, that have the following two properties: 1 _normalized_: the weights of the paths into the future from each state sum to Weight::One()<sup>1</sup> 1 _canonical topology_: * the states are sorted by input label * there may be failure transitions (see [[http://www.openfst.org/twiki/bin/view/FST/FstAdvancedUsage#Matchers][phi-labeled transitions]]) but * there is at most one such transition per state * there are no failure-transition (and/or epsilon-transition) cycles <!-- * let state =s= have a transition labeled by =x=: if there is a failure transition from =s= to =s'=, then either =s'= also has a transition labeled by =x= or it is not possible to read =x= from =s'= even via failure transitions --> * no assumption is made of general determinism or what transitions must be present on failure (unlike in a [[http://www.opengrm.org/twiki/bin/view/GRM/NGramModelFormat][canonical n-gram model]]). For example, an n-gram model produced by the [[NGramLibrary][OpenGrm NGram Library]] is a stochastic FST (provided the =phi_label= is specified to match the backoff label, typically 0, of the n-gram model), but many other topologies are possible. The following operations are provided for SFSTs: | *Operation* | *Usage* | *Description* | *Complexity* | | <nop>Condition | <nop>Condition(fst, phi_label, delta) | modifies input moving towards a globally normalizable FST, controlled by delta >= 0 | Time, Space: _O(V + E)_ | | <nop>Approx | <nop>Approx(ifst, &ofst, phi_label, delta) | approximates a normalized stochastic FST wrt a provided backoff model | same as [[http://www.openfst.org/twiki/bin/view/FST/ShortestDistanceDoc][ShortestDistance]] on the intersection of the input and output FSTs | | | sfstapprox[--phi_label=$l][--delta=$d] in.fst backoff.fst out.fst | | | | <nop>IsCanonical | <nop>IsCanonical(fst, phi_label) | checks the second property above holds for a weighted FST | Time, Space: _O(V + E)_ | | <nop>IsNormalized | <nop>IsNormalized(fst, phi_label, delta) | checks the above two properties hold for a weighted FST | Time, Space: _O(V + E)_ | | <nop>GlobalNormalize | <nop>GlobalNormalize(&fst, phi_label, delta) | globally normalizes, when possible<sup>2</sup>, a canonical weighted FST preserving total path weights (up to a global constant) | same as [[http://www.openfst.org/twiki/bin/view/FST/ShortestDistanceDoc][ShortestDistance]] | | | sfstnormalize [--method=global] [--phi_label=$l][--delta=$d] in.fst out.fst | | | | <nop>LocalNormalize | <nop>LocalNormalize(&fst) | locally normalizes, when possible, a canonical weighted FST preserving each state's out-going arc weights up to a state-specific constant | Time, Space: _O(V + E)_ | | | sfstnormalize -method=local in.fst out.fst | | | | <nop>NGramApprox | <nop>NGramApprox(ifst, &ofst, order, phi_label, delta) | approximates a normalized stochastic FST as an n-gram model (having =phi_labels= in [[NGramQuickTour#ModelFormat][OpenGrm NGram format]]) | same as [[http://www.openfst.org/twiki/bin/view/FST/ShortestDistanceDoc][ShortestDistance]] on the intersection of the input and output FSTs | | | sfstngramapprox [--order=$o][--phi_label=$l][--delta=$d] in.fst out.fst | | | | Perplexity | Perplexity(fst, phi_label, unknown_label, unknown_class_size) | computes perplexity for a stochastic FST | | | | sfstperplexity [--phi_label=$l] [-unknown_label=$u][--unknown_class_size=$s] in.fst test.far | (test sentences are in [[http://www.openfst.org/twiki/bin/view/FST/FstExtensions#FstArchive][FST archive]] format) | | | <nop>PhiNormalize | <nop>PhiNormalize(&fst, phi_label) | normalizes, when possible, a canonical weighted FST by only modifying the failure transitions | Time, Space: _O(V + E)_ | | | sfstnormalize --method=phi [-phi_label=$l][--delta=$d] in.fst out.fst | | | | <nop>RandGen| <nop>fst::RandGen(ifst, &ofst, <nop>fst::RandGenOptions<SFstArcSelector<Arc>>(...)) | randomly generates paths in a stochastic FST (correctly dealing with failure transitions) | see [[http://www.openfst.org/twiki/bin/view/FST/RandGenDoc][RandGen]] | | | sfstrandgen [--phi_label=$l] [--max_length=$l] [--npath=$n] [--seed=$s] in.fst out.fst | | | | <nop>Trim | <nop>Trim(&fst, phi_label) | Removes useless states and transitions in stochastic automata (irrespective of weights) | Time, Space: _O(V + E * max-phi-order)_ | | | sfsttrim -phi_label=$l in.fst out.fst | | | <sup>1</sup>Computation is done internally with [[http://www.openfst.org/twiki/bin/view/FST/FstAdvancedUsage#Weights][Log64Weight]]. Conversion from the input weight type is done using a =WeightConvert= functor, pre-defined for common weight types like =TropicalWeight= and =LogWeight=. <br><sup>2</sup>Possible when the sum of weight of all successful paths from the initial state is finite (and the input is [[http://www.openfst.org/twiki/bin/view/FST/FstGlossary#Trim][trim]]).
Edit
|
Attach
|
Watch
|
P
rint version
|
H
istory
:
r23
|
r18
<
r17
<
r16
<
r15
|
B
acklinks
|
V
iew topic
|
Raw edit
|
More topic actions...
Topic revision: r16 - 2018-05-10
-
MichaelRiley
GRM
Log In
or
Register
GRM Web
Create New Topic
Index
Search
Changes
Notifications
Statistics
Preferences
Webs
Contrib
FST
Forum
GRM
Kernel
Main
Sandbox
TWiki
Main
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki?
Send feedback