Python Extensions to OpenFst
This extension exposes nearly all of the FST
script-level interface in
Python. Like the script-level interface, it supports
arbitrary arcs and weights.
Users may use this extension to rapidly prototype or construct FSTs interactively using the Python REPL. Note that
>>>
indicates the Python interactive prompt; all other typewriter-text lines are print to
stdout
or
stderr
.
The Python module itself is called
fst
.
import fst
FSTs are read in from disk using
fst.Fst
, which takes a filename string as its a required first argument.
>>> f = fst.Fst("vector1.fst")
>>> print f
<vector Fst at 0x7f1e45890e70>
fst.Fst
takes an optional second argument, a string indicating the desired
Fst type. The FST is converted to this type if it is not already of the appropriate type.
>>> c = fst.Fst("const1.fst")
>>> print c
<const Fst at 0x7f1e49aefbe8>
>>> v = fst.Fst("const1.fst", fst_type="vector")
>>> print v
<vector Fst at 0x7f1e45890f10>
This conversion can also be accomplished after instantiation using
fst.convert
.
>>> v = fst.convert(c, fst_type="vector")
>>> print v
<vector Fst at 0x7f1e45890290>
All FSTs support constructive operations such as
composition (
fst.compose
),
intersection (
fst.intersect
), and
reversal (
fst.reverse
), and the result is stored in a vector FST.
>>> cv = fst.compose(c, v)
>>> print cv
<vector Fst at 0x7f1e45890190>
All FSTs also support tests for
equality (
fst.equal
),
equivalence (
fst.equivalent
), and
isomorphism (
fst.isomorphic
).
>>> print fst.isomorphic(c, v)
True
Finally, all FSTs can be written to disk using their
write
instance method.
>>> c.write("const-copy.fst")
FSTs which are
mutable (e.g., vector FSTs) also support destructive operations such as
arc-sorting,
inversion, and
projection. These operations work in place, mutating the instance they are called on, and return nothing. These instance methods are not available for immutable FST types (e.g., const FSTs), and only become available once the FST has been converted to a mutable FST type.
>>> v.arcsort(sort_type="olabel")
>>> v.invert()
>>> v.project()
A few operations (e.g., weight-pushing) are available in both constructive and destructive forms, albeit with slightly different options.
FST properties can be inspected using the
properties
instance method and module-level "CONSTANT_CASE" constants.
>>> print bool(c.properties(fst.ACCEPTOR))
False
>>> print bool(v.properties(fst.MUTABLE))
True
The
FstError
exception signals an error during FST operations.
>>> fst.intersect(c, v)
Error: IntersectFst: input FSTs are not acceptors
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pywrapfst.pyx", line 1481, in pywrapfst.intersect (pywrapfst.cc:10151)
File "pywrapfst.pyx", line 1128, in pywrapfst._init_MutableFst (pywrapfst.cc:7154)
pywrapfst.FstError: Operation failed
To read documentation on individual FST operations, use Python's built-in
help
function.
>>> help(fst.equal)
Help on built-in function equal in module pywrapfst:
equal(...)
Are two FSTs equal?
This function tests whether two FSTs have the same states with the same
numbering and the same transitions with the same labels and weights in the
same order.
Args:
ifst1: The first input Fst.
ifst2: The second input Fst.
delta: Comparison/quantization delta.
Returns:
True if the two transducers satisfy the above condition, else False.
See also: `equivalent`, `isomorphic`, `randequivalent`.
>>> help(v.project)
Help on built-in function project:
project(...)
Converts the FST to an acceptor using input or output labels.
This operation destructively projects an FST onto its domain or range by
either copying each arc's input label to its output label (the default) or
vice versa.
Args:
project_output: Should the output labels be projected?
See also: `decode`, `encode`, `relabel`.
Putting it all together, the following example, based on
Mohri et al. 2002, 2008, shows the construction of an ASR recognition transducer from a pronunciation lexicon L, grammar G, a transducer from context-dependent phones to context-independent phones C, and an HMM set H
(where we assume that the components are all determinizable and, preferably, in the log semiring).
>>> L = fst.Fst("L.fst")
>>> G = fst.Fst("G.fst")
>>> C = fst.Fst("C.fst")
>>> H = fst.Fst("H.fst")
>>> LG = fst.determinize(fst.compose(L, G))
>>> CLG = fst.determinize(fst.compose(C, LG))
>>> HCLG = fst.determinize(fst.compose(H, CLG))
>>> HCLG.minimize()
>>> HCLG.write("hclg.fst")