TWiki> FST Web>FstExtensions>PythonExtension (revision 10)EditAttach

Python Extensions to OpenFst

This extension exposes nearly all of the FST script-level interface in Python. Like the script-level interface, it supports arbitrary arcs and weights.

Note that this extension is unrelated to, and incompatible with, any other third-party Python extensions for OpenFst (e.g., pyfst). Install should not install this extension in the same Python environment as any third-party extensions.

To install this package, issue --enable-python= during configuration. Or, you can install it from PyPi using tools like pip or easy_install.

Users may use this extension to rapidly prototype or construct FSTs interactively using the Python REPL. Note that >>> indicates the Python interactive prompt; all other typewriter-text lines are print to stdout or stderr.

The Python module itself is called fst.

>>> import fst

FSTs are read in from disk using fst.Fst, which takes a filename string as its a required first argument.

>>> f = fst.Fst("vector1.fst")
>>> print f
<vector Fst at 0x7f1e45890e70>

fst.Fst takes an optional second argument, a string indicating the desired FST type. The FST is converted to this type if it is not already of the appropriate type.

>>> c = fst.Fst("const1.fst")
>>> print c
<const Fst at 0x7f1e49aefbe8>
>>> v = fst.Fst("const1.fst", fst_type="vector")
>>> print v
<vector Fst at 0x7f1e45890f10>

This conversion can also be accomplished after instantiation using fst.convert.

>>> v = fst.convert(c, fst_type="vector")
>>> print v
<vector Fst at 0x7f1e45890290>

All FSTs support constructive operations such as composition (fst.compose), intersection (fst.intersect), and reversal (fst.reverse), and the result is stored in a vector FST.

>>> cv = fst.compose(c, v)
>>> print cv
 <vector Fst at 0x7f1e45890190>

All FSTs also support tests for equality (fst.equal), equivalence (fst.equivalent), and isomorphism (fst.isomorphic).

>>> print fst.isomorphic(c, v)
True

Finally, all FSTs can be written to disk using their write instance method.

>>> c.write("const-copy.fst")

FSTs which are mutable (e.g., vector FSTs) also support destructive operations such as arc-sorting, inversion, and projection. These operations work in place, mutating the instance they are called on, and return nothing. These instance methods are not available for immutable FST types (e.g., const FSTs), and only become available once the FST has been converted to a mutable FST type.

>>> v.arcsort(sort_type="olabel")
>>> v.invert()
>>> v.project()

A few operations (e.g., weight-pushing) are available in both constructive and destructive forms, albeit with slightly different options.

FST properties can be inspected using the properties instance method and module-level "CONSTANT_CASE" constants.

>>> print bool(c.properties(fst.ACCEPTOR))
False
>>> print bool(v.properties(fst.MUTABLE))
True

The FstError exception signals an error during FST operations.

>>> fst.intersect(c, v)
Error: IntersectFst: input FSTs are not acceptors
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pywrapfst.pyx", line 1481, in pywrapfst.intersect (pywrapfst.cc:10151)
  File "pywrapfst.pyx", line 1128, in pywrapfst._init_MutableFst (pywrapfst.cc:7154)
pywrapfst.FstError: Operation failed

To read documentation on individual FST operations, use Python's built-in help function.

>>> help(fst.equal)
Help on built-in function equal in module pywrapfst:

equal(...)
    Are two FSTs equal?
    
    This function tests whether two FSTs have the same states with the same
    numbering and the same transitions with the same labels and weights in the
    same order.
    
    Args:
      ifst1: The first input Fst.
      ifst2: The second input Fst.
      delta: Comparison/quantization delta.
    
    Returns:
      True if the two transducers satisfy the above condition, else False.
    
    See also: `equivalent`, `isomorphic`, `randequivalent`.

>>> help(v.project)
Help on built-in function project:

project(...)
    Converts the FST to an acceptor using input or output labels.
    
    This operation destructively projects an FST onto its domain or range by
    either copying each arc's input label to its output label (the default) or
    vice versa.
    
    Args:
      project_output: Should the output labels be projected?
    
    See also: `decode`, `encode`, `relabel`.

Putting it all together, the following example, based on Mohri et al. 2002, 2008, shows the construction of an ASR recognition transducer from a pronunciation lexicon L, grammar G, a transducer from context-dependent phones to context-independent phones C, and an HMM set H (where we assume that the components are all determinizable and, preferably, in the log semiring).

>>> L = fst.Fst("L.fst")
>>> G = fst.Fst("G.fst")
>>> C = fst.Fst("C.fst")
>>> H = fst.Fst("H.fst")
>>> LG = fst.determinize(fst.compose(L, G))
>>> CLG = fst.determinize(fst.compose(C, LG))
>>> HCLG = fst.determinize(fst.compose(H, CLG))
>>> HCLG.minimize()
>>> HCLG.write("hclg.fst")
Edit | Attach | Watch | Print version | History: r22 | r12 < r11 < r10 < r9 | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r10 - 2015-10-23 - KyleGorman
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback