---+ OpenFst Python extension

This extension exposes the OpenFst [[FstAdvancedUsage#FstScript][scripting API]] to [[https://www.python.org/][Python]]. Like the scripting API, it supports [[FstAdvancedUsage#FstArcs][arbitrary arcs and weights]]. The extension allows for rapid prototyping and interactive construction of FSTs using the Python REPL.

Note that this extension is unrelated to, and incompatible with, any other third-party Python extensions for OpenFst (e.g., [[http://pyfst.github.io/][pyfst]]).

To install this package, either:

   * pass =--enable-python= when configuring OpenFst, or
   * install OpenFst with FAR support (=--enable-far=), then install the [[https://pypi.python.org/pypi/openfst][PyPI]] package =openfst= using [[https://pip.pypa.io/en/stable/][Pip]]: =pip install openfst=

NB: =>>>= indicates the Python interactive prompt; all other typewriter-text lines are printed to stdout or stderr.

---++ Module import

The Python module itself is called =pywrapfst=, but in this tutorial we will alias it to =fst=.

<verbatim>
>>> import pywrapfst as fst
</verbatim>

---++ FST construction

FSTs can be compiled from [[FstQuickTour#CreatingShellFsts][arc-lists]] in the same format used by the =fstcompile= binary.

<verbatim>
>>> compiler = fst.Compiler()
>>> print >> compiler, "0 1 97 120 .5"
>>> print >> compiler, "0 1 98 121 1.5"
>>> print >> compiler, "1 2 99 123 2.5"
>>> print >> compiler, "2 3.5"
>>> f = compiler.compile()  # Creates the FST and flushes the compiler buffer.
>>> f.num_states
3
>>> f.final(2)
<tropical Weight 3.5 at 0x1215ed0>
</verbatim>

FSTs can be read in from disk using =Fst.read=, which takes a string argument specifying the input file's location.

<verbatim>
>>> v = fst.Fst.read("vector.fst")
</verbatim>

This class method takes an optional second argument, a string indicating the desired [[FstAdvancedUsage#Fst_Types][FST type]]. The FST is converted to this type if the on-disk FST is not already of the desired type.
<verbatim>
>>> c = fst.Fst.read("const.fst")
>>> c.fst_type
'const'
>>> v = fst.Fst.read("const.fst", fst_type="vector")
>>> v.fst_type
'vector'
</verbatim>

This conversion can also be accomplished after instantiation using the =convert= function.

<verbatim>
>>> v = fst.convert(c, fst_type="vector")
>>> v.fst_type
'vector'
</verbatim>

Note that this conversion to the =vector= FST type is mandatory if one wishes to perform [[#FstMutation][mutation operations]] on a =const= FST.

FSTs can be read in from [[FstExtensions#FstArchives][FST Archives (FARs)]] using the =FarReader= object.

<verbatim>
>>> reader = fst.FarReader.open("lattice.far")
</verbatim>

Each FST stored within a FAR has a unique string ID which can be used to extract it from the reader object.

<verbatim>
>>> f = reader["1best"]
</verbatim>

Alternatively, all FSTs stored within a FAR may be accessed by iterating over the reader object.

<verbatim>
>>> for (name, f) in reader:
...     print name, f.num_states
1best 23
2best 27
3best 27
...
</verbatim>

Finally, an empty mutable vector FST can be created using =Fst=.

<verbatim>
>>> f = fst.Fst()
</verbatim>

By default, the resulting FST uses =standard= (tropical-weight) arcs, but users can specify other arc types (e.g., log) via an optional argument.

<verbatim>
>>> f.arc_type
'standard'
>>> g = fst.Fst("log")
>>> g.arc_type
'log'
</verbatim>

---++ FST object attributes and properties

All FSTs have the following read-only attributes ("properties" in Python jargon):<br /><br />

| =arc_type= | A string indicating the arc type |
| =input_symbols= | The input =SymbolTable=, or =None= if no input table is set |
| =fst_type= | A string indicating the FST (container) type |
| =output_symbols= | The output =SymbolTable=, or =None= if no output table is set |
| =start= | The state ID for the start state |
| =weight_type= | A string indicating the weight type |

<br />Mutable FSTs also provide the =num_states= attribute, which indicates the number of states in the FST.
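The attributes listed above can be gathered into a single dictionary for quick inspection. The following is a minimal sketch, not part of the extension: =summarize= and =StandInFst= are hypothetical names introduced here, and the stand-in class exists only so the snippet runs without =pywrapfst= installed. It assumes nothing beyond the attribute names given in the table.

```python
# Attribute names taken from the table above.
FST_ATTRIBUTES = ("arc_type", "fst_type", "weight_type", "start",
                  "input_symbols", "output_symbols")


def summarize(f):
    """Returns a dict mapping each attribute name to its value on f."""
    summary = dict((name, getattr(f, name)) for name in FST_ATTRIBUTES)
    # num_states is only documented for mutable FSTs, so treat it as optional.
    summary["num_states"] = getattr(f, "num_states", None)
    return summary


class StandInFst(object):
    """Hypothetical stand-in so the sketch runs without pywrapfst."""
    arc_type = "standard"
    fst_type = "vector"
    weight_type = "tropical"
    start = 0
    input_symbols = None
    output_symbols = None
    num_states = 3


summary = summarize(StandInFst())
```

With the extension installed, the same helper should presumably work on a real FST object (e.g., =summarize(f)=), since it only reads the attributes documented above.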
To access FST properties (e.g., cyclicity, weightedness), use the =properties= method.

<verbatim>
>>> print "Is f cyclic?", f.properties(fst.CYCLIC, True) == fst.CYCLIC
Is f cyclic? True
</verbatim>

---++ FST access and iteration

FST arcs and states can be accessed via the =StateIterator=, =ArcIterator=, and =MutableArcIterator= objects. These are most naturally constructed using the =states= and =arcs= methods, as follows.

<verbatim>
>>> for state in f.states():
...     for arc in f.arcs(state):
...         print state, arc.ilabel, arc.olabel, arc.weight, arc.nextstate
0 97 120 1.5 1
0 98 121 2.5 1
1 99 123 2.5 2
</verbatim>

The final weight of a state can be accessed using the =final= instance method.

<verbatim>
>>> for state in f.states():
...     print state, f.final(state)
0 Infinity
1 Infinity
2 3.5
</verbatim>

The following function returns the combined count of arcs and states in an FST.

<verbatim>
>>> def num_arcs_and_states(f):
...     return sum(1 + f.num_arcs(s) for s in f.states())
</verbatim>

---++ FST mutation

#FstMutation

Mutable FSTs can be modified by adding states (=add_state=), adding arcs leaving existing states (=add_arc=), marking an existing state as the start state (=set_start=), or giving a non-infinite final weight to an existing state (=set_final=). Optionally, the user can reserve states before adding them using the =reserve_states= instance method, and reserve arcs leaving an existing state using the =reserve_arcs= method. The following snippet creates an acceptor which, when its arc labels are interpreted as bytes, accepts the well-known "sheep language" =/baa+/=.

<verbatim>
>>> f = fst.Fst()
>>> f.reserve_states(3)  # Optional.
>>> s = f.add_state()
>>> f.set_start(s)
>>> n = f.add_state()
>>> f.reserve_arcs(s, 1)  # Optional.
>>> f.add_arc(s, 98, 98, fst.Weight.One(f.weight_type), n)
>>> s = n
>>> n = f.add_state()
>>> f.reserve_arcs(s, 1)  # Optional.
>>> f.add_arc(s, 97, 97, fst.Weight.One(f.weight_type), n)
>>> s = n
>>> n = f.add_state()
>>> f.reserve_arcs(s, 1)  # Optional.
>>> f.add_arc(s, 97, 97, fst.Weight.One(f.weight_type), n)
>>> f.reserve_arcs(n, 1)  # Optional.
>>> f.add_arc(n, 97, 97, fst.Weight.One(f.weight_type), n)
>>> f.set_final(n, fst.Weight.One(f.weight_type))
>>> f.verify()  # Checks FST's sanity.
True
>>> print f
0 1 98 98
1 2 97 97
2 3 97 97
3 3 97 97
3
</verbatim>

While it is possible to add arcs whose destination state has not yet been added, any other reference to a state not yet created (by =add_state=) is forbidden and will raise an =FstIndexError=.

Existing states can be deleted using =delete_states=, and arcs leaving an existing state can be deleted using =delete_arcs=. For example, the following function can be used to remove all arcs and states from an FST.

<verbatim>
>>> def clear(f):
...     for state in f.states():
...         f.delete_arcs(state)
...     f.delete_states()
</verbatim>

---++ FST visualization

The instance method =text= returns a string representing the FST as an arc-list using the same format and options as the =fstprint= binary. If =f= is an FST, then =print f= is an alias for =print f.text()=.

<verbatim>
>>> print f
0 1 98 98
1 2 97 97
2 3 97 97
3 3 97 97
3
</verbatim>

FSTs can also be written to a [[http://graphviz.org][GraphViz]] file using the =draw= instance method.

<verbatim>
>>> f.draw("f.gv")
</verbatim>

---++ FST operations

All FSTs support constructive operations such as [[ComposeDoc][composition]] (=compose=), [[IntersectDoc][intersection]] (=intersect=), and [[ReverseDoc][reversal]] (=reverse=), storing the result in a vector FST.

<verbatim>
>>> cv = fst.compose(c, v)
</verbatim>

FSTs also support tests for [[EqualDoc][equality]] (=equal=), [[EquivalentDoc][equivalence]] (=equivalent=), [[RandEquivalentDoc][stochastic equivalence]] (=randequivalent=), and [[IsomorphicDoc][isomorphism]] (=isomorphic=).
<verbatim>
>>> fst.isomorphic(c, v)
True
</verbatim>

FSTs which are [[FstAdvancedUsage#Fst_Types][mutable]] (e.g., =vector= FSTs) also support destructive operations such as [[ArcSortDoc][arc-sorting]] (=arcsort=), [[InvertDoc][inversion]] (=invert=), [[ProjectDoc][projection]] (=project=), and [[UnionDoc][union]] (=union=). These operations work in place, mutating the instance they are called on and returning nothing. These instance methods are not available for immutable FST types (e.g., =const= FSTs).

<verbatim>
>>> v.arcsort(sort_type="olabel")
>>> v.invert()
>>> v.project()
</verbatim>

A few operations (e.g., weight-pushing, epsilon-removal) are available in both constructive and destructive forms, albeit with slightly different options.

To read documentation on individual FST operations, use Python's built-in =help= function.

<verbatim>
>>> help(fst.equal)
Help on built-in function equal in module pywrapfst:

equal(...)
    equal(ifst1, ifst2, delta=fst.kDelta)

    Are two FSTs equal?

    This function tests whether two FSTS have the same states with the same
    numbering and the same transitions with the same labels and weights in the
    same order.

    Args:
      ifst1: The first input FST.
      ifst2: The second input FST.
      delta: Comparison/quantization delta.

    Returns:
      True if the two FSTs satisfy the above conditions, otherwise False.

    See also: `equivalent`, `isomorphic`, `randequivalent`.
</verbatim>

---++ FST output

FSTs can be written to disk using the =write= instance method.

<verbatim>
>>> f.write("f.fst")
</verbatim>

They can also be written into FARs using the =FarWriter= object. Once the writer is created, an FST can be written to it using dictionary-style assignment.

<verbatim>
>>> writer = fst.FarWriter.create("lattice.far")
>>> writer["1best"] = best1  # best1 and best2 are existing FSTs.
>>> writer["2best"] = best2
</verbatim>

Note that the FAR itself is not guaranteed to be flushed to disk until the =FarWriter= is garbage-collected.
Under normal circumstances, calling =del= on the =FarWriter= variable will decrement the reference count to zero and trigger garbage collection on the next cycle.

<verbatim>
>>> del writer
</verbatim>

---++ Worked example

Putting it all together, the following example, based on [[FstBackground][Mohri et al. 2002, 2008]], shows the construction of an ASR recognition transducer from a pronunciation lexicon _L_, a grammar _G_, a transducer from context-dependent phones to context-independent phones _C_, and an HMM set _H_ (where we assume that the components are all determinizable and, preferably, in the log semiring).

<verbatim>
>>> reader = fst.FarReader.open("hclg.far")
>>> LG = fst.determinize(fst.compose(reader["L"], reader["G"]))
>>> CLG = fst.determinize(fst.compose(reader["C"], LG))
>>> HCLG = fst.determinize(fst.compose(reader["H"], CLG))
>>> HCLG.minimize()
>>> HCLG.write("hclg.fst")
</verbatim>
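The cascade in the worked example follows a regular pattern: starting from the rightmost component, each remaining component is composed on the left and the result is determinized. The sketch below expresses that pattern as a right-to-left fold. =build_cascade= is a hypothetical helper, not part of the extension, and the string-based stand-ins for composition and determinization are there only to trace the order of operations without =pywrapfst= installed.

```python
def build_cascade(components, compose, determinize):
    """Composes components right to left, determinizing after each step."""
    result = components[-1]
    for component in reversed(components[:-1]):
        result = determinize(compose(component, result))
    return result


# Hypothetical string stand-ins that trace the order of operations.
def trace_compose(left, right):
    return "(%s o %s)" % (left, right)


def trace_determinize(x):
    return "det" + x


trace = build_cascade(["H", "C", "L", "G"], trace_compose, trace_determinize)
```

With the extension, one would presumably pass the FSTs read from the FAR together with =fst.compose= and =fst.determinize=, assuming both accept positional FST arguments as in the worked example; minimization and writing to disk would still be done afterwards on the result.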
Topic revision: r16 - 2016-02-13 - KyleGorman
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.