Difference: PyniniDocs (1 vs. 20)

Revision 202019-07-15 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 1787 to 1787
  Returns the current state index.
Changed:
<
<

StringPaths

>
>

StringPathIterator

 
Changed:
<
<
The string paths object is used to iterate through paths of an acyclic FST, in the forms of input and output label sequences, input and output strings, and path weights.
>
>
The string path iterator object is used to iterate through paths of an acyclic FST, in the forms of input and output label sequences, input and output strings, and path weights.
  See also: Paths algorithm.

Constructor
Changed:
<
<
StringPaths(fst, token_type="byte", isymbols=None, osymbols=None)
>
>
StringPathIterator(fst, token_type="byte", isymbols=None, osymbols=None)
  Iterator for string paths in acyclic FST.
Line: 1831 to 1831
  error(self)
Changed:
<
<
Indicates whether the StringPaths has encountered an error.
>
>
Indicates whether the StringPathIterator has encountered an error.
  Returns:
Changed:
<
<
True if the StringPaths is in an errorful state, False otherwise.
>
>
True if the StringPathIterator is in an errorful state, False otherwise.

ilabels(self)

Returns the input labels for the current path.

Returns: A list of input labels for the current path.

  istring(self)
Line: 1846 to 1853
 Returns: The path's input string.
Deleted:
<
<
next(self)

Advances the iterator.

 istrings(self)

Generates all input strings in the FST.

Line: 1860 to 1863
 Yields: All input strings.
Added:
>
>
next(self)

Advances the iterator.

olabels(self)

Returns the output labels for the current path.

Returns: A list of output labels for the current path.

ostring(self)

Returns the current path's output string.

Returns: The path's output string.

 ostrings(self)

Generates all output strings in the FST.

Line: 1870 to 1891
 Yields: All output strings.
Added:
>
>
weight(self)

Returns the current path's total weight.

Returns: The path's Weight.

 weights(self)

Generates all path weights in the FST.

Line: 1878 to 1906
 responsible for resetting the iterator if desired.

Yields:

Changed:
<
<
All weights.

ostring(self)

Returns the current path's output string.

Returns: The path's output string.

weight(self)

Returns the current path's total weight.

Returns: The path's Weight.

>
>
All weights.
 

SymbolTable

Line: 3021 to 3035
  aiter.next() return f
Changed:
<
<
The method paths returns a StringPaths iterator. This provides several ways to access the paths of an acyclic FST. For each
>
>
The method paths returns a StringPathIterator. This provides several ways to access the paths of an acyclic FST. For each
 path, one can access the input and output strings via the istring and ostring methods, and the path weight via the weight method. One can also access all input strings, all output strings, and all path weights using the istrings, ostrings, and weights methods, which return generators; after invoking any one of these methods, the caller must reset the iterator to reuse it.

See in-module documentation strings for more information on iterators.

Revision 192019-03-27 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 24 to 24
 

Introduction

Changed:
<
<
Pynini ( Gorman 2016) is a library for compiling a grammar of strings, regular expressions, and context-dependent rewrite rules into weighted finite-state transducers.
>
>
Pynini (Gorman 2016) is a library for compiling a grammar of strings, regular expressions, and context-dependent rewrite rules into weighted finite-state transducers.
 

Design

Revision 182018-09-14 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 94 to 94
  The first condition requires that the path begin at a start state, the second condition requires that all transitions be licensed by δ, and the third condition requires that the final transition be to a final state.
Changed:
<
<
Let the string of a path be the (possibly empty) sequence of labels. For instance, if P is (s, l0, q1, (q1, l1, q2), (q2, l2, q3), then the string of P is simply l0, lsub>1, l2. The language described by an FSA is simply the union of all strings of all its paths.
>
>
Let the string of a path be the (possibly empty) sequence of labels. For instance, if P is (s, l0, q1, (q1, l1, q2), (q2, l2, q3), then the string of P is simply l0, l1, l2. The language described by an FSA is simply the union of all strings of all its paths.
 

Example

Revision 172018-08-16 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 2681 to 2681
  queue_type="auto", reverse=False)

Compute the shortest distance from the initial or final state.

Changed:
<
<
manuallymanually
>
>
 This operation computes the shortest distance from the initial state (when `reverse` is False) or from every state to the final state (when `reverse` is True). The shortest distance from p to q is the otimes-sum of the weights of

Revision 162018-08-06 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Deleted:
<
<
 

        ,ggggggggggg,
Line: 29 to 28
 

Design

Changed:
<
<
Pynini supports much of the functionality of Thrax, but whereas Thrax is essentially a compiler for a domain-specific language, Pynini is implemented as a Python extension module. This provides several advantages:
>
>
Pynini supports much of the functionality of Thrax, but whereas Thrax is essentially a compiler for a domain-specific language, Pynini is implemented as a Python extension module. This provides several advantages:
 
  • Pynini users can exploit Python's rich tooling ecosystem, including logging and unit testing frameworks.
  • Pynini users can incorporate Thrax primitives like string compilation into executables.
Line: 40 to 39
  This document covers the following areas:
Changed:
<
<
  1. formal preliminaries underlying finite-state transducers,
  2. getting started with Pynini,
  3. examples of Pynini in action,
  4. API reference, and
  5. advanced topics.
>
>
  1. formal preliminaries underlying finite-state transducers,
  2. getting started with Pynini,
  3. examples of Pynini in action,
  4. API reference, and
  5. advanced topics.
  Those who are familiar with FSTs may prefer to jump straight to getting started with Pynini. Or, for a quick start, you may also wish to read our Pynini tutorial for O'Reilly.
Line: 67 to 66
  360px-Kosher_gumballs.JPG
Changed:
<
<
A working gumball machine can be in one of two states: either its coin slot does, or does not, contain a coin. Initially the gumball machine has an empty coin slot; and turning the knob has no effect on the state of the machine. But once a coin is inserted into the coin slot, turning the knob dispenses a gumball and empties the coin slot. If the knob is turned while the coin slot is vacant, no change of state occurs. However, if the knob is turned while the coin slot is occupied, a gumball is dispensed and the coin slot is cleared. We can represent this schematic description of a gumball machine as a directed graph in which the states are represented by circles and the transitions between states—henceforth, arcs—are represented by arrows between states. Each arc has a pair of labels representing the input consumed, and output emitted, respectively, when that arc is traversed. By convention, the Greek letter ε (epsilon) is used to represent null input and/or output along an arc.
>
>
A working gumball machine can be in one of two states: either its coin slot does, or does not, contain a coin. Initially the gumball machine has an empty coin slot; and turning the knob has no effect on the state of the machine. But once a coin is inserted into the coin slot, turning the knob dispenses a gumball and empties the coin slot. If the knob is turned while the coin slot is vacant, no change of state occurs. However, if the knob is turned while the coin slot is occupied, a gumball is dispensed and the coin slot is cleared. We can represent this schematic description of a gumball machine as a directed graph in which the states are represented by circles and the transitions between states—henceforth, arcs—are represented by arrows between states. Each arc has a pair of labels representing the input consumed, and output emitted, respectively, when that arc is traversed. By convention, the Greek letter ε (epsilon) is used to represent null input and/or output along an arc.
 
Changed:
<
<
gumball.png
>
>
gumball.png
  The bold double-circle state labeled 0 simply indicates that that state is "final", whereas the single-circle state 1 is non-final. Here, this encodes the notion that no user would walk away from a gumball machine while there's still a coin in the slot.
Line: 81 to 80
  Finite-state acceptors (or FSAs) are the simplest type of FST. Each FSA represents a set of strings (henceforth, a language), similar to a regular expression, as follows. An FSA consists of a five-tuple (Q, S, F, Σ δ) where:
Changed:
<
<
  1. Q is a finite set of states,
  2. S is the set of start states,
  3. F is the set of final states,
  4. Σ is the alphabet, and
  5. δ : Q × (Σ ∪ ε) → Q is the transition relation.

A path through an FSA (Q, S, F, Σ δ) is simply a sequence of transitions starting with a start state sS, taking only transitions licensed by δ, and terminating at a final state fF. More formally, let a path be a sequence P = p0, p1, …, pn where each element pP is a triple over Q × (Σ ∪ ε) × Q. Each triple consists of a source state, a label in the alphabet (or the null label ε) and a destination state. Then, a path is valid if and only if all of the following conditions hold:

  1. The source state for p0 is in S,
  2. δ(qi li) = qi + 1 for any pi = (qi, li, qi + 1) ∈ P, and
  3. the destination state for pn is in F.
>
>
  1. Q is a finite set of states,
  2. S is the set of start states,
  3. F is the set of final states,
  4. Σ is the alphabet, and
  5. δ : Q × (Σ ∪ ε) → Q is the transition relation.

A path through an FSA (Q, S, F, Σ δ) is simply a sequence of transitions starting with a start state sS, taking only transitions licensed by δ, and terminating at a final state fF. More formally, let a path be a sequence P = p0, p1, …, pn where each element pP is a triple over Q × (Σ ∪ ε) × Q. Each triple consists of a source state, a label in the alphabet (or the null label ε) and a destination state. Then, a path is valid if and only if all of the following conditions hold:

  1. The source state for p0 is in S,
  2. δ(qi li) = qi + 1 for any pi = (qi, li, qi + 1) ∈ P, and
  3. the destination state for pn is in F.
  The first condition requires that the path begin at a start state, the second condition requires that all transitions be licensed by δ, and the third condition requires that the final transition be to a final state.
Changed:
<
<
Let the string of a path be the (possibly empty) sequence of labels. For instance, if P is (s, l0, q1, (q1, l1, q2), (q2, l2, q3), then the string of P is simply l0, lsub>1, l2. The language described by an FSA is simply the union of all strings of all its paths.
>
>
Let the string of a path be the (possibly empty) sequence of labels. For instance, if P is (s, l0, q1, (q1, l1, q2), (q2, l2, q3), then the string of P is simply l0, lsub>1, l2. The language described by an FSA is simply the union of all strings of all its paths.
 

Example

The following FSA represents the Perl-style regular expression ab*cd+e:

Changed:
<
<
abcde.png
>
>
abcde.png
  Here, the bold circle labeled 0 indicates the start state, and the double circle labeled 4 indicates the final state.
Line: 110 to 109
 

Definition

Changed:
<
<
Finite-state transducers (FSTs) are generalization of FSAs. In the normal case of a two-way transducer, δ is instead a relation from Q × (Σi ∪ ε) × (Σo ∪ ε) → Q where Σiand Σo are the input and output alphabets, respectively. Paths through a FST are defined similarly to the definition given for FSAs above, except that each path corresponds to a set of two strings, an input string over Σi* and an output string over Σo*. Whereas FSAs describe sets of strings, FSTs describe relations between sets of strings.
>
>
Finite-state transducers (FSTs) are generalization of FSAs. In the normal case of a two-way transducer, δ is instead a relation from Q × (Σi ∪ ε) × (Σo ∪ ε) → Q where Σiand Σo are the input and output alphabets, respectively. Paths through a FST are defined similarly to the definition given for FSAs above, except that each path corresponds to a set of two strings, an input string over Σi* and an output string over Σo*. Whereas FSAs describe sets of strings, FSTs describe relations between sets of strings.
  When the relation described by an FST is such that each input string corresponds to at most one output string, we say that the FST is functional.
Line: 118 to 117
  The following FST represents the relation (a:a)(b:b)*(c:g)(d:f)+(e:e):
Changed:
<
<
abcgdfe.png
>
>
abcgdfe.png
 

Weighted finite-state transducers

Line: 126 to 125
  Weighted finite-state transducers (WFSTs) are generalizations of FSTs which use an alternative definition of both F and δ incorporating the notion of weights. FST weights and their operations can be understood by first defining the notion of semiring, which require us to first define the notion monoid.
Changed:
<
<
A monoid is a pair (M, •) where M is a set and • is a
>
>
A monoid is a pair (M, •) where M is a set and • is a
 binary operation on M such that:
Changed:
<
<
  1. • is closed over M: for all a, b in M, abM,
  2. there is an identity element eM such that ae = e • = a for all aM, and
  3. • is associative; for all a, b, cM, (ab) • c = a • (bc).
>
>
  1. • is closed over M: for all a, b in M, abM,
  2. there is an identity element eM such that ae = e • = a for all aM, and
  3. • is associative; for all a, b, cM, (ab) • c = a • (bc).
 
Changed:
<
<
A monoid (M, •) is said to be commutative if for all a, bM, ab = ba.
>
>
A monoid (M, •) is said to be commutative if for all a, bM, ab = ba.
 
Changed:
<
<
Then, a semiring is a triple (𝕂, ⊕, ⊗) such that:
>
>
Then, a semiring is a triple (𝕂, ⊕, ⊗) such that:
 
Changed:
<
<
  1. (𝕂, ⊕) is a commutative monoid,
  2. (𝕂, ⊗) is a monoid,
  3. for all a, b, c ∈ 𝕂, a ⊗ (bc) = (a ⊗ b) ⊕ (ac), and
  4. for all a ∈ 𝕂, a0 = 0a = 0 where 0 is the identity element for the monoid (𝕂, ⊕).
>
>
  1. (𝕂, ⊕) is a commutative monoid,
  2. (𝕂, ⊗) is a monoid,
  3. for all a, b, c ∈ 𝕂, a ⊗ (bc) = (a ⊗ b) ⊕ (ac), and
  4. for all a ∈ 𝕂, a0 = 0a = 0 where 0 is the identity element for the monoid (𝕂, ⊕).
 
Changed:
<
<
In many cases, 𝕂 is the set of real numbers, so a semiring can be denoted simply by specifying the pair of operators (⊕ ⊗). The so-called real semiring is simply (+, ×).
>
>
In many cases, 𝕂 is the set of real numbers, so a semiring can be denoted simply by specifying the pair of operators (⊕ ⊗). The so-called real semiring is simply (+, ×).
  When working with probabilities as weights, we often use the tropical semiring (min, +) and negative log probabilities, taking advantage of the logarithmic identity log(x) + log(y) = log(xy). The tropical semiring (and the associated standard arc type) is the default semiring in Pynini.
Changed:
<
<
At last, we can give the modified definitions for F and delta; for WFSTs. Whereas for unweighted FSTs, F is a set of final states, for WFSTs F is a set of pairs over Q × 𝕂, where the second element is the final weight for that state. And, the transition relation δ for a WFST is from Q × (Σi ∪ ε) × (Σo ∪ ε) × 𝕂 to Q. The definition of paths is parallel to those for unweighted FSTs except that each element in the path is also associated with a weight in 𝕂.
>
>
At last, we can give the modified definitions for F and delta; for WFSTs. Whereas for unweighted FSTs, F is a set of final states, for WFSTs F is a set of pairs over Q × 𝕂, where the second element is the final weight for that state. And, the transition relation δ for a WFST is from Q × (Σi ∪ ε) × (Σo ∪ ε) × 𝕂 to Q. The definition of paths is parallel to those for unweighted FSTs except that each element in the path is also associated with a weight in 𝕂.
 

Example

WFSTs are a natural representation for conditional probability distributions from strings to strings. For example, consider a text normalization rule which verbalizes 2:00 as two with P = .2 and as two o'clock with P = .8. The following probability (i.e., real-valued) semiring WFST encodes this distribution:

Changed:
<
<
time.png
>
>
time.png
 

Conventions used in Pynini

  • Pynini represents all acceptors and transducers, weighted or unweighted, as
Changed:
<
<
WFSTs. Thus, for example, a weighted finite-state acceptor (WFSA) is represented as a WFST in which input and output labels match in all cases, and an unweighted finite-state transducer is represented by a WFST in which all weights are 1 or 0.
>
>
WFSTs. Thus, for example, a weighted finite-state acceptor (WFSA) is represented as a WFST in which input and output labels match in all cases, and an unweighted finite-state transducer is represented by a WFST in which all weights are 1 or 0.
 
  • Pynini only permits one state to be designated the start state.
Changed:
<
<
  • Pynini assigns a final weight to all states; a nonfinal state is just one which has a final weight of 0.
>
>
  • Pynini assigns a final weight to all states; a nonfinal state is just one which has a final weight of 0.
 

Getting started with Pynini

Line: 167 to 166
 

Starting Pynini

Install Pynini then simply import the module as follows:

Deleted:
<
<
 
Changed:
<
<
import pynini
>
>
import pynini
 

Building FSTs

Line: 181 to 178
 

Constructing acceptors

Changed:
<
<
The acceptor function compiles a (Unicode or byte) string into a deterministic acceptor. The user may specify a final weight using the weight keyword argument; by default, the final weight is 1. The user may also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the string are to be translated into labels of the acceptor. By default (token_type="byte"), each arc in the acceptor corresponds to byte in the input string. Finally, the user may specify whether or not the symbol table used should be attached to the FST using the attach_symbols keyword argument, which defaults to True.
>
>
The acceptor function compiles a (Unicode or byte) string into a deterministic acceptor. The user may specify a final weight using the weight keyword argument; by default, the final weight is 1. The user may also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the string are to be translated into labels of the acceptor. By default (token_type="byte"), each arc in the acceptor corresponds to byte in the input string. Finally, the user may specify whether or not the symbol table used should be attached to the FST using the attach_symbols keyword argument, which defaults to True.
 
# UTF-8 bytestring, byte arcs.
print pynini.acceptor("Pont l'Evêque")
# Output below...
Deleted:
<
<
 
Deleted:
<
<
 0 1 P 1 2 o 2 3 n
Line: 204 to 198
 11 12 q 12 13 u 13 14 e
Changed:
<
<
14

If the user specifies token_type="utf8" then each arc in the FST corresponds to a Unicode codepoint.

>
>
14 If the user specifies token_type="utf8" then each arc in the FST corresponds to a Unicode codepoint.
 
# UTF-8 bytestring, Unicode codepoint arcs.
print pynini.acceptor("Pont l'Evêque", token_type="utf8")
# Output below...
Deleted:
<
<
 
Deleted:
<
<
 0 1 P 1 2 o 2 3 n
Line: 229 to 218
 10 11 q 11 12 u 12 13 e
Changed:
<
<
13
>
>
13
 Sequences of characters enclosed by [ and ] receive special interpretations in byte or utf8 mode. Pynini first attempts to parse any such sequence as an integer using the C standard library function strtoll.
Deleted:
<
<
 
Changed:
<
<
assert pynini.acceptor("b[0x61][97]") == pynini.acceptor("baa") # OK.
>
>
assert pynini.acceptor("b[0x61][97]") == pynini.acceptor("baa") # OK.
  If this fails, then Pynini treats the sequence as a sequence of one or more whitespace-delimited "generated" symbols, each of which is given a unique identifier in the resulting FST's symbol table.
Deleted:
<
<
 
x = pynini.acceptor("[It's not much of a cheese shop really]")
y = pynini.acceptor("[It's][not][much][of][a][cheese][shop][really]")
Line: 243 to 227
 
x = pynini.acceptor("[It's not much of a cheese shop really]")
y = pynini.acceptor("[It's][not][much][of][a][cheese][shop][really]")
Changed:
<
<
assert x == y # OK.
>
>
assert x == y # OK.
  A bracket character to be interpreted literally must be escaped.
Deleted:
<
<
 
bracket = pynini.acceptor("\[")  # OK.
Deleted:
<
<

unused = pynini.acceptor("[")  # Not OK: Raises FstStringCompilationError.
 
Added:
>
>
unused = pynini.acceptor("[") # Not OK: Raises FstStringCompilationError.
 Finally, if the user specifies a SymbolTable as the token_type, then the input string is parsed according and the FST labeled according to that table. String parsing failures are logged and raise a FstStringCompilationError exception from within the Python interpreter.

Nearly all FST operations permit a string to be passed in place of an Fst argument; in that case, pynini.acceptor is used (with default arguments) to compile the string into an FST.

Line: 265 to 243
  The transducer function creates a transducer from two FSTs, compiling string arguments into FSTs if necessary. The result is a cross-product of the two input FSTs, interpreted as acceptors. Specifically, the transducer is constructed by mapping output arcs in the first FST to epsilon, mapping the input arcs in the second FST to epsilon, then concatenating the two. In the case that both FSTs are strings, the resulting transducer will simply contain the input and output string converted into labels, and the shorter of the two will be padded with epsilons.
Changed:
<
<
As with acceptor, the user may specify a final weight using the weight keyword argument; the final weight is 1. The user also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer, if strings are passed in place of FST arguments. By default (input_token_type="byte" and output_token_type="byte"), each arc corresponds to a byte in the input and/or output string. Finally, the user may specify whether or not symbol tables should be attached to the FST.
>
>
As with acceptor, the user may specify a final weight using the weight keyword argument; the final weight is 1. The user also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer, if strings are passed in place of FST arguments. By default (input_token_type="byte" and output_token_type="byte"), each arc corresponds to a byte in the input and/or output string. Finally, the user may specify whether or not symbol tables should be attached to the FST.
  Whereas transducer is used to construct the cross-product of two strings, the union of many such string pair cross-products is compiled using the string_map and string_file functions. The former takes iterables of strings as an argument; the latter reads string pairs from one-to-three TSV file. Both functions use a compact prefix tree representation which can be used to represent a (multi)map.
Line: 278 to 256
 

Finnish vowel harmony

Changed:
<
<
Koskenniemi (1983) provides a number of manually-compiled FSTs modeling Finnish morphophonological patterns. One of these concerns the well-known pattern of Finnish vowel harmony. Many Finnish suffixes have two allomorphs differing only in the backness specification of their vowel. For example, the adessive suffix is usually realized as -llä [lːæː] except when the preceding stem contains one of u, o, and a and there is no intervening y, ö, or ä; in this case, it is -lla [lːɑː]. For example, käde 'hand' has an adessive kädellä, whereas vero 'tax' forms the adessive verolla because the nearest stem vowel is o (Ringen & Heinämäki 1999).
>
>
Koskenniemi (1983) provides a number of manually-compiled FSTs modeling Finnish morphophonological patterns. One of these concerns the well-known pattern of Finnish vowel harmony. Many Finnish suffixes have two allomorphs differing only in the backness specification of their vowel. For example, the adessive suffix is usually realized as -llä [lːæː] except when the preceding stem contains one of u, o, and a and there is no intervening y, ö, or ä; in this case, it is -lla [lːɑː]. For example, käde 'hand' has an adessive kädellä, whereas vero 'tax' forms the adessive verolla because the nearest stem vowel is o ( Ringen & Heinämäki 1999).
  The following gives a Pynini function make_adessive which generates the appropriate form of the noun stem. It first concatenates the stem with an abstract representation of the suffix, and then composes the result with a context-dependent rewrite rule adessive_harmony.
Deleted:
<
<
 
back_vowel = pynini.union("u", "o", "a")
neutral_vowel = pynini.union("i", "e")
Line: 300 to 277
 

def make_adessive(stem):

Changed:
<
<
return ((stem + adessive) * adessive_harmony).stringify()
>
>
return ((stem + adessive) * adessive_harmony).stringify()
 

Pluralization rules

Line: 308 to 284
 Consider an application where one is generating text such as, "The current temperature in New York is 25 degrees". Typically one would do this from a template such as the following:
Deleted:
<
<
 
Changed:
<
<
The current temperature in _ is _ degrees
>
>
The current temperature in _ is _ degrees
  This works fine if one fills in the first blank with the name of a location and the second blank with a number. But what if it’s really cold in New York and the
Line: 316 to 290
 This works fine if one fills in the first blank with the name of a location and the second blank with a number. But what if it’s really cold in New York and the thermometer shows 1° (Fahrenheit)? One does not want this:
Deleted:
<
<
 
Changed:
<
<
The current temperature in New York is 1 degrees
>
>
The current temperature in New York is 1 degrees
  So one needs to have rules that know how to inflect the unit appropriately given the context. The Unicode Consortium has some guidelines for this; they primarily specify how many “plural” forms different languages have and in which circumstances one uses each. From a linguistic point of view the specifications are sometimes nonsensical, but they are widely used and are useful for adding support for new languages.
Line: 336 to 307
  # a higher priority, if it matches the input. sigma_star + pynini.transducer("ches", "ch", -1), # Any sequence of bytes ending in "s" strips the "s".
Changed:
<
<
sigma_star + pynini.transducer("s", ""))
>
>
sigma_star + pynini.transducer("s", ""))
  Then as before we can define a context-dependent rewrite rule that performs the singularization just in case the unit is preceded by 1. We define the right
Line: 343 to 313
 singularization just in case the unit is preceded by 1. We define the right context for the application of the rule rc as punctuation, space, or [EOS], a special symbol for the end-of-string.
Deleted:
<
<
 
rc = pynini.union(".", ",", "!", ";", "?", " ", "[EOS]")
Changed:
<
<
singularize = pynini.cdrewrite(singular_map, " 1 ", rc, sigma_star)
>
>
singularize = pynini.cdrewrite(singular_map, " 1 ", rc, sigma_star)
  Then we can define a convenience function to apply this rule...
Deleted:
<
<
 
def singularize(string):
  return pynini.shortestpath(
Line: 354 to 321
 
def singularize(string):
  return pynini.shortestpath(
Changed:
<
<
pynini.compose(string.strip(), singularize)).stringify()
>
>
pynini.compose(string.strip(), singularize)).stringify()
  ...and use it like so:
Deleted:
<
<
 
>>> singularize("The current temperature in New York is 1 degrees")
"The current temperature in New York is 1 degree"
Line: 367 to 332
 >>> singularize("That measures 1.2 feet")
"That measures 1.2 feet" >>> singularize("That costs just 1 pence")
Changed:
<
<
"That costs just 1 penny"
>
>
"That costs just 1 penny"
 
Changed:
<
<
One can handle other languages by changing the rules appropriately—for example, in Russian, which has http://www.unicode.org/cldr/charts/29/supplemental/language_plural_rules.html[more complex pluralization rules]], one needs several different forms, not just singular and plural—and by changing the conditions under which the various rules apply.
>
>
One can handle other languages by changing the rules appropriately—for example, in Russian, which has http://www.unicode.org/cldr/charts/29/supplemental/language_plural_rules.html[more complex pluralization rules]], one needs several different forms, not just singular and plural—and by changing the conditions under which the various rules apply.
 

T9 disambiguation

Changed:
<
<
T9 (short for "Text on 9 keys"; Grover et al. 1998) is a patented predictive text entry system. In T9, each character in the "plaintext" alphabet is assigned to one of the 9 digit keys (0 is usually reserved to represent space) of the traditional 3x4 touch-tone phone grid. For instance, the message GO HOME is entered as 4604663. Since the resulting "ciphertext" may be highly ambiguous—this sequence could also be read as GO HOOD or as many nonsensical expressions—a hybrid language model/lexicon is used for decoding.
>
>
T9 (short for "Text on 9 keys"; Grover et al. 1998) is a patented predictive text entry system. In T9, each character in the "plaintext" alphabet is assigned to one of the 9 digit keys (0 is usually reserved to represent space) of the traditional 3x4 touch-tone phone grid. For instance, the message GO HOME is entered as 4604663. Since the resulting "ciphertext" may be highly ambiguous—this sequence could also be read as GO HOOD or as many nonsensical expressions—a hybrid language model/lexicon is used for decoding.
  The following snippet implements T9 encoding and decoding in Pynini:
Deleted:
<
<
 
LM = pynini.Fst.read("charlm.fst")
T9_ENCODER = pynini.string_file("t9.tsv").closure()
Line: 391 to 354
  def k_best(ciphertext, k): lattice = (ciphertext * T9_DECODER).project(True) * LM
Changed:
<
<
return pynini.shortestpath(lattice, nshortest=k, unique=True)
>
>
return pynini.shortestpath(lattice, nshortest=k, unique=True)
  The first line reads a language model (LM), represented as a WFSA. The second line reads the encoder table from a TSV file: in this file, each line contains an alphabetic character and the corresponding digit key. By computing the concatenative closure of this map, one obtains an FST T9_ENCODER which can encode arbitrarily-long plaintext strings. The encode_string function applies this FST to arbitrary plaintext strings: application here refers to composition of a string with a transducer followed by output projection. The
Changed:
<
<
k_best function first applies a ciphertext string—a bytestring of digits—to the inverse of the encoder FST (T9_DECODER). This creates an intermediate lattice of all possible plaintexts consistent with the T9 ciphertext. This is then scored with—that is, composed with—the character LM. Finally, this function returns the k highest probability plaintexts in the lattice. For the following example, the highest probability plaintext is in fact the correct one:
>
>
k_best function first applies a ciphertext string—a bytestring of digits—to the inverse of the encoder FST (T9_DECODER). This creates an intermediate lattice of all possible plaintexts consistent with the T9 ciphertext. This is then scored with—that is, composed with—the character LM. Finally, this function returns the k highest probability plaintexts in the lattice. For the following example, the highest probability plaintext is in fact the correct one:
 
pt = "THE SINGLE MOST POPULAR CHEESE IN THE WORLD"
ct = encode_string(pt)
print "CIPHERTEXT:", ct
print "DECIPHERED PLAINTEXT:", k_best(ct, 1).stringify()
---+ Output below...
Deleted:
<
<
 
Deleted:
<
<
 CIPHERTEXT: 8430746453066780767852702433730460843096753
Changed:
<
<
DECIPHERED PLAINTEXT: THE SINGLE MOST POPULAR CHEESE IN THE WORLD
>
>
DECIPHERED PLAINTEXT: THE SINGLE MOST POPULAR CHEESE IN THE WORLD
 

API reference Work in progress, under construction

Line: 434 to 390
  ilabel: The input label. olabel: The output label. weight: The arc weight.
Changed:
<
<
nextstate: The destination state for the arc.
>
>
nextstate: The destination state for the arc.
 
Methods
Line: 452 to 406
  This class is used for iterating over the arcs leaving some state of an FST. It supports the full C++ API, but most users should just call the `arcs`
Changed:
<
<
method of an FST object and take advantage of the Pythonic API.
>
>
method of an FST object and take advantage of the Pythonic API.
 
Methods
Deleted:
<
<
 
done(self)
Line: 464 to 416
  Returns: True if the iterator is exhausted, False otherwise.
Deleted:
<
<
 
Deleted:
<
<
 next(self)

Advances the iterator.

Deleted:
<
<
 
Deleted:
<
<
 flags(self)

Returns the current iterator behavioral flags.

Returns: The current iterator behavioral flags as an integer.

Deleted:
<
<
 
Deleted:
<
<
 position(self)

Returns the position of the iterator.

Returns: The iterator's position, expressed as an integer.

Deleted:
<
<
 
Deleted:
<
<
 reset(self)

Resets the iterator to the initial position.

Deleted:
<
<
 
Deleted:
<
<
 seek(self, a)

Advance the iterator to a new position.

Args: a: The position to seek to.

Deleted:
<
<
 
Deleted:
<
<
 set_flags(self, flags, mask)

Sets the current iterator behavioral flags.

Line: 513 to 453
 Args: flags: The properties to be set. mask: A mask to be applied to the `flags` argument before setting them.
Deleted:
<
<
 
Deleted:
<
<
 value(self)

Returns the current arc.

Returns:

Changed:
<
<
The current arc.
>
>
The current arc.
 

EncodeMapper

Line: 548 to 484
 Args: arc_type: A string indicating the arc type. encode_labels: Should labels be encoded?
Changed:
<
<
encode_weights: Should weights be encoded?
>
>
encode_weights: Should weights be encoded?
 
Methods
Deleted:
<
<
 
arc_type(self)
Line: 557 to 491
 arc_type(self)

Returns a string indicating the arc type.

Deleted:
<
<
 
Deleted:
<
<
 flags(self)

Returns the encoder's flags.

Deleted:
<
<
 
Deleted:
<
<
 input_symbols(self)

Returns the encoder's input symbol table, or None if none is present.

Deleted:
<
<
 
Deleted:
<
<
 output_symbols(self)

Returns the encoder's output symbol table, or None if none is present.

Deleted:
<
<
 
Deleted:
<
<
 properties(self, mask)

Provides property bits.

Line: 589 to 515
  Returns: A 64-bit bitmask representing the requested properties.
Deleted:
<
<
 
Deleted:
<
<
 set_input_symbols(self, syms)

Sets the encoder's input symbol table.

Args: syms: A SymbolTable.

Deleted:
<
<
 
Deleted:
<
<
 set_output_symbols(self, syms)

Sets the encoder's output symbol table.

Line: 609 to 531
 This section covers classes (representing FSTs, weights, and the like), their methods (representing various accessors as well as destructive FST algorithms), and functions (representing constructive FST algorithms, weight arithmetic, and the like).

syms: A SymbolTable.

Deleted:
<
<
 
Deleted:
<
<
 weight_type(self)
Changed:
<
<
Returns a string indicating the weight type.
>
>
Returns a string indicating the weight type.
 

Far

Line: 638 to 556
  filename: A string indicating the filename. mode: FAR IO mode; one of: "r" (open for reading), "w" (open for writing). arc_type: Desired arc type; ignored if the FAR is opened for reading.
Changed:
<
<
far_type: Desired FAR type; ignored if the FAR is opened for reading.
>
>
far_type: Desired FAR type; ignored if the FAR is opened for reading.
 
Methods
Deleted:
<
<
 
arc_type(self)
Line: 647 to 563
 arc_type(self)

Returns a string indicating the arc type.

Deleted:
<
<
 
Deleted:
<
<
 closed(self)

Indicates whether the FAR is closed for IO.

Deleted:
<
<
 
Deleted:
<
<
 far_type(self)

Returns a string indicating the FAR type.

Deleted:
<
<
 
Deleted:
<
<
 mode(self)

Returns a char indicating the FAR's current mode.

Deleted:
<
<
 
Deleted:
<
<
 name(self)
Changed:
<
<
Returns the FAR's filename.

Methods when opened for reading (mode="r")
>
>
Returns the FAR's filename.
 
Added:
>
>
Methods when opened for reading (mode="r")
 
done(self)
Line: 682 to 588
  Returns: True if the iterator is exhausted, False otherwise.
Deleted:
<
<
 
Deleted:
<
<
 find(self, key)

Sets the current position to the first entry greater than or equal to the key (a

Line: 695 to 599
  Returns: True if the key was found, False otherwise.
Deleted:
<
<
 
Deleted:
<
<
 get_fst(self)

Returns the FST at the current position.

Returns: A copy of the FST at the current position.

Deleted:
<
<
 
Deleted:
<
<
 get_key(self)

Returns the string key at the current position.

Returns: The string key at the current position.

Deleted:
<
<
 
Deleted:
<
<
 next(self)

Advances the iterator.

Deleted:
<
<
 
Deleted:
<
<
 reset(self)
Changed:
<
<
Resets the iterator to the initial position.

Methods when opened for writing (mode="w")
>
>
Resets the iterator to the initial position.
 
Added:
>
>
Methods when opened for writing (mode="w")
 
add(self, key, fst)
Line: 747 to 641
 Raises: FstOpError: Cannot invoke method in current mode. FstOpError: Incompatible or invalid arc type.
Deleted:
<
<
 
Deleted:
<
<
 close(self)

Closes the FAR and flushes to disk (when open for writing).

Raises:

Changed:
<
<
FstOpError: Cannot invoke method in current mode.
>
>
FstOpError: Cannot invoke method in current mode.
 

Fst

This class represents a mutable, expanded FST. In addition to its constructor, it can be created using the

Changed:
<
<
string compilation function acceptor, the cross-product construction functions transducer, string_file, and string_map, and by various constructive FST algorithms.
>
>
string compilation function acceptor, the cross-product construction functions transducer, string_file, and string_map, and by various constructive FST algorithms.
 
Constructors
Deleted:
<
<
 
Fst(arc_type="standard")
Line: 773 to 663
  Args: arc_type: An optional string indicating the arc type for the FST.
Deleted:
<
<
 
Deleted:
<
<
 Fst.read(filename)

Reads an FST from a file.

Line: 789 to 677
 Raises: FstIOError: Read failed. FstOpError: Read-time conversion failed.
Deleted:
<
<
 
Deleted:
<
<
 Fst.from_pywrapfst(ifst)

Constructs a Pynini FST from a pywrapfst._Fst.

Line: 807 to 693
  ifst: Input FST of type pywrapfst._Fst.

Returns:

Changed:
<
<
An FST.
>
>
An FST.
 
Methods
Line: 829 to 713
 Raises: FstIndexError: State index out of range. FstOpdexError: Incompatible or invalid weight type.
Deleted:
<
<
 
Deleted:
<
<
 add_state(self)

Adds a new state to the FST and returns the state ID.

Returns: The integer index of the new state.

Deleted:
<
<
 
Deleted:
<
<
 arc_type(self)

Returns a string indicating the arc type.

Deleted:
<
<
 
Deleted:
<
<
 arcs(self, state)

Returns an iterator over arcs leaving the specified state.

Line: 856 to 734
  Returns: An ArcIterator.
Deleted:
<
<
 
Deleted:
<
<
 arcsort(self, sort_type="ilabel")

Sorts arcs leaving each state of the FST.

Line: 874 to 750
  self.

Raises:

Changed:
<
<
FstArgError: Unknown sort type.

See also: ArcSort algorithm.

>
>
FstArgError: Unknown sort type. See also: ArcSort algorithm.
 
closure(self, lower)
closure(self, lower, upper)
Line: 922 to 795
  upper: upper bound.

Returns:

Changed:
<
<
self.
>
>
self.
  See also: Repeat algorithm.
Deleted:
<
<
 
concat(self, ifst)
Line: 946 to 817
 Raises: FstOpError: Operation failed. FstSymbolTableMergeError: Unable to resolve symbol table conflict
Changed:
<
<
without relabeling.

See also: Concat algorithm.

>
>
without relabeling.
 
Added:
>
>
See also: Concat algorithm.
 
connect(self)
Line: 960 to 829
 are not part of any successful path.

Returns:

Changed:
<
<
self.

See also: Connect algorithm.

>
>
self.
 
Added:
>
>
See also: Connect algorithm.
 
copy(self)

Makes a copy of the FST.
Deleted:
<
<
 
Deleted:
<
<
 decode(self, encoder)

Decodes encoded labels and/or weights.

Line: 982 to 847
  encoder: An EncodeMapper object used to encode the FST.

Returns:

Changed:
<
<
self.

See also: Encode and Decode algorithms.

>
>
self. See also: Encode and Decode algorithms.
 
delete_arcs(self, state, n=0)
Line: 1003 to 865
  Raises: FstIndexError: State index out of range.
Deleted:
<
<
 
Deleted:
<
<
 delete_states(self, states=None)

Deletes states.

Line: 1019 to 879
  Raises: FstIndexError: State index out of range.
Deleted:
<
<
 
Deleted:
<
<
 draw(self, filename, isymbols=None, osymbols=None, ssymbols=None, acceptor=False, title="", width=8.5, height=11, portrait=False, vertical=False, ranksep=0.4, nodesep=0.25, fontsize=14,
Line: 1050 to 908
  fontsize: Font size, in points. precision: Numeric precision for floats, in number of chars. float_format: One of: 'e', 'f' or 'g'.
Changed:
<
<
show_weight_one: Should weights equivalent to semiring One be printed?
>
>
show_weight_one: Should weights equivalent to semiring One be printed?
 See also: Printing, drawing, and summarizing FSTs.
Deleted:
<
<
 
encode(self, encoder)
Line: 1071 to 926
  encoder: An EncodeMapper object to be used as the encoder.

Returns:

Changed:
<
<
self.

See also: Encode and Decode algorithms.

>
>
self.
 
Added:
>
>
See also: Encode and Decode algorithms.
 
final(self, state)
Line: 1089 to 942
  Raises: FstIndexError: State index out of range.
Deleted:
<
<
 
Deleted:
<
<
 fst_type(self)

Returns a string indicating the FST type.

Deleted:
<
<
 
Deleted:
<
<
 input_symbols(self)

Returns the FST's input symbol table, or None if none is present.

Deleted:
<
<
 
Deleted:
<
<
 invert(self)

Inverts the FST's transduction.

Line: 1112 to 959
 input and output labels.

Returns:

Changed:
<
<
self.

See also: Invert algorithm.

>
>
self. See also: Invert algorithm.
 
minimize(self, delta=0.0009765625, allow_nondet=False)
Line: 1140 to 984
  allow_nondet: Attempt minimization of non-deterministic FST?

Returns:

Changed:
<
<
self.

See also: Minimize algorithm.

>
>
self.
 
Added:
>
>
See also: Minimize algorithm.
 
mutable_arcs(self, state)
Line: 1155 to 997
  Returns: A MutableArcIterator.
Deleted:
<
<
 
Deleted:
<
<
 mutable_input_symbols(self)

Returns the FST's (mutable) input symbol table, or None if none is present.

Deleted:
<
<
 
Deleted:
<
<
 mutable_output_symbols(self)

Returns the FST's (mutable) output symbol table, or None if none is present.

Deleted:
<
<
 
Deleted:
<
<
 num_arcs(self, state)

Returns the number of arcs leaving a state.

Line: 1182 to 1018
  Raises: FstIndexError: State index out of range.
Deleted:
<
<
 
Deleted:
<
<
 num_input_epsilons(self, state)

Returns the number of arcs with epsilon input labels leaving a state.

Line: 1197 to 1031
  Raises: FstIndexError: State index out of range.
Deleted:
<
<
 
Deleted:
<
<
 num_output_epsilons(self, state)

Returns the number of arcs with epsilon output labels leaving a state.

Line: 1212 to 1044
  Raises: FstIndexError: State index out of range.
Deleted:
<
<
 
Deleted:
<
<
 num_states(self)

Returns the number of states.

Deleted:
<
<
 
Deleted:
<
<
 optimize(self, compute_props=False)

Performs a generic optimization of the FST.

Line: 1256 to 1084
  appropriate optimizations?

Returns:

Changed:
<
<
self.
>
>
self.
 See also: Optimize algorithm.
Deleted:
<
<
 
output_symbols(self)
Line: 1265 to 1090
 output_symbols(self)

Returns the FST's output symbol table, or None if none is present.

Deleted:
<
<
 
Changed:
<
<
paths(self, token_type="byte)
>
>
paths(self, input_token_type="byte, output_token_type="byte")
  Creates iterator over all string paths in an acyclic FST.
Line: 1292 to 1115
  Raises: FstArgError: Unknown token type.
Changed:
<
<
FstArgError: FST is not acyclic.
>
>
FstArgError: FST is not acyclic.
 See also: Paths algorithm.
Deleted:
<
<
 
project(self, project_output=False)
Line: 1310 to 1130
  project_output: Should the output labels be projected?

Returns:

Changed:
<
<
self.

See also: Project algorithm.

>
>
self.
 
Added:
>
>
See also: Project algorithm.
 
properties(self, mask)
Line: 1326 to 1144
  mask: The property mask to be compared to the encoder's properties.

Returns:

Changed:
<
<
A 64-bit bitmask representing the requested properties.
>
>
A 64-bit bitmask representing the requested properties.
  See also: FST properties.
Deleted:
<
<
 
prune(self, delta=0.0009765625, nstate=NO_STATE_ID, weight=None)
Line: 1348 to 1164
  below which paths are pruned; if omitted, no paths are pruned.

Returns:

Changed:
<
<
self.

See also: Prune algorithm.

>
>
self.
 
Added:
>
>
See also: Prune algorithm.
 
push(self, delta=0.0009765625, remove_total_weight=False, to_final=False)
Line: 1374 to 1188
  to_final: Push towards final states?

Returns:

Changed:
<
<
self.

See also: Push algorithm.

>
>
self.
 
Added:
>
>
See also: Push algorithm.
 
relabel_pairs(self, ipairs=None, opairs=None)
Line: 1396 to 1208
  self.

Raises:

Changed:
<
<
FstArgError: No relabeling pairs specified.

See also: Relabel algorithm.

>
>
FstArgError: No relabeling pairs specified.
 
Added:
>
>
See also: Relabel algorithm.
 
relabel_tables(self, old_isymbols=None, new_isymbols=None,
               unknown_isymbol="", attach_new_isymbols=True,
Line: 1432 to 1242
  self.

Raises:

Changed:
<
<
FstArgError: No SymbolTable specified.

See also: Relabel algorithm.

>
>
FstArgError: No SymbolTable specified.
 
Added:
>
>
See also: Relabel algorithm.
 
reserve_arcs(self, state, n)
Line: 1451 to 1259
  Raises: FstIndexError: State index out of range.
Deleted:
<
<
 
Deleted:
<
<
 reserve_states(self, n)

Reserve n states (best effort).

Line: 1463 to 1269
  Returns: self.
Deleted:
<
<
 
Deleted:
<
<
 reweight(self, potentials, to_final=False)

Reweights an FST using an iterable of potentials.

Line: 1484 to 1288
  to_final: Push towards final states?

Returns:

Changed:
<
<
self.

See also: Reweight algorithm.

>
>
self. See also: Reweight algorithm.
 
rmepsilon(self, connect=True, delta=0.0009765625, nstate=NO_STATE_ID,
          weight=None)
Line: 1506 to 1307
  below which paths are pruned; if omitted, no paths are pruned.

Returns:

Changed:
<
<
self.

See also: RmEpsilon algorithm.

>
>
self.
 
Added:
>
>
See also: RmEpsilon algorithm.
 
set_final(self, state, weight)
Line: 1524 to 1323
 Raises: FstIndexError: State index out of range. FstOpError: Incompatible or invalid weight.
Deleted:
<
<
 
Deleted:
<
<
 set_input_symbols(self, syms)

Sets the input symbol table.

Line: 1538 to 1335
  Returns: self.
Deleted:
<
<
 
Deleted:
<
<
 set_output_symbols(self, syms)

Sets the output symbol table.

Line: 1552 to 1347
  Returns: self.
Deleted:
<
<
 
Deleted:
<
<
 set_properties(self, props, mask)

Sets the properties bits.

Line: 1566 to 1359
  Returns: self.
Deleted:
<
<
 
Deleted:
<
<
 set_start(self, state)

Sets a state to be the initial state.

Line: 1581 to 1372
  Raises: FstIndexError: State index out of range.
Deleted:
<
<
 
Deleted:
<
<
 start(self)

Returns the start state.

Deleted:
<
<
 
Deleted:
<
<
 states(self)

Returns an iterator over all states in the FST.

Returns: A StateIterator object for the FST.

Deleted:
<
<
 
Deleted:
<
<
 stringify(self, token_type="byte")

Creates a string from a string FST.

Line: 1627 to 1412
  Raises: FstArgError: FST is not a string.
Changed:
<
<
FstArgError: Unknown token type.
>
>
FstArgError: Unknown token type.
 See also: String algorithms.
Deleted:
<
<
 
text(self, isymbols=None, osymbols=None, ssymbols=None, acceptor=False,
     show_weight_one=False, missing_sym="")
Line: 1651 to 1433
  missing_symbol: The string to be printed when symbol table lookup fails.

Returns:

Changed:
<
<
A formatted string representing the machine.
>
>
A formatted string representing the machine.
  Note: use print f as an alias for print f.text().
Deleted:
<
<
 
topsort(self)
Line: 1666 to 1446
 state IDs to higher state IDs

Returns:

Changed:
<
<
self.

See also: TopSort algorithm.

>
>
self.
 
Added:
>
>
See also: TopSort algorithm.
 
union(self, ifst)
Line: 1690 to 1468
 Raises: FstOpError: Operation failed. FstSymbolTableMergeError: Unable to resolve symbol table conflict
Changed:
<
<
without relabeling.

See also: Union algorithm.

>
>
without relabeling.
 
Added:
>
>
See also: Union algorithm.
 
verify(self)

Verifies that an FST's contents are sane.

Returns:
Changed:
<
<
True if the contents are sane, False otherwise.
>
>
True if the contents are sane, False otherwise.
  Note: use this in combination with assertions in tests or library code.
Deleted:
<
<
 
weight_type(self)
Line: 1713 to 1487
  Returns: A string representing the weight type.
Deleted:
<
<
 
Deleted:
<
<
 write(self, filename)

Serializes FST to a file.

Line: 1727 to 1499
  Raises: FstIOError: Write failed.
Deleted:
<
<
 
Deleted:
<
<
 write_to_string(self)

Serializes FST to a string.

Returns:

Changed:
<
<
A string.
>
>
A string.
 

MPdtParentheses

Line: 1756 to 1524
 2) to be used.

A MPDT is expressed as an (Fst, MPdtParentheses) pair for the purposes of all

Changed:
<
<
supported MPDT operations.
>
>
supported MPDT operations.
 
Methods
Deleted:
<
<
 
add_triple(self, push, pop, assignment)
Line: 1772 to 1538
  pop: An arc label to be interpreted as a "pop" operation. assignment: An arc label indicating what stack the parentheses pair is assigned to.
Deleted:
<
<
 
Deleted:
<
<
 copy(self)

Makes a copy of this MPdtParentheses object.

Returns: A deep copy of the MPdtParentheses object.

Deleted:
<
<
 
Deleted:
<
<
 MPdtParentheses.read(filename)

Reads parentheses/assignment triples from a text file.

Line: 1799 to 1561
  Raises: FstIOError: Read failed.
Deleted:
<
<
 
Deleted:
<
<
 write(self, filename)

Writes parentheses triples to a text file.

Line: 1812 to 1572
  filename: The string location of the output file.

Raises:

Changed:
<
<
FstIOError: Write failed.
>
>
FstIOError: Write failed.
 

MutableArcIterator

Line: 1827 to 1585
 MutableArcIterator(ifst, state)

This class is used for iterating over the arcs leaving some state of an FST,

Changed:
<
<
also permitting mutation of the current arc.
>
>
also permitting mutation of the current arc.
 
Methods
Deleted:
<
<
 
done(self)
Line: 1836 to 1592
 done(self)

Indicates whether the iterator is exhausted or not.

Deleted:
<
<
 
Deleted:
<
<
 flags(self)

Returns the current iterator behavioral flags.

Returns: The current iterator behavioral flags as an integer.

Deleted:
<
<
 
Deleted:
<
<
 next(self)

Advances the iterator.

Deleted:
<
<
 
Deleted:
<
<
 position(self)

Returns the position of the iterator.

Returns: The iterator's position, expressed as an integer.

Deleted:
<
<
 
Deleted:
<
<
 reset(self)

Resets the iterator to the initial position.

Deleted:
<
<
 
Deleted:
<
<
 seek(self, a)

Advance the iterator to a new position.

Args: a: The position to seek to.

Deleted:
<
<
 
Deleted:
<
<
 set_flags(self, flags, mask)

Sets the current iterator behavioral flags.

Line: 1885 to 1629
 Args: flags: The properties to be set. mask: A mask to be applied to the `flags` argument before setting them.
Deleted:
<
<
 
Deleted:
<
<
 set_value(self, arc)

Replace the current arc with a new arc.

Args: arc: The arc to replace the current arc with.

Deleted:
<
<
 
Deleted:
<
<
 value(self)
Changed:
<
<
Returns the current arc.
>
>
Returns the current arc.
 

PdtParentheses

Line: 1919 to 1657
 indices should be contiguous.

A PDT is expressed as an (Fst, PdtParentheses) pair for the purposes of all

Changed:
<
<
supported PDT operations.
>
>
supported PDT operations.
 
Methods
Deleted:
<
<
 
add_pair(self, push, pop)
Line: 1932 to 1668
 Args: push: An arc label to be interpreted as a "push" operation. pop: An arc label to be interpreted as a "pop" operation.
Deleted:
<
<
 
Deleted:
<
<
 copy(self)

Makes a copy of this PdtParentheses object.

Returns: A deep copy of the PdtParentheses object.

Deleted:
<
<
 
Deleted:
<
<
 PdtParentheses.read(filename)

Reads parentheses pairs from a text file.

Line: 1959 to 1691
  Raises: FstIOError: Read failed.
Deleted:
<
<
 
Deleted:
<
<
 write(self, filename)

Writes parentheses triples to a text file.

Line: 1972 to 1702
  filename: The string location of the output file.

Raises:

Changed:
<
<
FstIOError: Write failed.
>
>
FstIOError: Write failed.
 

StateIterator

Line: 1984 to 1712
 
StateIterator(ifst)
Changed:
<
<
This class is used for iterating over the states in an FST.
>
>
This class is used for iterating over the states in an FST.
 
Methods
Deleted:
<
<
 
done(self)
Line: 1996 to 1722
  Returns: True if the iterator is exhausted, False otherwise.
Deleted:
<
<
 
Deleted:
<
<
 next(self)

Advances the iterator.

Deleted:
<
<
 
Deleted:
<
<
 reset(self)

Resets the iterator to the initial position.

Deleted:
<
<
 
Deleted:
<
<
 value(self)
Changed:
<
<
Returns the current state index.
>
>
Returns the current state index.
 

StringPaths

Line: 2051 to 1768
  Raises: FstArgError: Unknown token type.
Changed:
<
<
FstArgError: FST is not acyclic.
>
>
FstArgError: FST is not acyclic.
 
Methods
Deleted:
<
<
 
done(self)
Line: 2060 to 1775
 done(self)

Indicates whether the iterator is exhausted or not.

Deleted:
<
<
 
Deleted:
<
<
 error(self)

Indicates whether the StringPaths has encountered an error.

Returns: True if the StringPaths is in an errorful state, False otherwise.

Deleted:
<
<
 
Deleted:
<
<
 istring(self)

Returns the current path's input string.

Line: 2081 to 1792
  Returns: The path's input string.
Deleted:
<
<
 
Deleted:
<
<
 next(self)

Advances the iterator.

Deleted:
<
<
 
Deleted:
<
<
 istrings(self)

Generates all input strings in the FST.

Line: 2099 to 1806
  Yields: All input strings.
Deleted:
<
<
 
Deleted:
<
<
 ostrings(self)

Generates all output strings in the FST.

Line: 2111 to 1816
  Yields: All output strings.
Deleted:
<
<
 
Deleted:
<
<
 weights(self)

Generates all path weights in the FST.

Line: 2123 to 1826
  Yields: All weights.
Deleted:
<
<
 
Deleted:
<
<
 ostring(self)

Returns the current path's output string.

Returns: The path's output string.

Deleted:
<
<
 
Deleted:
<
<
 weight(self)

Returns the current path's total weight.

Returns:

Changed:
<
<
The path's Weight.
>
>
The path's Weight.
 

SymbolTable

Changed:
<
<
A symbol table stores a bidirectional mapping between arc labels and "symbols" (strings). They can be attached to FSTs or encoders, or iterated through using a SymbolTableIterator.
>
>
A symbol table stores a bidirectional mapping between arc labels and "symbols" (strings). They can be attached to FSTs or encoders, or iterated through using a SymbolTableIterator .
 
Constructor
Deleted:
<
<
 
SymbolTable(name="<unspecified>")
Line: 2155 to 1852
 Mutable SymbolTable class.

This class wraps the library SymbolTable and exposes both const (i.e.,

Changed:
<
<
access) and non-const (i.e., mutation) methods of wrapped object.
>
>
access) and non-const (i.e., mutation) methods of wrapped object.
 
Methods
Deleted:
<
<
 
add_symbol(self, symbol, key=NO_SYMBOL)
Line: 2175 to 1870
  Returns: The integer key of the new symbol.
Deleted:
<
<
 
Deleted:
<
<
 add_table(self, syms)

Adds another SymbolTable to this table.

Line: 2187 to 1880
  Args: syms: A SymbolTable to be merged with the current table.
Deleted:
<
<
 
Deleted:
<
<
 available_key(self)

Returns an integer indicating the next available key index in the table.

Deleted:
<
<
 
Deleted:
<
<
 checksum(self)

Returns a string indicating the label-agnostic MD5 checksum for the table.

Deleted:
<
<
 
Deleted:
<
<
 copy(self)

Returns a mutable copy of the SymbolTable.

Deleted:
<
<
 
Deleted:
<
<
 find(self, key)

Given a symbol or index, finds the other one.

Line: 2224 to 1909
  Raises: KeyError: Key not found.
Deleted:
<
<
 
Deleted:
<
<
 get_nth_key(self, pos)

Retrieves the integer index of the n-th key in the table.

Line: 2239 to 1922
  Raises: KeyError: index not found.
Deleted:
<
<
 
Deleted:
<
<
 labeled_checksum(self)

Returns a string indicating the label-dependent MD5 checksum for the table.

Deleted:
<
<
 
Deleted:
<
<
 member(self, key)

Given a symbol or index, returns whether it is found in the table.

Line: 2261 to 1940
  Returns: Whether or not the key is present (as a string or a index) in the table.
Deleted:
<
<
 
Deleted:
<
<
 name(self)

Returns the symbol table's name.

Deleted:
<
<
 
Deleted:
<
<
 name(self)

Returns the symbol table's name.

Deleted:
<
<
 
Deleted:
<
<
 SymbolTable.read(filename)

Reads symbol table from binary file.

Line: 2287 to 1960
  Returns: A new SymbolTable instance.
Deleted:
<
<
 
Deleted:
<
<
 SymbolTable.read_text(filename)

Reads symbol table from text file.

Line: 2303 to 1974
  Returns: A new SymbolTable instance.
Deleted:
<
<
 
Deleted:
<
<
 write(self, filename)

Serializes symbol table to a file.

Line: 2317 to 1986
  Raises: FstIOError: Write failed.
Deleted:
<
<
 
Deleted:
<
<
 write_text(self, filename)

Writes symbol table to text file.

Line: 2330 to 1997
  filename: The string location of the output file.

Raises:

Changed:
<
<
FstIOError: Write failed.
>
>
FstIOError: Write failed.
 

SymbolTableIterator

Line: 2343 to 2007
 
SymbolTableIterator(syms)
Changed:
<
<
This class is used for iterating over a symbol table.
>
>
This class is used for iterating over a symbol table.
 
Methods
Deleted:
<
<
 
done(self)
Line: 2355 to 2017
  Returns: True if the iterator is exhausted, False otherwise.
Deleted:
<
<
 
Deleted:
<
<
 next(self)

Advances the iterator.

Deleted:
<
<
 
Deleted:
<
<
 reset(self)

Resets the iterator to the initial position.

Deleted:
<
<
 
Deleted:
<
<
 symbol(self)

Returns the current symbol string.

Line: 2378 to 2034
  Returns: A symbol string.
Deleted:
<
<
 
Deleted:
<
<
 value(self)

Returns the current integer index of the symbol.

Returns:

Changed:
<
<
An integer index.
>
>
An integer index.
 

Weight

Line: 2411 to 2063
 Raises: FstArgError: Weight type not found. FstBadWeightError: Invalid weight.
Deleted:
<
<
 
Deleted:
<
<
 NoWeight(weight_type)

Constructs a non-member weight in the semiring.

Deleted:
<
<
 
Deleted:
<
<
 Weight.One(weight_type)

Constructs semiring One.

Deleted:
<
<
 
Deleted:
<
<
 Weight.Zero(weight_type)
Changed:
<
<
Constructs semiring zero.
>
>
Constructs semiring zero.
 
Methods
Deleted:
<
<
 
copy(self)
Line: 2437 to 2081
 copy(self)

Returns a copy of the Weight.

Deleted:
<
<
 
Deleted:
<
<
 type(self)
Changed:
<
<
Returns a string indicating the weight type.
>
>
Returns a string indicating the weight type.
 

Fst compilation functions

Line: 2480 to 2119
  FstArgError: Unknown arc type. FstArgError: Unknown token type. FstOpError: Operation failed.
Changed:
<
<
FstStringCompilationError: String compilation failed.
>
>
FstStringCompilationError: String compilation failed.
  See also:
Line: 2530 to 2167
  An FST.

Raises:

Changed:
<
<
FstIOError: Read failed.
>
>
FstIOError: Read failed.
  See also: algorithms.
Deleted:
<
<
 
string_map(lines, arc_type="standard",
           input_token_type="byte", output_token_type="byte",
Line: 2577 to 2212
  An FST.

Raises:

Changed:
<
<
FstArgError: String map compilation failed.
>
>
FstArgError: String map compilation failed.
  See also: algorithms.
Deleted:
<
<
 
transducer(istring, ostring, weight=None, arc_type="standard",
           input_token_type="byte", output_token_type="byte",
Line: 2627 to 2260
  FstArgError: Unknown token type. FstArgError: Weight types do not match. FstOpError: Operation failed.
Changed:
<
<
FstStringCompilationError: String compilation failed.
>
>
FstStringCompilationError: String compilation failed.
  See also: algorithms.
Line: 2671 to 2302
  An FST.

Raises:

Changed:
<
<
FstArgError: Unknown map type.

See also: ArcMap algorithm.

>
>
FstArgError: Unknown map type.
 
Added:
>
>
See also: ArcMap algorithm.
 
cdrewrite(tau, lambda, rho, sigma_star, direction="ltr", mode="obl")
Line: 2708 to 2337
  FstArgError: Unknown cdrewrite direction type. FstArgError: Unknown cdrewrite mode type. FstOpError: Operation failed.
Deleted:
<
<
 
Deleted:
<
<
 compose(ifst1, ifst2, compose_filter="auto", connect=True)

Constructively composes two FSTs.

Line: 2729 to 2356
  connect: Should output be trimmed?

Returns:

Changed:
<
<
A composed FST.

See also: Compose algorithm.

>
>
A composed FST. See also: Compose algorithm.
 
determinize(ifst, delta=0.0009765625, det_type="functional",
            nstate=NO_STATE_ID, subsequential_label=0, weight=None,
Line: 2764 to 2388
  An equivalent deterministic FST.

Raises:

Changed:
<
<
FstArgError: Unknown determinization type.

See also: Determinize algorithm.

>
>
FstArgError: Unknown determinization type.
 
Added:
>
>
See also: Determinize algorithm.
 
disambiguate(ifst, delta=0.0009765625, nstate=NO_STATE_ID,
             subsequential_label=0, weight=None):
Line: 2790 to 2412
  below which paths are pruned; if omitted, no paths are pruned.

Returns:

Changed:
<
<
An equivalent disambiguated FST.

See also: Disambiguate algorithm.

>
>
An equivalent disambiguated FST.
 
Added:
>
>
See also: Disambiguate algorithm.
 
difference(ifst1, ifst2, compose_filter="auto", connect=True)
Line: 2815 to 2435
  connect: Should the output FST be trimmed?

Returns:

Changed:
<
<
An FST representing the difference of the two FSTs.

See also: Difference algorithm.

>
>
An FST representing the difference of the two FSTs.
 
Added:
>
>
See also: Difference algorithm.
 
epsnormalize(ifst, eps_norm_output=False)
Line: 2837 to 2455
  eps_norm_output: Should the FST be output epsilon-normalized?

Returns:

Changed:
<
<
An equivalent epsilon-normalized FST.
>
>
An equivalent epsilon-normalized FST.
  See also: EpsNormalize algorithm.
Deleted:
<
<
 
equal(ifst1, ifst2, delta=0.0009765625)
Line: 2857 to 2473
  delta: Comparison/quantization delta.

Returns:

Changed:
<
<
True if the FSTs satisfy the above condition, else False.
>
>
True if the FSTs satisfy the above condition, else False.
  See also: Equal algorithm.
Deleted:
<
<
 
equivalent(ifst1, ifst2, delta=0.0009765625)
Line: 2880 to 2494
  True if the FSTs satisfy the above condition, else False.

Raises:

Changed:
<
<
FstOpError: Equivalence test encountered error.

See also: Equivalent algorithm.

>
>
FstOpError: Equivalence test encountered error.
 
Added:
>
>
See also: Equivalent algorithm.
 
intersect(ifst1, ifst2, compose_filter="auto", connect=True)
Line: 2903 to 2515
  connect: Should output be trimmed?

Returns:

Changed:
<
<
An intersected FST.

See also: Intersect algorithm.

>
>
An intersected FST.
 
Added:
>
>
See also: Intersect algorithm.
 
leniently_compose(ifst1, ifst2, compose_filter="auto", connect=True)
Line: 2933 to 2543
 Raises: FstOpError: Operation failed. FstSymbolTableMergeError: Unable to resolve symbol table conflict without
Changed:
<
<
relabeling.
>
>
relabeling.
  See also: LenientlyCompose algorithm.
Deleted:
<
<
 
prune(ifst, delta=0.0009765625, nstate=NO_STATE_ID, weight=None)
Line: 2956 to 2564
  below which paths are pruned; if omitted, no paths are pruned.

Returns:

Changed:
<
<
A pruned FST.
>
>
A pruned FST.
  See also: Prune algorithm
Deleted:
<
<
 
push(ifst, delta=0.0009765625, push_weights=False, push_labels=False,
     remove_common_affix=False, remove_total_weight=False, to_final=False)
Line: 2994 to 2600
  to_final: Push towards final states?

Returns:

Changed:
<
<
An equivalent pushed FST.

See also: Push algorithm.

>
>
An equivalent pushed FST.
 
Added:
>
>
See also: Push algorithm.
 
randgen(ifst, npath=1, seed=0, select="uniform", max_length=2147483647,
Changed:
<
<
weight=False, remove_total_weight=False)
>
>
weighted=False, remove_total_weight=False)
  Randomly generate successful paths in an FST.
Line: 3026 to 2630
  `weighted` is False)?

Returns:

Changed:
<
<
An FST containing one or more random paths.

See also: RandGen.

>
>
An FST containing one or more random paths.
 
Added:
>
>
See also: RandGen.
 
reverse(ifst, require_superinitial=True)
Line: 3047 to 2649
  require_superinitial: Should a superinitial state be created?

Returns:

Changed:
<
<
A reversed FST.

See also: Reverse algorithm.

>
>
A reversed FST.
 
Added:
>
>
See also: Reverse algorithm.
 
rmepsilon(ifst, connect=True, delta=0.0009765625, nstate=NO_STATE_ID,
          queue_type="auto", reverse=False, weight=None)
Line: 3073 to 2673
  weights below this threshold will be pruned.

Returns:

Changed:
<
<
An equivalent FST with no epsilon transitions.

See also: RmEpsilon algorithm.

>
>
An equivalent FST with no epsilon transitions.
 
Added:
>
>
See also: RmEpsilon algorithm.
 
shortestdistance(ifst, delta=0.0009765625, nstate=NO_STATE_ID,
                 queue_type="auto", reverse=False)
Line: 3103 to 2701
  be computed?

Returns:

Changed:
<
<
A list of Weight objects representing the shortest distance for each state.

See also: ShortestDistance algorithm.

>
>
A list of Weight objects representing the shortest distance for each state.
 
Added:
>
>
See also: ShortestDistance algorithm.
 
shortestpath(ifst, delta=0.0009765625, nshortest=1, nstate=NO_STATE_ID,
             queue_type="auto", unique=False, weight=None)
Line: 3136 to 2732
  below which paths are pruned; if omitted, no paths are pruned.

Returns:

Changed:
<
<
An FST containing the n-shortest paths.

See also: ShortestPath algorithm.

>
>
An FST containing the n-shortest paths.
 
Added:
>
>
See also: ShortestPath algorithm.
 
state_map(ifst, map_type)
Line: 3159 to 2753
  An FST with states remapped.

Raises:

Changed:
<
<
FstArgError: Unknown map type.

See also: StateMap algorithm.

>
>
FstArgError: Unknown map type.
 
Added:
>
>
See also: StateMap algorithm.
 
synchronize(ifst)
Line: 3180 to 2772
  ifst: The input FST.

Returns:

Changed:
<
<
An equivalent synchronized FST.
>
>
An equivalent synchronized FST.
 
Changed:
<
<
See also: Synchronize algorithm.
>
>
See also: Synchronize algorithm.
  The following destructive methods of Fst also can be called as constructive functions; like the above, these take one or more Fst arguments and return a new Fst.
Line: 3205 to 2796
  The following FST functions have binary operator aliases:
Changed:
<
<
  • compose: *
  • concat: +
  • union: |
>
>
  • compose: *
  • concat: +
  • union: |
 

Weight arithmetic functions

Line: 3232 to 2822
 Raises: FstArgError: Weight type not found (or not in same semiring). FstBadWeightError: Invalid weight.
Deleted:
<
<
 
Deleted:
<
<
 times(lhs, rhs)

Computes the product of two Weights in the same semiring.

Line: 3252 to 2840
 Raises: FstArgError: Weight type not found (or not in same semiring). FstBadWeightError: Invalid weight.
Deleted:
<
<
 
Deleted:
<
<
 power(lhs, rhs)

Computes the iterated product of a weight.

Line: 3269 to 2855
 Raises: FstArgError: Weight type not found (or not in same semiring). FstBadWeightError: Invalid weight.
Deleted:
<
<
 
Deleted:
<
<
 divide(lhs, rhs)

Computes the quotient of two Weights in the same semiring.

Line: 3290 to 2874
  Raises: FstArgError: Weight type not found (or not in same semiring).
Changed:
<
<
FstBadWeightError: Invalid weight.
>
>
FstBadWeightError: Invalid weight.
 

Exceptions

Line: 3321 to 2902
 
while not iterator.done():
  # Do something with =iterator= in its current state.
Changed:
<
<
iterator.next() # Advances the iterator.
>
>
iterator.next() # Advances the iterator.
  Attempting to use an iterator which is "done" may result in a fatal error. To return an iterator to the initial state, one can use the reset method.
Line: 3331 to 2910
 
def size(f):
  """Computes the number of arcs and states in the FST."""
Changed:
<
<
return sum(1 + f.num_arcs(state) for state in f.states())
>
>
return sum(1 + f.num_arcs(state) for state in f.states())
  The method arcs returns an ArcIterator. When iterated over, it yields all arcs leaving some state of the FST. For example, the following function computes the in-degree (number of incoming states) for a given state ID.
Deleted:
<
<
 
def indegree(f, state):
  """Computes in-degree of some state in an FST."""
Line: 3343 to 2920
  for s in f.states(): for arc in f.arcs(s): n += (arc.nextstate == state)
Changed:
<
<
return n
>
>
return n
  The method mutable_arcs is similar to arcs, except that the MutableArcIterator it returns has a special method set_value which replaces the arc at the iterator's current position. For example, the following function changes input labels matching a certain label to epsilon (cf. the relabel_pairs method).
Deleted:
<
<
 
def map_ilabel_to_epsilon(f, ilabel):
  """Maps any input label matching =ilabel= to epsilon (0)."""
Line: 3359 to 2934
  arc.ilabel = 0 aiter.set_value(arc) aiter.next()
Changed:
<
<
return f
>
>
return f
  The method paths returns a StringPaths iterator. This provides several ways to access the paths of an acyclic FST. For each path, one can access the input and output strings via the istring and ostring methods, and the path weight via the weight method. One can also access all input strings, all output strings, and all path weights using the istrings, ostrings, and weights methods, which return generators; after invoking any one of these methods, the caller must reset the iterator to reuse it.
Line: 3377 to 2950
 
# Optimizes the FST if it is *not known* to be deterministic.
if f.properties(pynini.I_DETERMINISTIC, False) != pynini.I_DETERMINISTIC:
Changed:
<
<
f.optimize()
>
>
f.optimize()
  Computing unknown properties is linear in the size of the FST in the worst case.
Deleted:
<
<
 
# Optimizes the FST if it is not already deterministic and epsilon-free.
desired_props = pynini.I_DETERMINISTIC | pynini.NO_EPSILONS
Line: 3386 to 2957
 # Optimizes the FST if it is not already deterministic and epsilon-free. desired_props = pynini.I_DETERMINISTIC | pynini.NO_EPSILONS if f.properties(desired_props, True) = desired_props:
Changed:
<
<
f.optimize()
>
>
f.optimize()
  An FST's properties can be set using the set_properties method.
Changed:
<
<
For more information on FST properties, see here.
>
>
For more information on FST properties, see here.
 

Creating FSTs manually

Line: 3404 to 2973
 state = f.add_state() f.set_start(state) f.set_final(state)
Changed:
<
<
assert f.verify() # OK.
>
>
assert f.verify() # OK.
  The epsilon_machine function eliminates the need for this tedious book-keeping; it returns a one-state, no-arc machine.
Line: 3415 to 2982
 f = pynini.epsilon_machine() assert f.num_states() == 0 # OK. assert f.num_arcs(e.start()) == 0 # OK.
Changed:
<
<
assert f.final(e.start()) == pynini.Weight.One(f.weight_type()) # OK.
>
>
assert f.final(e.start()) == pynini.Weight.One(f.weight_type()) # OK.
  In general, attempting to refer to a state before it has been added to the FST will raise an FstIndexError.
Deleted:
<
<
 
f = pynini.epsilon_machine()
f.set_start(1)  # Not OK: Raises FstIndexError.
Line: 3423 to 2988
 
f = pynini.epsilon_machine()
f.set_start(1)  # Not OK: Raises FstIndexError.
Deleted:
<
<
 
Deleted:
<
<
 f = pynini.epsilon_machine() f.set_final(1) # Not OK: Raises FstIndexError.
Deleted:
<
<
 
Deleted:
<
<
 f = pynini.epsilon_machine()
Changed:
<
<
f.add_arc(1, Arc(65, 65, 0, f.start())) # Not OK: Raises FstIndexError.
>
>
f.add_arc(1, Arc(65, 65, 0, f.start())) # Not OK: Raises FstIndexError.
 The one exception to this generalization is that one may add an arc to a destination state not yet added to the FST, though the FST will not verify until that state is added.
Deleted:
<
<
 
f = pynini.epsilon_machine()
f.add_arc(f.start(), pynini.Arc(65, 65, 0, 1))
Line: 3443 to 3001
 assert not f.verify() # OK, though arc has non-existent destination state ID. state = f.add_state() # OK, now that state exists. f.set_final(state)
Changed:
<
<
assert f.verify() # OK.
>
>
assert f.verify() # OK.
 

Pushdown transducers and multi-pushdown transducers

Changed:
<
<
Pushdown transducers (PDTs) and multi-pushdown transducers (MPDTs) are generalizations of (W)FSTs.
>
>
Pushdown transducers (PDTs) and multi-pushdown transducers (MPDTs) are generalizations of (W)FSTs.
  PDTs represent the class of weighted context-free grammars, and are often used to encode a set of phrase-structure grammar rules. A PDT can be thought of as an WFST in which the machine's memory is augmented by a stack. In Pynini a PDT is encoded as an WFST in which some transitions represent "push" or "pop" operations (represented as open and close parentheses) rather than input and/or output symbols. A path through the PDT is then valid if and only if it is a path through the WFST and the open and close parenthese balance along that path.
Line: 3458 to 3015
 In Pynini, there is no one class representing a PDT or MPDT; rather they are represented as a pair of an Fst and a table of parentheses labels. The PdtParentheses object contains pairs of open/close parentheses labels, and the MPdtParentheses object also contains an integer representing the identity of the stack each parentheses pair is assigned to.

PDTs and MPDT support composition with an FST using the pdt_compose and mpdt_compose functions, respectively. These functions take Fst arguments, a parentheses table (PdtParentheses or MPdtParentheses, respectively), and a specification of whether the right or the left argument should be interpreted as a (M)PDT. Both functions return the FST component of an (M)PDT; the parentheses table can be reused.

Deleted:
<
<
 
(f, fparens) = ipdt
Changed:
<
<
opdt = (pynini.pdt_compose(f, g, fparens), fparens)
>
>
opdt = (pynini.pdt_compose(f, g, fparens), fparens)
  The pdt_expand and mpdt_expand functions can be used to expand a PDT or MPDT into an FST (assuming they have a finite expansion). They take an Fst and parentheses table and return an Fst. The boolean keep_parentheses argument species whether or not expansion should keep parentheses arcs; by default, they are removed during expansion.
Deleted:
<
<
 
(f, fparens) = ipdt
Changed:
<
<
g = pynini.pdt_expand(f, fparens)
>
>
g = pynini.pdt_expand(f, fparens)
  The pdt_reverse and mpdt_reverse functions can be used to reverse a PDT or MPDT. Both take as arguments the two components of an (M)PDT. The pdt_reverse function return the FST component of a PDT, whereas the mpdt_function also returns a modified MPdtParentheses object.
Deleted:
<
<
 
(f, fparen) = impdt
Changed:
<
<
(g, gparens) = pynini.mpdt_reverse(f, fparens)
>
>
(g, gparens) = pynini.mpdt_reverse(f, fparens)
  Finally, PDTs (but not MPDTs) provide a shortest-path operation using pdt_shortestpath. Note that the result is an FST rather than a PDT.
Deleted:
<
<
 
(f, fparens) = ipdt
Changed:
<
<
sp = pynini.pdt_shortestpath(f, fparens)
>
>
sp = pynini.pdt_shortestpath(f, fparens)
 

References

Revision 152018-05-25 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 2072 to 2072
 
Deleted:
<
<
ilabels(self)

Returns the input labels for the current path.

Returns: A list of input labels for the current path.

 istring(self)

Returns the current path's input string.

Line: 2099 to 2090
 
Changed:
<
<
iter_istrings(self)
>
>
istrings(self)
  Generates all input strings in the FST.
Line: 2111 to 2102
 
Changed:
<
<
iter_ostrings(self)
>
>
ostrings(self)
  Generates all output strings in the FST.
Line: 2123 to 2114
 
Changed:
<
<
iter_weights(self)
>
>
weights(self)
  Generates all path weights in the FST.
Line: 2135 to 2126
 
Deleted:
<
<
olabels(self)

Returns the output labels for the current path.

Returns: A list of output labels for the current path.

 ostring(self)

Returns the current path's output string.

Line: 3381 to 3363
 

The method paths returns a StringPaths iterator. This provides several ways to access the paths of an acyclic FST. For each

Changed:
<
<
path, one can access the associated input or output labels (lists of integers) via the ilabels and olabels methods, the input and output strings via the istring and ostring methods, and the path weight via the weight method. One can also access all input strings, all output strings, and all path weights using the iter_istrings, iter_ostrings, and iter_weights methods, which return generators; after invoking any one of these methods, the caller must reset the iterator to reuse it.
>
>
path, one can access the input and output strings via the istring and ostring methods, and the path weight via the weight method. One can also access all input strings, all output strings, and all path weights using the istrings, ostrings, and weights methods, which return generators; after invoking any one of these methods, the caller must reset the iterator to reuse it.
  See in-module documentation strings for more information on iterators.

Revision 142018-03-03 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 2753 to 2753
 See also: Compose algorithm.
Deleted:
<
<
containment(ifst, sigma_star)

Creates a containment of a transducer with respect to a universal language.

This operation constructs a transducer consisting of the input FST optionally preceded and/or followed by any string from some alphabet.

Args: ifst: The input FST. sigma_star: A cyclic, unweighted acceptor representing the closure over the alphabet.

Returns: An FST.

Raises: FstOpError: Operation failed. FstSymbolTableMergeError: Unable to resolve symbol table conflict without relabeling.

See also: Containment algorithm.

convert(ifst, fst_type=None)

Constructively converts an FST to a new internal representation.

Args:
  ifst: The input FST.
  fst_type: A string indicating the FST type to convert to, or None if
      no conversion is desired.

  Returns:
    An equivalent Fst converted to the desired FST type.
 determinize(ifst, delta=0.0009765625, det_type="functional", nstate=NO_STATE_ID, subsequential_label=0, weight=None, incremental_subsequential_label=False)

Revision 132018-01-27 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 267 to 267
  As with acceptor, the user may specify a final weight using the weight keyword argument; the final weight is 1. The user also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer, if strings are passed in place of FST arguments. By default (input_token_type="byte" and output_token_type="byte"), each arc corresponds to a byte in the input and/or output string. Finally, the user may specify whether or not symbol tables should be attached to the FST.
Changed:
<
<
Whereas transducer is used to construct the cross-product of two strings, the union of many such string pair cross-products is compiled using the string_map and string_file functions. The former takes 2. δ(qi li) = qi + 1 for any pi = (qi, li, qi + 1) ∈ P, and an iterable of pairs of strings (or a string-to-string dictionary) as an argument; the latter reads string pairs from two-column TSV file. Both functions use a compact prefix tree representation which can be used to represent a (multi)map.
>
>
Whereas transducer is used to construct the cross-product of two strings, the union of many such string pair cross-products is compiled using the string_map and string_file functions. The former takes iterables of strings as an argument; the latter reads string pairs from one-to-three TSV file. Both functions use a compact prefix tree representation which can be used to represent a (multi)map.
  As with the above, the user may specify the desired arc type using the arc_type keyword argument. And as with transducer, the user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer using the appropriate keyword arguments, and whether input and/or output symbol tables are attached.
Line: 2513 to 2512
 Creates a transducer that maps between elements of mappings read from a tab-delimited file.
Added:
>
>
The first column is interpreted as the input string to a transduction.

The second column, separated from the first by a single tab character, is interpreted as the output string for the transduction; an acceptor can be modeled by using identical first and second columns.

An optional third column, separated from the second by a single tab character, is interpreted as a weight for the transduction; if not specified the weight defaults to semiring One. Note that weights are never permitted in the second column.

 The comment character is #, and has scope until the end of the line. Any preceding whitespace before a comment is ignored. To use the '#' literal (i.e., to ensure it is not interpreted as the start of a comment) escape it with \; the escaping \ in the string "\#" also ignored.

Args:

Changed:
<
<
filename: The path to a file consisting of lines of input/output pairs separated by a tab character; if a line element is a epsilon_machine, the identity mapping is used.
>
>
filename: The path to a TSV file formatted as described above.
  arc_type: A string indicating the arc type. input_token_type: A string indicating how the input strings are to be
Changed:
<
<
encoded as arc labels---one of: utf8" (encodes strings as a UTF-8
>
>
encoded as arc labels---one of: "utf8" (encodes strings as a UTF-8
  encoded Unicode strings), "byte" (encodes strings as raw bytes)---or a SymbolTable. output_token_type: A string indicating how the output strings are to be
Changed:
<
<
encoded as arc labels---one of: utf8" (encodes strings as a UTF-8
>
>
encoded as arc labels---one of: "utf8" (encodes strings as a UTF-8
  encoded Unicode strings), "byte" (encodes strings as raw bytes)---or a SymbolTable. attach_input_symbols: should the symbol table used to compile the
Line: 2543 to 2551
  FstIOError: Read failed.
Changed:
<
<
See also: String algorithms.
>
>
See also: algorithms.
 
Changed:
<
<
string_map(pairs, arc_type="standard", input_token_type="byte", output_token_type="byte", attach_input_symbols=True,
>
>
string_map(lines, arc_type="standard", input_token_type="byte", output_token_type="byte", attach_input_symbols=True,
  attach_output_symbols=True)
Changed:
<
<
Creates a transducer that maps between elements of mappings.
>
>
Creates a transducer that maps between elements of mappings read from an iterable.

The first element in each iterable is interpreted as the input string.

The optional second element is interpreted as the output string for the transduction; if not specified it defaults to the value of the first element.

An optional third element is interpreted as a weight for the transduction; if not specified it defaults to semiring One.

  Args:
Changed:
<
<
pairs: An iterable containing strings. If the iterable implements .iteritems or .items, this is used to extract the pairs. If any element of the iterable is a string, or an iterable containing exactly one string, the identity mapping is used.
>
>
lines: An iterable of indexables of size one, two, or three. If the iterable implements .iteritems or .items, this is used to extract the indexables. The first element in each indexable is interpreted as the input string, the second (optional) as the output string, defaulting to the input string, and the third (optional) as a string to be parsed as a weight, defaulting to semiring One.
  arc_type: A string indicating the arc type. input_token_type: A string indicating how the input strings are to be
Changed:
<
<
encoded as arc labels---one of: utf8" (encodes strings as a UTF-8
>
>
encoded as arc labels---one of: "utf8" (encodes strings as a UTF-8
  encoded Unicode strings), "byte" (encodes strings as raw bytes)---or a SymbolTable. output_token_type: A string indicating how the output strings are to be
Changed:
<
<
encoded as arc labels---one of: utf8" (encodes strings as a UTF-8
>
>
encoded as arc labels---one of: "utf8" (encodes strings as a UTF-8
  encoded Unicode strings), "byte" (encodes strings as raw bytes)---or a SymbolTable. attach_symbols: should the symbol table used to compile the
Line: 2575 to 2595
  An FST.

Raises:

Deleted:
<
<
FstArgError: Mappings must be of length 1 or 2.
  FstArgError: String map compilation failed.
Deleted:
<
<
 

See also: algorithms.

Revision 122018-01-26 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 139 to 139
 
  1. (𝕂, ⊕) is a commutative monoid,
  2. (𝕂, ⊗) is a monoid,
Changed:
<
<
  1. for all a, b, c ∈ 𝕂, a ⊗ (bc) = (a ⊗ b) ⊕ (ac), and
>
>
  1. for all a, b, c ∈ 𝕂, a ⊗ (bc) = (a ⊗ b) ⊕ (ac), and
 
  1. for all a ∈ 𝕂, a0 = 0a = 0 where 0 is the identity element for the monoid (𝕂, ⊕).

In many cases, 𝕂 is the set of real numbers, so a semiring can be denoted simply by specifying the pair of operators (⊕ ⊗). The so-called real semiring is simply (+, ×).

Revision 112017-11-27 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 181 to 181
 

Constructing acceptors

Changed:
<
<
The acceptor function compiles a (Unicode or byte) string into a deterministic acceptor. The user may specify a final weight using the weight keyword argument; by default, the final weight is 1. The user may also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the string are to be translated into labels of the acceptor. By default (token_type="byte"), each arc in the acceptor corresponds to byte in the input string.
>
>
The acceptor function compiles a (Unicode or byte) string into a deterministic acceptor. The user may specify a final weight using the weight keyword argument; by default, the final weight is 1. The user may also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the string are to be translated into labels of the acceptor. By default (token_type="byte"), each arc in the acceptor corresponds to byte in the input string. Finally, the user may specify whether or not the symbol table used should be attached to the FST using the attach_symbols keyword argument, which defaults to True.
 
# UTF-8 bytestring, byte arcs.
Line: 265 to 265
  The transducer function creates a transducer from two FSTs, compiling string arguments into FSTs if necessary. The result is a cross-product of the two input FSTs, interpreted as acceptors. Specifically, the transducer is constructed by mapping output arcs in the first FST to epsilon, mapping the input arcs in the second FST to epsilon, then concatenating the two. In the case that both FSTs are strings, the resulting transducer will simply contain the input and output string converted into labels, and the shorter of the two will be padded with epsilons.
Changed:
<
<
As with acceptor, the user may specify a final weight using the weight keyword argument; the final weight is 1. The user also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer, if strings are passed in place of FST arguments. By default (input_token_type="byte" and output_token_type="byte"), each arc corresponds to a byte in the input and/or output string.
>
>
As with acceptor, the user may specify a final weight using the weight keyword argument; the final weight is 1. The user also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer, if strings are passed in place of FST arguments. By default (input_token_type="byte" and output_token_type="byte"), each arc corresponds to a byte in the input and/or output string. Finally, the user may specify whether or not symbol tables should be attached to the FST.
  Whereas transducer is used to construct the cross-product of two strings, the union of many such string pair cross-products is compiled using the string_map and string_file functions. The former takes 2. δ(qi li) = qi + 1 for any pi = (qi, li, qi + 1) ∈ P, and an iterable of pairs of strings (or a string-to-string dictionary) as an argument; the latter reads string pairs from two-column TSV file. Both functions use a compact prefix tree representation which can be used to represent a (multi)map.
Changed:
<
<
As with the above, the user may specify the desired arc type using the arc_type keyword argument. And as with transducer, the user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer using the appropriate keyword arguments.
>
>
As with the above, the user may specify the desired arc type using the arc_type keyword argument. And as with transducer, the user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer using the appropriate keyword arguments, and whether input and/or output symbol tables are attached.
 

Worked examples

Line: 606 to 606
  Sets the encoder's output symbol table.
Changed:
<
<
Args:
>
>
Args:This section covers classes (representing FSTs, weights, and the like), their methods (representing various accessors as well as destructive FST algorithms), and functions (representing constructive FST algorithms, weight arithmetic, and the like). This section covers classes (representing FSTs, weights, and the like), their methods (representing various accessors as well as destructive FST algorithms), and functions (representing constructive FST algorithms, weight arithmetic, and the like).
  syms: A SymbolTable.
Line: 2462 to 2464
 Returns a string indicating the weight type.
Added:
>
>

Fst compilation functions

The following functions are used to construct FSTs from strings, returning a new Fst.

acceptor(astring, weight=None, arc_type="standard", token_type="byte",
         attach_symbols=True)

Creates an acceptor from a string.

This function creates an FST which accepts its input with a fixed weight
(defaulting to semiring One).

Args:
  astring: The input string.
  weight: A Weight or weight string indicating the desired path weight. If
      omitted or null, the path weight is set to semiring One.
  arc_type: An optional string indicating the arc type for the compiled FST.
      This argument is silently ignored if istring and/or ostring is already
      compiled.
  token_type: Either a string indicating how the input string is to be
      encoded as arc labels---one of: utf8" (encodes the strings as UTF-8
      encoded Unicode string), "byte" (encodes the string as raw bytes)---or
      a SymbolTable to be used to encode the string.
  attach_symbols: Should the symbol table used to compile the acceptor be
      attached to the FST?

  Returns:
    An FST acceptor.

  Raises:
    FstArgError: Unknown arc type.
    FstArgError: Unknown token type.
    FstOpError: Operation failed.
    FstStringCompilationError: String compilation failed.

See also:

See also: String algorithms.

string_file(filename, arc_type="standard", input_token_type="byte",
            output_token_type="byte")

Creates a transducer that maps between elements of mappings read from
a tab-delimited file.

The comment character is #, and has scope until the end of the line. Any
preceding whitespace before a comment is ignored. To use the '#' literal
(i.e., to ensure it is not interpreted as the start of a comment) escape it
with \; the escaping \ in the string "\#" also ignored.

Args:
  filename: The path to a file consisting of lines of input/output pairs
      separated by a tab character; if a line element is a epsilon_machine,
      the identity mapping is used.
  arc_type: A string indicating the arc type.
  input_token_type: A string indicating how the input strings are to be
      encoded as arc labels---one of: utf8" (encodes strings as a UTF-8
      encoded Unicode strings), "byte" (encodes strings as raw bytes)---or a
      SymbolTable.
  output_token_type: A string indicating how the output strings are to be
      encoded as arc labels---one of: utf8" (encodes strings as a UTF-8
      encoded Unicode strings), "byte" (encodes strings as raw bytes)---or a
      SymbolTable.
  attach_input_symbols: should the symbol table used to compile the
      input-side acceptor be attached to the FST?
  attach_output_symbols: should the symbol table used to compile the
      output-side acceptor be attached to the FST?

Returns:
  An FST.

Raises:
  FstIOError: Read failed.

See also: String algorithms.

string_map(pairs, arc_type="standard", input_token_type="byte",
           output_token_type="byte", attach_input_symbols=True,
           attach_output_symbols=True)

Creates a transducer that maps between elements of mappings.

Args:
  pairs: An iterable containing strings. If the iterable implements .iteritems
    or .items, this is used to extract the pairs. If any element of the
    iterable is a string, or an iterable containing exactly one string, the
    identity mapping is used.
  arc_type: A string indicating the arc type.
  input_token_type: A string indicating how the input strings are to be
      encoded as arc labels---one of: utf8" (encodes strings as a UTF-8
      encoded Unicode strings), "byte" (encodes strings as raw bytes)---or a
      SymbolTable.
  output_token_type: A string indicating how the output strings are to be
      encoded as arc labels---one of: utf8" (encodes strings as a UTF-8
      encoded Unicode strings), "byte" (encodes strings as raw bytes)---or a
      SymbolTable.
  attach_symbols: should the symbol table used to compile the
      input-side acceptor be attached to the FST?
  attach_output_symbols: should the symbol table used to compile the
      output-side acceptor be attached to the FST?

Returns:
  An FST.

Raises:
  FstArgError: Mappings must be of length 1 or 2.
  FstArgError: String map compilation failed.

See also: algorithms.

transducer(istring, ostring, weight=None, arc_type="standard",
           input_token_type="byte", output_token_type="byte",
           attach_input_symbols=True, attach_output_symbols=True)

Creates a transducer from a pair of strings or acceptor FSTs.

This function creates an FST which transduces from the first string to
the second with a fixed weight (defaulting to semiring One). If one or both
of the input arguments is already compiled as an FST, the resulting transducer
is simply the cross-product between the language accepted by the upper and
lower FSTs.

Args:
  istring: The input string, or an acceptor FST representing the upper
      language.
  ostring: The output string, or an acceptor FST representing the upper
      language.
  weight: A Weight or weight string indicating the desired path weight. If
      omitted or null, the path weight is set to semiring One. This argument
      is silently ignored if istring and/or ostring is already compiled.
  arc_type: An optional string indicating the arc type for the compiled FST.
      This argument is silently ignored if istring and/or ostring is already
      compiled.
  input_token_type: Either a string indicating how the upper string is to be
      encoded as arc labels---one of: utf8" (encodes the strings as UTF-8
      encoded Unicode string), "byte" (encodes the string as raw bytes)---or
      a SymbolTable to be used to encode the string.
  output_token_type: Either a string indicating how the lower string is to be
      encoded as arc labels---one of: utf8" (encodes the strings as UTF-8
      encoded Unicode string), "byte" (encodes the string as raw bytes)---or
      a SymbolTable to be used to encode the string.
  attach_input_symbols: should the symbol table used to compile the
      input-side acceptor (if applicable) be attached to the FST?
  attach_output_symbols: should the symbol table used to compile the
      output-side acceptor (if applicable) be attached to the FST?

Returns:
  An FST transducer.

Raises:
  FstArgError: Unknown arc type.
  FstArgError: Unknown token type.
  FstArgError: Weight types do not match.
  FstOpError: Operation failed.
  FstStringCompilationError: String compilation failed.

See also: algorithms.

 

Fst algorithm functions

Line: 2951 to 3121
  queue_type="auto", reverse=False)

Compute the shortest distance from the initial or final state.

Changed:
<
<
>
>
manuallymanually
 This operation computes the shortest distance from the initial state (when `reverse` is False) or from every state to the final state (when `reverse` is True). The shortest distance from p to q is the otimes-sum of the weights of

Revision 102017-11-26 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 2504 to 2504
  FstArgError: Unknown map type.
Changed:
<
<
See also: ArcMap algortihm.
>
>
See also: ArcMap algorithm.
 
cdrewrite(tau, lambda, rho, sigma_star, direction="ltr", mode="obl")

Revision 92017-10-18 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 235 to 235
 Sequences of characters enclosed by [ and ] receive special interpretations in byte or utf8 mode. Pynini first attempts to parse any such sequence as an integer using the C standard library function strtoll.
Changed:
<
<
assert pynini.acceptor("b0x61") == pynini.acceptor("baa") # OK.
>
>
assert pynini.acceptor("b[0x61][97]") == pynini.acceptor("baa") # OK.
 

If this fails, then Pynini treats the sequence as a sequence of one or more whitespace-delimited "generated" symbols, each of which is given a unique identifier in the resulting FST's symbol table.

Line: 249 to 249
 A bracket character to be interpreted literally must be escaped.
Changed:
<
<
bracket = pynini.acceptor("[") # OK.
>
>
bracket = pynini.acceptor("\[") # OK.
 

Revision 82017-09-05 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 1331 to 1331
 See also: FST properties.
Changed:
<
<
prune(self, delta=0.0009765625, nstate=-1, weight=None)
>
>
prune(self, delta=0.0009765625, nstate=NO_STATE_ID, weight=None)
  Removes paths with weights below a certain threshold.
Line: 1489 to 1489
 See also: Reweight algorithm.
Changed:
<
<
rmepsilon(self, connect=True, delta=0.0009765625, nstate=-1, weight=None)
>
>
rmepsilon(self, connect=True, delta=0.0009765625, nstate=NO_STATE_ID, weight=None)
  Removes epsilon transitions.
Line: 2177 to 2178
 
Methods
Changed:
<
<
add_symbol(self, symbol, key=-1)
>
>
add_symbol(self, symbol, key=NO_SYMBOL)
  Adds a symbol to the table and returns the index.
Line: 2186 to 2187
  Args: symbol: A symbol string.
Changed:
<
<
key: An index for the symbol (-1 is reserved for "no symbol requested").
>
>
key: An index for the symbol; if not specified, the next index will be used.
  Returns: The integer key of the new symbol.
Line: 2601 to 2603
 
Changed:
<
<
determinize(ifst, delta=0.0009765625, det_type="functional", nstate=-1, subsequential_label=0, weight=None,
>
>
determinize(ifst, delta=0.0009765625, det_type="functional", nstate=NO_STATE_ID, subsequential_label=0, weight=None,
  incremental_subsequential_label=False)

Constructively determinizes a weighted FST.

Line: 2636 to 2638
 See also: Determinize algorithm.
Changed:
<
<
disambiguate(ifst, delta=0.0009765625, nstate=-1, subsequential_label=0, weight=None):
>
>
disambiguate(ifst, delta=0.0009765625, nstate=NO_STATE_ID, subsequential_label=0, weight=None):
  Constructively disambiguates a weighted transducer.
Line: 2805 to 2807
 See also: LenientlyCompose algorithm.
Changed:
<
<
prune(ifst, delta=0.0009765625, nstate=-1, weight=None)
>
>
prune(ifst, delta=0.0009765625, nstate=NO_STATE_ID, weight=None)
  Constructively removes paths with weights below a certain threshold.
Line: 2919 to 2921
 See also: Reverse algorithm.
Changed:
<
<
rmepsilon(ifst, connect=True, delta=0.0009765625, nstate=-1,
>
>
rmepsilon(ifst, connect=True, delta=0.0009765625, nstate=NO_STATE_ID,
  queue_type="auto", reverse=False, weight=None)

Constructively removes epsilon transitions from an FST.

Line: 2945 to 2947
 See also: RmEpsilon algorithm.
Changed:
<
<
shortestdistance(ifst, delta=0.0009765625, nstate=-1, queue_type="auto", reverse=False)
>
>
shortestdistance(ifst, delta=0.0009765625, nstate=NO_STATE_ID, queue_type="auto", reverse=False)
  Compute the shortest distance from the initial or final state.
Line: 2975 to 2977
 See also: ShortestDistance algorithm.
Changed:
<
<
shortestpath(ifst, delta=0.0009765625, nshortest=1, nstate=-1,
>
>
shortestpath(ifst, delta=0.0009765625, nshortest=1, nstate=NO_STATE_ID,
  queue_type="auto", unique=False, weight=None)

Construct an FST containing the shortest path(s) in the input FST.

Revision 72017-09-05 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 636 to 636
 Args: filename: A string indicating the filename. mode: FAR IO mode; one of: "r" (open for reading), "w" (open for writing).
Changed:
<
<
arc_type: Desired arc type; this is ignored if the FAR is opened for reading. far_type: Desired FAR type; this is ignored if the FAR is opened for reading.
>
>
arc_type: Desired arc type; ignored if the FAR is opened for reading. far_type: Desired FAR type; ignored if the FAR is opened for reading.
 

Methods
Line: 2493 to 2491
  delta: Comparison/quantization delta (ignored unless `map_type` is `quantize`). map_type: A string matching a known mapping operation (see above).
Changed:
<
<
weight: A Weight or weight string passed to the arc-mapper; this is ignored unless `map_type` is `plus` (in which case it defaults to semiring Zero) or `times` (in which case it defaults to semiring One).
>
>
weight: A Weight or weight string passed to the arc-mapper; ignored unless `map_type` is `plus` (in which case it defaults to semiring Zero) or `times` (in which case it defaults to semiring One).
  Returns: An FST.
Line: 2874 to 2872
 Randomly generate successful paths in an FST.

This operation randomly generates a set of successful paths in the input FST.

Changed:
<
<
This relies on a mechanism for selecting arcs, specified using the select
>
>
This relies on a mechanism for selecting arcs, specified using the `select`
 argument. The default selector, "uniform", randomly selects a transition using a uniform distribution. The "log_prob" selector randomly selects a transition w.r.t. the weights treated as negative log probabilities after
Line: 2953 to 2951
 Compute the shortest distance from the initial or final state.

This operation computes the shortest distance from the initial state (when

Changed:
<
<
reverse is False) or from every state to the final state (when reverse is
>
>
`reverse` is False) or from every state to the final state (when `reverse` is
 True). The shortest distance from p to q is the otimes-sum of the weights of all the paths between p and q. The weights must be right (if `reverse` is False) or left (if `reverse` is True) distributive, and k-closed (i.e., 1
Line: 2963 to 2961
 Args: ifst: The input FST. delta: Comparison/quantization delta.
Changed:
<
<
nstate: State number threshold (this is ignored if reverse is True).
>
>
nstate: State number threshold (ignored if `reverse` is True).
  queue_type: A string matching a known queue type; one of: "auto", "fifo",
Changed:
<
<
"lifo", "shortest", "state", "top" (this is ignored if reverse is
>
>
"lifo", "shortest", "state", "top" (ignored if `reverse` is
  True). reverse: Should the reverse distance (from each state to the final state) be computed?

Revision 62017-08-21 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 135 to 135
  A monoid (M, •) is said to be commutative if for all a, bM, ab = ba.
Changed:
<
<
Then, a semiring is a triple (𝕂, ⊕, &otimes) such that:
>
>
Then, a semiring is a triple (𝕂, ⊕, ⊗) such that:
 
  1. (𝕂, ⊕) is a commutative monoid,
  2. (𝕂, ⊗) is a monoid,
Line: 267 to 267
  As with acceptor, the user may specify a final weight using the weight keyword argument; the final weight is 1. The user also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer, if strings are passed in place of FST arguments. By default (input_token_type="byte" and output_token_type="byte"), each arc corresponds to a byte in the input and/or output string.
Changed:
<
<
Whereas transducer is used to construct the cross-product of two strings, the union of many such string pair cross-products is compiled using the string_map and string_file functions. The former takes an iterable of pairs of strings (or a string-to-string dictionary) as an argument; the latter reads string pairs from two-column TSV file. Both functions use a compact prefix tree representation which can be used to represent a (multi)map.
>
>
Whereas transducer is used to construct the cross-product of two strings, the union of many such string pair cross-products is compiled using the string_map and string_file functions. The former takes 2. δ(qi li) = qi + 1 for any pi = (qi, li, qi + 1) ∈ P, and an iterable of pairs of strings (or a string-to-string dictionary) as an argument; the latter reads string pairs from two-column TSV file. Both functions use a compact prefix tree representation which can be used to represent a (multi)map.
  As with the above, the user may specify the desired arc type using the arc_type keyword argument. And as with transducer, the user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer using the appropriate keyword arguments.

Revision 52017-07-07 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Line: 1185 to 1185
 
Changed:
<
<
num_input_epsilsons(self, state)
>
>
num_input_epsilons(self, state)
  Returns the number of arcs with epsilon input labels leaving a state.
Line: 1200 to 1200
 
Changed:
<
<
num_output_epsilsons(self, state)
>
>
num_output_epsilons(self, state)
  Returns the number of arcs with epsilon output labels leaving a state.

Revision 42017-07-06 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python

Added:
>
>

        ,ggggggggggg,
       dP"""88""""""Y8,
       Yb,  88      `8b
        `"  88      ,8P                        gg                  gg
            88aaaad8P"                         ""                  ""
            88"""""  gg     gg   ,ggg,,ggg,    gg    ,ggg,,ggg,    gg
            88       I8     8I  ,8" "8P" "8,   88   ,8" "8P" "8,   88
            88       I8,   ,8I  I8   8I   8I   88   I8   8I   8I   88
            88      ,d8b, ,d8I ,dP   8I   Yb,_,88,_,dP   8I   Yb,_,88,_
            88      P""Y88P"8888P'   8I   `Y88P""Y88P'   8I   `Y88P""Y8
                          ,d8I'
                        ,dP'8I
                       ,8"  8I
                       I8   8I
                       `8, ,8I
                        `Y8P"

 

Introduction

Pynini (Gorman 2016) is a library for compiling a grammar of strings, regular expressions, and context-dependent rewrite rules into weighted finite-state transducers.

Revision 32017-06-30 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"
Changed:
<
<

Pynini: Grammar compilation in Python Work in progress, under construction

>
>

Pynini: Grammar compilation in Python

 

Introduction

Line: 21 to 21
 
  1. formal preliminaries underlying finite-state transducers,
  2. getting started with Pynini,
Changed:
<
<
  1. examples of Pynini in action, and
    <--4.  API reference, and-->
  2. advanced topics.
>
>
  1. examples of Pynini in action,
  2. API reference, and
  3. advanced topics.
  Those who are familiar with FSTs may prefer to jump straight to getting started with Pynini. Or, for a quick start, you may also wish to read our Pynini tutorial for O'Reilly.
Line: 388 to 389
 DECIPHERED PLAINTEXT: THE SINGLE MOST POPULAR CHEESE IN THE WORLD
Changed:
<
<
>
>
  • FstOpError: signals an error returned by an FST algorithm; usually, this is accompanied by a low-level logging statement.
  • FstStringCompilationError: signals an error compiling a string into an FST.
  • FstSymbolTableMergeError: signals that a symbol table relabeling is required but disabled (using the --fst_relabel_symbol_conflicts command-line flag).
 

Advanced topics

Revision 22017-06-30 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python Work in progress, under construction

Line: 8 to 8
 

Design

Changed:
<
<
Pynini supports much of the functionality of Thrax, but whereas Thrax is essentially a compiler for a domain-specific language, Pynini is implemented as a Python extension module. This provides several advantages:
>
>
Pynini supports much of the functionality of Thrax, but whereas Thrax is essentially a compiler for a domain-specific language, Pynini is implemented as a Python extension module. This provides several advantages:
 
  • Pynini users can exploit Python's rich tooling ecosystem, including logging and unit testing frameworks.
  • Pynini users can incorporate Thrax primitives like string compilation into executables.

Revision 12017-06-29 - KyleGorman

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="Pynini"

Pynini: Grammar compilation in Python Work in progress, under construction

Introduction

Pynini (Gorman 2016) is a library for compiling a grammar of strings, regular expressions, and context-dependent rewrite rules into weighted finite-state transducers.

Design

Pynini supports much of the functionality of Thrax, but whereas Thrax is essentially a compiler for a domain-specific language, Pynini is implemented as a Python extension module. This provides several advantages:

  • Pynini users can exploit Python's rich tooling ecosystem, including logging and unit testing frameworks.
  • Pynini users can incorporate Thrax primitives like string compilation into executables.

Pynini is based on the OpenFst library, a C++11 template library for weighted finite-state transducers, and particularly its Python extension.

Outline

This document covers the following areas:

  1. formal preliminaries underlying finite-state transducers,
  2. getting started with Pynini,
  3. examples of Pynini in action, and
    <--4.  API reference, and-->
  4. advanced topics.

Those who are familiar with FSTs may prefer to jump straight to getting started with Pynini. Or, for a quick start, you may also wish to read our Pynini tutorial for O'Reilly.

Formal preliminaries

Finite-state transducers (FSTs) are models of computation widely used for speech and language processing:

  • Speech recognition: language models, the pronunciation dictionaries, and the lattices produced by the acoustic model are all represented as FSTs; combined, these define the hypothesis space for the recognizer.
  • Speech synthesis: FSTs are used for text normalization, the phase of linguistic analysis that converts written text like "meet me at St. Catherine on Bergen St. @ 9:30" to a pronounceable utterance like "meet me at the Saint Catherine on Bergen Street at nine thirty".
  • Natural language processing: FSTs are used to represent sequence models such as those used for part of speech tagging, noun chunking, and named entity recognition.
  • Information retrieval: finite automata are used to detect dates, times, etc., in web text.

FSTs are less "expressive" than e.g., arbitrary C++ functions or neural networks. However, the formal constraints that limit their expressivity guarantee that FSTs can be efficiently combined, optimized, and searched.

State machines

Finite automata are simply state machines with a finite number of states. And, a state machine is any devices whose behavior can be modeled using transitions between a number of states. While our finite automata are implemented in software, there are also hardware state machines, like the traditional gumball machine (image credit: Wikimedia Commons):

360px-Kosher_gumballs.JPG

A working gumball machine can be in one of two states: either its coin slot does, or does not, contain a coin. Initially the gumball machine has an empty coin slot; and turning the knob has no effect on the state of the machine. But once a coin is inserted into the coin slot, turning the knob dispenses a gumball and empties the coin slot. If the knob is turned while the coin slot is vacant, no change of state occurs. However, if the knob is turned while the coin slot is occupied, a gumball is dispensed and the coin slot is cleared. We can represent this schematic description of a gumball machine as a directed graph in which the states are represented by circles and the transitions between states—henceforth, arcs—are represented by arrows between states. Each arc has a pair of labels representing the input consumed, and output emitted, respectively, when that arc is traversed. By convention, the Greek letter ε (epsilon) is used to represent null input and/or output along an arc.

gumball.png

The bold double-circle state labeled 0 simply indicates that that state is "final", whereas the single-circle state 1 is non-final. Here, this encodes the notion that no user would walk away from a gumball machine while there's still a coin in the slot.

Finite-state acceptors

In what follows, we draw heavily from chapter 1 of Roark & Sproat 2008, and interested readers are advised to consult that resource for further details.

Definition

Finite-state acceptors (or FSAs) are the simplest type of FST. Each FSA represents a set of strings (henceforth, a language), similar to a regular expression, as follows. An FSA consists of a five-tuple (Q, S, F, Σ δ) where:

  1. Q is a finite set of states,
  2. S is the set of start states,
  3. F is the set of final states,
  4. Σ is the alphabet, and
  5. δ : Q × (Σ ∪ ε) → Q is the transition relation.

A path through an FSA (Q, S, F, Σ δ) is simply a sequence of transitions starting with a start state sS, taking only transitions licensed by δ, and terminating at a final state fF. More formally, let a path be a sequence P = p0, p1, …, pn where each element pP is a triple over Q × (Σ ∪ ε) × Q. Each triple consists of a source state, a label in the alphabet (or the null label ε) and a destination state. Then, a path is valid if and only if all of the following conditions hold:

  1. The source state for p0 is in S,
  2. δ(qi li) = qi + 1 for any pi = (qi, li, qi + 1) ∈ P, and
  3. the destination state for pn is in F.

The first condition requires that the path begin at a start state, the second condition requires that all transitions be licensed by δ, and the third condition requires that the final transition be to a final state.

Let the string of a path be the (possibly empty) sequence of labels. For instance, if P is (s, l0, q1, (q1, l1, q2), (q2, l2, q3), then the string of P is simply l0, lsub>1, l2. The language described by an FSA is simply the union of all strings of all its paths.

Example

The following FSA represents the Perl-style regular expression ab*cd+e:

abcde.png

Here, the bold circle labeled 0 indicates the start state, and the double circle labeled 4 indicates the final state.

Finite-state transducers

Definition

Finite-state transducers (FSTs) are generalization of FSAs. In the normal case of a two-way transducer, δ is instead a relation from Q × (Σi ∪ ε) × (Σo ∪ ε) → Q where Σiand Σo are the input and output alphabets, respectively. Paths through a FST are defined similarly to the definition given for FSAs above, except that each path corresponds to a set of two strings, an input string over Σi* and an output string over Σo*. Whereas FSAs describe sets of strings, FSTs describe relations between sets of strings.

When the relation described by an FST is such that each input string corresponds to at most one output string, we say that the FST is functional.

Example

The following FST represents the relation (a:a)(b:b)*(c:g)(d:f)+(e:e):

abcgdfe.png

Weighted finite-state transducers

Definition

Weighted finite-state transducers (WFSTs) are generalizations of FSTs which use an alternative definition of both F and δ incorporating the notion of weights. FST weights and their operations can be understood by first defining the notion of semiring, which require us to first define the notion monoid.

A monoid is a pair (M, •) where M is a set and • is a binary operation on M such that:

  1. • is closed over M: for all a, b in M, abM,
  2. there is an identity element eM such that ae = e • = a for all aM, and
  3. • is associative; for all a, b, cM, (ab) • c = a • (bc).

A monoid (M, •) is said to be commutative if for all a, bM, ab = ba.

Then, a semiring is a triple (𝕂, ⊕, &otimes) such that:

  1. (𝕂, ⊕) is a commutative monoid,
  2. (𝕂, ⊗) is a monoid,
  3. for all a, b, c ∈ 𝕂, a ⊗ (bc) = (a ⊗ b) ⊕ (ac), and
  4. for all a ∈ 𝕂, a0 = 0a = 0 where 0 is the identity element for the monoid (𝕂, ⊕).

In many cases, 𝕂 is the set of real numbers, so a semiring can be denoted simply by specifying the pair of operators (⊕ ⊗). The so-called real semiring is simply (+, ×).

When working with probabilities as weights, we often use the tropical semiring (min, +) and negative log probabilities, taking advantage of the logarithmic identity log(x) + log(y) = log(xy). The tropical semiring (and the associated standard arc type) is the default semiring in Pynini.

At last, we can give the modified definitions for F and delta; for WFSTs. Whereas for unweighted FSTs, F is a set of final states, for WFSTs F is a set of pairs over Q × 𝕂, where the second element is the final weight for that state. And, the transition relation δ for a WFST is from Q × (Σi ∪ ε) × (Σo ∪ ε) × 𝕂 to Q. The definition of paths is parallel to those for unweighted FSTs except that each element in the path is also associated with a weight in 𝕂.

Example

WFSTs are a natural representation for conditional probability distributions from strings to strings. For example, consider a text normalization rule which verbalizes 2:00 as two with P = .2 and as two o'clock with P = .8. The following probability (i.e., real-valued) semiring WFST encodes this distribution:

time.png

Conventions used in Pynini

  • Pynini represents all acceptors and transducers, weighted or unweighted, as WFSTs. Thus, for example, a weighted finite-state acceptor (WFSA) is represented as a WFST in which input and output labels match in all cases, and an unweighted finite-state transducer is represented by a WFST in which all weights are 1 or 0.
  • Pynini only permits one state to be designated the start state.
  • Pynini assigns a final weight to all states; a nonfinal state is just one which has a final weight of 0.

Getting started with Pynini

Starting Pynini

Install Pynini then simply import the module as follows:

import pynini

Building FSTs

In Pynini, all FST objects are an instance of a class called pynini.Fst, representing a mutable WFST. The user must specify the arc type at construction time; by default, the standard arc type (and the associated tropical semiring) is are used.

Pynini provides several functions for compiling strings into FSTs; we proceed to review a few of these methods.

Constructing acceptors

The acceptor function compiles a (Unicode or byte) string into a deterministic acceptor. The user may specify a final weight using the weight keyword argument; by default, the final weight is 1. The user may also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the string are to be translated into labels of the acceptor. By default (token_type="byte"), each arc in the acceptor corresponds to byte in the input string.

# UTF-8 bytestring, byte arcs.
print pynini.acceptor("Pont l'Evêque")
# Output below...

0 1 P
1 2 o
2 3 n
3 4 t
4 5 <SPACE>
5 6 l
6 7 '
7 8 E
8 9 v
9 10  <0xc3>
10  11  <0xaa>
11  12  q
12  13  u
13  14  e
14

If the user specifies token_type="utf8" then each arc in the FST corresponds to a Unicode codepoint.

# UTF-8 bytestring, Unicode codepoint arcs.
print pynini.acceptor("Pont l'Evêque", token_type="utf8")
# Output below...

0 1 P
1 2 o
2 3 n
3 4 t
4 5 <SPACE>
5 6 l
6 7 '
7 8 E
8 9 v
9 10  <0xea>
10  11  q
11  12  u
12  13  e
13

Sequences of characters enclosed by [ and ] receive special interpretations in byte or utf8 mode. Pynini first attempts to parse any such sequence as an integer using the C standard library function strtoll.

assert pynini.acceptor("b0x61") == pynini.acceptor("baa")  # OK.

If this fails, then Pynini treats the sequence as a sequence of one or more whitespace-delimited "generated" symbols, each of which is given a unique identifier in the resulting FST's symbol table.

x = pynini.acceptor("[It's not much of a cheese shop really]")
y = pynini.acceptor("[It's][not][much][of][a][cheese][shop][really]")
assert x == y   # OK.

A bracket character to be interpreted literally must be escaped.

bracket = pynini.acceptor("[")  # OK.

unused = pynini.acceptor("[")  # Not OK: Raises FstStringCompilationError.

Finally, if the user specifies a SymbolTable as the token_type, then the input string is parsed according and the FST labeled according to that table. String parsing failures are logged and raise a FstStringCompilationError exception from within the Python interpreter.

Nearly all FST operations permit a string to be passed in place of an Fst argument; in that case, pynini.acceptor is used (with default arguments) to compile the string into an FST.

Constructing transducers

The transducer function creates a transducer from two FSTs, compiling string arguments into FSTs if necessary. The result is a cross-product of the two input FSTs, interpreted as acceptors. Specifically, the transducer is constructed by mapping output arcs in the first FST to epsilon, mapping the input arcs in the second FST to epsilon, then concatenating the two. In the case that both FSTs are strings, the resulting transducer will simply contain the input and output string converted into labels, and the shorter of the two will be padded with epsilons.

As with acceptor, the user may specify a final weight using the weight keyword argument; the final weight is 1. The user also specify the desired arc type using the arc_type keyword argument. The user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer, if strings are passed in place of FST arguments. By default (input_token_type="byte" and output_token_type="byte"), each arc corresponds to a byte in the input and/or output string.

Whereas transducer is used to construct the cross-product of two strings, the union of many such string pair cross-products is compiled using the string_map and string_file functions. The former takes an iterable of pairs of strings (or a string-to-string dictionary) as an argument; the latter reads string pairs from two-column TSV file. Both functions use a compact prefix tree representation which can be used to represent a (multi)map.

As with the above, the user may specify the desired arc type using the arc_type keyword argument. And as with transducer, the user may also specify how the characters in the input and/or output strings are to be translated into labels of the transducer using the appropriate keyword arguments.

Worked examples

The following examples, taken from Gorman 2016 and Gorman & Sproat 2016, show features of Pynini in action.

Finnish vowel harmony

Koskenniemi (1983) provides a number of manually-compiled FSTs modeling Finnish morphophonological patterns. One of these concerns the well-known pattern of Finnish vowel harmony. Many Finnish suffixes have two allomorphs differing only in the backness specification of their vowel. For example, the adessive suffix is usually realized as -llä [lːæː] except when the preceding stem contains one of u, o, and a and there is no intervening y, ö, or ä; in this case, it is -lla [lːɑː]. For example, käde 'hand' has an adessive kädellä, whereas vero 'tax' forms the adessive verolla because the nearest stem vowel is o (Ringen & Heinämäki 1999).

The following gives a Pynini function make_adessive which generates the appropriate form of the noun stem. It first concatenates the stem with an abstract representation of the suffix, and then composes the result with a context-dependent rewrite rule adessive_harmony.

back_vowel = pynini.union("u", "o", "a")
neutral_vowel = pynini.union("i", "e")
front_vowel = pynini.union("y", "ö", "ä")
vowel = pynini.union(back_vowel, neutral_vowel, front_vowel)
archiphoneme = pynini.union("A", "I", "E", "O", "U")
consonant = pynini.union("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n",
                         "p", "q", "r", "s", "t", "v", "w", "x", "z")
sigma_star = pynini.union(vowel, consonant, archiphoneme).closure()
adessive = "llA"
intervener = pynini.union(consonant, neutral_vowel).closure()
adessive_harmony = (pynini.cdrewrite(pynini.transducer("A", "a"),
                                     back_vowel + intervener, "", sigma_star) *
                    pynini.cdrewrite(pynini.t("A", "ä"), "", "", sigma_star)
                   ).optimize()


def make_adessive(stem):
  return ((stem + adessive) * adessive_harmony).stringify()

Pluralization rules

Consider an application where one is generating text such as, "The current temperature in New York is 25 degrees". Typically one would do this from a template such as the following:

The current temperature in __ is __ degrees

This works fine if one fills in the first blank with the name of a location and the second blank with a number. But what if it’s really cold in New York and the thermometer shows 1° (Fahrenheit)? One does not want this:

The current temperature in New York is 1 degrees

So one needs to have rules that know how to inflect the unit appropriately given the context. The Unicode Consortium has some guidelines for this; they primarily specify how many “plural” forms different languages have and in which circumstances one uses each. From a linguistic point of view the specifications are sometimes nonsensical, but they are widely used and are useful for adding support for new languages.

For handling degrees, we can assume that the default is the plural form, in which case we must singularize the plural form in certain contexts. For the word degrees and many other cases, it’s just a matter of stripping off the final s, but for a word ending in -ches (such as inches) one would want to strip off the es, and for feet and pence the necessary changes are irregular (foot, penny).

One could of course take care of this with a special purpose Python function, but here we consider how simple this is in Pynini. We start by defining singular_map to handle irregular cases, plus the general cases:

singular_map = pynini.union(
    pynini.transducer("feet", "foot"),
    pynini.transducer("pence", "penny"),
    # Any sequence of bytes ending in "ches" strips the "es";
    # the last argument -1 is a "weight" that gives this analysis
    # a higher priority, if it matches the input.
    sigma_star + pynini.transducer("ches", "ch", -1),
    # Any sequence of bytes ending in "s" strips the "s".
    sigma_star + pynini.transducer("s", ""))

Then as before we can define a context-dependent rewrite rule that performs the singularization just in case the unit is preceded by 1. We define the right context for the application of the rule rc as punctuation, space, or [EOS], a special symbol for the end-of-string.

rc = pynini.union(".", ",", "!", ";", "?", " ", "[EOS]")
singularize = pynini.cdrewrite(singular_map, " 1 ", rc, sigma_star)

Then we can define a convenience function to apply this rule...

def singularize(string):
  return pynini.shortestpath(
      pynini.compose(string.strip(), singularize)).stringify()

...and use it like so:

>>> singularize("The current temperature in New York is 1 degrees")
"The current temperature in New York is 1 degree"
>>> singularize("That measures 1 feet")
"That measures 1 foot"
>>> singularize("That measures 1.2 feet")
"That measures 1.2 feet"
>>> singularize("That costs just 1 pence")
"That costs just 1 penny"

One can handle other languages by changing the rules appropriately—for example, in Russian, which has http://www.unicode.org/cldr/charts/29/supplemental/language_plural_rules.html[more complex pluralization rules]], one needs several different forms, not just singular and plural—and by changing the conditions under which the various rules apply.

T9 disambiguation

T9 (short for "Text on 9 keys"; Grover et al. 1998) is a patented predictive text entry system. In T9, each character in the "plaintext" alphabet is assigned to one of the 9 digit keys (0 is usually reserved to represent space) of the traditional 3x4 touch-tone phone grid. For instance, the message GO HOME is entered as 4604663. Since the resulting "ciphertext" may be highly ambiguous—this sequence could also be read as GO HOOD or as many nonsensical expressions—a hybrid language model/lexicon is used for decoding.

The following snippet implements T9 encoding and decoding in Pynini:

LM = pynini.Fst.read("charlm.fst")
T9_ENCODER = pynini.string_file("t9.tsv").closure()
T9_DECODER = pynini.invert(T9_ENCODER)


def encode_string(plaintext):
  return (plaintext * T9_ENCODER).stringify()


def k_best(ciphertext, k):
  lattice = (ciphertext * T9_DECODER).project(True) * LM
  return pynini.shortestpath(lattice, nshortest=k, unique=True)

The first line reads a language model (LM), represented as a WFSA. The second line reads the encoder table from a TSV file: in this file, each line contains an alphabetic character and the corresponding digit key. By computing the concatenative closure of this map, one obtains an FST T9_ENCODER which can encode arbitrarily-long plaintext strings. The encode_string function applies this FST to arbitrary plaintext strings: application here refers to composition of a string with a transducer followed by output projection. The k_best function first applies a ciphertext string—a bytestring of digits—to the inverse of the encoder FST (T9_DECODER). This creates an intermediate lattice of all possible plaintexts consistent with the T9 ciphertext. This is then scored with—that is, composed with—the character LM. Finally, this function returns the k highest probability plaintexts in the lattice. For the following example, the highest probability plaintext is in fact the correct one:

pt = "THE SINGLE MOST POPULAR CHEESE IN THE WORLD"
ct = encode_string(pt)
print "CIPHERTEXT:", ct
print "DECIPHERED PLAINTEXT:", k_best(ct, 1).stringify()
---+ Output below...

CIPHERTEXT: 8430746453066780767852702433730460843096753
DECIPHERED PLAINTEXT: THE SINGLE MOST POPULAR CHEESE IN THE WORLD

<--

API reference {#api}

This section covers classes (representing FSTs, weights, and the like), their methods (representing various accessors as well as destructive FST algorithms), and functions (representing constructive FST algorithms, weight arithmetic, and the like).

Objects {#objects}

Arc

An FST arc is essentially a tuple of input label, output label, weight, and the next state ID.

Constructor

===none Arc(ilabel, olabel, weight, nextstate)

This class represents an arc while remaining agnostic about the underlying arc type. Attributes of the arc can be accessed or mutated, and the arc can be copied.

Attributes: ilabel: The input label. olabel: The output label. weight: The arc weight. nextstate: The destination state for the arc. =

Methods

(None.)

ArcIterator

An arc iterator is used to iterate through arcs leaving some state in an FST; it is normally constructed by calling the arcs method of an Fst object.

Constructor

===none ArcIterator(ifst, state)

This class is used for iterating over the arcs leaving some state of an FST. It supports the full C++ API, but most users should just call the arcs method of an FST object and take advantage of the Pythonic API. =

Methods

===none done(self)

Indicates whether the iterator is exhausted or not.

Returns: True if the iterator is exhausted, False otherwise. =

===none next(self)

Advances the iterator. =

===none flags(self)

Returns the current iterator behavioral flags.

Returns: The current iterator behavioral flags as an integer. =

===none position(self)

Returns the position of the iterator.

Returns: The iterator's position, expressed as an integer. =

===none reset(self)

Resets the iterator to the initial position. =

===none seek(self, a)

Advance the iterator to a new position.

Args: a: The position to seek to. =

===none set_flags(self, flags, mask)

Sets the current iterator behavioral flags.

Args: flags: The properties to be set. mask: A mask to be applied to the flags argument before setting them. =

===none value(self)

Returns the current arc.

Returns: The current arc. =

EncodeMapper

An encode mapper is used to "encode" and "decode" an FST; it is frequently used to apply FST algorithms while viewing the input FST as unweighted, as an acceptor, or both.

Constructor

===none EncodeMapper(arc_type="standard", encode_labels=False, encode_weights=False)

Arc encoder class, wrapping EncodeMapperClass.

This class provides an object which can be used to encode or decode FST arcs. This is most useful to convert an FST to an unweighted acceptor, on which some FST operations are more efficient, and then decoding the FST afterwards.

To use an instance of this class to encode or decode a mutable FST, pass it as the first argument to the FST instance methods encode and decode.

For implementational reasons, it is not currently possible to use an encoder on disk to construct this class.

Args: arc_type: A string indicating the arc type. encode_labels: Should labels be encoded? encode_weights: Should weights be encoded? =

Methods

===none arc_type(self)

Returns a string indicating the arc type. =

===none flags(self)

Returns the encoder's flags. =

===none input_symbols(self)

Returns the encoder's input symbol table, or None if none is present. =

===none output_symbols(self)

Returns the encoder's output symbol table, or None if none is present. =

===none properties(self, mask)

Provides property bits.

This method provides user access to the properties of the encoder.

Args: mask: The property mask to be compared to the encoder's properties.

Returns: A 64-bit bitmask representing the requested properties. =

===none set_input_symbols(self, syms)

Sets the encoder's input symbol table.

Args: syms: A SymbolTable. =

===none set_output_symbols(self, syms)

Sets the encoder's output symbol table.

Args: syms: A SymbolTable. =

= weight_type(self)

Returns a string indicating the weight type. =

Far

A FAR is an "FST archive": a file used to store multiple FSTs. Each FST in a FAR is stored with a string key.

The Far object behaves similarly to a Python file handle: it can be opened for reading, or for writing. When opened for reading, it behaves similarly to an iterator. FSTs can be written or read using Python's [] syntax, as well as via instance methods. A Far can also be used as part of a [PEP-343 context guard](https://www.python.org/dev/peps/pep-0343/).

Constructor

===none Far(filename, mode="r", arc_type="standard", far_type="default")

Pynini FAR ("Fst ARchive") object.

This class is used to either read FSTs from or write FSTs to a FAR. When opening a FAR for writing, the user may also specify the desired arc type and FAR type.

Args: filename: A string indicating the filename. mode: FAR IO mode; one of: "r" (open for reading), "w" (open for writing). arc_type: Desired arc type; this is ignored if the FAR is opened for reading. far_type: Desired FAR type; this is ignored if the FAR is opened for reading. =

Methods

===none arc_type(self)

Returns a string indicating the arc type. =

===none closed(self)

Indicates whether the FAR is closed for IO. =

===none far_type(self)

Returns a string indicating the FAR type. =

===none mode(self)

Returns a char indicating the FAR's current mode. =

===none name(self)

Returns the FAR's filename. =

Methods when opened for reading (mode="r")

===none done(self)

Indicates whether the iterator is exhausted or not.

Returns: True if the iterator is exhausted, False otherwise. =

===none find(self, key)

Sets the current position to the first entry greater than or equal to the key (a string) and indicates whether or not a match was found.

Args: key: A string key.

Returns: True if the key was found, False otherwise. =

===none get_fst(self)

Returns the FST at the current position.

Returns: A copy of the FST at the current position. =

===none get_key(self)

Returns the string key at the current position.

Returns: The string key at the current position. =

===none next(self)

Advances the iterator. =

===none reset(self)

Resets the iterator to the initial position. =

Methods when opened for writing (mode="w")

===none add(self, key, fst)

Adds an FST to the FAR (when open for writing).

This methods adds an FST to the FAR which can be retrieved with the specified string key.

This method is provided for compatibility with the C++ API only; most users should use the Pythonic API.

Args: key: The string used to key the input FST. fst: The FST to write to the FAR.

Raises: FstOpError: Cannot invoke method in current mode. FstOpError: Incompatible or invalid arc type. =

===none close(self)

Closes the FAR and flushes to disk (when open for writing).

Raises: FstOpError: Cannot invoke method in current mode. =

Fst

This class represents a mutable, expanded FST. In addition to its constructor, it can be created using the [string compilation function](#acceptor) acceptor, [the cross-product construction functions](#transducer) transducer, string_map, and string_file, and the [epsilon machine function](#manual) epsilon_machine, as well as by the various constructive [algorithms](#algorithms).

Constructors

===none Fst(arc_type="standard")

This class wraps a mutable FST and exposes all destructive methods.

Args: arc_type: An optional string indicating the arc type for the FST. =

===none Fst.read(filename)

Reads an FST from a file.

Args: filename: The string location of the input file.

Returns: An FST.

Raises: FstIOError: Read failed. FstOpError: Read-time conversion failed. =

===none Fst.from_pywrapfst(ifst)

Constructs a Pynini FST from a pywrapfst._Fst.

This class method converts an FST from the pywrapfst module (pywrapfst._Fst or its subclass pywrapfst._MutableFst) to a Pynini.Fst. This is essentially a downcasting operation which grants the object additional instance methods, including an enhanced closure, paths, and stringify. The input FST is not invalidated, but mutation of the input or output object while the other is still in scope will trigger a deep copy.

Args: ifst: Input FST of type pywrapfst._Fst.

Returns: An FST. =

Methods

Note that many methods here are "destructive" in that they mutate the underlying FST. Wherever possible, such methods return self to allow [method chaining](https://en.wikipedia.org/wiki/Method_chaining).

===none add_arc(self, state, arc)

Adds a new arc to the FST and return self.

Args: state: The integer index of the source state. arc: The arc to add.

Returns: self.

Raises: FstIndexError: State index out of range. FstOpdexError: Incompatible or invalid weight type. =

===none add_state(self)

Adds a new state to the FST and returns the state ID.

Returns: The integer index of the new state. =

===none arc_type(self)

Returns a string indicating the arc type. =

===none arcs(self, state)

Returns an iterator over arcs leaving the specified state.

Args: state: The source state ID.

Returns: An ArcIterator. =

arcsort-method

===none arcsort(self, sort_type="ilabel")

Sorts arcs leaving each state of the FST.

This operation destructively sorts arcs leaving each state using either input or output labels.

Args: sort_type: Either "ilabel" (sort arcs according to input labels) or "olabel" (sort arcs according to output labels).

Returns: self.

Raises: FstArgError: Unknown sort type. =

See also: [ArcSort algorithm](http://www.openfst.org/twiki/bin/view/FST/ArcSortDoc).

closure-method

===none closure(self, lower) closure(self, lower, upper)

Computes concatenative closure.

This operation destructively converts the FST to its concatenative closure. If A transduces string x to y with weight w, then the zero-argument form A.closure() mutates A so that it transduces between empty strings with weight 1, transduces string x to y with weight w, transduces xx to yy with weight w otimes w, string xxx to yyy with weight w otimes w otimes w (and so on).

When called with two optional positive integer arguments, these act as lower and upper bounds, respectively, for the number of cycles through the original FST that the mutated FST permits. Therefore, A.closure(0, 1) mutates A so that it permits 0 or 1 cycles; i.e., the mutated A transduces between empty strings or transduces x to y.

When called with one optional positive integer argument, this argument acts as the lower bound, with the upper bound implicitly set to infinity. Therefore, A.closure(1) performs a mutation roughly equivalent to A.closure() except that the former does not transduce between empty strings.

The following are the equivalents for the closure-style syntax used in Perl-style regular expressions:

Regexp: This method: Copy shortcuts:

/x?/ x.closure(0, 1) x.ques /x*/ x.closure() x.star /x+/ x.closure(1) x.plus /x{N}/ x.closure(N, N) /x{M,N}/ x.closure(M, N) /x{N,}/ x.closure(N) /x{,N}/ x.closure(0, N)

Args: lower: lower bound. upper: upper bound.

Returns: self. =

See also: [Closure algorithm](http://www.openfst.org/twiki/bin/view/FST/ClosureDoc), [Repeat algorithm](http://go/fst-operators#repeat).

concat-method

===none concat(self, ifst)

Computes the concatenation (product) of two FSTs.

This operation destructively concatenates the FST with a second FST. If A transduces string x to y with weight a and B transduces string w to v with weight b, then their concatenation transduces string xw to yv with weight a otimes b.

Args: ifst: The second input Fst.

Returns: self.

Raises: FstOpError: Operation failed. FstSymbolTableMergeError: Unable to resolve symbol table conflict without relabeling. =

See also: [Concat algorithm](http://www.openfst.org/twiki/bin/view/FST/ConcatDoc).

connect-method

===none connect(self)

Removes unsuccessful paths.

This operation destructively trims the FST, removing states and arcs that are not part of any successful path.

Returns: self. =

See also: [Connect algorithm](http://www.openfst.org/twiki/bin/view/FST/ConnectDoc).

===none copy(self)

Makes a copy of the FST. =

decode-method

===none decode(self, encoder)

Decodes encoded labels and/or weights.

This operation reverses the encoding performed by encode.

Args: encoder: An EncodeMapper object used to encode the FST.

Returns: self. =

See also: [Encode and Decode algorithms](http://www.openfst.org/twiki/bin/view/FST/EncodeDecodeDoc).

===none delete_arcs(self, state, n=0)

Deletes arcs leaving a particular state.

Args: state: The integer index of a state. n: An optional argument indicating how many arcs to be deleted. If this argument is omitted or passed as zero, all arcs from this state are deleted.

Returns: self.

Raises: FstIndexError: State index out of range. =

===none delete_states(self, states=None)

Deletes states.

Args: states: An optional iterable of integer indices of the states to be deleted. If this argument is omitted, all states are deleted.

Returns: self.

Raises: FstIndexError: State index out of range.

See also: delete_arcs. =

===none draw(self, filename, isymbols=None, osymbols=None, ssymbols=None, acceptor=False, title="", width=8.5, height=11, portrait=False, vertical=False, ranksep=0.4, nodesep=0.25, fontsize=14, precision=5, float_format="g", show_weight_one=False):

Writes out the FST in Graphviz text format.

This method writes out the FST in the dot graph description language. The graph can be rendered using the dot executable provided by Graphviz.

Args: filename: The string location of the output dot/Graphviz file. isymbols: An optional symbol table used to label input symbols. osymbols: An optional symbol table used to label output symbols. ssymbols: An optional symbol table used to label states. acceptor: Should the figure be rendered in acceptor format if possible? title: An optional string indicating the figure title. width: The figure width, in inches. height: The figure height, in inches. portrait: Should the figure be rendered in portrait rather than landscape? vertical: Should the figure be rendered bottom-to-top rather than left-to-right? ranksep: The minimum separation separation between ranks, in inches. nodesep: The minimum separation between nodes, in inches. fontsize: Font size, in points. precision: Numeric precision for floats, in number of chars. float_format: One of: 'e', 'f' or 'g'. show_weight_one: Should weights equivalent to semiring One be printed? =

See also: [Printing, Drawing, and summarizing FSTs](http://openfst.org/twiki/bin/view/FST/FstQuickTour#Printing_Drawing_and_Summarizing).

encode-method

===none encode(self, encoder)

Encodes labels and/or weights.

This operation allows for the representation of a weighted transducer as a weighted acceptor, an unweighted transducer, or an unweighted acceptor by considering the pair (input label, output label), the pair (input label, weight), or the triple (input label, output label, weight) as a single label. Applying this operation mutates the EncodeMapper argument, which can then be used to decode.

Args: encoder: An EncodeMapper object to be used as the encoder.

Returns: self. =

See also: [Encode and Decode algorithms](http://www.openfst.org/twiki/bin/view/FST/EncodeDecodeDoc).

===none final(self, state)

Returns the final weight of a state.

Args: state: The integer index of a state.

Returns: The final Weight of that state.

Raises: FstIndexError: State index out of range. =

===none fst_type(self)

Returns a string indicating the FST type. =

===none input_symbols(self)

Returns the FST's input symbol table, or None if none is present. =

invert-method

===none invert(self)

Inverts the FST's transduction.

This operation destructively inverts the FST's transduction by exchanging input and output labels.

Returns: self. =

See also: [Invert algorithm](http://www.openfst.org/twiki/bin/view/FST/InvertDoc).

minimize-method

===none minimize(self, delta=0.0009765625, allow_nondet=False)

Minimizes the FST.

This operation destructively performs the minimization of deterministic weighted automata and transducers. If the input FST A is an acceptor, this operation produces the minimal acceptor B equivalent to A, i.e. the acceptor with a minimal number of states that is equivalent to A. If the input FST A is a transducer, this operation internally builds an equivalent transducer with a minimal number of states. However, this minimality is obtained by allowing transition having strings of symbols as output labels, this known in the litterature as a real-time transducer. Such transducers are not directly supported by the library. This function will convert such transducer by expanding each string-labeled transition into a sequence of transitions. This will results in the creation of new states, hence losing the minimality property.

Args: delta: Comparison/quantization delta. allow_nondet: Attempt minimization of non-deterministic FST?

Returns: self. =

See also: [Minimize algorithm](http://www.openfst.org/twiki/bin/view/FST/MinimizeDoc).

===none mutable_arcs(self, state)

Returns a mutable iterator over arcs leaving the specified state.

Args: state: The source state ID.

Returns: A MutableArcIterator. =

===none mutable_input_symbols(self)

Returns the FST's (mutable) input symbol table, or None if none is present. =

===none mutable_output_symbols(self)

Returns the FST's (mutable) output symbol table, or None if none is present. =

===none num_arcs(self, state)

Returns the number of arcs leaving a state.

Args: state: The integer index of a state.

Returns: The number of arcs leaving that state.

Raises: FstIndexError: State index out of range. =

===none num_input_epsilsons(self, state)

Returns the number of arcs with epsilon input labels leaving a state.

Args: state: The integer index of a state.

Returns: The number of epsilon-input-labeled arcs leaving that state.

Raises: FstIndexError: State index out of range. =

===none num_output_epsilsons(self, state)

Returns the number of arcs with epsilon output labels leaving a state.

Args: state: The integer index of a state.

Returns: The number of epsilon-output-labeled arcs leaving that state.

Raises: FstIndexError: State index out of range. =

===none num_states(self)

Returns the number of states. =

===none optimize(self, compute_props=False)

Performs a generic optimization of the FST.

This operation destructively optimizes the FST using epsilon-removal, arc-sum mapping, determinization, and minimization (where possible). The algorithm is as follows:

  • If the FST is not (known to be) epsilon-free, perform epsilon-removal.
  • Combine identically labeled multi-arcs and sum their weights.
  • If the FST does not have idempotent weights, halt.
  • If the FST is not (known to be) deterministic: - If the FST is a (known) acceptor: * If the FST is not (known to be) unweighted and/or acyclic, encode weights. - Otherwise, encode labels and, if the FST is not (known to be) unweighted, encode weights. - Determinize the FST.
  • Minimize the FST.
  • Decode the FST and combine identically-labeled multi-arcs and sum their weights, if the FST was previously encoded.

By default, FST properties are not computed if they are not already set.

This optimization may result in a reduction of size (due to epsilon-removal, arc-sum mapping, and minimization) and possibly faster composition, but determinization (a prerequisite of minimization) may result in an exponential blowup in size in the worst case. Judicious use of optimization is a bit of a black art.

Args: compute_props: Should unknown FST properties be computed to help choose appropriate optimizations?

Returns: self. =

See also: [Optimize algorithm](http://go/fst-optimize).

===none output_symbols(self)

Returns the FST's output symbol table, or None if none is present. =

===none paths(self, token_type="byte)

Creates iterator over all string paths in an acyclic FST.

This method returns an iterator over all paths (represented as pairs of strings and an associated path weight) through an acyclic FST. This operation is only feasible when the FST is acyclic. Depending on the requested token type, the arc labels along the input and output sides of a path are interpreted as UTF-8-encoded Unicode strings, raw bytes, or a concatenation of string labels from a symbol table.

Args: input_token_type: A string indicating how the input strings are to be constructed from arc labels---one of: "byte" (interprets arc labels as raw bytes), "utf8" (interprets arc labels as Unicode code points), or "symbol" (interprets arc labels using the input symbol table). output_token_type: A string indicating how the output strings are to be constructed from arc labels---one of: "byte" (interprets arc labels as raw bytes), "utf8" (interprets arc labels as Unicode code points), or "symbol" (interprets arc labels using the input symbol table). rm_epsilon: Should epsilons be removed?

Raises: FstArgError: Unknown token type. FstArgError: FST is not acyclic. =

See also: [StringPaths algorithm](http://go/fst-paths#string-paths).

project-method

===none project(self, project_output=False)

Converts the FST to an acceptor using input or output labels.

This operation destructively projects an FST onto its domain or range by either copying each arc's input label to its output label (the default) or vice versa.

Args: project_output: Should the output labels be projected?

Returns: self. =

See also: [Project algorithm](http://www.openfst.org/twiki/bin/view/FST/ProjectDoc).

===none properties(self, mask)

Provides property bits.

This method provides user access to the properties of the encoder.

Args: mask: The property mask to be compared to the encoder's properties.

Returns: A 64-bit bitmask representing the requested properties. =

See also: [FST properties](#properties).

prune-method

===none prune(self, delta=0.0009765625, nstate=-1, weight=None)

Removes paths with weights below a certain threshold.

This operation deletes states and arcs in the input FST that do not belong to a successful path whose weight is no more (w.r.t the natural semiring order) than the threshold t otimes-times the weight of the shortest path in the input FST. Weights must be commutative and have the path property.

Args: delta: Comparison/quantization delta. nstate: State number threshold. weight: A Weight or weight string indicating the desired weight threshold below which paths are pruned; if omitted, no paths are pruned.

Returns: self. =

See also: [Prune algorithm](http://www.openfst.org/twiki/bin/view/FST/PruneDoc).

push-method

===none push(self, delta=0.0009765625, remove_total_weight=False, to_final=False)

Pushes weights towards the initial or final states.

This operation destructively produces an equivalent transducer by pushing the weights towards the initial state or toward the final states. When pushing weights towards the initial state, the sum of the weight of the outgoing transitions and final weight at any non-initial state is equal to one in the resulting machine. When pushing weights towards the final states, the sum of the weight of the incoming transitions at any state is equal to one. Weights need to be left distributive when pushing towards the initial state and right distributive when pushing towards the final states.

Args: delta: Comparison/quantization delta. remove_total_weight: If pushing weights, should the total weight be removed? to_final: Push towards final states?

Returns: self. =

See also: [Push algorithm](http://www.openfst.org/twiki/bin/view/FST/PushDoc).

relabel_pairs-method

===none relabel_pairs(self, ipairs=None, opairs=None)

Replaces input and/or output labels using pairs of labels.

This operation destructively relabels the input and/or output labels of the FST using pairs of the form (old_ID, new_ID); omitted indices are identity-mapped.

Args: ipairs: An iterable containing (older index, newer index) integer pairs. opairs: An iterable containing (older index, newer index) integer pairs.

Returns: self.

Raises: FstArgError: No relabeling pairs specified. =

See also: [Relabel algorithm](http://www.openfst.org/twiki/bin/view/FST/RelabelDoc).

relabel_tables-method

===none relabel_tables(self, old_isymbols=None, new_isymbols=None, unknown_isymbol="", attach_new_isymbols=True, old_osymbols=None, new_osymbols=None, unknown_osymbol="", attach_new_osymbols=True)

Replaces input and/or output labels using SymbolTables.

This operation destructively relabels the input and/or output labels of the FST using user-specified symbol tables; omitted symbols are identity-mapped.

Args: old_isymbols: The old SymbolTable for input labels, defaulting to the FST's input symbol table. new_isymbols: A SymbolTable used to relabel the input labels unknown_isymbol: Input symbol to use to relabel OOVs (if empty, OOVs raise an exception) attach_new_isymbols: Should new_isymbols be made the FST's input symbol table? old_osymbols: The old SymbolTable for output labels, defaulting to the FST's output symbol table. new_osymbols: A SymbolTable used to relabel the output labels. unknown_osymbol: Outnput symbol to use to relabel OOVs (if empty, OOVs raise an exception) attach_new_isymbols: Should new_osymbols be made the FST's output symbol table?

Returns: self.

Raises: FstArgError: No SymbolTable specified. =

See also: [Relabel algorithm](http://www.openfst.org/twiki/bin/view/FST/RelabelDoc).

===none reserve_arcs(self, state, n)

Reserve n arcs at a particular state (best effort).

Args: state: The integer index of a state. n: The number of arcs to reserve.

Returns: self.

Raises: FstIndexError: State index out of range. =

===none reserve_states(self, n)

Reserve n states (best effort).

Args: n: The number of states to reserve.

Returns: self. =

reweight-method

===none reweight(self, potentials, to_final=False)

Reweights an FST using an iterable of potentials.

This operation destructively reweights an FST according to the potentials and in the direction specified by the user. An arc of weight w, with an origin state of potential p and destination state of potential q, is reweighted by p^{-1} otimes (w otimes q) when reweighting towards the initial state, and by (p otimes w) otimes q^{-1} when reweighting towards the final states. The weights must be left distributive when reweighting towards the initial state and right distributive when reweighting towards the final states (e.g., TropicalWeight and LogWeight).

Args: potentials: An iterable of Weight or weight strings. to_final: Push towards final states?

Returns: self. =

See also: [Reweight algorithm](http://www.openfst.org/twiki/bin/view/FST/ReweightDoc).

rmepsilon-method

===none rmepsilon(self, connect=True, delta=0.0009765625, nstate=-1, weight=None)

Removes epsilon transitions.

This operation destructively removes epsilon transitions, i.e., those where both input and output labels are epsilon) from an FST.

Args: connect: Should output be trimmed? delta: Comparison/quantization delta. nstate: State number threshold. weight: A Weight or weight string indicating the desired weight threshold below which paths are pruned; if omitted, no paths are pruned.

Returns: self. =

See also: [RmEpsilon algorithm](http://www.openfst.org/twiki/bin/view/FST/RmEpsilonDoc).

===none set_final(self, state, weight)

Sets the final weight for a state.

Args: state: The integer index of a state. weight: A Weight or weight string indicating the desired final weight; if omitted, it is set to semiring One.

Raises: FstIndexError: State index out of range. FstOpError: Incompatible or invalid weight. =

===none set_input_symbols(self, syms)

Sets the input symbol table.

Passing None as a value will delete the input symbol table.

Args: syms: A SymbolTable.

Returns: self. =

===none set_output_symbols(self, syms)

Sets the output symbol table.

Passing None as a value will delete the output symbol table.

Args: syms: A SymbolTable.

Returns: self. =

===none set_properties(self, props, mask)

Sets the properties bits.

Args: props: The properties to be set. mask: A mask to be applied to the props argument before setting the FST's properties.

Returns: self. =

===none set_start(self, state)

Sets a state to be the initial state.

Args: state: The integer index of a state.

Returns: self.

Raises: FstIndexError: State index out of range. =

===none start(self)

Returns the start state. =

===none states(self)

Returns an iterator over all states in the FST.

Returns: A StateIterator object for the FST. =

===none stringify(self, token_type="byte")

Creates a string from a string FST.

This method returns the string recognized by the FST as a Python byte or Unicode string. This is only well-defined when the FST is an acceptor and a "string" FST (meaning that the start state is numbered 0, and there is exactly one transition from each state i to each state i + 1, there are no other transitions, and the last state is final). Depending on the requested token type, the arc labels are interpreted as a UTF-8-encoded Unicode string, raw bytes, or as a concatenation of string labels from the output symbol table.

The underlying routine reads only the output labels, so if the FST is not an acceptor, it will be treated as the output projection of the FST.

Args: token_type: A string indicating how the string is to be constructed from arc labels---one of: "byte" (interprets arc labels as raw bytes), "utf8" (interprets arc labels as Unicode code points), or a SymbolTable. rm_epsilon: Should epsilons be removed?

Returns: The string corresponding to (an output projection) of the FST.

Raises: FstArgError: FST is not a string. FstArgError: Unknown token type. =

===none text(self, isymbols=None, osymbols=None, ssymbols=None, acceptor=False, show_weight_one=False, missing_sym="")

Produces a human-readable string representation of the FST.

This method generates a human-readable string representation of the FST. The caller may optionally specify SymbolTables used to label input labels, output labels, or state labels, respectively.

Args: isymbols: An optional symbol table used to label input symbols. osymbols: An optional symbol table used to label output symbols. ssymbols: An optional symbol table used to label states. acceptor: Should the FST be rendered in acceptor format if possible? show_weight_one: Should weights equivalent to semiring One be printed? missing_symbol: The string to be printed when symbol table lookup fails.

Returns: A formatted string representing the machine. =

Note: use print f as an alias for print f.text().

===none topsort(self)

Sorts transitions by state IDs.

This operation destructively topologically sorts the FST, if it is acyclic; otherwise it remains unchanged. Once sorted, all transitions are from lower state IDs to higher state IDs

Returns: self. =

See also: [TopSort algorithm](http://www.openfst.org/twiki/bin/view/FST/TopSortDoc).

===none union(self, ifst)

Computes the union (sum) of two FSTs.

This operation destructively computes the union (sum) of two FSTs. If A transduces string x to y with weight a and B transduces string w to v with weight b, then their union transduces x to y with weight a and w to v with weight b.

Args: ifst: The second input FST.

Returns: self.

Raises: FstOpError: Operation failed. FstSymbolTableMergeError: Unable to resolve symbol table conflict without relabeling. =

See also: [Union algorithm](http://www.openfst.org/twiki/bin/view/FST/UnionDoc).

===none verify(self)

Verifies that an FST's contents are sane.

Returns: True if the contents are sane, False otherwise. =

Note: use this in combination with assertions in tests or library code.

===none weight_type(self)

Provides the FST's weight type.

Returns: A string representing the weight type. =

===none write(self, filename)

Serializes FST to a file.

This method writes the FST to a file in a binary format.

Args: filename: The string location of the output file.

Raises: FstIOError: Write failed. =

===none write_to_string(self)

Serializes FST to a string.

Returns: A string. =

MPdtParentheses

The MPDT parentheses class stores triples consisting of "push" and "pop" parentheses arc labels and an associated stack label for a [_multi-pushdown transducer_](http://openfst.org/twiki/bin/view/FST/MpdtExtension).

Constructor

===none MPdtParentheses()

Multi-pushdown transducer parentheses class.

This class wraps a vector of pairs of arc labels in which the first label is interpreted as a "push" stack operation and the second represents the corresponding "pop" operation, and an equally sized vector which assigns each pair to a stack. The library currently only permits two stacks (numbered 1 and 2) to be used.

A MPDT is expressed as an (Fst, MPdtParentheses) pair for the purposes of all supported MPDT operations. =

Methods

===none add_triple(self, push, pop, assignment)

Adds a triple of (left parenthesis, right parenthesis, stack assignment) triples to the object.

Args: push: An arc label to be interpreted as a "push" operation. pop: An arc label to be interpreted as a "pop" operation. assignment: An arc label indicating what stack the parentheses pair is assigned to. =

===none copy(self)

Makes a copy of this MPdtParentheses object.

Returns: A deep copy of the MPdtParentheses object. =

===none MPdtParentheses.read(filename)

Reads parentheses/assignment triples from a text file.

This class method creates a new MPdtParentheses object from a pairs of integer labels in a text file.

Args: filename: The string location of the input file.

Returns: A new MPdtParentheses instance.

Raises: FstIOError: Read failed. =

===none write(self, filename)

Writes parentheses triples to a text file.

This method writes the MPdtParentheses object to a text file.

Args: filename: The string location of the output file.

Raises: FstIOError: Write failed. =

MutableArcIterator

A mutable arc iterator is used to iterate through arcs leaving some state in an An arc iterator is used to iterate through arcs leaving some state in the FST; it is normally constructed by calling the mutable_arcs method of an Fst object.

This iterator is invalidated if the underlying FST is mutated _by any other means than the set_value method_.

Constructor

===none MutableArcIterator(ifst, state)

This class is used for iterating over the arcs leaving some state of an FST, also permitting mutation of the current arc. =

Methods

===none done(self)

Indicates whether the iterator is exhausted or not. =

===none flags(self)

Returns the current iterator behavioral flags.

Returns: The current iterator behavioral flags as an integer. =

===none next(self)

Advances the iterator. =

===none position(self)

Returns the position of the iterator.

Returns: The iterator's position, expressed as an integer. =

===none reset(self)

Resets the iterator to the initial position. =

===none seek(self, a)

Advance the iterator to a new position.

Args: a: The position to seek to. =

===none set_flags(self, flags, mask)

Sets the current iterator behavioral flags.

Args: flags: The properties to be set. mask: A mask to be applied to the flags argument before setting them. =

===none set_value(self, arc)

Replace the current arc with a new arc.

Args: arc: The arc to replace the current arc with. =

===none value(self)

Returns the current arc. =

PdtParentheses

The PDT parentheses class stores pairs of "push" and "pop" parentheses arc labels for a [_pushdown transducer_](http://openfst.org/twiki/bin/view/FST/PdtExtension).

Constructor

===none PdtParentheses()

Pushdown transducer parentheses class.

This class wraps a vector of pairs of arc labels in which the first label is interpreted as a "push" stack operation and the second represents the corresponding "pop" operation. When efficiency is desired, the push and pop indices should be contiguous.

A PDT is expressed as an (Fst, PdtParentheses) pair for the purposes of all supported PDT operations. =

Methods

===none add_pair(self, push, pop)

Adds a pair of parentheses to the set.

Args: push: An arc label to be interpreted as a "push" operation. pop: An arc label to be interpreted as a "pop" operation. =

===none copy(self)

Makes a copy of this PdtParentheses object.

Returns: A deep copy of the PdtParentheses object. =

===none PdtParentheses.read(filename)

Reads parentheses pairs from a text file.

This class method creates a new PdtParentheses object from a pairs of integer labels in a text file.

Args: filename: The string location of the input file.

Returns: A new PdtParentheses instance.

Raises: FstIOError: Read failed. =

===none write(self, filename)

Writes parentheses triples to a text file.

This method writes the MPdtParentheses object to a text file.

Args: filename: The string location of the output file.

Raises: FstIOError: Write failed. =

StateIterator

An state iterator is used to iterate through state IDs in an FST; it is normally constructed by calling the states method of an Fst object.

This iterator is invalidated if the underlying FST is mutated.

Constructor

= StateIterator(ifst)

This class is used for iterating over the states in an FST. =

Methods

===none done(self)

Indicates whether the iterator is exhausted or not.

Returns: True if the iterator is exhausted, False otherwise. =

===none next(self)

Advances the iterator. =

===none reset(self)

Resets the iterator to the initial position. =

===none value(self)

Returns the current state index. =

SymbolTable

A symbol table stores a bidirectional mapping between arc labels and "symbols" (strings). They can be attached to FSTs or encoders, or iterated through using a [=SymbolTableIterator=](#symbol-table-iterator).

Constructor

===none SymbolTable(name="")

Mutable SymbolTable class.

This class wraps the library SymbolTable and exposes both const (i.e., access) and non-const (i.e., mutation) methods of wrapped object. =

Methods

===none add_symbol(self, symbol, key=-1)

Adds a symbol to the table and returns the index.

This method adds a symbol to the table. The caller can optionally specify a non-negative integer index for the key.

Args: symbol: A symbol string. key: An index for the symbol (-1 is reserved for "no symbol requested").

Returns: The integer key of the new symbol. =

===none add_table(self, syms)

Adds another SymbolTable to this table.

This method merges another symbol table into the current table. All key values will be offset by the current available key.

Args: syms: A SymbolTable to be merged with the current table. =

===none available_key(self)

Returns an integer indicating the next available key index in the table. =

===none checksum(self)

Returns a string indicating the label-agnostic MD5 checksum for the table. =

===none copy(self)

Returns a mutable copy of the SymbolTable. =

===none find(self, key)

Given a symbol or index, finds the other one.

This method returns the index associated with a symbol key, or the symbol associated with a index key.

Args: key: Either a string or an index.

Returns: If key is a string, the associated index; if key is an integer, the associated symbol.

Raises: KeyError: Key not found. =

===none get_nth_key(self, pos)

Retrieves the integer index of the n-th key in the table.

Args: pos: The n-th key to retrieve.

Returns: The integer index of the n-th key.

Raises: KeyError: index not found. =

===none labeled_checksum(self)

Returns a string indicating the label-dependent MD5 checksum for the table. =

===none member(self, key)

Given a symbol or index, returns whether it is found in the table.

This method returns a boolean indicating whether the given symbol or index is present in the table. If one intends to perform subsequent lookup, it is better to simply call the find method, catching the KeyError.

Args: key: Either a string or an index.

Returns: Whether or not the key is present (as a string or a index) in the table. =

===none name(self)

Returns the symbol table's name. =

===none name(self)

Returns the symbol table's name. =

===none SymbolTable.read(filename)

Reads symbol table from binary file.

This class method creates a new SymbolTable from a symbol table binary file.

Args: filename: The string location of the input binary file.

Returns: A new SymbolTable instance. =

===none SymbolTable.read_text(filename)

Reads symbol table from text file.

This class method creates a new SymbolTable from a symbol table text file.

Args: filename: The string location of the input text file. allow_negative_labels: Should negative labels be allowed? (Not recommended; may cause conflicts).

Returns: A new SymbolTable instance. =

===none write(self, filename)

Serializes symbol table to a file.

This methods writes the SymbolTable to a file in binary format.

Args: filename: The string location of the output file.

Raises: FstIOError: Write failed. =

===none write_text(self, filename)

Writes symbol table to text file.

This method writes the SymbolTable to a file in human-readable format.

Args: filename: The string location of the output file.

Raises: FstIOError: Write failed. =

StringPaths {#string-paths}

The string paths object is used to iterate through paths of an [acyclic](https://en.wikipedia.org/wiki/Directed_acyclic_graph) FST, in the forms of input and output label sequences, input and output strings, and path weights.

See also: [StringPaths algorithm](http://go/fst-paths#string-paths).

Constructor

= StringPaths(fst, token_type="byte", isymbols=None, osymbols=None)

Iterator for string paths in acyclic FST.

This class provides an iterator over all paths (represented as pairs of strings and an associated path weight) through an acyclic FST. This operation is only feasible when the FST is acyclic. Depending on the requested token type, the arc labels along the input and output sides of a path are interpreted as UTF-8-encoded Unicode strings, raw bytes, or a concatenation of string labels from a symbol table. This class is normally created by invoking the paths method of Fst.

Args: input_token_type: A string indicating how the input strings are to be constructed from arc labels---one of: "byte" (interprets arc labels as raw bytes), "utf8" (interprets arc labels as Unicode code points), or a SymbolTable. output_token_type: A string indicating how the output strings are to be constructed from arc labels---one of: "byte" (interprets arc labels as raw bytes), "utf8" (interprets arc labels as Unicode code points), or a SymbolTable. rm_epsilon: Should epsilons be removed?

Raises: FstArgError: Unknown token type. FstArgError: FST is not acyclic. =

Methods

===none done(self)

Indicates whether the iterator is exhausted or not. =

===none error(self)

Indicates whether the StringPaths has encountered an error.

Returns: True if the StringPaths is in an errorful state, False otherwise. =

===none ilabels(self)

Returns the input labels for the current path.

Returns: A list of input labels for the current path. =

===none istring(self)

Returns the current path's input string.

This method is provided for compatibility with the C++ API only; most users should use the Pythonic API.

Returns: The path's input string. =

===none next(self)

Advances the iterator. =

===none iter_istrings(self)

Generates all input strings in the FST.

This method returns a generator over all input strings in the path. The caller is responsible for resetting the iterator if desired.

Yields: All input strings. =

===none iter_ostrings(self)

Generates all output strings in the FST.

This method returns a generator over all output strings in the path. The caller is responsible for resetting the iterator if desired.

Yields: All output strings. =

===none iter_weights(self)

Generates all path weights in the FST.

This method returns a generator over all path weights. The caller is responsible for resetting the iterator if desired.

Yields: All weights. =

===none olabels(self)

Returns the output labels for the current path.

Returns: A list of output labels for the current path. =

===none ostring(self)

Returns the current path's output string.

Returns: The path's output string. =

===none weight(self)

Returns the current path's total weight.

Returns: The path's Weight. =

SymbolTableIterator {#symbol-table-iterator}

A symbol table iterator is used to iterate through label/string pairs stored in a symbol table.

This iterator is invalidated if the underlying symbol table is mutated.

Constructor

===none SymbolTableIterator(syms)

This class is used for iterating over a symbol table. =

Methods

===none done(self)

Indicates whether the iterator is exhausted or not.

Returns: True if the iterator is exhausted, False otherwise. =

===none next(self)

Advances the iterator. =

===none reset(self)

Resets the iterator to the initial position. =

===none symbol(self)

Returns the current symbol string.

This method returns the current symbol string at this point in the table.

Returns: A symbol string. =

===none """ value(self)

Returns the current integer index of the symbol.

Returns: An integer index. =

Weight

The weight class is used to represent weights which can be attached to arcs or final states in an FST. They also support semiring-dependent arithmetic operations via external functions.

Constructors

===none Weight(weight_type, weight_string)

FST weight class.

This class represents an FST weight. When passed as an argument to an FST operation, it should have the weight type of the input FST(s) to said operation.

Args: weight_type: A string indicating the weight type. weight_string: A string indicating the underlying weight.

Raises: FstArgError: Weight type not found. FstBadWeightError: Invalid weight. =

===none NoWeight(weight_type)

Constructs a non-member weight in the semiring. =

===none Weight.One(weight_type)

Constructs semiring One. =

===none Weight.Zero(weight_type)

Constructs semiring zero. =

Methods

===none copy(self)

Returns a copy of the Weight. =

===none type(self)

Returns a string indicating the weight type. =

Fst algorithm functions {#algorithms}

FSTs support the following constructive FST algorithm functions, which take one or more Fst arguments and (usually) return a new Fst.

===none arcmap(ifst, delta=0.0009765625, map_type="identity", weight=None)

Constructively applies a transform to all arcs and final states.

This operation transforms each arc and final state in the input FST using one of the following:

* identity: maps to self. * input_epsilon: replaces all input labels with epsilon. * invert: reciprocates all non-Zero weights. * output_epsilon: replaces all output labels with epsilon. * plus: adds a constant to all weights. * quantize: quantizes weights. * rmweight: replaces all non-Zero weights with 1. * superfinal: redirects final states to a new superfinal state. * times: right-multiplies a constant to all weights. * to_log: converts weights to the log semiring. * to_log64: converts weights to the log64 semiring. * to_standard: converts weights to the tropical ("standard") semiring.

Args: ifst: The input FST. delta: Comparison/quantization delta (ignored unless map_type is quantize). map_type: A string matching a known mapping operation (see above). weight: A Weight or weight string passed to the arc-mapper; this is ignored unless map_type is plus (in which case it defaults to semiring Zero) or times (in which case it defaults to semiring One).

Returns: An FST.

Raises: FstArgError: Unknown map type. =

See also: [ArcMap algorithm](http://www.openfst.org/twiki/bin/view/FST/ArcMapDoc).

===none cdrewrite(tau, lambda, rho, sigma_star, direction="ltr", mode="obl")

Generates a transducer expressing a context-dependent rewrite rule.

This operation compiles a transducer representing a context-dependent rewrite rule of the form

phi -> psi / lambda __ rho

over a finite vocabulary. To apply the resulting transducer, simply compose it with an input string or lattice.

Args: tau: A (weighted) transducer representing phi -> psi. lambda: An unweighted acceptor representing the left context. rho: An unweighted acceptor representing the right context. sigma_star: A cyclic, unweighted acceptor representing the closure over the alphabet. direction: A string specifying the direction of rule application; one of: "ltr" (left-to-right application), "rtl" (right-to-left application), or "sim" (simultaneous application). mode: A string specifying the mode of rule application; one of: "obl" (obligatory application), "opt" (optional application).

Returns: An rewrite rule FST.

Raises: FstArgError: Unknown cdrewrite direction type. FstArgError: Unknown cdrewrite mode type. FstOpError: Operation failed. =

===none compose(ifst1, ifst2, compose_filter="auto", connect=True)

Constructively composes two FSTs.

This operation computes the composition of two FSTs. If A transduces string x to y with weight a and B transduces y to z with weight b, then their composition transduces string x to z with weight a otimes b. The output labels of the first transducer or the input labels of the second transducer must be sorted (or otherwise support appropriate matchers).

Args: ifst1: The first input FST. ifst2: The second input FST. compose_filter: A string matching a known composition filter; one of: "alt_sequence", "auto", "match", "null", "sequence", "trivial". connect: Should output be trimmed?

Returns: A composed FST. =

See also: [Compose algorithm](http://www.openfst.org/twiki/bin/view/FST/ComposeDoc).

===none containment(ifst, sigma_star)

Creates a containment of a transducer with respect to a universal language.

This operation constructs a transducer consisting of the input FST optionally preceded and/or followed by any string from some alphabet.

Args: ifst: The input FST. sigma_star: A cyclic, unweighted acceptor representing the closure over the alphabet.

Returns: An FST.

Raises: FstOpError: Operation failed. FstSymbolTableMergeError: Unable to resolve symbol table conflict without relabeling. =

See also: [Containment algorithm](http://go/fst-operators#containment).

===none convert(ifst, fst_type=None)

Constructively converts an FST to a new internal representation.

Args: ifst: The input FST. fst_type: A string indicating the FST type to convert to, or None if no conversion is desired.

Returns: An equivalent Fst converted to the desired FST type. =

===none determinize(ifst, delta=0.0009765625, det_type="functional", nstate=-1, subsequential_label=0, weight=None, incremental_subsequential_label=False)

Constructively determinizes a weighted FST.

This operations creates an equivalent FST that has the property that no state has two transitions with the same input label. For this algorithm, epsilon transitions are treated as regular symbols (cf. rmepsilon).

Args: ifst: The input FST. delta: Comparison/quantization delta. det_type: Type of determinization; one of: "functional" (input transducer is functional), "nonfunctional" (input transducer is not functional) and disambiguate" (input transducer is not functional but only keep the min of ambiguous outputs). nstate: State number threshold. subsequential_label: Input label of arc corresponding to residual final output when producing a subsequential transducer. weight: A Weight or weight string indicating the desired weight threshold below which paths are pruned; if omitted, no paths are pruned. increment_subsequential_label: Increment subsequential when creating several arcs for the residual final output at a given state.

Returns: An equivalent deterministic FST.

Raises: FstArgError: Unknown determinization type. =

See also: [Determinize algorithm](http://www.openfst.org/twiki/bin/view/FST/DeterminizeDoc).

===none disambiguate(ifst, delta=0.0009765625, nstate=-1, subsequential_label=0, weight=None):

Constructively disambiguates a weighted transducer.

This operation disambiguates a weighted transducer. The result will be an equivalent FST that has the property that no two successful paths have the same input labeling. For this algorithm, epsilon transitions are treated as regular symbols (cf. rmepsilon).

Args: ifst: The input FST. delta: Comparison/quantization delta. nstate: State number threshold. subsequential_label: Input label of arc corresponding to residual final output when producing a subsequential transducer. weight: A Weight or weight string indicating the desired weight threshold below which paths are pruned; if omitted, no paths are pruned.

Returns: An equivalent disambiguated FST. =

See also: [Disambiguate algorithm](http://www.openfst.org/twiki/bin/view/FST/DisambiguateDoc).

===none difference(ifst1, ifst2, compose_filter="auto", connect=True)

Constructively computes the difference of two FSTs.

This operation computes the difference between two FSAs. Only strings that are in the first automaton but not in second are retained in the result. The first argument must be an acceptor; the second argument must be an unweighted, epsilon-free, deterministic acceptor. The output labels of the first transducer or the input labels of the second transducer must be sorted (or otherwise support appropriate matchers).

Args: ifst1: The first input FST. ifst2: The second input FST. compose_filter: A string matching a known composition filter; one of: "alt_sequence", "auto", "match", "null", "sequence", "trivial". connect: Should the output FST be trimmed?

Returns: An FST representing the difference of the two FSTs. =

See also: [Difference algorithm](http://www.openfst.org/twiki/bin/view/FST/DifferenceDoc).

===none epsnormalize(ifst, eps_norm_output=False)

Constructively epsilon-normalizes an FST.

This operation creates an equivalent FST that is epsilon-normalized. An acceptor is epsilon-normalized if it is epsilon-removed (cf. rmepsilon). A transducer is input epsilon-normalized if, in addition, along any path, all arcs with epsilon input labels follow all arcs with non-epsilon input labels. Output epsilon-normalized is defined similarly. The input FST must be functional.

Args: ifst: The input FST. eps_norm_output: Should the FST be output epsilon-normalized?

Returns: An equivalent epsilon-normalized FST. =

See also: [EpsNormalize algorithm](http://openfst.org/twiki/bin/view/FST/EpsNormalizeDoc).

===none equal(ifst1, ifst2, delta=0.0009765625)

Are two FSTs equal?

This function tests whether two FSTs have the same states with the same numbering and the same transitions with the same labels and weights in the same order.

Args: ifst1: The first input FST. ifst2: The second input FST. delta: Comparison/quantization delta.

Returns: True if the FSTs satisfy the above condition, else False. =

See also: [Equal algorithm](http://openfst.org/twiki/bin/view/FST/EqualDoc).

===none equivalent(ifst1, ifst2, delta=0.0009765625)

Are the two acceptors equivalent?

This operation tests whether two epsilon-free deterministic weighted acceptors are equivalent, that is if they accept the same strings with the same weights.

Args: ifst1: The first input FST. ifst2: The second input FST. delta: Comparison/quantization delta.

Returns: True if the FSTs satisfy the above condition, else False.

Raises: FstOpError: Equivalence test encountered error. =

See also: [Equivalent algorithm](http://www.openfst.org/twiki/bin/view/FST/EquivalentDoc).

===none intersect(ifst1, ifst2, compose_filter="auto", connect=True)

Constructively intersects two FSTs.

This operation computes the intersection (Hadamard product) of two FSTs. Only strings that are in both automata are retained in the result. The two arguments must be acceptors. One of the arguments must be label-sorted (or otherwise support appropriate matchers).

Args: ifst1: The first input FST. ifst2: The second input FST. compose_filter: A string matching a known composition filter; one of: "alt_sequence", "auto", "match", "null", "sequence", "trivial". connect: Should output be trimmed?

Returns: An intersected FST. =

See also: [Intersect algorithm](http://www.openfst.org/twiki/bin/view/FST/IntersectDoc).

===none leniently_compose(ifst1, ifst2, compose_filter="auto", connect=True)

Constructively leniently-composes two FSTs.

This operation computes the lenient composition of two FSTs. The lenient composition of two FSTs the priority union of their composition and the left-hand side argument, where priority union is simply union in which the left-hand side argument's relations have "priority" over the right-hand side argument's relations.

Args: ifst: The input FST. sigma_star: A cyclic, unweighted acceptor representing the closure over the alphabet. compose_filter: A string matching a known composition filter; one of: "alt_sequence", "auto", "match", "null", "sequence", "trivial". connect: Should output be trimmed?

Returns: A leniently-composed FST.

Raises: FstOpError: Operation failed. FstSymbolTableMergeError: Unable to resolve symbol table conflict without relabeling. =

See also: [LenientlyCompose algorithm](http://go/fst-operators#leniently-compose).

===none prune(ifst, delta=0.0009765625, nstate=-1, weight=None)

Constructively removes paths with weights below a certain threshold.

This operation deletes states and arcs in the input FST that do not belong to a successful path whose weight is no more (w.r.t the natural semiring order) than the threshold t otimes-times the weight of the shortest path in the input FST. Weights must be commutative and have the path property.

Args: ifst: The input FST. delta: Comparison/quantization delta. nstate: State number threshold. weight: A Weight or weight string indicating the desired weight threshold below which paths are pruned; if omitted, no paths are pruned.

Returns: A pruned FST. =

See also: [Prune algorithm](http://openfst.org/twiki/bin/view/FST/PruneDoc).

===none push(ifst, delta=0.0009765625, push_weights=False, push_labels=False, remove_common_affix=False, remove_total_weight=False, to_final=False)

Constructively pushes weights/labels towards initial or final states.

This operation produces an equivalent transducer by pushing the weights and/or the labels towards the initial state or toward the final states.

When pushing weights towards the initial state, the sum of the weight of the outgoing transitions and final weight at any non-initial state is equal to 1 in the resulting machine. When pushing weights towards the final states, the sum of the weight of the incoming transitions at any state is equal to 1. Weights need to be left distributive when pushing towards the initial state and right distributive when pushing towards the final states.

Pushing labels towards the initial state consists in minimizing at every state the length of the longest common prefix of the output labels of the outgoing paths. Pushing labels towards the final states consists in minimizing at every state the length of the longest common suffix of the output labels of the incoming paths.

Args: ifst: The input FST. delta: Comparison/quantization delta. push_weights: Should weights be pushed? push_labels: Should labels be pushed? remove_common_affix: If pushing labels, should common prefix/suffix be removed? remove_total_weight: If pushing weights, should total weight be removed? to_final: Push towards final states?

Returns: An equivalent pushed FST. =

See also: [Push algorithm](http://www.openfst.org/twiki/bin/view/FST/PushDoc).

===none randgen(ifst, npath=1, seed=0, select="uniform", max_length=2147483647, weight=False, remove_total_weight=False)

Randomly generate successful paths in an FST.

This operation randomly generates a set of successful paths in the input FST. This relies on a mechanism for selecting arcs, specified using the select argument. The default selector, "uniform", randomly selects a transition using a uniform distribution. The "log_prob" selector randomly selects a transition w.r.t. the weights treated as negative log probabilities after normalizing for the total weight leaving the state. In all cases, finality is treated as a transition to a super-final state.

Args: ifst: The input FST. npath: The number of random paths to generate. seed: An optional seed value for random path generation; if zero, the current time and process ID is used. select: A string matching a known random arc selection type; one of: "uniform", "log_prob", "fast_log_prob". max_length: The maximum length of each random path. weighted: Should the output be weighted by path count? remove_total_weight: Should the total weight be removed (ignored when weighted is False)?

Returns: An FST containing one or more random paths. =

See also: [RandGen algorithm](http://www.openfst.org/twiki/bin/view/FST/RandGenDoc).

===none reverse(ifst, require_superinitial=True)

Constructively reverses an FST's transduction.

This operation reverses an FST. If A transduces string x to y with weight a, then the reverse of A transduces the reverse of x to the reverse of y with weight a.Reverse(). (Typically, a = a.Reverse() and Arc = RevArc, e.g., TropicalWeight and LogWeight.) In general, e.g., when the weights only form a left or right semiring, the output arc type must match the input arc type.

Args: ifst: The input FST. require_superinitial: Should a superinitial state be created?

Returns: A reversed FST. =

See also: [Reverse algorithm](http://www.openfst.org/twiki/bin/view/FST/ReverseDoc).

===none rmepsilon(ifst, connect=True, delta=0.0009765625, nstate=-1, queue_type="auto", reverse=False, weight=None)

Constructively removes epsilon transitions from an FST.

This operation removes epsilon transitions (those where both input and output labels are epsilon) from an FST.

Args: ifst: The input FST. connect: Should output be trimmed? delta: Comparison/quantization delta. nstate: State number threshold. queue_type: A string matching a known queue type; one of: "auto", "fifo", "lifo", "shortest", "state", "top". reverse: Should epsilon transitions be removed in reverse order? weight: A string indicating the desired weight threshold; paths with weights below this threshold will be pruned.

Returns: An equivalent FST with no epsilon transitions. =

See also: [RmEpsilon algorithm](http://www.openfst.org/twiki/bin/view/FST/RmEpsilonDoc).

===none shortestdistance(ifst, delta=0.0009765625, nstate=-1, queue_type="auto", reverse=False)

Compute the shortest distance from the initial or final state.

This operation computes the shortest distance from the initial state (when reverse is False) or from every state to the final state (when reverse is True). The shortest distance from p to q is the otimes-sum of the weights of all the paths between p and q. The weights must be right (if reverse is False) or left (if reverse is True) distributive, and k-closed (i.e., 1 otimes x otimes x^2 otimes ... otimes x^{k + 1} = 1 otimes x otimes x^2 otimes ... otimes x^k; e.g., TropicalWeight).

Args: ifst: The input FST. delta: Comparison/quantization delta. nstate: State number threshold (this is ignored if reverse is True). queue_type: A string matching a known queue type; one of: "auto", "fifo", "lifo", "shortest", "state", "top" (this is ignored if reverse is True). reverse: Should the reverse distance (from each state to the final state) be computed?

Returns: A list of Weight objects representing the shortest distance for each state. =

See also: [ShortestDistance algorithm](http://www.openfst.org/twiki/bin/view/FST/ShortestDistanceDoc).

===none shortestpath(ifst, delta=0.0009765625, nshortest=1, nstate=-1, queue_type="auto", unique=False, weight=None)

Construct an FST containing the shortest path(s) in the input FST.

This operation produces an FST containing the n-shortest paths in the input FST. The n-shortest paths are the n-lowest weight paths w.r.t. the natural semiring order. The single path that can be read from the ith of at most n transitions leaving the initial state of the resulting FST is the ith shortest path. The weights need to be right distributive and have the path property. They also need to be left distributive as well for n-shortest with n > 1 (e.g., TropicalWeight).

Args: ifst: The input FST. delta: Comparison/quantization delta. nshortest: The number of paths to return. nstate: State number threshold. queue_type: A string matching a known queue type; one of: "auto", "fifo", "lifo", "shortest", "state", "top". unique: Should the resulting FST only contain distinct paths? (Requires the input FST to be an acceptor; epsilons are treated as if they are regular symbols.) weight: A Weight or weight string indicating the desired weight threshold below which paths are pruned; if omitted, no paths are pruned.

Returns: An FST containing the n-shortest paths. =

See also: [ShortestPath algorithm](http://www.openfst.org/twiki/bin/view/FST/ShortestPathDoc).

===none state_map(ifst, map_type)

Constructively applies a transform to all states.

This operation transforms each state according to the requested map type. Note that currently, only one state-mapping operation is supported.

Args: ifst: The input FST. map_type: A string matching a known mapping operation; one of: "arc_sum" (sum weights of identically-labeled multi-arcs), "arc_unique" (deletes non-unique identically-labeled multi-arcs).

Returns: An FST with states remapped.

Raises: FstArgError: Unknown map type. =

See also: [StateMap algorithm](http://www.openfst.org/twiki/bin/view/FST/StateMapDoc).

===none synchronize(ifst)

Constructively synchronizes an FST.

This operation synchronizes a transducer. The result will be an equivalent FST that has the property that during the traversal of a path, the delay is either zero or strictly increasing, where the delay is the difference between the number of non-epsilon output labels and input labels along the path. For the algorithm to terminate, the input transducer must have bounded delay, i.e., the delay of every cycle must be zero.

Args: ifst: The input FST.

Returns: An equivalent synchronized FST. =

See also: [Synchronize algorithm](http://www.openfst.org/twiki/bin/view/FST/SynchronizeDoc).

The following destructive methods of Fst also can be called as constructive functions; like the above, these take one or more Fst arguments and return a new Fst.

  • arcsort
  • closure
  • concat
  • connect
  • decode and encode
  • invert
  • minimize
  • optimize
  • project
  • relabel_pairs
  • relabel_tables
  • reweight
  • topsort

Fst binary operators

The following FST functions have binary operator aliases.

  • compose: *
  • concat: +
  • union: |

Weight arithmetic functions {#arithmetic}

Weights of the same semiring also support the following arithmetic functions. which return a new Weight.

===none plus(lhs, rhs)

Computes the sum of two Weights in the same semiring.

This function computes lhs oplus rhs, raising an exception if lhs and rhs are not in the same semiring.

Args:

lhs
left-hand side Weight.
rhs
right-hand side Weight.

Returns: A Weight object.

Raises: FstArgError: Weight type not found (or not in same semiring). FstBadWeightError: Invalid weight. =

===none times(lhs, rhs)

Computes the product of two Weights in the same semiring.

This function computes lhs otimes rhs, raising an exception if lhs and rhs are not in the same semiring.

Args:

lhs
left-hand side Weight.
rhs
right-hand side Weight.

Returns: A Weight object.

Raises: FstArgError: Weight type not found (or not in same semiring). FstBadWeightError: Invalid weight. =

===none power(lhs, rhs)

Computes the iterated product of a weight.

Args:

w
The weight.
n
The power.

Returns: A Weight object.

Raises: FstArgError: Weight type not found (or not in same semiring). FstBadWeightError: Invalid weight. =

===none divide(lhs, rhs)

Computes the quotient of two Weights in the same semiring.

This function computes lhs oslash rhs, raising an exception if lhs and rhs are not in the same semiring. As there is no way to specify whether to use left vs. right division, this assumes a commutative semiring in which these are equivalent operations.

Args:

lhs
left-hand side Weight.
rhs
right-hand side Weight.

Returns: A Weight object.

Raises: FstArgError: Weight type not found (or not in same semiring). FstBadWeightError: Invalid weight. =

Exceptions

The following exceptions (all errors) are raised by Pynini:

  • FstError: abstract base class for all exceptions raised by Pynini.
  • FstArgError: signals an error parsing input arguments (often in matching string arguments to underlying enum values).
  • FstBadWeightError: signals an error in constructing a weight.
  • FstDeletedConstructorError: signals an attempt to use a constructor which is syntactically-required, but not implemented; should not be encountered by end users.
  • FstIndexError: signals an attempt to access an out-of-range state ID.
  • FstIOError: signals an error in reading or writing from disk.
* FstOpError: signals an error returned by an FST algorithm; usually, this is accompanied by a low-level logging statement.
  • FstStringCompilationError: signals an error compiling a string into an FST.
  • FstSymbolTableMergeError: signals that a symbol table relabeling is required but disabled (using the --fst_relabel_symbol_conflicts command-line flag).

-->

Advanced topics

Getting the hang of iterators

The Fst class provides four types of iterators (plus SymbolTableIterator), which are usually constructed by invoking the appropriate method of Fst. Some examples are given below.

There are two ways to use the various iterators. First, they can be used similarly to Python built-in iterators over file handles or containers by placing them in a =for=-loop, but they also expose a lower-level interface used as follows:

while not iterator.done():
  # Do something with =iterator= in its current state.
  iterator.next()  # Advances the iterator.

Attempting to use an iterator which is "done" may result in a fatal error. To return an iterator to the initial state, one can use the reset method.

The method states creates a StateIterator. When iterated over, it yields state ID integers in strictly increasing order. For example, the following function computes the number of arcs and states of an FST (a simple measure of FST size).

def size(f):
  """Computes the number of arcs and states in the FST."""
  return sum(1 + f.num_arcs(state) for state in f.states())

The method arcs returns an ArcIterator. When iterated over, it yields all arcs leaving some state of the FST. For example, the following function computes the in-degree (number of incoming states) for a given state ID.

def indegree(f, state):
  """Computes in-degree of some state in an FST."""
  n = 0
  for s in f.states():
    for arc in f.arcs(s):
      n += (arc.nextstate == state)
  return n

The method mutable_arcs is similar to arcs, except that the MutableArcIterator it returns has a special method set_value which replaces the arc at the iterator's current position. For example, the following function changes input labels matching a certain label to epsilon (cf. the relabel_pairs method).

def map_ilabel_to_epsilon(f, ilabel):
  """Maps any input label matching =ilabel= to epsilon (0)."""
  for state in f.states():
    aiter = f.mutable_arcs(state)
    while not aiter.done():
      arc = aiter.value()
      if arc.ilabel == ilabel:
        arc.ilabel = 0
        aiter.set_value(arc)
      aiter.next()
  return f

The method paths returns a StringPaths iterator. This provides several ways to access the paths of an acyclic FST. For each path, one can access the associated input or output labels (lists of integers) via the ilabels and olabels methods, the input and output strings via the istring and ostring methods, and the path weight via the weight method. One can also access all input strings, all output strings, and all path weights using the iter_istrings, iter_ostrings, and iter_weights methods, which return generators; after invoking any one of these methods, the caller must reset the iterator to reuse it.

See in-module documentation strings for more information on iterators.

FST properties

Each Fst has an associated set of stored properties which assert facts about the FST. These can be queried by using the properties method. Some properties are binary (either true or false), whereas others are ternary (true, false, or unknown). The first argument to properties is a bitmask for the desired property/properties, and the second argument indicates whether unknown properties are to be computed.

Querying known properties is a constant time operation.

# Optimizes the FST if it is *not known* to be deterministic.
if f.properties(pynini.I_DETERMINISTIC, False) != pynini.I_DETERMINISTIC:
  f.optimize()

Computing unknown properties is linear in the size of the FST in the worst case.

# Optimizes the FST if it is not already deterministic and epsilon-free.
desired_props = pynini.I_DETERMINISTIC | pynini.NO_EPSILONS
if f.properties(desired_props, True) != desired_props:
  f.optimize()

An FST's properties can be set using the set_properties method.

For more information on FST properties, see here.

Creating FSTs manually

Fst (which happens to be the constructor for the Fst class) creates an FST with no states or arcs. Before this FST can be used, one must first add a state, set it as initial, and set it or some other state as final.

f = pynini.Fst()
assert not f.verify()  # OK.
state = f.add_state()
f.set_start(state)
f.set_final(state)
assert f.verify()      # OK.

The epsilon_machine function eliminates the need for this tedious book-keeping; it returns a one-state, no-arc machine.

A much more useful initialization is provided by epsilon_machine, which creates a one-state, no-arc acceptor which matches the empty string.

f = pynini.epsilon_machine()
assert f.num_states() == 0                                       # OK.
assert f.num_arcs(e.start()) == 0                                # OK.
assert f.final(e.start()) == pynini.Weight.One(f.weight_type())  # OK.

In general, attempting to refer to a state before it has been added to the FST will raise an FstIndexError.

f = pynini.epsilon_machine()
f.set_start(1)  # Not OK: Raises FstIndexError.

f = pynini.epsilon_machine()
f.set_final(1)  # Not OK: Raises FstIndexError.

f = pynini.epsilon_machine()
f.add_arc(1, Arc(65, 65, 0, f.start()))  # Not OK: Raises FstIndexError.

The one exception to this generalization is that one may add an arc to a destination state not yet added to the FST, though the FST will not verify until that state is added.

f = pynini.epsilon_machine()
f.add_arc(f.start(), pynini.Arc(65, 65, 0, 1))
assert not f.verify()  # OK, though arc has non-existent destination state ID.
state = f.add_state()  # OK, now that state exists.
f.set_final(state)
assert f.verify()      # OK.

Pushdown transducers and multi-pushdown transducers

Pushdown transducers (PDTs) and multi-pushdown transducers (MPDTs) are generalizations of (W)FSTs.

PDTs represent the class of weighted context-free grammars, and are often used to encode a set of phrase-structure grammar rules. A PDT can be thought of as an WFST in which the machine's memory is augmented by a stack. In Pynini a PDT is encoded as an WFST in which some transitions represent "push" or "pop" operations (represented as open and close parentheses) rather than input and/or output symbols. A path through the PDT is then valid if and only if it is a path through the WFST and the open and close parenthese balance along that path.

MPDTs are a further generalization of PDTs in which there is more than one stack. While these are computationally quite unconstrained, in speech and language applications they are primarily used for modeling full reduplication.

In Pynini, there is no one class representing a PDT or MPDT; rather they are represented as a pair of an Fst and a table of parentheses labels. The PdtParentheses object contains pairs of open/close parentheses labels, and the MPdtParentheses object also contains an integer representing the identity of the stack each parentheses pair is assigned to.

PDTs and MPDT support composition with an FST using the pdt_compose and mpdt_compose functions, respectively. These functions take Fst arguments, a parentheses table (PdtParentheses or MPdtParentheses, respectively), and a specification of whether the right or the left argument should be interpreted as a (M)PDT. Both functions return the FST component of an (M)PDT; the parentheses table can be reused.

(f, fparens) = ipdt
opdt = (pynini.pdt_compose(f, g, fparens), fparens)

The pdt_expand and mpdt_expand functions can be used to expand a PDT or MPDT into an FST (assuming they have a finite expansion). They take an Fst and parentheses table and return an Fst. The boolean keep_parentheses argument species whether or not expansion should keep parentheses arcs; by default, they are removed during expansion.

(f, fparens) = ipdt
g = pynini.pdt_expand(f, fparens)

The pdt_reverse and mpdt_reverse functions can be used to reverse a PDT or MPDT. Both take as arguments the two components of an (M)PDT. The pdt_reverse function return the FST component of a PDT, whereas the mpdt_function also returns a modified MPdtParentheses object.

(f, fparen) = impdt
(g, gparens) = pynini.mpdt_reverse(f, fparens)

Finally, PDTs (but not MPDTs) provide a shortest-path operation using pdt_shortestpath. Note that the result is an FST rather than a PDT.

(f, fparens) = ipdt
sp = pynini.pdt_shortestpath(f, fparens)

References

Gorman, Kyle. 2016. Pynini: A Python library for weighted finite-state grammar compilation. In Proceedings of the ACL Workshop on Statistical NLP and Weighted Automata, 75-80.

Gorman, Kyle, and Richard Sproat. 2016. How to get superior text processing in Python with Pynini. O'Reilly Ideas blog, accessed 11/16/16.

Grover, Dale L., Martin T. King, and Clifford A. Kushler. 1998. Reduced keyboard disambiguating computer. US Patent 5,818,437.

Koskenniemi, Kimmo. 1983. Two-level morphology: A general computational model for word-form recognition and production. Doctoral dissertation, University of Helsinki.

Ringen, Catherine O. and Orvokki Heinämäki. 1999. Variation in Finnish vowel harmony: An OT account. Natural Language and Linguistic Theory 17(2): 303-337.

Roark, Brian, and Richard Sproat. 2007. Computational approaches to syntax and morphology. Oxford: Oxford University Press.

META FILEATTACHMENT attachment="gumball.png" attr="h" comment="Gumball FST" date="1498773831" name="gumball.png" path="gumball.png" size="10299" user="KyleGorman" version="1"
META FILEATTACHMENT attachment="abcde.png" attr="h" comment="" date="1498774898" name="abcde.png" path="abcde.png" size="12107" user="KyleGorman" version="1"
META FILEATTACHMENT attachment="abcgdfe.png" attr="h" comment="" date="1498775087" name="abcgdfe.png" path="abcgdfe.png" size="14050" user="KyleGorman" version="1"
META FILEATTACHMENT attachment="time.png" attr="h" comment="Time grammar fragment" date="1498775174" name="time.png" path="time.png" size="18484" user="KyleGorman" version="1"
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback