Difference: MpdtExtension (1 vs. 9)

Revision 92015-10-29 - RichardSproat

Line: 1 to 1
 
META TOPICPARENT name="FstExtensions"

Multi-Pushdown Transducer Library (MPDTs)

Line: 6 to 6
 for multiple stacks. As with the PDT extension, an MPDT is encoded as an FST where some transitions are labeled with open or close parentheses, with each mated pair of parentheses being assigned to one of the stacks. To be interpreted as an MPDT, parentheses within a stack must balance on a path.

configure include
Changed:
<
<
--enable-mpdt <fst/extensions/mpdt/mpdtlib.h>
>
>
--enable-mpdt (coming soon) <fst/extensions/mpdt/mpdtlib.h>
  An MPDT is encoded as an FST where some transitions are labeled with open or close parentheses, with each mated pair of parentheses being assigned to one of the stacks. To be interpreted as an MPDT, parentheses within a stack must balance on a path. The subset of the transducer labels that correspond to parenthesis (open, closed) pairs is designated from the C++-library in a vector<pair<Label, Label>>, and the assignments are given as a correponding vector<typename Arc::Label>. From the command line the parentheses and assignments are passed from a file of lines of label pairs, with a third column showing the stack affiliation (in a 1-based enumeration) of each pair
Line: 21 to 21
 
  Compose(a_fst, b_mpdt, parens, assignments, &c_mpdt);    
  mpdtcompose -mpdt_parentheses=mpdt.parens a.mpdt b.fst >c.mpdt    
  mpdtcompose -mpdt_parentheses=mpdt.parens -left_mpdt=false a.fst b.mpdt >c.mpdt    
Changed:
<
<
Expand Expand(a_mpdt, parens, assignments, &b_fst); expands a (bounded-stack) MPDT as an FST Time, Space: O(e(V + E)) for the default 2-stack read-restrict configuration
>
>
Expand Expand(a_mpdt, parens, assignments, &b_fst); expands a (bounded-stack) MPDT as an FST Time, Space: O(eO(V + E)) for the default 2-stack read-restrict configuration
 
  mpdtexpand -mpdt_parentheses=mpdt.parens a.mpdt >b.fst    
Info mpdtinfo -mpdt_parentheses=mpdt.parens a.mpdt prints out information about an MPDT  
Reverse Reverse(a_mpdt, parens, &b_mpdt); reverses an MPDT; also reverses the stack assignments Time, Space: O(V + E)

Revision 82015-10-23 - RichardSproat

Line: 1 to 1
 
META TOPICPARENT name="FstExtensions"

Multi-Pushdown Transducer Library (MPDTs)

Line: 12 to 12
 vector<typename Arc::Label>. From the command line the parentheses and assignments are passed from a file of lines of label pairs, with a third column showing the stack affiliation (in a 1-based enumeration) of each pair (using the flag --mpdt_parentheses).
Changed:
<
<
The implementation supports multiple stacks, but the library only exposes two stacks. In addition there are three push-pop protocols, READ_RESTRICT, WRITE_RESTRICT, NO_RESTRICT. With READ_RESTRICT, one can only read (i.e. pop) from the first empty stack. So if the first stack is not empty, one can continue to write (push) to the second stack, but one cannot pop. With WRITE_RESTRICT, one can only write (i.e. push) to the first empty stack. With NO_RESTRICT, as the name implies, there are no restrictions on read-write operations. Note that a NO_RESTRICT machine is Turing-equivalent. The default configuration is READ_RESTRICT.
>
>
The implementation supports multiple stacks, but the library only exposes two stacks. In addition there are three push-pop disciplines, READ_RESTRICT, WRITE_RESTRICT, NO_RESTRICT. With READ_RESTRICT, one can only read (i.e. pop) from the first empty stack. So if the first stack is not empty, one can continue to write (push) to the second stack, but one cannot pop. With WRITE_RESTRICT, one can only write (i.e. push) to the first empty stack. With NO_RESTRICT, as the name implies, there are no restrictions on read-write operations. Note that a NO_RESTRICT machine is Turing-equivalent. The default configuration is READ_RESTRICT.
  The following operations, many which have FST analogues (but are distinguished in C++ by having a vector<pair<Label, Label>> parenthesis pair argument), are provided for MPDTs:

Revision 72015-10-22 - RichardSproat

Line: 1 to 1
 
META TOPICPARENT name="FstExtensions"
Changed:
<
<

Pushdown Transducer Library (PDTs)

>
>

Multi-Pushdown Transducer Library (MPDTs)

 
Changed:
<
<
This is a push-down transducer (PDT) extension of the OpenFst library. A PDT is encoded as an FST, where some transitions are labeled with open or close parentheses. To be interpreted as a PDT, the parentheses must balance on a path.
>
>
The multi-pushdown transducer (MPDT) extension extends the Pushdown Transducer Library extension to allow for multiple stacks. As with the PDT extension, an MPDT is encoded as an FST where some transitions are labeled with open or close parentheses, with each mated pair of parentheses being assigned to one of the stacks. To be interpreted as an MPDT, parentheses within a stack must balance on a path.
 
configure include
Changed:
<
<
--enable-pdt <fst/extensions/pdt/pdtlib.h>
>
>
--enable-mpdt <fst/extensions/mpdt/mpdtlib.h>
 
Changed:
<
<
A PDT is encoded as an FST where some transitions are labeled with open or close parentheses. To be interpreted as a PDT, the parentheses must balance on a path. The subset of the transducer labels that correspond to parenthesis (open, closed) pairs is designated from the C++-library in a vector<pair<Label, Label>> and from the command line in a file of lines of label pairs (typically passed with the flag --pdt_parentheses). See Cyril Allauzen and Michael Riley, "A Pushdown Transducer Extension for the OpenFst Library", Proceedings of the Seventeenth International Conference on Implementation and Application of Automata, (CIAA 2012)
>
>
An MPDT is encoded as an FST where some transitions are labeled with open or close parentheses, with each mated pair of parentheses being assigned to one of the stacks. To be interpreted as an MPDT, parentheses within a stack must balance on a path. The subset of the transducer labels that correspond to parenthesis (open, closed) pairs is designated from the C++-library in a vector<pair<Label, Label>>, and the assignments are given as a correponding vector<typename Arc::Label>. From the command line the parentheses and assignments are passed from a file of lines of label pairs, with a third column showing the stack affiliation (in a 1-based enumeration) of each pair (using the flag --mpdt_parentheses).
 
Changed:
<
<
The following operations, many which have FST analogues (but are distinguished in C++ by having a vector<pair<Label, Label>> parenthesis pair argument), are provided for PDTs:
>
>
The implementation supports multiple stacks, but the library only exposes two stacks. In addition there are three push-pop protocols, READ_RESTRICT, WRITE_RESTRICT, NO_RESTRICT. With READ_RESTRICT, one can only read (i.e. pop) from the first empty stack. So if the first stack is not empty, one can continue to write (push) to the second stack, but one cannot pop. With WRITE_RESTRICT, one can only write (i.e. push) to the first empty stack. With NO_RESTRICT, as the name implies, there are no restrictions on read-write operations. Note that a NO_RESTRICT machine is Turing-equivalent. The default configuration is READ_RESTRICT.

The following operations, many which have FST analogues (but are distinguished in C++ by having a vector<pair<Label, Label>> parenthesis pair argument), are provided for MPDTs:

 
Operation Usage Description Complexity
Changed:
<
<
Compose Compose(a_pdt, parens, b_fst, &c_pdt); compose a PDT and an FST with PDT result (Bar-Hillel) Same as FST composition
  Compose(a_fst, b_pdt, parens, &c_pdt);    
  pdtcompose -pdt_parentheses=pdt.parens a.pdt b.fst >c.pdt    
  pdtcompose -pdt_parentheses=pdt.parens -pdt_left_pdt=false a.fst b.pdt >c.pdt    
Expand Expand(a_pdt, parens, &b_fst); expands a (bounded-stack) PDT as an FST Time, Space: O(e(V + E))
  pdtexpand -pdt_parentheses=pdt.parens a.pdt >b.fst    
Info pdtinfo -pdt_parentheses=pdt.parens a.pdt prints out information about a PDT  
Replace Replace(fst_label_pairs, &b_pdt, root_label, &parens); Converts an RTN represented by FSTs and non-terminal labels into a PDT Time, Space: O(∑ (Vi + Ei))
  pdtreplace -pdt_parentheses=pdt.parens root.fst rootlabel [rule1.fst rule1.label ...] out.pdt    
Reverse Reverse(a_pdt, parens, &b_pdt); reverses a PDT Time, Space: O(V + E)
  pdtreverse -pdt_parentheses=pdt.parens a.pdt >b.pdt    
ShortestPath ShortestPath(a_pdt, parens, &b_fst); find the shortest path in a (bounded-stack) PDT (cf. Earley) Time: O((V + E)3), Space: O((V + E)2)
  pdtshortestpath -pdt_parentheses=pdt.parens a.pdt >b.fst    
>
>
Compose Compose(a_pdt, parens, b_fst, &c_pdt); compose an MPDT and an FST with MPDT result (Bar-Hillel) Same as FST composition
  Compose(a_fst, b_mpdt, parens, assignments, &c_mpdt);    
  mpdtcompose -mpdt_parentheses=mpdt.parens a.mpdt b.fst >c.mpdt    
  mpdtcompose -mpdt_parentheses=mpdt.parens -left_mpdt=false a.fst b.mpdt >c.mpdt    
Expand Expand(a_mpdt, parens, assignments, &b_fst); expands a (bounded-stack) MPDT as an FST Time, Space: O(e(V + E)) for the default 2-stack read-restrict configuration
  mpdtexpand -mpdt_parentheses=mpdt.parens a.mpdt >b.fst    
Info mpdtinfo -mpdt_parentheses=mpdt.parens a.mpdt prints out information about an MPDT  
Reverse Reverse(a_mpdt, parens, &b_mpdt); reverses an MPDT; also reverses the stack assignments Time, Space: O(V + E)
  mpdtreverse -mpdt_parentheses=mpdt.parens a.mpdt >b.mpdt    
  There are also delayed versions of these algorithms where possible. See the header files for additional information including options.
Changed:
<
<
Note with this FST-based representation of PDTs, many FST operations (e.g., Concat, Closure, Rmepsilon, Union) work equally well on PDTs as FSTs.
>
>
Note with this FST-based representation of MPDTs, many FST operations (e.g., Concat, Closure, Rmepsilon, Union) work equally well on MPDTs as FSTs.
 
Changed:
<
<
As an example of this representation and these algorithms, the transducer in textual format:
>
>
As an example of this representation and these algorithms, consider an MPDT that implements the ww copy language over the alphabet {1, 2}.
 
Changed:
<
<
fstcompile >pdt.fst <<EOF
>
>
fstcompile >mpdt.fst <<EOF
 0 1 1 1
Added:
>
>
0 2 2 2 0 3 0 0
 1 0 3 3
Changed:
<
<
0 2 0 0 2 3 2 2 3 2 4 4 2
>
>
2 0 4 4 3 4 5 5 3 5 6 6 3 6 0 0 4 3 7 7 5 3 8 8 6 7 0 1 6 8 0 2 6 7 6 9 9 8 6 10 10
 EOF

with parentheses:

Changed:
<
<
cat >pdt.parens <<EOF 3 4
>
>
cat >mpdt.parens <<EOF 3 5 1 4 6 1 7 9 2 8 10 2
 EOF
Changed:
<
<
accepts 1n2n.
>
>
and input:
 
Changed:
<
<
This can be shown with:

$ fstcompile >1122.fst <<EOF

>
>
fstcompile >input.fst <<EOF
 0 1 1 1
Changed:
<
<
1 2 1 1
>
>
1 2 2 2
 2 3 2 2 3 4 2 2
Changed:
<
<
4
>
>
4 5 2 2 5 6 1 1 6 7 1 1 7 8 2 2 8 9 2 2 9
 EOF
Changed:
<
<
# intersect the FST and PDT; the result is a PDT pdtcompose --pdt_parentheses=pdt.parens pdt.fst 1122.fst | # expand the (bounded-stack) PDT into an FST; this enforces the parenthesis matching pdtexpand --pdt_parentheses=pdt.parens | # remove epsilons (formerly the parentheses) fstrmepsilon | fstprint 0 1 1 1 1 2 1 1 2 3 2 2 3 4 2 2 4

Had the input string been 112, the result of the composition would be a non-empty PDT representing a path with unbalanced parentheses. The following expansion step would then result in an empty FST.

>
>
 
Changed:
<
<
The above recognition algorithm has the exponential complexity of conventional PDT parsing. An alternate approach with cubic complexity, which is a generaliztion of Earley's algorihtm, is:

# intersect the FST and PDT; the result is a PDT 
pdtcompose --pdt_parentheses=pdt.parens pdt.fst 1122.fst | 
# find the shortest balanced path 
pdtshorestpath --pdt_parentheses=pdt.parens | 
# remove epsilons (formerly the parentheses) 
fstrmepsilon | fstprint
0	1	1	1
1	2	1	1
2	3	2	2
3	4	2	2
4
>
>
Then the following pipeline:
mpdtcompose --left_mpdt=false --mpdt_parentheses=parens input.fst redup.fst | 
mpdtexpand --mpdt_parentheses=parens | 
fstproject --project_output | 
fstrmepsilon | 
fsttopsort | 
fstprint --acceptor
will produce the following output:
 
Changed:
<
<
Finally, the following invocation returns all paths within $threshold of the best accepted path as an FST (cf. Prune). The algorithm has cubic complexity when the threshold is near 0.0 (dominated by a shortest distance computation) and becomes exponential as it approaches infinity (dominated by the expansion operation):


# intersect the FST and PDT; the result is a PDT 
pdtcompose --pdt_parentheses=pdt.parens pdt.fst in.fst | 
# expand the (bounded-stack) PDT into an FST keeping paths within a threshold of the best path
pdtexpand --weight=$threshold ---pdt_parentheses=pdt.parens >out.fst 
>
>
0   1   1
1   2   2
2   3   2
3   4   2
4   5   2
5   6   1
6   7   1
7   8   2
8   9   2
9   10   1          # Copy starts here
10   11   2
11   12   2
12   13   2
13   14   2
14   15   1
15   16   1
16   17   2
17   18   2
18
 
META FILEATTACHMENT attachment="ciaa12.pdf" attr="" comment="" date="1337639906" moveby="MichaelRiley" movedto="FST.PdtExtension.ciaa12.pdf" movedwhen="1435278644" movefrom="FST.FstExtensions.ciaa12.pdf" name="ciaa12.pdf" path="ciaa12.pdf" size="194246" user="CyrilAllauzen" version="2"

Revision 62015-06-26 - MichaelRiley

Line: 1 to 1
 
META TOPICPARENT name="FstExtensions"

Pushdown Transducer Library (PDTs)

Line: 111 to 111
 # expand the (bounded-stack) PDT into an FST keeping paths within a threshold of the best path pdtexpand --weight=$threshold ---pdt_parentheses=pdt.parens >out.fst \ No newline at end of file
Added:
>
>
META FILEATTACHMENT attachment="ciaa12.pdf" attr="" comment="" date="1337639906" moveby="MichaelRiley" movedto="FST.PdtExtension.ciaa12.pdf" movedwhen="1435278644" movefrom="FST.FstExtensions.ciaa12.pdf" name="ciaa12.pdf" path="ciaa12.pdf" size="194246" user="CyrilAllauzen" version="2"

Revision 52015-06-25 - KyleGorman

Line: 1 to 1
 
META TOPICPARENT name="FstExtensions"

Pushdown Transducer Library (PDTs)

Line: 6 to 6
 A PDT is encoded as an FST, where some transitions are labeled with open or close parentheses. To be interpreted as a PDT, the parentheses must balance on a path.
Changed:
<
<
See here for more information.
>
>
configure
<-- -->
Sorted ascending
include
--enable-pdt <fst/extensions/pdt/pdtlib.h>

A PDT is encoded as an FST where some transitions are labeled with open or close parentheses. To be interpreted as a PDT, the parentheses must balance on a path. The subset of the transducer labels that correspond to parenthesis (open, closed) pairs is designated from the C++-library in a vector<pair<Label, Label>> and from the command line in a file of lines of label pairs (typically passed with the flag --pdt_parentheses). See Cyril Allauzen and Michael Riley, "A Pushdown Transducer Extension for the OpenFst Library", Proceedings of the Seventeenth International Conference on Implementation and Application of Automata, (CIAA 2012)

The following operations, many which have FST analogues (but are distinguished in C++ by having a vector<pair<Label, Label>> parenthesis pair argument), are provided for PDTs:

Operation Usage Description Complexity
Compose Compose(a_pdt, parens, b_fst, &c_pdt); compose a PDT and an FST with PDT result (Bar-Hillel) Same as FST composition
  Compose(a_fst, b_pdt, parens, &c_pdt);    
  pdtcompose -pdt_parentheses=pdt.parens a.pdt b.fst >c.pdt    
  pdtcompose -pdt_parentheses=pdt.parens -pdt_left_pdt=false a.fst b.pdt >c.pdt    
Expand Expand(a_pdt, parens, &b_fst); expands a (bounded-stack) PDT as an FST Time, Space: O(e(V + E))
  pdtexpand -pdt_parentheses=pdt.parens a.pdt >b.fst    
Info pdtinfo -pdt_parentheses=pdt.parens a.pdt prints out information about a PDT  
Replace Replace(fst_label_pairs, &b_pdt, root_label, &parens); Converts an RTN represented by FSTs and non-terminal labels into a PDT Time, Space: O(∑ (Vi + Ei))
  pdtreplace -pdt_parentheses=pdt.parens root.fst rootlabel [rule1.fst rule1.label ...] out.pdt    
Reverse Reverse(a_pdt, parens, &b_pdt); reverses a PDT Time, Space: O(V + E)
  pdtreverse -pdt_parentheses=pdt.parens a.pdt >b.pdt    
ShortestPath ShortestPath(a_pdt, parens, &b_fst); find the shortest path in a (bounded-stack) PDT (cf. Earley) Time: O((V + E)3), Space: O((V + E)2)
  pdtshortestpath -pdt_parentheses=pdt.parens a.pdt >b.fst    

There are also delayed versions of these algorithms where possible. See the header files for additional information including options. Note with this FST-based representation of PDTs, many FST operations (e.g., Concat, Closure, Rmepsilon, Union) work equally well on PDTs as FSTs.

As an example of this representation and these algorithms, the transducer in textual format:

fstcompile >pdt.fst <<EOF
0 1 1 1
1 0 3 3
0 2 0 0 
2 3 2 2
3 2 4 4
2
EOF

with parentheses:

cat >pdt.parens <<EOF
3 4
EOF

accepts 1n2n.

This can be shown with:

$ fstcompile >1122.fst <<EOF
0 1 1 1
1 2 1 1
2 3 2 2
3 4 2 2
4
EOF

# intersect the FST and PDT; the result is a PDT 
pdtcompose --pdt_parentheses=pdt.parens pdt.fst 1122.fst | 
# expand the (bounded-stack) PDT into an FST; this enforces the parenthesis matching 
pdtexpand --pdt_parentheses=pdt.parens | 
# remove epsilons (formerly the parentheses) 
fstrmepsilon | fstprint
0	1	1	1
1	2	1	1
2	3	2	2
3	4	2	2
4

Had the input string been 112, the result of the composition would be a non-empty PDT representing a path with unbalanced parentheses. The following expansion step would then result in an empty FST.

The above recognition algorithm has the exponential complexity of conventional PDT parsing. An alternate approach with cubic complexity, which is a generaliztion of Earley's algorihtm, is:


# intersect the FST and PDT; the result is a PDT 
pdtcompose --pdt_parentheses=pdt.parens pdt.fst 1122.fst | 
# find the shortest balanced path 
pdtshorestpath --pdt_parentheses=pdt.parens | 
# remove epsilons (formerly the parentheses) 
fstrmepsilon | fstprint
0	1	1	1
1	2	1	1
2	3	2	2
3	4	2	2
4

Finally, the following invocation returns all paths within $threshold of the best accepted path as an FST (cf. Prune). The algorithm has cubic complexity when the threshold is near 0.0 (dominated by a shortest distance computation) and becomes exponential as it approaches infinity (dominated by the expansion operation):


# intersect the FST and PDT; the result is a PDT 
pdtcompose --pdt_parentheses=pdt.parens pdt.fst in.fst | 
# expand the (bounded-stack) PDT into an FST keeping paths within a threshold of the best path
pdtexpand --weight=$threshold ---pdt_parentheses=pdt.parens >out.fst 
 \ No newline at end of file

Revision 42012-05-10 - MichaelRiley

Line: 1 to 1
 
META TOPICPARENT name="FstExtensions"

Pushdown Transducer Library (PDTs)

Changed:
<
<
This is an experimental push-down transducer (PDT) library. A PDT is encoded as an FST, where some transitions are labeled with open or close
>
>
This is a push-down transducer (PDT) extension of the OpenFst library. A PDT is encoded as an FST, where some transitions are labeled with open or close
 parentheses. To be interpreted as a PDT, the parentheses must balance on a path.
Changed:
<
<
This is an extension library of the OpenFst library. See here for more information.
>
>
See here for more information.

Revision 32012-03-02 - MichaelRiley

Line: 1 to 1
 
META TOPICPARENT name="FstExtensions"
Changed:
<
<

Pushdown Transducers (PDTs)

>
>

Pushdown Transducer Library (PDTs)

 
Changed:
<
<
Work in progress, under construction This is an experimental push-down transducer (PDT) library. A PDT is
>
>
This is an experimental push-down transducer (PDT) library. A PDT is
 encoded as an FST, where some transitions are labeled with open or close parentheses. To be interpreted as a PDT, the parentheses must balance on a path.
Changed:
<
<
-- CyrilAllauzen - 07 Jun 2011
>
>
This is an extension library of the OpenFst library. See here for more information.

Revision 22011-06-08 - CyrilAllauzen

Line: 1 to 1
 
META TOPICPARENT name="FstExtensions"

Pushdown Transducers (PDTs)

Added:
>
>
Work in progress, under construction This is an experimental push-down transducer (PDT) library. A PDT is encoded as an FST, where some transitions are labeled with open or close parentheses. To be interpreted as a PDT, the parentheses must balance on a path.
 

-- CyrilAllauzen - 07 Jun 2011 \ No newline at end of file

Revision 12011-06-07 - CyrilAllauzen

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="FstExtensions"

Pushdown Transducers (PDTs)

-- CyrilAllauzen - 07 Jun 2011

 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback