Line: 1 to 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

## Pushdown Transducer Library (PDTs) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Line: 6 to 6 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

A PDT is encoded as an FST, where some transitions are labeled with open or close parentheses. To be interpreted as a PDT, the parentheses must balance on a path. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Changed: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

< < | See here for more information. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

> > |
A PDT is encoded as an FST where some transitions are labeled with open or close parentheses. To be interpreted as a PDT, the parentheses
must balance on a path. The subset of the transducer labels that correspond to parenthesis (open, closed) pairs is designated from the C++-library in a
The following operations, many which have FST analogues (but are distinguished in C++ by having a
There are also delayed versions of these algorithms where possible. See the header files for additional information including options.
Note with this FST-based representation of PDTs, many FST operations (e.g., As an example of this representation and these algorithms, the transducer in textual format:
fstcompile >pdt.fst <<EOF 0 1 1 1 1 0 3 3 0 2 0 0 2 3 2 2 3 2 4 4 2 EOF with parentheses:
cat >pdt.parens <<EOF 3 4 EOF
accepts This can be shown with:
$ fstcompile >1122.fst <<EOF 0 1 1 1 1 2 1 1 2 3 2 2 3 4 2 2 4 EOF # intersect the FST and PDT; the result is a PDT pdtcompose --pdt_parentheses=pdt.parens pdt.fst 1122.fst | # expand the (bounded-stack) PDT into an FST; this enforces the parenthesis matching pdtexpand --pdt_parentheses=pdt.parens | # remove epsilons (formerly the parentheses) fstrmepsilon | fstprint 0 1 1 1 1 2 1 1 2 3 2 2 3 4 2 2 4
Had the input string been The above recognition algorithm has the exponential complexity of conventional PDT parsing. An alternate approach with cubic complexity, which is a generaliztion of Earley's algorihtm, is: # intersect the FST and PDT; the result is a PDT pdtcompose --pdt_parentheses=pdt.parens pdt.fst 1122.fst | # find the shortest balanced path pdtshorestpath --pdt_parentheses=pdt.parens | # remove epsilons (formerly the parentheses) fstrmepsilon | fstprint 0 1 1 1 1 2 1 1 2 3 2 2 3 4 2 2 4
Finally, the following invocation returns all paths within
# intersect the FST and PDT; the result is a PDT pdtcompose --pdt_parentheses=pdt.parens pdt.fst in.fst | # expand the (bounded-stack) PDT into an FST keeping paths within a threshold of the best path pdtexpand --weight=$threshold ---pdt_parentheses=pdt.parens >out.fst | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

\ No newline at end of file |

View topic | History: r7 < r6 < r5 < r4 | More topic actions...

Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding TWiki? Send feedback

Ideas, requests, problems regarding TWiki? Send feedback