TWiki
>
FST Web
>
FstExtensions
>
PdtExtension
(2015-10-29,
RichardSproat
)
(raw view)
E
dit
A
ttach
---+ Pushdown Transducer Library (PDTs) This is a push-down transducer (PDT) extension of the [[http://www.openfst.org][OpenFst]] library. A PDT is encoded as an FST, where some transitions are labeled with open or close parentheses. To be interpreted as a PDT, the parentheses must balance on a path. | *configure* | *include * | | =--enable-pdt= | =<fst/extensions/pdt/pdtlib.h>= | A PDT is encoded as an FST where some transitions are labeled with open or close parentheses. To be interpreted as a PDT, the parentheses must balance on a path. The subset of the transducer labels that correspond to parenthesis (open, closed) pairs is designated from the C++-library in a =vector<pair<Label, Label>>= and from the command line in a file of lines of label pairs (typically passed with the flag =--pdt_parentheses=). See Cyril Allauzen and Michael Riley, [[%ATTACHURL%/ciaa12.pdf]["A Pushdown Transducer Extension for the OpenFst Library"]], _Proceedings of the Seventeenth International Conference on Implementation and Application of Automata_, (CIAA 2012) The following operations, many which have FST analogues (but are distinguished in C++ by having a =vector<pair<Label, Label>>= parenthesis pair argument), are provided for PDTs: | *Operation* | *Usage* | *Description* | *Complexity* | | Compose | Compose(a_pdt, parens, b_fst, &c_pdt); | compose a PDT and an FST with PDT result (Bar-Hillel) | Same as FST [[ComposeDoc][composition]] | | | Compose(a_fst, b_pdt, parens, &c_pdt); | | | | | pdtcompose -pdt_parentheses=pdt.parens a.pdt b.fst >c.pdt | | | | | pdtcompose -pdt_parentheses=pdt.parens -pdt_left_pdt=false a.fst b.pdt >c.pdt | | | | Expand | Expand(a_pdt, parens, &b_fst); | expands a (bounded-stack) PDT as an FST | Time, Space: _O(e<sup>O(V + E)</sup>)_ | | | pdtexpand -pdt_parentheses=pdt.parens a.pdt >b.fst | | | | Info | pdtinfo -pdt_parentheses=pdt.parens a.pdt | prints out information about a PDT | | | Replace | Replace(fst_label_pairs, &b_pdt, root_label, &parens); | Converts an RTN represented by FSTs and non-terminal labels into a PDT | Time, Space: _O(∑ (V<sub>i</sub> + E<sub>i</sub>))_ | | | pdtreplace -pdt_parentheses=pdt.parens root.fst rootlabel [rule1.fst rule1.label ...] out.pdt | | | | Reverse | Reverse(a_pdt, parens, &b_pdt); | reverses a PDT | Time, Space: _O(V + E)_ | | | pdtreverse -pdt_parentheses=pdt.parens a.pdt >b.pdt | | | | !ShortestPath | !ShortestPath(a_pdt, parens, &b_fst); | find the shortest path in a (bounded-stack) PDT (cf. Earley) | Time: _O((V + E)<sup>3</sup>)_, Space: _O((V + E)<sup>2</sup>)_ | | | pdtshortestpath -pdt_parentheses=pdt.parens a.pdt >b.fst | | | There are also delayed versions of these algorithms where possible. See the header files for additional information including options. Note with this FST-based representation of PDTs, many FST operations (e.g., =Concat=, =Closure=, =Rmepsilon=, =Union=) work equally well on PDTs as FSTs. As an example of this representation and these algorithms, the transducer in textual format: <verbatim> fstcompile >pdt.fst <<EOF 0 1 1 1 1 0 3 3 0 2 0 0 2 3 2 2 3 2 4 4 2 EOF </verbatim> with parentheses: <verbatim> cat >pdt.parens <<EOF 3 4 EOF </verbatim> accepts =1<sup>n</sup>2<sup>n</sup>=. This can be shown with: <pre> $ fstcompile >1122.fst <<EOF 0 1 1 1 1 2 1 1 2 3 2 2 3 4 2 2 4 EOF %RED% # intersect the FST and PDT; the result is a PDT %ENDCOLOR% pdtcompose --pdt_parentheses=pdt.parens pdt.fst 1122.fst | %RED% # expand the (bounded-stack) PDT into an FST; this enforces the parenthesis matching %ENDCOLOR% pdtexpand --pdt_parentheses=pdt.parens | %RED% # remove epsilons (formerly the parentheses) %ENDCOLOR% fstrmepsilon | fstprint 0 1 1 1 1 2 1 1 2 3 2 2 3 4 2 2 4 </pre> Had the input string been =112=, the result of the composition would be a non-empty PDT representing a path with unbalanced parentheses. The following expansion step would then result in an empty FST. The above recognition algorithm has the exponential complexity of conventional PDT parsing. An alternate approach with cubic complexity, which is a generaliztion of Earley's algorihtm, is: <pre> %RED% # intersect the FST and PDT; the result is a PDT %ENDCOLOR% pdtcompose --pdt_parentheses=pdt.parens pdt.fst 1122.fst | %RED% # find the shortest balanced path %ENDCOLOR% pdtshorestpath --pdt_parentheses=pdt.parens | %RED% # remove epsilons (formerly the parentheses) %ENDCOLOR% fstrmepsilon | fstprint 0 1 1 1 1 2 1 1 2 3 2 2 3 4 2 2 4 </pre> Finally, the following invocation returns all paths within =$threshold= of the best accepted path as an FST (cf. [[PruneDoc][Prune]]). The algorithm has cubic complexity when the threshold is near 0.0 (dominated by a shortest distance computation) and becomes exponential as it approaches infinity (dominated by the expansion operation): <pre> %RED% # intersect the FST and PDT; the result is a PDT %ENDCOLOR% pdtcompose --pdt_parentheses=pdt.parens pdt.fst in.fst | %RED% # expand the (bounded-stack) PDT into an FST keeping paths within a threshold of the best path%ENDCOLOR% pdtexpand --weight=$threshold ---pdt_parentheses=pdt.parens >out.fst %RED% </pre>
Attachments
Attachments
Topic attachments
I
Attachment
History
Action
Size
Date
Who
Comment
pdf
ciaa12.pdf
r2
r1
manage
189.7 K
2012-05-21 - 22:38
CyrilAllauzen
E
dit
|
A
ttach
|
Watch
|
P
rint version
|
H
istory
: r7
<
r6
<
r5
<
r4
<
r3
|
B
acklinks
|
V
iew topic
|
WYSIWYG
|
M
ore topic actions
Topic revision: r7 - 2015-10-29
-
RichardSproat
FST
Log In
or
Register
FST Web
Create New Topic
Index
Search
Changes
Notifications
Statistics
Preferences
Webs
Contrib
FST
Forum
GRM
Kernel
Main
Sandbox
TWiki
Main
Copyright © 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki?
Send feedback