Difference: GrmThraxForum (84 vs. 85)

Revision 85 - 2017-10-20 - PooriaAzimi


OpenGrm Thrax Forum


Mapping between input and output tokens

PooriaAzimi - 2017-10-20 - 18:58

Suppose I have a simple words_to_numbers.grm that, given a spelled-out number string, will return multiple possible interpretations for it:

<verbatim>
Input String: six twenty two

Output String: 622 <cost: 0.2>
Output String: 6 22 <cost: 0.4>
Output String: 620 2 <cost: 0.4>
</verbatim>

What I would like is to be able to map the output tokens to the input tokens. An example would be something like this:

<verbatim>
Output String: 622<"six twenty two"> <cost: 0.2>
Output String: 6<"six"> 22<"twenty two"> <cost: 0.4>
Output String: 620<"six twenty"> 2<"two"> <cost: 0.4>
</verbatim>

(or just provide the character positions of each new token, or anything else that could possibly help you do the mapping at a later stage)
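To make the request concrete, here is a minimal pure-Python sketch of the kind of mapping meant above. It does not use the Thrax or OpenFst APIs. It assumes, hypothetically, that the winning path is available as a list of (input label, output label) arc pairs at word level, that the empty string stands for epsilon, and that a token boundary is an arc that consumes no input and emits a single space. The names `EPSILON`, `BOUNDARY`, `align` and `render` are made up for illustration.

```python
from typing import List, Tuple

EPSILON = ""    # assumption: empty string marks an epsilon label
BOUNDARY = " "  # assumption: an output space separates output tokens

Arc = Tuple[str, str]  # (input label, output label) of one arc on the path


def align(path: List[Arc]) -> List[Tuple[str, str]]:
    """Group a path's arcs into (output token, input span) pairs."""
    spans: List[Tuple[str, str]] = []
    in_words: List[str] = []
    out_parts: List[str] = []
    for ilabel, olabel in path:
        if olabel == BOUNDARY:
            # Close the current token; boundary arcs are assumed to
            # consume no input, so nothing is lost here.
            spans.append(("".join(out_parts), " ".join(in_words)))
            in_words, out_parts = [], []
            continue
        if ilabel != EPSILON:
            in_words.append(ilabel)
        if olabel != EPSILON:
            out_parts.append(olabel)
    if in_words or out_parts:
        spans.append(("".join(out_parts), " ".join(in_words)))
    return spans


def render(spans: List[Tuple[str, str]]) -> str:
    """Format the pairs in the notation used in the example above."""
    return " ".join(f'{out}<"{src}">' for out, src in spans)


# Hypothetical arc sequences for the three readings of "six twenty two".
path_622 = [("six", "6"), ("twenty", "2"), ("two", "2")]
path_6_22 = [("six", "6"), (EPSILON, BOUNDARY), ("twenty", "2"), ("two", "2")]
path_620_2 = [("six", "6"), ("twenty", "20"), (EPSILON, BOUNDARY), ("two", "2")]

for p in (path_622, path_6_22, path_620_2):
    print(render(align(p)))
```

The point of the sketch is only that the alignment is trivial to read off once the arc-level path is kept; the open question is how to get that path out of the rewrite.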

You can't do this post-rewrite; from the output alone it's impossible to know whether the input was grouped as "(six) (twenty two)", which transduces to "6 22", or as "(six twenty) (two)".

I don't believe this is possible with `thraxrewrite-tester`, or by just adding the markup in the grammar rules. I've also looked at both the Thrax and OpenFst code to see what it would take to carry the input states forward through rewrites, but haven't had any success yet.

The grammars I'm working on are much more complicated than this example (400k nodes and millions of arcs for a very sophisticated NLU module), and being able to provide some sort of mapping between input and output is essential for integrating Thrax into the rest of the application.

Thank you very much for this incredibly useful tool, and any help or hints are greatly appreciated!

PooriaAzimi - 2017-10-20 - 19:02

^ the formatting seems to be off; here's a slightly better formatted version of the post: https://gist.github.com/anonymous/522156df4ce78f2592805c8f417c5687


Using Thrax compiled grammars with Pynini

ButteredGroove - 2017-07-06 - 18:29

Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.