OpenFst Forum 2010 Archive

On the fly composition

GavrilovichSimonov - 29 Dec 2010 - 09:39

Hi, I use OpenFst to build an OCR system In the workflow, I compose two FSTs (one representing the optical model, another the observations) and get the shortest path (best recognized word).

The whole process is quite slow (10s) and I notice that the composition took about 5s.

I try to use on the fly composition. But then the composition is very fast, but the shortest path become even slower : 19.63s !! I use the default options.

Did I miss something ? How can I optimize my workflow ?

Thanks, Sergei

KonstantinSelivanov - 25 Jan 2011 - 07:29

I think you have to use 'NaturalShortestFirstQueue' queue discipline. Otherwise your automaton would be expanded.

You can try something like this:

// suppose you have StdVectorFst fst_a, fst_b

StdVectorFst fst_c; // to store result vector<StdArc::Weight> distance; int nshortest = 1;

NaturalShortestFirstQueue<StdArc::StateId, StdArc::Weight> state_queue(distance);

ShortestPathOptions<StdArc, NaturalShortestFirstQueue<StdArc::StateId, StdArc::Weight>, AnyArcFilter > opts(&state_queue, AnyArcFilter(), nshortest);

ComposeFst cfst(fst_a, fst_b); opts.first_path = true; ShortestPath(cfst, &fst_c, &distance, opts);

Log In

About FST recognition and transduction algorithm

JoachimChau - 29 Dec 2010 - 04:10

Hi,
I need to do some text normalization work, eg, transforms "a 10% rise" to "a ten precent rise", I have two problems:
1) Mark the start and end: Given the sentence (eg. "a 10% rise") and a FST (eg. FST recognizes percentage), how can I get the sub-sentence (mark the start and end in the original sentence)the FST accepts?
2) transduction algorithm: Given an input string, provide the output string(s) as defined by the regular relation provided by an FST.

Thanks,
Joachim

Log In

LoG composition eps normalization runs out of memory

JagadeeshB - 29 Dec 2010 - 02:43

Hi, In the SLT 2010 slides, it is mentioned (slide #49) that standard composition for the Broadcast News task requires a RAM of 5.3GB. I am trying a simple det(LoG) experiment with a 2-gram language model (ngram 1=36345, ngram 2=730914) and a dictionary L with 25K words.

The grammar FST has : # of states 23558 # of arcs 600034

The lexicon FST has : # of states 183757 # of arcs 210297

I am doing fstarcsort and fstdeterminize on G and only fstclosure on L.

When I do fstcompose L.fst G.fst |fstepsnormalize|fstdeterminize, the fstepsnormalize step eatsup 16GB of memory and very quickly and it just fails after sometime. When I reduce the dictionary size by about half, everything runs successfully. Is there a reason why I am using up so much memory compared what is given in the tutorial for what seems to be bigger task ? Please point me to anything I might be doing wrong here.

Thanks, Jagadeesh

PaulDixon - 09 Jan 2011 - 02:52

Did you label the backoff arcs in G with epsilons or an axillary symbol? If you used epsilons this could cause epsnormalize to blow up.

Did you try omitting the epsilon normalization step?

Also you will probably find the composition of L and G is faster if you sort the output of L instead of (or as well as) the input of G.

JosefRNovak - 18 Feb 2011 - 06:45

As Paul mentions there is probably no need to perform epsilon normalization at any point in your build process. Also, assuming you are using a standard ARPA→WFST transformation for your G, you should not need to determinize it prior to composition with your lexicon.

RaymondNg - 14 Jun 2012 - 09:08

I have the same problem. I skipped eps normalization. After composelg, minimization hangs here for days and giving no output. Can I ask if there are any tricks I have missed out? Or any implementation traps (combination of software, data pruning, etc.) that I have to take care of? Thank you very much

HuanliangWang - 16 Jun 2012 - 06:28

Hi, RaymondNg I also have the same problem. I can give two balance mehods. 1. You can convert LG.fst into tropocal semiring, then run fstminimize, then convert it into log semiring. 2. You check negative weights in LG.fst, then modify them.

Log In

N-gram model computation

KonstantinSelivanov - 19 Dec 2010 - 10:17

Hi,

is it possible to compute fst n-gram representation using openfst (or openkernel)?

It's not so hard to compute n-gram model using openfst without smoothing. But I need a smoothing technique so it's a problem for me.

Thanks.

JosefRNovak - 18 Feb 2011 - 02:26

Why don't you just use your favorite LM training module and then convert the result to a WFST?
Log In

Thread safety

MarkusD - 01 Dec 2010 - 23:47

If you want to run OpenFst functions in parallel using multiple threads, you should make the following small code change in fst/lock.h to make it thread-safe and prevent crashes:

  //int Incr() const { return ++count_; }                                                                                                  
  //int Decr() const {  return --count_; }                                                                                                 
  int Incr() const { return __sync_add_and_fetch(&count_, 1); }
  int Decr() const { return __sync_sub_and_fetch(&count_, 1); }

These functions are available in GCC 4.1 or higher. Otherwise use mutex locks there.

Just wanted to share this info, in case other people needs threads too. Mike tells me something like that is used internally at Google but it's not included in the tarball for better code portability.

JosefNovak - 14 Jun 2011 - 18:19

this was a big help, thanks!
Log In

Shell command: Print all words / create FST from word list

GeorgJaehnig - 01 Dec 2010 - 14:06

Hi, is there a shell command that prints all words recognized by an FST?

And is there an easy way on the shell for the other way around: To create an FST from a list of words?

KonstantinSelivanov - 19 Dec 2010 - 10:33

I don't know any shell commands for that. But you can use depth-first search to collect a labels along the paths.

Something like the following (for acyclic only!): ... vector traverse(StdVectorFst& fst, int state, string str=string()) { vector strs; if (fst.Final(state) = StdArc::Weight::Zero()) strs.push_back(str); for (ArcIterator > aiter(fst, state); aiter.Done(); aiter.Next()) { const StdArc &arc = aiter.Value(); vector r = traverse(fst, arc.nextstate, str + st->Find(arc.olabel)); vector::iterator i; for(i=r.begin(); i!=r.end(); i++) strs.push_back(*i); } return strs; } ...

And the second question. I don't know easy way either. But you can construct it "by hand".

GeorgJaehnig - 28 Apr 2011 - 09:48

It seems problem number one is solved: someone started openfst-tools: http://code.google.com/p/openfst-tools/ containing the command "fstprintpaths"

JosefRNovak - 12 May 2011 - 02:48

I made some modifications to this printpaths class a while back and included it in another small OS project. It returns a vector of the paths rather than just printing them out directly, and makes it a bit easier to filter unwanted tokens from the output. In case you're still looking around:

http://code.google.com/p/phonetisaurus/source/browse/

see FstPathFinder.cpp and FstPathFinder.hpp .

Log In

Edit Distance- compose 3 fst

BlueSky - 23 Nov 2010 - 12:55

Please I need support, I am trying to compose 3 FSTs (fst-string1, edit-distance fst, fst-string2). Similar in the paper"Linear Space Computation of the Edit Distance between a String and a Finite Automaton"by Allauzen, Mohri. I am getting many arcs with 0:0/1 in the composed FST which should not appear. Cheers, Memo

CyrilAllauzen - 24 Nov 2010 - 11:01

Could you call fstinfo your_fst | grep epsilons on each of your 3 Fsts and post the output below (between verbatim tags preferably)?

Cyril

BlueSky - 25 Nov 2010 - 07:15

I appreciate your support, Best Regards

Begin fst1 fstinfo fst-in.fst | grep epsilons # of input/output epsilons 0 # of input epsilons 0 # of output epsilons 0 input/output epsilons n input epsilons n output epsilons n End fst1

Begin fst2 fstinfo ed-loop.fst | grep epsilons # of input/output epsilons 0 # of input epsilons 3 # of output epsilons 3 input/output epsilons n input epsilons y output epsilons y End fst2

Begin fst3 fstinfo fst-corr.fst | grep epsilons # of input/output epsilons 0 # of input epsilons 0 # of output epsilons 0 input/output epsilons n input epsilons n output epsilons n End fst3

Begin fst-composition-3fsts fstinfo fst_ed_comp2.fst | grep epsilons # of input/output epsilons 8 # of input epsilons 8 # of output epsilons 8 input/output epsilons y input epsilons y output epsilons y End fst-composition-3fsts

CyrilAllauzen - 29 Nov 2010 - 13:48

Everything seems fine. Could you give me the exact sequence of operations used to generate fst_ed_comp2.fst ?

BlueSky - 30 Nov 2010 - 10:10

Thanks a lot! step 1) compose(fst1,fst2) ArcSortOutput(fst1) ArcSortInput(fst2) Compose(fst1,fst2,fst_ed_comp1) ProjectOutput(fst_ed_comp1) return fst_ed_comp1

step 2)compose(fst_ed_comp1,fst3) repeat the same sequence of operations return fst_ed_comp2

CyrilAllauzen - 30 Nov 2010 - 10:56

I think this explains everything. You project the results of composition on their output. This turns transitions of the form a:epsilon corresponding to deletions into input/output epsilon transitions.
Log In

When is it "safe" to call Synchronize() ?

KennethRBeesley - 19 Nov 2010 - 17:11

Can you test an Fst to determine if it's safe to call Synchronize() on it? E.g. can you always call Synchronize() safely on an acylic Fst?

CyrilAllauzen - 22 Nov 2010 - 13:17

You can always synchronize an acyclic an acyclic transducer.

As mentionned in FST.SynchronizeDoc, the condition for Synchronize to terminate is that the transducer has bounded delay, which is always the case for acyclic (the delay being at most the number of states in that case).

In the cyclic case, the bounded delay condition is equivalent to the condition that the delay of every cycle has to be zero, i.e. for every cycle c, the length of the input of c is equal to the length of the output of c.

BlueSky - 08 Feb 2011 - 05:06

Hi All,

I would like to ask about 2 things. First does anyone know a method in openfst which prints the fst with the characters not ascii ?

I have a problem with the minimization function, i created an fst with 60,000 words. The fst is created and determinized, but then the minimize function does not finish??

I appreciate the help :), Cheers Meme

Log In

Unable to minimize already determinized transducer

DanielBolanos - 16 Nov 2010 - 12:20

Hello,

I determinized successfully my transducer (HC o L o G) by first transforming it to an acceptor, and then tried to minimize it, however the minimization process never ends (I stop it after about an hour, note that the determinization of this transducer takes about a second). The transducer is not that big, about 50k states and 95k transitions. It is a weighted and cyclic acceptor, does anybody know what is going on?

thanks so much for your help

CyrilAllauzen - 17 Nov 2010 - 11:39

Hi,

Which version of the library are you using? There was a hashing bug in older versions that could lead to some inefficiency in Minimize.
Which semiring is the automaton over? Does fstshortestdistance terminate on this machine?

Best, Cyril

DanielBolanos - 17 Nov 2010 - 17:26

Hi Cyril,

I'm using version: 1.2.4, should I try the latest one? The automaton is over the log semiring (I specify --arc_type=log on fstcompile, which I believe is the way to do it) No, fstshortestdistance does not finish, and fstpush does not finish either.

thanks so much for your help Cyril, you are providing a great service

Daniel

CyrilAllauzen - 17 Nov 2010 - 19:39

Hi Daniel,

The issue is that the shortest distance does not converge in the log semiring when the automaton is not stochastic. You need to convert you machine to the tropical semiring, apply minimization and then convert back to the log.

Cyril

DanielBolanos - 18 Nov 2010 - 12:55

Thanks for your help Cyril, I moved to the tropical semiring (--arc_type=standard) and ran fstshortestdistance, but it still does not finish (if I run it on verbose mode it prints lines on the form: "INFO: AutoQueue: SCC #XXX: using trivial discipline", where XXX is each of the strongly connected components in the machine). fstdeterminize does not finish either.

thanks so much for your help again

Daniel

DanielBolanos - 18 Nov 2010 - 17:45

Cyril, I found the problem, it was a silly mistake, I had negative weights. I replaced them by positive weights and everything works fine.

thanks so much for your help, this is a great library

Daniel

Log In

Pre-Determinization Algorithm

HasimSak - 16 Nov 2010 - 12:09

Hi,

Does anybody know if there is an implementation (in openfst or other library) of the pre-determinization algorithm described in the paper "An efficient pre-determinization algorithm" by Allauzen and Mohri? I would really appreciate if someone has already implemented this algorithm and can share with us.

Please note that "fsmdeterminize -l" and "fstdeterminize --subsequential_label" don't implement this.

Thanks

CyrilAllauzen - 17 Nov 2010 - 11:20

I don't know of any available implementation of this algorithm.

Best, Cyril

Log In

FST Determinization Question

HasimSak - 14 Nov 2010 - 16:19

Hi,

Does fstdeterminize expect the input transducer to be epsilon normalized? When I give a specific functional transducer (not epsilon normalized), fstdeterminize terminates but the resulting transducer is not deterministic (due to two input epsilon arcs leaving the same state). When I determinize it twice, the result is then deterministic. An example functional fst is:

0 1 a x
0 2 a y
1 3 b eps
2 3 c z
2 3 d w
3 4 eps a
4 5 s eps
3
5

The resulting transducer after determinization is:

0       1       a       eps
1       2       b       x
1       3       c       y
1       4       d       y
2       5       eps   a
2
3       6       eps   z
3       7       eps   z
4       6       eps   w
4       7       eps   w
5       8       s       eps
6       8       s       a
7
8
Specifically in this case, I don't understand why the determinization algorithm creates an extra final state (7), while it can just make the state 6 final. Is this an expected behaviour?

Thanks

CyrilAllauzen - 17 Nov 2010 - 11:16

This is the expected behavior. Determinization of transducers is implemented as determinization of weighted automata over the string semiring. This lead to an automaton over the string semiring where 3 is final with final weight "z". When converting this automaton into a transducer leads to the creation of the state 7 and the transition from 3 to 7 with input epsilon and output z. You can use the --subsequential_label option to specify an input label other than epsilon for the transition form 3 to 7 leading to a machine that is indeed deterministic.

In your example, it would indeed be enough to make 6 final but in general this is difficult to determine since there might be other paths to 6 that would be affected by making 6 final.

If you do not want to use the --subsequential_label, an other solution to obtain a deterministic transducer would be to subsequently apply to the output of determinization: Encode (labels), Determinize, Decode.

Best, Cyril

HasimSak - 17 Nov 2010 - 15:23

I see, thank you very much Cyril.

Hasim

Log In

compiler warning comparison signed and unsigned

KeithPonting - 12 Nov 2010 - 06:01

Hi,
Thankyou for making OpenFst available - I have happily downloaded and built it, but using from my own code with gcc (4.3.2 or 4.1.2) and -Wall (which is our standard) I get many warnings about signed/unsigned comparisons. I know I can eliminate these with an extra -Wno-sign-compare switch, but I prefer code to compile as cleanly as possible (and need it to do so in order to persuade others in the group to adopt the library). Have you any plans to make OpenFst signed-ness clean?
To reproduce just add -Wall to compilation of fst_test
cd openfst-1.2.5/src/test
g++ -DHAVE_CONFIG_H   -I./../include -D_REENTRANT  -g -O2 -MT fst_test.o -MD -MP -MF .deps/fst_test.Tpo -c -o fst_test.o fst_test.cc -Wall

Log In

Determinize problem and Composition filters

HyungBaeJeon - 02 Nov 2010 - 20:45

Hi,

I want to do composition of L, G. But Determnize is not terminate.

My shell command is like below; fstcompose L.fst G.fst LG.com.fst fstepsnormalize LG.com.fst LG.epn.fst fstdeterminize LG.epn.fst LG.det.fst fstencode --encode_labels LG.det.fst codex LG.encode.fst fstminimize LG.encode.fst LG.min.fst fstencode --decode LG.min.fst codex LG.decode.fst fstarcsort LG.decode.fst LG.final.fst

I did arcsort input lable of G.fst, output label of L.fst. And G was generated from 1.3M entry trigrams.

Is there any mistake in my script?

When i use openfst v1.0, determinize is complete. And, If i use at&t fsmdeterminize, it is also working well. Do I need additional argument or optional parameter?

Below is my fst size.

L.fst # of states 43484 # of arcs 50264

G.fst # of states 540723 # of arcs 2421839

LG.com.fst (fstcompose L.fst G.fst LG.com.fst) # of states 8040320 # of arcs 11991899

LG.epn.fst (fstepsnormalize LG.com.fst LG.epn.fst) # of states 7502435 # of arcs 12709467

Another question is, Can I use lookahead composition filter using shell command?

Thanks,

Log In

Shortest Path with constraints

RolandSchwarz - 01 Nov 2010 - 11:53

Hiya,

I was wondering if there is a way to compute a shortest path closest to a certain weight over the tropical sr. So instead of getting the absolute shortest path with the lowest total score I'd be interested in getting the path that is closest to a total score of e.g. 5.

Related to that is there a tropical semiring variant which orders paths according to their absolute values instead of their raw values and that is still distributive (i.e. shortest path still works)? So if I have paths with total weights -7, 1, 4, 6, 8 I'd like to have them ordered as 1, 4, 6, -7, 8, i.e. the path with a score of 1 should be the shortest path. Thanks for your help!

Cheers,

Roland

Log In

Compose FST

BlueSky - 29 Oct 2010 - 07:56

Hi, I am new with fst, I have the same problem. I am trying to compose 2fsts, the fist one is "cot" and the second one {cat, car,come}. I could not get any composition's output. I used the fstinfo, the input/output symbol table are none. Although, some arcs should be composed. Thanks a lot, Memo

CyrilAllauzen - 29 Oct 2010 - 11:22

Your description seems vague but it seems that the composition should be empty. fstcompose automatically connects the output so you should use --connect=false if you want to see the non coaccessible paths.

Cyril

BlueSky - 01 Nov 2010 - 12:41

Hi Cyril, thanks a lot i tried it and it works by using the fstcompose in the shell. I got only the connected arcs. If i want to use it inside my code is the function "Connect" has the same option? I used "Connect(fst)" after the Compose method but i could not see the output when i tried to drew the fst as i did before in the shell.

Does it work fine if i use such a fst(the result of the first composition) in further operations, such as compose it with other fst? because i am unable to draw or print it to make sure! I appreciate your support. Memo

IvanProvalov - 03 Feb 2011 - 12:54

I am trying to test a simple string input-output with FST. Is there an equivalent of lexfsmstrings to replace the last part of the command below:

lexcompre -l example.lab -s "test" | fsmcompose - test.fst | lexfsmstrings -l example.lab

IvanProvalov - 03 Feb 2011 - 20:31

I found what I was looking for: fstprint --acceptor --isymbols=example.lab --osymbols=example.lab | awk 'BEGIN {printf("\n")} {printf("%s ",$4)} END {printf("\n")}
Log In

Compact fst

KonstantinSelivanov - 22 Oct 2010 - 06:20

Hi,

Could compact fsts be composed, intersected etc? I couldn't find related methods. Did I missed something?

And I came across a strange behavior. When start state isn't defined it causes segfaults with some operations (concatenation, union).

PaulDixon - 22 Oct 2010 - 09:19

I've composed a compact_acceptor type fst with a olabel_lookahead fst using the shell tools and lookahead composition.

KonstantinSelivanov - 29 Oct 2010 - 12:49

Thanks Paul. I managed to do what I need.
Log In

Incremental redundancy reduction

SimonDobrisek - 19 Oct 2010 - 15:00

Please allow me to draw your attention to the paper http://uni-lj.academia.edu/documents/0152/7609/tsd200a.pdf

It seems that the incremental FST redundancy reduction considerably outperforms the usual batch optimization approach.

The demonstration implementation of the presented algorithm is available at http://luks.fe.uni-lj.si/spider/

We believe that the incremental algorithm could be generalized and implemented also as part of the OpenFST toolkit.

Please let me know if you need any additional explanation regarding this algorithm.

Kind Regards!

Simon Dobrišek

Log In

Forward / Backward variables

JasonTuskanov - 13 Oct 2010 - 09:49

Hi. I would like to compute the forward and backward variables (see L.R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 1989) for each state of a FST. I'm surprised I have found nothing relevant so far. Did I miss something existing in the library, or shall I just write the piece of code on my own ?

CyrilAllauzen - 13 Oct 2010 - 18:45

You missed something: shortest distance

MichaelRiley - 20 Oct 2010 - 17:57

Use ShortestDistance
Log In

'fst::Cast' : ambiguous call to overloaded function

HonzaSilovsky - 12 Oct 2010 - 04:43

Compiling a project that refers to openfst library (I'm using v. 1.2.3) in Visual Studio (2010 express) I was getting the following error: error C2668: 'fst::Cast' : ambiguous call to overloaded function

It turned out that the problem is the incorrect order of keywords in the function declaration; the original declaration in vector-fst.h looks like template void friend Cast(const F &, G *); while I guess it should look like template friend void Cast(const F &, G *);

after this change everything compiles allright.

CyrilAllauzen - 13 Oct 2010 - 14:38

Indeed, you're right. This will be fixed in the next release. Thanks.

Log In

Compiling openfst-1.2.3 with Visual C++ 2008 Express

HenrikKinnemann - 01 Oct 2010 - 07:02

Does anybody has tried to compile openfst-1.2.3 with Visual C++ 2008 Express yet? For me it's not possible to install Visual C++ 2010.

How can I compile under MS Windows with the following options "--enable-bin=yes --enable-compact-fsts=no --enable-const-fsts=no --enable-far=no --enable-lookahead-fsts=no --enable-pdt=no --with-icu=yes" ?

Log In

Lookahead Composition Filter with RhoMatcher

SashaRush - 22 Sep 2010 - 12:46

Is it possible to use one of the new lookahead filters on a composition with a Rho or other special matcher? It looks like the lookahead filter takes a lookahead matcher as an argument, but I'm guessing this is different than the matcher passed to ComposeFST.

Log In

unexpected composition result

CameronSmith - 22 Sep 2010 - 12:02

When I compose two cyclic, single-state, unweighted transducers I get an unexpected result. The composition is TA o TF = TB; where TA: http://j.mp/d1fDP9 , TF: http://j.mp/ddSOPy , TB: http://j.mp/amYjNt . However I would expect the output to be TBc: http://j.mp/b5dnNo . The code I have used to define and compose these transducers is copied below. Is there some mistake I have made in defining the transducers for this purpose or is there a problem with the way in which I have used the Compose() or other functions?

#include <iostream>
#include <fstream>
#include <stdlib.h>
#include <string>
#include <cstring>

#include <fst/fstlib.h>
#include <fst/compact-fst.h>
#include <fst/extensions/far/farlib.h>



using namespace fst;


int main()
{
int fstnum = 3;
string fstnames[fstnum];

fstnames[0]="TA";
fstnames[1]="TF";
fstnames[2]="TB";
//------------------Generate [population] of FSTs--------------------//

// A vector FST is a general mutable FST
StdVectorFst TA;

// Adds state 0 to the initially empty FST and make it the start state.
TA.AddState();   // 1st state will be state 0 (returned by AddState)
TA.SetStart(0);  // arg is state ID

// Adds two arcs exiting state 0.
// Arc constructor args: ilabel, olabel, weight, dest state ID.
TA.AddArc(0, StdArc(0, 0, 0, 0));  // 1st arg is src state ID


TA.SetFinal(0,0); // SetFinal (StateID, weight)

//We can save this FST to a file with:
TA.Write("onestate/TA.fst");

StdVectorFst TF;

// Adds state 0 to the initially empty FST and make it the start state.
TF.AddState();   // 1st state will be state 0 (returned by AddState)
TF.SetStart(0);  // arg is state ID

// Adds two arcs exiting state 0.
// Arc constructor args: ilabel, olabel, weight, dest state ID.
TF.AddArc(0, StdArc(0, 1, 0, 0));  // 1st arg is src state ID
TF.AddArc(0, StdArc(1, 0, 0, 0));  // 1st arg is src state ID


TF.SetFinal(0,0); // SetFinal (StateID, weight)

//We can save this FST to a file with:
TF.Write("onestate/TF.fst");


//------------------randomly select 2 FSTs----------//
//--------------------Compose 2 FSTs----------------//

// The FSTs must be sorted along the dimensions they will be joined.
// In fact, only one needs to be so sorted.
// This could have instead been done for "model.fst" when it was created.
ArcSort(&TA, StdOLabelCompare());
ArcSort(&TF, StdILabelCompare());

// Container for composition result.
StdVectorFst result;

// Create the composed FST.
Compose(TA, TF, &result);

result.Write("onestate/TB.fst");


//---------compute and store interaction network complexity-------//

//---------print interaction network---------------//
int cmdtst = system(NULL);
if (cmdtst != 0 )
{
    for (int i=0; i<fstnum; i++)
    {

        string sh1="fstdraw onestate/.fst onestate/.dot";
        string sh2="dot -Tps onestate/.dot > onestate/.ps";
        string sh3="evince onestate/.ps &";
        string fstname = fstnames[i];
        sh1.insert(17,fstname); sh1.insert(33,fstname);
        sh2.insert(18,fstname); sh2.insert(36,fstname);
        sh3.insert(16,fstname);
        system(sh1.c_str()); system(sh2.c_str()); system(sh3.c_str());

    }

} else { cout << "No command interpreter available \n"; };

return 0;

}

CameronSmith - 22 Sep 2010 - 15:19

update: I apologize I see from the "conventions" section that I am using the epsilon transition label 0 when I do not intend to indicate an epsilon transition. Is there a way to redefine the epsilon symbol so that I can use the binary alphabet in my state transition labels?

MichaelRiley - 20 Oct 2010 - 17:56

no.
Log In

far weirdness

RichardSproat - 15 Sep 2010 - 21:44

This ought to be simple and just work out of the box, so why am I getting the following behavior:

rocotillo:testdata sproatr$ fstinfo TOKENIZER | sed 3q fst type vector arc type standard input symbol table none rocotillo:testdata sproatr$ farcreate TOKENIZER tokenizer.far rocotillo:testdata sproatr$ farinfo tokenizer.far ERROR: FstHeader::Read: Bad FST header: tokenizer.far: ERROR: ReadSTTableHeader: error reading file: tokenizer.far ERROR: Error reading FAR: tokenizer.far ERROR: GenericRegister::GetEntry : dlopen(-arc.so, 1): image not found FATAL: No operation found for "FarInfo" on arc type

So, the fst TOKENIZER is well-formed, "farcreate" apparently works, but whatever it's creating ain't happy.

CyrilAllauzen - 16 Sep 2010 - 14:38

There is a bug in FAR that only manifest itself on 32-bit machines. This can be fixed by changing line 79 of src/include/extensions/far/sttable.h to:

    WriteType(stream_, static_cast<int64>(positions_.size()));

This will be fixed in the next release. Thanks to Richard for his help on this!

Log In

unweighted transducers

CameronSmith - 15 Sep 2010 - 17:09

How would one construct and compose unweighted transducers? Is it necessary to write a BoolWeight class to do this and then create new type definitions:

typedef ArcTpl WeightlessArc; typedef VectorFst WlsVectorFst;

as respective analogs to:

typedef ArcTpl StdArc; typedef VectorFst StdVectorFst;

Is it possible and reasonable to set the kNotWeighted property to True for each transducer?

Is there a simpler and/or better way of accomplishing this?

As a disclaimer, I may have a fundamental misunderstanding of the weights themselves (or of other essential concepts involved for that matter).

Thanks.

CameronSmith - 21 Sep 2010 - 16:15

update: sorry for this comment i figured out what was happening
Log In

compilation needs -ldl

ChrDr - 15 Sep 2010 - 07:08

On my Arch Linux System the compilation only works if I specify LIBS="-ldl" ./configure --prefix=/usr but fails with ./configure --prefix=/usr

Shouldn't it compile also in the second case?

Log In

Random FST generation

CameronSmith - 14 Sep 2010 - 12:30

Is it possible to generate a random FST from the space of all possible (e.g. two-state) transducers given a particular input and output alphabet (e.g. {0,1}) and a constant weight (e.g. 1 or unweighted)?

If anyone could provide a brief skeleton for accomplishing this I would appreciate it.

Thanks.

Log In

Convert machine from FSM to FST

KonstantinSelivanov - 10 Sep 2010 - 12:29

I wonder if there is a way to convert FSM to FST format?

Thanks.

KonstantinSelivanov - 10 Sep 2010 - 12:33

upd: I mean from ATT FSM library format to OpenFST format.

AMorenoDaniel - 10 Sep 2010 - 14:32

They share the text representation. fsmprint your.fsm | fstcompile > your.fst

KonstantinSelivanov - 11 Sep 2010 - 08:58

Thanks Daniel.

But it seems they have some distinctions. E.g. when I create language model using grmcount/grmake there are some arcs with weight "Infinity". I didn't see such weights in openfst library. Or maybe I'm doing something wrong?

MichaelRiley - 20 Oct 2010 - 18:00

Should be the same. Post small example if not.
Log In

FAST_LOG_PROB_ARC_SELECTOR

AMorenoDaniel - 08 Sep 2010 - 13:01

Hi, I'm taking a look at the version 1.2.2, and I wonder if
FAST_LOG_PROB_ARC_SELECTOR
was accidentally included as option for fstrandgen.cc. It is not defined anywhere else as far as I see.

My first compilation succeeded on Cygwin after I commented out that option. Cheers

BusyBeaver - 08 Sep 2010 - 13:27

Had the same issue here. The macro is not defined anywhere in the tar ball

CyrilAllauzen - 09 Sep 2010 - 11:40

My bad. Last minute changes before the release not done carefully. FAST_LOG_PROB_ARC_SELECTOR should have been defined in the enum in script/randgen.h.

MichaelRiley - 09 Sep 2010 - 17:11

Fixed in version 1.2.3
Log In

Compiling 1.2.1 on Mac OSX

RichardSproat - 02 Sep 2010 - 17:09

Has anyone run into the following problem when trying to compile 1.2.1 out of the box on OSX?

g++ -DHAVE_CONFIG_H -I./../include -g -O2 -MT replace.lo -MD -MP -MF .deps/replace.Tpo -c replace.cc -fno-common -DPIC -o .libs/replace.o /usr/include/c++/4.0.0/tr1/hashtable: In member function 'typename std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::const_iterator std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::find(const Key&) const [with Key = int, Value = std::pair, Allocator = std::allocator<std::pair >, ExtractKey = Internal::extract1st<std::pair >, Equal = std::equal_to, H1 = std::tr1::hash, H2 = Internal::mod_range_hashing, H = Internal::default_ranged_hash, RehashPolicy = Internal::prime_rehash_policy, bool cache_hash_code = false, bool mutable_iterators = true, bool unique_keys = true]': ./../include/fst/replace.h:531: instantiated from 'bool fst::ReplaceFstImpl<A, T>::IsNonTerminal(typename A::Label) const [with A = fst::LogArc, T = fst::DefaultReplaceStateTable<fst::LogArc, ssize_t>]' ./../include/fst/replace.h:560: instantiated from 'size_t fst::ReplaceFstImpl<A, T>::NumInputEpsilons(typename A::StateId) [with A = fst::LogArc, T = fst::DefaultReplaceStateTable<fst::LogArc, ssize_t>]' ./../include/fst/fst.h:732: instantiated from 'size_t fst::ImplToFst<I, F>::NumInputEpsilons(typename I::Arc::StateId) const [with I = fst::ReplaceFstImpl<fst::LogArc, fst::DefaultReplaceStateTable<fst::LogArc, ssize_t> >, F = fst::Fst<fst::LogArc>]' replace.cc:43: instantiated from here /usr/include/c++/4.0.0/tr1/hashtable:1136: error: passing 'const std::tr1::hashtable<int, std::pair, std::allocator<std::pair >, Internal::extract1st<std::pair >, std::equal_to, std::tr1::hash, Internal::mod_range_hashing, Internal::default_ranged_hash, Internal::prime_rehash_policy, false, true, true>' as 'this' argument of 'typename std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::node* std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::find_node(Internal::hash_node<Value, cache_hash_code>*, const Key&, typename std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::hash_code_t) [with Key = int, Value = std::pair, Allocator = std::allocator<std::pair >, ExtractKey = Internal::extract1st<std::pair >, Equal = std::equal_to, H1 = std::tr1::hash, H2 = Internal::mod_range_hashing, H = Internal::default_ranged_hash, RehashPolicy = Internal::prime_rehash_policy, bool cache_hash_code = false, bool mutable_iterators = true, bool unique_keys = true]' discards qualifiers make[3]: * [replace.lo] Error 1

RichardSproat - 02 Sep 2010 - 19:01

BTW, I know about this:

http://openfst.cs.nyu.edu/twiki/bin/view/FST/CompilingOnMacOSX

did the relevant updates, and was fine compiling 1.1

CyrilAllauzen - 03 Sep 2010 - 11:56

I'll look into this but in the meantime I recommend you use the first work-around mentioned on the page: using gcc 4.2 from Fink instead of Apple's old gcc version.

CyrilAllauzen - 03 Sep 2010 - 16:50

Richard and I traced down the issue to an other bug in gcc 4.0 TR1 implementation. An updated hashtable header file is available at FST.CompilingOnMacOSX.
Log In

Application of RTNs?

KennethRBeesley - 01 Sep 2010 - 13:16

OpenFst provides the Replace() and ReplaceFst() operations that take an RTN (an Fst wherein some of the output labels are references to subnetworks, i.e. are non-terminals) and replaces each such reference with a copy of the referred-to subnetwork. The replacement operations appear to function well, turning an RTN into a normal Fst. Is there any support for applying an RTN to input without first performing a Replace()ment? Thanks.

CyrilAllauzen - 01 Sep 2010 - 15:38

ReplaceFst is a delayed representation. Copying it into a VectorFst will perform the replacement. Composing your input with the ReplaceFst is what you want.
Log In

Mindtuning for PDT?

KennethRBeesley - 01 Sep 2010 - 13:05

A colleague and I are trying to understand the experimental library for Push Down Transducers (PDTs) in OpenFst 1.2. What does this mean: "some transitions are open or close parentheses"? Could some kind soul post a small example showing what the transition labels really look like? Thanks

CyrilAllauzen - 01 Sep 2010 - 15:41

Each pair of opening and closing parentheses represents a stack symbol. The opening parenthesis corresponds to pushing that stack symbol on the stack, the closing parenthesis corresponds to poping that symbol from the stack.
Log In

pre-determinization

WeiwenLeung - 30 Aug 2010 - 06:19

Are there any tool in openfst that i can use to predeterminization any fst?

CyrilAllauzen - 01 Sep 2010 - 15:42

Unfortunately, I don't know of any such tool.
Log In

fstepsnormalize

WeiwenLeung - 27 Aug 2010 - 01:56

When calling fstepsnormalize for the composed fst for L and G, the memory increases regularly to the available size of my server (48GB) and killed by the Linux system. The sizes of my L and G fsts are 4MB and 31MB respectively.

is there anyone encounter the same problem?

Log In

fstcompose output

AliAzimizadeh - 20 Aug 2010 - 16:47

Hi, I have two fst file, both of them are right. when they are composed by fstcompose the output file size is 66k and when i use fstprint and convert it to text mode, there is nothing there and file is empty. can you give me any hint about this problem? Thanks so much Ali

AMorenoDaniel - 21 Aug 2010 - 15:04

You can use 'fstinfo' or traverse the fst using state/arcs iterators using c++ lib. Expose the guts and you'll know whats going on.

AliAzimizadeh - 21 Aug 2010 - 16:34

Thanks i found the problem

BlueSky - 29 Oct 2010 - 05:15

Hi, I am new with fst, I have the same problem. I am trying to compose 2fsts, the fist one is "cot" and the second one {cat, car,come}. I could not get any composition's output. I used the fst info, the input/output symbol table are none. Although, some arcs should be composed. Thanks a lot, Memo
Log In

Compiling openfst-1.2 on cygwin

AMorenoDaniel - 18 Aug 2010 - 17:48

Has anyone compiled openfst-1.2 (recently released) on cygwin?
As suggested a year ago for openfst-1.1, I did:
./configure CXX=g++-4.exe CC=gcc-4.exe --prefix=$HOME/local
make
#make install  # <--- I haven't made it here though.
Make aborts with the error message:
g++-4.exe -g -O2 -o fstarcsort.exe fstarcsort.o  -ldl ../lib/.libs/libfst.a ../script/.libs/libfstscript.a /usr/lib/gcc/i686-pc-cygwin/4.3.4/libstdc++.dll.a   -L/usr/lib/gcc/i686-pc-cygwin/4.3.4 -L/usr/lib/gcc/i686-pc-cygwin/4.3.4
../script/.libs/libfstscript.a(fst-class.o):fst-class.cc:(.text$_ZN3fst16CompatPropertiesEyy[fst::CompatProperties(unsigned long long, unsigned long long)]+0x143): undefined reference to `fst::PropertyNames'
~~~----ETCETERA----~~~
collect2: ld returned 1 exit status
make[3]: *** [fstarcsort.exe] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
so it looks as if the linker doesn't find the definition of CompatProperties.
I must add that the following warning occurs twice during Make:
libtool: link: warning: undefined symbols not allowed in i686-pc-cygwin shared libraries
Once here:
/bin/sh ../../libtool --tag=CXX   --mode=link g++-4.exe  -g -O2 -version-info 0:0:0  -o libfst.la -rpath /usr/local/lib compat.lo flags.lo fst.lo properties.lo symbol-table.lo util.lo
and once again here:
/bin/sh ../../libtool --tag=CXX   --mode=link g++-4.exe  -g -O2 -version-info 0:0:0  -o libfstscript.la -rpath /usr/local/lib arcsort.lo closure.lo compile.lo compose.lo concat.lo connect.lo convert.lo decode.lo determinize.lo difference.lo draw.lo encode.lo epsnormalize.lo equal.lo equivalent.lo fst-class.lo info.lo intersect.lo invert.lo map.lo minimize.lo print.lo project.lo prune.lo push.lo randequivalent.lo randgen.lo relabel.lo replace.lo reverse.lo reweight.lo rmepsilon.lo script-impl.lo shortest-distance.lo shortest-path.lo synchronize.lo text-io.lo topsort.lo union.lo weight-class.lo

DanielPovey - 19 Aug 2010 - 05:43

I am getting this problem too. Currently I am investigating whether adding -no-undefined to all the *_la_LDFLAGS variables in the Makefiles in the subdirectories will fix this problem. Dan

DanielPovey - 19 Aug 2010 - 06:06

BTW, this didn't work.

AMorenoDaniel - 19 Aug 2010 - 11:37

yeah, setting LDFLAGS="-no-undefined" took care of the warnings, but an error of similar nature remains.

DanielPovey - 19 Aug 2010 - 13:26

A co-worker suggested adding the following options to configure: --enable-static --disable-shared

DanielPovey - 20 Aug 2010 - 05:09

BTW, this didn't work. It seems it still tried to build the shared libraries. Somehow configure didn't correctly accept the options.

PaulDixon - 21 Aug 2010 - 00:21

I have most of version 1.2 compiling under VS2010, just the far and pushdown transducers are missing. I put the port so far here:

www.furui.cs.titech.ac.jp/~dixonp/openfstwin-1.2-210810.zip

It includes prebuilt 32-bit binaries that should also work under cygwin. It's all 1.2 code the 1.1 directory name still needs to be changed.

DanielPovey - 21 Aug 2010 - 05:10

Thanks, Paul. Following up on the cygwin issue: A co-worker was able to work-around this problem as follows by manually changing some of the compile commands. For example, Not sure what it is. It looks like some stuff is missing from the lib archives. But I was able to compile arcsort by hand, the others are probably the same. Instead of (from src/bin): /bin/sh ../../libtool --tag=CXX --mode=link g++ -g -O2 -lm -ldl -o fstarcsort.exe fstarcsort.o ../lib/libfst.la ../script/libfstscript.la I went to that dir and did by hand: /bin/sh ../../libtool --tag=CXX --mode=link g++ -g -O2 -lm -ldl -o fstarcsort.exe fstarcsort.o ../script/libfstscript.la ../lib/*.o ../lib/libfst.la

Log In

ReplaceFst() and testing for CyclicDependencies

KennethRBeesley - 17 Aug 2010 - 14:21

I've just started experimenting with ReplaceFst(). The following script defines three Fsts, with cyclic dependencies. The line

if (ReplaceFst<StdArc>(pairlabelfsts, slabel).CyclicDependencies()) { ...}

returns true, apparently detecting the cyclic dependencies correctly.

Question: Have I used the proper idioms for testing for cyclic dependencies and doing the actual replacement?

#include <fst/fstlib.h>
#include <iostream>

using namespace fst ;

int main() {
   std::cout << "Hello, world!\n" ;

   int a = 97 ;   // code point values
   int b = 98 ;

   // example with cyclic dependencies

   // StdVectorFst is a typedef for VectorFst<StdArc>
   StdVectorFst fst1, fst2, fst3 ;

   // define fst1, a subnetwork (will be arbitrarily associated with -1)
   fst1.AddState() ;     // return 0
   fst1.SetStart(0) ;
   fst1.AddState() ;   // returns 1
   fst1.AddState() ;   // returns 2
   fst1.SetFinal(2, StdArc::Weight::One()) ;
   fst1.AddArc(0, StdArc(a, a, StdArc::Weight::One(), 1)) ;
   // fst1 refers to fst2 (associated below with -2)
   fst1.AddArc(1, StdArc(0, -2, StdArc::Weight::One(), 2)) ;
   //         src        i  o          w             dest

   // define fst2, a subnetwork (will be arbitrarily associated with -2)
   fst2.AddState() ;   // return 0
   fst2.SetStart(0) ;
   fst2.AddState() ;   // return 1
   fst2.AddState() ;   // return 2
   fst2.SetFinal(2, StdArc::Weight::One()) ;
   fst2.AddArc(0, StdArc(b, b, StdArc::Weight::One(), 1)) ;
   // fst2 refers to fst1 (associated below with -1)
   fst2.AddArc(1, StdArc(0, -1, StdArc::Weight::One(), 2)) ;

   // define fst3, the base network (will be arbitrarily associated with -3)
   fst3.AddState() ;   // return 0
   fst3.SetStart(0) ;
   fst3.AddState() ;   // return 1
   fst3.AddState() ;   // return 2
   fst3.SetFinal(2, StdArc::Weight::One()) ;
   // add arcs with "references" to subnetworks
   // output-side -1 will be associated with fst1
   // output-side -2 will be associated with fst2
   // The label references to subnetworks are on the Output Side.
   // The corresponding Input-Side labels can be anything (here 0 for epsilon).
   fst3.AddArc(0, StdArc(0, -1, StdArc::Weight::One(), 1)) ;
   fst3.AddArc(1, StdArc(0, -2, StdArc::Weight::One(), 2)) ;
   //         src        i   o           w            dest

   // set up the vector associating arbitrary numbers with the three networks
   // -1 refers to fst1
   vector< pair< StdArc::Label, const Fst<StdArc> * > > pairlabelfsts ;
   pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-1, &fst1) ) ;
   // -2 refers to fst2
   pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-2, &fst2) ) ;
   // -3 is the base network
   pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-3, &fst3) ) ;
   
   StdArc::Label slabel = -3 ;      // specifies the "base" Fst, which is fst3

   fst3.Write("before.fst") ;      // the network with references

   // Using delayed ReplaceFst
   if (ReplaceFst<StdArc>(pairlabelfsts, slabel).CyclicDependencies()) {
      std::cout << "Has cyclic dependencies.\n" ;
   } else {
      std::cout << "No cyclic dependencies.\n" ; 
      StdVectorFst expandedFst ;   // the result
      // slabel (here -3) is used to find the base Fst supplied in pairlabelfsts
      expandedFst = ReplaceFst<StdArc>(pairlabelfsts, slabel) ;
      expandedFst.Write("intermediate.fst") ;   // the network just after ReplaceFst()
      RmEpsilon(&expandedFst) ;
      expandedFst.Write("after.fst") ;   // the network after RmEpsilon()
   }

   std::cout << "Good-bye, world!\n" ;
}

CyrilAllauzen - 17 Aug 2010 - 15:52

Yes, this looks correct to me.

Best, Cyril

Log In

Multithreading in OpenFST

AlbertPark - 09 Jul 2010 - 18:30

Hi,

I am currently using the OpenFST library, and I was thinking of multithreading my program, and I was curious as to whether the OpenFST library currently supports multithreading. Thank you!

Albert

AlbertPark - 21 Jul 2010 - 04:13

As I haven't gotten a response yet, this is a self-reply to my question.

I've gone through looking at the implementation and from what I can tell the OpenFST library currently is not multithreading safe. I've resorted to doing a deep copy of FSTs for each thread. Unfortunately, doing a deep copy was not as straightforward as I thought it would be, or at least I couldn't figure out how to do it elegantly and quickly (The FST I am reading in is very large, and trying to read in the FST multiple times takes much longer than I want). I tried doing a deep copy using Map() with the IdentityMapper as mentioned in a previous post, but found that this only does deep copies of the arcs (given the copy constructor of the Arc and Weight do deep copies), and does a shallow copy of the symbol table, which creates problems for multithreading (when multiple threads try to access the symbol table at once).

I'm not sure if there is an easier way to do this, but here is what I did to change the OpenFST code to support deep copying of the symbol table. I added a new enum code in map.h to MAP_SYMBOLS_ACTION as well as a couple of lines of code for supporting symbol table deep copying, added a function to do a deep copy of the symbol table to symbol-table.h. Finally, I added a Mapper almost identical to the identity mapper, but returns the new enum code for InputSymbolsAction() and OutputSymbolsAction().

Hope this helps anybody trying to multithread!

MichaelRiley - 09 Aug 2010 - 20:42

OpenFst is meant to be thread compatible/safe in the sense that if you use the method Copy(true) to copy FSTs then different threads can read the each of those. This is a 'deep' copy only as deep as needed for this to work. NOTE HOWEVER IT IS NECESSARY TO IMPLEMENT the classes in lock.h in a non-trivial way at your site. Do so and the behavior as described will work.

MichaelRiley - 09 Aug 2010 - 20:44

BTW, these comments may apply more to OpenFst 1.2 - very soon to be released. Don't remember how complete 1.1 was in this respect.
Log In

memory problem in fstcompose

AliAzimizadeh - 07 Jul 2010 - 09:18

Hi, i use openfst in HTS 2.1.1 . i just compile two fst by a symbol table in AT&T format:

0 #_f0 1 #_m0_s2 2 #_m0_s3 3 #_m0_s4 4 #_m0_s5 5 #_m0_s6 6 e$_f0 7 e$_m1_s2 8 e$_m1_s3 9 e$_m1_s4 10 e$_m1_s5 11 e$_m1_s6 12 . . . ow_f6037 170654 s$_f6037 170655 k_f6037 170656 u_f6037 170657 i_f6037 170658 s_f6037 170659 v_f6037 170660 h_f6037 170661 c$_f6037 170662 j_f6037 170663 z$_f6037 170664

its size is about 2.5Mb. then i run the below command successfully:

fstcompile --isymbols=a.sym --osymbols=a.sym tmp1.fst fstcompile --isymbols=a.sym --osymbols=a.sym tmp2.fst

then i want to compose them and run this:

fstcompose tmp2.fst tmp3.fst tmp4.fst

when the symbol number is little the composition is done but in some symbol files the symbol number is very large, for example 170655. in this case, the memory is fulled rapidly and the CPU doesn't work any more. this is my system properties: CPU: Pentium 4 Dual core 2.26GHz Ram: 4G Swap: 15G OS: CentOS 5.3

could you please give me suggestions about the case? Thanks a lot -Ali

PaulDixon - 08 Jul 2010 - 21:52

Have you tried arc sorting the fsts before composing?

fstarcsort -sort_type=olabel tmp2.fst > sort2.fst

fstarcsort -sort_type=ilabel tmp3.fst > sort3.fst

fstcompose sort2.fst sort3.fst > tmp4.fst

AliAzimizadeh - 09 Jul 2010 - 06:37

I tested your solution, but the problem remains yet. i think my fst model is big. the size of tmp2.fst and tmp3.fst are 147MB and 1.5M. Do you think my Ram memory enough to this process? if your answer is no, approximately, how much Ram does need ? Thanks for your attention -Ali

WeiwenLeung - 27 Aug 2010 - 01:33

i encounter almost the same problem. when fstepsnormalize running, it takes more than 50GB, and killed by the system. the gram fsm 31MB, lexicon fsm 4MB.

did you figure out what the problem was ?

Log In

Composition and ShortestPath

ChristopherKermorvant - 22 Apr 2010 - 12:20

Hi,

I make a composition of 3 transducers F o G o H and I extract the ShortedPath on the resulting transducer. Is there a way to have access to the arcs in F, G and H corresponding to the arcs in the ShortestPath transducer ? I would like to have access to the output symbols of the arcs in F for example.

thank you,

-- Chris

DoganCan - 27 Apr 2010 - 09:59

Hi Chris,

There may be better solutions to your problem provided by the library but a quick solution is:

1. remove the weights of the shortestpath fst, S = shortest(F o G o H), and project it onto the input and output labels

fstmap -map_type=rmweight s.fst | fstproject > s.in.fst
fstmap -map_type=rmweight s.fst | fstproject -project_output > s.out.fst

2. compose shortest input with F and project the result onto the output labels to get the candidate paths in F

compose shortest output with H and project the result onto the input labels to get the candidate paths in H

fstcompose s.in.fst f.fst | fstproject -project_output > f.out.fst
fstcompose h.fst s.out.fst | fstproject > h.in.fst

3. compose the candidate paths with G (F_out o G o H_in) to obtain the paths in G that survive during the composition

fstcompose f.out.fst g.fst | fstcompose - h.in.fst > internal.fst

4. finally apply shortestpath to this fst to obtain the path in G internal to the shortest path in F o G o H

fstshortestpath internal.fst > shortest.internal.fst

This final transducer should provide the internal path you need. Note that the weights of this transducer are those obtained during the composition not the original weights of G.

Best, Dogan

RolandSchwarz - 13 May 2010 - 09:47

Hi, I'm also struggling with this and unfortunately the solution you provided doesn't seem to be working for me. Let's say F and H are strings of length 1000 and G is a simple two-state transducer. I'm interested in the internal path between the two states of the transducer G within S. In your solution f.out.fst is still of length 1000 and so is h.in.fst. It seems natural to me, that the composition in step 3 therefore leads to another large transducer and the shortestpath in step 4 is just as complex as the original transducer S. But I'm not totally sure if this is maybe even a different problem. Best, Roland

DoganCan - 14 May 2010 - 18:05

Hi Roland,

I am not entirely sure that I got your problem. I mean the internal path in G should be as complex as S.

Can you post a simple example where the above procedure fails?

Best, Dogan

RolandSchwarz - 15 May 2010 - 04:49

Hm, I probably misunderstood you. Take my example from above. I compare two sequences using a transducer, so F and H are acceptors, G is a transducer. Now I am not only interested in the shortest path in terms of input and output symbols and the shortest distance, but also in the state path within transducer G that lead to that distance. All the fstshortestpath gives me, is the shortest path with states numbered from 1 to 1000, but I would be interested in which of those steps went through state 1 in transducer G and which through state 2 (or how many states the transducer has). This is not evident from the output of fstshortestpath if the transducer G is not deterministic. I hope that clears things up a bit, I thought this was what you were referring to when talking about "internal paths". Best,

Roland

Log In

transducing phoneme lattices using WFSTs

BusyBeaver - 03 Apr 2010 - 11:38

Dear Community,

I'd like to convert phoneme lattices into WFSTs and use a grammar transducer to obtain a word lattice out of it. The thing is, that I want to keep additional information e.g. time stamps, acoustic scores, etc. while doing so. Note, that I do not want to obtain any likelihoods on paths in the first place. Instead, I would traverse the final lattice using those accumulated values to determine a likelihood for each transition and user-defined scaling factors myself.

I see two possibilities to keep track on times/scores - by storing them on arcs or using the string semiring to push just arbitrary information in the weights. When I use the string semiring, I could simply push there as much as I want to,and do a composition with my grammar. But then I am somewhat restricted: composition is not supported, because the string semiring is not commutative. I was thinking to rewrite the weighted composition algorithm from the paper, just change the weighting part, since I do not care about the order of elements within my strings anyways. But maybe I could keep my input fst as log/tropical semiring type and attach my scores/time stamps in the arcs. Would it then be possible to use the existing composition algorithm and still copy my attached information into the resulting FST? Maybe it is all much easier than what I thought of here right now, then I'd be glad if you give me some hints smile Thanks a lot in advance, Stefan

Log In

Probabilities as weights

NoChips - 17 Mar 2010 - 16:35

Hi

As far as I know the default arcs StdArc is using Tropical weight. How can I change that into Probabilty weight, so that transition weights are multiplied and not summed up?

I was looking for something predefined, but couldn't find anything...

Thank you in advance!

PaulDixon - 18 Mar 2010 - 04:14

You could use the LogArc type and convert to probabilities when needed.

There is a counting utility that includes a real semiring implementation and arc registration with the standard command line tools [here][1]

[1]: http://www.furui.cs.titech.ac.jp/~dixonp/wfstcount.zip]

Log In

Deleting arcs in an Fst

KennethRBeesley - 05 Mar 2010 - 19:37

Assume that I iterate through the states and arcs of an Fst, using the available iterators. Based on values of fields in the arcs (ilabel, olabel) I need to delete some particular arcs. How can I do that safely?

KennethRBeesley - 12 Mar 2010 - 16:46

I tried the following: 1) I called AddState() to add a new "sink" state to the FST, 2) In the Mutable Arc Iterator, for all the arcs to be deleted, I reset the nextstate to the sink state, then 3) I called Connect(). As the sink state is non-final, and has no exit arcs, Connect() deletes the sink state and all the arcs leading to it. It seems to work.

CyrilAllauzen - 17 Aug 2010 - 15:56

Indeed this would work. If you use VectorFst you could also use the DeleteStates method instead of Connect to delete the "sink" state.

JoachimChau - 01 Jun 2011 - 14:58

hi, sorry to bother you, but how to reset a arc's nextstate to the sink state? For example, the original arc is (sate1, state2), how to reset it to (state1, "sink")?
Log In

arcsum.h and arcmerge.h minor bug?

PhilSours - 05 Mar 2010 - 17:29

In ArcSumCompare and ArcMergeCompare, the 4th line: if (x.olabel < y.olabel) return false; should probably be: if (x.olabel > y.olabel) return false;

Log In

Memory footprint of an Fst

KennethRBeesley - 22 Feb 2010 - 15:16

Given a pointer to an Fst, is there an easy way to calculate its total memory footprint?

Log In

hbka reference paper

NewestUser - 26 Jan 2010 - 05:05

Hi, I have been reviewing the hbka paper referenced on the background material section of the site,

http://www.cs.nyu.edu/~mohri/postscript/hbka.pdf

and the figure for the minimization example, pg. 8, fig. 5(c) seems quite wrong. particularly the 'minimized' version is not equivalent to the original transducer. in the other reference

http://www.cs.nyu.edu/~mohri/pub/csl01.pdf

an identical transducer, albeit with different weights on a couple of the arcs, appears very differently in its minimized form.

i verified that the second version referenced in the latter link produces the expected output but it would be nice to confirm this, or alternatively to understand why they are different.

CyrilAllauzen - 26 Jan 2010 - 18:44

There is a "typo" in Fig 5(c). There should have been transitions labeled by d (with weight 4) and e (with weight 5) from state 0 to state 1.

NewestUser - 26 Jan 2010 - 22:28

thanks for the clarification. i was getting worried that i was very confused about something.
Log In

Problems with fstshortestpath

JoeWang - 15 Jan 2010 - 03:56

HI, all. I run fstshortestpath on a toy composed WFST, however I got the error message:

ERROR: FstMainRegister::GetMain: log-arc.so: cannot open shared object file: No such file or directory ERROR: fstshortestpath: Bad or unknown arc type "log" for this operation (ShortestPathMain)

I have checked my $LD_LIBRARY_PATH, it already includes /usr/local/lib, how come it still cannot find the shared object? Thanks in advance

CyrilAllauzen - 15 Jan 2010 - 13:41

Hi,

The issue is that the shortest path algorithm does not work over the log semiring.

Hence support for LogArc is not compiled into the fstshortestpath utility and the code tries to locate a shared object containg the definition of that arc type.

Best, Cyril

Log In

Is it possible to use fstcompile to construct an acceptor instead of a transducer?

JoeWang - 12 Jan 2010 - 22:09

Hi, all, Is it possible to use fstcompile to construct an acceptor (i.e. symbol/weight) instead of a transducer (i.e. ilabel:olabel/weight)? Thanks in advance.

PaulDixon - 13 Jan 2010 - 00:46

Use the -acceptor=true command line switch

JoeWang - 13 Jan 2010 - 02:43

Thanks, Paul.
Log In

Missing installation files for creating a user-defined arc type?

PaulDixon - 12 Jan 2010 - 01:11

When creating a new arc type some of the files (the *-main.h files) needed to compile the shared object don't seem to be copied as part of the installation. A workaround is to add compiler include path to the files in the openfst-1.1/ src/bin/ dir.

For the shared object to be detected and loaded a name of the form arc-type.so worked but arc_type.so did not.

I put sample code for a real semiring arc type here

KeithPonting - 19 Nov 2010 - 07:41

Looks good, but with openfst 1.2.5 there do not seem to be any *-main.h files... it is possible that REGISTER_FST_OPERATIONS (namespace fst::script) has replaced REGISTER_FST_MAINS, but I have so far failed to find a combination which works when using the resultant shared library with fstprint. (Building using gcc 4.1.2 on OpenSusE 10.2)

CyrilAllauzen - 19 Nov 2010 - 11:22

Indeed, the registration mechanism was change in 1.2 and the *-main.h files are gone.

The contain of my-arc.cc (the source for my-arc.so) should look like:

#include "nlp/fst/lib/fst-inl.h"
#include "nlp/fst/lib/const-fst-inl.h"
#include "nlp/fst/script/register.h"
#include "nlp/fst/script/fstscript.h"
#include "my-arc.h"
namespace fst {
namespace script {

REGISTER_FST(VectorFst, MyArc);
REGISTER_FST(ConstFst, MyArc);
REGISTER_FST_CLASSES(MyArc);

REGISTER_FST_OPERATIONS(MyArc);

}
}

Hope this will work for you.

Cyril

KeithPonting - 30 Nov 2010 - 05:17

Thankyou for getting back to me so promptly. I am afraid above does not work for me as I do not seem to have any inl.h files in either the installed directories or the downloaded sources or the fst build directories ...

CyrilAllauzen - 30 Nov 2010 - 15:36

This is a copy paste mistake. You just need to delete the -inl 's, the lib/ 's and the nlp/ 's. This gives:

#include <fst/fst.h>
#include <fst/const-fst.h>
#include <fst/script/register.h>
#include <fst/script/fstscript.h>

KeithPonting - 01 Dec 2010 - 05:39

Eureka! It works. Thankyou.
Log In

Problems on Ubuntu Linux 9.10

JoeWang - 11 Jan 2010 - 04:40

Hi, I have successfully maked openfst-1.1 on Ubuntu Linux 9.10. However, when I use fstcompile to run the first example, I got the error message from the shell: "fstcompile: error while loading shared libraries: libfst.so.0: cannot open shared object file: No such file or directory" which shows that libfst.so.0 doesn't exist, I checked under /usr/local/lib, it does exist. I am quite confused, can somebody help me? Thanks very much

CyrilAllauzen - 11 Jan 2010 - 12:05

Try adding /usr/local/lib to your LD_LIBRARY_PATH environment variable:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
Best, Cyril

JoeWang - 12 Jan 2010 - 02:28

Thanks, Cyril. It works
Log In

Relabel() and epsilon Properties

KennethRBeesley - 04 Jan 2010 - 15:50

I think that I've found a problem with Relabel(), when hard symbols (stored on arcs as non-zero integers) are relabeled with epsilons (stored on arcs as zeros). E.g. I tried

vector < pair<StdArc::Label, StdArc::Label> > vect ; vect.push_back(pair<StdArc::Label, StdArc::Label> ( (StdArc::Label) 5, (StdArc::Label) 0 )) ; Relabel(fstp, vect, vect) ;

where fstp is a pointer to a StdVectorFst. ilabels and olabels originally labeled 5 are relabeled correctly as 0, possibly introducing input-side, output-side and double-sided epsilons; but (here's the problem) the Properties kIEpsilons, kNoIEpsilons, kOEpsilons, kNoOEpsilons, kEpsilons and kNoEpsilons are not updated.

At least that's what seems to be happening.

KennethRBeesley - 04 Jan 2010 - 18:03

Please ignore the previous message. The error was mine.
Log In

Access control:

Edit | Attach | Watch | Print version | History: r1 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r1 - 2014-04-21 - CyrilAllauzen
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback