- OpenFst Forum 2010 Archive
- On the fly composition
- About FST recognition and transduction algorithm
- LoG composition eps normalization runs out of memory
- N-gram model computation
- Thread safety
- Shell command: Print all words / create FST from word list
- Edit Distance- compose 3 fst
- When is it "safe" to call Synchronize() ?
- Unable to minimize already determinized transducer
- Pre-Determinization Algorithm
- FST Determinization Question
- compiler warning comparison signed and unsigned
- Determinize problem and Composition filters
- Shortest Path with constraints
- Compose FST
- Compact fst
- Incremental redundancy reduction
- Forward / Backward variables
- 'fst::Cast' : ambiguous call to overloaded function
- Compiling openfst-1.2.3 with Visual C++ 2008 Express
- Lookahead Composition Filter with RhoMatcher
- unexpected composition result
- far weirdness
- unweighted transducers
- compilation needs -ldl
- Random FST generation
- Convert machine from FSM to FST
- FAST_LOG_PROB_ARC_SELECTOR
- Compiling 1.2.1 on Mac OSX
- Application of RTNs?
- Mindtuning for PDT?
- pre-determinization
- fstepsnormalize
- fstcompose output
- Compiling openfst-1.2 on cygwin
- ReplaceFst() and testing for CyclicDependencies
- Multithreading in OpenFST
- memory problem in fstcompose
- Composition and ShortestPath
- transducing phoneme lattices using WFSTs
- Probabilities as weights
- Deleting arcs in an Fst
- arcsum.h and arcmerge.h minor bug?
- Memory footprint of an Fst
- hbka reference paper
- Problems with fstshortestpath
- Is it possible to use fstcompile to construct an acceptor instead of a transducer?
- Missing installation files for creating a user-defined arc type?
- Problems on Ubuntu Linux 9.10
- Relabel() and epsilon Properties

The whole process is quite slow (10s) and I notice that the composition took about 5s.

I try to use on the fly composition. But then the composition is very fast, but the shortest path become even slower : 19.63s !! I use the default options.

Did I miss something ? How can I optimize my workflow ?

Thanks, Sergei

You can try something like this:

// suppose you have StdVectorFst fst_a, fst_b

StdVectorFst fst_c; // to store result vector<StdArc::Weight> distance; int nshortest = 1;

NaturalShortestFirstQueue<StdArc::StateId, StdArc::Weight> state_queue(distance);

ShortestPathOptions<StdArc, NaturalShortestFirstQueue<StdArc::StateId, StdArc::Weight>, AnyArcFilter

ComposeFst

I need to do some text normalization work, eg, transforms "a 10% rise" to "a ten precent rise", I have two problems:

1) Mark the start and end: Given the sentence (eg. "a 10% rise") and a FST (eg. FST recognizes percentage), how can I get the sub-sentence (mark the start and end in the original sentence)the FST accepts?

2) transduction algorithm: Given an input string, provide the output string(s) as defined by the regular relation provided by an FST.

Thanks,

Joachim

The grammar FST has : # of states 23558 # of arcs 600034

The lexicon FST has : # of states 183757 # of arcs 210297

I am doing fstarcsort and fstdeterminize on G and only fstclosure on L.

When I do fstcompose L.fst G.fst |fstepsnormalize|fstdeterminize, the fstepsnormalize step eatsup 16GB of memory and very quickly and it just fails after sometime. When I reduce the dictionary size by about half, everything runs successfully. Is there a reason why I am using up so much memory compared what is given in the tutorial for what seems to be bigger task ? Please point me to anything I might be doing wrong here.

Thanks, Jagadeesh

Did you try omitting the epsilon normalization step?

Also you will probably find the composition of L and G is faster if you sort the output of L instead of (or as well as) the input of G.

is it possible to compute fst n-gram representation using openfst (or openkernel)?

It's not so hard to compute n-gram model using openfst without smoothing. But I need a smoothing technique so it's a problem for me.

Thanks.

`fst/lock.h`

to make it thread-safe and prevent crashes:

//int Incr() const { return ++count_; } //int Decr() const { return --count_; } int Incr() const { return __sync_add_and_fetch(&count_, 1); } int Decr() const { return __sync_sub_and_fetch(&count_, 1); }

These functions are available in GCC 4.1 or higher. Otherwise use mutex locks there.

Just wanted to share this info, in case other people needs threads too. Mike tells me something like that is used internally at Google but it's not included in the tarball for better code portability.

And is there an easy way on the shell for the other way around: To create an FST from a list of words?

Something like the following (for acyclic only!):
...
vector

And the second question. I don't know easy way either. But you can construct it "by hand".

http://code.google.com/p/phonetisaurus/source/browse/

see FstPathFinder.cpp and FstPathFinder.hpp .

`fstinfo your_fst | grep epsilons`

on each of your 3 Fsts and post the output below (between `verbatim`

tags preferably)?
Cyril

Begin fst1 fstinfo fst-in.fst | grep epsilons # of input/output epsilons 0 # of input epsilons 0 # of output epsilons 0 input/output epsilons n input epsilons n output epsilons n End fst1

Begin fst2 fstinfo ed-loop.fst | grep epsilons # of input/output epsilons 0 # of input epsilons 3 # of output epsilons 3 input/output epsilons n input epsilons y output epsilons y End fst2

Begin fst3 fstinfo fst-corr.fst | grep epsilons # of input/output epsilons 0 # of input epsilons 0 # of output epsilons 0 input/output epsilons n input epsilons n output epsilons n End fst3

Begin fst-composition-3fsts fstinfo fst_ed_comp2.fst | grep epsilons # of input/output epsilons 8 # of input epsilons 8 # of output epsilons 8 input/output epsilons y input epsilons y output epsilons y End fst-composition-3fsts

`fst_ed_comp2.fst`

?

step 2)compose(fst_ed_comp1,fst3) repeat the same sequence of operations return fst_ed_comp2

`a:epsilon`

corresponding to deletions into input/output epsilon transitions.

As mentionned in FST.SynchronizeDoc, the condition for Synchronize to terminate is that the transducer has bounded delay, which is always the case for acyclic (the delay being at most the number of states in that case).

In the cyclic case, the bounded delay condition is equivalent to the condition that the delay of every cycle has to be zero, i.e. for every cycle *c*, the length of the input of *c* is equal to the length of the output of *c*.

I would like to ask about 2 things. First does anyone know a method in openfst which prints the fst with the characters not ascii ?

I have a problem with the minimization function, i created an fst with 60,000 words. The fst is created and determinized, but then the minimize function does not finish??

I appreciate the help :), Cheers Meme

I determinized successfully my transducer (HC o L o G) by first transforming it to an acceptor, and then tried to minimize it, however the minimization process never ends (I stop it after about an hour, note that the determinization of this transducer takes about a second). The transducer is not that big, about 50k states and 95k transitions. It is a weighted and cyclic acceptor, does anybody know what is going on?

thanks so much for your help

Which version of the library are you using? There was a hashing bug in older versions that could lead to some inefficiency in Minimize.

Which semiring is the automaton over? Does fstshortestdistance terminate on this machine?

Best, Cyril

I'm using version: 1.2.4, should I try the latest one? The automaton is over the log semiring (I specify --arc_type=log on fstcompile, which I believe is the way to do it) No, fstshortestdistance does not finish, and fstpush does not finish either.

thanks so much for your help Cyril, you are providing a great service

Daniel

The issue is that the shortest distance does not converge in the log semiring when the automaton is not stochastic. You need to convert you machine to the tropical semiring, apply minimization and then convert back to the log.

Cyril

thanks so much for your help again

Daniel

thanks so much for your help, this is a great library

Daniel

Does anybody know if there is an implementation (in openfst or other library) of the pre-determinization algorithm described in the paper "An efficient pre-determinization algorithm" by Allauzen and Mohri? I would really appreciate if someone has already implemented this algorithm and can share with us.

Please note that "fsmdeterminize -l" and "fstdeterminize --subsequential_label" don't implement this.

Thanks

Best, Cyril

Does fstdeterminize expect the input transducer to be epsilon normalized? When I give a specific functional transducer (not epsilon normalized), fstdeterminize terminates but the resulting transducer is not deterministic (due to two input epsilon arcs leaving the same state). When I determinize it twice, the result is then deterministic. An example functional fst is:

0 1 a x 0 2 a y 1 3 b eps 2 3 c z 2 3 d w 3 4 eps a 4 5 s eps 3 5

The resulting transducer after determinization is:

0 1 a eps 1 2 b x 1 3 c y 1 4 d y 2 5 eps a 2 3 6 eps z 3 7 eps z 4 6 eps w 4 7 eps w 5 8 s eps 6 8 s a 7 8Specifically in this case, I don't understand why the determinization algorithm creates an extra final state (7), while it can just make the state 6 final. Is this an expected behaviour?

Thanks

`--subsequential_label`

option to specify an input label other than epsilon for the transition form 3 to 7 leading to a machine that is indeed deterministic.
In your example, it would indeed be enough to make 6 final but in general this is difficult to determine since there might be other paths to 6 that would be affected by making 6 final.

If you do not want to use the `--subsequential_label`

, an other solution to obtain a deterministic transducer would be to subsequently apply to the output of determinization: Encode (labels),
Determinize, Decode.

Best, Cyril

Hasim

Thankyou for making OpenFst available - I have happily downloaded and built it, but using from my own code with gcc (4.3.2 or 4.1.2) and -Wall (which is our standard) I get many warnings about signed/unsigned comparisons. I know I can eliminate these with an extra -Wno-sign-compare switch, but I prefer code to compile as cleanly as possible (and need it to do so in order to persuade others in the group to adopt the library). Have you any plans to make OpenFst signed-ness clean?

To reproduce just add -Wall to compilation of fst_test

cd openfst-1.2.5/src/test g++ -DHAVE_CONFIG_H -I./../include -D_REENTRANT -g -O2 -MT fst_test.o -MD -MP -MF .deps/fst_test.Tpo -c -o fst_test.o fst_test.cc -Wall

I want to do composition of L, G. But Determnize is not terminate.

My shell command is like below; fstcompose L.fst G.fst LG.com.fst fstepsnormalize LG.com.fst LG.epn.fst fstdeterminize LG.epn.fst LG.det.fst fstencode --encode_labels LG.det.fst codex LG.encode.fst fstminimize LG.encode.fst LG.min.fst fstencode --decode LG.min.fst codex LG.decode.fst fstarcsort LG.decode.fst LG.final.fst

I did arcsort input lable of G.fst, output label of L.fst. And G was generated from 1.3M entry trigrams.

Is there any mistake in my script?

When i use openfst v1.0, determinize is complete. And, If i use at&t fsmdeterminize, it is also working well. Do I need additional argument or optional parameter?

Below is my fst size.

L.fst # of states 43484 # of arcs 50264

G.fst # of states 540723 # of arcs 2421839

LG.com.fst (fstcompose L.fst G.fst LG.com.fst) # of states 8040320 # of arcs 11991899

LG.epn.fst (fstepsnormalize LG.com.fst LG.epn.fst) # of states 7502435 # of arcs 12709467

Another question is, Can I use lookahead composition filter using shell command?

Thanks,

I was wondering if there is a way to compute a shortest path closest to a certain weight over the tropical sr. So instead of getting the absolute shortest path with the lowest total score I'd be interested in getting the path that is closest to a total score of e.g. 5.

Related to that is there a tropical semiring variant which orders paths according to their absolute values instead of their raw values and that is still distributive (i.e. shortest path still works)? So if I have paths with total weights -7, 1, 4, 6, 8 I'd like to have them ordered as 1, 4, 6, -7, 8, i.e. the path with a score of 1 should be the shortest path. Thanks for your help!

Cheers,

Roland

`fstcompose`

automatically connects the output so you should use `--connect=false`

if you want to see the non coaccessible paths.
Cyril

Does it work fine if i use such a fst(the result of the first composition) in further operations, such as compose it with other fst? because i am unable to draw or print it to make sure! I appreciate your support. Memo

lexcompre -l example.lab -s "test" | fsmcompose - test.fst | lexfsmstrings -l example.lab

Could compact fsts be composed, intersected etc? I couldn't find related methods. Did I missed something?

And I came across a strange behavior. When start state isn't defined it causes segfaults with some operations (concatenation, union).

It seems that the incremental FST redundancy reduction considerably outperforms the usual batch optimization approach.

The demonstration implementation of the presented algorithm is available at http://luks.fe.uni-lj.si/spider/

We believe that the incremental algorithm could be generalized and implemented also as part of the OpenFST toolkit.

Please let me know if you need any additional explanation regarding this algorithm.

Kind Regards!

Simon Dobrišek

It turned out that the problem is the incorrect order of keywords in the function declaration; the original declaration in vector-fst.h looks like
template

after this change everything compiles allright.

How can I compile under MS Windows with the following options "--enable-bin=yes --enable-compact-fsts=no --enable-const-fsts=no --enable-far=no --enable-lookahead-fsts=no --enable-pdt=no --with-icu=yes" ?

#include <iostream> #include <fstream> #include <stdlib.h> #include <string> #include <cstring> #include <fst/fstlib.h> #include <fst/compact-fst.h> #include <fst/extensions/far/farlib.h> using namespace fst; int main() { int fstnum = 3; string fstnames[fstnum]; fstnames[0]="TA"; fstnames[1]="TF"; fstnames[2]="TB"; //------------------Generate [population] of FSTs--------------------// // A vector FST is a general mutable FST StdVectorFst TA; // Adds state 0 to the initially empty FST and make it the start state. TA.AddState(); // 1st state will be state 0 (returned by AddState) TA.SetStart(0); // arg is state ID // Adds two arcs exiting state 0. // Arc constructor args: ilabel, olabel, weight, dest state ID. TA.AddArc(0, StdArc(0, 0, 0, 0)); // 1st arg is src state ID TA.SetFinal(0,0); // SetFinal (StateID, weight) //We can save this FST to a file with: TA.Write("onestate/TA.fst"); StdVectorFst TF; // Adds state 0 to the initially empty FST and make it the start state. TF.AddState(); // 1st state will be state 0 (returned by AddState) TF.SetStart(0); // arg is state ID // Adds two arcs exiting state 0. // Arc constructor args: ilabel, olabel, weight, dest state ID. TF.AddArc(0, StdArc(0, 1, 0, 0)); // 1st arg is src state ID TF.AddArc(0, StdArc(1, 0, 0, 0)); // 1st arg is src state ID TF.SetFinal(0,0); // SetFinal (StateID, weight) //We can save this FST to a file with: TF.Write("onestate/TF.fst"); //------------------randomly select 2 FSTs----------// //--------------------Compose 2 FSTs----------------// // The FSTs must be sorted along the dimensions they will be joined. // In fact, only one needs to be so sorted. // This could have instead been done for "model.fst" when it was created. ArcSort(&TA, StdOLabelCompare()); ArcSort(&TF, StdILabelCompare()); // Container for composition result. StdVectorFst result; // Create the composed FST. Compose(TA, TF, &result); result.Write("onestate/TB.fst"); //---------compute and store interaction network complexity-------// //---------print interaction network---------------// int cmdtst = system(NULL); if (cmdtst != 0 ) { for (int i=0; i<fstnum; i++) { string sh1="fstdraw onestate/.fst onestate/.dot"; string sh2="dot -Tps onestate/.dot > onestate/.ps"; string sh3="evince onestate/.ps &"; string fstname = fstnames[i]; sh1.insert(17,fstname); sh1.insert(33,fstname); sh2.insert(18,fstname); sh2.insert(36,fstname); sh3.insert(16,fstname); system(sh1.c_str()); system(sh2.c_str()); system(sh3.c_str()); } } else { cout << "No command interpreter available \n"; }; return 0; }

rocotillo:testdata sproatr$ fstinfo TOKENIZER | sed 3q fst type vector arc type standard input symbol table none rocotillo:testdata sproatr$ farcreate TOKENIZER tokenizer.far rocotillo:testdata sproatr$ farinfo tokenizer.far ERROR: FstHeader::Read: Bad FST header: tokenizer.far: ERROR: ReadSTTableHeader: error reading file: tokenizer.far ERROR: Error reading FAR: tokenizer.far ERROR: GenericRegister::GetEntry : dlopen(-arc.so, 1): image not found FATAL: No operation found for "FarInfo" on arc type

So, the fst TOKENIZER is well-formed, "farcreate" apparently works, but whatever it's creating ain't happy.

`src/include/extensions/far/sttable.h`

to:

WriteType(stream_, static_cast<int64>(positions_.size()));

This will be fixed in the next release. Thanks to Richard for his help on this!

typedef ArcTpl

as respective analogs to:

typedef ArcTpl

Is it possible and reasonable to set the kNotWeighted property to True for each transducer?

Is there a simpler and/or better way of accomplishing this?

As a disclaimer, I may have a fundamental misunderstanding of the weights themselves (or of other essential concepts involved for that matter).

Thanks.

Shouldn't it compile also in the second case?

If anyone could provide a brief skeleton for accomplishing this I would appreciate it.

Thanks.

Thanks.

But it seems they have some distinctions. E.g. when I create language model using grmcount/grmake there are some arcs with weight "Infinity". I didn't see such weights in openfst library. Or maybe I'm doing something wrong?

FAST_LOG_PROB_ARC_SELECTORwas accidentally included as option for fstrandgen.cc. It is not defined anywhere else as far as I see.

My first compilation succeeded on Cygwin after I commented out that option. Cheers

`FAST_LOG_PROB_ARC_SELECTOR`

should have been defined in the `enum`

in `script/randgen.h`

.

g++ -DHAVE_CONFIG_H -I./../include -g -O2 -MT replace.lo -MD -MP -MF .deps/replace.Tpo -c replace.cc -fno-common -DPIC -o .libs/replace.o
/usr/include/c++/4.0.0/tr1/hashtable: In member function 'typename std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::const_iterator std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::find(const Key&) const [with Key = int, Value = std::pair***** [replace.lo] Error 1

http://openfst.cs.nyu.edu/twiki/bin/view/FST/CompilingOnMacOSX

did the relevant updates, and was fine compiling 1.1

`hashtable`

header file is available at FST.CompilingOnMacOSX.

is there anyone encounter the same problem?

As suggested a year ago for openfst-1.1, I did:

./configure CXX=g++-4.exe CC=gcc-4.exe --prefix=$HOME/local make #make install # <--- I haven't made it here though.Make aborts with the error message:

g++-4.exe -g -O2 -o fstarcsort.exe fstarcsort.o -ldl ../lib/.libs/libfst.a ../script/.libs/libfstscript.a /usr/lib/gcc/i686-pc-cygwin/4.3.4/libstdc++.dll.a -L/usr/lib/gcc/i686-pc-cygwin/4.3.4 -L/usr/lib/gcc/i686-pc-cygwin/4.3.4 ../script/.libs/libfstscript.a(fst-class.o):fst-class.cc:(.text$_ZN3fst16CompatPropertiesEyy[fst::CompatProperties(unsigned long long, unsigned long long)]+0x143): undefined reference to `fst::PropertyNames' ~~~----ETCETERA----~~~ collect2: ld returned 1 exit status make[3]: *** [fstarcsort.exe] Error 1 make[2]: *** [all-recursive] Error 1 make[1]: *** [all-recursive] Error 1 make: *** [all] Error 2so it looks as if the linker doesn't find the definition of CompatProperties.

I must add that the following warning occurs twice during Make:

libtool: link: warning: undefined symbols not allowed in i686-pc-cygwin shared librariesOnce here:

/bin/sh ../../libtool --tag=CXX --mode=link g++-4.exe -g -O2 -version-info 0:0:0 -o libfst.la -rpath /usr/local/lib compat.lo flags.lo fst.lo properties.lo symbol-table.lo util.loand once again here:

/bin/sh ../../libtool --tag=CXX --mode=link g++-4.exe -g -O2 -version-info 0:0:0 -o libfstscript.la -rpath /usr/local/lib arcsort.lo closure.lo compile.lo compose.lo concat.lo connect.lo convert.lo decode.lo determinize.lo difference.lo draw.lo encode.lo epsnormalize.lo equal.lo equivalent.lo fst-class.lo info.lo intersect.lo invert.lo map.lo minimize.lo print.lo project.lo prune.lo push.lo randequivalent.lo randgen.lo relabel.lo replace.lo reverse.lo reweight.lo rmepsilon.lo script-impl.lo shortest-distance.lo shortest-path.lo synchronize.lo text-io.lo topsort.lo union.lo weight-class.lo

www.furui.cs.titech.ac.jp/~dixonp/openfstwin-1.2-210810.zip

It includes prebuilt 32-bit binaries that should also work under cygwin. It's all 1.2 code the 1.1 directory name still needs to be changed.

if (ReplaceFst<StdArc>(pairlabelfsts, slabel).CyclicDependencies()) { ...}

returns true, apparently detecting the cyclic dependencies correctly.

Question: Have I used the proper idioms for testing for cyclic dependencies and doing the actual replacement?

#include <fst/fstlib.h> #include <iostream> using namespace fst ; int main() { std::cout << "Hello, world!\n" ; int a = 97 ; // code point values int b = 98 ; // example with cyclic dependencies // StdVectorFst is a typedef for VectorFst<StdArc> StdVectorFst fst1, fst2, fst3 ; // define fst1, a subnetwork (will be arbitrarily associated with -1) fst1.AddState() ; // return 0 fst1.SetStart(0) ; fst1.AddState() ; // returns 1 fst1.AddState() ; // returns 2 fst1.SetFinal(2, StdArc::Weight::One()) ; fst1.AddArc(0, StdArc(a, a, StdArc::Weight::One(), 1)) ; // fst1 refers to fst2 (associated below with -2) fst1.AddArc(1, StdArc(0, -2, StdArc::Weight::One(), 2)) ; // src i o w dest // define fst2, a subnetwork (will be arbitrarily associated with -2) fst2.AddState() ; // return 0 fst2.SetStart(0) ; fst2.AddState() ; // return 1 fst2.AddState() ; // return 2 fst2.SetFinal(2, StdArc::Weight::One()) ; fst2.AddArc(0, StdArc(b, b, StdArc::Weight::One(), 1)) ; // fst2 refers to fst1 (associated below with -1) fst2.AddArc(1, StdArc(0, -1, StdArc::Weight::One(), 2)) ; // define fst3, the base network (will be arbitrarily associated with -3) fst3.AddState() ; // return 0 fst3.SetStart(0) ; fst3.AddState() ; // return 1 fst3.AddState() ; // return 2 fst3.SetFinal(2, StdArc::Weight::One()) ; // add arcs with "references" to subnetworks // output-side -1 will be associated with fst1 // output-side -2 will be associated with fst2 // The label references to subnetworks are on the Output Side. // The corresponding Input-Side labels can be anything (here 0 for epsilon). fst3.AddArc(0, StdArc(0, -1, StdArc::Weight::One(), 1)) ; fst3.AddArc(1, StdArc(0, -2, StdArc::Weight::One(), 2)) ; // src i o w dest // set up the vector associating arbitrary numbers with the three networks // -1 refers to fst1 vector< pair< StdArc::Label, const Fst<StdArc> * > > pairlabelfsts ; pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-1, &fst1) ) ; // -2 refers to fst2 pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-2, &fst2) ) ; // -3 is the base network pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-3, &fst3) ) ; StdArc::Label slabel = -3 ; // specifies the "base" Fst, which is fst3 fst3.Write("before.fst") ; // the network with references // Using delayed ReplaceFst if (ReplaceFst<StdArc>(pairlabelfsts, slabel).CyclicDependencies()) { std::cout << "Has cyclic dependencies.\n" ; } else { std::cout << "No cyclic dependencies.\n" ; StdVectorFst expandedFst ; // the result // slabel (here -3) is used to find the base Fst supplied in pairlabelfsts expandedFst = ReplaceFst<StdArc>(pairlabelfsts, slabel) ; expandedFst.Write("intermediate.fst") ; // the network just after ReplaceFst() RmEpsilon(&expandedFst) ; expandedFst.Write("after.fst") ; // the network after RmEpsilon() } std::cout << "Good-bye, world!\n" ; }

Best, Cyril

I am currently using the OpenFST library, and I was thinking of multithreading my program, and I was curious as to whether the OpenFST library currently supports multithreading. Thank you!

Albert

I've gone through looking at the implementation and from what I can tell the OpenFST library currently is not multithreading safe. I've resorted to doing a deep copy of FSTs for each thread. Unfortunately, doing a deep copy was not as straightforward as I thought it would be, or at least I couldn't figure out how to do it elegantly and quickly (The FST I am reading in is very large, and trying to read in the FST multiple times takes much longer than I want). I tried doing a deep copy using Map() with the IdentityMapper as mentioned in a previous post, but found that this only does deep copies of the arcs (given the copy constructor of the Arc and Weight do deep copies), and does a shallow copy of the symbol table, which creates problems for multithreading (when multiple threads try to access the symbol table at once).

I'm not sure if there is an easier way to do this, but here is what I did to change the OpenFST code to support deep copying of the symbol table. I added a new enum code in map.h to MAP_SYMBOLS_ACTION as well as a couple of lines of code for supporting symbol table deep copying, added a function to do a deep copy of the symbol table to symbol-table.h. Finally, I added a Mapper almost identical to the identity mapper, but returns the new enum code for InputSymbolsAction() and OutputSymbolsAction().

Hope this helps anybody trying to multithread!

its size is about 2.5Mb. then i run the below command successfully:

fstcompile --isymbols=a.sym --osymbols=a.sym tmp1.fst fstcompile --isymbols=a.sym --osymbols=a.sym tmp2.fst

then i want to compose them and run this:

fstcompose tmp2.fst tmp3.fst tmp4.fst

when the symbol number is little the composition is done but in some symbol files the symbol number is very large, for example 170655. in this case, the memory is fulled rapidly and the CPU doesn't work any more. this is my system properties: CPU: Pentium 4 Dual core 2.26GHz Ram: 4G Swap: 15G OS: CentOS 5.3

could you please give me suggestions about the case? Thanks a lot -Ali

fstarcsort -sort_type=olabel tmp2.fst > sort2.fst

fstarcsort -sort_type=ilabel tmp3.fst > sort3.fst

fstcompose sort2.fst sort3.fst > tmp4.fst

did you figure out what the problem was ?

I make a composition of 3 transducers F o G o H and I extract the ShortedPath on the resulting transducer. Is there a way to have access to the arcs in F, G and H corresponding to the arcs in the ShortestPath transducer ? I would like to have access to the output symbols of the arcs in F for example.

thank you,

-- Chris

There may be better solutions to your problem provided by the library but a quick solution is:

1. remove the weights of the shortestpath fst, S = shortest(F o G o H), and project it onto the input and output labels

fstmap -map_type=rmweight s.fst | fstproject > s.in.fst fstmap -map_type=rmweight s.fst | fstproject -project_output > s.out.fst

2. compose shortest input with F and project the result onto the output labels to get the candidate paths in F

compose shortest output with H and project the result onto the input labels to get the candidate paths in H

fstcompose s.in.fst f.fst | fstproject -project_output > f.out.fst fstcompose h.fst s.out.fst | fstproject > h.in.fst

3. compose the candidate paths with G (F_out o G o H_in) to obtain the paths in G that survive during the composition

fstcompose f.out.fst g.fst | fstcompose - h.in.fst > internal.fst

4. finally apply shortestpath to this fst to obtain the path in G internal to the shortest path in F o G o H

fstshortestpath internal.fst > shortest.internal.fst

This final transducer should provide the internal path you need. Note that the weights of this transducer are those obtained during the composition not the original weights of G.

Best, Dogan

I am not entirely sure that I got your problem. I mean the internal path in G should be as complex as S.

Can you post a simple example where the above procedure fails?

Best, Dogan

Roland

I'd like to convert phoneme lattices into WFSTs and use a grammar transducer to obtain a word lattice out of it. The thing is, that I want to keep additional information e.g. time stamps, acoustic scores, etc. while doing so. Note, that I do not want to obtain any likelihoods on paths in the first place. Instead, I would traverse the final lattice using those accumulated values to determine a likelihood for each transition and user-defined scaling factors myself.

I see two possibilities to keep track on times/scores - by storing them on arcs or using the string semiring to push just arbitrary information in the weights. When I use the string semiring, I could simply push there as much as I want to,and do a composition with my grammar. But then I am somewhat restricted: composition is not supported, because the string semiring is not commutative. I was thinking to rewrite the weighted composition algorithm from the paper, just change the weighting part, since I do not care about the order of elements within my strings anyways. But maybe I could keep my input fst as log/tropical semiring type and attach my scores/time stamps in the arcs. Would it then be possible to use the existing composition algorithm and still copy my attached information into the resulting FST? Maybe it is all much easier than what I thought of here right now, then I'd be glad if you give me some hints Thanks a lot in advance, Stefan

As far as I know the default arcs StdArc is using Tropical weight. How can I change that into Probabilty weight, so that transition weights are multiplied and not summed up?

I was looking for something predefined, but couldn't find anything...

Thank you in advance!

There is a counting utility that includes a real semiring implementation and arc registration with the standard command line tools [here][1]

[1]: http://www.furui.cs.titech.ac.jp/~dixonp/wfstcount.zip]

http://www.cs.nyu.edu/~mohri/postscript/hbka.pdf

and the figure for the minimization example, pg. 8, fig. 5(c) seems quite wrong. particularly the 'minimized' version is not equivalent to the original transducer. in the other reference

http://www.cs.nyu.edu/~mohri/pub/csl01.pdf

an identical transducer, albeit with different weights on a couple of the arcs, appears very differently in its minimized form.

i verified that the second version referenced in the latter link produces the expected output but it would be nice to confirm this, or alternatively to understand why they are different.

ERROR: FstMainRegister::GetMain: log-arc.so: cannot open shared object file: No such file or directory ERROR: fstshortestpath: Bad or unknown arc type "log" for this operation (ShortestPathMain)

I have checked my $LD_LIBRARY_PATH, it already includes /usr/local/lib, how come it still cannot find the shared object? Thanks in advance

The issue is that the shortest path algorithm does not work over the log semiring.

Hence support for `LogArc`

is not compiled into the `fstshortestpath`

utility and the code tries to locate a shared object containg the definition of that arc type.

Best, Cyril

For the shared object to be detected and loaded a name of the form arc-type.so worked but arc_type.so did not.

I put sample code for a real semiring arc type here

The contain of `my-arc.cc`

(the source for `my-arc.so`

) should look like:

#include "nlp/fst/lib/fst-inl.h" #include "nlp/fst/lib/const-fst-inl.h" #include "nlp/fst/script/register.h" #include "nlp/fst/script/fstscript.h" #include "my-arc.h" namespace fst { namespace script { REGISTER_FST(VectorFst, MyArc); REGISTER_FST(ConstFst, MyArc); REGISTER_FST_CLASSES(MyArc); REGISTER_FST_OPERATIONS(MyArc); } }

Hope this will work for you.

Cyril

`-inl`

's, the `lib/`

's and the `nlp/`

's. This gives:

#include <fst/fst.h> #include <fst/const-fst.h> #include <fst/script/register.h> #include <fst/script/fstscript.h>

`/usr/local/lib`

to your `LD_LIBRARY_PATH`

environment variable:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/libBest, Cyril

vector < pair<StdArc::Label, StdArc::Label> > vect ; vect.push_back(pair<StdArc::Label, StdArc::Label> ( (StdArc::Label) 5, (StdArc::Label) 0 )) ; Relabel(fstp, vect, vect) ;

where fstp is a pointer to a StdVectorFst. ilabels and olabels originally labeled 5 are relabeled correctly as 0, possibly introducing input-side, output-side and double-sided epsilons; but (here's the problem) the Properties kIEpsilons, kNoIEpsilons, kOEpsilons, kNoOEpsilons, kEpsilons and kNoEpsilons are not updated.

At least that's what seems to be happening.

Access control:

- Set ALLOWTOPICCHANGE = AllAuthUsersGroup

Edit | Attach | ~~Watch~~ | Print version | History: r1 | Backlinks | Raw View | WYSIWYG | More topic actions

Topic revision: r1 - 2014-04-21 - CyrilAllauzen

Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding TWiki? Send feedback

Ideas, requests, problems regarding TWiki? Send feedback