- OpenFst Forum 2008 Archive
- Patch for GNU autotools including libtool for OpenFst 20080422
- Determinize, Sequentialize, Subsequentialize
- Determinize and functional transducers
- Request: Determinize in place
- StateIterator and the Start state
- Failure transitions
- Interactive manual testing of an FST
- Easy way to query the number of paths in an FST?
- About Factoring Algorithm of WFST
- Here is a patch to use CMake for builds (tested on Linux and Mac OS X)
- SymbolTable::size()
- Efficient Way of Taking the Union of Several FSTs
- possible bug in fstdeterminize in case of log semiring
- LM using FST
- compilation gotcha with spaces in directory names
- Input and output alphabets
- Interpreting complement character ranges
- Mindtuning: value of separate input and output symbol tables?
- "Normalizing" arc/final weights on an FST?
- Natural code for printing all strings accepted by an FST?
- Why is TropicalWeight considered standard?
- Compiler error on Determinize().
- New Release of OpenFst?
- Another newbie question

**toautotools_openfst_20080422_20081227.diff**

As I have stated in a previous post, to apply and use the patch, say to a user subdirectory busr so that one does not need administrator privileges to install,

patch -p1 -E < ../toautotools_openfst_20080422_20081227.diff autoreconf --install --force ./configure --prefix=$HOME/busr CPPFLAGS="-I/opt/local/include" LDFLAGS="-L/opt/local/lib" make make check make install

I have enabled make check using tests already provided in OpenFst so that one can test whether modules can be dlopened. All tests pass for all platforms except Cygwin on Windows XP. Even for Cygwin on Windows XP, this patch enables a dynamic linked library to be built such that OCRopus is able to use it to pass its own OpenFst tests. (There is an additional failure of one OCRopus test, test-fst-search2.lua, on Mac OS X and FreeBSD.)

My preliminary investigation for Cygwin on Windows XP appears to show that something is going wrong with reinterpret_cast-ing the read() function so that it can be recovered using a map in both fst/lib/register.h and fst/bin/main.h. The pointer to this function appears to be immediately recovered if the map is accessed right after it is stored, but a later GetEntry() returns a null pointer.

I believe that using GNU autotools to provide dynamic linked libraries is an essential step to helping more wide distribution of OpenFst through standard platform package or port managers. I think the current patch just about finishes this task other than the above problem for Cygwin. I also have not tested this patch on any 64-bit platforms other than an Intel Macbook running Mac OS X.

Any mind-tuning that you could offer would be much appreciated.

Ken

In the weighted automaton case, the implementation follows exactly Mohri's paper. The weighted transducer is dealt by considering weighted transducer over seminiring K as weighted automaton over the semiring obtained by taking the cross product of the string semiring and the semiring K.

Cyril

Thanks,

Ken

Best, Cyril

Thanks,

Ken

In my own work (a programming language built on top of OpenFst) an assignment statement like

$net = a*b+[w-z]? ;

causes the right-hand side to be evaluated (by calling various functions in the OpenFst library) and in a symbol table the name $net is bound to a Java Fst object that stores (as a Java long int) the pointer to the resulting C++ Fst object. That pointer is my handle from the Java world to the Fst object in the C++ universe. If I then type

$net2 = $net ;

then $net2 becomes an alias for the same Java Fst object that contains the pointer to the original C++ Fst object. (If I really want to make a copy, I've got a copy function that relies on the Map() function in the OpenFst library.)

With RmEpsilon(&A), I can remove the epsilons from the original C++ Fst object in place, and both $net and $net2 still reference the same Java Fst object that still contains the pointer to the original C++ Fst object. If I could (at least superficially) also Determinize() that original Fst object in place, life would be a bit easier for me. With the existing Determinize(A, &B), a new C++ Fst object B is created, which usually causes my language to create a new Java Fst object to wrap it; and I've got to worry about changing the pointer in the original Java Fst object (the value, in the symbol table, of $net and any aliases like $net2) and deleting the original C++ object A to prevent a kind of memory leak.

So I now understand why Determinize(A, &B) is as it is, but I do still suggest that I and perhaps others would find a new Determinize(&A) useful and intuitively parallel to RmEpsilon(&A) and Minimize(&A), so that all could be performed "in place".

Best,

Ken

I understand that what you describe is a rather unsastisfying way of wrapping. However, I not sure I understand why this is the only of wrapping. Anyhow, you can easily add a Determinize(&A) as follows:

template <class Arc> void Determinize(MutableFst<Arc> *fst) { *fst = DeterminizeFst<Arc>(*fst, DeterminizeFstOptions(CacheOptions(true, 0))); }

Best, Cyril

for (StateIterator<StdFst> siter(fst); !siter.Done(); siter.Next()) { StateId state_id = siter.Value() ; ... }

is the first state_id returned from the Iterator always the Start state of the Fst?

Thanks,

Ken

Best, Cyril

Great effort to build this library - there are so many uses - it's very nice to have such knowledgeable people building this for everyone. Thank you!

I'm interested in doing some language modeling with this library which I anticipate will require backoff weights that are commonly going to build paths which will be "better" than the proper path for certain n-grams.

I know you guys implemented failure transitions in the GRM library for LM purposes, and later an algorithm to replace them with epsilon transitions along with some compression heuristics to reduce the size of the final machines to address this exact problem.

My question is: What's the best route to take to implement this for the OpenFST project? If I take this on myself... should I create a new FST class to implement failure transitions? Or can I get away with just building a new Arc class which can trick the search functions for composition?

Thanks again for your help!

Chris

There is an undocumented support for failure transitions in composition. This feature is not documented and the handling of failure transitions will change significantly in the next version of the library.

For now, you can label failure transitions with `kPhiLabel`

(you will need to use the C++ library for this, Fsts containing transitions labeled by `kPhiLabel`

cannot be written to a file).

To perform composition with failure transitions, you need to use

ComposeFst<Arc>(fst1, fst2, ComposeFstOptions<COMPOSE_FST1_PHI>())when

`fst1`

contains transitions labeled by `kPhiLabel`

or ComposeFst<Arc>(fst1, fst2, ComposeFstOptions<COMPOSE_FST2_PHI>())when

`fst2`

contains transitions labeled by `kPhiLabel`

.
Best, Cyril

Out of curiosity, what's the new implementaiton going to be? Something along the lines of a custom Arc/State/Fsm class?

Chris

I noticed that during composition with failure transitions, the composition does not chase down the transitive closure of both epsilon and failure transitions.

You can reproduce the behavior by having a state with a failure transition lead to a state with an epsilon transition - which in turn leads to a state with a real symbol arc on it. For example an FSA with the following makeup: 0 1 -2 -2 1 2 0 0 2 3 1 1 3

When you do composition with an FST containing only that real symbol (symbol 1 in the above case) - I would expect it to yield a connected machine (failure -> epsilon -> symbol) - which it does not. Instead it looks through the state with the failure transition and sees no matching symbol (only an epsilon transition) and aborts composition.

I tried to fix the problem, but couldn't quite figure out a good way to do so. The logic to chase down the transitive closure of failure transitions is making assumptions on the structure of the FST (deterministic w.r.t. failure transitions, or at least that they're all equally good, because it takes the first one it finds which yields a match) which probably means any fix I make will be short lived anyway.

If you'd like more information, please let me know, otherwise I'll just try another workaround - but I don't think fixing it in the current implementation is worthwhile... do you?

Chris

Indeed, I should have mentioned this: failure and epsilon transitions do not work together. You should also have at most 1 failure transition at a given state.

Cyril

1. Get the input string typed by the user.

2. Build the input string into a one-path acceptor (call it InputACC)

3. StdVectorFst ResultFST ;

Compose(InputACC, MainFST, &ResultFST) ; // compose for generation

Project(&ResultFST, PROJECT_OUTPUT) ; // take the output projection

if (number of paths in ResultFST not infinite or huge) {

// print strings encoded by ResultFST

}

This testing would be put in a loop, allowing the user to manually enter as many test input strings as desired before terminating the loop. To implement analysis, one could 1) invert the MainFST before performing the steps above, or 2) do the following:

Compose(MainFST, InputACC, &ResultFST) ; // compose for analysis

Project(&ResultFST, PROJECT_INPUT) ; // take the input projection

QUESTION: Is there a better, more efficient, way to implement such manual testing? Perhaps using the lazy operations ComposeFst() and ProjectFst() ?

Thanks,

Ken

You could benefit from lazy operations when the number of strings in ResultFst is large and you don't want to display all these strings (or even don't want to compute all of ResultFst). Instead, you could rewrite your code to generate all paths to work for a non-expanded FST and to stop once a fixed number of strings have been found (you could even ask to the user if he wants to see more strings then).

Best, Cyril

If you have an FST, is there an easy way to query the number of paths? Thanks, Ken |
KennethRBeesley | 13 Jun 2008 - 14:10 |

The natural way is to use the shortest-distance algorithm: 1. Use Map to turn your FST into an FST over the log semiring setting the weight of every transition and the final weight of final states to LogWeight::One(). 2. Call ShortestDistance with reverse == true. Let d be the value return in the distance vector for the initial state. d is then the -log of the number of successful paths in the FST. Best, Cyril |
CyrilAllauzen | 16 Jun 2008 - 12:56 |

Thanks, Cyril. I suspect that my terminology was wrong and/or my C++ skills are still shaky. What I really need is the number of complete paths in an FST, from the start state to a final state. Following these instructions (as best I can), I seem to get the number of paths from the start state to any state (final or not). I hesitate to post my code here, but I'd be more than happy to send it to you or anyone else.Ken |
KennethRBeesley | 17 Jun 2008 - 16:48 |

The key is to use the reverse option: `ShortestDistance(fst, &distance, true)` . For any state q, we then have that `distance[q]` is the sum of the weights of all the paths from `q` to a final states. Hence, `distance[fst.Start()]` is what you want (but you should always check before that `distance.size() > fst.Start()` ).Cyril |
CyrilAllauzen | 18 Jun 2008 - 11:37 |

Cyril, I am using the reverse option = true. Perhaps I'm not properly setting the exit weight of the final states to LogWeight::One(). Will the following Map do it? // Mapper from StdArc to LogArc. Thanks, Ken |
KennethRBeesley | 25 Jun 2008 - 13:17 |

Following instructions from Cyril, here's a pathCount() function, and a trivial main() for testing, that work for me. If it gets garbled in the posting, I'd be glad to forward it by email to anyone interested. Ken #include "fst/lib/fstlib.h" #include <iostream> // iostream gives access to cout using namespace fst ; // From map.h // Mapper to map all non-Zero() weights to One(). //template <class A, class B = A> //struct RmWeightMapper { // typedef A FromArc; // typedef B ToArc; // typedef typename FromArc::Weight FromWeight; // typedef typename ToArc::Weight ToWeight; // B operator()(const A &arc) const { // ToWeight w = arc.weight != FromWeight::Zero() ? // ToWeight::One() : ToWeight::Zero(); // return B(arc.ilabel, arc.olabel, w, arc.nextstate); // } // MapFinalAction FinalAction() const { return MAP_NO_SUPERFINAL; } // uint64 Properties(uint64 props) const { // return props & kWeightInvariantProperties | kUnweighted; // } //} typedef VectorFst<LogArc> LogVectorFst ; typedef Fst<LogArc> LogFst ; typedef LogArc::StateId StateId ; // pathCount return the number of paths in an FST // (this version assumes that the input is a StdVectorFst) int pathCount(const StdVectorFst &sfst) { // if the fst has loops, then the language or relation // encoded by the FST is infinite, just return -1 if (sfst.Properties(kCyclic, true)) { return -1 ; } // copy the FST input to a new unweighted network, // over the Log Semiring LogVectorFst lfst ; // for each original StdArc, if the weight is _not_ // TropicalWeight::Zero() then change it to LogWeight::One(), // else (when the weight is TropicalWeight::Zero()) change it to // LogWeight::Zero(). This is handily done with // RmWeightMapper(), which is in the library. Map(sfst, &lfst, RmWeightMapper<StdArc, LogArc>()) ; vector<LogArc::Weight> distance ; ShortestDistance(lfst, &distance, true) ; // from the distance vector, get the weight for the start state LogArc::Weight w = distance[lfst.Start()] ; // w.Value() is the -log of the number of paths // cout << "Weight from pathCount(): " << w << endl ; // so make w positive and get the exp() int paths = exp((double)(-1.0 * w.Value())) ; // cout << "Paths from pathCount(): " << paths << endl ; return paths ; } // a trivial main() that tests pathCount() // int main() { // Create a little FST over the Tropical Semiring StdVectorFst fst ; fst.AddState() ; // will be state 0 fst.SetStart(0) ; fst.AddState() ; // will be state 1 fst.AddState() ; // will be state 2 // add some arcs fst.AddArc(0, StdArc(1, 1, 0.693, 1)) ; // arc from 0 to 1 fst.AddArc(0, StdArc(2, 2, 0.693, 1)) ; // arc from 0 to 1 fst.AddArc(0, StdArc(7, 7, 0.0, 1)) ; fst.AddArc(1, StdArc(3, 3, 0.693, 2)) ; // arc from 1 to 2 fst.AddArc(1, StdArc(4, 4, 0.693, 2)) ; // arc from 1 to 2 fst.AddArc(1, StdArc(5, 5, 0.693, 2)) ; fst.AddArc(0, StdArc(8, 8, 0.0, 2)) ; // Uncomment the following line to create a cycle/loop, // which makes the language or relation infinite. //fst.AddArc(1, StdArc(5, 5, 0.0, 1)) ; fst.SetFinal(2, 0.693) ; int count = pathCount(fst) ; cout << "Path Count from main(): " << count << endl << endl ; return 0 ; } |
KennethRBeesley | 03 Jul 2008 - 16:46 |

Hi Ken, I fixed your post. Cyril |
CyrilAllauzen | 09 Jul 2008 - 11:41 |

Cyril, Thanks again for your help in implementing the path-counting algorithm above. Could you or anyone help me understand how the code above returns the number of paths? Why does the weight of the start state, in the Log semiring, correspond to the number of paths.? In larger FSTs, some accuracy seems to be lost. In one of our examples, an FST reported (by the algorithm above) to have 1,048,580 paths, has, in fact 4 fewer paths. Not a big error, but it surprised one of our testers who expected to get an exact path count.Thanks, Ken |
KennethRBeesley | 24 Jun 2009 - 19:55 |

It uses the shortest-distance algorithm describe in FST.ShortestDistanceDoc. In the real semiring, when the weights of all the arcs are 1, the shortest-distance from the initial state to a state q is the number of paths from the initial state to q. However, since the libary does not implement the real semiring, we have to do the computation in the log semiring. In this very special case where the weights are integer and we always multiply by 1, going to the log actually leads to more numerical errors. There are two things you can try if you want to get better precision: 1. Use a smaller delta. Shortest-distance (like many algorithms in the library), use a delta parameter to determine whether two weights are "equal". By default, this delta is kDelta. This is to make the algorithm converge faster when they are cycles. Here, since the machine is acyclic, you could get away with a much much smaller delta. 2. Use doubles instead of floats. If reducing significantly delta does not help, the imprecisions might be due to floating point errors. You could try using doubles instead of floats by using ArcTpl<LogWeightTpl Finally, if this is very important to you, the best thing would probably be to implement the (+,x)-semiring over unsigned 64-bit integers. Cyril |
CyrilAllauzen | 25 Jun 2009 - 17:02 |

Hi, In this paper Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Speech Recognition with Weighted Finite-State Transducers. In Larry Rabiner and Fred Juang, editors, Handbook on Speech Processing and Speech Communication, Part E: Speech recognition. Springer-Verlag, Heidelberg, Germany, 2008. The factoring algorithm is mentioned but no enough details. Can you tell me where to find the code or detailed descrption of the factoring algorithm? Thanks in advance! Zhenyu Pan |
ZyPan | 30 May 2008 - 13:08 |

Performs build, tests, and installation. Could generate for Cygwin and even Visual Studio; might work with the former, but code is of course UNIX-specific, so won't build with the latter. Can't upload here, so download from: http://lemurz.org/io/wp-content/uploads/2008/05/OpenFst-beta-20080422-cmakep1.patch.gz |
TomMurray | 12 May 2008 - 17:42 |

Small update to install header files with 'make install': http://lemurz.org/io/wp-content/uploads/2008/05/OpenFst-beta-20080422-cmakep2.patch.gz | TomMurray | 12 May 2008 - 19:03 |

It's minor, but it would be nice to have a SymbolTable::size() function that returns the number of keys. | MarkusD | 23 Apr 2008 - 11:06 |

Hi, I'm trying to take the union of several thousands of FSTs. After a while the union becomes quite large and the union operation appears to take more and more time to complete. I tried removing epsilon transitions and determinizing after each union. This helps to reduce the time consumed by the union operation, however this time the overall operation (union+rmepsilon+encode+determinize+decode) takes much more time compared to the former case. I wonder if there is an efficient way of taking the union of several FSTs. Thanks in advance, Dogan |
DoganCan | 22 Apr 2008 - 12:57 |

Hi, When dealing with the union of a very large number N of FSTs, a good approach is two proceed in log N steps. The first step consists of computing the union of the (2i)-th and (2i+1)-th FSTs for each 0 < i < N/2, leading to a set of N/2 transducer for the next step. The final transducer is obtained as the union of 2 intermediary transducers at the log N step. Best, Cyril |
CyrilAllauzen | 22 Apr 2008 - 17:58 |

Hi, I noticed that the outputs of fstdeterminize and fsmdeterminize from AT&T FSM toolkit do not match in the case of log semiring. Here is a simple fst in log semiring with the mentioned problem and the results of two executables
I believe a previous discussion titled "fstdeterminize" is about the same issue where fstdeterminize changes the path weights in the log semiring. I tried the suggestion "The issue might be due to numerical instabilities. One solution would then be to use a smaller delta (in DeterminizeFstOptions)." with much smaller/larger deltas, but the result of fstdeterminize did not change. Could you please explain what I'm missing or how to correct if this is actually a bug. Best, Dogan |
DoganCan | 15 Apr 2008 - 10:44 |

Hi, The output of fsmdeterminize and fstdeterminize do not match because there are significant differences in the implementation. In that example, both outputs seem valid to me. However, changing the delta should have had an effect on the result. It appears there was a bug and the delta parameter was ignored when determinizing transducers (delta works as it is supposed to for acceptors). A new version of the library fixing this issue is available on the download page. Best, Cyril |
CyrilAllauzen | 17 Apr 2008 - 17:28 |

Hi, I used to use GRM to build LMs with FSM, is FST compatible with GRM ? If not, is there an alternative in FST ? Also, What is the equivalent of farcompilestrings & farprintstrings under FST? Thanks, Mohamed Noamany |
MohamedNoamany | 03 Apr 2008 - 20:50 |

Hi, Only the text representation of OpenFst and FSM are compatible. So, if you want to use a LM build by GRM in OpenFst, you need to convert it to the OpenFst format using `fsmprint` / `fstcompile` : fsmprint lm.fsm | fstcompile > lm.fstThere are no equivalent of farcompilestrings/farprintstrings in OpenFst. Best, Cyril |
CyrilAllauzen | 04 Apr 2008 - 10:50 |

JonathanB | 16 Oct 2009 - 11:16 | |

Hi Cyril, Would the far tool in OpenKernel do the trick? (edit) ..mmhh.. I take it back. I just noticed version 1.2 came out and it does have the far extension. cheers |
AMorenoDaniel | 13 Aug 2010 - 19:36 |

Low priority item for the todo list (which will be likely fixed with autoconf but good to put in the test suite): compiling in a directory with spaces in the name (allowed and quite possible under macos) causes $(PWD) in bin/Makefile to break. Symptom: localhost:~/cse733/sp08/794L tutorial materials/fst/OpenFst/fst fosler$ make ( cd lib ; make all ) make[1]: Nothing to be done for `all'. ( cd bin ; make all ) make[1]: * No rule to make target `"/Users/fosler/cse733/sp08/794L', needed by `fstarcsort.o'. Stop.make: * [all] Error 2Clear and easy workaround is to replace $(PWD) with a hard coded directory with escaped spaces. |
EricFoslerLussier | 26 Mar 2008 - 21:35 |

I've been using non-disjoint but different input and output alphabets, along with --keep_osymbols and --keep_isymbols. Two commands in particular are confusing here. fstinvert does not understand that the input and output symbol tables need to be swapped if the result is going to be correct, and fstproject does not understand that the alphabets in the result are going to be either both derived from the osymbols or both derived from the isymbols. Presumably the trouble is that these two commands are outliers compared to the rest of the library, so the infrastructure which they get from the library doesn't quite fit. |
ChrisBrew | 14 Mar 2008 - 16:40 |

This is actually a bug. It should do the right thing with the symbol tables for project and invert. The bug has been fixed and you can download the fixed version of the library on the download page. | CyrilAllauzen | 14 Mar 2008 - 19:24 |

Thankyou. There's an easy workaround: don't use different alphabets, which actually works for me. But better not to have to. |
ChrisBrew | 15 Mar 2008 - 10:49 |

Thankyou. There's an easy workaround: don't use different alphabets, which actually works for me. But better not to have to. |
ChrisBrew | 15 Mar 2008 - 10:49 |

Actually, a new bug was introduced in fstinvert in the version released on friday. A new version is now available on the download page fixing this bug. | CyrilAllauzen | 17 Mar 2008 - 11:51 |

I'm implementing a programming language based on regular expressions, which are interpreted to produce finite-state networks. The current interpreter, which is just starting to work, calls functions in the OpenFst library, though I've kept the design modular enough to use other interpreters based on other libraries. Interpreting character ranges: If I parse and interpret the following statement: $foo = b [aeiou] t ; a character range like [aeiou] or [a-z] can be handled straightforwardly as a union, equivalent to (a |
e | i | o | u) and (a | b | c | d | ... | z), respectively. No problem. But how, in OpenFst, can I interpret complemented character ranges such as $bar = a [^aeiou] e ; ??? If . is used in the syntax to represent "any possible character" (denoting the set of all single-character strings), then [^aeiou] is equivalent to (. - (a |
e | i | o | u)), but that just raises the question, How would one interpret the following: $fum = a . e ; denoting the set of all three-character strings where the first character is 'a', the second character is "any possible character" and the third is 'e' ? Thanks, Ken |
KennethRBeesley | 13 Mar 2008 - 13:12 |

Wow. That last message got garbled wherever I used a vertical-bar to indicate union. The first garbled example, expanding [aeiou], was (a verticalbar e verticalbar i verticalbar o verticalbar u ). The second, equivalent to [a-z] was (a verticalbar b verticalbar c verticalbar d verticalbar ... z) The third, equivalent to [^aeiou] was (. - ( a verticalbar e verticalbar i verticalbar o verticalbar u)) Ken |
KennethRBeesley | 13 Mar 2008 - 14:02 |

OpenFst allows the use of separate input and output symbol tables for mapping symbol names (strings) to the integers used as labels on an Fst. So, for examples, the label 2 on the input side could be mapped to 'a', and the label 2 on the output side could be mapped to 'b'. 1. What's the perceived value of allowing separate input and output mappings? 2. In practice, is it most common to use the same symbol table for both sides? Thanks, Ken |
KennethRBeesley | 04 Mar 2008 - 11:53 |

Actually, it is very common to have different symbol tables for each side. For instance, if you a lexicon that maps prononciation of words, i.e. sequence of phonemes, to words. The input symbol table would contains "phonemes" and the output table words. This is a real example from speech recognition (cf. this paper) | CyrilAllauzen | 04 Mar 2008 - 11:59 |

Yes, that's a good example. Thanks. In my own work, at least, I would tend to want individual alphabetic symbols like 'a', 'b' and 'c' to be stored (by default) on arcs with their standard Unicode code point values, including IPA symbols in a speech application; and any symbols with multicharacter print names (including whole words, as in the speech-recognition example) would be stored with code point values from a Unicode Private Use Area. Unicode currently offers 137,468 Private Use code points--is that enough for all the words and other multicharacter symbols (e.g. ngrams of phoneme characters) used in a typical speech application? | KennethRBeesley | 05 Mar 2008 - 15:43 |

Unfortunately, this solution would not be satisfying for speech recognition. Some very large vocabulary speech recognition systems use vocabularies consisting of several million words (i.e. multicharacter symbols). Allowing users to define their own input and output symbol tables is the most general approach. For your work, the approach you describe may be more than adequate, and the library indeed allows you to have the same symbol table used as both input and output table for all your FSTs. Cyril |
CyrilAllauzen | 06 Mar 2008 - 10:55 |

I am using FSTs to represent probabilistic finite-state automata and there are various situations in which I need to "normalize" them--that is, scale the arc weights such that the probability of arcs leading out of a state (plus the probability of state finality) sums to 1 for each state. I am having trouble, however, figuring out how to modify the weights on arcs of an FST. I have tried the following code, and while this modifies the final weight appropriately, it doesn't affect the arc weights at all: void makeProper(VectorFst<StdArc> &fst) { using fst::TropicalWeight; for(StateIterator<VectorFst<StdArc> > siter(fst); !siter.Done(); siter.Next()) { int s = siter.Value(); float tot = pow(2, -1 * fst.Final(s).Value()); for(ArcIterator<VectorFst<StdArc> > aiter(fst,s); ! aiter.Done(); aiter.Next()) { StdArc arc = aiter.Value(); float contribution = pow(2, -1 * arc.weight.Value()); tot += contribution; } float adjustment = -1 * log(tot) / log(2); float newFinal = fst.Final(s).Value() - adjustment; fst.SetFinal(s,TropicalWeight(newFinal)); for(ArcIterator<VectorFst<StdArc> > aiter(fst,s); ! aiter.Done(); aiter.Next()) { StdArc arc = aiter.Value(); float newWeight = arc.weight.Value() - adjustment; arc.weight = TropicalWeight(0.0); } } } How can I modify the arc weights? Or is there some other functionality I haven't noticed yet that performs this normalization on its own? Many thanks once again, Roger |
RogerLevy | 02 Mar 2008 - 23:34 |

Ahh...my naive attempt to use ...was not entirely successful. My full query with properly-indented code can be found here: http://idiom.ucsd.edu/~rlevy/normalize_fst_weights Roger |
RogerLevy | 02 Mar 2008 - 23:37 |

Hi Roger, 1. In order to modify an arc, you need to use a MutableArcIterator instead of an ArcIterator. Then do aiter.SetValue(arc) at the end of your last for loop. The constructor of aiter should then also take &fst as an argument instead of fst. 2. There is an operation to normalize the weights in the way you describe. It is call pushing: Push(&fst, REWEIGHT_TO_INITIAL) However, pushing is defined using the semiring operation, hence in your case you'll need to push in the log semiring. VectorFst<StdArc> fst; VectorFst<LogArc> log_fst; Map(fst, &log_fst, StdToLogMapper()); Push(&log_fst, REWEIGHT_TO_INITIAL); Map(log_fst, &fst, LogToStdMapper()); The total weight will still be present at the initial state. You can remove it by doing after calling Push: vector<LogWeight> distance; ShortestDistance(log_fst, &distance, true); LogWeight total = distance[log_fst->Start()]; MutableArcIterator< VectorFst<LogArc> > aiter(&log_fst, log_fst->Start()); for (; !aiter.Done(); aiter.Next()) { LogArc arc = aiter.Value(); arc.weight = Divide(arc.weight, total); aiter.SetValue(); } P.S. I didn't compile the code so there might be some typos. P.P.S. I fixed the formating of your original post. |
CyrilAllauzen | 04 Mar 2008 - 11:25 |

Many thanks once more, Cyril. (The last call needed to be aiter.SetValue(arc) but that was easy enough to figure out.) The approach I was taking had an error in it, anyway, so Push is much better. Best Roger |
RogerLevy | 04 Mar 2008 - 22:50 |

One other question: is there an easy way of making LogArc and LogWeight operate in base 2 logs rather than in natural logs? Roger |
RogerLevy | 05 Mar 2008 - 17:52 |

Unfortunately, they are no easy way to do this. Cyril |
CyrilAllauzen | 06 Mar 2008 - 10:44 |

Got it. Also, I notice that RealWeight and RealArc are not yet implemented, is this correct? (they would be convenient for transparent post-hoc inspection of normalized log-semiring FSTs.) Best Roger |
RogerLevy | 08 Mar 2008 - 15:56 |

No RealWeight yet. The reason was we never used the real semiring for anything else than toy example with the FSM library. But, we might decide to add it to the next release since people seem to request it. Best, Cyril |
CyrilAllauzen | 11 Mar 2008 - 11:59 |

I have written some code for printing all strings accepted by an FST together with their weights. (Clearly a bad idea in general, but useful right now for some tests I'm running with very small FSTs.) Since I am new to both C++ and OpenFst I wonder whether there is a more natural way to write this functionality. In particular I wonder: 1) is there a better way of checking whether a state is final? 2) whether there are better ways to implement the accumulation of weights along the path through the FST, perhaps by using a Weight class directly? void printAllStrings(VectorFst<StdArc> &fst) { vector<int> str(0); printAllStringsHelper(fst, 0, str, 0.0); } void printAllStringsHelper(VectorFst<StdArc> &fst, int state, vector<int> str, float cost) { if(fst.Final(state).Value() < 100000) cout << vectorToString(str) << " with cost " << (cost + fst.Final(state).Value()) << endl; for(ArcIterator<VectorFst<StdArc> > aiter(fst,state); ! aiter.Done(); aiter.Next()) { StdArc arc = aiter.Value(); str.push_back(arc.ilabel); printAllStringsHelper(fst, arc.nextstate, str, cost + arc.weight.Value()); str.pop_back(); } } string itos(int i) { // convert int to string; from Bjarne Stroustrup's FAQ stringstream s; s << i; return s.str(); } string vectorToString(vector<int> v) { if(v.size() == 0) return "<>"; string result = "<" + itos(v[0]); for(int i = 1; i < v.size(); i++) { result += "," + itos(v[i]); } return result + ">"; } Many thanks in advance for any pointers! Roger |
RogerLevy | 01 Mar 2008 - 21:04 |

hmm...TWiki has done some damage here to my double-equals equality test, to my plus-equals assignments, and also has erased some templates...hopefully the code is still reasonably readable! I've posted it at http://idiom.ucsd.edu/~rlevy/print_all_strings Roger |
RogerLevy | 01 Mar 2008 - 21:09 |

1. Yes! There is a much better to test if a state is final.if (fst.Final(state) == Weight::Zero()) // if state is not final Where you can replace Weight by the appropriate weight class (even better do a typedef of the current weight to Weight). 2. Indeed, you should try to use the Weight class operations. You should almost never need to use weight.Value(). Even cout << weight and cin >> weight should work since the operators << and >> are defined. Remember, the operation to combine weight along a path is Times. So in your code you should have TropicalWeight costand Times(cost, arc.weight)P.S. I fixed the formatting of your original post |
CyrilAllauzen | 04 Mar 2008 - 12:12 |

Many thanks, Cyril, this is great! Roger |
RogerLevy | 04 Mar 2008 - 22:47 |

Roger, 1. Do you have a final version of your code to print all paths with their weights?, and 2. Would you consider sharing it? Thanks, Ken |
KennethRBeesley | 11 Jun 2008 - 11:52 |

Hi Ken, It looks like the current version of my code is pretty similar to the earlier version: // a bad idea if there are loops :) void printAllStrings(VectorFst vector TropicalWeight tw(TropicalWeight::One()); printAllStringsHelper(fst, 0, str, tw); } void printAllStringsHelper(VectorFst if(fst.Final(state) = TropicalWeight::Zero()) cout << vectorToString(str) << " with cost " << (Times(cost,fst.Final(state))) << endl; for(ArcIterator<VectorFst StdArc arc = aiter.Value(); str.push_back(arc.ilabel); printAllStringsHelper(fst, arc.nextstate, str, Times(cost, arc.weight.Value())); str.pop_back(); } } string vectorToString(vector if(v.size()
` 0)` "," + itos(v[i]);} return result + ">"; } string itos(int i) { // convert int to string; from Bjarne Stroustrup's FAQ stringstream s; s << i; return s.str(); } I have a similar but separate function pair for LogArc FSTs. It would be nice to have a single function pair that works for any arc type, and except for the types the code would be the same, but I don't see how to do it because there isn't a common superclass for LogArc and StdArc. If someone else sees how to do this, let me know! And please feel free to use this code :) Roger |
RogerLevy | 11 Jun 2008 - 20:17 |

Ah, I also forgot -- I believe that arc.weight.Value() can be replaced by arc.weight, as per Cyril's suggestion earlier. | RogerLevy | 11 Jun 2008 - 20:18 |

Roger, Many thanks for posting this code. One comment at this point: the initial call to printAllStringsHelper(fst, 0, str, tw) assumes that the start state is state 0. After combining and optimizing fsts, this may not be the case. Better to use printAllStringsHelper(fst, fst.Start(), str, tw). Ken |
KennethRBeesley | 13 Jun 2008 - 11:52 |

Roger, Another possibility: generalization to print Input vs. Output strings. Your code currently uses arc.ilabel. You could do enum Projection { PROJECT_INPUT, PROJECT_OUTPUT } ; void printAllStrings(VectorFst &fst, Projection proj) { ... printAllStringsHelper(fst, fst.Start(), str, tw, proj) ; } .... if (proj == PROJECT_INPUT) { str.push_back(arc.ilabel) ; } else { str.push_back(arc.olabel) ; } Ken |
KennethRBeesley | 13 Jun 2008 - 12:15 |

Roger, In testing for a final state, I think that your code is incorrect (or garbled in the posting). You have if(fst.Final(state) = TropicalWeight::Zero()) but (as I understand it) a state is final if the exit weight is not equal to Weight::Zero(), so if (fst.Final(state) = TropicalWeight::Zero()) with a not-equal operator, is what you want.Straighten me out as necessary. Ken |
KennethRBeesley | 13 Jun 2008 - 12:37 |

Roger, I see that my "correction" got corrupted the same way when posted. The test should be if (fst.Final(state) notEqualTo TropicalWeight::Zero()) where notEqualTo is the usual exclamation mark followed by an equal-sign. Ken |
KennethRBeesley | 13 Jun 2008 - 14:04 |

In another word, why does StdArc use TropicalWeight instead of FloatWeight or LogWeight. Is there some material that explains the significance of TropicalWeight? Also, what's the reason that there are Plus(), Times() and Divide() implemented for FloatWeight()? Because it is too trivial? |
JiaPu | 29 Feb 2008 - 18:57 |

sorry, type. It should read: ".... there are NOT Plus() ..." |
JiaPu | 29 Feb 2008 - 18:58 |

sorry, typo. It should read: ".... there are NOT Plus() ..." |
JiaPu | 29 Feb 2008 - 18:58 |

StdArc use TropicalWeight and LogArc use LogWeight. StdArc could have been call TropicalArc, but since it is the most commonly used [it correspond to the Viterbi approximation] a shorter name like StdArc was prefered. If you don't like it, you can still do:typedef StdArc TropicalArc; FloatWeight is just an abstract class from which LogWeight and TropicalWeight are derived. It does not correspond to a semiring, that's why it does not have Plus(), Times(), ... |
CyrilAllauzen | 04 Mar 2008 - 11:43 |

For the significance of the tropical semiring, you can check out the papers in FstBackground or this paper or that one. | CyrilAllauzen | 04 Mar 2008 - 11:50 |

Cyril, Please give me an algebra 101. According to the definition of semiring on wiki, float (or rational number) is a semiring. And when authorizing finite state grammar for speech recognition, it is more natural to use float as weight. Although, internally log weight is preferred. |
JiaPu | 04 Mar 2008 - 15:23 |

Float in FloatWeight only means than the underlying representation is a C/C++ float type. TropicalWeight and LogWeight both use C/C++ float to represent real numbers, hence they are both derived from the partially abstract type FloatWeight. Of course, the real number with as operation the usual plus and times is also a semiring. It is not implemented in the library for now because it is not very commonly used. We might add it in a future version. If we do, it will also be as a type derived from FloatWeight. Float just defines the set, it is not enough to define a semiring. Different times and plus operations would lead to different semiring sharing the same set of elements. Finally, you didn't have to all the way to wikipedia, there is some documentation on here on this website. |
CyrilAllauzen | 04 Mar 2008 - 19:10 |

When compiling following code: StdVectorFst in_fst; ... add some states and arcs to in_fst ... StdVectorFst out_fst Determinize(fst, out_fst); I got compiler error: /Volumes/Data/Projects/OpenFst/macosx/Corpus2SphinxFsg.cpp:82: error: no matching function for call to 'Determinize(fst::StdVectorFst&, fst::StdVectorFst&)' Since StdVectorFst is derived from Fst, I don't understand that why this doesn't compile. |
JiaPu | 28 Feb 2008 - 12:48 |

Aha, never mind. Just realize that Determinize() expects a pointer instead of a reference on the second argument. But why can't we use reference on both arguments? | JiaPu | 28 Feb 2008 - 13:18 |

This is one of our coding conventions: input arguments are declared as const references and output arguments are declared as pointers. | CyrilAllauzen | 29 Feb 2008 - 14:26 |

Are there any plans for a new release of OpenFst? If so, I have some requests ... Ken |
KennethRBeesley | 06 Feb 2008 - 17:20 |

There will be a new release of OpenFst. You're welcome to make some suggestions. | CyrilAllauzen | 08 Feb 2008 - 15:20 |

Cyril, Very glad to hear about the new release. I respectfully request the following: void Copy(&In, *Out) // create a deep copy of the input automaton Convenience boolean functions for testing an existing network: isLabelSorted(&Fst, side) isFunctional isAcceptor isWeighted isEpsilonFree isDeterministic Finally, a new text input-output format that is Unicode-capable. I'll try to write up some more specific suggestions for an XML-based notation. Thanks for all your good work, Ken |
KennethRBeesley | 13 Feb 2008 - 23:53 |

Hi Ken, A deep copy can already be performed using Map() with the IdentityMapper, however this might not be obvious at first so a Copy() could indeed be something useful. StdVectorFst in; StdVectorFst out; Map(in, &out, IdentityMapper<StdArc>());For the boolean functions, I'm not so sure that that: if(isAcceptor(fst)) ... is much simpler than: if(fst.Properties(kAcceptor, true)) ... Best, Cyril |
CyrilAllauzen | 29 Feb 2008 - 12:48 |

For functional, the issue is that the best algorithms to test for functionality are quadratic in the number of transitions. This prevented us from using it as a property bits, that's why there are no kFunctional. | CyrilAllauzen | 29 Feb 2008 - 12:53 |

Thanks Cyril, Your fst.Properties(kAcceptor, true) example was what I needed to get oriented. No boolean function isAcceptor(fst) is needed. Other beginners like me might want to look in fst/lib/properties.h for more property names, including: kAcceptor kNotAcceptor kWeighted kNotWeighted kEpsilons kNoEpsilons kIDeterministic (deterministic on the input side) kNonIDeterministic kODeterministic (deterministic on the output side) kNonODeterministic kILabelSorted (sorted with regard to input labels) kNotILabelSorted kOLabelSorted (sorted with regard to output labels) kNotOLabelSorted |
KennethRBeesley | 29 Feb 2008 - 17:06 |

I'm having trouble compiling another example from the Quick Tour page, to get myself going with FST composition. here it is: #include "fst/lib/fstlib.h" int main() { using fst::StdVectorFst; using fst::StdFst; using fst::StdOLabelCompare; using fst::StdILabelCompare; using fst::ArcSort; StdFst *input = StdFst::Read("input.fst"); StdFst *model = StdFst::Read("model.fst"); ArcSort(input, StdOLabelCompare()); ArcSort(model, StdILabelCompare()); StdVectorFst result; Compose(*input, *model, &result); } Trying to compile gives me the following error: rlevy@truffle:~/src/C++$ g++ -I /local/contrib/cpl/OpenFst Example1.cpp /local/contrib/cpl/OpenFst/fst/lib/libfst.so -lpthread Example1.cpp: In function ‘int main()’: Example1.cpp:16: error: no matching function for call to ‘ArcSort(fst::StdFst*&, fst::StdOLabelCompare)’ Example1.cpp:17: error: no matching function for call to ‘ArcSort(fst::StdFst*&, fst::StdILabelCompare)’ It's not obvious to me why it's not matching the function properly -- any ideas? Many thanks! Roger |
RogerLevy | 01 Feb 2008 - 01:14 |

This does not work because the input of ArcSort needs to be a mutable fst (of type MutableFst There are several solutions for this: 1. Restrict the program to only accepts as input VectorFst which is mutable: StdVectorFst *input = StdVectorFst::Read("input.fst"); StdVectorFst *model = StdVectorFst::Read("model.fst"); Simple but not general. 2. Copy model and input into a VectorFst before sorting: StdVectorFst input2(*input); ArcSort(&input2, StdOLabelCompare()); 3. Use on-the-fly sorting: ArcSortFst StdOLabelCompare> input2(*input, StdOLabelCompare()); Best, Cyril |
CyrilAllauzen | 29 Feb 2008 - 13:19 |

I am getting the same error, except I have already declared mutable FSTs. I would really appreciate some help! My code is of the form: #include "fst/fstlib.h" using namespace std; using namespace fst; int main() { StdVectorFst fst1; StdVectorFst fst2; StdVectorFst fst3; ArcSort(fst1, StdOLabelCompare()); fst3 = ComposeFst return 0; } Specifically, the compile error is: no matching function for call to 'ArcSort(fst::StdVectorFst&, fst::StdOLabelCompare)' Thank you very much, Jonathan |
JonathanB | 16 Oct 2009 - 12:03 |

Access control:

- Set ALLOWTOPICCHANGE = AllAuthUsersGroup

Edit | Attach | ~~Watch~~ | Print version | History: r1 | Backlinks | Raw View | WYSIWYG | More topic actions

Topic revision: r1 - 2014-04-21 - CyrilAllauzen

Copyright © 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding TWiki? Send feedback

Ideas, requests, problems regarding TWiki? Send feedback