# OpenFst Forum 2011 Archive

## Can ComposeFst "shift" input labels?

### AlexZatv - 14 Dec 2011 - 13:05

Can ComposeFst "shift" input labels relatively output labels? Let say, I have T1: state1 state2 X1 Y1 state2 state3 X2 Y2 ....etc

And transducer T2: state1 state2 Y1 state2 state3 Y2 Z2 ....etc

Can I be sure that in T1oT2 I will see arc X2:Z2, (not X1:Z2 and X2:eps?) I understand, that theoretically it is not required. And I know that lookahead composition definitely will move this labels. But, can I force usual composition to not make such moves?

## Can you please explain why ProjectFst< StdArc >(C, PROJECT_OUTPUT) does not work, when C is of the type ComposeFst ?

### LexOr - 11 Dec 2011 - 10:47

Here is a code:

 line1: ComposeFst<StdArc> C(A, B, opts);

 line2: ProjectFst< StdArc >(C, PROJECT_INPUT);

I printed A, B and C after line1, and they are correct.
When I print C after line2, it is not become an acceptor, rather it is still a transducer.

LexOr

### LexOr - 11 Dec 2011 - 12:05

My fault... I found the bug.

LexOr

## reduce memory consumption of fstcompile

### JaroslavDobrek - 18 Nov 2011 - 04:02

Hello,

is there a way to tell fstcompile not to use the main memory, but the hard disk while compiling?

I am unable to compile really large transducers, because fstcompile consumes so much memory.

Jaroslav

### AlexZatv - 29 Nov 2011 - 03:24

I don't know good answer,but there is one bad:) someone in this forum was curious about binary fst format. If you will find it, you can make your own "fstcompile". Or, you can make your (partially) own fst format (not VectorFst or ConstFst), with routines to read and write it to disk, and register it with openfst.

### CarlosGonzalezCadenas - 14 Nov 2011 - 06:20

Hello,

We'd like to implement a Google-like autocomplete system. Given that we have 100s of millions of queries, for us memory footprint is as important as runtime input matching speed. Our queries have different probabilities and therefore for us it's critical some support for weighting. We think that using FSTs would be a good match for this problem.

Do you know if there is someone that actually has implemented an autocomplete system like the one we describe using OpenFST?.

If not, it would be great if you can provide some insights / guidelines about how to approach the problem.

Thanks a lot for your help. Carlos

### PaulDixon - 22 Nov 2011 - 06:46

Something along these lines this should work. First, Build a weighted deterministic Fst from the queries. Given an input match through the Fst as far as possible, then run a visit to get the possible completions.

## Bit of confusion on usage of ExpectationArc

### JuliusGoth - 08 Nov 2011 - 17:10

I was looking to implement an FST with arcs storing tropical weights as well as a vector of floating points, to hold expectation values for E-M, and it looked like ExpectationArc held the key to accomplish this. The documentation states the following on ExpectationWeight:

X1 is usually a probability weight like LogWeight X2 is usually a random variable or vector see SignedLogWeight or SparsePowerWeight

I tried several combination to no effect, including ExpecationArc<StdArc, vector>, ExpectationArc<StdArc, StdArc>, etc. with no success. I think that perhaps the second sentence in the documented source code:

If X1 is distinct from X2, it is required that there is an external product between X1 and X2 and if both semriring are commutative, or left or right semirings, then result must have those properties.

Addresses my problem, but can someone tell me what I must do to satisfy this requirement? Do I need to implement another Arc subclass? Thanks in advance!

### JuliusGoth - 08 Nov 2011 - 17:21

To add, the expectation values would consist of a set/vector of two floating point weights.

## Nice way of computing total cost of all paths in FST?

### JuliusGoth - 02 Nov 2011 - 14:33

Anyone know of an easy way to do this? I tried one method involving converting all arc input and outputs to Epsilons (Relabel), then running RmEpsilon. The latter produces the following error twice:

ERROR: SymbolTable::Write: write failed

I'm sure there's a nice recursive algorithm for computing this, but is there some way to do this via some command line operations? Thanks!

### PaulDixon - 02 Nov 2011 - 22:29

Maybe ShortestDistance in the log semiring.

### AlexZatv - 11 Nov 2011 - 05:43

Theoretically, wfst may give the weight only to input string (acceptor) or to mapping from input string to output string. So, theoretically, RmEpsilon don't have to work in your case.

I'm not sure, but may be this will help: 1)change all symbols to epsilon, as you do 2)add only one arc from new state to previous start-state, with some non-epsilon-symbol (let say, A). And make that new state as start state ( like superfinal - "superstart":) ). 3)RmEpsilon in log semiring 4)determinize (this will not needed if you add "super-superfinal" arc, not "superstart") After that, you will have only that your arc with some weight, and final state with some weight. Make Times(arcW,finalW) - and, I think, you will get what you need.

I think this will work.Am I right?

### MichaelRiley - 29 Dec 2011 - 22:52

ShortestDistance as Paul says.

## Link errors in ubuntu linux

### NewGuy - 02 Nov 2011 - 09:42

I got the same error, ../script/.libs/libfstscript.so: undefined reference to dlopen' ../script/.libs/libfstscript.so: undefined reference to dlerror'

can someone help me ?

### AlexZatv - 02 Nov 2011 - 11:16

Hello! It's something strange. I have openfst on ubuntu linux 10.10, and it's work fine. I don't know that's the problem, but may be I'll give someone a hint.

dlopen and dlerror are located in libdl.so. Bug like this sometimes happen in my programs if I try to link with openfst libraries, but forget to link with libdl.so. To link with it, you need to pass -ldl to compiler. But, I see that this parameter is passed to libtool... May be, if you will pass --enable-static to configure-script,it will work?

### SamHardwick - 15 Nov 2011 - 10:51

At least in some cases this is due to the linking directives -lm and -ldl ending up in the middle of the compilation line. This can be remedied by replacing in src/bin/Makefile.am the line

AM_LDFLAGS = -lm -ldl

with

AM_LDFLAGS = ../script/libfstscript.la ../lib/libfst.la -lm -ldl

and omitting all the tool-specific LDADD-lines (ones like fstarcsort_LDADD = ../script/libfstscript.la ../lib/libfst.la).

### KevinUnhammer - 26 Dec 2011 - 12:51

Just to give another data-point, this change made 1.2.9 compile for me on xubuntu 10.10 oneiric

### MichaelRiley - 29 Dec 2011 - 22:54

This is fixed I believe in just uploaded 1.2.10. Used above approach but with LDADD not AM_LDFLAGS.

## Link errors in ubuntu linux

### AnakreontasMentis - 25 Oct 2011 - 07:32

The linker reports errors when compiling the sources. The reported error is:
Making all in bin
make[3]: Entering directory /home/anakreon/tmp/d/openfst-1.2.7/src/bin'
g++ -DHAVE_CONFIG_H   -I./../include -I./../script    -g -O2 -MT fstarcsort.o -MD -MP -MF .deps/fstarcsort.Tpo -c -o fstarcsort.o fstarcsort.cc
mv -f .deps/fstarcsort.Tpo .deps/fstarcsort.Po
/bin/bash ../../libtool --tag=CXX   --mode=link g++  -g -O2 -lm -ldl  -o fstarcsort fstarcsort.o ../script/libfstscript.la ../lib/libfst.la
libtool: link: g++ -g -O2 -o .libs/fstarcsort fstarcsort.o  -lm -ldl ../script/.libs/libfstscript.so ../lib/.libs/libfst.so
../script/.libs/libfstscript.so: undefined reference to dlopen'
../script/.libs/libfstscript.so: undefined reference to dlerror'
collect2: ld returned 1 exit status
make[3]: *** [fstarcsort] Error 1
make[3]: Leaving directory /home/anakreon/tmp/d/openfst-1.2.7/src/bin'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory /home/anakreon/tmp/d/openfst-1.2.7/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory /home/anakreon/tmp/d/openfst-1.2.7'
make: *** [all] Error 2


## HistogramPruneQueue ? or, ConstantSizeQueue ?

### AlexZatv - 14 Oct 2011 - 07:32

Are there any state queues which guarantee upper size limit? I want a queue which will contain, let say, 2000 items. And, if someone will try to add one more item, item with the worsest weight will be deleted from the queue, and new one will be added.

There is NaturalPruneQueue - it is almost like I want, but it is not have the size limit

## Modifying an arc in constant time

### HasanAkram - 13 Oct 2011 - 04:50

Is it possible to do the following task in constant time?


for(MutableArcIterator<CustomFST> aiter(&fst, state_id);!aiter.Done(); aiter.Next())
{
GallicArc<LogArc, STRING_LEFT> arc = aiter.Value();
if(arc.ilabel==input_symbol)
{
arc.ilabel = label;
aiter.SetValue(arc);
break;

}
}



All i need to do here is to modify one single arc, and my code should work in constant time. I am sure openfst has better ways that I am missing.

### AlexZatv - 14 Oct 2011 - 07:26

As far as I understand, it will be O(n) from number of outgoing arcs. Also you can sort your fst (ArcSort), sorting arcs by input labels , and use matchers to find ilabels that you want - it will be O(log(n)).

Also, in KALDI http://kaldi.sourceforge.net/index.html they implement and use TableMatcher. I don't ever use it, but as far as I understand, it will give you constant time. But it will use more memory.

May be there is something better then this.

## fstshortestpath gives reversed output?

### DavidZhang - 09 Oct 2011 - 23:24

Hi there. I have composed two fsts
fstcompose A.fst B.fst out.fst

and run a fstshortestpath on the output

fstshortestpath out.fst out.shortest.fst

But the final fat (out.shortest.fst) outputs a reversed string as it should be.

Any ideas? Thanks!

### CyrilAllauzen - 12 Oct 2011 - 15:03

Hi, are you sure about this? The state ordering will be reverse topological and fstprint will print the arcs in reverse but the machine should be well formed.

## Odd behavior for RmEpsilon for Fst with LogArc

### JuliusGoth - 05 Oct 2011 - 21:45

I have the result of an Fst composition with epsilon arcs (both in and out labels), with some self loops. Some of the weights are not provided and the default 0. I read another message regarding zero weights and undefined behavior but Verify() returns true so it seems like a valid FST. Part of the printed FST looks like this:

0     1     <epsilon>     <epsilon>     2.29587674
1     1     <epsilon>     <epsilon>
1     2     that     that     4.43530607
2     3     <epsilon>     <epsilon>     1.30640042
2     4     <epsilon>     <epsilon>
2     5     is     is     5.14776754
3     3     <epsilon>     <epsilon>
3     6     is     is     6.12910652


When I run RmEpsilon in code and generate the FST again, I get something like this:

0   1   that   that   -0.200286627
1   3   is   is   0.504037499
1   2   is   is   -1.78467798


What am I missing? Why are negative weights being produced in this case?

### JuliusGoth - 09 Oct 2011 - 08:48

After some tweaking, I did notice that removing self-loops does eliminate the odd negative weights in the post-RmEpsilon fst.

Another concern was raised, however. I tried the following example FST:

0   1   <epsilon>   <epsilon>   2.29587674
0   2   <epsilon>   <epsilon>
0   3   But   But   3.66515589
1   4   But   But   6.58336163
2   3   But   But   3.66515589
3   4   <epsilon>   <epsilon>   1.03407383
4   5   <epsilon>   <epsilon>   1.04145384
5   6   through   through   6.26490784
6   7   <epsilon>   <epsilon>   1.45167708
7   8   I   I   3.89830112
8   9   <epsilon>   <epsilon>   1.6464082
8   10   <epsilon>   <epsilon>
8   4.20359564
9   3.0302887
10   4.20359564


Running RmEpsilon I came up with the following FST:

0   2   But   But   8.87923813
0   1   But   But   2.97200871
1   3   through   through   8.34043503
2   3   through   through   7.30636168
3   4   I   I   5.34997845
4   3.23925138


I traced this on paper to verify that the weights were being consolidated correctly (from the elimination of epsilon transitions), and noticed that state 0 -> 1 on the new FST is approximately 0.7 less than it should be. State 0 -> 3 (or 0 -> 2 -> 3) is 3.66 pre-RmEpsilon and is 2.97 in the post-RmEpsilon FST. First, have I missed something in manually verifying this? Second, is there an easy way to eliminate self-loops programmatically to prepare an FST for running RmEpsilon, or is it just a matter of walking through the arcs and somehow removing them from the FST?

### AlexZatv - 11 Oct 2011 - 11:03

Hello! I don't know about programs to eliminate the self-loops, but it is very easy to do. Just put the output of "fstprint" to some script. In this script you read the input string, split on parts. If number of parts>2 and part1=part2, it's a self loop, and you can just skip it.

### CyrilAllauzen - 12 Oct 2011 - 14:24

Hi, I'm not sure why you're surprised by the presence of negative numbers. Adding two numbers less than 1 can lead to something greater than 1 hence +-log adding to positive number can lead to a negative number.

As for your second example, -l(e(-3.66515589) + e(-3.66515589)) is 2.97200870944005469070 so the output of RmEpsilon looks fine to me.

Cyril

## Bug in VectorFst?

### AlexZatv - 05 Oct 2011 - 10:14

Hello!

VectorFst may fail, if after the creation of ArcIterator, user will add some arc with AddArc function, or call ReserveArcs(). In such case, in initArcIterator, we will store the pointer to the first arc. But, AddArc or ReserveArc can relocate array of arcs, so pointer will point to nothing (and,probably, nArcs will be wrong number).

Is it a bug, or it is so "by design" ?

### CyrilAllauzen - 12 Oct 2011 - 14:27

Hi, this is by design and similar what STL does with vectors. Mutating a vector invalidates the opened iterators, mutating the arcs at a given state invalidates the arc iterator opened for that state.

### AlexZatv - 13 Oct 2011 - 03:18

Thank you for explanations!

## Minimize gives different results on different systems

### TrevorJim - 04 Oct 2011 - 11:45

Hi. I've run into an FST that minimizes differently on different operating systems. Is this expected behavior?

Both systems are running openfst 1.2.7 with the default configuration.

The resulting FSTs are equivalent according to fstequivalent, however, they are not identical, which is inconvenient for some of my regression tests.

System 1 (a 64-bit Linux):

    $uname -a Linux ono 3.0-ARCH #1 SMP PREEMPT Tue Aug 30 08:53:25 CEST 2011 x86_64 Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz GenuineIntel GNU/Linux$ g++ --version
g++ (GCC) 4.6.1 20110819 (prerelease)


System 2 (Mac OS X):

    $uname -a Darwin -------------.local 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386$ g++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)


Input FST (a deterministic acceptor).

### CyrilAllauzen - 12 Oct 2011 - 14:34

Hi, Many things could affect the state ordering hence leading to not identical machines. You could change your regression to test that the machines are equivalent and have the same number of states.

### TrevorJim - 19 Oct 2011 - 09:19

Thanks, but the transducer is an intermediate result and not easily checked in our workflow.

I should have mentioned that we are using unweighted acceptors; I'm surprised that minimize is not deterministic on these.

## Help with setting up openfst on windows7

### SophieChan - 29 Sep 2011 - 08:08

Hi, I have to use openfst to calculate the edit distance between a group of strings with a fixed finite state acceptor. My friend suggested me to use the C++ version of openfst. But I am using 64 bit windows7 and visual studio 2010 express. I tried to load the 64 bit file of openfst but it did not work on my computer. Is there a way around it? For example, can I use the linux version or the command line version to calculate the edit distance?

Best Sophie

### PaulDixon - 29 Sep 2011 - 18:12

What exactly didn't work with 64bit on Windows? Compiling the solution or running an exe? Unless you need more than 2GB of memory for your operations you could just run the pre-built 32bit binaries on 64bit Windows.

## not-integer StateId's - is it possible?

### AlexZatv - 29 Sep 2011 - 05:22

I'm making Fst class, with states stored in list, not vector, to speedup adding and deleting of states. So, it is natural for me to declare StateId as pointer to internal state structure. But, to define my ListFst as child of MutableFst, I have to override pure virtual function StateId NumStates() const;

Here NumStates must return the number, but my StateId's are pointers! How can I deal with this? May be it's simply impossible with OpenFst to have non-integer StateId's?

### PaulDixon - 29 Sep 2011 - 19:39

Even if you could get the NumStates to return a pointer I think it will cause problems for many of the algorithms. As buffers are frequently created based on the range of stateids (e.g. shortestpath/distance).

It's seem you are trying convert graph that uses pointers to OpenFst format. How you considered performing the conversion to OpenFst once no more edits are required, that way you can easily use hash_map to convert pointers to stateids and use the VectorFst

### AlexZatv - 30 Sep 2011 - 05:14

Paul, Thank you for answer! I make NumStates() return NULL, just to try it. I see many compile-time errors, exactly in shortestpath (in queues specifically).

I can convert my graph to VectorFst after making all edits, or I can write my own implementation of some fst algorithms that I use. But I think that it will be better, and will work faster, if I will do my own lattice-class under OpenFst's interface. If it's possible,of course. If it is not, then why they introduce StateId type, not just us integers?

Now I want to try it this way. In internal state structure, I will store the number of state. I will make the class "StateId" with pointer to state, and operator of conversion to integer. It is fast, just dereferencing the pointer. If it will work,it will be fine.

If not, I will have to make backward conversion, from integer ids to state's pointers. It have to be the map or something else, big and/or slow:(

Or I will stop this attempts, and start to make my own ShortestPath and determinization for my specific structure:(

### AlexZatv - 30 Sep 2011 - 10:18

Hmm, may be std::deque ...

I compile with StateId as wrapper around the pointer to the lists, fst::ShortestPath needs for operation++, and sometimes for "StateId i = 0". It is possible to do with lists, but this: StateId nscc = *max_element(scc_.begin(), scc_.end()) + 1; is something I don't know how to implement.

## Running stream of input through FST and obtaining output and total weight

### JuliusGoth - 28 Sep 2011 - 22:14

If I have made an fst using fstcompile, how would I process an array of input tokens and obtain the output as well as the weight? I was writing an implementation that traversed the fst arcs with the input but thought maybe there was an easier way to perform the same task, especially since I have to deal with eps arcs.

### PaulDixon - 29 Sep 2011 - 02:10

Create a chain Fst that represents your input. One arc per token left-to-right. Then compose this with the other Fst. If there is only one valid output sequence you can get the output and weight using fstprint and fstshortestdistance. If there is more than one path you will need to enumerate them (there is old post by Roger Levy with code to do this)

There are probably some similar examples in the tutorials here: http://www.openfst.org/twiki/bin/view/FST/FstBackground

## More updated documentation on OpenFst internals?

### AlexZatv - 27 Sep 2011 - 08:13

Are there more fresh papers about OpenFst internals and organization, except "OpenFst: A General and Efficient Weighted Finite-State Transducer LIbrary", 2007, mentioned in FstBackground (in this wiki)? Looks like the library was changed from the state described in that paper.

I want to implement smth like VectorFst, but with states and arcs stored in the lists,and I'm not sure what must be the base class for me - MutableFst? How should I use FstImpl ? What if I don't want to waste memory storing NumInputEpsilons and NumOutputEpsilons in every state? etc.

## Customized composition

### JuliusGoth - 20 Sep 2011 - 23:26

Wanted to do the following:

FST1 transduces string x to y with weight a FST2 transduces x to y with weight b Composition transduces string x to y with weight (a x b)

Since this isn't the default behavior of Compose(), what is the best way of going about this? Do I need to define a custom matcher and composition filter?

### MarkusD - 27 Sep 2011 - 13:04

You could map the input and output symbols to new pairwise symbols. So your FST1 is just an acceptor that accepts the symbol x|y with weight a, for example. FST2 accepts x|y with weight b. (You could use functions from encode.h to do that.) You can just intersect these two acceptors. In the result, you can map the symbol x|y back to input x and output y, so you can use it as a standard finite-state transducer.

### MarkusD - 03 Oct 2011 - 14:16

(Of course, the epsilon symbol loses its special status in that case; your FST1 and FST2 must already specify certain fixed alignments between input and output.)

## Modifying arc using matcher

### HasanAkram - 17 Sep 2011 - 07:55

I am using the following piece of code to modify an arc weight. The problem is I have to iterate all the arcs to find the right arc. I was wondering if there is a
 Matcher
equivalent as mutable arc, where I could find the mutable arc more efficiently?

for(MutableArcIterator<CustomFST> aiter_r(&fst, red);!aiter_r.Done();aiter_r.Next())
{
GallicArc<LogArc, STRING_LEFT> arc_r = aiter_r.Value();
if(arc_r.ilabel==e_red.ilabel)
{
GallicWeight<StdArc::Label, LogWeight, STRING_LEFT> gw_r = GallicWeight<StdArc::Label, LogWeight, STRING_LEFT>(lcp, arc_r.weight.Value2());
arc_r.weight = gw_r;
aiter_r.SetValue(arc_r);
}
}


### PaulDixon - 25 Sep 2011 - 02:20

If your arcs are input label sorted you could try replacing the linear search in your code with a binary search, this is what a the SorterMarcher is doing. The implementation is in SortedMatcher::Find().

Modifying or using a Matcher might not give you the exact behavior you want because it implements an implicit self loop which is used when the searching for an epsilon label.

## Bug in RmEpsilon and arcs with a weight of Zero() ?

### RomainLerallut - 09 Sep 2011 - 11:15

Using RmEpsilon on a StdVectorFst with epsilon:epsilon arcs with a weight of Weight::Zero treats those arcs as if they had a weight of One.

Input FST:

• 3 states: 0,1, and 2, 0 is start, 2 is final
• arc 0->1 with label epsilon and a weight of Zero
• arc 0->1 with label 2 and a weight of 5.67
• arc 1->2 with label 1 and a weight of 1.234.
(0) --0:0/inf-->(1)--1:1/1.234-->((2))
\__2:2/5.67__/


Expected result:

(0) --2:2/5.67-->(1)--1:1/1.234-->((2))
\_____________1:1/inf___________/

Actual result:
(O) --2:2/5.67-->(1)--1:1/1.234-->((2))
\___________1:1/1.234____________/     <= PROBLEM !!

The infinite weight has not been taken into account when adding the 0->2 arc.

I digged into the code and it may be linked to stale data in ShortestDistanceState::distance_ that is not reset when encountering an arc with infinite weight, but if a more knowledgeable person could take a look that would be great.

I am using an older version of OpenFst (1.1.something, I think) but I didn't see any major difference in the code. FYI, I'm calling directly the C++ functions.

Thanks for any help, Romain

### PaulDixon - 12 Sep 2011 - 19:45

Not sure if this helps, but I think the library doesn't consider arcs with Weight:Zero as well formed. fstcompile will not accept them and if you run Verify on this Fst you will get an error message.

### RomainLerallut - 14 Sep 2011 - 10:54

Indeed, I hadn't thought of that. We use Weight::Zero in some degenerate cases to preserve the FSTs' topologies but I guess we'll have to lift that requirement and just not add any arc in that case.

Many thanks for the answer, Paul.

### MarkusD - 27 Sep 2011 - 13:18

Connect() should get rid of those arcs.

## Why doesn't LogArc work for shortest path algorithm?

### JuliusGoth - 05 Sep 2011 - 11:10

I read in a comment from 2010 concerning Log weighs not working (only Tropical is implemented). Are Log weights not right distributive? I would like to use either this or the Real arc implementation for maximum likelihood estimation. Any help is appreciated.

### JuliusGoth - 05 Sep 2011 - 11:20

To clarify, I've received the following error when trying to run ShortestPath in code:

FATAL: SingleShortestPath: Weight needs to have the path property and be right distributive: log

It seems this issue was addressed but as I mentioned above, the reason for it doesn't seem clear to me.

### PaulDixon - 05 Sep 2011 - 23:01

The log semiring doesn't have the path property (a+b = b or a+b =a) this is why you can't compute the shortest paths. Because the log semiring doesn't have the path property if you were to compute the potential for some state, there might not be a path to that state that has this value.

In the log semiring it is possible to compute the shortest distance instead.

### JuliusGoth - 09 Sep 2011 - 09:58

Thanks for clearing that up. After some examination, it makes sense.

To follow up on this: if I used log weights to produce a log probability, What calculation do I use to get the real probability value? I'm aware of logging probabilities to avoid underflow, but usually the procedure is logging, summing, then taking the "anti-log". Is it just that simple? Doesn't seem like I'm getting probabilities less than 1 in this case.

## 1.1 Version source?

### JuliusGoth - 21 Aug 2011 - 09:24

I was trying to integrate the expectation semiring implementation, fstrain, with OpenFST and noticed fstrain had dependencies on an older include file, mains.h, that no longer exists in the 1.2 distribution. Was this file in the 1.1 version? If so, what was the purpose of this file? Is the 1.1 source of OpenFST available anywhere so that I can modify fstrain to work with the 1.2 version? Thanks very much.

### PaulDixon - 22 Aug 2011 - 04:49

mains.h was part of the old weight registration mechanism. In version 1.2 the weight registration has changed. See Cyril's post from 19 Nov 2010 for the procedure in version 1.2

### JuliusGoth - 01 Sep 2011 - 14:19

Thanks. I was also wondering if documentation is available for this version of OpenFST? It doesn't seem to be in the package.

### PaulDixon - 05 Sep 2011 - 23:02

You could try running Doxygen on the source directory.

## List-based Fst ?

### AlexZatv - 04 Aug 2011 - 04:18

Hello! I want to make WFST lattice generation as described in “Efficient General Lattice Generation and Rescoring”. It requires adding many states during decoding, while VectorFst and ConstFst stores all the states in vectors (as far as i understand).

Are there OpenFst implementations which store the states in the list? Are such implementations possible/effective in conjunction with projection and pruning algorithms? Thank you in advance!

## StringWeight and epsilon

### HasanAkram - 19 Jul 2011 - 08:20

I have defined a StringWeight as the following:
 StringWeight<StdArc::Label, STRING_LEFT> w1;

when ever i enter an epsilon or 0, using: w1.PushBack(0), the String is an epsilon, no matter how many times I call w1.PushBack(0);. However, when I do it on the right side of another input symbol, for example:

 w1.PushBack(1);
w1.PushBack(0);

the output is 1_0. My question is, epsilon is the multiplicative identity of String semiring, so no matter how many times i call w1.PushBack(0); the output should be 1 and not 1_0. Can anybody explain me if I am misunderstanding anything here?

## FSTs with Predicates

### DamienDaspit - 18 Jul 2011 - 03:49

Is it possible for OpenFst to support input and output predicates as opposed to symbols, like what was outlined by Noord and Gerdemann in "Finite State Transducers with Predicates and Identities"? If you do not take in to consideration that new implementations of certain operations would be required, such as determinization and composition, would something like this be possible?

## How to populate GallicArc?

### HasanAkram - 03 Jul 2011 - 08:14

Hi,

I have defined a custom fst using GallicArc and now want to populate the values dynamically. Now, my question is concerning the GallicWeight format. How exactly should I build a gallic weight. E.g, if its a

 StdAcr
we simply do the following:
StdVectorFst fst;
fst.SetStart(0);


How should the same thing be done in case of the following:


typedef VectorFst<GallicArc<LogArc, STRING_LEFT>> CustomFST;
void MakeGallicFST()
{

CustomFST fst;

GallicWeight<StdArc::Label, LogWeight, STRING_LEFT> gw;

GallicArc<LogWeight, STRING_LEFT> ga;
}


### HasanAkram - 03 Jul 2011 - 08:34

To be more precise with my question, what I need is probably the following:
 ga.weight = ???
So, the questions are: 1. What should be the format of the weight? 2. I can certainly put a GallicWeight object here, now the question is how to build a GallicWeight. For example, in the following code:
 gw.Read(???);
What should be the format of the input file? 3. Can we manipulate
 gw.Value1();
or
 gw.Value2();
? If yes, how?

### HasanAkram - 03 Jul 2011 - 11:25

I came up with the following solution, but i am not sure if this is the most elegant way of doing it.


void MakeGallicFST()
{

CustomFST fst;
fst.SetStart(0);

fst::StringWeight<StdArc::Label, STRING_LEFT> sw;

sw.PushFront(1);
sw.PushFront(2);
sw.PushFront(3);
sw.PushFront(4);

LogWeight lw = LogWeight(0.5);

GallicWeight<StdArc::Label, LogWeight, STRING_LEFT> gw = GallicWeight<StdArc::Label, LogWeight, STRING_LEFT>(sw, lw);

GallicArc<LogArc, STRING_LEFT> ga;

ga.ilabel = 1;
ga.olabel = 1;
ga.nextstate = 0;
ga.weight = gw;

ga.ilabel = 1;
ga.olabel = 1;
ga.nextstate = 1;
ga.weight = gw;

}


### PaulDixon - 15 Jul 2011 - 07:06

Read() is for reading in binary format. If you need to read text format use the extraction operator >>. The text format for weight in your example would look like this 1_2_2_3,0.5. Also, the text format of the empty string would be Epsilon,0.5 not 0,0.5. regardless of how many times you called sw.PushFront(0);

### HasanAkram - 15 Jul 2011 - 09:13

thanks! should something like this work with command line fstcompile if i put the following as text input file:
0 1 a 1_1_1,0.5
1 0 b 2_2_2,0.5

if not, what would be the right format?

### PaulDixon - 16 Jul 2011 - 06:46

No it won't work. The shell tools can only handle the log or tropical semirings. It is possible to build a shared object that will make the shell tools able to handle a user defined semiring as described here

The below code fails to compile because the GallicWeight does not implement all the required members. If you add the missing members it might work, however, there could be other reasons not to do this.

#include <fst/const-fst.h>
#include <fst/edit-fst.h>
#include <fst/vector-fst.h>
#include <fst/script/register.h>
#include <fst/script/fstscript.h>

using namespace fst;
namespace fst {
namespace script {
typedef GallicArc<LogArc> LogGallicArc;
REGISTER_FST(VectorFst, LogGallicArc);
REGISTER_FST(ConstFst, LogGallicArc);
REGISTER_FST(EditFst, LogGallicArc);
REGISTER_FST_CLASSES(LogGallicArc);
REGISTER_FST_OPERATIONS(LogGallicArc);
}
}


Compile and make sure the .so is on LD_LIBRARY_PATH

g++ -fPIC gallic-arc.cc -o gallic-arc.so -shared -O2


## Transducer learning algorithm (e.g. OSTIA) in openfst

### HasanAkram - 03 Jul 2011 - 07:26

I was wondering if there is an existing implemented OSTIA algorithm (the transducer learning algorithm by Oncina el al.) using openfst? Or do you know anyone who implemented such thing?

### JuliusGoth - 01 Sep 2011 - 15:18

fstrain (Dreyer et al.) exists which uses expectation semiring.

## why log/tropical semiring is -log(p), not just log?

### AlexZatv - 24 Jun 2011 - 04:39

Sorry for possibly stupid question, but why is it so? Looks like basic algorithms will not change if it will be log(p), may be I overlook something?

### AMorenoDaniel - 24 Jun 2011 - 13:41

legitimate question and not stupid at all. I guess some (including myself) like to think in terms of 'costs' rather than 'scores'. Since it makes no difference, it is best to stick to one of them.

### AlexZatv - 27 Jun 2011 - 04:05

I am experimenting with wfst decoders, and this thing make my brain exploding:) I mean, writing Weight::One() than I want 0, ">" than I want "<",etc. It's really easier to think in term of costs with wfst, thank you for the tip!

## compare weights

### AlexZatv - 22 Jun 2011 - 08:26

How can I compare weights? I want to drop some weights which are out of the beam. Now I use (Times(w,Beam).Value() >= (wBest).Value()), for tropical semiring, but I'm sure there is some better way.

### CyrilAllauzen - 22 Jun 2011 - 12:21

The "proper" way to compare weights is to use NaturalLess, it is defined in weight.h.

## using ToGallicMapper for alternative transition representation with ProductWeight or GallicWeight

### HasanAkram - 18 Jun 2011 - 15:32

Hi all,

I am trying to represent the transitions as Q \times (\Sigma \cup {\epsilon}) \times \Delta^* \time K \times Q. To do that I am suing String semiring as output and LogWeights and trying to map them to ProductWeight representation using ToGallicMapper. My code is the following: Map(*fst, &gfst, ToGallicMapper<GallicArc<LogArc, STRING_LEFT>>, STRING_LEFT>()); However, its not working. It seems like the parameter types are not right. Can anybody tell me what am I probably missing here?

### CyrilAllauzen - 20 Jun 2011 - 16:53

To convert from LogArc to GallicArc<LogArc, STRING_LEFT> , you need to use ToGallicMapper<LogArc, STRING_LEFT> .

### HasanAkram - 22 Jun 2011 - 06:04

Thanks! I am still having some error with my code:
 typedef VectorFst<GallicArc<LogArc, STRING_LEFT>> CustomFST;
typedef VectorFst<LogArc> LogVectorFst;
CustomFST gfst;

Map(fst, &gfst, ToGallicMapper<LogArc, STRING_LEFT>());


the error msg is the following:

 Error   1   error C2040: 'fst' : 'fst::StdVectorFst *' differs in levels of indirection from 'fst::StdVectorFst'

Is it also possible to push the string weights along the arc?

### HasanAkram - 22 Jun 2011 - 06:49

actually the eoor msg should be:
Error   1   error C2664: 'fst::GallicArc<A,S> fst::ToGallicMapper<A,STRING_LEFT>::operator ()(const A &) const' : cannot convert parameter 1 from 'const fst::ArcTpl<W>' to 'const fst::LogArc &'

### CyrilAllauzen - 22 Jun 2011 - 12:19

The problem is that fst is a StdArc Fst. As I said, ToGallicMapper<LogArc, STRING_LEFT> converts a LogArc Fst into a GallicArc<LogArc, STRING_LEFT>.

### HasanAkram - 22 Jun 2011 - 14:36

thanks! it worked finally. Now I have a question regarding the internal representation of string semiring. If my output alpabet is {x, y} and the output symbol file is the following:
<eps> 0
x 1
y 2

as far as undersatnd, the internal representation of a label x is 1. However, now i can have output label xx, xyx, xxx etc. How will these labels be internally represented? Is there any normalization function to bring the labels as close to the initial state as possible? Its not only the normalization. The labels should be close to the initial state with the longest common prefix operation in the string semiring.

### CyrilAllauzen - 23 Jun 2011 - 16:05

The string weights are represented as a list of integer labels, the sting weight xx will be the list 1,1. The operation you want is pushing toward the initial state (FST.PushDoc), pushing the weights in the gallic semiring. If you are going to the gallic just for that, you could have simply used label pushing instead.

### HasanAkram - 23 Jun 2011 - 18:44

The reason I am going gallic is my transitions are Q \times (\Sigma \cup {\epsilon}) \times \Delta^* \time K \times Q. Now the questions are the following: 1. Is that a good reason to use gallic or is there any better way to do that? 2. The operation you mentioned, would it push the gallic weights, i.e., the labels and the log weights? What I want is only to push the output labels. Is there any way to treat the logweight and the output string semiring differently? 3. Would it be possible to give a short example, showing if I define a
 typedef VectorFst<GallicArc<LogArc, STRING_LEFT>> CustomFST;
how exactly am i supposed to add the arc, e.g., how can you give xx as a parameter? How to define x as a symbol, and not xx. In the example that I am working on, all the output label are
 -1
when I display them using an iterator. Any hint why is that happening? 4. Another problem is, I cant draw the fst using fstdraw when i map the fst to Gallic. However, it works with StdVectorFst.

### CyrilAllauzen - 24 Jun 2011 - 11:15

1. Using the gallic is a good way to do that.

2. Indeed, but it is possible to push only the string part of the weight. Look at fst/push.h when pushing labels only.

3. In your case, you want to set the olabel to be the same as the ilabel. The string in \Delta^* should be part of the weight. Look at the fst/string-weight.h and fst/pair-product.h to see how to build a GallicWeight.

4. The GallicArcs are not registered by default to avoid code bloat. If you want the command line utilities to work you need to create a DSO file as described here. In you case MyArc is GallicArc<LogArc, STRING_LEFT> and my_arc is gallic_log.

### HasanAkram - 28 Jun 2011 - 12:06

Thanks Cyril. btw, is there an existing implemented OSTIA algorithm (the transducer learning algorithm by Oncina el al.) using openfst? Or do you know anyone who implemented such thing?

## Trying to minimize an automaton leaves the automaton unchanged

### ErikAxelson - 15 Jun 2011 - 08:07

Hi all,

I'm trying to minimize the non-minimal automaton

0       1       1       1       200
0       100
1       1       1       1       100
1


and expect to get as a result the minimal automaton

0       0       1       1       100
0       100


However, calling fstminimize produces the same non-minimal automaton. I also get a negative result from fstequivalent if I compare the non-minimal automaton and the equivalent minimal automaton. I tried to modify the non-minimal automaton with

fstpush --push_weights and --to_final


to see if it would produce the correct output for fstminimize or fstequivalence but the results where the same as earlier.

I'm using OpenFst version 1.2.7.

Regards, Erik

### ErikAxelson - 15 Jun 2011 - 08:09

Should be

fstpush --push_weights --to_final


without the word "and".

### CyrilAllauzen - 20 Jun 2011 - 16:54

These two machines are not equivalent.

### ErikAxelson - 21 Jun 2011 - 05:05

Could you give me an example? I think both automata accept the empty string with weight 100, "1" with weight 200, "11" with weight 300, etc.

best regards

### CyrilAllauzen - 21 Jun 2011 - 12:34

Minimize and Equivalent work by first normalizing the machine(s) using pushing and then applying the unweighted minimization/equivalence algorithms.

The problem is that this is a case where for the output of fstpush --push_weights to have the required normalized property (the ⊕-sum of the outgoing transition weight and final weight at each state is 1) would require the use of initial weight, which the library does not support. This leads to the creation of an additional state to represent the initial weight. Hence in both machines, calling minimize lead to a machine with 2 states (instead of 1). We should update the documentation to make this clear.

The fact that equivalent returns that the machines are not equivalent is a bug due to the way we deal with the initial weight. Since the first machine is acyclic at the initial state, the initial weight is added to all the outgoing transition of the initial state. The second machine is cyclic at the initial state, so we cannot do the same thing. Instead a new initial state is created with an epsilon transition to the old initial state with weight the initial weight. Equivalent then treats this epsilon transition as a regular symbol and conclude that the machine are not equivalent. We will fix this bug in the next release of the library.

Best, Cyril

## Openkernel question

### KonstantinSelivanov - 12 Jun 2011 - 10:02

Hi,

I'm trying to use openkernel for my task. I managed to compile n-gram based kernel and calculate svm model. Prediction tests are fine.

But I can't really understand how can I use the kernel and svm model to predict my working set. My intuition is that for each string I have to compile an automaton and use it as input to svm predict function. But that function expects an svd vector.

So my question is how can I use openkernel (and svm) tools to predict a class of the single string (that is not in the training/test corpus).

Thanks! Kostya.

## SegFault

### GellingD - 31 May 2011 - 13:20

Hi,

I've built a wFST from a language model using the commandline tools. Now, I load this model in my c++ program, and want to compose it with some other FST, that I've just manually created in c++ using the same symboltable.

However, when sorting the FST that I created in advance, a segmentation fault occurs. The same thing happens, if I just compose the two FSTs without sorting the loaded FST. Counting the amount of states in the FST is no problem, though, neither is sorting the manually created FST.

Strangely, when doing something similar with the commandline tools, no segfault occurs either, though I do not sort the FSTs then. Any idea where I might look to find the cause of this segfault?

example code:


ArcSort(&fst, StdOLabelCompare()); // sort locally created StdVectorFst
cout << "number of states in model: " << (*model).NumStates() << endl; // runs fine
ArcSort(model, StdILabelCompare()); // generates segfault
cout << "check" << endl; // doesn't get printed

StdVectorFst result; // create local FST to store result
Compose(fst, *model, &result); // if second ArcSort is not performed, this generates segfault


### CyrilAllauzen - 20 Jun 2011 - 16:56

Hi,

I thing you should try to run Verify(*model) to make sure the model Fst is well-formed.

## What is Segmentation fault?

### JoachimChau - 26 May 2011 - 04:09

When I run the following command: fstencode --encode_labels model.fst encoder|fstdeterminize -|fstminimize -|fstencode --decode - encoder model_min.fst The model.fst is about 1.3G, after about 10min, the shell command exits and says: Segmentation fault

### JoachimChau - 26 May 2011 - 11:06

I run it step by step, the error occurred in the encoding step after minimized.

### JoachimChau - 26 May 2011 - 17:02

Hi, all
I tried to minimize the determinized Fst got in the fisrt step, in shell, it didnot stop for hours or just got a segmentation fault in less than 30min; in VS2008 with C++, it didnot stop. The Fst is about 6.1G and converage over ShortestDistence. My C++ code is as follows, did I got something wrong?
   StdVectorFst *ifst = StdVectorFst::Read(iPath);
EncodeMapper<StdArc> encoder(kEncodeLabels|kEncodeWeights, ENCODE);
Encode(ifst, &encoder);
Minimize(ifst);
Decode(ifst, encoder);
ifst->Write(oPath);


Any help appreciated

### JoachimChau - 30 May 2011 - 11:28

At last I got the result by runing the C++ code to minimize the determinized wfst. But if I run the Determinize and Minimize togther, it never stops(the determinize step stops in about 40min at linux shell, also the minimize step, but gets "Segmentation fault" when decoding). I don't know why.

### PaulDixon - 02 Jun 2011 - 19:49

What optimization flags are using when you compile the C++ code?

Have you tried compiling the Decode tool (or a simple standalone decoder) with debugging information and running the process in a debugger?

## wFSTs as language model

### GellingD - 24 May 2011 - 13:42

Hi,

I built a small unigram lm using the openFST commandline tools. it seems to be working fine for tokenizing sentences, but I'm wondering if composing could fail. Would it just return an empty fst? I want to use the models in c++ code, but the Compose function doesn't seem to have any error flags or the like.

The reason I'm asking, is that I want the model to be able to fail certain sentences, as they cannot form a valid english sentence (right now it still accepts everything, due to the inclusion of single-character words, but I will change this)

Any help appreciated

### PaulDixon - 24 May 2011 - 19:37

It depends on the type of the composition you are performing and the options set in ComposeOptions.

If you use the Compose method and leave the ComposeOptions.as default, it the composition is not able to produce a valid output fst with no dead-end states the output fst should have NumStates() = 0 and Start() = -1

If you the ComposeFst or the Compose method with connect set to false. You Fst may have dead-end states. In this case you will need to look for paths that have final states or run the Connect algorithm and see if the WFST has NumStates() = 0 and Start() = -1.

## fst determinization and minimization problem

### MabS - 13 May 2011 - 11:42

Hi All,

I am trying to determize and minimize the non-functional transducer as: fstencode --encode_labels G.fst enc.dat G.enc.fst fstdeterminize G.enc.fst G.det.fst fstencode --decode G.det.fst enc.dat G1.fst fstminimize G1.fst GM.fst

where G.fst is the language model text fst

But fstdeterminize takes a very long time and does not complete.

Reply is highly appreciated. Thank you. Cheers Mabs

## rho and sigma in openfst

### BlueSky - 22 Mar 2011 - 12:58

Hi Guys, Does anyone know how to represent the rho and sigm in the openfst(python openfst). for example for epsilon it is "openfst.epsilon" I am trying to find the same for "rho" Cheers,

## Cache usage in delayed composition of cached ComposeFsts

### MaciekSk - 13 Mar 2011 - 09:07

Hi Everyone, in my program I create one instance of a delayed composition of two large Fsts: (don't mind my pseudo C++/python language)

opts.gc_limit = 4096*2^20 delayedWithCacheFst = ComposeFst(fst1,fst2,opts)

Next, I use delayedWithCacheFst many times in composition with other different fsts:

for fst3 in manyCasesOfFst3: opts.gc_limit=0 finalComposedFst = ComposeFst(fst3, delayedWithCacheFst, opts) ShortestPahts(finalComposedFst, outFst, npaths, sp_opts) delete finalComposedFst

The behavior I hoped for is, that the cache of delayedWithCacheFst is reused in all further compositions, and speeds up the ShorthestPath alg. (since fst3 are different but from a set of similar problems).

The construction of all finalComposedFst should be fast (in this composition I use input matcher from delayed fst, the gc_limit is set to 0).

However, the behavior I observe suggests that with each loop revolution the cache of delayedWithCacheFst is copied and deleted. Is there any clear reason for that?

Both, the composition "ComposeFst(fst3,delayedWithCacheFst,...)" as well as freeing the memory "delete finalComposedFst" take seconds to run. It seems that the cache is created, used and deleted for each fst3 separately.

Why is this the case? How to avoid that?

I'm using OpenFst Version 1.2.6. I also use SWIG interface for many C++ classes, I hope this is not the reason for my problems.

### MaciekSk - 13 Mar 2011 - 14:49

ooops, sorry for the mess with newlines.

I did some experiments and I think I almost solved the problem. This has something to do with InputMatchers reusage.

Is it the case that composition cache is kept in the input matcher?

## how to use compose to reverse an fst?

### CordeliaZhang - 11 Mar 2011 - 20:17

Hi, is there a way to use compose to reverse an FST? That is, design b.fst such that c.fst is the reverse of a.fst after the operation of compose: fstcompose a.fst b.fst >c.fst Thanks, Cordelia

## Converting a grammar to an wfst

### ChristopherStanley - 10 Mar 2011 - 19:01

Hi, I was wondering if there is some kind of openfst library support for converting a grammar, like JFGS grammars to an FST? Or maybe this is not even possible? I'm pretty new to openfst and finite state stuff in general, at the moment I'm working on my bachelor's degree. Actually I read that the JSGF grammar is not just regular but includes some 'context-free' grammar features. But the finite state machines are regular, right? So i guess this means that there are some things that the JSGF can do that we cannot transform into an FST? I guess that is kind of rambling, but I heard that there was something like this and wanted to find out for sure.

### MaciekSk - 14 Mar 2011 - 17:02

Hi Chirstopher, OpenFST is a library of templates and interfaces, so nothing prevents you from implementing some kind of pushdown automaton (~= context-free grammar) behind the typical FST interface. As long as you don't try to Expand all of its (infinite number of) states, everything should be ok.

Now, afaik, translating grammars to OpenFST is not supported, but it shouldn't be too difficult with some simple parser, like Python PLY+SWIG, as long as the grammar is in some normal form. Example, if the grammar is in http://en.wikipedia.org/wiki/Greibach_normal_form then you could easily implement 1-1 correspondence between states and non-terminals, with additional rules that push further non-terminals from each transition on the stack, and whenever an epsilon trans. is met, the current state is replaced by the one popped from the stack, and edges follow from there.

### MaciekSykulski - 16 Mar 2011 - 07:10

...acttually a type of PDT is already implemented <fst/extensions/pdt/pdtlib.h>. http://openfst.cs.nyu.edu/doxygen/fst/html/pdtlib_8h_source.html

### AlexZatv - 31 May 2011 - 12:44

Looks like it is implemented in transducersaurus.

### AlexZatv - 31 May 2011 - 12:44

Looks like it is implemented in transducersaurus.

### AlexZatv - 31 May 2011 - 12:44

Looks like it is implemented in transducersaurus http://code.google.com/p/transducersaurus/

### AlexZatv - 31 May 2011 - 12:45

sorry for duplication of the message:)

## Epsilon Normalization and its effect on Weight Pushing

### EdWhittaker - 10 Mar 2011 - 09:53

Hi, I have a couple of large-ish, stochastic, integrated (CLGT) WFSTs. All operations have been performed in the log semi-ring. When I weight push and minimize these WFSTs after the final determinization it works fine e.g.:

fstpush --push_weights detCLG | fstminimize - MindetCLG

completes successfully in a couple of minutes.

If I try a similar operation, but after performing epsilon normalization e.g.

fstpush --push_weights EpsdetCLG | fstminimize - MinEpsdetCLG

sometimes it's pushable, sometimes it's not, and by not I mean that it takes more than a few days not to complete before I kill it.

(If I do epsilon normalization after weight pushing and minimization this invariably works but I'm not convinced the resulting WFST is stochastic, based on the above observation.)

The only difference between pushable and not was the original LM I used to construct G, but I think this difference is spurious.

So my first question is: is there any reason why epsilon normalization might sometimes render a stochastic (or at least pushable) WFST un-pushable? Looking at the function description I don't see that this should be the case.

My second question is: given that epsilon normalization is obviously useful, in terms of reducing the size of the final WFST, is it then better/safer/more reliable to run it as the final operation after weight pushing or can this negate the putative benefits of weight pushing for decoding?

### EdWhittaker - 17 Apr 2011 - 08:44

Eventually solved this by tweaking the LM parameters slightly. Now all networks can be pushed reliably in the log semiring irrespective of the fst operations applied or their order of application.

P.S. The weight push in the commands above is redundant (as others have pointed out) since fstminimize appears to implicitly calls this prior to performing minimization on the encoded automaton.

### HuanliangWang - 16 Jun 2012 - 06:12

Hi, EdWhittaker Could you tell how to tweak the LM parameters slightly? If I don't normalize LM weight, in some case fstminimize will fail when composing lexicon fst and LM fst.

## fstdeterminize vs. Determinize()

### JosefNovak - 09 Mar 2011 - 01:29

Hi, I've noticed in some cases it seems that fstdeterminize (the command line tool) is significantly faster than running Determinize() in a C++ program on the same input. Of course the result is exactly the same, but Determinize() seems to take noticeably longer. I'm curious as to why this might be the case? Are there perhaps some default properties that are set in the fstdeterminize command line tool, that I should setup when I use Determinize()?

### JosefNovak - 09 Mar 2011 - 03:58

Nevermind, a look at fstdeterminize.cc cleared things up.

### NikolayShmyrev - 05 Mar 2011 - 11:26

Hi all

It builds C*det(L*G) and doesn't determinize the output. There are even duplicated transitions. I drop duplicated transitions from the result and it seems to decode well.

Now, following the Paul's tutorial

I'm trying to determinize the cascade to get det(C*det(L*G)) and indeed it determinized properly. I only had to change scripts to keep input aux symbols which are mapped to epsilon by default scripts. However, decoding with the new determinized cascade is worse. It's slower and less accurate.

I see that with determinized decoding there are more states reached from single one by epsilon-in transitions. It's significantly better with non-deterministic cascade. I suppose this because determinization moved the labels in cascade around.

I wonder if there is some issue with determinization or I just need more clever pruning in the decoder to get results comparable to non-determinized cascade?

### NikolayShmyrev - 05 Mar 2011 - 19:53

Interesting that if I do last det over standard semiring, not over log, it gets better.

### PaulDixon - 07 Mar 2011 - 06:59

You mean det(Cdet(LG)) with the final determinization in the tropical semiring gives better ASR characteristics than Cdet(LG) all performed in log semiring?

### NikolayShmyrev - 07 Mar 2011 - 20:23

Hi Paul

Yes, that's what I mean. But I'm more interested why it doesn't improve over Cdet(LG) without final determinization

### NikolayShmyrev - 07 Mar 2011 - 20:25

I'm also interested why transducersaurus doesn't normalize Cdet(LG). It doesn't seem like juicer will do any online normalization during decoding.

### JosefNovak - 09 Mar 2011 - 00:43

Hi, Are you referring to 'epsilonnormalization'? I don't think this really makes much difference. At the moment, the python scripts in the project are designed to be the absolute simplest you can get away with, and still build a high performance LVCSR cascade (this is also why the default C transducer is just generating input epsilons on the explicit aux arcs - otherwise we have to remove them afterwards). They are mainly intended as learning tools. I'm planning the C++ build tools to be much more flexible.

I'd like to also point out that the absolute accuracy really shouldn't drop as a result of those optimization commands, it should just shift the accuracy versus real-time factor curves one way or the other (i.e. it requires a higher or lower beam to achieve the same accuracy, and the search is slower).

I'll try and generate some comparison curves for Juicer later in the week to see whether there are still issues there.

Also, I don't know if these questions are directly relevant to OpenFST though, maybe they would best just be directed to the project?

### NikolayShmyrev - 09 Mar 2011 - 19:57

Hi Josef

> Are you referring to 'epsilonnormalization'

Sorry, that was a typo. I meant final determinization as described in Paul's tutorial. I understand it's a simplies way so as long as it works it's ok. But don't you remove duplicate edges before decoding?

>I'd like to also point out that the absolute accuracy really shouldn't drop as a result of those optimization commands

Maybe, it shifted to some unexplorable area in my case > 20xRT

>I'll try and generate some comparison curves for Juicer later in the week to see whether there are still issues there.

That would be interesting to look on, thanks a lot.

> Also, I don't know if these questions are directly relevant to OpenFST though, maybe they would best just be directed to the project?

I would love to discuss it in other place. Honest this forum is very uncomfortable, no notification, no edits. But is there any more suitable discussion place for transducersaurus?

### JosefNovak - 10 Mar 2011 - 19:31

i made a google group which ive listed on the project web page. i doubt it will get much activity, but it is probably a more appropriate place for discussing questions specific to the project. hopefully we can build something nice that will foster further learning and be useful to others.

## Bash Completion Script

### PaulDixon - 28 Feb 2011 - 06:57

I uploaded a bash completion script to the contrib section. The script will add tab completion support for the flags of the OpenFst commands.

## Building OpenFST with unicode support on Mac OSX 10.6.

### JiaPu - 25 Feb 2011 - 17:51

Hello,

When I run "./configure --with-icu" on Mac OS 10.6.6, I got following error messages: * The icu-config script could not be found. Make sure it is * in your path, and that taglib is properly installed. * Or see http://ibm.com/software/globalization/icu/ configure: error: in /Volumes/Data/private_code/openfst-1.2.7': configure: error: --with-icu was given, but ICU Library v. 4.2 or newer not found

After a little poking around, I found that the configure script is looking for a "icu-config" file. Does anyone know if there's such file on Mac OS 10.6, and if yes, its location?

Thanks. jia

### JiaPu - 25 Feb 2011 - 18:14

Never mind. Just realized that this is a well-known issue. ICU is shipped without header on mac.

## FST Minimization

### BlueSky - 08 Feb 2011 - 05:09

Hi All,

I would like to ask about 2 things. First does anyone know a method in openfst which prints the fst with the characters not ascii ?

I have a problem with the minimization function, i created an fst with 60,000 words. The fst is created and determinized, but then the minimize function does not finish??

I appreciate the help :), Cheers Meme

### DoganCan - 10 Feb 2011 - 18:53

Hey,

1. I believe if you add the --with-icu flag during installation, you get unicode support. (Well, assuming you have ICU installed on your system, etc.)

2. You need to be more specific about the fst you are trying to minimize. Try to give a toy example for us to understand what it looks like.

Best, Dogan

### BlueSky - 11 Feb 2011 - 03:56

Thanks Dogan, I am representing each word in the fst by: for example: "car", each transition contains: c:c/0.0 a:a/0.0 r:r/0.0 all words end in a final state. cheers, Meme

### NikolayShmyrev - 22 Feb 2011 - 06:40

Hi I also have such issue combining lexicon and n-gram model. Here is the FST you can try

It's relatively small but it's minimization hangs.

### PaulDixon - 22 Feb 2011 - 20:52

Hi,

When you run minimization it will first weight push the machine and in some semirings this might not terminate (See Mohri 2002 Semiring frameworks and algorithms for shortest-distance problem. Michael Riley pointed this issue out to me).

You can verify this by converting to tropical semiring and see that it terminates when running.

fstprint lg.bfsm | fstcompile | fstminimize | fstinfo

But this will give poor ASR performance. If you really want to minimize the machine another solution is to first the encode weights and labels and then run minimization.

### NikolayShmyrev - 24 Feb 2011 - 07:07

Yes, this way it works. Thanks for the explanation.

## how to use fst to input and output?

### MengyunKong - 04 Feb 2011 - 19:57

Hi all,

After fstcompile and some other operations, say I now have a.fst. How can I "input some word into" the transducer and "get the output string" that I want?

(using python / shell)

Thanks.

### DoganCan - 10 Feb 2011 - 18:59

Hey,

I believe you are looking for fstcompose.

fstcompile a.txt > a.fst
fstcompile b.txt > b.fst (this is the fst representing the "some input word" you are referring to)
fstcompose b.fst a.fst > c.fst (this is the fst representing the "output string" you are looking for)
fstprint c.fst


Best, Dogan

### MengyunKong - 12 Feb 2011 - 20:01

Yes, I've figured it out. Thanks:)

### HasanAkram - 24 Jun 2011 - 07:31

Could you please who an example, how to to in in C++ code?

## Memory leaks after deleting transducers

### JosepMCrego - 03 Feb 2011 - 06:39

Hello,
Did anyone experience memory leaks after using 'delete' to delete a transducer created through 'Read(string)'? An example:

#include <fst/fstlib.h>
int main(int argc, char** argv){
string lm=argv[1];
delete fstlm;
cout << "Done!" << endl;
while (true) {}
return 0;
}


When reading a large (~1Gb) transducer, the delete operation produces memory leaks that I noticed using 'top'.
I'm using openfst-1.2.5 on a 64bits linux machine.
Thanks

### CyrilAllauzen - 03 Feb 2011 - 18:15

Hi,

I don't think there is a memory leak in VectorFst (we do run memory leak checks) and nothing in your example proves otherwise. Who knows how freed memory is reclaimed by the system and what top is reporting? You might want to run a memory leak checker (such as valgrind) if you still have doubts.

#include <fst/fstlib.h>
#define N 2
int main(int argc, char** argv){
string lm=argv[1];
fst::StdVectorFst *fstlm;
for (int i = 0; i < N; i++) {
delete fstlm;
cout << "Done!" << endl;
}
while (true) {}
return 0;
}


On my 64-bit linux machine, top reports much more memory used during the while(true) {} loop for N=1 than for N=2`.

Best, Cyril

## Has anyone used OpenFST for morphological analysis?

### RandyWest - 02 Feb 2011 - 16:57

Hi all,

I have a task similar to the following that I'd like to use OpenFST for: From some input word w, I would like to create an FST with output O corresponding to the result of a thesaurus lookup for w. The difficulty is that I would like all o in O to be of the same word form as w. I have no a priori knowledge of w's form other than the word itself, e.g., I cannot distinguish between the noun "home" and the verb "home," but when I see "homed," I would like to only output past tense verb synonyms. Likewise, I would like to use this type of word form analysis to cull any entries in the thesaurus whose possible forms do not match any of those for the input word.

I am considering using the CELEX database for this analysis, but it is as yet unclear to me how feasible that would be. Has anyone in the OpenFST user community performed similar analysis and could point me in the right direction?

Best, Randy

### KonstantinSelivanov - 03 Feb 2011 - 06:35

I've used OpenFST for morphological analysis of Russian. Here is example http://delinguis.com/morph Here is a shallow description http://delinguis.com/art/morph

But I didn't catch the problem you are solving. You don't have to disambiguate on the morphological level. Your analyzer's task is to produce all possible morph labels for a given word.