
OpenFst Forum 2007 Archive

Newbie question: compiling a C++ program using the OpenFst library

This is a newbie question about how to successfully compile a C++
program that refers to the OpenFst library. I'm a newcomer to C++,
though I'm comfortable with C and with object-oriented programming
(e.g., Java). As a first example I tried constructing a program out
of the example "Creating FSTs Using Constructors and Mutators From
C++":

***

#include "fst/lib/fstlib.h"

int main()
{
  using fst::StdVectorFst;
  using fst::StdArc;

  // A vector FST is a general mutable FST
  StdVectorFst fst;

  // Adds state 0 to the initially empty FST and make it the start state.
  fst.AddState();   // 1st state will be state 0 (returned by AddState)
  fst.SetStart(0);  // arg is state ID

  // Adds two arcs exiting state 0.
  // Arc constructor args: ilabel, olabel, weight, dest state ID.
  fst.AddArc(0, StdArc(1, 1, 0.5, 1));  // 1st arg is src state ID
  fst.AddArc(0, StdArc(2, 2, 1.5, 1));

  // Adds state 1 and its arc.
  fst.AddState();
  fst.AddArc(1, StdArc(3, 3, 2.5, 2));

  // Adds state 2 and set its final weight.
  fst.AddState();
  fst.SetFinal(2, 3.5);  // 1st arg is state ID, 2nd arg weight

  fst.Write("binary.fst");
}

***


When I try to compile this program (with filename Example.cpp) using
g++ version 4.1.2 (on a 64-bit Linux machine), I get some errors which
I'm having a hard time deciphering:

rlevy@morel:~/src/C++$ g++ -I /local/contrib/cpl/OpenFst Example.cpp
/tmp/cc1hh0FE.o: In function `fst::SymbolTable::Write(std::basic_ostream<char, std::char_traits<char> >&) const':
Example.cpp:(.text._ZNK3fst11SymbolTable5WriteERSo[fst::SymbolTable::Write(std::basic_ostream<char, std::char_traits<char> >&) const]+0x1c): undefined reference to `fst::SymbolTableImpl::Write(std::basic_ostream<char, std::char_traits<char> >&) const'
/tmp/cc1hh0FE.o: In function `fst::FstImpl<fst::StdArc>::WriteHeaderAndSymbols(std::basic_ostream<char, std::char_traits<char> >&, fst::FstWriteOptions const&, int, fst::FstHeader*) const':
Example.cpp:(.text._ZNK3fst7FstImplINS_6StdArcEE21WriteHeaderAndSymbolsERSoRKNS_15FstWriteOptionsEiPNS_9FstHeaderE[fst::FstImpl<fst::StdArc>::WriteHeaderAndSymbols(std::basic_ostream<char, std::char_traits<char> >&, fst::FstWriteOptions const&, int, fst::FstHeader*) const]+0xc3): undefined reference to `fst::FstHeader::Write(std::basic_ostream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
/tmp/cc1hh0FE.o: In function `fst::CompatProperties(unsigned long long, unsigned long long)':
Example.cpp:(.text._ZN3fst16CompatPropertiesEyy[fst::CompatProperties(unsigned long long, unsigned long long)]+0xd7): undefined reference to `fst::PropertyNames'
/tmp/cc1hh0FE.o: In function `unsigned long long fst::TestProperties<fst::StdArc>(fst::Fst<fst::StdArc> const&, unsigned long long, unsigned long long*)':
Example.cpp:(.text._ZN3fst14TestPropertiesINS_6StdArcEEEyRKNS_3FstIT_EEyPy[unsigned long long fst::TestProperties<fst::StdArc>(fst::Fst<fst::StdArc> const&, unsigned long long, unsigned long long*)]+0x18): undefined reference to `FLAGS_fst_verify_properties'
collect2: ld returned 1 exit status


Could anyone lend me insight into the problem I'm encountering here?

Many thanks, and apologies for the newbie question...

Best

Roger
RogerLevy 27 Nov 2007 - 13:33
You simply need to link with the library (and with the pthread library)!
For instance:
$ g++ -I /local/contrib/cpl/OpenFst Example.cpp /local/contrib/cpl/OpenFst/fst/lib/libfst.so   -lpthread
CyrilAllauzen 27 Nov 2007 - 15:37
Thank you so much Cyril -- this solved the problem!

Best

Roger
RogerLevy 27 Nov 2007 - 19:59

Autoconfiscating patch

I have uploaded a very preliminary patch that autoconfiscates OpenFst. There are four files, a configure.ac, a Makefile.am, and two template files to generate test shell scripts.

Suppose the patch and uncompressed/untarred OpenFst are in directory ~/stage, then, assuming one has a recent autoconf, automake, and libtool installed and one wishes to install to a subdirectory ~/usr of one's home directory
$ cd ~/stage/OpenFst; patch < ../openfst_auto_20071112.diff
$ autoreconf --install
$ ./configure --prefix=$HOME/usr
$ make
$ make install
will install OpenFst. I have the impression that the headers go to ~/usr/include/fst/lib, the executables, which all begin with "fst", go to ~/usr/bin, and the library libfst goes to ~/usr/lib.
To run tests for whether dlopen() works
$ make test

I have tested the patch only on Mac OS 10.4. I have tested building in a directory separate from the source directory. I have not tested the installed executables.

Is libfstmain also supposed to be accessible after installation? Right now I believe the patch statically links it into the executables.

I have not extended the patch to work on Cygwin. Right now the patch has no host dependent code, but for Cygwin I believe I would be forced to use AC_CANONICAL_HOST and then test for Cygwin so that the proper flags are passed to produce dynamic modules and not static libraries.

I am willing to keep working on the patch to fill your requirements. Right now the patch does not recursively run make on subdirectories as there are only three subdirectories to process.

I also disclaim any copyright on the patch--the patch is completely in the public domain.
DavidShao 13 Nov 2007 - 00:30

Lazy VectorFst?

What happens when you construct a VectorFst from some lazy machine, like in the following example:

Fst<Arc>* composed = new ComposeFst<Arc>(fst1, fst2);
VectorFst<Arc> fst3(*composed);

Is the result fst3 still a lazy machine?

The background: I need to pass a machine into a function that needs a MutableFst. I can't pass ComposeFst or Fst directly, so I need to transform it to VectorFst, but want to be sure that the states are still only expanded lazily.

Thanks!
Markus
MarkusD 08 Nov 2007 - 13:46
No, fst3 is not a lazy machine. The constructor of VectorFst will expand the lazy machine.

Yes, you need to copy a ComposeFst into a VectorFst before passing it as an argument to a function that needs a MutableFst.

If a function (provided by the OpenFst library) requires its argument to be a MutableFst (or ExpandedFst), that means that the underlying algorithm only works on an expanded machine.

Best,
Cyril
CyrilAllauzen 09 Nov 2007 - 13:58

porting openfst to Windows

Hi,

I am starting to work on porting OpenFst to Windows/Visual C++. Is anyone interested in, or already working on, this?

Thank you,

Chris
ChristopherKermorvant 30 Oct 2007 - 07:10
I am generally interested in seeing OpenFst ported to Windows, especially if the result remains free, open-source and Apache-licensed. Please keep us informed.

Ken
KennethRBeesley 25 Sep 2008 - 16:47

Evaluating the Weights

Hi all,

I have a FST and a training data set (input and output pairs). How should I evaluate the weights for the arcs? I want to count the number of times each arc has been crossed. But each time I compose the FST with the input, I get a new FST and the stateID would change. And I will not know which arc it corresponds to in the original FST. How should I deal with this problem?

Many thanks in advance!

Wei
WeiChen 04 Oct 2007 - 09:10

fstminimize

Hi all,

While testing some very simple examples I came across something which confused me w.r.t. the behavior of fstminimize: Running fstrmepsilon on the output of fstminimize can actually produce a more "minimal" FST in some situations. Is this to be expected?

Small example:


#!/bin/bash

function draw_fst() {
fst_file="$1"
dot_file="${1%%.fst}.dot"
ps_file="${1%%.fst}.ps"
fstdraw --title="$fst_file" "$fst_file" "$dot_file"
dot -Tps "$dot_file" >"$ps_file"
gv "$ps_file" &
}

cat <<EOF >a.txt
0 1 a x
1 2 b y
2
EOF
cat <<EOF >a.isyms
<eps> 0
a 1
b 2
EOF
cat <<EOF >a.osyms
<eps> 0
x 1
y 2
EOF

fstcompile \
--isymbols=a.isyms \
--osymbols=a.osyms \
--keep_isymbols \
--keep_osymbols \
a.txt a.fst
fstunion a.fst a.fst union.fst
draw_fst union.fst
fstminimize union.fst minimize.fst
draw_fst minimize.fst
fstrmepsilon minimize.fst rmepsilon.fst
draw_fst rmepsilon.fst


Cheers,
Andy
AndySchlaikjer 19 Sep 2007 - 20:05
Seems as though the code got a little mangled in the original post. Each of the lines beginning with "cat " is missing the "here-document" operator (two less-than signs + "EOF").. Hope it makes sense. AndySchlaikjer 19 Sep 2007 - 20:08
This is indeed completely expected.

1. Determinize and Minimize treat epsilon as a regular symbol. Hence, if the input has epsilon transitions, the result is not minimal if you interpret epsilon as the empty word (i.e. true epsilon).

2. Assuming that the input has no epsilon transitions, Minimize will produce the minimal deterministic automaton equivalent to the input. There might be non-deterministic automata with fewer states, however.

What you observe is a combination of 1+2, since epsilon-removal does interpret epsilon as the empty word and produces a non-deterministic machine.

Best,
Cyril
CyrilAllauzen 20 Sep 2007 - 13:15
Ah, I see now. Union was producing a non-deterministic machine for this example, and I was handing that over to Minimize, which actually requires a deterministic machine as input. This was made clearer when I tried to run the sequence "Union, Epsnormalize, Minimize" and Minimize produced the error "FATAL: Input Fst is not deterministic".

So it seems as though there are two ways to deal with this: (1) "Union, Epsnormalize, Determinize, Minimize", to ensure both Determinize and Minimize wouldn't encounter epsilon symbols, or (2) "Union, Minimize, Rmepsilon", though I'm not convinced this second one will always produce equivalent output..

So are there other situations in which it can be useful (perhaps for efficiency?) to run Determinize and Minimize algorithms on machines containing "partial" non-determinism due to epsilon edges?

Thanks,
Andy
AndySchlaikjer 20 Sep 2007 - 14:31
In (1), no need to use EpsNormalize, you can simply use RmEpsilon instead. If you follow (2) by Determinize and Minimize you should get the same result as (1).

Indeed, in some situations you cannot remove epsilons because doing so would lead to a blow-up in space. In that case, you might still want to optimize the machine with epsilons using determinization and minimization.

Best,
Cyril
CyrilAllauzen 21 Sep 2007 - 11:52

fstdeterminize

Hello,

I have an FST with log-arcs (an acceptor). I use ShortestDistance to compute the sum of its paths from the initial state to all final states, and it is somewhere around 0. However, after I run fstdeterminize, the sum of paths becomes -0.26 instead of a value close to 0. Shouldn't fstdeterminize preserve the sum of paths?

Thanks

Jerr.
JerrRo 05 Sep 2007 - 15:43
Indeed, fstdeterminize should maintain the value of the sum of all paths. Are you doing anything other than determinization (minimization, for instance)? And is everything indeed done in the log semiring?

The issue might be due to numerical instabilities. One solution would then be to use a smaller delta (in DeterminizeFstOptions).

Is your machine cyclic? Is the size of the determinized machine much larger than the original machine? The issue might then be that the machine is not determinizable and that fstdeterminize terminates only because of the numerical approximations. It is unlikely but possible.

In the log semiring, one cause of non-determinizability is having two states q and q', both accessible from the initial state by paths labeled with the same input string x, such that q and q' both admit cycles labeled with the same input string y and the weights of those cycles are not equal.
CyrilAllauzen 09 Sep 2007 - 14:10
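
To make "the sum of all paths" concrete, here is a minimal sketch in plain C++ that does not use OpenFst. It assumes weights are negative log probabilities and that Plus in the log semiring is -log(exp(-a) + exp(-b)); LogPlus and SumOfPaths are hypothetical helper names, not library calls. Determinization merges paths that share an input string by applying Plus to their weights, so the total over all paths should stay the same up to rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Log-semiring Plus on negative-log weights: -log(exp(-a) + exp(-b)).
// Direct formula, adequate for small weights; it is not tuned for stability.
double LogPlus(double a, double b) {
  return -std::log(std::exp(-a) + std::exp(-b));
}

// Combines the weights of all paths with LogPlus.
double SumOfPaths(const std::vector<double>& path_weights) {
  assert(!path_weights.empty());
  double total = path_weights[0];
  for (std::size_t i = 1; i < path_weights.size(); ++i) {
    total = LogPlus(total, path_weights[i]);
  }
  return total;
}
```

For example, three paths with weights -log(0.5), -log(0.25) and -log(0.25) sum to approximately 0. If the last two carry the same label and are merged into a single path of weight LogPlus(-log(0.25), -log(0.25)), the total is still approximately 0.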

make on 64-bit Linux

Hi,

running make on a 64-bit machine does not work. Adding -fPIC to the CFLAGS in the Makefiles of bin, lib and test solves that problem.

Cheers,

Christian
ChristianKofler 04 Sep 2007 - 11:23

deleting transducers?

Suppose I have a subroutine that creates an empty transducer from heap memory, populates it somehow according to passed-in arguments, and passes back a ptr to it

StdVectorFst * makeFst( ... params ...) {
StdVectorFst *newfstp = new StdVectorFst() ; // get ptr to an empty FST
// add arcs, specify initial and final states
return newfstp ;
}

And assume that in the calling program I eventually want to delete it. Is it as
simple as calling 'delete'?

StdVectorFst *ptr = makeFst(...args...) ;
// do something with the Fst, perhaps mutate it
// until I'm done with it
delete ptr ; // will this work as expected and avoid memory leaks?

Thanks,

Ken
KennethRBeesley 23 Aug 2007 - 17:59
Yes, this will work. CyrilAllauzen 23 Aug 2007 - 18:24
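
The ownership pattern in this exchange can be sketched without OpenFst. The block below is only an illustration: FakeFst is a hypothetical stand-in for StdVectorFst that counts live objects, and MakeFst is a hypothetical factory. It returns a std::unique_ptr (C++11) as one possible alternative to a raw pointer plus a manual delete.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for StdVectorFst that counts live instances.
struct FakeFst {
  static int live;
  FakeFst() { ++live; }
  ~FakeFst() { --live; }
};
int FakeFst::live = 0;

// Factory in the style of makeFst(): the caller owns the returned object.
// With unique_ptr the object is deleted when the pointer goes out of scope.
std::unique_ptr<FakeFst> MakeFst() {
  assert(FakeFst::live >= 0);
  return std::unique_ptr<FakeFst>(new FakeFst());
}
```

With a raw pointer, as in the question, a single delete on the pointer returned by the factory releases the object in the same way; the smart pointer only removes the need to remember that call.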

Crossproduct and Complement

1. Is there a general crossproduct algorithm that takes two acceptor networks and
produces a transducer that encodes the crossproduct?

2. How about complement/negation? I presume that it would be limited to
unweighted networks?

Thanks,

Ken
KennethRBeesley 23 Aug 2007 - 13:03
1. You simply need to create a transducer T with a single state and a transition with labels (a,b) for each input symbol a and each output symbol b. You can then use composition: A o T o B would represent the cross product.

2. Complement is currently a library internal-only operation. It is limited to unweighted deterministic automata.
CyrilAllauzen 23 Aug 2007 - 18:10
2. Forgot to mention that you should use Difference instead of Complement. CyrilAllauzen 25 Aug 2007 - 13:14
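
The one-state transducer T described in answer 1 can be written out mechanically. The following is a minimal sketch that emits T in the text format used elsewhere on this page (source state, destination state, input label, output label, then a line naming the final state). The function name CrossProductTransducerText is hypothetical, and it is an assumption that the symbols are listed in matching symbol-table files. Compiling the text and composing A o T o B is left to the usual tools.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Builds the text description of a one-state transducer with a self-loop
// labeled (a, b) for every input symbol a and every output symbol b.
std::string CrossProductTransducerText(
    const std::vector<std::string>& in_syms,
    const std::vector<std::string>& out_syms) {
  assert(!in_syms.empty() && !out_syms.empty());
  std::ostringstream text;
  for (const std::string& a : in_syms) {
    for (const std::string& b : out_syms) {
      text << "0 0 " << a << ' ' << b << '\n';  // src dst ilabel olabel
    }
  }
  text << "0\n";  // state 0 is the start state and is also final
  return text.str();
}
```

For input symbols {a} and output symbols {x, y} this yields the lines "0 0 a x", "0 0 a y" and "0". The number of arcs is the product of the two alphabet sizes, which is worth keeping in mind for large alphabets.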

AT&T FSM format (text format) and Unicode

You confirmed earlier that in Symbol Table Files that map symbols to internal
integer labels, one could simply use real Unicode code point values for the
integers. That was good news, but it leads to a few questions:

1. In the old AT&T documentation I read

"Some FSM operations allocate internal arrays based on the maximum inte-
ger used as an input or output arc symbol. This design choice, chosen
for efficiency, requires the user to avoid huge integer labels (e.g.,
INT_MAX) since memory may otherwise be exhausted."

Is that also the case with OpenFst? If so, it would be a major disincentive to
using real code point values, especially if (like myself) you work with characters encoded
in the Unicode supplementary area. Comments?

2. If you are using Supplementary Chars, or even Greek, Cyrillic, Hebrew, etc.,
then symbol table files could intuitively
contain lines such as the following (for the Deseret Alphabet):

?? 66560
?? 66561
?? 66562
?? 66563
?? 66564
?? 66565
etc.

Using such a symbol table, encoded in Unicode UTF-8, I tried 'fstcompile' on the following
file, also encoded in UTF-8

0 1 a b 0.5
1 2 ?? ?? 0.5
2 3 x y 0.5
3 1.0

fstcompile choked on line 2, which contains the first characters in this file
outside the ASCII range.
fstcompile has many options, but I haven't seen anything (yet) that would cause
it to handle the text input and symbol-table files as UTF-8. Have I missed
something?

3. (Previously mentioned topic) If one does use real Unicode code-point values
for the label integers, it would be convenient to allow them to be represented,
in the symbol-table files, in hex, e.g.

?? 0x10400
?? 0x10401
?? 0x10402
?? 0x10403
?? 0x10404
?? 0x10405
etc.

Comments would be welcome.

Thanks,

Ken
KennethRBeesley 23 Aug 2007 - 12:56
I see that the Unicode chars did not survive the reposting process in my question above. When pasted in, each ?? was a Unicode char.
Ken
KennethRBeesley 23 Aug 2007 - 13:05
The library can only handle ascii for symbol files and fst text format.

Note that the use of a symbol file is not required at all. You can directly define your fst using the "code-point value", or write a script that turns the UTF-8 characters into their "code-point value".
CyrilAllauzen 24 Aug 2007 - 09:30
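
The conversion suggested above, from UTF-8 characters to their code-point values, might look like the following sketch. It is plain C++ with no OpenFst dependency; Utf8ToCodePoints is a hypothetical name, and the decoder assumes well-formed UTF-8 and only checks the basic byte patterns with assertions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes well-formed UTF-8 into Unicode code points, which can then be
// written as decimal label IDs. Overlong or otherwise invalid sequences
// are not diagnosed beyond the assertions below.
std::vector<std::uint32_t> Utf8ToCodePoints(const std::string& utf8) {
  std::vector<std::uint32_t> code_points;
  std::size_t i = 0;
  while (i < utf8.size()) {
    const unsigned char lead = static_cast<unsigned char>(utf8[i]);
    std::uint32_t cp = 0;
    std::size_t extra = 0;  // number of continuation bytes that follow
    if (lead < 0x80) {
      cp = lead;
    } else if ((lead >> 5) == 0x06) {  // 110xxxxx
      cp = lead & 0x1F;
      extra = 1;
    } else if ((lead >> 4) == 0x0E) {  // 1110xxxx
      cp = lead & 0x0F;
      extra = 2;
    } else {                           // 11110xxx
      assert((lead >> 3) == 0x1E);
      cp = lead & 0x07;
      extra = 3;
    }
    assert(i + extra < utf8.size());
    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
      assert((cont & 0xC0) == 0x80);   // 10xxxxxx
      cp = (cp << 6) | (cont & 0x3F);
    }
    code_points.push_back(cp);
    i += extra + 1;
  }
  return code_points;
}
```

The first Deseret letter, U+10400, is encoded in UTF-8 as the four bytes F0 90 90 80 and decodes to 66560, the decimal value used in the symbol table from the question.
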
Thanks for the response.
I see that I asked too many questions in one posting. Sorry.

Here's Question #1 again:

In the old AT&T documentation I read

"Some FSM operations allocate internal arrays based on the maximum inte-
ger used as an input or output arc symbol. This design choice, chosen
for efficiency, requires the user to avoid huge integer labels (e.g.,
INT_MAX) since memory may otherwise be exhausted."

Is that also the case with OpenFst??
If so, it could be a major disincentive to using real code point values as labels, especially if (like myself) you work with characters encoded in the Unicode supplementary area.

Thanks,

Ken
KennethRBeesley 24 Aug 2007 - 18:18
No, it is not the case for OpenFst. The library does not allocate such arrays. CyrilAllauzen 29 Aug 2007 - 11:21
In the OpenFst - Release 1.3 news (http://www.openfst.org/twiki/bin/view/News/FstNews), it is written that "--with-icu configuration option no longer needed". Does that mean that OpenFst 1.3 can handle UTF-8 characters directly? YoMa 02 May 2012 - 07:54

ShortestDistance

Hi,

If I am not mistaken, when calculating shortest distance with reverse=true (say, in the log semiring), i.e. the backward values, the weight for states that cannot reach a final state should be LogWeight::Zero() (or +inf). However, I tried this on a machine with arcs 0->1 and 0->2, where 1 is final and 2 is not final, and for state 2 ShortestDistance with reverse=true gave me LogWeight::One() (a very small value close to 0). Am I misunderstanding something?

Thanks.


Jerr.
JerrRo 17 Aug 2007 - 13:28
You should indeed get LogWeight::Zero() for state 2. However, the size of the vector returned by ShortestDistance can be less than NumStates(), i.e., it is determined by the maximum state visited.

If a state i is such that i < distance.size(), then its shortest distance is distance[i] otherwise it is Weight::Zero().

In the example you gave, I suspect that distance.size() is 1 since 2 would likely not be visited by ShortestDistance when reverse is true.

Best,
Cyril
CyrilAllauzen 17 Aug 2007 - 16:45
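
The lookup rule above (use distance[i] when i < distance.size(), otherwise Zero) can be wrapped in a small helper. This sketch uses plain floats instead of OpenFst weight classes: kLogZero is a hypothetical stand-in for LogWeight::Zero(), taken to be +infinity as stated in the question, and DistanceOrZero is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Stand-in for LogWeight::Zero(): +infinity in the log semiring.
const float kLogZero = std::numeric_limits<float>::infinity();

// Returns the distance for `state`. States beyond the end of the vector
// were never visited, so they are treated as having weight Zero.
float DistanceOrZero(const std::vector<float>& distance, int state) {
  assert(state >= 0);
  if (static_cast<std::size_t>(state) < distance.size()) {
    return distance[state];
  }
  return kLogZero;
}
```

Reading distance[state] directly for a state past the end of the vector is undefined behavior in C++, which is one way to end up with a meaningless value instead of Zero.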

composition of FSTs

I am trying to compose (programmatically) a transducer with itself several times. The FST is a result of many unions. The composition of that FST with itself takes a long time. I am trying to figure out what could be a good way to do that efficiently. Is composition slower with many epsilon moves? Would it be recommended to first determinize the FST and remove epsilon moves before composing it with itself (the original FST has many epsilon moves, being a result of a union)? JerrRo 14 Aug 2007 - 14:55
Indeed, union introduces epsilon-transitions and epsilon-transitions will slow down composition. Non-determinism will also slow down composition. So, I would recommend you first epsilon-remove and then determinize your fst.

However, there is always a risk with determinization (it might blow up or even not terminate). Hence, if what I suggest does not work better, you should investigate using determinize only, or rmepsilon only or determinize followed by rmepsilon.

Cyril
CyrilAllauzen 14 Aug 2007 - 15:21
Thanks for your quick response. I am using a different semi-ring I defined for the composition. For some odd reason, when trying to determinize the FST I get the error:

FATAL: StringWeight::Plus: unequal arguments (non-functional FST?)

I am not sure what it means. I am guessing the mention of StringWeight
is an error (in the error :)).

Thanks.


Jerr
JerrRo 14 Aug 2007 - 15:37
The library currently only supports the determinization of functional transducers (if two successful paths have the same input label, they need to also have the same output label). The reason for that is that we use the weighted automata determinization algorithm viewing the output labels as weights in the string semiring (hence StringWeight in the error message).

A workaround is to use Encode/Decode to view the transducer as an acceptor, considering the pair (ilabel, olabel) as one symbol.

1. Encode:
EncodeMapper<Arc> encoder(kEncodeLabels, ENCODE);
Encode(fst, &encoder);

2. Determinize.

3. Decode:
Decode(fst, encoder);
CyrilAllauzen 14 Aug 2007 - 16:16
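
The idea behind the workaround, treating each (ilabel, olabel) pair as one symbol, can be illustrated without the library. The class below is a hypothetical sketch and not OpenFst's EncodeMapper: it assigns a fresh integer to every distinct pair and can map that integer back, which is what makes the decode step possible after determinizing the encoded acceptor.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Maps each distinct (ilabel, olabel) pair to a single label and back.
class PairEncoder {
 public:
  // Returns the label for the pair, assigning a new one on first use.
  // Labels start at 1 because 0 is reserved for epsilon.
  int Encode(int ilabel, int olabel) {
    const std::pair<int, int> key(ilabel, olabel);
    std::map<std::pair<int, int>, int>::const_iterator it = table_.find(key);
    if (it != table_.end()) return it->second;
    pairs_.push_back(key);
    const int label = static_cast<int>(pairs_.size());
    table_[key] = label;
    return label;
  }

  // Recovers the original pair for a label returned by Encode.
  std::pair<int, int> Decode(int label) const {
    assert(label >= 1 && static_cast<std::size_t>(label) <= pairs_.size());
    return pairs_[label - 1];
  }

 private:
  std::map<std::pair<int, int>, int> table_;
  std::vector<std::pair<int, int> > pairs_;
};
```

Decoding needs the same table that was filled during encoding, which matches the advice in this thread to decode with the same encoder that was used for encoding.
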
Is it possible that RmEpsilonFst/ComposeFst have a bug? When I tried creating an RmEpsilon FST, I got SIGABRT later when trying to compose it with another machine. This means the following doesn't work:

RmEpsilonFst<MyArc>* pRm = new RmEpsilonFst<MyArc>(*origFst);
ComposeFst<MyArc>* pCompose = new ComposeFst<MyArc>(someFst, *pRm);
(SIGABRT occurs here, probably because of a double free/delete)

while this works:

RmEpsilonFst<MyArc>* pRm = new RmEpsilonFst<MyArc>(*origFst);
ComposeFst<MyArc>* pCompose = new ComposeFst<MyArc>(someFst, *origFst);
(only the compose line changed; the RmEpsilonFst is still created, but the resulting FST is not used).

MyArc/MyWeight are debugged quite well (I use them in so many other places, and they seem to be working well), so I suspect the problem is not there.

I will try to check the FST library code, to see if anything pops out immediately, but I suspect that won't happen.

Thanks.


Jerr.
JerrRo 14 Aug 2007 - 17:08
Thanks Jerr., I will look into that. In the meantime, I recommend you use the destructive version of epsilon-removal: RmEpsilon(&A).
It is also more efficient in general.

Cyril
CyrilAllauzen 14 Aug 2007 - 20:12
Just to make sure I got it right:

2. Determinize.
Here I should output the fst to some new MutableFst (maybe it is even possible to call Determinize(myFst, &myFst) without fear?)

3. Decode:
Decode(fst, encoder);
Here I should only decode the older FST (not the new MutableFst) to restore it to its previous state. The new determinized MutableFst will be already decoded, so there is no need to change it.

Thanks.


Jerr.
JerrRo 15 Aug 2007 - 08:03
2. Both should work.
3. You want to decode the result of determinization (using the same encoder as for encoding). You want to decode the original fst only if you still need to use it. Encode and Decode take pointers to MutableFst as arguments.
CyrilAllauzen 15 Aug 2007 - 09:44

write FST in AT&T FSM format

If I've constructed an fst and can write it out in binary form using
fst.Write("binary.fst")

how can I write out the same FST in AT&T FSM format?

Thanks,

Ken
KennethRBeesley 13 Aug 2007 - 15:51
The library only supports writing fsts in binary format, since this is the "working" format that all the utilities use. The command-line utility fstprint reads a binary fst and writes out its textual (AT&T FSM format) representation.

Best,
Cyril
CyrilAllauzen 14 Aug 2007 - 11:02
In the context of a GUI/IDE built on top of OpenFst, it might well be useful to have the library support
1) writing out an fst in its textual (AT&T FSM Format) representation, and
2) drawing the fst (via DOT)

Any chance of adding such support to future versions of the library?

Thanks,

Ken
KennethRBeesley 21 Aug 2007 - 14:13
Maybe.

As of now, nothing prevents someone from simply using the FstPrinter and FstDrawer classes (from fst/bin/print-main.h and fst/bin/draw-main.h) by copy-pasting them into one's code.
CyrilAllauzen 24 Aug 2007 - 09:34

VectorFst

Hello,

I am unioning several thousand FSTs together... When I use the final UnionFst to create a VectorFst (new VectorFst<>(unioned_fst)), it takes a long time to actually create that FST and it also takes a lot of memory. Is there a specific reason why going from UnionFst to VectorFst takes so long?

Thanks.


Jerr.
JerrRo 13 Aug 2007 - 15:47
The reason is that UnionFst is a delayed fst. That means it is only built when visited, i.e., it's only when converting to VectorFst that the actual union is computed.

It will probably be more efficient to convert each intermediary UnionFst to VectorFst.

In most cases, a very efficient way will be to apply epsilon-removal and determinization after each union.

Best,
Cyril
CyrilAllauzen 14 Aug 2007 - 11:11
Thanks! JerrRo 14 Aug 2007 - 14:55

Determinizing / Minimizing

Hello,

After minimizing and determinizing a machine programmatically (using DeterminizeFst and Minimize) I get all kinds of isolated states that can never be reached. However, when I use fstminimize and fstdeterminize, this does not happen. Is there a way to get rid of these states programmatically? Right now, I am pretty much doing the same thing MinimizeMain and DeterminizeMain do, and still I get these isolated states.

Thanks.


Jerr.
JerrRo 09 Aug 2007 - 12:53
You can get rid of these states by using Connect(&mutable_fst).

However, it is not clear to me why these non-accessible states appear. I would need to know more precisely what you are doing.

Best,
Cyril
CyrilAllauzen 09 Aug 2007 - 14:38

Multicharacter Labels

The Quick Tour indicates, for symbol-table files: "You may use any string for a label"

Question: Are there any restrictions or conventions on these strings?
Question: If a string label contains more than one character, are there cases in the code where input strings must be 'tokenized' into individual labels based on the labels defined in the symbol-table files? If so, how is such tokenization performed.
Question: How have multicharacter labels been used traditionally in the AT&T/OpenFst tradition? Examples?

Thanks,

Ken
KennethRBeesley 02 Aug 2007 - 14:00
I guess these strings should not contain spaces, tabs and newlines.

The symbol table does not play any role internally. It is only used for displaying and printing. No tokenization is performed. A multicharacter string is always treated as an indivisible label in the library.

An example of multicharacter labels is the example given in the top banner of this page. More seriously, you can check out this paper for an example of multicharacter labels when building a language model.

Best,
Cyril
CyrilAllauzen 02 Aug 2007 - 14:26
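
The restriction mentioned above (no spaces, tabs or newlines in a symbol) is easy to check before writing a symbol-table file. The helper below is a hypothetical sketch in plain C++; rejecting the empty string is an added assumption, on the grounds that an empty symbol could not be told apart from a missing field in a whitespace-separated file.

```cpp
#include <cassert>
#include <string>

// Returns true if `symbol` contains no space, tab or newline characters.
// The empty string is rejected as well (an assumption, see the lead-in).
bool IsUsableSymbol(const std::string& symbol) {
  if (symbol.empty()) return false;
  return symbol.find_first_of(" \t\n") == std::string::npos;
}
```

A multicharacter label such as "New_York" passes this check and is treated as one indivisible label, while "New York" does not pass.
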
How would a literal space be represented in an AT&T FSM format file, and in a symbol-table file?
Thanks,

Ken
KennethRBeesley 23 Aug 2007 - 13:08
It cannot. You would need to replace it with another symbol.

As I was saying before, a symbol list is not required. I created fsts using ascii codes for labels. And in that case, it is simply better not to use a symbol list to create the fsts.
CyrilAllauzen 24 Aug 2007 - 09:39

changing an arc weight

Hello,

I have an FST with LogWeight arcs. I would like to access these arcs and change them programmatically. I wanted to do it using ArcIterator, but Value() returns the arc as const. Is there another way to change the weight on an arc?

thanks.

jerr
JerrRo 01 Aug 2007 - 16:02
You need to use MutableArcIterator instead of ArcIterator. You can then use the SetValue method to modify the arc pointed to by the iterator.
Cyril
CyrilAllauzen 01 Aug 2007 - 17:23

labels and integers

The OpenFst Quick Tour indicates, for symbol table files, that
"You may use any string for a label; you may use any non-negative integer for a label ID. The zero label is reserved for the epsilon label."

OK, but just to make sure, I have a few questions:
Question: Both examples use a dense range 0-3 of integers. Is this required or recommended?
Question: Could one just use Unicode code point values as the integers?
Question: Can one indicate the integer values in hex notation in the symbol-table file(s)?
Question: What is the maximum integer value? Can it handle Unicode supplementary code point values? up to 0x10FFFF?

Thanks,

Ken
KennethRBeesley 26 Jul 2007 - 00:47
You don't have to use dense range of integers.

You can indeed use Unicode values as the integers.

No, you cannot use hex notation in the symbol table file.

The maximum integer value depends on the Arc::Label type. For StdArc, Label is defined as int (signed 32 bits on most machines). Hence, 0x10FFFF should be fine. You can also define your own Arc type if you wish.

Best,
Cyril
CyrilAllauzen 29 Jul 2007 - 15:12
We use strtoll(...,10) for reading integers. I guess we didn't use strtoll(...,0) (which would allow 012 or 0x12) because people might inadvertently have leading zeroes but want base 10. I could be convinced to change this if people feel strongly about it.

-Michael
MichaelRiley 30 Jul 2007 - 19:51
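
The difference between the two strtoll bases discussed here can be seen with the standard library alone. ParseLabelId is a hypothetical wrapper, not the library's actual parsing code; it simply forwards to std::strtoll with the chosen base.

```cpp
#include <cassert>
#include <cstdlib>

// Parses a label ID from `text` using std::strtoll.
// Base 10 reads decimal digits only. Base 0 auto-detects the base:
// a "0x" prefix means hexadecimal and a leading "0" means octal.
long long ParseLabelId(const char* text, int base) {
  assert(text != nullptr);
  return std::strtoll(text, nullptr, base);
}
```

With base 10, "012" is read as 12, and "0x12" stops at the 'x' and yields 0. With base 0, "012" is read as octal 10 and "0x12" as 18. That octal reading is the leading-zero pitfall mentioned above.
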
Cyril/Michael,

Many thanks for your responses, and in general for making OpenFst available.

I (and, I assume, many others) am moving consistently toward using Unicode wherever possible for internal character representations, and in Unicode, Hex is the De Facto King. Having to use decimal for Unicode code point values is a nuisance and leads to errors. While I recognize your concern about those who might "inadvertently have leading zeroes but want base 10", I suspect that most of us are pretty well acquainted with the 12 vs. 012 vs. 0x12 distinction.

Possibilities:
1. Change to using strtoll(...,0) to allow 12, 012 and 0x12. Someone might write a trivial symbolTableLint program to scan the files and print warnings if both decimal and (apparently) octal representations are found in the same file.

2. Or better (?), provide some kind of optional header in the symbol-table file, or some other convenient user-specified flag, to indicate that strtoll(...,0) should be used instead of (the default) strtoll(...,10). That would maintain backward/legacy compatibility; and if I use 0x12 without overtly indicating that strtoll(...,0) should be used, then presumably I'd get a helpful compile-time error.

What do you think?

Ken
KennethRBeesley 02 Aug 2007 - 13:39

compiling OpenFst beta on macosx

I'm running Mac OS X 10.4.10. I downloaded OpenFst beta and edited bin/Makefile, bin/lib and bin/text, making the changes indicated for macosx. 'make all' seemed to work. But 'make test' yields:

[titania:openfst/OpenFst/fst] beesley% make test
( cd lib ; make all )
make[1]: Nothing to be done for `all'.
( cd bin ; make all )
make[1]: Nothing to be done for `all'.
(cd test ; make test )
g++ -I../.. -O2 -o weight_test.o -c weight_test.cc
g++ -Wl,-L/Users/beesley/fsimpls/openfst/OpenFst/fst/test/../lib -o weight_test weight_test.o -lfst -lm -lpthread -ldl
./weight_test # Tests Weight classes
dyld: Library not loaded: libfst.dylib
Referenced from: /Users/beesley/fsimpls/openfst/OpenFst/fst/test/./weight_test
Reason: image not found
make[1]: *** [test_weights] Trace/BPT trap
make: *** [test] Error 2


Question: what does this error indicate? and how can I fix it?
Thanks,

Ken
KennethRBeesley 26 Jul 2007 - 00:10
You need to add the paths to fst/lib, fst/bin and fst/test to your DYLD_LIBRARY_PATH environment variable:

export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/Users/beesley/fsimpls/openfst/OpenFst/fst/test:/Users/beesley/fsimpls/openfst/OpenFst/fst/bin:/Users/beesley/fsimpls/openfst/OpenFst/fst/lib
CyrilAllauzen 29 Jul 2007 - 14:48


creating string fsa via c++ code

Hi, I want to create an FSA that has only input labels. In other words, something like this:
0 1 my
1 2 name
2
but I don't know how to create that via c++ code.

Another question is how to compile a .stxt FST file via C++ code and then create a binary FST.
I would appreciate any help.
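
As a possible starting point for the first question, the text description shown above can be generated from a list of words. This is a sketch in plain C++ that only produces the text; LinearAcceptorText is a hypothetical name. Turning the text into a binary FST would still need a compiler such as fstcompile and a symbol table listing the words.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds the text description of a linear acceptor: one arc per word,
// from state i to state i + 1, with the last state marked final.
std::string LinearAcceptorText(const std::vector<std::string>& words) {
  assert(!words.empty());
  std::ostringstream text;
  for (std::size_t i = 0; i < words.size(); ++i) {
    text << i << ' ' << (i + 1) << ' ' << words[i] << '\n';  // src dst label
  }
  text << words.size() << '\n';  // final state
  return text.str();
}
```

For the words "my" and "name" this produces exactly the three lines from the question. Building the same machine directly in memory would follow the AddState/AddArc pattern shown in the first thread of this archive.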

Topic revision: r1 - 2014-04-21 - CyrilAllauzen
 
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.