OpenFst Forum 2014 Archive
--++ Using
LabelLookAheadMatcher on input fst and
PhiMatcher on output fst in
LookAheadComposeFilter
OliverW - 2014-12-10 - 16:19
Hi,
I'm trying to use the LabelLookAheadMatcher on an input fst (a lexicon transducer) and the PhiMatcher on an output fst (a custom ngram fst) in a LookAheadComposeFilter.
The setup is as follows:
// Phimatcher typedef fst::PhiMatcher<fst::SortedMatcher<fst::Fst<fst::LogArc> > > PM;
// some definitions for the lookahead matcher class
OLookAhead {
public: typedef fst::LogArc ARC;
typedef fst::Fst<ARC> FST;
typedef fst::SortedMatcher<FST> SM;
typedef fst::LabelLookAheadMatcher<SM, fst::kOutputLookAheadMatcher | fst::kLookAheadEpsilons | fst::kLookAheadWeight | fst::kLookAheadPrefix | fst::kLookAheadNonEpsilonPrefix> OLAM;
typedef fst::AltSequenceComposeFilter<OLAM, PM> ASCF;
typedef fst::LookAheadComposeFilter<ASCF, OLAM, PM, fst::MATCH_OUTPUT> LACF;
typedef fst::LabelReachableData<typename ARC::Label> LRD;
typedef fst::LabelReachable<ARC, fst::DefaultAccumulator<ARC>, LRD> Reachable;
};
// the lookahed matcher
OLookAhead::OLAM *OLAM12 = new OLookAhead::OLAM(Input_Unk_Lex_OSort, fst::MATCH_OUTPUT);
// create relabler to be used to relabel LanguageModelFST input labels
OLookAhead::Reachable RD(OLAM12->GetData());
// create phi matcher with relabeled phi symbol
PM *PM22 = new PM(LanguageModelFST, fst::MATCH_INPUT, RD.Relabel(PHI_SYMBOLID), false);
// the lookahead filter
OLookAhead::LACF *LACF = new OLookAhead::LACF(Input_Unk_Lex_OSort, LanguageModelFST, OLAM12, PM22);
// the composition options
fst::ComposeFstImplOptions<OLookAhead::OLAM, PM, OLookAhead::LACF> copts2(fst::CacheOptions(), OLAM12, PM22, LACF);
// do the composition
fst::ComposeFst<fst::LogArc> Input_Unk_Lex_LM(Input_Unk_Lex_OSort, LanguageModelFST, copts2);
The input labels of my language model fst are relabeled according to the relabler.
Unfortunately the composition result only contains one start state.
Interestingly if I leave out the relabling of the PHI symbol and the Language model input arc, I get an almost complete composition result the only part missing are the last states wich are reached by a transition with <eps> on the output arc.
Examples for the fsts can be found in here:
https://www.dropbox.com/sh/dt7jayrw8upw9dg/AACOS99CeBvnGkc2xbowQnMTa?dl=0
in_lex_lm: the composition result
in_lex: the input fst (lexicon)
lm: the output fst (relabeled ngram)
lm.orig: lm without relabeling
Relabeling.txt: the used label mapping
I would be happy if someone can help me with my problem.
Is it possible that the phi matcher dois not work with the lookahead?
I could not get my head around it but thinking about it, it seems quite complicated. How does the lookahed for the phi label behave?
Oliver
Special arcs from the command line?
Hello-
Is there any way to use phi, rho, or sigma arcs with the OpenFST command-line tools? I know it's possible using the C++ API, but it'd be lovely to be able to use them as easily as we currently use epsilon arcs with fstcompile and friends.
-SB
Error when compile openfst 1.4.1
NguyenHieu - 2014-11-23 - 23:49
I have compiled openfst 1.4.1 on ubuntu 13.10_64bit, every thing done well but when I use command: "fstinfo --help", it annouce error: "fstinfo: error while loading shared libraries: libfstscript.so.0: cannot open shared object file: No such file or directory". Anyone can help me to solve this problem
DoganCan - 2014-11-24 - 21:27
Hi Nguyen,
Did you do the "make install" step? That would explain why this library can not be found on your system even if you did not specify a custom installation prefix during configuration.
Alternatively, if you specified an installation prefix during configuration, you may need to add that path to LD_LIBRARY_PATH.
Hope these help.
Cheers,
Dogan
Minimizing Transducers
AnA - 2014-11-21 - 12:21
Is there a way to minimize a finite state transducer, s.t. the minimized transducer has the same input/output behavior as the original transducer?
Example:
0 1 1 1
0 1 0 1
1 2 0 0
1 2 1 0
2 1 1 0
2 1 0 0
(a.fsm)
For this example I would expect to get the following minimized transducer:
0 1 0 1
0 1 1 1
1 1 1 0
1 1 0 0
If I run
fstcompile < a.fsm > b.fsm
fstminimize b.fsm > c.fsm
I get an empty transducer instead.
DoganCan - 2014-11-24 - 21:10
Hi,
The reason you are getting an empty transducer is because your input transducer does not have any final states.
Looking at the minimization result you are expecting, I'm guessing your input should look like the following.
0 1 1 1
0 1 0 1
1 2 0 0
1 2 1 0
1
2 1 1 0
2 1 0 0
2
Cheers,
Dogan
Determinize. Is this a bug ?
OlivierB - 2014-10-06 - 06:50
Hi,
When I determinize this small (5 states, 7 arcs) Finite state recognizer,
the resulting finite state machine is made of
48653930 states and 72980894 arcs
and use 1.7 G of disk space.
Is this really the expected behaviour ?
If so, can someone give me an explanation for this ?
Thanks,
Olivier
<verbatim>
0 1 1 1 10
0 3 1 1 10
1 2 2 2 10
1
2 1 1 1
3 1 1 1
3 4 2 2
4 3 1 1
</verbatim>
OlivierB - 2014-10-06 - 06:59
New try, here is the input FST, with hopefully better formating :
0 1 1 1 10
<br>
0 3 1 1 10
<br>
1 2 2 2 10
<br>
1
<br>
2 1 1 1
<br>
3 1 1 1
<br>
3 4 2 2
<br>
4 3 1 1
<br>
This is indeed the expected behaviour. In theory, the algorithm should not terminate but it converges due to floating point issues. The reason for that is that your machine does not have the twins property so there is no guaranty that the algorithm terminates. You can read more about this in the references on the
determinize page. If you are really interested in the subject, you can also look at
Daniel Kirsten's paper: your machine is polynomially ambiguous and does not have the clone property, so the determinization algorithm will not terminate.
It would be very convenient (for me at least) if the OpenFst library included a boolean HasTwinsProperty(A) function that would return true if and only if FST A had that property. Before trying to determinize any FST with cycles, OlivierB and I could simply call HasTwinsProperty(A) to make sure that the algorithm would terminate. (I.e., I'm not suggesting that OpenFst itself should routinely call HasTwinsProperty(A) and store the result in the properties of each FST. I know that it's a rather expensive test to perform. Rather, it would just be available for end-users like me to call explicitly when we need to know.)
FST Permute ?
I remember using a python library sometime in 2009 (it was the AT&T version i think) which had a function to permute an FST given the window of permutation. I don't find it in OpenFst, neither can I find again the AT&T library that had this function. Does any have any pointers ?
Problems using farcompilestrings and far_reader?
EstherJudd - 2014-08-09 - 13:11
I am trying to create pronunciations for a long list of words to evaluate the g2p module. I figured the fastest way to do it is to create a FAR archive of all the words and then in c++ compose each FST in the FAR archive with the trained g2p model FST. But I am running into trouble with extracting each FST in c++. It says it finds a bad FST header <unspecified>.
Does anybody have similar code that I can derive from? Or tell me the correct flags to compile the FAR archive? Right now I use farcompilestrings --symbols=en.isyms --keep_symbols=1 test.txt test.far
The c++ code is borrowed from /usr/local/include/thrax/evaluator.h
EstherJudd - 2014-08-17 - 02:01
I got a little bit further here because I found out I forgot to include /usr/local/include/libfstfarscripts.so as a linker file (I did hav libfstfar.so but that wasn't enough. Now I am stuck with the call to the function ReassignSymbols which is defined in thrax/evaluator.h. It says it has not been declared in the scope of main(). I added #include <thrax/evaluator.h> to the .cc file but that is not enough. What is the best way in c++ to include a function defined in a header file like that?
PaulDixon - 2014-08-18 - 06:33
How about this?
https://gist.github.com/edobashira/8183f3ef2e7431345241
It composes the model FST with every FST in the input FAR file and writes the output to another FAR file.
For farcompilestrings I just used the symbols flags and the program assumes none of the FSTs have symbols tables and everything is in the tropical semiring.
I'm compiling on OSX 10.9 so you might need slightly different compiler switches for your platform
EstherJudd - 2014-08-19 - 14:08
Thanks, I was able to compile the program on Ubuntu by slightly changing the Makefile. I have one question though, how do you do it with symbol tables? The input far has only letters and the model fst goes from letters to phonemes, so the output is phonemes.
PaulDixon - 2014-08-19 - 15:22
If the symbol table in the far file and the letter symbol table of the g2p model are the same then compose won't throw an error and
everything should just work.
EstherJudd - 2014-08-19 - 15:45
The problem is that when you compile a far file, the symbol table only is added to the first FST and you have to make sure you transfer it to the rest. I have been trying to use code from thrax/evaluator.h to use a function called ReassignSymbols but haven't figured out quite how to pull that off yet.
PaulDixon - 2014-08-20 - 02:05
OK, how about just compiling the far --keep_symbols=false and compile the model --keep_isymbols=false
EstherJudd - 2014-08-20 - 03:02
I will give that a try.
EstherJudd - 2014-08-20 - 07:13
That was it! Thanks so much for your help. Now I just have to figure out how to extract the output strings into a text file but that shouldn't be too hard.
fstcompile to fstdraw
Is it possible to compile to draw a fst that has this format?
0 1 3005 0 4.71582,57,115
0 2 3783 0 0,57,115
0 3 0 1 -2.39746,71,347
...
I mean because of the weights:
Tks.
DoganCan - 2014-07-25 - 16:21
Hi Andre,
Maybe something like this?
https://github.com/dogancan/tropical_LT_tropical_LT_tropical-arc
Cheers, Dogan
Double Weighted arcs with composition lookahead filter
Hi
I am using the additional lattice4 library to provide two-weighted arcs. It works great but I am having a problem using the composition lookahead filter on WFSTs of the lattice4 arc type. I was just wondering if anyone had ever used composition filters on WFSTs with user-defined weights? Any help would be much appreciated.
PaulDixon - 2014-07-29 - 16:09
Try this (taken from matcher-fst.h)
typedef MatcherFst<ConstFst<KaldiLatticeArc>,
LabelLookAheadMatcher<SortedMatcher<ConstFst<KaldiLatticeArc> >,
olabel_lookahead_flags,
FastLogAccumulator<KaldiLatticeArc> >,
olabel_lookahead_fst_type,
LabelLookAheadRelabeler<KaldiLatticeArc> >
KaldiLatticeOLabelLookAheadFst;
And for shell tools building this into the shared object (from extensions/lookahead/olabel_lookahead-fst.cc)
compile this into the lattice4 shared object code
static FstRegisterer<KaldiLatticeOLabelLookAheadFst>
OLabelLookAheadFst_KaldiLatticeArc_registerer;
Determinize, multithreading
AlexZatv - 2014-07-22 - 16:21
Hello! I am using Determinize on VectorFst, in multi-threaded program, in OpenFst port for Windows. But, unfortunately, sometimes it fails. May be someone here have experience in multithreading in OpenFst and can help?
Deleting a specific arc in an FST
VinodhR - 2014-06-25 - 17:55
Hi,
I am iterating all the arcs of an FST.
for(StateIterator<VectorFst <StdArc>>; siter(fstnew); !siter.Done(); siter.Next()) {
for(MutableArcIterator<VectorFst <StdArc> > aiter(&fstnew,siter.Value()); !aiter.Done(); aiter.Next()) {
StdArc arc = aiter.Value();
<<delete current arc>>
}
}
I see that DeleteArcs has been overloaded with an additional parameter
n
. What does this
n
refer to ?
How do I delete the current arc that is being iterated ? or for that matter any specific arc in an FST ?
V
DoganCan - 2014-06-25 - 21:06
Hi Vinodh,
OpenFst does not provide a convenience method for deleting specific arcs from a
VectorFst
. I believe this has to do with the fact that these arcs are kept in a
std::vector
and the erase operation is inefficient for this data type, i.e. erasing an entry from a
std::vector
is linear in the size of the vector. As a workaround, you can construct a new fst, which includes only the arcs you want to keep, while iterating through the arcs of the original.
You can use
DeleteArcs
to delete all arcs at a state or the last
n
arcs added to a state.
Cheers,
Dogan
Using fstreplace after fst composition
Is it possible to use fstreplace tool in FSTs after composing them with some other FSTs?
For example, I have a grammar
G
with some symbols I want to replace later in it, let's say,
S1
and
S2
. I have also a lexicon
L
. The symbols
S1
and
S2
have two other grammar FSTs
GS1
and
GS2
.
I am trying:
fstcompose L G > LG
fstcompose L GS1 > LGS1
fstcompose L GS2 > LGS2
fstreplace LG 'label' LGS1 'label1' LGS2 'label2' > LG1
But this is getting a lot different if I do:
fstreplace G 'label' GS1 'label1' GS2 'label2' > G2
fstcompose L G2 > LG2
LG2
is correct, I can use it for my purposes, but I can't make it right the first way, any idea why?
DoganCan - 2014-06-25 - 03:37
Hi Andre,
There were a few inconsistencies in your example commands. I edited your question and tried to fix them to the best of my understanding. Please check the question and confirm that the example commands match your case.
The most likely reason why
LG1
and
LG2
differ is because
G
includes arcs labeled with
S1
and
S2
while
L
does not. When you compose
L
with
G
, the paths in
G
which contain
S1
and
S2
do not match any path in
L
. I believe you can handle these symbols by adding two simple loops to
L
, e.g. assuming
0
is both the initial state and a final state of
L
, you would add
0 0 <eps> S1
0 0 <eps> S2
These loops will match
S1
and
S2
during composition, hence preserve the paths which include these symbols in
LG
.
Cheers,
Dogan
Hey, Dogan, thanks for your answer.
Actually
L
contains the symbols
S1
and
S2
exactly like you sugested.
0 1 <eps> <eps> 0.693147182
0 2 sil <eps> 0.693147182
1 4 <eps> S1
1 5 <eps> S2
1 1 sil _silence_
1 6 k w6
1 12 d w3
1 1 o^ w1 0.693147182
1 3 o^ w1 0.693147182
1 15 g w5
1 18 t w2
1 21 t w4
1 1 #0 #0
1
2 1 #3 <eps>
3 2 sil <eps>
4 1 #1 <eps> 0.693147182
4 3 #1 <eps> 0.693147182
5 1 #2 <eps> 0.693147182
5 3 #2 <eps> 0.693147182
6 7 a <eps>
7 8 x <eps>
8 9 o^ <eps>
9 10 R <eps>
10 11 o <eps>
11 1 S <eps> 0.693147182
11 3 S <eps> 0.693147182
12 13 o^ <eps>
13 14 i <eps>
14 1 z <eps> 0.693147182
14 3 z <eps> 0.693147182
15 16 a' <eps>
16 17 t <eps>
17 1 S <eps> 0.693147182
17 3 S <eps> 0.693147182
18 19 e^ <eps>
19 20 N <eps>
20 1 o_ <eps> 0.693147182
20 3 o_ <eps> 0.693147182
21 22 r <eps>
22 23 e^ <eps>
23 1 S <eps> 0.693147182
23 3 S <eps> 0.693147182
The symbols
S1
and
S2
will still be in LG after composition because I want to replace them later, isn't it correct? But in fact
GS1
and
GS2
do not contain the symbols
S1
and
S2
. They are instead what I want to use to replace later.
G
0 1 #0 <eps> 227.955917
0 4 w1 w1
1 2 S1 S1 1.60943794
1 3 S2 S2 1.60943794
1 4 w1 w1 1.60943794
1 5 w2 w2 1.60943794
1 1.60943794
2 1 #0 <eps> 227.955917
2
3 1 #0 <eps> 227.955917
3 2 S1 S1
4 1 #0 <eps> 227.955917
4 5 w2 w2
5 1 #0 <eps> 227.955917
5 3 S2 S2
GS1
0 1 #0 <eps> 16.534359
0 2 w6 w6 0.693147182
0 3 w5 w5 0.693147182
1 2 w6 w6 1.38629436
1 3 w5 w5 1.38629436
1 0.693147182
2 1 #0 <eps> 227.955917
2
3 1 #0 <eps> 227.955917
3
GS2
0 1 #0 <eps> 16.534359
0 2 w3 w3 0.693147182
0 3 w4 w4 0.693147182
1 2 w3 w3 1.38629436
1 3 w4 w4 1.38629436
1 0.693147182
2 1 #0 <eps> 227.955917
2
3 1 #0 <eps> 227.955917
3
By the way, I use the option
epsilon_on_replace
with fstreplace.
Thanks.
DoganCan - 2014-06-25 - 20:36
Hi Andre,
The difference is due to the structure of the lexicon. There are two issues as far as I can tell.
- The loops for handling
S1
and S2
symbols are introducing extra disambiguation and silence symbols on the input side. As a result, LG1
will always include these extra symbols on its input side. On the other hand, G2
does not include S1
and S2
symbols. Hence these dummy loops are never traversed while composing L
with G2
. Make sure these dummy paths do not introduce extra symbols on the input side.
- An optional silence is allowed at the beginning. These optional silences will make their way through to
LGS1
and LGS2
and create additional silence symbols on the input side of LG1
. You might want to use a slightly different lexicon, which does not allow these extra silences at the beginning, while constructing LGS1
and LGS2
.
Cheers, Dogan
Phi-matcher in PDT
Is it possible to use specialized matchers in a PDT? Specifically, I want to create a PDT from an RTN (using pdtreplace) where some of the FSTs in the RTN have phi transitions. Is this possible with the current version of the pdt extension, and if so how?
BenSnyder - 2014-08-14 - 12:43
I've been grappling with this questions too. I've created a PhiParenMatcher class (inheriting from PhiMatcher with the <M> template specialized to be a PDT ParenMatcher). The main problem is that the phi match fails when the destination state of the phi arc doesn't have an
explicit match to the requested match_label, but does have a non-consuming match. This commonly happens in a PDT because the destination state has outgoing arcs labelled with parentheses, which are non-consuming and treated analogously to epsilons.
In fact, I think this is a more general problem with the PhiMatcher, even with a standard FST -- the phi match doesn't work properly when the destination state of the phi arc has an outgoing epsilon arc. I will update if I come up with a solution.
Encoding and composing a character-to-token lexicon transducer with UNKs
Is there a relatively standard or straightforward way to build a lexicon transducer (i.e., a character-to-token transducer) which handles unknown words, outputting an appropriate UNK symbol? The lexicon examples at
http://www.openfst.org/twiki/bin/view/FST/FstExamples are pretty straightforward, and we can extend the lexicon script to construct a trie transducer for an arbitrary corpus (and handle spaces as word delimiters). But the transducer can't rewrite input that includes unobserved words.
My intuition is that generalizing further to handle UNKs would be similar to encoding a language model with backoff transitions (Allauzen et al., 2003). But I'm confused about the exact encoding, and how to compose the resulting transducer with other transducers in a chain, using the appropriate matchers (my guess is we want either a phi or a rho matcher, but I'm not certain which).
For example, consider a system consisting of 3 transducers:
* F: A transducer which passes through all characters unchanged, or collapses sequences of repeated characters at some cost. (Note that this transducer is not deterministic, since it doesn't
require that a repeated sequence be collapsed).<br>
* G: Character-to-token lexicon transducer<br>
* H: Word-level n-gram model<br>
In theory, H(G(F(x))) would normalize input text, deleting duplicated characters as appropriate (e.g., if the character-deletion costs and language model probabilities balance properly, it might change 'pair off access' to 'pair of aces' but leave 'turn off access' unchanged).
F will naturally include arcs with epsilon outputs to delete duplicate characters. But it seems like those epsilon outputs aren't semantically the same as the epsilon inputs in H that encode LM backoffs, or likely to analogous epsilon inputs in G. Both compositions (F o G) and (F o G) o H will require handling backoff arcs. That means we need to use a phi or rho matcher, correct? Can we perfom these compositions with the command-line fstcompose tool? Or does phi/rho matching require custom C++ code?
Of course, we could avoid the problem by using a character-level language model instead (in which case we'd only need 2 transducers, and could let OpenGRM encode the LM backoffs), but I'd like to incorporate the longer distance dependencies of a word-level model.
If there are standard tools to do this that I'm missing, I'd love pointers. Thanks in advance.
DoganCan - 2014-06-19 - 22:54
Hi Aaron,
I'm not quite sure about the exact transduction you are looking for. Is the following an accurate description of what you have in mind?
- Input is a sequence of characters where word boundaries are marked with whitespace characters, i.e. there is no confusion about where one word starts and ends.
- Characters may repeat and deleting repeated characters by paying its cost is OK.
- There may be character sequences which can not be mapped to any word in the vocabulary even after character deletion. These should be mapped to UNK.
Could you give an example input and the desired output? Ideally something which covers all above cases.
Cheers,
Dogan
Dogan,
I'm specifically interested in the character-to-token (lexicon) transducer that handles unknown words. The duplicate-deletion transducer is an example of a character-level transducer I might combine with it (the transducer I'm working with currently learns character-level mutations from data, and I want to support other character transducers as well, such as something hand-coded in Thrax).
To combine any of those character transducers with a word-level language model, I need a lexicon transducer in between.
So a trivial example chain (that only handles the word 'it') might be:
1. A character-mutation transducer including i:i and t:t pass-through arcs (at 0 cost), and t:<eps> at some cost
2. A lexicon which accepts i t <space> and outputs 'it'. (sorry, I can't figure out how to include images for these transducers). If any character other than 'i' appears at the start state, it should take a failure transition to an UNK state (at some cost); similarly for any character other than 't' in the 'i' state; the UNK state would have a self-loop for any character other than <space>, and an arc for <space> back to the start, outputting <unk>.
I can create transducer 2 with explicit arcs for all characters in the vocabulary, but of course that's very large for a sizable (word) vocabulary (millions of arcs). So it's impractical to compose with the language model (And it would presumably be much larger if we handled characters outside of the English ASCII range). I know language models use failure arcs to avoid encode backoff transitions compactly. My intuition is that a lexicon with UNK support would look similar. But I haven't figured out how to do that encoding, or what matchers to use (if a specific matcher is needed) for composition and inference. I probably need to go back and read Cyril, Mehryar, and Brian's 2003 language modeling paper in more detail. But it seemed like a problem that might already have been solved.
Did that help clarify my question at all?
Thanks,
Aaron
DoganCan - 2014-06-20 - 21:54
Hi Aaron
Does the following lexicon for the vocabulary
{man, mars}
match what you have in mind? Note that
phi
matches rest of symbols but does not consume any symbols, while
rho
matches rest of symbols and consumes a symbol.
Cheers,
Dogan
DoganCan - 2014-06-21 - 07:05
Hi Aaron,
I just pushed an example setup to github at the address
https://github.com/dogancan/lexicon-unk
It includes a simple script for constructing the lexicon and a rudimentary fst composition tool with
phi
and
rho
matching on the input side of the second (right-hand) input fst.
Cheers,
Dogan
EvaHasler - 2015-08-11 - 11:27
Hi Dogan
I have a question that is related to your example from above. I am using your example to set up a rho matcher and it works fine. However, when the second lattice also contains arcs with epsilons as input symbols, it does not work: FATAL: ComposeFst: 2nd argument requires matching but cannot.
How do I fix it to be able to compose with epsilons as well?
Thanks,
Eva
Compile openfst 1.4.1 on 64-bit Ubuntu VM?
EstherJudd - 2014-06-17 - 12:26
When I try to compile openfst on Ubuntu everything seems to compile just fine. But then I try to run a program such as fstinfo and I get the following error:
fstinfo: error while loading shared libraries: libfstscript.so.1: cannot open shared object file: No such file or directory
I have tried configuring with --enable-shared on, I have tried downgrading from gcc 4.8.2 to gcc-4.4. There seems to be no difference. The libraries are installed in /usr/local/lib/fst. I have even tried adding the directory to the path in my bash login-script. What else can I do?
PaulDixon - 2014-06-17 - 16:07
Have you tried adding /usr/local/lib/ to the LD_LIBRARY_PATH environment variable?
EstherJudd - 2014-06-17 - 17:22
That helped. Dumb of me to not think of that. Now I'm just grappling with compiling thrax. I get an internal compiler error whenever I try.
how to add LD_LIBRARY_PATH environment variable
Is there a simple way to add weighted arcs from final nodes to initial node?
JihoonKim - 2014-06-09 - 00:50
I have an FST which has more than 10000 final nodes. I'd like to add weighted arcs from these final nodes to initial node. Closure operation seems not to support weighted arcs. Is there any simple way to do that?
Thanks in advance.
JihoonKim - 2014-06-09 - 04:03
To clear up, I want the arcs with epsilon input/output.
PaulDixon - 2014-06-09 - 13:32
How about just iterating over all the states and adding the additional arcs?
Untested code:
template<class Arc>
void AddFinalStartArc(MutableFst<Arc> *fst, typename Arc::Weight weight) {
typedef typename Arc::Weight Weight;
for (StateIterator<Arc> siter(*fst); !siter.Done(); siter.Next())
if (fst->Final(siter.Value()) != Weight::Zero())
fst->AddArc(siter.Value(), Arc(0, 0, weight, fst->Start());
}
Compile error OpenFST 1.4.1
Hi, I'm trying to install the latest version on Windows using Cygwin. All previous versions that I tried (last was 1.3.3) compiled without problems. In this version I get the following compile error:
/bin/sh ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./../include -std=c++0x -MT determinize.lo -MD -MP -MF .deps/determinize.Tpo -c -o determinize.lo determinize.cc
libtool: compile: g++ -DHAVE_CONFIG_H -I./../include -std=c++0x -MT determinize.lo -MD -MP -MF .deps/determinize.Tpo -c determinize.cc -DDLL_EXPORT -DPIC -o .libs/determinize.o
as: .libs/determinize.o: too many sections (45066)
/tmp/ccUSIrLz.s: Assembler messages:
/tmp/ccUSIrLz.s: Fatal error: can't write .libs/determinize.o: File too big
as: .libs/determinize.o: too many sections (45066)
/tmp/ccUSIrLz.s: Fatal error: can't close .libs/determinize.o: File too big
Makefile:390: recipe for target 'determinize.lo' failed
Any idea what has changed that it doesn't compile anymore? From what I understand there is little I can do on my side to make it work? Any help appreciated as usual.
Thanks,
Roland
PaulDixon - 2014-05-24 - 14:52
PaulDixon - 2014-05-24 - 14:59
I saw a similar error about too many segments when compiling version 1.4.1 on Visual Studio. In VS there is a switch /bigobj that worked for me but there doesn't seem to be an equivalent for Cygwin. You could try compiling without debugging information to try reduce the file size. I'm not familiar with Cygwin but would using a 64bit compiler help?
http://mingw-w64.sourceforge.net/
You also try using the VS compiled version, I put a link to the solution in the contrib section. They should run fine under Cygwin
Someone reported on the Kaldi forum that if you configure with --enable-static --disable-shared , it works.
Kaisheng Yao posted this to kaldi-users:
I found the following link
http://stackoverflow.com/questions/16596876/object-file-has-too-many-sections
It is related to using template.
Another way to resolve this is turning on optimization. In my case I use
EXTRA_CXXFLAGS += -Wno-sign-compare -O1
in src\lat\Makefile
I therefore changed the following in src\kaldi.mk and compilation was successful after this change
CXXFLAGS = -msse -msse2 -Wall -I.. -DKALDI_DOUBLEPRECISION=0 -fPIC -DHAVE_POSIX_MEMALIGN -DHAVE_CLAPACK -I ../../tools/CLAPACK/ -Wno-sign-compare -Winit-self -I ../../tools/CLAPACK/ -I $(FSTROOT)/include $(EXTRA_CXXFLAGS) -O1
Just to clarify DP's comment:
--enable-static --disabled shared does NOT work
Turning on C++ optimization DOES work.
Put CXXFLAGS="-O" at the end of the configuration command, e.g.
./configure CXXFLAGS="-O"
Multiple score on the arc...?
Hi, Is it possible to define multiple scores on the arcs? e.g. acoustic and language modeling scores at the same time. Would determinization and minimization algorithms work in this situation as is?
Thanks.
PaulDixon - 2014-04-09 - 05:09
Yes to both. This is described in the OpenFst paper
http://openfst.org/twiki/pub/FST/FstBackground/ciaa.pdf.
Kaldi also has a LatticeWeight type which will do exactly what you want. Here is minimal standalone example that will build an OpenFst shared object and enable the type in the shell tools.
https://gist.github.com/edobashira/10244848
Hi Paul, I compiled the standalone version on openfst.1.3.4. However, I am getting the following error when run the following test command. Can you help me what could be the problem?
$ fstcompile --arc_type=lattice4 test.txt
ERROR: GenericRegister::GetEntry : lookup failed in shared object: lattice4-arc.so
FATAL: No operation found for "CompileFst" on arc type lattice4
DoganCan - 2014-05-11 - 19:06
Hi Zeeshan,
Did you build the shared object
lattice4-arc.so
using Paul's gist? It seems like it is not in the working directory or in the dynamic library search path
LD_LIBRARY_PATH
(
DYLD_LIBRARY_PATH
on OS X).
Cheers,
Dogan
Hi Dogan,
Thanks for the reply. It worked fine on pure Linux system but on cygwin it's not working though the
lattice4-arc.dll
is in the working directory.
The problem is that on cygwin shared libraries for openFst are not built. Any idea on how to build openFst shared libraries on cygwin?
Thanks,
Zeeshan
DoganCan - 2014-05-13 - 18:26
Hi Zeeshan,
I have no idea how things work under cygwin. That being said, are you sure your problem is related to shared libraries not being built? I think your problem might have to do with the shared object file being named
lattice4-arc.dll
instead of
lattice4-arc.so
. Try renaming that file to
lattice4-arc.so
. If that does not work, try
lattice4-arc.so.dll
. Good luck.
Cheers, Dogan
Error installing openGRM on Mac 10.9.2
SepidehA - 2014-04-05 - 13:38
Hi all,
I have installed OpenFST successfully on my mac 10.9.2 with --enable-far.
But then when I want to configure my OpenGRM for installing I get
" checking fst/fst.h usability... no
checking fst/fst.h presence... no
checking for fst/fst.h... no
configure: error: fst/fst.h header not found"
but my fst.h is there in /usr/local/include
any ideas?
Thanks
Hi,
It seems this is due by Apple changing the default C++ standard library in the latest version of Xcode (version 5).
CXX=-stdlib=libstdc++ ./configure
should fix this.
EstherJudd - 2014-05-07 - 13:03
I have the same problem. The command above does not fix it. I had it working on a different machine previously but just started working on my iMac with 10.9.2 and the latest version of Xcode installed.
PaulDixon - 2014-05-08 - 01:15
The latest c++11 version of OpenFst (1.4.1) and OpenGrm Ngram (1.21) should just compile without any extra flags or environment variables.
EstherJudd - 2014-05-08 - 11:43
Then why when I run ./configure do I get the following messages? fat binaries are installed in /usr/local/bin and it is in my path
checking fst/fst.h usability... no
checking fst/fst.h presence... no
checking for fst/fst.h... no
configure: error: fst/fst.h header not found
Here's what worked for me on Mavericks (10.9.2)
In the extracted opengrm-ngram-1.2.1 directory:
# Replace imports of 'unordered_map' with 'tr1/unordered_map'
perl -pi -e 's#<unordered_map>#<tr1/unordered_map>#' src/include/ngram/ngram-shrink.h src/include/ngram/ngram-marginalize.h src/include/ngram/ngram-count.h
# Configure with Clang arguments
./configure --prefix=/usr/local CXX="clang++ -std=c++11 -stdlib=libstdc++"
make
sudo make install
Note: To configure OpenFST, I used:
./configure --prefix=/usr/local --enable-compact-fsts --enable-const-fsts --enable-far=yes --enable-lookahead-fsts --enable-pdt CXX="clang++ -std=c++11 -stdlib=libstdc++"
Of course, your mileage may vary. But maybe someone will find that helpful.
Aaron: My original message was about compiling OpenGrm NGram 1.1.x. With version 1.2.1, you shouldn't need to
-stdlib=libstdc++
. Have you tried compiling without it?
Esther: Could you clarify which version of OpenFst and OpenGrm NGram you were using? Which version of Xcode? Also could you look at
config.log
and see if you're seeing anything suspicious there. Also, if you've recently installed Xcode5, you might need to accept the license:
/Applications/Xcode.app/Contents/Developer/usr/bin/xcodebuild -license accept
.
Cyril: Compiling OpenFST without -stdlib=libstdc++ seems to work, but configuring OpenGRM without that flag fails.
./configure --prefix=/usr/local CXX="clang++ -std=c++11"
...
checking for fst/fst.h... no
configure: error: fst/fst.h header not found
I have the same problem when I try to configure Thrax. It shows configure: error: fst/fst.h header not found
Using the flag CXX="clang++ -std=c++11" does not fix it.
In OpenFst, if I configure with CXX="clang++ -std=c++11 -stdlib=libstdc++" as suggested by Aaron, I got a fatal error of unordered_map not found. Well I can configure and install OpenFst without this flag but I cannot configure Thrax.
Thank you very much for your help.
Below would be helpful for the above thread-
http://www.cslu.ogi.edu/~sproatr/Courses/TextNorm/tutorial.html
In short:
1. Install - openfst-1.5.0 or higher version
http://www.openfst.org/twiki/bin/view/FST/WebHome
2. Set the LD_LIBRARY_PATH=/usr/local/bin [all the library *.so by default to be installed ]
3. Clean up if already tried [ make clean ]
4. Try installing it back [make && make install]
walah Done
Intersection gives a FATAL error
Hi all,
I have two transducers with no symbol tables:
0 0 9 9 0
0 0 6 6 0
0 1 8 8 0
0 1 7 7 0
0 1 3 3 0
1 1 9 9 0
1 1 6 6 0
1 2 8 8 0
2 2 9 9 0
2 2 6 6 0
2 3 4 4 0
3 3 9 9 0
3 3 6 6 0
3 3 8 8 0
3 3 7 7 0
3 3 3 3 0
3 4 5 5 0
4 4 9 9 0
4 4 6 6 0
4 5 8 8 0
5 5 9 9 0
5 5 6 6 0
5 6 8 8 0
5 6 7 7 0
5 6 3 3 0
6 6 9 9 0
6 6 6 6 0
6 0
and
0 0 5 5 0
0 0 4 4 0
0 1 3 3 0
1 2 6 6 0
2 3 5 5 0
2 3 4 4 0
2 2 3 3 0
2 4 9 9 0
2 2 8 8 0
2 2 7 7 0
3 3 5 5 0
3 3 4 4 0
3 2 3 3 0
3 2 8 8 0
3 2 7 7 0
4 4 5 5 0
4 4 4 4 0
4 5 3 3 0
5 5 5 5 0
5 5 4 4 0
5 0
When I try to intersect them (t1 is the first transducer and t2 the second):
RmEpsilon(t1);
RmEpsilon(t2);
ArcSort(t1, OLabelCompare<StdArc>());
ArcSort(t2, ILabelCompare<StdArc>());
EncodeMapper<StdArc> encoder(kEncodeLabels,ENCODE);
EncodeFst<StdArc> enc1(*t1, &encoder);
EncodeFst<StdArc> enc2(*t2, &encoder);
IntersectFst<StdArc> intersect(enc1, enc2);
I get an error message "FATAL: ComposeFst: 1st argument cannot match on output labels and 2nd argument cannot match on input labels (sort?)."
What am I doing wrong?
PaulDixon - 2014-03-08 - 09:10
You need to sort the encoded fsts before the intersection.
It seems that using Encode instead of EncodeFst solves the problem, because then I'm able to sort the arcs before intersection:
Encode<StdArc>(t1, &encoder);
Encode<StdArc>(t2, &encoder);
ArcSort(t1, OLabelCompare<StdArc>());
ArcSort(t2, ILabelCompare<StdArc>());
IntersectFst<StdArc> intersect(*t1, *t2);
OpenFst website back up
The OpenFst website was down for a few days due to some hardware issue.
The site is back up but some things (such as the doxygen links) are not working properly yet. We're working on resolving the remaining issues.
I've built OpenFst in my home directory.
I'm aiming to compile the following toy code:
1 #include <fst/fstlib.h>
2
3 int main(void) {
4 fst::StdVectorFst fst;
5 }
So far I've run a number of commands that are variations of the following theme:
g++ main.cc -I/home/oadams/openfst-1.3.4/src/include/ -L/home/oadams/openfst-1.3.4/src/lib/.libs/libfst.so
I get various undefined reference errors. I feel I should probably have append the commands with -l
, but for the obvious variations I get errors along the lines of:
/usr/bin/ld: cannot find -lopenfst
collect2: error: ld returned 1 exit status
How can I get it to compile? Any help would be greatly appreciated.
Hello,
I've got a problem running the fstprune binary with my user defined arc type.
It works fine if I do something like:
> fstprune myfst.fst > /dev/null
but if I do
> fstprune --weight=8 myfst.fst > /dev/null
I get
ERROR: GenericRegister::GetEntry : tropical_LT_tropical.so: cannot open shared object file: No such file or directory
Segmentation fault
I believe this is due to function ConvertKeyToSoFilename() in weight-class.h:
virtual string ConvertKeyToSoFilename(const string &key) const {
return key + ".so";
}
which should probably be:
virtual string ConvertKeyToSoFilename(const string& key) const {
string legal_type(key);
ConvertToLegalCSymbol(&legal_type);
return legal_type + "-arc.so";
}
Thanks a lot for your feedback!
Federico
DoganCan - 27 Jan 2014 - 03:19
Hi Federico,
I believe, you need a shared object for your user defined weight type (see attached code) as well. Make sure the the DSO is named tropical_LT_tropical.so
and put it somewhere dlopen
can find.
Also, you will get the following error with that fstprune
call since 8
can not be parsed as a pair weight.
fstprune --weight=8 test.fst > /dev/null
FATAL: StrToWeight: Bad weight = "8", source = WeightClass, line = 0
Try this instead
fstprune --weight=8,0 test.fst > /dev/null
Cheers,
Dogan
#include <fst/arc.h>
#include <fst/script/weight-class.h>
using namespace fst;
namespace fst {
namespace script {
REGISTER_FST_WEIGHT(LexicographicWeight<TropicalWeight, TropicalWeight>);
}
}
Thanks a lot Dogan!
Actually I've got the shared object (named tropical_LT_tropical-arc.so) which is loaded correctly when I do for example fstprint. I see in the code DSOs are generally loaded adding the suffix -arc:
legal_type + "-arc.so"
It's only in src/include/fst/script/weight-class.h
that the suffix is not added. After changing function ConvertKeyToSoFilename()
in weight-class.h from:
virtual string ConvertKeyToSoFilename(const string &key) const {
return key + ".so";
}
to:
virtual string ConvertKeyToSoFilename(const string &key) const {
//return key + ".so"; //ff257: is this a bug?
string legal_type(key);
ConvertToLegalCSymbol(&legal_type);
return legal_type + "-arc.so";
}
it worked fine. Thanks for confirming that I need to add REGISTER_FST_WEIGHT. I didn't do it initially because it doesn't seem to be mentioned in the on-line documentation, but after a grep of the src for REGISTER_FST I had found it. Then yes, the last missing bit was to pass 8,0 rather than 0. I've just tried and it works perfectly!
Thanks a lot!
Federico
Apologies for the bad formatting... And I don't see an option to fix it
DoganCan - 28 Jan 2014 - 03:55
Hi Federico,
I am glad it worked out. Regarding the change you made in the source code, I don't think it is a bug. OpenFst binaries can dynamically load user defined weight types just like user defined arc types. Even though in practice these two mean more or less the same thing, technically these are two different things. In this particular case, fstprune
is looking for the user defined weight type (not the arc type) to parse the command line weight parameter. That is why you don't see the same error when you call fstprint
(or when you call fstprune
without the --weight
option) which only needs to load the user defined arc type.
I believe the intended way of handling this issue is by creating another DSO named tropical_LT_tropical.so
(in addition to tropical_LT_tropical-arc.so
) which only registers the weight type. Patching the library code is not a great idea if you plan to update the OpenFst library later when a new version comes out or if you plan to run your code on different machines -- you will have to remember patching the library -- or if you plan to distribute your code.
Cheers,
Dogan
Is there a general approach to eliminating redundant (higher-weighted) paths in cyclic WFSAs?
AlecS - 16 Jan 2014 - 05:36
Given a simple tropical semiring WFSA, defined by the expression
A* | (A / 1)* | (B /2)
Is there any general way to reduce this to the semantically equivalent WFSA
A* | (B /2)
In other words, how to remove redundant higher-weighted paths in tropical semiring WFSAs with weighted cycles?
Does anyone have a solution to this?
Thanks.
DoganCan - 18 Jan 2014 - 21:12
Hi Alec,
I don't think there is a general solution to your problem provided by the public release of OpenFst at this time. Determinization will do what you want as long as the input automaton has the twins property. The example you gave does not have this property, hence determinization will not work in this case.
Having said that, conclusion section of this paper alludes to a general disambiguation algorithm for weighted automata which does not require the twins property to hold. I believe there is a good chance that algorithm will be a part of the public release of OpenFst once it is published, but this is pure speculation on my end.
Cheers,
Dogan
AlecS - 04 Feb 2014 - 05:26
Hi Dogan,
Thanks for your input.
Yes I agree, if someone could devise a working disambiguation algorithm for weighted FSAs, that would do the trick.
Also, a determinization algorithm that would determinize all determinizable WFSAs, despite the lack of twins property in certain cases (like the example above), would suffice in my case.
Let’s hope your speculation is correct.
Thanks, Alec
PaulDixon - 07 Feb 2014 - 01:01
mifst toolkit has a determinization algorithm that tries to detect self loops with different weights.
-- Michael Riley - 2018-05-30