OpenFst Forum 2015 Archive
converting standard lattice format into and FST file
Hi i have Standard lattice format files, and i want to convert them into WFST to use them in Kaldi
each file contains more than one NULL node, which were used in htk to decrease number of arcs, now if i want to convert those files to FST
What should those null nodes be in the symbol file
Not that i am working with the specific symbol file as input symbol file and output symbol file in order to finish the process
Minimization for fsts with negative weights
--
Miikka Silfverberg - 2015-09-02
I get weird results when I try to minimize a tropical weight fst with negative weights as shown below. Additionally, the minimization takes quite long (around 20 seconds) although the fst only has 2 states.
$ cat bar.txt
0 1 1 1 -5
1 1 2 2 -1
1 0
$ fstcompile bar.txt bar.ofst
$ fstminimize bar.ofst bar.min.ofst
$ fstprint bar.min.ofst
0 1 1 1 -16777220
1 1 2 2
1 16777216
As far as I understand, this fst is already minimal so
fstminimize
shouldn't really do anything.
If I convert all weights to positive weights, the problem vanishes
$ cat baz.txt
0 1 1 1 5
1 1 2 2 1
1 0
$ fstcompile baz.txt baz.ofst
$ fstminimize baz.ofst baz.min.ofst
$ fstprint baz.min.ofst
0 1 1 1 5
1 1 2 2 1
1
I'm using the latest stable release (
http://openfst.org/twiki/pub/FST/FstDownload/openfst-1.5.0.tar.gz).
DoganCan - 2015-09-02 - 22:29
Hi Miikka,
The minimization algorithm first applies weight pushing to normalize the distribution of weights along the paths of the input automaton. Weight pushing algorithm requires shortest distance from any state to final states to be defined. The example automaton with negative weights violates this condition due to the negative weight cycle.
You can look into
Jason Eisner's WFSA minimization algorithm if you need to minimize automata with negative weights, particularly if they have negative weight cycles.
Cheers,
Dogan
Thanks for the explanation Dogan
I'll look into Eisner's minimization algorithm. Do you happen to know about freely available implementations?
DoganCan - 2015-09-03 - 22:34
Hi Miikka
I don't know if there is a publicly available implementation of Eisner's algorithm.
Cheers,
Dogan
Gentle Intro to Lazy FST Operations
Where can I find a gentle introduction to lazy FST operations? When and how to use them? Tradeoffs in size and performance?
there is teh OpenFSt library paper ref'ed under background material topic here and there is some comments on efficiency in the efficiency topic.
Wildcard arcs
Hi, sorry if this has been asked before. I am using OpenFST via the command line interface, so I want to build my FSTs as text files before compiling them. I would like to know if it's possible to add an arc that will:
- accept any as-yet unspecified symbol as input
- output the same symbol
So in my head I imagine something like:
0 1 a a [wgt_a]
0 1 * * [wgt_other]
Where the asterisks represent "any other symbol", i.e. that arc operates as a catch-all for any symbol that is not "a".
(Update 17:35 - I think what I want is a "rho" matcher. Can I specify this inside a text file? Or is there another way to do it?)
Thanks!
DoganCan - 2015-08-11 - 22:59
Hi Russell,
In OpenFst, special symbol information is not stored in the FSTs themselves, but are handled by the FST operations that accept these symbols as arguments. Unfortunately, special symbol matching operations are not available through the command line interface of standard OpenFst binaries. What you can do instead is to use a custom FST composition binary that can handle
rho
symbols.
Check out
https://github.com/dogancan/lexicon-unk. There you will find a tool called
fstphirhocompose
which allows bare bones matching of
phi
and
rho
symbols on the second FST. This tool won't do what you need out of the box since you want matched symbols to be copied to the output side. To fix that, simply change each occurrence of
MATCHER_REWRITE_NEVER
with
MATCHER_REWRITE_ALWAYS
in
fstphirhocompose.cc
before compiling the code. You use this tool exactly the same way you would use fstcompose, however you can also specify the labels corresponding to the
phi
and
rho
symbols in the second FST through the command line interface. Take a look at the
run.sh
script in that repo to see how this tool can be used to match special symbols.
Cheers,
Dogan
TopSort not working on FST
I run into a weird issue when I create a fst in C++ from a string by adding states and arcs. I have a function that does this work. When I write the FST using Write(), the final state is printed at the top. It looks like this:
5
0 1 66 66
0 2 81 81
0 3 2837 2837
1 2 10 10
1 3 749 749
1 4 930 930
2 3 66 66
2 4 324 324
2 5 864 864
3 4 8 8
3 5 146 146
4 5 30 30
fsttopsort on this fst does not sort the states either. I overcame this by just piping fstprint X | sort -k 1 -n | fstcompile
Any ideas as to why this might be happening and how it can be fixed inside the C++ code?
DoganCan - 2015-08-03 - 20:08
Hi Vivek,
It is hard to debug your problem without seeing the code. My best guess is that you are not setting state 0 as the initial state.
Cheers, Dogan
Thanks Dogan. I resolved this sometime back and should've added a comment. Your guess was right, I was setting the SetStart to str.size() instead of 0.
Generation of words accepted by a Finite-State Machine
Hello,
I would like to know how to print all the words that can be accepted for a Finite-State Machine. Note that I'm using acceptors. There is a library called Lextools who does the Job, specifically using a function of that library called lexfsmstrings
, but unfortunately it's not longer available free by the guys of AT&T.
You could ask for a very large number of n-best from fstshortestpath.
> all the words...
Provided the FSA/FST is acyclic (otherwise it would take somewhat longer), it is relatively easy to write a program in C++ that recursively iterates over all the arcs. With output size of around 10k per fst it was surprisingly fast.
KyleGorman - 2016-10-11 - 19:45
Pynini (pynin.opengrm.org) i also has something called StringPaths which enumerates all paths in an acyclic FST. If you don't want to use Python, you could build something off of the paths.h header therein.
If the FST is not acyclic, it would, of course, take not just "somewhat" longer but infinitely longer
Arcs labeled with strings
Hi again,
I have another question. I need to print using Graphviz a FST whose arc labels could be displayed as letters. I will highly appreciate a solution for this case. Thanks
Julio
I've already found a solution to this too. Thanks.
Putting only one expression on arcs
Hello,
I need to put only one expression in the arcs of my FST since I'm not<br> modelling transducers, but acceptors. <br>
I mean, for example, to put in an arc only "x" instead of "x:y". <br>
If someone has a quick solution to this, I will highly appreciate it.
Regards, <br>
Julio Carrasquel <br>
juliocarrasquel@gmail.com
I forgot to mention, I'm using in my project the command line library.
Nevermind,
I found a solution using the option --acceptor=true through the command fstdraw.
Thanks in any case.
Expected edit distance
Hi,
I was wondering if it is possible using the library to compute the expected edit distance between two distributions over strings.
So, if X and Y are FSAs that contain a single sequence each, and T is my edit distance transducer then shortestdist(X o T o Y) over the tropical SR computes the edit distance.
(a) Now assume X and Y are probabilistic FSAs (log SR), i.e. they define a valid distribution over a set of sequences, how do I compute the expected edit distance, i.e. \sum_x \sum_y P(x) P(y) ED(x,y)?
(b) And finally, given a probabilistic FST that defines a joint distribution for X and Y, how do I compute the expected edit distance over the joint, i.e. \sum_x \sum_y P(x,y) ED(x,y)?
I guess in the case (a) the expectation SR might help? But how do I do it in the case (b)?
Any help would really be appreciated.
Thanks,
Roland
DoganCan - 2015-05-11 - 19:58
Hi Roland,
There is an algorithm by Mohri for computing expected edit distance between weighted automata. See
this paper for details.
I implemented a slightly modified version of this algorithm some time ago. I just put together a simple setup based on that implementation and pushed it to GitHub in case you want to take a look at it.
https://github.com/dogancan/expected-edit-distance
Cheers,
Dogan
Amazing, thanks, I'll check it out!
Roland
Assigning the symbol table to FSTs in C++
AnuragDas - 2015-05-04 - 08:56
Hi,
I am able to build a FST from a string through C++. But the symbol table is showing "None" when i do a fstinfo. I want to add symbol table like "**Byte" which gets added automatically when we do "thraxmakedep" from a grammar file with --save_symbols set = True but through c++ only.
From weighted words to weighted phrase: how to isolate word scores at phrase level?
Such an operation must be classical in any concrete usage of FST cascade (OCR, speech, ...) but I struggle to express it
in OpenFST.
Take a simple example: the input is a character stream, made of words inter-spaced with blanks. A non-weighted and
functional lexicon FST maps sequences of letters to words. To deal with uncertainty, a weighted and non-functional
extension of it maps a sequence of letters to word/score pairs.
For handling a sequence of words, a concatenation of WFSTlex is needed, resulting in multiplying (in the weight
semiring) word scores but eventually, the contribution of each word is lost: the only score available encompasses the
entire input.
So a language model FST, further down in the cascade has no access to individual word scores, preventing it to compute a
context-dependent weighted average for example, associating a word x with a different coefficient depending on it
preceding y or z.
Compiling with mingw32
In case you want to compile openfst 1.4.1 with mingw (for cross-compiling it for windows from unix, for example). You need to use a mmap implementation such as mman-win32 and set LDFLAGS to use it.
make LDFLAGS=-lmman
Here is the patch:
diff -rupN old/src/lib/mapped-file.cc new/src/lib/mapped-file.cc
old/src/lib/mapped-file.cc 2014-04-30 00:15:18.000000000 +0200
+++ new/src/lib/mapped-file.cc 2015-02-26 08:53:46.339405863 +0100
@@ -20,6 +20,10 @@
#include <fcntl.h>
#include <unistd.h>
+#if (defined _WIN32 || defined _WIN64 || defined
WINDOWS || defined
MINGW32)
+#include <windows.h>
+#endif
+
namespace fst {
// Alignment required for mapping structures (in bytes.) Regions of memory
@@ -76,7 +80,13 @@ MappedFile* MappedFile::Map(istream* s,
size_t pos = spos;
int fd = open(source.c_str(), O_RDONLY);
if (fd = -1) {
+#if (defined _WIN32 || defined _WIN64 || defined
WINDOWS || defined
MINGW32)
+ SYSTEM_INFO system_info;
+ GetSystemInfo(&system_info);
+ int pagesize = system_info.dwPageSize;
+#else
int pagesize = sysconf(_SC_PAGESIZE);
+#endif
off_t offset = pos % pagesize;
off_t upsize = size + offset;
void *map = mmap(0, upsize, PROT_READ, MAP_SHARED, fd, pos - offset);
Is there an existing implementation to check if one node is reachable from another in an fst ?
NatS - 2015-02-24 - 12:44
KostyaK - 2015-02-14 - 17:49
Hi!
I used with etalon code to do lookahead composition:
<verbatim>
typedef fst::StdVectorFst GRAPH;
GRAPH _graph1, _graph2;
//...
fst::StdOLabelLookAheadFst graph1Look(_graph1);
std::auto_ptr<GRAPH> graph2TrueRelabel(_graph2.Copy(true));
fst::LabelLookAheadRelabeler<GRAPH::Arc>::Relabel(graph2TrueRelabel.get(), graph1Look, true);
fst::ArcSort(graph2TrueRelabel.get(), fst::StdILabelCompare());
std::auto_ptr<GRAPH> res(new GRAPH);
fst::Compose(graph1Look, *graph2TrueRelabel.get(), res.get());
</verbatim>
It's work great, but i want to do lookahead composition without PushWeightsComposeFilter for example. First, I try to repeat lookahead composition without class wrappers:
<verbatim>
typedef fst::StdVectorFst GRAPH;
GRAPH _graph1, _graph2;
//...
typedef fst::LabelLookAheadMatcher<fst::SortedMatcher<GRAPH>, fst::olabel_lookahead_flags> LOOK_MATCHER;
typedef fst::SortedMatcher<GRAPH> SORTED_MATCHER;
typedef fst::SequenceComposeFilter<LOOK_MATCHER, SORTED_MATCHER> SEQ_FILTER;
typedef fst::LookAheadComposeFilter<SEQ_FILTER, LOOK_MATCHER, SORTED_MATCHER, fst::MATCH_OUTPUT> LOOK_COMPOSE_FILTER;
typedef fst::PushWeightsComposeFilter<LOOK_COMPOSE_FILTER, LOOK_MATCHER, SORTED_MATCHER, fst::MATCH_OUTPUT> PUSH_WEIGHTS_FILTER;
typedef fst::PushLabelsComposeFilter<PUSH_WEIGHTS_FILTER, LOOK_MATCHER, SORTED_MATCHER, fst::MATCH_OUTPUT> PUSH_LABELS_FILTER;
typedef PUSH_LABELS_FILTER COMPOSE_FILTER;
// My compose options
fst::CacheOptions cacheOptions;
fst::ComposeFstImplOptions<LOOK_MATCHER, SORTED_MATCHER, COMPOSE_FILTER> composeOptions(cacheOptions);
composeOptions.matcher1 = new LOOK_MATCHER(_graph1, fst::MATCH_OUTPUT);
//Relabel _graph1 such as fst::StdOLabelLookAheadFst graph1Look(_graph1);
LOOK_MATCHER::MatcherData* matcherData = composeOptions.matcher1->GetData();
fst::LabelReachable<Arc> graph1Reachable(matcherData);
std::auto_ptr<GRAPH> graph1Relabeled(&_graph1);
graph1Reachable.Relabel(graph1Relabeled.get(), false);
graph1Relabeled->SetInputSymbols(NULL);
//Relabel _graph2 such as fst::LabelLookAheadRelabeler<GRAPH::Arc>::Relabel(graph2TrueRelabel.get(), graph1Look, true);
fst::LabelReachable<Arc> graph2Reachable(matcherData);
std::auto_ptr<GRAPH> graph2Relabeled(&_graph2);
graph2Reachable.Relabel(graph2Relabeled.get(), true);
fst::ArcSort(graph2Relabeled.get(), fst::StdILabelCompare());
composeOptions.matcher2 = new SORTED_MATCHER(*graph2Relabeled.get(), fst::MATCH_INPUT);
composeOptions.filter = new COMPOSE_FILTER(*graph1Relabeled.get(), *graph2Relabeled.get(), composeOptions.matcher1, composeOptions.matcher2);
std::auto_ptr<GRAPH> composedRes(new GRAPH);
*composedRes = fst::ComposeFst<Arc>(*graph1Relabeled.get(), *graph2Relabeled.get(), composeOptions);
</verbatim>
But result of my lookahead composition code is not equal to result of etalon code: my graph is much smaller than etalon. What I'm doing wrong? May be exists other way?
KostyaK - 2015-02-14 - 17:57
<verbatim>
test
</verbatim>
KostyaK - 2015-02-14 - 18:08
Text formatting don't work?
Etalon lookahead composition code:
http://pastebin.com/wbD2ZudV
My lookahead composition code:
http://pastebin.com/LjAUdv91
fstshortestpath --unique?
GezaKiss - 2015-02-10 - 14:50
I used this command, expecting it to produce all unique output strings - but it produced only one. The FST I tried it on contained just the two possible pronunciations of the orthographic form "record", and nothing else (here '}' separates the input and output symbols):
r}9r e}E c}k o}3r r}<epsilon> d}d
r}9r e}I c}k o}oU r}9r d}d
My questions:
1) What is the meaning of the --unique parameter?
2) How can I get the behavior I want using OpenFst (i.e. all the possible output strings)?
Thanks! Géza
KyleGorman - 2015-09-03 - 16:20
Hi Géza, I assume you've figured this out by now, but just in case: --unique simply indicates that the resulting shortest path FST will contain only distinct paths, but it doesn't specify how many distinct paths will be present. How many paths are returned is controlled by the --nshortest flag, which it defaults to 1.
(Presumably the reason --unique doesn't have the semantics you expected it to is that the machine might be cyclic, in which case termination would not be guaranteed. But any suggestions to improve the documentation would be taken under consideration.)
farcreate bug
DoganCan - 2015-01-15 - 21:36
Hi,
I just wanted to report a small bug in
farcreate
. When the
--file_list_input
option is set to
true
,
farcreate
skips the first input file on the command line due to an off by one error. Below is the relevant fix.
Cheers,
Dogan
diff -ur openfst-1.4.1.orig/src/include/fst/extensions/far/create.h openfst-1.4.1/src/include/fst/extensions/far/create.h
--- openfst-1.4.1.orig/src/include/fst/extensions/far/create.h 2014-04-25 02:55:40.000000000 -0700
+++ openfst-1.4.1/src/include/fst/extensions/far/create.h 2014-04-25 02:57:48.000000000 -0700
@@ -48,7 +48,7 @@
vector<string> inputs;
if (file_list_input) {
- for (int i = 1; i < in_fnames.size(); ++i) {
+ for (int i = 0; i < in_fnames.size(); ++i) {
ifstream istrm(in_fnames[i].c_str());
string str;
while (getline(istrm, str))
KyleGorman - 2015-09-10 - 19:49
Hi, thanks for that bug report. A fix should be in the next release.
--
Michael Riley - 2018-05-30