TWiki> FST Web>FstAdvancedUsage (revision 23)EditAttach

OpenFst Advanced Usage

Below are a variety of topics covered in greater depth or of more specialized interest than found in the Quick Tour. Reading the Quick Tour first is recommended.

Items marked with a Recent changes are changes in OpenFst-1.2 (as yet to be released).

Arc Iterators

An arc iterator doc is used to access the transitions leaving an FST state. It has the form:

template <class F>
class ArcIterator {
  typedef typename F::Arc Arc;
  typedef typename Arc::StateId StateId;

 public:
  ArcIterator(const &F fst, StateId s);  
  // End of iterator? 
  bool Done() const;  
  // Current arc (when !Done) 
  const Arc& Value() const;  
  // Advance to next arc (when !Done)  
  void Next();  
  // Return current position 
  size_t Position();  
  // Return to initial position          
  void Reset();  
  // Arc access by position              
  void Seek(size_t pos);
};

It is templated on the Fst class F to allow efficient specializations but defaults to a generic version on the abstract base Fst class.

See here for conventions that arc iterator use must respect.

All current OpenFst library Seek() methods are constant time.

An example use of an arc iterator is shown here.

A MutableArcIterator doc is similar to an ArcIterator except its constructor takes a pointer to a MutableFst and it additionally has a SetValue() method.

Arc Filters

Arc filters are accepted by various operations to control which arcs are transitioned. An arc filter has the form:

template <class Arc>
class SomeArcFilter {
public: 
  // Return true iff arc is to be transitioned. 
  bool operator()(const Arc &arc) const;
};

Pre-defined arc filters include:

Name Description
AnyArcFilter Accept all arcs doc
EpsilonArcFilter Accept only arcs with input and output epsilons doc
InputEpsilonArcFilter Accept only arcs with input epsilons doc
OutputEpsilonArcFilter Accept only arcs with output epsilons doc

Arcs

An Arc is a type that represents an FST transition from a given source state. It specifies an input label, an output label, a weight, and a destination state ID and it has a type name. In particular, it has the following form:

struct SomeArc {
   typedef W Weight;
   typedef L Label;
   typedef S StateId;

   static const string &Type();

   Label ilabel;
   Label olabel;
   Weight weight;
   StateId nextstate;
};

where W is a valid weight type, and L and S are signed integral types.

The following arc types are defined in the OpenFst library:

Name Label Type State ID Type Weight Type Registered
GallicArc<A, S> A::Label A::StateId GallicWeight<A::Label, A::Weight, S>  
LexicographicArc<W1, W2> int int LexicographicWeight<W1, W2>  
LogArc int int LogWeight DONE
MinMaxArc int int MinMaxWeight  
ProductArc<W1, W2> int int ProductWeight<W1, W2>  
StdArc int int TropicalWeight DONE
StringArc<S> int int StringWeight<int, S>  

Additional arc information:

Base Fsts

Every Fst doc must specify an initial state, the final weights, arc and epsilon counts per states, an Fst type name, the Fst's properties, how to copy, read and write the Fst, and the input and output symbol tables (if any). In particular, the base Fst class has the inteface:

template <class A>
class Fst {
 public:
  typedef A Arc;
  typedef typename A::Weight Weight;
  typedef typename A::StateId StateId;
  
  // Initial state 
  virtual StateId Start() const = 0; 
  // States's final weight 
  virtual Final(StateId) const = 0: 
  // State's arc count 
  virtual NumArcs(StateId) const = 0; 
  // States's input epsilon count 
  virtual NumInputEpsilons(StateId) const = 0; 
  // State's output epsilon count 
  virtual NumOutputEpsilons(StateId) const = 0; 
  // If test=false, return stored properties bits for mask (some poss. unknown)
  // If test=true, return property bits for mask (computing o.w. unknown)  
  virtual Properties(uint64 mask, bool test) const = 0; 
  // Fst type name  
  virtual const string& Type() const = 0; 
  // Get a copy of this Fst 
  virtual Fst<A> *Copy() const = 0; 
  // Read an Fst from an input stream; returns NULL on error 
  static Fst<A> *Read(istream &strm, const FstReadOptions &opts); 
  // Read an Fst from a file; return NULL on error
  // Empty filename reads from standard input 
  static Fst<A> *Read(const string &filename); 
  // Write an Fst to an output stream; return false on error 
  virtual bool Write(ostream &strm, const FstWriteOptions &opts); 
  // Write an Fst to a file; return false on error
  // Empty filename writes to standard output 
  virtual bool Write(const string &filename); 
  // Return input label symbol table; return NULL if not specified 
  virtual const SymbolTable* InputSymbols() const = 0; 
  // Return output label symbol table; return NULL if not specified 
  virtual const SymbolTable* OutputSymbols() const = 0;
};

Fst is an abstract class (note the pure virtual methods). All OpenFst FSTs must meet this interface.

The companion state iterator and arc iterator classes provide access to the states and transitions of the FST.

Caching

Most of the delayed Fst classes use internal caching to save expanded states and arcs. This caching is controlled by this struct:

struct CacheOptions { 
  // enable GC 
  bool gc; 
  // # of bytes allowed before GC               
  size_t gc_limit;                          

  CacheOptions(bool g, size_t l) : gc(g), gc_limit(l) {}
  CacheOptions()
      : gc(FLAGS_fst_default_cache_gc),
        gc_limit(FLAGS_fst_default_cache_gc_limit) {}
};

All OpenFst cached Fsts have constructors that accept this (or a class derived from it) as an argument. The member defaults are controlled by global flags. These options can be used for:

  • Maximal caching: If gc is false, then any expanded state will be cached for the extent of the FST. This case is useful when states are revisited and memory is not a concern.

  • Bounded caching: If gc is true, then the cache will be garbage-collected when it grows past gc_limit. This case is useful when states are revisited and memory is a concern. This is the default case (based on the global flags).

  • Minimal caching: It is generally not possible to avoid all caching in such an FST since the cache is used to implement the arc iterators efficiently (creating an iterator computes and writes the state's arcs to the cache, iterating reads from the cache). However, if (1) gc is true, (2) gc_limit is 0, and (3) arcs iterators have been created (and then destroyed) only one state at a time, then only information for that state is cached and this case is especially optimized. This case is useful when states are not revisited (e.g. when a cached FST is simply being copied to a mutable FST).

Command Line Flags

OpenFst has several global options in the library proper that most users can ignore, leaving them with their default values:

Option Type Default Description
FLAGS_fst_compat_symbols bool true Require symbol tables to match when appropriate
FLAGS_fst_default_cache_gc bool true Enable garbage collection of cached Fsts
FLAGS_fst_default_cache_gc_limit int64 1048576 Byte size that triggers garbage collection of cached Fsts
FLAGS_fst_pair_parentheses string "" Characters enclosing the first weight of a printed pair weight (and derived classes) to ensure proper I/O of nested pair weights; must have size 0 (none) or 2 (open and close parenthesis)
FLAGS_fst_pair_separator string "," Character separator between printed pair weights; must be a single character
FLAGS_fst_verify_properties bool false Verify Fst properites are correctly set when queried

The first ensures the arguments of binary FST operations (e.g. composition) have compatible symbol tables (e..g output symbol table matches input symbol table for composition). The second two are used to control the caching of expanded state and arc information found in most delayed Fst classes; the default values should normally be satisfactory. The next two are used to control the text formating of ProductWeight and other weight pairs. The last is used to ensure that the properties of an FST have been correctly set; it is used for debugging only since it incurs considerable computational cost.

In each of the Fst distribution installed binaries, the above options, as well as any of those defined specific to the binary, can be set from the command line using e.g. --fst_default_cache_gc=false or --fst_pair_parenthesis="(" . Additionally, the option --help and --v=N (where N = 0,1,2,..) will print out usage information and set the verbosity level of logging, respectively. The flag processing is modeled after the Google gflags package.

In a user-defined binary, the command line options processing will all also work if the user calls:

SetFlags(usage, &argc, &argv, true);

In that case, the user can set his own flags as well, following the conventions in <fst/flags.h>.

Alternatively, the user can process options in his own way and directly assign to any of the above global options if he wishes to modify their defaults.

Composition Filters

A composition filter determines which matches are allowed to proceed in composition. The basic filters handle correct epsilon matching. In particular, they ensure that redundant epsilon paths, which would be incorrect with non-idempotent weights, are not created. More generally, composition filters can be used to block or modify composition paths for efficiency or specialized purposes. Their interface is:

template <class A>                            
class SomeComposeFilter {
 public:
   typedef A Arc;
   typedef ... FilterState; 
  
   // Required constructor. 
   SomeComposeFilter(const Fst<A> &fst1, const Fst<A> &fst2);                       
   // Return start state of filter. 
   FilterState Start() const; 
   // Specifies current composition state. 
   void SetState(StateId s1, StateId s2, const FilterState &f);               
                                                                               
   // Apply filter at current composition state to these transitions.
   // If an arc label to be matched is kNolabel, then that side does not consume a symbol.  
   // Returns the new filter state or, if disallowed, FilterState::NoState().
   // The filter is permitted to modify its inputs, e.g. for optimizations. 
   FilterState FilterArc(A *arc1, A *arc2) const;
 
   // Apply filter at current composition state to these final weights 
   // (cf. superfinal transitions). The filter may modify its inputs,
   // e.g. for optimizations. 
   void FilterFinal(Weight *final1, Weight *final2) const;
}; 
The filter's state is represented by the type SomeComposeFilter::FilterState and is stored in the composition state table tuple. It has the form:

class SomeFilterState {                                                          
  public:                                                             
   // Required constructors 
   SomeFilterState(); 
   SomeFilterState(const SomeFilterState &f);       
   // An invalid filter state.  
   static const SomeFilterState NoState();        
   // Maps state to integer for hashing.    
   size_t Hash() const;                   
   // Equality of filter states. 
   bool operator==(const SomeFilterState &f) const;       
   // Inequality of filter states.  
   bool operator!=(const SomeFilterState &f) const;   
   // Assignment to filter states. 
   SomeFilterState& operator=(const SomeFilterState& f);
}; 

The following composition filters are defined in the OpenFst library:

Name Description
SequenceComposeFilter Requires epsilons on FST1 to be read before epsilons on FST2 doc
AltSequenceComposeFilter Requires epsilons on FST2 to be read before epsilons on FST1 doc
MatchComposeFilter Requires epsilons on FST1 to be matched with epsilons on FST2 whenever possible doc

SequenceComposeFilter is the default composition filter. It can be changed by using the version of ComposeFst that accepts ComposeFstOptions. doc [bad link?]

Expanded Fsts

An ExpandedFst doc is an Fst that has an additional method that specifies the state count as well as methods to copy and read the expanded FST. In particular, an ExpandedFst class has the interface:

template <class A>
class ExpandedFst : public Fst<class A> {
 public:
  typedef A Arc;
  typedef typename A::StateId StateId;
  
  // State count 
  StateId NumStates(); 
  // Get a copy of this ExpandedFst 
  virtual ExpandedFst<A> *Copy() const = 0; 
  // Read an ExpandedFst from an input stream; returns NULL on error 
  static ExpandedFst<A> *Read(istream &strm, const FstReadOptions &opts); 
  // Read an ExpandedFst from a file; return NULL on error
  // Empty filename reads from standard input 
  static ExpandedFst<A> *Read(const string &filename);
};

ExpandedFst is an abstract class (note the pure virtual methods). Examples are VectorFst doc and ConstFst doc

FST Input/Output

The following code:

VectorFst<A> ifst;
...
ifst.Write("a.fst");
VectorFst<A> *ofst = VectorFst<A>::Read("a.fst");

writes and reads a defined Fst type (VectorFst) and arc type (A) to and from a file in a straight-forward way.

Library Registration

The call:

Fst<Arc> *fst = Fst<A>::Read("a.fst");

reads the same VectorFst from the file as above, but returns a base Fst. This form, useful for code that works generically for different Fst types, can not work unless the Fst and arc type are library-registered. Some arc types (see here) are already registered for all the Fst types defined in the OpenFst library. Other arc type A and Fst type F pairs can be registered with the following call:

REGISTER_FST(F, A);

To avoid code bloat in a given program, registering arc types, in particular, should be used sparingly.

Main Registration

In the above examples, the user provided the arc type as a template parameter. However, the call:

$ fstinfo a.fst

works e.g. for both StdArc and LogArc arcs. This is accomplished by calling in main(argc, argv):

return CALL_FST_MAIN(InfoMain, argc, argv);

where:

template <class Arc>
int InfoMain(int argc; char **argv, istream &strm, const FstReadOptions &opts) {
    Fst<Arc> *fst = Fst<Arc>::Read(strm, opts);
    ...
    return 0;
}

is a templated `main' function that does the arc-specific work. CALL_FST_MAIN passes to InfoMain the command line arguments and an opened stream to an Fst (opened from the first argument or standard input if no arguments). CALL_FST_MAIN does the type dispatch by examining the Fst's header and then passing on the (partially-read) input stream, which can be used by InfoMain to read in the actual Fst. This dispatch works only with arc and main function pairs that have been main-registered. Each OpenFst distribution binary registers its templated main function with the arc types marked registered here. An arc type A and main function M pair can be registered with the following call:

REGISTER_FST_MAIN(M, A);

To avoid code bloat in a given program, registering arc types should be used sparingly.

To use main registration in your own program, you need to include additionally <fst/main.h> and link additionally to libfstmain.so.

FST Dynamic Shared Objects

The examples above show how users can modify programs to be able to read new arc and Fst types. However, it would not be ideal to have to do so for all the distribution binaries or other existing programs. Instead, this can be done more easily with dynamic shared objects (DSOs).

To add a new Fst type, MyFst with MyFst::Type() = "my_fst", use the code:

extern "C" {
void my_fst_init() { 
  // Register some arc types with this Fst type 
  REGISTER_FST(MyFst, StdArc);
  REGISTER_FST(MyFst, LogArc);
}
}

compiled into a dynamic shared object my_fst.so. If my_fst.so can be found in the LD_LIBRARY_PATH (or equivalent), you should be able to read the new Fst type with existing programs.

To add a new arc type, MyArc with MyArc::Type() = "my_arc", use the code:

extern "C" {
void my_arc_init() { 
  // Register some Fst types with this arc type 
  REGISTER_FST(VectorFst, MyArc);
  REGISTER_FST(ConstFst, MyArc);

  // Register the OpenFst binaries with this arc type 
  REGISTER_FST_MAINS(MyArc); 
  // Register some other main with this arc type 
  REGISTER_FST_MAIN(SomeMain, MyArc);
}
}

compiled into a dynamic shared object my_arc.so. If can be found in LD_LIBRARY_PATH (or equivalent), you should be able to read the new arc type with existing programs.

Mappers

Mappers are function objects used by the Map operation to transform arcs and/or final states. A mapper has the form:

// This determines how final weights are mapped.  
enum MapFinalAction { 
  // A final weight is mapped into a final weight. An error
  // is raised if this is not possible.  
  MAP_NO_SUPERFINAL,

  // A final weight is mapped to an arc to the superfinal state                 
  // when the result cannot be represented as a final weight.                   
  // The superfinal state will be added only if it is needed.                   
  MAP_ALLOW_SUPERFINAL,

  // A final weight is mapped to an arc to the superfinal state                 
  // unless the result can be represented as a final weight of weight           
  // Zero(). The superfinal state is always added (if the input is              
  // not the empty Fst).                                                        
  MAP_REQUIRE_SUPERFINAL
};

// This determines how symbol tables are mapped.                                
enum MapSymbolsAction { 
  // Symbols should be cleared in the result by the map.                        
  MAP_CLEAR_SYMBOLS,

  // Symbols should be copied from the input FST by the map.                    
  MAP_COPY_SYMBOLS,

  // Symbols should not be modified in the result by the map itself.            
  // (They may set by the mapper).                                              
  MAP_NOOP_SYMBOLS
};

class SomeMapper {                                                               
  public:                                                                     
   // Maps an arc type A to arc type B.                                       
   B operator()(const A &arc);                                                
   // Specifies final action the mapper requires (see above).                 
   // The mapper will be passed final weights as arcs of the                  
   // form A(0, 0, weight, kNoStateId).                                       
   MapFinalAction FinalAction() const;                                        
   // Specifies input symbol table action the mapper requires (see above).    
   MapSymbolsAction InputSymbolsAction() const;                              
   // Specifies output symbol table action the mapper requires (see above).   
   MapSymbolsAction OutputSymbolsAction() const;                              
   // This specifies the known properties of an Fst mapped by this            
   // mapper. It takes as argument the input Fst's known properties             
   uint64 Properties(uint64 props) const;                                  
};

The following mappers are defined in the OpenFst library:

Name Description
FromGallicMapper Extracts output label from gallic weight doc
IdentityMapper Maps to self doc [bad link?]
InvertWeightMapper Reciprocate all non-0 weights doc
PlusMapper Adds (⊕) a constant to all weights doc
QuantizeMapper Quantize all weights doc
ReverseWeightMapper Reverse all weights doc
RmWeightMapper Map all non-0 weights to 1 doc
SuperFinalMapper Redirects final states to new superfinal state doc
TimesMapper (Right) multiplies (⊗) a constant to all weights doc
ToGallicMapper Combines output label and weight into gallic weight doc

ToGallicMapper and FromGallicMapper are used, for example, to implement transducer determinization and minimization using weighted acceptor versions of these algorithms. Other specialized mappers are used to implement Decode, Encode, Invert, and Project.

Matchers

Matchers can find and iterate through requested labels at FST states; their principal use is in composition matching. In the simplest form, these are just a search or hash keyed on labels. More generally, they may implement matching special symbols that represent sets of labels such as ρ (rest), σ (all) or φ (fail), which can be used for more compact automata representations and faster matching.

The Matcher interface is:

// Specifies matcher action. 
enum MatchType {
   MATCH_INPUT, // Match input label.                        
   MATCH_OUTPUT, // Match output label.                           
   MATCH_NONE, // Match nothing.                             
   MATCH_UNKNOWN, // Match type unknown.                        
};

template <class F>
class SomeMatcher {  
  public:          
   typedef F FST; 
   typedef F::Arc Arc; 
   typedef typename Arc::StateId StateId;           
   typedef typename Arc::Label Label;      
   typedef typename Arc::Weight Weight;   
  
   // Required constructors.                   
   SomeMatcher(const F &fst, MatchType type); 
   SomeMatcher(const SomeMatcher &matcher); 
  
   // Returns the match type that can be provided (depending on   
   // compatibility of the input FST). It is either  
   // the requested match type, MATCH_NONE, or MATCH_UNKNOWN. 
   // If 'test' is false, a constant time test is performed, but 
   // MATCH_UNKNOWN may be returned. If 'test' is true,  
   // a definite answer is returned, but may involve more costly 
   // computation (e.g., visiting the Fst).  
   MatchType Type(bool test) const; 

   // Specifies the current state.        
   void SetState(StateId s); 
 
   // This finds matches to a label at the current state.
   // Returns true if a match found. kNoLabel matches any
   // 'non-consuming' transitions, e.g., epsilon transitions, 
   // which do not require a matching symbol.          
   bool Find(Label label);

   // These iterate through any matches found: 
   // No more matches.   
   bool Done() const; 
   // Current arc (when !Done)    
   const A& Value() const; 
   // Advance to next arc (when !Done) 
   void Next();  
  
   // This specifies the known Fst properties as viewed from this 
   // matcher. It takes as argument the input Fst's known properties.  
   uint64 Properties(uint64 props) const; 
};

The following matchers are defined in the OpenFst library:

Name Description
SortedMatcher Binary search on sorted input doc
RhoMatcher<M> ρ symbol handling; templated on underlying matcher doc
SigmaMatcher<M> σ symbol handling; templated on underlying matcher doc
PhiMatcher<M> φ symbol handling; templated on underlying matcher doc

SortedMatcher requires the underlying Fst be sorted on the appropriate side. Find(0) matches any epsilons on the underlying Fst explicitly (as if they were any other symbol) but also returns an implicit self-loop (namely Arc(kNoLabel, 0, Weight::One(), current_state) if the match_type is MATCH_INPUT and Arc(0, kNoLabel, Weight::One(), current_state) if the match_type is MATCH_OUTPUT). This loop implements a 'non-consuming' match in composition (with kNoLabel informing composition that this is the case). A composition filter determines which of these epsilon transitions are ultimately accepted.

The special symbols referenced above behave as described in this table:

  Consumes no symbol Consumes symbol
Matches all ε σ
Matches rest φ ρ

The ε symbol is assigned label 0 by convention. The numeric label of the other special symbols is determined by a constructor argument to their respective matchers.

The ρ, σ and φ matchers augment the functionality of their underlying template argument matcher. In this way, matchers can be cascaded (with special symbol precedence determined by the order).

A design choice for these matchers is whether to remove the special symbol in the result (used for the ρ, σ, and φ matchers) or return it (used for epsilon-handling). The first case is equivalent to (but more efficient than) applying special-symbol removal prior to composition (c.f., epsilon removal). This case requires that only one of the FSTs in composition contain such symbols. The second case requires well-defined semantics and that composition proper identify and handle any non-consuming symbols on each FST. (The result of Find(kNoLabel) identifies on one FST, while the matcher's returning a kNolabel loop handles the other, both described above.)

The template Matcher<F> selects the pre-designated matcher for Fst type F; it is typically SortedMatcher. Composition uses this matcher by default. It can be changed by using the version of ComposeFst that accepts ComposeFstOptions doc [bad link?]. Note two matchers (of the same C++ type but different MatchType) are used in composition -- one for each FST. Whether actual match queries are performed on one or both FSTs depends on the matcher constructor arguments, the matcher capabilities (queried by Type()) and composition itself.

An example use of a matcher is here and an example use of a ρ matcher in composition is here.

Mutable Fsts

A MutableFst doc is an ExpandedFst that has additional methods that specifiy how to set the start state, final weights, properties and the input and output symbols, how to add and delete states and arcs, as well as methods to copy and read the mutable FST. In particular, a MutableFst class has the interface:

template <class A>
class MutableFst : public ExpandedFst<class A> {
 public:
  typedef A Arc;
  typedef typename A::StateId StateId;
  typedef typename A::Weight Weight;
  
  // Set the initial state 
  virtual void SetStart(StateId) = 0; 
  // Set the initial state        
  virtual void SetFinal(StateId, Weight) = 0; 
  // Set property bits wrt mask 
  virtual void SetProperties(uint64 props, uint64 mask) = 0; 
  // Add a state, return its ID 
  virtual StateId AddState() = 0; 
  // Add an arc to state 
  virtual void AddArc(StateId, const A &arc) = 0; 
  // Delete some states 
  virtual void DeleteStates(const vector<StateId>&) = 0; 
  // Delete all states 
  virtual void DeleteStates() = 0; 
  // Delete some arcs at state    
  virtual void DeleteArcs(StateId, size_t n) = 0; 
  // Delete all arcs at state 
  virtual void DeleteArcs(StateId) = 0;          
  // Get a copy of this MutableFst 
  virtual MutableFst<A> *Copy() const = 0; 
  // Read an MutableFst from an input stream; returns NULL on error 
  static MutableFst<A> *Read(istream &strm, const FstReadOptions &opts); 
  // Read an MutableFst from a file; return NULL on error
  // Empty filename reads from standard input 
  static MutableFst<A> *Read(const string &filename); 
  // Set input label symbol table; NULL signifies not unspecified                
  virtual void SetInputSymbols(const SymbolTable* isyms) = 0; 
  // Set output label symbol table; NULL signifies not unspecified               
  virtual void SetOutputSymbols(const SymbolTable* osyms) = 0;
};

MutableFst is an abstract class (note the pure virtual methods). An example is VectorFst doc .

The companion mutable arc iterator class provides access to and modification of the transitions of the FST

Natural Orders

The natural order ≤ associated with a semiring is defined as a ≤ b iff a ⊕ b = a. In the OpenFst library, we define the strict version of this order as:

template <class W>
NaturalLess() {
  bool operator()(const W &w1, const W &w2) const {
    return (Plus(w1, w2) == w1) && w1 != w2;
  }
};

An order is left monotonic w.r.t a semring iff a ≤ b ⇒ ∀c, c ⊕ a ≤ c ⊕ b and c ⊗ a ≤ c ⊗ b; right monotonic is defined similarly. An order is negative iff 10.

The natural order is a left (right) monotonic and negative partial order iff the semiring is idempotent and left (right) distributive. It is a total order iff the semiring has the path property. See Mohri, "Semiring Framework and Algorithms for Shortest-Distance Problems", Journal of Automata, Languages and Combinatorics 7(3):321-350, 2002.

This is the default total order (under the requirements above) that we use for the shortest path and pruning algorithms. This order is the natural one to use given that it generally needs to be total, monotonic and. negative: total so that all weights can be compared, monotonic so there is a practical algorithm, and negative so that the "free" weight 1 is preferred to the "disallowed" weight 0.

Operation Options

Many FST operations have versions that accept options, especially option structures, that have not been documented in this Wiki for brevity other than to mention some of the parameters that can be changed. For example, most of the delayed Fsts have constructors that accept options that control caching behavior.

Here is an example that selects minimal caching and the rho matcher (for fst2 ρ's) in composition::

typedef RhoMatcher< SortedMatcher<StdFst> > RM;

ComposeFstOptions<StdArc, RM> opts;
opts.gc_limit = 0;
opts.matcher1 = new RM(fst1, MATCH_NONE, kNoLabel);
opts.matcher2 = new RM(fst2, MATCH_INPUT, SomeRhoLabel);

StdComposeFst cfst(fst1, fst2, opts);

Follow the links to the code under each operation's documentation for the specific details.

Properties

Each Fst has associated with it a set of stored properties that assert facts about it. These are queried in an FST with the Properties() method and set in a MutableFst with the SetProperties() method. OpenFst library operations use these properties to optimize their performance. OpenFst library operations and mutable FSTs attempt to preserve as much property information in their results as possible without significant added computation.

Some properties are binary - they are either true or false. For each such property, there is a single stored bit that is set if true and not set if false. The binary Fst properties are:

Name Description
kExpanded Is an ExpandedFst
kMutable Is a MutableFst

Other properties are trinary - they are either true, false or unknown. For each such property, there are two stored bits; one is set if true, the other is set if false and neither is set if unknown.

Type Name Description
Acceptor kAcceptor Input and output label are equal for each arc
  kNotAcceptor Input and output label are not equal for some arc
Accessible kAccessible All states reachable from the initial state
  kNotAccessible Not all states reachable from the initial state
  kCoAccessible All states can reach a final state
  kNotCoAccessible Not all states can reach a final state
Cyclic kCyclic Has cycles
  kAcyclic Has no cycles
  kInitialCyclic Has cycles containing the initial state
  KInitialAcyclic Has no cycles containing the initial state
Deterministic kIDeterministic Input labels are unique leaving each state
  kNonIDeterministic Input labels are not unique leaving some state
  kODeterministic Output labels are unique leaving each state
  kNonODeterministic Output labels are not unique leaving some state
Epsilons kEpsilons Has input/output epsilons
  KNoEpsilons Has no input/output epsilons
  kIEpsilons Has input epsilons
  KNoIEpsilons Has no input epsilons
  kOEpsilons Has output epsilons
  KNoOEpsilons Has no output epsilons
Sorted kILabelSorted Input labels sorted for each state
  kNotILabelSorted Input labels not sorted for each state
  kOLabelSorted Output labels sorted for each state
  kNotOLabelSorted Output labels not sorted for each state
  kTopSorted States topologically sorted
  kNotTopSorted States not topologically sorted
Weighted kWeighted Non-trivial arc or final weights
  kNotWeighted Only trivial arc and final weights

The call fst.Properties(mask, false) returns the stored property bits set in the mask bits; some properties may be unknown. it is a constant-time operation. The call fst.Properties(mask, true) returns the stored property bits set in the mask bits after computing and updating any of those set in the mask that are unknown. It is a linear-time (O(V + E)) operation if any of the requested bits were unknown.

State Iterators

A state iterator doc is used to access the states of an FST. It has the form:

template <class F>
class StateIterator {
  typedef typename F::Arc Arc;
  typedef typename Arc::StateId StateId;

 public:
  StateIterator(const &F fst); 
  // End of iterator? 
  bool Done() const; 
  // Current state ID (when !Done)  
  StateId Value() const; 
  // Advance to next state (when !Done) 
  void Next(); 
  // Return to initial position           
  void Reset();
};              

It is templated on the Fst class F to allow efficient specializations but defaults to a generic version on the abstract base Fst class.

See here for conventions that state iterator use must respect.

An example use of a state iterator is shown here.

State Queues

State queues are used by, among others, the shortest path and shortest distance algorithms and by the Visit operation. A state queue has the form:

template <class StateId>                                                      
class SomeQueue {
 public: 
   // Ctr: may need args (e.g., Fst, comparator) for some queues 
   SomeQueue(...); 
   // Returns the head of the queue 
   StateId Head() const; 
   // Inserts a state 
   void Enqueue(StateId s); 
   // Removes the head of the queue  
   void Dequeue(); 
   // Updates ordering of state s when weight changes, if necessary  
   void Update(StateId s); 
   // Does the queue contain no elements? 
   bool Empty() const; 
   // Remove all states from queue  
   void Clear();
};

Pre-defined state queues include:

Queue Description
AutoQueue Automatically-selected from Fst properties doc
FifoQueue First-In, first-Out doc
LifoQueue Last-In, first-Out doc
SccQueue Component graph top-ordered meta-queue doc
ShortestFirstQueue Priority (least weight) doc
StateOrderQueue State-ID ordered doc
TopOrderQueue Topologically ordered doc

Some queues accept arc filters to control which transitions are explored.

State Tables

State tables determine the bijective mapping between state tuples (e.g. in composition triples of two FST states and a composition filter state) and their corresponding state IDs. They are classes, templated on state tuples, of the form:

template <class T>                                                          
class SomeStateTable { 
   typedef typename T StateTuple;

   // Required constructors.                                                  
   SomeStateTable(); 
   // Lookup state ID by tuple. If it doesn't exist, then add it.             
   StateId FindState(const StateTuple &);   
   // Lookup state tuple by state ID.     
   const StateTuple<StateId> &Tuple(StateId) const;    
 };

A state tuple has the form:

template <class S>
struct SomeStateTuple {
   typedef typename S StateId; 

   // Required constructor. 
   SomeStateTuple(); 
   // Data 
   ...                    
};

A specific state tuple is a ComposeStateTuple doc [bad link?] that has data members StateId state_id1, StateId state_id2, and FilterState filter_state.

The following state tables are defined in the OpenFst library:

Name Description
HashStateTable Hash map implementation doc
CompactHashStateTable Hash set implementation doc
VectorStateTable Vector implementation doc
VectorHashStateTable Vector and hash set implementation doc
ErasableStateTable Deque implementation - permits erasures doc

Different state tables provide different time and space tradeoffs for applications.

Composition state tables are defined using state tables with ComposeStateTuple. They are the principal data structure used by composition other than the result cache.

The following composition state tables are defined in the OpenFst library:

Name State Table Description
GenericComposeStateTable CompactHashStateTable General-purpose choice doc
ProductComposeStateTable VectorStateTable Efficient when the composition state space is densely populated doc
StringDetComposeStateTable VectorStateTable Efficient when FST1 is a string and FST2 is deterministic doc
DetStringComposeStateTable VectorStateTable Efficient when FST1 is deterministic and FST2 is a string doc
EraseableComposeStateTable ErasableStateTable Allows composition state tuple erasure doc

GenericComposeStateTable is the default composition state table. It can be changed by using the version of ComposeFst that accepts ComposeFstOptions. doc [bad link?]

Symbol Tables

Symbol tables store the bijective mapping between textual labels, used in reading and printing an FST textual file, and their integer assignment, used in the FST's internal representation. Symbol tables are usually read in with fstcompile, can be stored by the FST, and used to print out the FST with fstprint,. Here are some examples of manipulating symbol tables directly:

// Various ways to reading symbol tables 
StdFst *fst = StdFst::Read("some.fst");
SymbolTable *isyms = fst->InputSymbolTable();
SymbolTable *osyms = fst->OutputSymbolTable();

SymbolTable *syms = SymbolTable::ReadText("some.syms");

// Adding and accessing symbols and keys 
syms->AddSymbol("kumquat", 7);
int64 key = syms->Find("kumquat");
string symbol = syms->Find(7);

// Various ways of writing symbol tables 
fst->SetInputSymbols(isyms);
fst->SetOutputSymbols(osyms);
fst->Write("some.fst"):

syms->WriteText("some.syms");

User-defined Fst Arcs and Weights

A user may define his own weight type so long as it meets the necessary requirements.

A user may define his own arc type so long as has the right form. Some Fst I/O with new arc types requires registration.

User-defined Fst Classes Work in progress, under construction

TBA

Visitors

The simplest way to traverse an FST is in state order using a state iterator.

A very general traversal method is to use:

Visit(fst, visitor, queue); doc [bad link?]

where the visitor object specfies the actions taken in the traversal while the state queue object specifies the traversal order. A visitor has the form:

// Visitor Interface - class determines actions taken during a visit.           
// If any of the boolean member functions return false, the visit is            
// aborted by first calling FinishState() on all unfinished (grey)              
// states and then calling FinishVisit().                                 
                                          
template <class Arc>
class SomeVisitor {
public:
   typedef typename Arc::StateId StateId;

   SomeVisitor(T *return_data); 
   // Invoked before visit 
   void InitVisit(const Fst &fst); 
   // Invoked when state discovered (2nd arg is visitation root) 
   bool InitState(StateId s, StateId root); 
   // Invoked when arc to white/undiscovered state examined 
   bool WhiteArc(StateId s, const Arc &a); 
   // Invoked when arc to grey/unfinished state examined 
   bool GreyArc(StateId s, const Arc &a); 
   // Invoked when arc to black/finished state examined 
   bool BlackArc(StateId s, const Arc &a); 
   // Invoked when state finished 
   void FinishState(StateId s); 
   // Invoked after visit 
   void FinishVisit(); 
};

While a depth-first search can be implemented using Visit() with the LifoQueue(), it is often better to use the more specialized DFSVisit()doc [bad link?] in <fst/dfs-visit.h> since it is somewhat more space-efficient and the specialized visitor interface described there has additional funcitionality for a DFS.

Pre-defined FST visitors include:

Visitor Type Description
CopyVisitor Visit Copies in a queue-specified order doc
SccVisitor DfsVisit Finds strongly-connected components, accessibility and coaccessibility doc
TopOrderVisitor DfsVisit Finds topological order doc

The visit operations optionally accept arc filters to control which transitions are explored.

Weights

A Weight is a type that is used to represent the cost of taking transitions in an FST.

The following basic weight templates are defined in the OpenFst library:

Semiring Name Set
(Plus)

(Times)
0
(Zero)
1
(One)
Notes
Lexicographic LexicographicWeight<W1, W2> W1 X W2 min W1 X ⊗W2 (0W1,0W2) (1W1,1W2) min: lexicographic order w.r.t.
W1 and W2 natural orders
doc
Log LogWeightTpl<T> [-∞, ∞] -log(e-x + e-y) + 0 T: floating point doc
MinMax MinMaxWeightTpl<T> [-∞, ∞] min max -∞ T: floating point doc
Product ProductWeight<W1, W2> W1 X W2 W1 X ⊕W2 W1 X ⊗W2 (0W1,0W2) (1W1,1W2)   doc
String StringWeight<L, STRING_LEFT> L* ∪ {∞} longest com. prefix ε L: signed integral doc
  StringWeight<L, STRING_RIGHT> L* ∪ {∞} longest com. suffix ε L: signed integral doc
Tropical TropicalWeightTpl<T> [-∞, ∞] min + 0 T: floating point doc

The following weight types have been defined in the OpenFst library in terms of the above:

Name Type
GallicWeight<L, W, S> ProductWeight<StringWeight<L, S>, W>
LogWeight LogWeightTpl<float>
MinMaxWeight MinMaxWeightTpl<float>
TropicalWeight TropicalWeightTpl<float>

Weight pairs, such as ProductWeight and LexicographicWeight, can use command line flags to control their textual formatting. FLAGS_fst_pair_weight_separator is printed between the weights (default: ","). FLAGS_fst_pair_parentheses (default: "") brackets the pair; if you create complex nested pairs (i.e., tuples), they may need to be printed with non-empty brackets (e.g. "()") to ensure correct parsing if read back in. These affect only textual (not binary) I/O.

Additional weight information:

-- MichaelRiley - 27 Feb 2009

Edit | Attach | Watch | Print version | History: r73 | r25 < r24 < r23 < r22 | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r23 - 2010-07-30 - MichaelRiley
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback