TWiki> FST Web>FstEfficiency (revision 3)EditAttach


By reading the Quick Tour and working through the examples, users typically can create correct implementations of applications. What requires more experience is creating efficient implementations.

Efficient Representations

  • Fst Type: The specific FstType can affect both time and space in an application.
    • Mutable: VectorFst is recommended for general mutations, while EditFst can be used for small changes to an existing FST.
    • Immutable: ConstFst uses similar space to VectorFst but is much faster to read and write and possibly to access since it has less fragmentation. The CompactFst types permit smaller memory and file representations of acceptors, of unweighted acceptors and transducers and weighted and unweighted strings. These types admit optional memory-mapping (coming soon).
    • Specialized: For some applications, specialized FST types can be created that are tailored to the structure of the automaton. One relatively simple way to do this is to extend the CompactFst types by defining a new compactor. A more general, but complex, way is to define an entirely new FST type. For example, NGramFst (coming soon) can be used to very compactly represent n-gram language models. This type admits memory-mapping.
  • Arc Type: A particularly simple way to improve the representation size is to use an Arc that has StateId, Label, and Weight types appropriately sized to a problem.
  • Interface Type: When creating state or arc iterators, the specific interface type can greatly affect efficiency. Using the most specific Fst type as the template argument of the iterators will access specializations that avoid virtual function and other overhead of more generic calls. E.g., prefer ArcIterator<VectorFst<Arc>>(fst, s) to ArcIterator<Fst<Arc>>(fst, s), ArcIterator<ExpandedFst<Arc>>(fst, s), or ArcIterator<MutableFst<Arc>>(fst, s) if performance is critical.

Efficient Algorithms

General Issues

  • Association Order: Although an operation may be associative, efficiency may greatly vary depending on the order of application. For example, prefer (A ∪ B) ∪ C to A ∪ (B ∪ C) if A and B are small and C is large. Similar considerations apply to concatenation and intersection/composition.
  • Optimization: What optimizations should be applied is very problem-dependent. Epsilon removal, determinization and minimization of an unweighted automaton will give in the smallest, irredundant result. However this could be exponentially larger than the (non-deterministic) input, so applying some or none of these operations may be preferred. In the weighted and/or transducer case, the input may not even be determinizable although it can be determinized and minimized as an appropriately encoded automaton.
  • Eager vs Delayed Computation: If only a small portion of the result is visited (e.g., of a composition), then prefer delayed computation (e.g., invoke ComposeFst). On the other hand, if much of the result is visited, then prefer the eager version (e.g., invoke Compose) due to lower overhead.
  • Caching: Various operations such as the delayed FST operations and the CompactFst classes may use caching to avoid recomputation on repeated access. The size of the cache and whether (LRU) garbage collection is performed is controlled by options. In some cases, caching can also be disabled per state by arc iterator flags.
  • Properties: An FST maintains stored property bits such as whether it is an acceptor, acyclic, or input-label sorted. Many algorithms check for various of these properties, which will be computed in linear time if they are not already stored. This linear pass can thus be avoided if the needed bits are already set (e.g., by the user).

Algorithm-Specific Issues

  • Composition:
    • Matchers:
    • Filters:
    • State Tables:
  • Epsilon Removal:
  • Shortest Path/Distance: The shortest path/distance algorithms use a queue discipline to determine the order of expansion of states:
    • LifoQueue: when the input is unweighted
    • StateOrderQueue: when the input is topologically-ordered
    • TopOrderQueue: when the input is acyclic, but not topologically-ordered
    • ShortestFirstQueue: when the input is cyclic, but has no negative weights
    • FifoQueue: otherwise
    • Pointing hand The queue is selected based on the property bits. This choice and possible testing can be overridden by passing the queue as an option. With the ShortestFirstQueue, the shortest path algorithm can be passed the first_path=true option to stop when the first solution is discovered (correct if the weights are between 0 and 1). This is a case where delayed computation of the FST being searched may be preferred.

-- MichaelRiley - 12 May 2012

Edit | Attach | Watch | Print version | History: r8 | r5 < r4 < r3 < r2 | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r3 - 2012-05-13 - MichaelRiley
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback