---+ NGramMarginal

---++ Description

(Available in versions 1.1.0 and higher.) %ICON{wip}% (UNDER CONSTRUCTION)

This operation _re-estimates_ smoothed n-gram models by imposing marginalization constraints similar to those used for Kneser-Ney modeling on Absolute Discounting models. Specifically, the algorithm modifies lower-order distributions so that the expected frequencies of lower-order n-grams within the model are equal to the smoothed relative frequency estimates of the baseline smoothing method. Unlike Kneser-Ney, this algorithm may require multiple iterations to converge, due to changes in the state probabilities.

---++ Usage

|<verbatim>
ngrammarginalize [--opts] [in.mod [out.mod]]
  --iterations: type = int, default = 1, number of iterations of steady state probability calculation
  --max_bo_updates: type = int, default = 10, maximum within iteration updates to backoff weights
  --output_each_iteration: type = bool, default = false, whether to output a model after each iteration in addition to final model
  --steady_state_file: type = string, default = "", name of separate file to derive steady state probabilities
</verbatim> | |
|<verbatim>
class NGramMarginal(StdMutableFst *model);
</verbatim>| |

---++ Examples

<verbatim>
ngrammarginalize --iterations=5 earnest.mod >earnest.marg.mod
</verbatim>

---

<verbatim>
int total_iterations = 5;
vector<double> weights;  // passed to every iteration's MarginalizeNGramModel call
for (int iteration = 1; iteration <= total_iterations; ++iteration) {
  // The input model is re-read from disk at the start of each iteration.
  StdMutableFst *model = StdMutableFst::Read("in.mod", true);
  NGramMarginal ngrammarg(model);
  ngrammarg.MarginalizeNGramModel(&weights, iteration, total_iterations);
  // Only the model from the final iteration is written out.
  if (iteration == total_iterations)
    ngrammarg.GetFst().Write("out.mod");
  delete model;
}
</verbatim>

---++ Caveats

Note that this method assumes that the baseline smoothed model provides smoothed relative frequency estimates for all n-grams in the model. Thus the method is not generally applicable to models trained using Kneser-Ney smoothing, since the lower-order n-gram weights resulting from that method do not represent relative frequency estimates. See the reference below for further information on the algorithm.

---++ References

B. Roark, C. Allauzen and M. Riley. "Smoothed marginal distribution constraints for language modeling". To appear in _Proceedings of the Association for Computational Linguistics (ACL)_, August 2013, Sofia, Bulgaria. (A link to this paper will be provided as soon as the ACL posts the final version in the anthology.)
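
---++ Illustrative sketch: steady-state probabilities

The Description and the --iterations option both refer to steady-state probabilities of the model's states. As a rough illustration of that notion only, and not of how the library computes it, the following sketch runs power iteration on a toy bigram model whose history state is simply the previous word. The names =Matrix=, =Step= and =SteadyState= are hypothetical and are not part of the NGram library API; backoff arcs and FSTs are deliberately left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy representation (not the library's): trans[h][w] = P(w | h),
// where each row sums to 1 and the history state h is the previous word.
using Matrix = std::vector<std::vector<double>>;

// One update of the state distribution: next(w) = sum_h pi(h) * P(w | h).
std::vector<double> Step(const Matrix &trans, const std::vector<double> &pi) {
  assert(pi.size() == trans.size());
  std::vector<double> next(pi.size(), 0.0);
  for (std::size_t h = 0; h < trans.size(); ++h) {
    assert(trans[h].size() == pi.size());
    for (std::size_t w = 0; w < trans[h].size(); ++w) {
      next[w] += pi[h] * trans[h][w];
    }
  }
  return next;
}

// Power iteration from a uniform start. Stops after max_iterations updates,
// or earlier once the L1 change between successive estimates drops below
// tolerance.
std::vector<double> SteadyState(const Matrix &trans, int max_iterations,
                                double tolerance) {
  assert(!trans.empty());
  std::vector<double> pi(trans.size(),
                         1.0 / static_cast<double>(trans.size()));
  for (int i = 0; i < max_iterations; ++i) {
    const std::vector<double> next = Step(trans, pi);
    double change = 0.0;
    for (std::size_t s = 0; s < pi.size(); ++s) {
      change += std::fabs(next[s] - pi[s]);
    }
    pi = next;
    if (change < tolerance) break;
  }
  return pi;
}
```

For the two-word chain with rows (0.9, 0.1) and (0.5, 0.5), calling =SteadyState(trans, 1000, 1e-12)= approaches the distribution (5/6, 1/6), and applying =Step= to that result leaves it unchanged. In this toy setting the steady-state distribution is also the marginal probability of each word under the model, which is the kind of quantity the marginalization constraints are stated in terms of.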
Topic revision: r1 - 2013-07-18 - BrianRoark
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.