NGramMarginal (2013-08-07, BrianRoark)
---+ NGramMarginal

---++ Description

(Available in versions 1.1.0 and higher.)

This operation _re-estimates_ smoothed n-gram models by imposing marginalization constraints similar to those used for Kneser-Ney modeling on Absolute Discounting models. Specifically, the algorithm modifies lower-order distributions so that the expected frequencies of lower-order n-grams within the model are equal to the smoothed relative frequency estimates of the baseline smoothing method. Unlike Kneser-Ney, this algorithm may require multiple iterations to converge, due to changes in the state probabilities.

---++ Usage

|<verbatim>
ngrammarginalize [--opts] [in.mod [out.mod]]
  --iterations: type = int, default = 1, number of iterations of steady state probability calculation
  --max_bo_updates: type = int, default = 10, maximum within iteration updates to backoff weights
  --output_each_iteration: type = bool, default = false, whether to output a model after each iteration in addition to final model
  --steady_state_file: type = string, default = "", name of separate file to derive steady state probabilities
</verbatim> | |
|<verbatim>
class NGramMarginal(StdMutableFst *model);
</verbatim>| |

---++ Examples

<verbatim>
ngrammarginalize --iterations=5 earnest.mod >earnest.marg.mod
</verbatim>

---

<verbatim>
int total_iterations = 5;
// The same weights vector is passed to every iteration.
vector<double> weights;
for (int iteration = 1; iteration <= total_iterations; ++iteration) {
  // The input model is read again at the start of each iteration.
  StdMutableFst *model = StdMutableFst::Read("in.mod", true);
  NGramMarginal ngrammarg(model);
  ngrammarg.MarginalizeNGramModel(&weights, iteration, total_iterations);
  // Only the model from the final iteration is written out.
  if (iteration == total_iterations)
    ngrammarg.GetFst().Write("out.mod");
  delete model;
}
</verbatim>

---++ Caveats

Note that this method assumes that the baseline smoothed model provides smoothed relative frequency estimates for all n-grams in the model. Thus the method is not generally applicable to models trained using Kneser-Ney smoothing, since the lower-order n-gram weights produced by that method do not represent relative frequency estimates. See the reference below for further information on the algorithm.

---++ References

B. Roark, C. Allauzen and M. Riley. 2013. "[[http://www.aclweb.org/anthology-new/P/P13/P13-1005.pdf][Smoothed marginal distribution constraints for language modeling]]". In _Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)_, pp. 43-52. The !BibTex entry is [[http://www.aclweb.org/anthology-new/P/P13/P13-1005.bib][here]].
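
As background for the "steady state probability" wording in the =--iterations= flag, the following is a minimal, self-contained sketch of how steady-state probabilities can be approximated by repeated updates on a toy bigram model, where each state is simply the previous word. It is not OpenGrm code and does not reproduce the library's algorithm: the names =Matrix=, =Step= and =SteadyState=, the uniform starting point, and the tolerance are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy bigram model (illustrative only): P[h][w] = P(w | h).
// Each row is assumed to sum to 1.
using Matrix = std::vector<std::vector<double>>;

// One update of the state distribution:
//   next[w] = sum over h of pi[h] * P[h][w].
std::vector<double> Step(const Matrix &P, const std::vector<double> &pi) {
  std::vector<double> next(pi.size(), 0.0);
  for (std::size_t h = 0; h < P.size(); ++h) {
    for (std::size_t w = 0; w < P[h].size(); ++w) {
      next[w] += pi[h] * P[h][w];
    }
  }
  return next;
}

// Repeats Step from a uniform start for at most max_iterations,
// stopping early once the largest per-state change is below tolerance.
std::vector<double> SteadyState(const Matrix &P, int max_iterations,
                                double tolerance = 1e-12) {
  std::vector<double> pi(P.size(), 1.0 / static_cast<double>(P.size()));
  for (int i = 0; i < max_iterations; ++i) {
    std::vector<double> next = Step(P, pi);
    double delta = 0.0;
    for (std::size_t k = 0; k < pi.size(); ++k) {
      delta = std::max(delta, std::fabs(next[k] - pi[k]));
    }
    pi = next;
    if (delta < tolerance) break;
  }
  return pi;
}
```

In this toy setting, the steady-state vector is also the expected relative frequency of each word under the model, which is the kind of quantity the marginalization constraints are stated in terms of. With a single update the result can still be far from the fixed point, which gives some intuition for why more than one iteration may be needed.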
Topic revision: r3 - 2013-08-07 - BrianRoark