---+ NGramRead

---++ Description

=ngramread= builds an n-gram automaton from a text file. It has flags for specifying the format of the text input, currently one of two options:
   * By default, or with the flag =--counts=, the text file is read as a sorted list of n-grams with their counts. The format is:<br> _w<sub>1</sub> ... w<sub>k</sub> cnt_ <br> where _w<sub>1</sub> ... w<sub>k</sub>_ are the _k_ words of the n-gram and _cnt_ is the (float) count of that n-gram. The n-grams in the list must be lexicographically ordered. An n-gram count automaton is built from the input.
   * With the flag =--ARPA=, the file is read as an n-gram model in the well-known ARPA format. An n-gram model automaton is built from the input.

By default, =ngramread= constructs a symbol table on the fly, consisting of =&lt;epsilon&gt;= and every observed symbol in the text. With the flag =--symbols=filename= you can instead supply a fixed symbol table in the standard _OpenFst_ format. All symbols in the input text that are not found in the provided symbol table are mapped to an OOV symbol, which is =&lt;unk&gt;= by default. If the OOV symbol in the provided symbol table is not =&lt;unk&gt;=, specify it with the flag =--OOV_symbol=.

The tokens =&lt;s&gt;= and =&lt;/s&gt;= are taken to represent start-of-sequence and end-of-sequence, respectively. Neither of these symbols is used in our automaton format (see NGramQuickTour).

---++ Usage

---++ Examples

---++ Caveats

-- Main.MichaelRiley - 09 Dec 2011
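The counts input format and the symbol-table rules in the Description can be made concrete with a short Python sketch. It is not the =ngramread= implementation (which builds an _OpenFst_ automaton); it only parses lines of the form _w<sub>1</sub> ... w<sub>k</sub> cnt_ into labelled n-grams. The names =read_counts= and =NGramCount= are hypothetical. Several details are assumptions: "lexicographically ordered" is taken to mean word-by-word string comparison; =&lt;epsilon&gt;= gets label 0, as is conventional in _OpenFst_; blank lines are skipped; and, following the statement that =&lt;s&gt;= and =&lt;/s&gt;= are not used in the automaton format, they receive no label and are recorded as boundary flags instead.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical names throughout; this is NOT the ngramread implementation.
BOS, EOS = "<s>", "</s>"


@dataclass
class NGramCount:
    labels: Tuple[int, ...]  # word labels with <s>/</s> stripped off
    starts_sequence: bool    # the n-gram began with <s>
    ends_sequence: bool      # the n-gram ended with </s>
    count: float


def read_counts(lines: Iterable[str],
                symbols: Optional[Dict[str, int]] = None,
                oov_symbol: str = "<unk>") -> Tuple[List[NGramCount], Dict[str, int]]:
    """Parse 'w1 ... wk cnt' lines; return the entries and the symbol table used."""
    fixed = symbols is not None
    # On-the-fly table: <epsilon> gets label 0, new words get the next free label.
    table = dict(symbols) if fixed else {"<epsilon>": 0}
    if fixed and oov_symbol not in table:
        raise ValueError(f"OOV symbol {oov_symbol!r} missing from symbol table")

    entries: List[NGramCount] = []
    previous: Optional[Tuple[str, ...]] = None
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines (an assumption)
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: need at least one word and a count")
        *words, count_field = fields
        count = float(count_field)

        # Assumed ordering: compare word sequences string by string.
        ngram = tuple(words)
        if previous is not None and ngram < previous:
            raise ValueError(f"line {lineno}: n-grams are not lexicographically ordered")
        previous = ngram

        # <s> and </s> mark sequence boundaries and get no label of their own.
        starts = words[0] == BOS
        ends = words[-1] == EOS
        inner = words[int(starts):len(words) - int(ends)]
        if BOS in inner or EOS in inner:
            raise ValueError(f"line {lineno}: <s>/</s> only allowed at the n-gram edges")

        labels = []
        for word in inner:
            if fixed:
                # Words outside the fixed table map to the OOV symbol's label.
                labels.append(table.get(word, table[oov_symbol]))
            else:
                labels.append(table.setdefault(word, len(table)))
        entries.append(NGramCount(tuple(labels), starts, ends, count))
    return entries, table
```

For example, reading the lines =&lt;s&gt; 3=, =&lt;s&gt; a 2=, =a 4=, =a b 1=, =b 2=, =b &lt;/s&gt; 2= without a symbol table yields the table ={&lt;epsilon&gt;: 0, a: 1, b: 2}=, and the last line becomes labels =(2,)= with the end-of-sequence flag set. Passing a fixed table that contains =&lt;unk&gt;= maps any unseen word to that label instead of growing the table.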
Topic revision: r3 - 2011-12-13 - MichaelRiley
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.