Difference: GrmNGramForum (69 vs. 70)

Revision 70 - 2014-05-14 - AaronDunlop

OpenGrm NGram Forum

Creating a character n-gram model

AaronDunlop - 2014-05-14 - 17:49

Is there a path similar to that in the quick tour (http://openfst.cs.nyu.edu/twiki/bin/view/GRM/NGramQuickTour#OpenGrm_NGram_Library_Quick_Tour) for character n-gram models?

I have a trivial corpus that works fine with the instructions described there for a word n-gram model, but I can't figure out how to use ngramcount to build a character model.

I'm trying (with a 4-line corpus file titled 'animals.txt'):

farcompilestrings -token_type=utf8 -keep_symbols=1 animals.txt >animals.char.far
ngramcount -order=5 animals.char.far >animals.cnt

The 'farcompilestrings' command appears to work, but ngramcount fails with the error "ERROR: None of the input FSTs had a symbol table". I've tried with '-keep_symbols=0' and without any '-keep_symbols' argument, and the results seem to be the same.

I tried creating a symbol file with the UTF-8 characters in my 'corpus', but farcompilestrings doesn't like the space symbol in that file (it reports 'ERROR: SymbolTable::ReadText: Bad number of columns (1), file = animals.char.syms, line = 5:<>').

It seems like there must be an option to tell ngramcount to use bytes or UTF-8 characters as symbols (analogous to farcompilestrings '-token_type=utf8'), but I haven't found it.

Thanks in advance for any suggestions.
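
One possible workaround for the symbol-file error above, offered as a sketch and not as a confirmed OpenGrm recipe: the text symbol-table format appears to expect two whitespace-separated columns per line (symbol, then id), so a bare space cannot serve as a symbol. Pre-tokenizing the corpus so that every character becomes its own space-separated token, with the literal space replaced by a placeholder, would let the word-level quick tour be followed unchanged. The placeholder '<space>', the function names, and the output file names are assumptions made for illustration; other whitespace characters such as tabs are not handled.

```python
# Sketch: convert a plain-text corpus into space-separated character tokens
# and write a matching symbol table in OpenFst text format (symbol, id).
# SPACE_TOKEN and all file names are illustrative assumptions.

SPACE_TOKEN = "<space>"  # hypothetical placeholder for the literal space
EPSILON = "<epsilon>"    # OpenFst convention: label 0 is reserved for epsilon


def char_tokens(line):
    """Map one corpus line to a list of single-character tokens."""
    return [SPACE_TOKEN if ch == " " else ch for ch in line.rstrip("\n")]


def build_symbols(lines):
    """Assign consecutive ids in order of first appearance; epsilon gets 0."""
    symbols = {EPSILON: 0}
    for line in lines:
        for tok in char_tokens(line):
            symbols.setdefault(tok, len(symbols))
    return symbols


def convert(corpus_path, tokenized_path, symbols_path):
    """Write the tokenized corpus and its symbol table; return the table."""
    with open(corpus_path, encoding="utf-8") as f:
        lines = f.readlines()
    symbols = build_symbols(lines)
    with open(tokenized_path, "w", encoding="utf-8") as out:
        for line in lines:
            out.write(" ".join(char_tokens(line)) + "\n")
    with open(symbols_path, "w", encoding="utf-8") as out:
        for sym, idx in symbols.items():
            out.write(f"{sym}\t{idx}\n")
    return symbols
```

Calling convert("animals.txt", "animals.char.txt", "animals.char.syms") would produce the two files. Running farcompilestrings with '-symbols=animals.char.syms -keep_symbols=1' on animals.char.txt, as the quick tour does for words, and then the ngramcount command above should give ngramcount a symbol table to work with, though this has not been verified against an OpenGrm installation.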


Error in installing opengrm

AbhirajTomar - 2014-04-04 - 05:06

 