This template class builds an n-gram database. The first template argument (string_tmpl) specifies the type of a key (string), the second template argument (value_tmpl) specifies the type of a value associated with a key, and the third template argument (ngram_generator_tmpl) customizes generation of feature sets (n-grams) from keys.
This class is inherited by writer_base, which adds the functionality of managing a master string table (list of strings).
string_tmpl | The type of a string. | |
value_tmpl | The value type. This is required to be an integer type. | |
ngram_generator_tmpl | The type of an n-gram generator. |
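As a rough illustration of the third template argument, the sketch below shows one shape an n-gram generator could take: a callable that receives a key and writes its character n-grams through an output iterator. The names `char_ngram_generator` and `ngrams_of`, the call signature, and the lack of any padding characters are assumptions made for this example, not the library's confirmed interface.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical generator: emits every contiguous substring of length n.
// The call signature (key, output iterator) is an assumption for illustration.
template <class string_tmpl>
struct char_ngram_generator {
    std::size_t n;

    explicit char_ngram_generator(std::size_t n_) : n(n_) {
        assert(n > 0);  // an n-gram must contain at least one character
    }

    template <class output_iterator>
    void operator()(const string_tmpl& key, output_iterator out) const {
        // Keys shorter than n produce no n-grams in this sketch.
        for (std::size_t i = 0; i + n <= key.size(); ++i) {
            *out++ = key.substr(i, n);
        }
    }
};

// Convenience wrapper that collects the n-grams of a key into a vector.
inline std::vector<std::string> ngrams_of(const std::string& key, std::size_t n) {
    std::vector<std::string> result;
    char_ngram_generator<std::string> gen(n);
    gen(key, std::back_inserter(result));
    return result;
}
```

Writing through an output iterator keeps the generator independent of the container that receives the n-grams, so the same functor could fill a `std::vector`, a `std::set`, or any other sink.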
Public Types | |
typedef string_tmpl | string_type |
The type representing a string. | |
typedef value_tmpl | value_type |
The type of values associated with key strings. | |
typedef ngram_generator_tmpl | ngram_generator_type |
The function type for generating n-grams from a key string. | |
typedef string_type::value_type | char_type |
The type representing a character. | |
Public Member Functions | |
ngramdb_writer_base (const ngram_generator_type &gen) | |
Constructs an object. | |
virtual | ~ngramdb_writer_base () |
Destroys the object. | |
void | clear () |
Clears the database. | |
bool | empty () |
Checks whether the database is empty. | |
int | max_size () const |
Returns the maximum length of keys in the n-gram database. | |
bool | fail () const |
Checks whether an error has occurred. | |
std::string | error () const |
Returns an error message. | |
bool | insert (const string_type &key, const value_type &value) |
Inserts a string into the n-gram database. | |
bool | store (const std::string &base) |
Stores the n-gram database to files. | |
Protected Types | |
typedef std::vector< string_type > | ngrams_type |
The type of an array of n-grams. | |
typedef std::vector< value_type > | values_type |
The vector type of values associated with an n-gram. | |
typedef std::map< string_type, values_type > | hashdb_type |
The type implementing an index (associations from n-grams to values). | |
typedef std::vector< hashdb_type > | indices_type |
The vector of indices for different n-gram sizes. | |
Protected Attributes | |
indices_type | m_indices |
The vector of indices. | |
const ngram_generator_type & | m_gen |
The n-gram generator. | |
std::stringstream | m_error |
The error message. |
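The protected types above nest three containers. The sketch below reproduces those typedefs with `std::string` keys and `int` values, and shows one plausible way an insertion could fill them. The function `insert_sketch` and the bucketing rule (a key that yields k n-grams goes into `indices[k - 1]`) are assumptions for illustration, not a description of the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Same shapes as the protected typedefs, fixed to std::string and int.
typedef std::vector<std::string> ngrams_type;
typedef std::vector<int> values_type;
typedef std::map<std::string, values_type> hashdb_type;
typedef std::vector<hashdb_type> indices_type;

// Hypothetical insertion step. Assumed rule: a key that produced k n-grams
// is recorded in indices[k - 1], under each of its n-grams.
inline bool insert_sketch(indices_type& indices, const ngrams_type& ngrams, int value) {
    if (ngrams.empty()) {
        return false;  // nothing to index
    }
    if (indices.size() < ngrams.size()) {
        indices.resize(ngrams.size());  // grow to hold the new bucket
    }
    assert(ngrams.size() <= indices.size());
    hashdb_type& index = indices[ngrams.size() - 1];
    for (std::size_t i = 0; i < ngrams.size(); ++i) {
        // operator[] creates an empty value list on first use of an n-gram.
        index[ngrams[i]].push_back(value);
    }
    return true;
}
```

Under this assumed layout, a lookup for candidates with a given number of n-grams only needs to touch one element of `indices_type`.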
simstring::ngramdb_writer_base< string_tmpl, value_tmpl, ngram_generator_tmpl >::ngramdb_writer_base | ( | const ngram_generator_type & | gen | ) | [inline] |
Constructs an object.
gen | The n-gram generator. |
bool simstring::ngramdb_writer_base< string_tmpl, value_tmpl, ngram_generator_tmpl >::empty | ( | ) | [inline] |
Checks whether the database is empty.
Returns true if the database is empty, false otherwise.
int simstring::ngramdb_writer_base< string_tmpl, value_tmpl, ngram_generator_tmpl >::max_size | ( | ) | const [inline] |
Returns the maximum length of keys in the n-gram database.
bool simstring::ngramdb_writer_base< string_tmpl, value_tmpl, ngram_generator_tmpl >::fail | ( | ) | const [inline] |
Checks whether an error has occurred.
Returns true if an error has occurred.
std::string simstring::ngramdb_writer_base< string_tmpl, value_tmpl, ngram_generator_tmpl >::error | ( | ) | const [inline] |
Returns an error message.
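The pairing of fail() and error() with a `std::stringstream` member suggests a record-then-query pattern. The sketch below is a stand-in class, not the library's code: `error_state_sketch`, `try_store`, the rule that a non-empty message means failure, and the sample message text are all assumptions made for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the error-handling members; not the library class.
class error_state_sketch {
public:
    // Assumption: the object counts as failed once a message has been recorded.
    bool fail() const { return !m_error.str().empty(); }

    // Returns the recorded message (empty if nothing went wrong).
    std::string error() const { return m_error.str(); }

    // Discards the recorded message and resets the stream state flags.
    void clear() {
        m_error.str(std::string());
        m_error.clear();
    }

    // Hypothetical operation that records a message when it cannot proceed.
    bool store(const std::string& base) {
        if (base.empty()) {
            m_error << "empty file-name prefix";
            return false;
        }
        return true;
    }

private:
    std::stringstream m_error;
};

// Call-site pattern: test the return value, then read the message.
inline std::string try_store(error_state_sketch& db, const std::string& base) {
    if (!db.store(base)) {
        assert(db.fail());
        return db.error();
    }
    return std::string();
}
```

Reporting failure through a return value plus a retrievable message lets callers avoid exceptions while still getting a readable diagnostic.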
bool simstring::ngramdb_writer_base< string_tmpl, value_tmpl, ngram_generator_tmpl >::insert | ( | const string_type & | key, | const value_type & | value | ) | [inline] |
Inserts a string into the n-gram database.
key | The key string. | |
value | The value associated with the string. |
bool simstring::ngramdb_writer_base< string_tmpl, value_tmpl, ngram_generator_tmpl >::store | ( | const std::string & | base | ) | [inline] |
Stores the n-gram database to files.
base | The prefix of file names. |
Returns true if the database is successfully stored, false otherwise.