Double-array trie, which was proposed by Jun-ichi Aoe in the late 1980s, represents a trie in two parallel arrays (BASE and CHECK). Reducing the storage usage drastically, double array tries have been used in practical applications such as morphological analysis, spelling correction, and Japanese Kana-Kanji convertion.
Static Double Array Trie (DASTrie) is an implementation of static double-array trie. For the simplicity and efficiency, DASTrie focuses on building a static double array from a list of records sorted by dictionary order of keys. DASTrie does not provide the functionality for updating an existing trie, whereas the original framework of double array considers dynamic updates. DASTrie provides several features:
std::map
). Thus, it is very straightforward to put and obtain the record values for key strings, while some implementations just return unique integer identifiers for key strings. It is also possible to omit record values so that DASTrie behaves as a string set (e.g., std::set
).int
, double
) and strings (char*
and std::string
) can be used as a value type of records without any additional effort. User-defined types can also be used as record values only if two operators for serialization (operator<<()
and operator>>()
) are implemented.std::ostream
) with dastrie::builder::write() function. Serialized data can be embedded into files with other arbitrary data.std::istream
) (with dastrie::trie::read() function) or from a memory block (with dastrie::trie::assign() function) to which a serialized data is read or memory-mapped from a file.DASTrie is distributed under the term of the modified BSD license.
#include <fstream> #include <iostream> #include <string> #include <dastrie.h> int main(int argc, char *argv[]) { typedef dastrie::builder<char*, int> builder_type; typedef dastrie::trie<int> trie_type; typedef builder_type::record_type record_type; // Records sorted by dictionary order of keys. record_type records[] = { {"eight", 8}, {"five", 5}, {"four", 4}, {"nine", 9}, {"one", 1}, {"seven", 7}, {"six", 6}, {"ten", 10}, {"three", 3}, {"two", 2}, }; try { // Build a double-array trie from the records. builder_type builder; builder.build(records, records + sizeof(records)/sizeof(records[0])); // Store the double-array trie to a file. std::ofstream ofs("sample.db", std::ios::binary); builder.write(ofs); ofs.close(); } catch (const builder_type::exception& e) { // Abort if something went wrong... std::cerr << "ERROR: " << e.what() << std::endl; return 1; } // Open the trie file. std::ifstream ifs("sample.db", std::ios::binary); if (ifs.fail()) { std::cerr << "ERROR: Failed to open a trie file." << std::endl; return 1; } // Read the trie. trie_type trie; if (trie.read(ifs) == 0) { std::cerr << "ERROR: Failed to read a trie file." << std::endl; return 1; } /* Note that, although this sample program uses a file, a trie class can also receive a double-array trie directly from a builder, trie.assign(builder.doublearray(), builder.tail(), builder.table()); */ // Get the values of keys or the default value if the key does not exist. std::cout << trie.get("one", -1) << std::endl; // 1 std::cout << trie.get("other", -1) << std::endl; // -1 // Check the existence of a key and obtain its value. int value; if (trie.find("two", value)) { std::cout << value << std::endl; // 2 } // Check the existence of keys. std::cout << trie.in("ten") << std::endl; // 1 (true) std::cout << trie.in("eleven") << std::endl; // 0 (false) // Get records whose keys are prefixes of "eighteen". trie_type::prefix_cursor pfx = trie.prefix("eighteen"); while (pfx.next()) { std::cout << pfx.query.substr(0, pfx.length) << " " // eight << pfx.value << std::endl; // 8 } return 0; }
Implementation | Parameters | Build [sec] | Access [sec] | Database [bytes] |
---|---|---|---|---|
DASTrie 1.0 | Default | 182 | 1.72 | 131,542,283 |
darts 0.32 | Default | 22.3 | 1.25 | 406,358,432 |
DynDA 0.01 | Default | 335 | 2.53 | 195,374,108 |
Tx 0.12 | Default | 28.3 | 26.6 | 52,626,805 |
Implementation | Parameters | Build [sec] | Access [sec] | Database [bytes] |
---|---|---|---|---|
DASTrie 1.0 | Default | 0.30 | 0.05 | 4,534,065 |
DASTrie 1.0 | Compact (-c) | 0.27 | 0.06 | 3,783,249 |
darts 0.32 | Default | 0.65 | 0.04 | 12,176,328 |
DynDA 0.01 | Default | 0.28 | 0.07 | 7,095,226 |
TinyDA 1.23 | Default | 0.36 | 0.06 | 4,575,520 |
Tx 0.12 | Default | 0.68 | 0.70 | 1,646,558 |
The DASTrie distribution contains "a portable stdint.h", which is released by Paul Hsieh under the term of the modified BSD license, for addressing the compatibility issue of Microsoft Visual Studio 2008. The original code is available at: http://www.azillionmonkeys.com/qed/pstdint.h