CRFsuite
CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data.
The first priority of this software is to train and use CRF models as fast as possible, even at the expense of memory usage and code generality.
In training, CRFsuite runs 5.4 to 61.8 times faster than C++ implementations.
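To make "labeling sequential data" concrete, here is a minimal sketch of Viterbi decoding, assuming a first-order linear-chain model whose per-position label scores and label-to-label transition scores are already known. The function name and the data layout are hypothetical and are not CRFsuite's API.

```python
def viterbi(state_scores, trans_scores):
    """Return the highest-scoring label sequence and its score.

    state_scores: one dict per position, mapping label -> score (assumed layout).
    trans_scores: dict mapping (previous_label, label) -> score; missing pairs count as 0.
    """
    labels = list(state_scores[0])
    best = {y: state_scores[0][y] for y in labels}
    backpointers = []
    for scores in state_scores[1:]:
        new_best, pointers = {}, {}
        for y in labels:
            # Best previous label for ending in y at this position.
            prev = max(labels, key=lambda p: best[p] + trans_scores.get((p, y), 0.0))
            new_best[y] = best[prev] + trans_scores.get((prev, y), 0.0) + scores[y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    last = max(labels, key=best.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]
```

The scores here would come from a trained model; the sketch covers only the decoding step, not training.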
libLBFGS
libLBFGS is a C port of the FORTRAN implementation of the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method written by Jorge Nocedal.
Unlike C code generated automatically by f2c (a Fortran 77 to C converter), this port includes changes based on my interpretations, improvements, optimizations, and clean-ups, so that the ported code is well suited to C.
It also provides enhancements such as a callback interface, thread safety, and SSE/SSE2 optimization.
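As a rough illustration of the method and of a callback-style interface, the following sketch implements the L-BFGS two-loop recursion, with a simple backtracking (Armijo) line search swapped in for a full line search. The names lbfgs_minimize and evaluate, and all default parameters, are assumptions made for this example; they do not reflect libLBFGS's actual API.

```python
from math import sqrt

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lbfgs_minimize(evaluate, x0, m=5, max_iter=200, tol=1e-6):
    """Minimize a function; evaluate(x) is a callback returning (f, gradient)."""
    x = list(x0)
    f, g = evaluate(x)
    history = []  # up to m most recent (s, y, rho) triples, oldest first
    for _ in range(max_iter):
        if sqrt(_dot(g, g)) < tol:
            break
        # Two-loop recursion: approximate (inverse Hessian) * gradient.
        q = list(g)
        alphas = []
        for s, y, rho in reversed(history):
            a = rho * _dot(s, q)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, y)]
        if history:
            s, y, _ = history[-1]
            gamma = _dot(s, y) / _dot(y, y)  # initial Hessian scaling
            q = [gamma * qi for qi in q]
        for (s, y, rho), a in zip(history, reversed(alphas)):
            b = rho * _dot(y, q)
            q = [qi + (a - b) * si for qi, si in zip(q, s)]
        d = [-qi for qi in q]
        # Backtracking line search with the Armijo sufficient-decrease test.
        slope = _dot(g, d)
        step = 1.0
        while True:
            x_new = [xi + step * di for xi, di in zip(x, d)]
            f_new, g_new = evaluate(x_new)
            if f_new <= f + 1e-4 * step * slope:
                break
            step *= 0.5
            if step < 1e-20:
                return x, f  # no further progress possible
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = _dot(s, y)
        if sy > 1e-12:  # keep only pairs that preserve positive curvature
            history.append((s, y, 1.0 / sy))
            if len(history) > m:
                history.pop(0)
        x, f, g = x_new, f_new, g_new
    return x, f
```

The caller supplies the objective and its gradient through the evaluate callback, so the optimizer itself needs no knowledge of the problem being solved.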
SimString
SimString is an implementation of a simple and efficient algorithm for approximate string matching.
Approximate string matching retrieves the strings in a string collection (database) whose similarity to a query string is no smaller than a given threshold.
Because it finds similar strings as well as identical ones, SimString supports various applications including spelling correction, fuzzy search, approximate dictionary matching, duplicate record detection, and database merging.
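The definition above can be sketched as a brute-force scan. This is only an illustration of the retrieval task, assuming cosine similarity over sets of character trigrams; it compares the query with every string and is not the efficient algorithm that SimString implements. All function names are hypothetical.

```python
from math import sqrt

def char_ngrams(text, n=3):
    """Set of character n-grams, with '$' padding to mark string boundaries."""
    padded = "$" * (n - 1) + text + "$" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def cosine(x, y):
    """Cosine similarity between two non-empty feature sets."""
    return len(x & y) / sqrt(len(x) * len(y))

def retrieve(query, collection, threshold):
    """Return every string whose similarity to the query is >= threshold."""
    q = char_ngrams(query)
    return [s for s in collection if cosine(q, char_ngrams(s)) >= threshold]
```

For example, retrieve("Barack Obama", names, 0.7) would return both the exact string and close misspellings of it from a list of names, while leaving unrelated entries out.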
Classias
Classias is a collection of machine-learning algorithms for classification.
Currently, this software supports the following formulations: L1/L2-regularized logistic regression (a.k.a. Maximum Entropy); L1/L2-regularized L1-loss linear-kernel Support Vector Machine (SVM); and averaged perceptron.
It implements several algorithms for training classifiers: averaged perceptron, L-BFGS, OWL-QN, Pegasos, and Truncated Gradient.
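As one example of the training algorithms listed, here is a minimal sketch of a binary averaged perceptron over sparse feature dictionaries. The function names, the data layout, and the naive averaging scheme (summing the weights after every example) are assumptions chosen for clarity; they are not taken from Classias.

```python
def train_averaged_perceptron(data, epochs=10):
    """data: list of (features, label); features is a dict, label is +1 or -1."""
    weights, totals, steps = {}, {}, 0
    for _ in range(epochs):
        for features, label in data:
            score = sum(weights.get(f, 0.0) * v for f, v in features.items())
            if label * score <= 0:  # misclassified (or on the boundary): update
                for f, v in features.items():
                    weights[f] = weights.get(f, 0.0) + label * v
            # Accumulate the current weights so they can be averaged at the end.
            for f, w in weights.items():
                totals[f] = totals.get(f, 0.0) + w
            steps += 1
    return {f: t / steps for f, t in totals.items()}

def predict(weights, features):
    """Return +1 or -1 according to the sign of the linear score."""
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1 if score > 0 else -1
```

Averaging the weight vectors seen during training, rather than keeping only the final one, tends to make the classifier less sensitive to the order of the training examples.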
C++ implementation of Constant Database (CDB++)
C++ implementation of Constant Database (CDB++) is a lightweight library for static hash databases.
By including a single header file (cdbpp.h), one can build a compact hash database and search its entries very quickly.
However, CDB++ does not support dynamic update and deletion of elements from an existing database.
CDB++ is suitable for implementing a database in which fast look-ups of keys and their values are essential and updates rarely occur.
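The build-once, look-up-many pattern can be sketched as follows. This is an in-memory illustration using open addressing with linear probing; the class name, the choice of CRC-32 as the hash function, and the table layout are all assumptions, and none of them reflect the CDB++ API or its file format.

```python
import zlib

class StaticHashTable:
    """Hash table that is filled once at construction and then only read."""

    def __init__(self, items):
        pairs = dict(items)
        # Twice as many slots as entries, so probing always finds an empty slot.
        self._size = max(1, 2 * len(pairs))
        self._slots = [None] * self._size
        for key, value in pairs.items():
            i = self._index(key)
            while self._slots[i] is not None:
                i = (i + 1) % self._size
            self._slots[i] = (key, value)

    def _index(self, key):
        return zlib.crc32(key.encode("utf-8")) % self._size

    def get(self, key, default=None):
        """Look up a key; there is deliberately no method to insert or delete."""
        i = self._index(key)
        while self._slots[i] is not None:
            if self._slots[i][0] == key:
                return self._slots[i][1]
            i = (i + 1) % self._size
        return default
```

Because nothing is added or removed after construction, the table never needs resizing or deletion markers, which keeps look-ups simple.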
Static Double Array Trie (DASTrie)
Static Double Array Trie (DASTrie) is a C++ template library for static double-array tries.
For simplicity and efficiency, DASTrie focuses on building a static double array from a list of records sorted in dictionary order of their keys.
By including just a header file, one can implement associative arrays (e.g., std::map) with arbitrary value types and/or sets (e.g., std::set).
DASTrie implements double arrays in which each element is 4 or 5 bytes long, whereas most implementations use 8 bytes per double-array element.
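To show what building a static double array from sorted records involves, here is a minimal sketch using the usual BASE and CHECK arrays, where a transition from state s on byte c leads to BASE[s] + c and is valid only if CHECK of that slot equals s. The class name and the leaf encoding are assumptions for illustration; the sketch stores both arrays as plain Python lists and makes no attempt at the compact 4- or 5-byte element layout.

```python
class DoubleArrayTrie:
    """Static trie built once from (key, value) records sorted by key."""

    FREE = -1

    def __init__(self, records):
        # A zero byte terminates each key (keys are assumed not to contain one).
        self._keys = [k.encode("utf-8") + b"\0" for k, _ in records]
        if any(a >= b for a, b in zip(self._keys, self._keys[1:])):
            raise ValueError("records must be sorted by key, without duplicates")
        self._values = [v for _, v in records]
        self._base = [0]
        self._check = [-2]  # the root slot is marked as occupied
        self._insert(0, 0, len(self._keys), 0)

    def _grow(self, n):
        while len(self._base) <= n:
            self._base.append(0)
            self._check.append(self.FREE)

    def _insert(self, state, lo, hi, depth):
        # Group keys[lo:hi], which share a prefix of length depth, by next byte.
        groups, i = [], lo
        while i < hi:
            c, j = self._keys[i][depth], i
            while j < hi and self._keys[j][depth] == c:
                j += 1
            groups.append((c, i, j))
            i = j
        # Find the smallest base whose child slots are all free.
        b = 1
        while True:
            self._grow(b + 255)
            if all(self._check[b + c] == self.FREE for c, _, _ in groups):
                break
            b += 1
        self._base[state] = b
        for c, _, _ in groups:
            self._check[b + c] = state  # reserve every child before recursing
        for c, i, j in groups:
            if c == 0:
                self._base[b + c] = -i - 1  # leaf: encodes the record index
            else:
                self._insert(b + c, i, j, depth + 1)

    def get(self, key, default=None):
        state = 0
        for c in key.encode("utf-8") + b"\0":
            nxt = self._base[state] + c
            if nxt < 0 or nxt >= len(self._check) or self._check[nxt] != state:
                return default
            state = nxt
        return self._values[-self._base[state] - 1]
```

Requiring sorted input means the keys under each node form one contiguous range, so the whole trie can be laid out in a single recursive pass.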