CRFsuite is an implementation of Conditional Random Fields (CRFs) [Lafferty 01][Sha 03][Sutton] for labeling sequential data. Among the various implementations of CRFs, this software provides the following features.
- Fast training and tagging. The primary mission of this software is to train and use CRF models as fast as possible. See the benchmark result for more information.
- Simple data format for training and tagging. The data format is similar to that used in other machine learning tools: each line consists of a label and the attributes (features) of an item, consecutive lines represent a sequence of items, and an empty line denotes the end of an item sequence. This means that users can design an arbitrary number of features for each item, which is impossible in CRF++.
- State-of-the-art training methods. CRFsuite implements:
- Limited-memory BFGS (L-BFGS) [Nocedal 80]
- Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) method [Andrew 07]
- Stochastic Gradient Descent (SGD) [Shalev-Shwartz 07]
- Averaged Perceptron [Collins 02]
- Passive Aggressive [Crammer 06]
- Adaptive Regularization of Weight Vectors (AROW) [Mejer 10]
- Forward/backward algorithm using the scaling method [Rabiner 90]. The scaling method seems faster than computing the forward/backward scores in the logarithm domain.
- Linear-chain (first-order Markov) CRF.
- Performance evaluation on training. CRFsuite can output the precision, recall, and F1 scores of the model evaluated on test data.
- An efficient file format for storing/accessing CRF models using Constant Quark Database (CQDB). A tagger starts up in little time because its preparation consists only of reading the entire model file into a memory block. Retrieving the weight of a feature is also very quick.
- C++/SWIG API. CRFsuite provides an easy-to-use API for the C++ language (crfsuite.hpp). CRFsuite also provides a SWIG interface for various languages (e.g., Python) on top of the C++ API. See the API Documentation for more information.
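As a concrete illustration of the data format described above, the sketch below builds a tiny two-sequence data set and parses it. `parse_crfsuite_data` is a hypothetical helper written for this illustration, not part of CRFsuite, and it assumes tab-separated fields:

```python
def parse_crfsuite_data(text):
    """Parse CRFsuite-style data: each non-empty line holds a label
    followed by tab-separated attributes; an empty line ends a sequence."""
    sequences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:           # empty line: close the current sequence
                sequences.append(current)
                current = []
            continue
        label, *attrs = line.split("\t")
        current.append((label, attrs))
    if current:                   # data may not end with an empty line
        sequences.append(current)
    return sequences

# Two item sequences; each item has a label and any number of attributes.
sample = (
    "B-NP\tw=He\tpos=PRP\n"
    "B-VP\tw=reckons\tpos=VBZ\n"
    "\n"
    "B-NP\tw=Confidence\tpos=NN\tcap=yes\n"
)
sequences = parse_crfsuite_data(sample)
# sequences[0] has two items; sequences[1] has one item with three attributes
```

Note how the last item carries one more attribute than the others: nothing in the format fixes the number of features per item.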
For more information about CRFsuite, please refer to these pages.
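The scaling method mentioned in the feature list can be sketched on a toy linear-chain model: instead of taking the log of every score, the forward scores are renormalized at each position and the logs of the normalizers are accumulated, which avoids numerical underflow with only one log per position. This is a simplified pure-Python illustration with hypothetical names, not CRFsuite's actual implementation:

```python
import math

def forward_scaled(init, trans, emit):
    """Forward pass with per-position scaling (Rabiner-style):
    returns log Z without evaluating scores in the log domain.
    init[s]: initial score of state s; trans[p][s]: transition score
    p -> s; emit[t][s]: per-position score of state s at position t."""
    n = len(init)
    alpha = [init[s] * emit[0][s] for s in range(n)]
    log_z = 0.0
    for t in range(len(emit)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[t][s]
                for s in range(n)
            ]
        c = sum(alpha)                  # scaling factor at position t
        log_z += math.log(c)            # accumulate its log
        alpha = [a / c for a in alpha]  # keep alpha in a safe numeric range
    return log_z
```

Because each position needs only one `log` call (and no `exp`), this tends to be cheaper than summing scores in the logarithm domain, which is the point made above.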
The current release is CRFsuite version 0.12.
- Source package (the source package requires libLBFGS 1.8 or later)
- Win32 binary (this binary requires the Microsoft Visual C++ 2010 Redistributable Package (x86) to be installed on your computer)
- Linux 64bit binary
CRFsuite is distributed under the modified BSD license.
Please use the following BibTeX entry when you cite CRFsuite in your papers.
@misc{CRFsuite,
  author = {Naoaki Okazaki},
  title  = {CRFsuite: a fast implementation of Conditional Random Fields (CRFs)},
  url    = {http://www.chokkan.org/software/crfsuite/},
  year   = {2007}
}
Refer to the full change log. Updates for the latest release are:
- CRFsuite 0.12 (2011-08-11)
- [CORE] Optimized the implementation for faster training; approximately a 1.4-1.5x speedup.
- [CORE] Faster routine for computing exp(x) using SSE2.
- [CORE] Restructured the source code to separate the routines for CRF graphical models from those for training algorithms; this is an initial step toward implementing CRFs with different feature types (e.g., 2nd-order CRFs, 1st-order transition features conditioned on observations) and different training algorithms.
- [CORE] Implemented new training algorithms: Averaged Perceptron, Passive Aggressive, and Adaptive Regularization of Weight Vectors (AROW).
- [CORE] Removed the automatic generation of BOS/EOS features; one can use these features by inserting attributes into the first/last items (e.g., "__BOS__" at the first item and "__EOS__" at the last item).
- [CORE] Fixed some memory-leak problems.
- [CORE] Reduced memory usage in training.
- [CORE] Fixed a crash when tagging with a model file that does not exist.
- [FRONTEND:LEARN] Training and test sets are maintained by group numbers; specify the group number for hold-out evaluation with the "-e" option.
- [FRONTEND:LEARN] The training algorithm is now specified with the "-a" option instead of "-p algorithm=".
- [FRONTEND:LEARN] Renamed some training parameters; for example, the L2 regularization coefficient is now specified by "c2" instead of "regularization.sigma" (c2 = 0.5 / (sigma * sigma)).
- [FRONTEND:LEARN] Show the list of parameters, their default values, and descriptions with the "-H" option.
- [FRONTEND:LEARN] Removed support for comment lines for simplicity; users may forget to escape '#' characters in a data set, so CRFsuite no longer treats '#' as a special character.
- [FRONTEND:TAGGER] Output the probabilities of predicted sequences with the "-p" option.
- [FRONTEND:TAGGER] Output the marginal probabilities of predicted items with the "-i" option.
- [API] Numerous API changes to support these enhancements.
- [API] Renamed the library from "libcrf" to "libcrfsuite".
- [API] Renamed the prefix "crf_" to "crfsuite_" in structure and function names.
- [API] Implemented a high-level and easy-to-use API for C++/SWIG (crfsuite.hpp and crfsuite_api.hpp).
- [API] Implemented the Python SWIG module and sample programs; writing a tagger is very easy with this module.
- [SAMPLE] New samples in the example directory: Named Entity Recognition (ner.py) using the CoNLL 2003 data set, and part-of-speech tagging (pos.py).
- [SAMPLE] Rewrote the existing samples.
- [SAMPLE] A sample program (template.py) for using feature templates that are compatible with CRF++.
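Since the parameter renaming above changes how L2 regularization is expressed, values used with the old "regularization.sigma" parameter can be converted to the new "c2" coefficient via c2 = 0.5 / (sigma * sigma). A tiny conversion sketch (the helper name is made up for illustration):

```python
def sigma_to_c2(sigma):
    """Convert the pre-0.12 "regularization.sigma" value to the 0.12
    "c2" coefficient: c2 = 0.5 / (sigma * sigma)."""
    return 0.5 / (sigma * sigma)

# sigma = 1.0 corresponds to c2 = 0.5; larger sigma means weaker regularization,
# i.e. a smaller c2.
```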
[Andrew 07] Galen Andrew and Jianfeng Gao. "Scalable training of L1-regularized log-linear models". Proceedings of the 24th International Conference on Machine Learning (ICML 2007), pp. 33-40, 2007.
[Crammer 06] Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. "Online Passive-Aggressive Algorithms". Journal of Machine Learning Research, 7(Mar):551-585, 2006.
[Collins 02] Michael Collins. "Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms". Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), pp. 1-8, 2002.
[Lafferty 01] John Lafferty, Andrew McCallum, and Fernando Pereira. "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data". Proceedings of the 18th International Conference on Machine Learning (ICML 2001), pp. 282-289, 2001.
[Malouf 02] Robert Malouf. "A comparison of algorithms for maximum entropy parameter estimation". Proceedings of the 6th Conference on Natural Language Learning (CoNLL-2002), pp. 49-55, 2002.
[Mejer 10] Avihai Mejer and Koby Crammer. "Confidence in Structured-Prediction using Confidence-Weighted Models". Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), pp. 971-981, 2010.
[Nocedal 80] Jorge Nocedal. "Updating Quasi-Newton Matrices with Limited Storage". Mathematics of Computation, 35(151):773-782, 1980.
[Rabiner 90] Lawrence R. Rabiner. "A tutorial on hidden Markov models and selected applications in speech recognition". In Readings in Speech Recognition, pp. 267-296, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1990.
[Sha 03] Fei Sha and Fernando Pereira. "Shallow parsing with conditional random fields". Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (NAACL 2003), pp. 134-141, 2003.
[Shalev-Shwartz 07] Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM". Proceedings of the 24th International Conference on Machine Learning (ICML 2007), pp. 807-814, 2007.
[Sutton] Charles Sutton and Andrew McCallum. "An Introduction to Conditional Random Fields". (in submission).