The CRFsuite C++/SWIG API provides a high-level, easy-to-use library module for a number of programming languages. The C++/SWIG API is a wrapper for the CRFsuite C API.
The C++ library is implemented in two header files, crfsuite_api.hpp and crfsuite.hpp. To use the C++ API, it is enough to include crfsuite.hpp. The C++ library depends on the CRFsuite C library, so the C header file (crfsuite.h) and the libcrfsuite library are also required.
The SWIG API is identical to the C++ API. Currently, the CRFsuite distribution includes a Python module for CRFsuite. See the README in the swig/python directory for instructions on building the Python module.
This code demonstrates how to use the crfsuite.Trainer object. The script reads training data from STDIN, trains a model using the 'l2sgd' algorithm, and stores the model in a file (the first argument on the command line).
    #!/usr/bin/env python

    # Make print() behave the same under Python 2 and Python 3.
    from __future__ import print_function

    import crfsuite
    import sys

    # Inherit crfsuite.Trainer to implement message() function, which receives
    # progress messages from a training process.
    class Trainer(crfsuite.Trainer):
        def message(self, s):
            # Simply output the progress messages to STDOUT.
            sys.stdout.write(s)

    def instances(fi):
        xseq = crfsuite.ItemSequence()
        yseq = crfsuite.StringList()

        for line in fi:
            line = line.strip('\n')
            if not line:
                # An empty line marks the end of a sequence.
                yield xseq, tuple(yseq)
                xseq = crfsuite.ItemSequence()
                yseq = crfsuite.StringList()
                continue

            # Split the line with TAB characters.
            fields = line.split('\t')

            # Append attributes to the item.
            item = crfsuite.Item()
            for field in fields[1:]:
                p = field.rfind(':')
                if p == -1:
                    # Unweighted (weight=1) attribute.
                    item.append(crfsuite.Attribute(field))
                else:
                    # Weighted attribute
                    item.append(crfsuite.Attribute(field[:p], float(field[p+1:])))

            # Append the item to the item sequence.
            xseq.append(item)
            # Append the label to the label sequence.
            yseq.append(fields[0])

    if __name__ == '__main__':
        # This demonstrates how to obtain the version string of CRFsuite.
        print(crfsuite.version())

        # Create a Trainer object.
        trainer = Trainer()

        # Read training instances from STDIN, and set them to trainer.
        for xseq, yseq in instances(sys.stdin):
            trainer.append(xseq, yseq, 0)

        # Use L2-regularized SGD and 1st-order dyad features.
        trainer.select('l2sgd', 'crf1d')

        # This demonstrates how to list parameters and obtain their values.
        for name in trainer.params():
            print(name, trainer.get(name), trainer.help(name))

        # Set the coefficient for L2 regularization to 0.1
        trainer.set('c2', '0.1')

        # Start training; the training process will invoke trainer.message()
        # to report the progress.
        trainer.train(sys.argv[1], -1)
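The input format the training script expects can be illustrated without the crfsuite module. The following is a minimal sketch, not part of CRFsuite: the helper name parse_line and the sample attribute names are hypothetical. It mirrors the parsing logic of instances() above: the first TAB-separated field is the label, every other field is an attribute, and the text after the last ':' (if any) is read as the attribute weight.

```python
def parse_line(line):
    """Split one TAB-separated data line into (label, [(attribute, weight), ...]).

    Hypothetical helper for illustration only; it uses plain tuples
    instead of crfsuite.Item and crfsuite.Attribute.
    """
    fields = line.rstrip('\n').split('\t')
    label = fields[0]
    attributes = []
    for field in fields[1:]:
        p = field.rfind(':')
        if p == -1:
            # Unweighted attribute: the weight defaults to 1.
            attributes.append((field, 1.0))
        else:
            # Weighted attribute: the text after the last ':' is the weight.
            attributes.append((field[:p], float(field[p + 1:])))
    return label, attributes


if __name__ == '__main__':
    # One token labeled 'B-NP' with one unweighted and one weighted attribute.
    print(parse_line('B-NP\tw[0]=the\tpos[0]=DT:0.5'))
    # prints ('B-NP', [('w[0]=the', 1.0), ('pos[0]=DT', 0.5)])
```

Because the split happens at the last ':', an attribute whose name itself ends in a colon-separated, non-numeric part would make float() raise ValueError under this logic, in the sketch and in the script alike.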
This code demonstrates how to use the crfsuite.Tagger object. The script loads a model from a file (the first argument on the command line), reads data from STDIN, and predicts label sequences.
    #!/usr/bin/env python

    # Make print() behave the same under Python 2 and Python 3.
    from __future__ import print_function

    import crfsuite
    import sys

    def instances(fi):
        xseq = crfsuite.ItemSequence()

        for line in fi:
            line = line.strip('\n')
            if not line:
                # An empty line marks the end of a sequence.
                yield xseq
                xseq = crfsuite.ItemSequence()
                continue

            # Split the line with TAB characters.
            fields = line.split('\t')
            item = crfsuite.Item()
            for field in fields[1:]:
                p = field.rfind(':')
                if p == -1:
                    # Unweighted (weight=1) attribute.
                    item.append(crfsuite.Attribute(field))
                else:
                    # Weighted attribute
                    item.append(crfsuite.Attribute(field[:p], float(field[p+1:])))

            # Append the item to the item sequence.
            xseq.append(item)

    if __name__ == '__main__':
        fi = sys.stdin
        fo = sys.stdout

        # Create a tagger object.
        tagger = crfsuite.Tagger()

        # Load the model to the tagger.
        tagger.open(sys.argv[1])

        for xseq in instances(fi):
            # Tag the sequence.
            tagger.set(xseq)
            # Obtain the label sequence predicted by the tagger.
            yseq = tagger.viterbi()
            # Output the probability of the predicted label sequence.
            print(tagger.probability(yseq))
            for t, y in enumerate(yseq):
                # Output the predicted labels with their marginal probabilities.
                print('%s:%f' % (y, tagger.marginal(y, t)))
            print()
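Both scripts treat an empty line as the end of a sequence, so a final sequence that is not followed by an empty line is never yielded by instances(). The following is a minimal sketch of the same grouping logic in plain Python, assuming one item per line. The helper name split_sequences is hypothetical, and the trailing-sequence handling is an addition that the scripts above do not have.

```python
def split_sequences(lines):
    """Group lines into sequences, using empty lines as separators.

    Hypothetical helper for illustration only; it collects raw lines
    instead of building crfsuite.ItemSequence objects.
    """
    current = []
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            # An empty line marks the end of the current sequence.
            yield current
            current = []
            continue
        current.append(line)
    # Addition not present in the scripts above: also emit a trailing
    # sequence that is not terminated by an empty line.
    if current:
        yield current


if __name__ == '__main__':
    data = ['B-NP\tw=the\n', 'I-NP\tw=cat\n', '\n', 'O\tw=.\n']
    for seq in split_sequences(data):
        print(seq)
```

With the scripts as written, the simplest way to get the same effect is to make sure the input ends with an empty line.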