This page reports the performance of different algorithms for training classifiers, using three implementations: Classias, LIBLINEAR, and LIBSVM.
The experiments use the training and test sets of rcv1.binary. All experiments were run on Debian GNU/Linux 4.0 on an Intel Xeon 5140 CPU (2.33 GHz) with 4 GB of main memory.
Table 1 reports training speed and accuracy. Classias, LIBLINEAR, and LIBSVM are roughly comparable in accuracy, but LIBLINEAR trains classification models much faster than Classias. We plan to investigate whether this speed gap can be narrowed by tuning the training parameters in Classias (e.g., the initial learning rate and the convergence criterion).
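For context on the learning-rate tuning mentioned above: the Pegasos entries in Table 1 use a step size that is determined entirely by the regularization strength and the iteration count, so there is no initial learning rate to tune. The sketch below shows a one-sample Pegasos update for the L2-regularized L1-loss SVM case; the function name, the variable names, and the mapping of `lam` to Classias's C parameter are our assumptions, not taken from the Classias source.

```python
import numpy as np

def pegasos_train(X, y, lam=0.01, epochs=10, seed=0):
    """One-sample Pegasos for an L2-regularized L1-loss (hinge) SVM.

    X is an (n, d) array, y holds labels in {-1, +1}.  `lam` is the
    regularization strength; its exact relation to Classias's C is an
    assumption here, not taken from the Classias implementation.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # Pegasos step size: decays as 1/t
            margin = y[i] * (X[i] @ w)     # margin before this update
            w *= (1.0 - eta * lam)         # shrinkage from the L2 penalty
            if margin < 1.0:               # hinge loss is active for this sample
                w += eta * y[i] * X[i]
    return w
```

Because the schedule is fixed at 1/(λt), the only free knobs are λ (i.e., C) and the stopping criterion, which is why those are the parameters we would tune.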
Table 1. Performance of training
Software | Algorithm | Obj | Acc | # Iters | Time [s] | Speed [s/iter] | L2norm | Active | C |
---|---|---|---|---|---|---|---|---|---|
Classias | L1-reg LR (OWL-QN) | 1035.8 | 0.9664 | 233.8 | 8.91 | 0.038 | 299.5 | 1562 | 0.10 |
Classias | L2-reg LR (L-BFGS) | 567.9 | 0.9705 | 116.2 | 4.35 | 0.037 | 179.9 | | 0.01 |
Classias | Averaged Perceptron | 28.8 | 0.9584 | 100.0 | 1.73 | 0.014 | 95.8 | | 1.00 |
Classias | L2-reg LR (Pegasos) | 573.6 | 0.9703 | 359.6 | 12.58 | 0.032 | 176.7 | | 0.01 |
Classias | L2-reg L1-SVM (Pegasos) | 1305.7 | 0.9707 | 203.6 | 6.55 | 0.029 | 48.3 | | 0.40 |
Classias | L1-reg LR (TG) | 437.0 | 0.9703 | 998.8 | 46.15 | 0.043 | 52743.1 | 14427 | 0.02 |
Classias | L1-reg L1-SVM (TG) | 1389.7 | 0.9687 | 452.6 | 19.50 | 0.040 | 4761.9 | 9665 | 0.40 |
LIBLINEAR | L2-reg LR | 13470.0 | 0.9700 | | 1.30 | | | | 10.00 |
LIBLINEAR | L2-reg L2-SVM (dual) | 1163.8 | 0.9712 | | 0.38 | | | | 1.00 |
LIBLINEAR | L2-reg L2-SVM (primal) | 1183.4 | 0.9713 | | 0.92 | | | | 1.00 |
LIBLINEAR | L2-reg L1-SVM (dual) | 1853.1 | 0.9703 | | 0.49 | | | | 2.00 |
LIBLINEAR | Multi (Crammer & Singer) | 680.7 | 0.9705 | | 0.48 | | | | 0.40 |
LIBLINEAR | L1-reg L2-SVM | 0.0 | 0.9677 | | 1.45 | | | 1330 | 1.00 |
LIBLINEAR | L1-reg LR | 0.0 | 0.9661 | | 9.20 | | | 1141 | 4.00 |
LIBSVM | linear kernel | 1483.8 | 0.9707 | | 167.4 | | | | 1.00 |
- Algorithm: The training algorithm. Abbreviations: L1-regularized (L1-reg), L2-regularized (L2-reg), logistic regression (LR), L1-loss SVM (L1-SVM), L2-loss SVM (L2-SVM), truncated gradient (TG), Orthant-Wise Limited-memory Quasi-Newton (OWL-QN).
- Obj: The final value of the objective function. For the averaged perceptron, this column reports the number of violations instead.
- Acc: The average accuracy over 5-fold cross validation.
- # Iters: The number of iterations.
- Time [s]: The average elapsed time, in seconds, for one run of 5-fold cross validation.
- Speed [s/iter]: The average elapsed time, in seconds, per iteration.
- L2norm: The L2 norm of the feature weights.
- Active: The number of active (non-zero) features.
- C: The regularization coefficient. For Classias and LIBLINEAR we tried 0.01, 0.02, 0.04, 0.1, 0.2, 0.4, 1.0, 2.0, 4.0, and 10.0, and chose the value that yielded the best accuracy; for LIBSVM we fixed C = 1.0.
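The truncated gradient (TG) rows are the ones where the Active column stays well below the full feature count: after each stochastic gradient step, small weights are pulled toward zero, and many land exactly on it. A minimal sketch of this idea, applied to L1-regularized logistic regression, follows; the function names and the hyperparameter values are ours for illustration and do not reproduce Classias's exact settings.

```python
import numpy as np

def truncate(w, alpha, theta):
    """Pull weights of magnitude <= theta toward zero by up to alpha.
    Weights that reach zero stay exactly zero, which is what keeps the
    active (non-zero) feature count small."""
    out = w.copy()
    small = np.abs(w) <= theta
    out[small] = np.sign(w[small]) * np.maximum(np.abs(w[small]) - alpha, 0.0)
    return out

def tg_logreg(X, y, c=0.02, eta=0.1, epochs=20, theta=np.inf):
    """SGD for logistic regression with truncated-gradient L1 regularization.
    y holds labels in {-1, +1}; c and eta are illustrative values."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in range(X.shape[0]):
            p = 1.0 / (1.0 + np.exp(-y[i] * (X[i] @ w)))  # P(correct label)
            w = w + eta * (1.0 - p) * y[i] * X[i]          # log-loss gradient step
            w = truncate(w, eta * c, theta)                # L1 truncation toward zero
    return w
```

Features that never receive a gradient push (or only a weak one) are driven to exactly zero by the truncation step, which is how the TG runs produce sparse models without an explicit feature-selection pass.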