Introduction to Using Libsvm-2.6

Features of Libsvm-2.6
Supports multi-class classification
Different SVM formulations
Cross-validation for model selection
Probability estimates
Weighted SVM for unbalanced data
Both C++ and Java sources
Version 2.8 released on April Fools' Day, 2005

Program structure of Libsvm-2.6
Kernel class
Solver class: solves the quadratic programming problem using generalized SMO and the SVMLight algorithm
Uses one-against-one for multi-class classification

Format of training and testing data file
<label> <index1>:<value1> <index2>:<value2> ...
+1 1:0.708333 2:1 3:1 4:-0.320755 5:-0.105023 6:-1 7:1
-1 1:0.583333 2:-1 3:0.333333 4:-0.603774 5:1 6:-1 7:1
+1 1:0.166667 2:1 3:-1 4:-0.433962 5:-0.383562 6:-1 7:-1
-1 1:0.458333 2:1 3:1 4:-0.358491 5:-0.374429 6:-1 7:-1

Data scaling
Avoid letting attributes with greater numeric ranges dominate those with smaller numeric ranges. Usually scale each attribute to [0,1] or [-1,+1].
svmscale -l -1 -u 1 -s range train.1 > train.1.scale
svmscale -r range test.1 > test.1.scale

Svmtrain
One-class: a hyperplane is placed such that it separates the dataset from the origin with maximal margin. The regularization parameter nu, in (0,1), is a user-defined parameter indicating the fraction of the data that should be accepted by the description.
nu-SVR: nu support vector regression. It introduces a parameter nu that allows epsilon to be computed automatically. If q denotes the number of error samples, then nu >= q/l, so nu is an upper bound on the fraction of error samples among all l samples; if the number of support vectors
is p, then nu <= p/l, so nu is a lower bound on the fraction of support vectors among all samples. First choose the parameters nu and C, then solve the optimization problem.
Shrinking: whether to use shrinking during optimization. The alpha_i of bounded support vectors (BSVs, the SVs with alpha_i = C) generally do not change during the iterations; if these points are identified and fixed at C, the size of the QP problem can be reduced.
Probability estimate: whether to train the SVC or SVR model to produce probability outputs.
-wi: weight parameter for unbalanced samples.

Output of training C-SVM
optimization finished, #iter = 219
nu = 0.431030: nu-SVM is a somewhat equivalent form of C-SVM where C is replaced by nu.
obj = -100.877286: optimal objective value of the dual problem.
rho = 0.424632: bias term of the decision function.
nSV = 132, nBSV = 107: number of support vectors and number of bounded support vectors.
Total nSV = 132

Model file
svm_type c_svc
kernel_type rbf
gamma 0.0769231
nr_class 2: number of classes. For regression and one-class models, this number is 2.
total_sv 132
rho 0.424632
label 1 -1
nr_sv 64 68: number of support vectors for each class.
SV

Two tools for model selection
easy.py: does everything automatically, from data scaling to parameter selection.
grid.py: uses grid search to find the best model parameters.
Output files of grid.py:
-out: log of the search, listing each parameter setting and the accuracy obtained with it.
-png: contour plot of the search.

Proposed procedure
Transform the data into the format of Libsvm. Apply simple scaling to the data. Consider the RBF kernel. Use cross-validation to find the best model parameters. Use the best
parameters to train on the whole training set. Test.

Experiments
Original sets with default parameters: Accuracy = 9.7561%
Scaled sets with default parameters: Accuracy = 87.8049%
Scaled sets with parameter selection: Accuracy = 95.123%
Using an automatic script: Accuracy = 95.122%

Remark
Python 2.3 is recommended.
Gnuplot version 3.7.3 is recommended; version 3.7.1 has a bug.
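
Example sketch: data format and scaling
The data file format and the scaling step described earlier can be illustrated with a short Python sketch. This is not the svmscale implementation. The function names (parse_line, fit_ranges, scale_row) and the toy data are hypothetical, and the sketch assumes dense rows, meaning every attribute appears on every line. It shows the idea behind the two svmscale commands: the attribute ranges are learned from the training set only and then reused for the test set.

```python
def parse_line(line):
    """Parse one '<label> <index>:<value> ...' line into (label, {index: value})."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features


def fit_ranges(rows):
    """Collect the (min, max) of every attribute seen in the training rows.

    Assumption: rows are dense. Attributes omitted from a line (implicit
    zeros in a sparse file) are not taken into account here.
    """
    ranges = {}
    for _, features in rows:
        for index, value in features.items():
            lo, hi = ranges.get(index, (value, value))
            ranges[index] = (min(lo, value), max(hi, value))
    return ranges


def scale_row(features, ranges, lower=-1.0, upper=1.0):
    """Linearly map each attribute from its training range to [lower, upper]."""
    scaled = {}
    for index, value in features.items():
        lo, hi = ranges.get(index, (value, value))
        if hi == lo:
            continue  # constant or unseen attribute: nothing to scale, skip it
        scaled[index] = lower + (upper - lower) * (value - lo) / (hi - lo)
    return scaled


# Toy data (hypothetical): the second attribute has a much larger range.
train_lines = ["+1 1:10 2:200", "-1 1:20 2:400", "+1 1:30 2:300"]
rows = [parse_line(line) for line in train_lines]

# Learn the ranges on the training set (the role of 'svmscale -s range').
ranges = fit_ranges(rows)
scaled_train = [(label, scale_row(features, ranges)) for label, features in rows]

# Reuse the same ranges for the test set (the role of 'svmscale -r range').
test_label, test_features = parse_line("-1 1:40 2:250")
scaled_test = scale_row(test_features, ranges)

print(scaled_train)
print(scaled_test)
```

Because the test row is scaled with the training ranges, a test value outside the training range maps outside [-1, +1]. That is expected. Rescaling the test set with its own ranges would make the two sets inconsistent.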