
Incremental complexity support vector machine

September 18, 2009

One of the problems with using complex kernels with support vector machines is that they tend to produce classification boundaries that are odd, like the ones below.

[Figure: SVM decision boundary with a Gaussian RBF kernel, \sigma = 1]

[Figure: SVM decision boundary with a Gaussian RBF kernel, \sigma = 10]

(I generated them using a Java SVM applet from here, whose reliability I cannot swear to but have no reason to doubt.) Both SVM boundaries use Gaussian RBF kernels: the first with \sigma = 1 and the second with \sigma = 10, on two different data sets.

Note the segments of the boundary to the east of the blue examples in the bottom figure, and those to the south and to the north-east of the blue examples in the top figure. They seem to violate intuition.

The reason for these anomalous boundaries is, of course, the large complexity of the function class induced by the RBF kernel with large \sigma, which gives the classifier a propensity to make subtle distinctions even in regions of somewhat low example density.
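(For anyone who wants to reproduce the effect without the applet, here is a rough scikit-learn sketch that trains RBF-kernel SVMs at two different kernel widths on a toy data set and plots the resulting boundaries. Note that scikit-learn parameterizes the RBF kernel by \gamma, which corresponds to 1/(2\sigma^2), so how its values line up with the applet's \sigma is a guess on my part; the data set and parameter values below are purely illustrative.)

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))                      # a small toy 2-D data set
y = (X[:, 0] + X[:, 1] > 0).astype(int)

xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, gamma in zip(axes, [0.1, 10.0]):          # wide vs. narrow RBF kernel
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    zz = clf.decision_function(grid).reshape(xx.shape)
    ax.contour(xx, yy, zz, levels=[0])            # the decision boundary
    ax.scatter(X[:, 0], X[:, 1], c=y)
    ax.set_title(f"RBF SVM, gamma = {gamma}")
plt.show()
```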

A possible solution: using complex kernels only where they are needed

We propose to build a cascaded classifier, which we will call Incremental Complexity SVM (ICSVM), as follows.

We are given a sequence of kernels K_1, K_2, \ldots, K_m of increasing complexity; for example, a sequence of polynomial kernels where K_i is the polynomial kernel of degree i.

The learning algorithm first learns an SVM classifier \psi_1 with kernel K_1 that classifies a reasonable portion of the examples with a large margin \lambda_1. This can be accomplished by setting the SVM cost parameter C to a low value.

Now all the examples outside the margin are thrown out, and another SVM classifier \psi_2 with kernel K_2 is learned, so that a reasonable portion of the remaining examples are classified with some large margin \lambda_2.

This procedure is continued until all the examples are classified outside the margin or the set of kernels is exhausted. The final classifier is a combination of all the classifiers \psi_i.
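For concreteness, here is a minimal sketch of this training loop in Python using scikit-learn's SVC. The function name, the fixed margin threshold \lambda_i = 1, and the rule of keeping only the examples inside the margin band for the next stage are my own illustrative assumptions; the proposal above does not pin these details down.

```python
import numpy as np
from sklearn.svm import SVC

def fit_icsvm(X, y, kernel_params, C=0.1):
    """Train a cascade of SVMs with kernels of increasing complexity.

    X, y are numpy arrays; `kernel_params` is a list of SVC keyword-argument
    dicts ordered from the simplest kernel to the most complex, e.g.
    [{"kernel": "poly", "degree": 1}, {"kernel": "poly", "degree": 2}, ...].
    Returns a list of (classifier, margin_threshold) stages.
    """
    stages = []
    X_rem, y_rem = X, y
    for params in kernel_params:
        if len(np.unique(y_rem)) < 2:
            break                          # nothing ambiguous left to train on
        clf = SVC(C=C, **params)           # a low C keeps the margin wide and soft
        clf.fit(X_rem, y_rem)
        lam = 1.0                          # lambda_i: the canonical SVM functional margin (an assumption)
        stages.append((clf, lam))
        scores = clf.decision_function(X_rem)
        inside = np.abs(scores) < lam      # examples still inside the margin band
        if not inside.any():
            break                          # everything is classified outside the margin
        X_rem, y_rem = X_rem[inside], y_rem[inside]
    return stages
```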

A test example can be classified as follows. We first apply classifier \psi_1 to the test example, and if it is classified with margin \geq \lambda_1, we output the assigned label and stop. If not we classify it with classifier \psi_2 in a similar fashion, and so on…
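And a matching sketch of the test-time rule, reusing the stages returned by fit_icsvm above. Falling back to the last stage's prediction when no stage is confident is one possible tie-break, not something specified above.

```python
import numpy as np

def predict_icsvm(stages, x):
    """Classify a single example x with the cascade returned by fit_icsvm."""
    x = np.asarray(x).reshape(1, -1)
    for clf, lam in stages:
        score = clf.decision_function(x)[0]
        if abs(score) >= lam:              # classified with margin >= lambda_i: stop here
            return clf.classes_[int(score > 0)]
    # No stage was confident; fall back to the last (most complex) stage.
    return stages[-1][0].predict(x)[0]
```

With the polynomial-kernel example above, one would call something like fit_icsvm(X, y, [{"kernel": "poly", "degree": d} for d in range(1, 4)]) and then predict_icsvm(stages, x_test).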

Such a scheme will avoid anomalous boundaries such as those in the pictures above.

Discussion

1. With all the work that has been done on SVMs, it is very likely that this idea, or something very similar, has been thought of before, but I haven’t come across it.

2. There is some work on kernel learning where a convex combination of kernels is learned, but I think that is a different idea.

3. One nice thing about such a classification scheme is that at run-time it will expend fewer computational resources on easier examples and more on more difficult ones. As my thesis supervisor used to say, it is silly for most classifiers to insist on acting exactly the same way on both easy and hard cases.

4. The choice of the cost parameters C for the SVMs is critical for the accuracy of the final classifier. Is there a way of formulating the choice of the parameters in terms of minimizing some overall upper bound on the generalization error from statistical learning theory?

5. Is there a one-shot SVM formulation with the set of kernels that exactly or approximately acts like our classifier?

6. The weird island-effect and what Ken calls the lava-lamp problem in the boundaries above are not just artifacts of SVMs. We would expect a sparse kernel logistic regression to behave similarly. It would be interesting to do a similar incremental kernel thing with other kernel-based classifiers.
