CS229: Support Vector Machine
To handle a classification problem with a non-linear decision boundary, we can transform the data into new features and then apply logistic regression to find a "linear" boundary in the higher-dimensional space, but choosing those new features by hand is painful.
The SVM algorithm addresses this by providing a systematic way to map the original features to higher-dimensional ones and then separate them linearly, much as logistic regression does.
It is worth noting that SVM is something of a turn-key algorithm, with few parameters to tune.
Margins
In logistic regression, we predict "1" or "0" based on whether the hypothesis \(h_{\theta}(x)=g(\theta^{T} x)\) is greater than 0.5. It is also reasonable to think that the closer \(h_{\theta}(x)\) is to 1, the higher our degree of confidence that the label is "1".
Higher confidence thus means a more reliable model, and maximizing the total "confidence" amounts to finding a good fit to the training data.
For a point far from the decision boundary, we can make the prediction with more confidence. So our goal is to make the points lie as far from the boundary as possible.
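As a quick numerical illustration (not part of the original notes), the sketch below shows how the sigmoid output approaches 1 as \(\theta^{T} x\) moves away from the boundary \(\theta^{T} x = 0\); the sample values of \(z\) are chosen arbitrarily:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# The further theta^T x is from zero (i.e., the further x lies from the
# decision boundary theta^T x = 0), the closer the output gets to 0 or 1.
for z in [0.0, 1.0, 3.0, 10.0]:
    print(f"theta^T x = {z:5.1f}  ->  g(theta^T x) = {sigmoid(z):.4f}")
# prints roughly 0.5000, 0.7311, 0.9526, 1.0000: larger distance, higher confidence
```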
Notation
We use "-1" and "1" to denote the class labels and write the classifier as \(h_{w, b}(x)=g\left(w^{T} x+b\right)\), where \(g(z)=1\) if \(z \geq 0\) and \(g(z)=-1\) otherwise. Here \(b\) takes the role of what was previously \(\theta_{0}\), and \(w\) takes the role of \(\left[\theta_{1} \ldots \theta_{n}\right]^{T}\).
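A minimal sketch of this classifier in NumPy (the weights and inputs below are hypothetical, chosen only to show the sign convention):

```python
import numpy as np

def h(w, b, x):
    """Classifier h_{w,b}(x) = g(w^T x + b), with labels in {-1, +1}."""
    return 1 if np.dot(w, x) + b >= 0 else -1

w = np.array([2.0, -1.0])   # stands in for [theta_1 ... theta_n]^T
b = 0.5                     # stands in for theta_0
print(h(w, b, np.array([1.0, 1.0])))    # 2 - 1 + 0.5 = 1.5 >= 0, prints 1
print(h(w, b, np.array([-1.0, 1.0])))   # -2 - 1 + 0.5 = -2.5 < 0, prints -1
```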
Functional and geometric margins
Given a training example \((x^{(i)}, y^{(i)})\), we define the functional margin of \((w, b)\) with respect to the training example as \(\hat{\gamma}^{(i)}=y^{(i)}\left(w^{T} x^{(i)}+b\right)\).
- \(\hat{\gamma}^{(i)} > 0\) indicates that the prediction is right.
- The larger \(\hat{\gamma}^{(i)}\) is, the more confident the prediction. In other words, the smallest functional margin over the training set reflects how well the model fits the data.
So, given a training set \(S=\left\{\left(x^{(i)}, y^{(i)}\right) ; i=1, \ldots, m\right\}\), its functional margin is defined as the smallest of the functional margins of the individual training examples: \(\hat{\gamma}=\min _{i=1, \ldots, m} \hat{\gamma}^{(i)}\).
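The sketch below computes these quantities directly from the definitions above; the toy data and parameter values are made up for illustration:

```python
import numpy as np

def functional_margins(w, b, X, y):
    """Per-example functional margins gamma_hat^(i) = y^(i) * (w^T x^(i) + b)."""
    return y * (X @ w + b)

# Hypothetical training set: rows of X are x^(i), labels y^(i) in {-1, +1}.
X = np.array([[2.0, 1.0], [0.5, -1.0], [-1.0, -2.0]])
y = np.array([1, -1, -1])
w, b = np.array([1.0, 1.0]), -1.0

margins = functional_margins(w, b, X, y)
print(margins)        # [2.  1.5 4. ] -- all positive, so every prediction is correct
print(margins.min())  # 1.5 -- the functional margin of (w, b) w.r.t. the whole set
```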
