YingyiJeniffer

LOGISTIC REGRESSION

In logistic regression we learn a family of functions $h$ from $\mathbb{R}^d$ to the interval $[0,1]$. However, logistic regression is used for classification tasks: we can interpret $h(\mathbf{x})$ as the probability that the label of $\mathbf{x}$ is $1$. The hypothesis class associated with logistic regression is the composition of a sigmoid function $\phi_{\mathrm{sig}} : \mathbb{R} \to [0,1]$ over the class of linear functions $L_d$. In particular, the sigmoid function used in logistic regression is the logistic function, defined as

$$\phi_{\mathrm{sig}}(z) = \frac{1}{1 + \exp(-z)}. \qquad (6)$$
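As a minimal sketch, the logistic function $1/(1+\exp(-z))$ can be evaluated in plain Python as follows. The function name `sigmoid` and the two-branch evaluation are illustrative choices, not part of the original text; the branch for negative inputs uses the algebraically equivalent form $\exp(z)/(1+\exp(z))$ to avoid overflow.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use exp(z) / (1 + exp(z)), which is the same value
    # but never passes a large positive argument to exp().
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.6f}")
```

The output stays in $[0,1]$ for every real input, which is what allows it to be read as a probability.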

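The hypothesis class is therefore (where, for simplicity, we use homogeneous linear functions):

$$H_{\mathrm{sig}} = \phi_{\mathrm{sig}} \circ L_d = \{\mathbf{x} \mapsto \phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle) : \mathbf{w} \in \mathbb{R}^d\}. \qquad (7)$$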

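Note that when $\langle \mathbf{w}, \mathbf{x} \rangle$ is very large, $\phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle)$ is close to $1$, whereas if $\langle \mathbf{w}, \mathbf{x} \rangle$ is very small (that is, large and negative), $\phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle)$ is close to $0$. Recall that the prediction of the halfspace corresponding to a vector $\mathbf{w}$ is $\mathrm{sign}(\langle \mathbf{w}, \mathbf{x} \rangle)$. Therefore, the predictions of the halfspace hypothesis and the logistic hypothesis are very similar whenever $|\langle \mathbf{w}, \mathbf{x} \rangle|$ is large. However, when $|\langle \mathbf{w}, \mathbf{x} \rangle|$ is close to $0$ we have that $\phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle) \approx \frac{1}{2}$. Intuitively, the logistic hypothesis is not sure about the value of the label, so it guesses that the label is $\mathrm{sign}(\langle \mathbf{w}, \mathbf{x} \rangle)$ with probability slightly larger than 50%. In contrast, the halfspace hypothesis always outputs a deterministic prediction of either $1$ or $-1$, even if $|\langle \mathbf{w}, \mathbf{x} \rangle|$ is very close to $0$.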
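The contrast between the two hypotheses can be sketched numerically. The weight vector, the sample points, and the helper names (`inner`, `halfspace_predict`, `logistic_predict`) below are hypothetical, chosen only for illustration; breaking ties at an inner product of exactly zero toward $+1$ is also an assumption.

```python
import math

def sigmoid(z):
    """Logistic function, written so exp() never sees a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def inner(w, x):
    """Inner product <w, x> of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def halfspace_predict(w, x):
    # Assumption: ties at <w, x> = 0 are broken toward +1.
    return 1 if inner(w, x) >= 0 else -1

def logistic_predict(w, x):
    # Interpreted as the probability that the label of x is +1.
    return sigmoid(inner(w, x))

w = [2.0, -1.0]  # hypothetical weight vector
points = [[5.0, -5.0], [0.1, 0.15], [-5.0, 5.0]]  # hypothetical inputs
for x in points:
    print(f"<w,x>={inner(w, x):+.2f}  "
          f"halfspace={halfspace_predict(w, x):+d}  "
          f"logistic={logistic_predict(w, x):.4f}")
```

For the middle point the inner product is close to zero, so the halfspace still commits to a label while the logistic output stays just above one half.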

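Next, we need to specify a loss function. That is, we should define how bad it is to predict some $h_{\mathbf{w}}(\mathbf{x}) \in [0,1]$ given that the true label is $y \in \{\pm 1\}$. Clearly, we would like $h_{\mathbf{w}}(\mathbf{x})$ to be large if $y = 1$, and $1 - h_{\mathbf{w}}(\mathbf{x})$ (i.e., the probability of predicting $-1$) to be large if $y = -1$. Note that

$$1 - h_{\mathbf{w}}(\mathbf{x}) = 1 - \frac{1}{1 + \exp(-\langle \mathbf{w}, \mathbf{x} \rangle)} = \frac{\exp(-\langle \mathbf{w}, \mathbf{x} \rangle)}{1 + \exp(-\langle \mathbf{w}, \mathbf{x} \rangle)} = \frac{1}{1 + \exp(\langle \mathbf{w}, \mathbf{x} \rangle)}. \qquad (8)$$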

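Therefore, any reasonable loss function would increase monotonically with $\frac{1}{1 + \exp(y \langle \mathbf{w}, \mathbf{x} \rangle)}$, or equivalently, would increase monotonically with $1 + \exp(-y \langle \mathbf{w}, \mathbf{x} \rangle)$. The logistic loss function used in logistic regression penalizes $h_{\mathbf{w}}$ based on the log of $1 + \exp(-y \langle \mathbf{w}, \mathbf{x} \rangle)$ (recall that log is a monotonic function). That is,

$$\ell(h_{\mathbf{w}}, (\mathbf{x}, y)) = \log\left(1 + \exp(-y \langle \mathbf{w}, \mathbf{x} \rangle)\right). \qquad (9)$$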
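A minimal sketch of the logistic loss $\log(1 + \exp(-y \langle \mathbf{w}, \mathbf{x} \rangle))$ for labels in $\{-1, +1\}$ follows. The function names are illustrative, and the two-branch evaluation with `math.log1p` is an implementation choice made here to avoid overflow, not something the text prescribes.

```python
import math

def inner(w, x):
    """Inner product <w, x> of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def logistic_loss(w, x, y):
    """log(1 + exp(-y * <w, x>)) for a label y in {-1, +1}."""
    margin = y * inner(w, x)
    if margin >= 0:
        # exp(-margin) <= 1 here, so this is safe to evaluate directly.
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite log(1 + exp(-m)) as -m + log(1 + exp(m))
    # so that exp() only receives a negative argument.
    return -margin + math.log1p(math.exp(margin))

w = [2.0, -1.0]   # hypothetical weight vector
x = [1.0, 0.5]    # hypothetical input, <w, x> = 1.5
print(logistic_loss(w, x, +1))  # small loss: label agrees with sign(<w, x>)
print(logistic_loss(w, x, -1))  # larger loss: label disagrees
```

The loss is never exactly zero; it only approaches zero as $y \langle \mathbf{w}, \mathbf{x} \rangle$ grows.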

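Therefore, given a training set $S = (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m)$, the ERM problem associated with logistic regression is

$$\underset{\mathbf{w} \in \mathbb{R}^d}{\operatorname{argmin}} \; \frac{1}{m} \sum_{i=1}^{m} \log\left(1 + \exp(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle)\right). \qquad (10)$$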
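The objective being minimized, the average of $\log(1 + \exp(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle))$ over the training set, can be sketched as below. The tiny training set `S` and the name `empirical_risk` are made up for illustration.

```python
import math

def inner(w, x):
    """Inner product <w, x> of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def logistic_loss(w, x, y):
    """log(1 + exp(-y * <w, x>)), evaluated without overflow."""
    margin = y * inner(w, x)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def empirical_risk(w, S):
    """Average logistic loss of w over S, a list of (x, y) pairs."""
    return sum(logistic_loss(w, x, y) for x, y in S) / len(S)

# Hypothetical training set with labels in {-1, +1}.
S = [([1.0, 2.0], 1), ([2.0, 1.5], 1), ([-1.0, -1.0], -1), ([-2.0, -0.5], -1)]

print(empirical_risk([0.0, 0.0], S))  # equals log(2): every margin is zero
print(empirical_risk([1.0, 1.0], S))  # lower: this w separates the sample
```

At $\mathbf{w} = \mathbf{0}$ every example contributes $\log 2$, which gives a convenient baseline when checking an implementation.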

The advantage of the logistic loss function is that it is a convex function with respect to $\mathbf{w}$; hence the ERM problem can be solved efficiently using standard methods. We will study how to learn with convex functions, and in particular specify a simple algorithm for minimizing convex functions, in later chapters.
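To illustrate what "standard methods" might look like, here is a rough sketch that minimizes the average logistic loss with plain gradient descent. This is one common choice, not necessarily the algorithm the later chapters present. The step size, iteration count, zero initialization, and training set are all assumptions made for the example. The gradient used is $\frac{1}{m} \sum_i -y_i \, \phi_{\mathrm{sig}}(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle) \, \mathbf{x}_i$.

```python
import math

def sigmoid(z):
    """Logistic function, written so exp() never sees a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def inner(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def empirical_risk(w, S):
    """Average of log(1 + exp(-y * <w, x>)) over S."""
    total = 0.0
    for x, y in S:
        m = y * inner(w, x)
        total += math.log1p(math.exp(-m)) if m >= 0 else -m + math.log1p(math.exp(m))
    return total / len(S)

def gradient(w, S):
    """Gradient of the empirical risk with respect to w."""
    g = [0.0] * len(w)
    for x, y in S:
        # d/dw log(1 + exp(-y<w,x>)) = -y * sigmoid(-y<w,x>) * x
        coeff = -y * sigmoid(-y * inner(w, x))
        for j, xj in enumerate(x):
            g[j] += coeff * xj
    return [gj / len(S) for gj in g]

def gradient_descent(S, dim, step=0.5, iters=200):
    """Fixed-step gradient descent from w = 0 (step and iters are assumptions)."""
    w = [0.0] * dim
    for _ in range(iters):
        g = gradient(w, S)
        w = [wj - step * gj for wj, gj in zip(w, g)]
    return w

# Hypothetical, linearly separable training set with labels in {-1, +1}.
S = [([1.0, 2.0], 1), ([2.0, 1.5], 1), ([-1.0, -1.0], -1), ([-2.0, -0.5], -1)]
w = gradient_descent(S, dim=2)
print("w =", w)
print("risk at 0:", empirical_risk([0.0, 0.0], S))
print("risk at w:", empirical_risk(w, S))
```

Because the sample here is separable, the risk can be pushed arbitrarily close to zero by scaling $\mathbf{w}$ up, so the iterates keep growing; in practice a stopping rule or regularization would be added.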


posted on 2016-08-27 17:45 by YingyiJeniffer