Summary: More generally, consider a classification or regression problem where we would like to predict the value of some random variable y as a function of x. To derive a GLM for this problem, we will make the following three assumptions about the conditional distribution of y given x and about our model…
Summary: So far, we've seen a regression example and a classification example. In the regression example we had y|x; θ ∼ N(μ, σ²), and in the classification one, y|x; θ ∼ Bernoulli(φ), for some appropriate definitions of μ and φ as functions of x and θ (μ = θᵀx, φ = g(θᵀx)). 2. The exponential family…
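As a concrete illustration of the Bernoulli case, here is a minimal sketch (not from the original notes) of the relationship between φ and the natural parameter η = log(φ/(1−φ)): the logistic function g(η) = 1/(1 + e^(−η)) recovers φ from η.

```python
import math

def bernoulli_natural_param(phi):
    """Natural parameter eta = log(phi / (1 - phi)) of Bernoulli(phi)."""
    return math.log(phi / (1 - phi))

def sigmoid(eta):
    """Canonical response function g(eta) = 1 / (1 + e^(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

# Round-trip: phi -> eta -> phi (the 0.8 below is an arbitrary example value)
phi = 0.8
eta = bernoulli_natural_param(phi)
recovered = sigmoid(eta)
```

This round trip is why the logistic function arises so naturally once the Bernoulli distribution is written in exponential-family form.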
Summary: Let's now talk about a different algorithm for minimizing ℓ(θ): Newton's method. To get us started, let's consider Newton's method for finding a zero of a function. Specifically, suppose we have some function f : ℝ → ℝ, and we wish to find a value of θ so that f(θ) = 0. Here, θ ∈ ℝ is a real number…
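Newton's method for a zero of a scalar function iterates θ := θ − f(θ)/f′(θ). A minimal sketch, assuming f and its derivative are known; the example function and starting point below are illustrative, not from the notes:

```python
def newton(f, fprime, theta0, tol=1e-10, max_iter=50):
    """Find a zero of f by iterating theta := theta - f(theta)/f'(theta)."""
    theta = theta0
    for _ in range(max_iter):
        step = f(theta) / fprime(theta)
        theta -= step
        if abs(step) < tol:  # stop once the update is negligible
            break
    return theta

# Example: the zero of f(theta) = theta^2 - 2 is sqrt(2)
root = newton(lambda t: t * t - 2.0, lambda t: 2.0 * t, theta0=1.0)
```

The quadratic convergence near the root is what makes Newton's method attractive compared with plain gradient steps.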
Summary: Consider modifying the logistic regression method to "force" it to output values that are exactly either 0 or 1. To do so, it seems natural to change the definition of g to be the threshold function. If we then let h(x) = g(θᵀx) as before, but using this modified definition of g, …
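With the hard-threshold g, applying the update θ_j := θ_j + α(y − h(x))x_j gives the perceptron learning algorithm. A hedged sketch on a tiny, made-up linearly separable dataset (the data, learning rate, and epoch count are all illustrative):

```python
def threshold(z):
    """Hard-threshold g: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron_train(X, y, alpha=0.1, epochs=20):
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            h = threshold(sum(t * v for t, v in zip(theta, x_i)))
            # theta_j := theta_j + alpha * (y_i - h) * x_j
            theta = [t + alpha * (y_i - h) * v for t, v in zip(theta, x_i)]
    return theta

# Toy separable data; the first component of each x is the intercept term
X = [(1.0, 2.0), (1.0, 3.0), (1.0, -2.0), (1.0, -3.0)]
y = [1, 1, 0, 0]
theta = perceptron_train(X, y)
```

Because g is no longer smooth, this update cannot be derived as maximum likelihood the way logistic regression can; it is best viewed as its own algorithm.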
Summary: Classification: this is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem, in which y can take on only two values, 0 and 1. Here 0 is also called the negative class…
Summary: The leftmost figure shows the result of fitting y = θ₀ + θ₁x₁ to a dataset. We see that the data doesn't really lie on a straight line, and so the fit is not very good. This is called underfitting: with only one feature, the hypothesis is too simple to capture the data. So, we add an extra feature x₁², and fit y = …
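The effect of adding the x₁² feature can be sketched with NumPy's polynomial fitting on synthetic data (the data-generating curve, noise level, and seed below are invented for illustration): a degree-1 fit underfits a quadratic trend, while a degree-2 fit tracks it.

```python
import numpy as np

# Synthetic 1-D data with a clearly non-linear (quadratic) trend plus noise
rng = np.random.default_rng(0)
x1 = np.linspace(0.0, 4.0, 30)
y = 1.0 + 0.5 * x1 + 2.0 * x1**2 + rng.normal(scale=0.5, size=x1.size)

# Degree-1 fit (y = theta0 + theta1*x1) vs. degree-2 fit (adds the x1**2 feature)
linear = np.polynomial.polynomial.polyfit(x1, y, deg=1)
quadratic = np.polynomial.polynomial.polyfit(x1, y, deg=2)

def sse(coeffs):
    """Sum of squared residuals of a polynomial fit on the training data."""
    pred = np.polynomial.polynomial.polyval(x1, coeffs)
    return float(np.sum((y - pred) ** 2))
```

On data like this, `sse(quadratic)` comes out far below `sse(linear)`, which is the numerical face of the underfitting the figure shows.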
Summary: When faced with a regression problem, why might linear regression, and specifically why might the least-squares cost function J, be a reasonable choice? In this section, we will give a set of probabilistic assumptions under which least-squares regression is derived as a very natural algorithm…
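One way to see the connection: if y = θᵀx + ε with ε ∼ N(0, σ²), then the log-likelihood ℓ(θ) equals a constant minus J(θ)/σ², so maximizing likelihood is the same as minimizing least squares. A small numerical sketch of that identity (the data and the two candidate θ vectors are invented for illustration):

```python
import numpy as np

def J(theta, X, y):
    """Least-squares cost J(theta) = (1/2) * sum((y - X theta)^2)."""
    r = y - X @ theta
    return 0.5 * float(r @ r)

def log_likelihood(theta, X, y, sigma=1.0):
    """Gaussian log-likelihood: l(theta) = -m*log(sqrt(2*pi)*sigma) - J(theta)/sigma^2."""
    m = len(y)
    return -m * np.log(np.sqrt(2.0 * np.pi) * sigma) - J(theta, X, y) / sigma**2

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.uniform(0.0, 1.0, 50)])
true_theta = np.array([1.0, 2.0])
y = X @ true_theta + rng.normal(scale=0.1, size=50)

good, bad = true_theta, np.array([0.0, 0.0])
```

Whichever θ has the smaller J necessarily has the larger log-likelihood, since ℓ is a decreasing affine function of J.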
Summary: Gradient descent gives one way of minimizing J. Let's discuss a second way of doing so, this time performing the minimization explicitly and without resorting to an iterative algorithm. In this method, we will minimize J by explicitly taking its derivatives with respect to the θⱼ's and setting them to zero…
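Setting those derivatives to zero yields the normal equations, whose closed-form solution is θ = (XᵀX)⁻¹Xᵀy. A minimal sketch (solving the linear system with `np.linalg.solve` rather than forming an explicit inverse; the noiseless synthetic data is invented so the recovery is exact):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least-squares solution: solve (X^T X) theta = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(size=100)])  # intercept + one feature
y = X @ np.array([3.0, -1.5])  # noiseless targets, so theta is recovered exactly
theta = normal_equation(X, y)
```

Solving the system directly is both faster and numerically safer than computing (XᵀX)⁻¹ explicitly, though the one-shot solve scales poorly compared with gradient descent when the number of features is very large.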
Summary: Here, the x's are two-dimensional vectors in ℝ². For instance, x(i)_1 is the living area of the i-th house in the training set, and x(i)_2 is its number of bedrooms. To perform supervised learning, we must decide how we are going to represent the hypothesis h. As an initial choice, let's say we decide to approximate y as a l…
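The initial choice in the notes is a linear hypothesis, h(x) = θ₀ + θ₁x₁ + θ₂x₂. A tiny sketch of evaluating it (the θ values and the example house below are made up for illustration, not fitted to any data):

```python
def h(theta, x):
    """Linear hypothesis h_theta(x) = theta_0 + theta_1*x_1 + theta_2*x_2.

    x = (living_area, bedrooms); theta[0] is the intercept term."""
    return theta[0] + theta[1] * x[0] + theta[2] * x[1]

# Hypothetical parameters and one house: 2104 sq. ft., 3 bedrooms
theta = (50.0, 0.1, 20.0)
price = h(theta, (2104, 3))  # 50 + 0.1*2104 + 20*3 = 320.4
```

Introducing a constant feature x₀ = 1 lets this be written compactly as h(x) = θᵀx, which is the convention the rest of the notes use.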
Summary: Suppose we have a dataset giving the living areas and prices of 47 houses from Portland, Oregon. We can plot this data: …
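A plotting sketch in the spirit of "we can plot this data": since the 47-house table is not reproduced in the summary, the (area, price) pairs below are invented stand-ins, and matplotlib is assumed to be available.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical (living area in sq. ft., price in $1000s) pairs standing in
# for the Portland dataset, which is not reproduced here.
areas = [2104, 1600, 2400, 1416, 3000]
prices = [400, 330, 369, 232, 540]

fig, ax = plt.subplots()
ax.scatter(areas, prices)
ax.set_xlabel("Living area (square feet)")
ax.set_ylabel("Price (in $1000s)")
fig.savefig("housing_scatter.png")
```

With real data, a plot like this is what motivates trying a linear fit in the first place.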