Post category - Machine Learning

Just for one paper
Constructing GLMs
Abstract: 1. Guide More generally, consider a classification or regression problem where we would like to predict the value of some random variable y as a function of x. To derive a GLM for this problem, we will make the following three assumptions about the conditional distribution of y given x and about our mo... Read more

posted @ 2013-04-14 12:29 BigPalm Views(248) Comments(0) Recommended(0)
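The GLM construction sketched above can be illustrated with a small, hypothetical Python sketch (names are mine, not from the post): for a Bernoulli-distributed y, the canonical response function maps the natural parameter η = θ^T x to the logistic hypothesis.

```python
import numpy as np

# Minimal sketch: for a Bernoulli GLM, the canonical response function
# maps the natural parameter eta = theta^T x to
# h(x) = E[y|x] = 1 / (1 + e^{-eta}), the logistic regression hypothesis.
def h(theta, x):
    eta = np.dot(theta, x)             # natural parameter eta
    return 1.0 / (1.0 + np.exp(-eta))  # Bernoulli mean E[y|x]

theta = np.array([0.0, 0.0])
x = np.array([1.0, 2.0])
print(h(theta, x))  # eta = 0 gives probability 0.5
```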

Generalized Linear Models
Abstract: 1. Guide So far, we've seen a regression example and a classification example. In the regression example we had y|x; θ ∼ N(μ, σ^2), and in the classification one y|x; θ ∼ Bernoulli(φ), for some appropriate definitions of μ and φ as functions of x and θ (μ = θ^T x, φ = g(θ^T x)). 2. The exponential family ... Read more

posted @ 2013-04-14 10:25 BigPalm Views(213) Comments(0) Recommended(0)
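As a quick numeric check of the exponential-family idea, one can verify that the Bernoulli pmf φ^y (1−φ)^(1−y) equals the exponential-family form b(y) exp(ηy − a(η)) with η = log(φ/(1−φ)) and a(η) = −log(1−φ). A small sketch (my own illustration, assuming this standard parameterization):

```python
import numpy as np

def bernoulli_pmf(y, phi):
    # standard Bernoulli pmf: phi^y * (1 - phi)^(1 - y)
    return phi ** y * (1 - phi) ** (1 - y)

def bernoulli_exp_family(y, phi):
    # exponential-family form: b(y) * exp(eta * y - a(eta)), with b(y) = 1
    eta = np.log(phi / (1 - phi))   # natural parameter
    a = -np.log(1 - phi)            # log-partition function a(eta)
    return np.exp(eta * y - a)

phi = 0.3
vals = [(bernoulli_pmf(y, phi), bernoulli_exp_family(y, phi)) for y in (0, 1)]
print(vals)  # the two forms agree for y = 0 and y = 1
```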

Newton's method
Abstract: 1. Guide Let's now talk about a different algorithm for minimizing ℓ(θ). 2. Newton's method To get us started, let's consider Newton's method for finding a zero of a function. Specifically, suppose we have some function f : R → R, and we wish to find a value of θ so that f(θ) = 0. Here, θ ∈ R is a real ... Read more

posted @ 2013-04-13 12:14 BigPalm Views(289) Comments(0) Recommended(0)
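The root-finding step described above can be sketched in a few lines; this is a minimal illustration of the update θ := θ − f(θ)/f′(θ), not code from the post.

```python
def newton(f, fprime, theta, iters=20):
    # Newton's update for a zero of f: theta := theta - f(theta) / f'(theta)
    for _ in range(iters):
        theta = theta - f(theta) / fprime(theta)
    return theta

# e.g. find the positive root of f(theta) = theta^2 - 2, i.e. sqrt(2)
root = newton(lambda t: t * t - 2.0, lambda t: 2.0 * t, 1.0)
print(root)  # converges quadratically to sqrt(2) ≈ 1.41421356...
```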

The perceptron learning algorithm
Abstract: 1. Guide Consider modifying the logistic regression method to "force" it to output values that are exactly either 0 or 1. To do so, it seems natural to change the definition of g to be the threshold function: If we then let h(x) = g(θ^T x) as before, but using this modified definition of g, ... Read more

posted @ 2013-04-13 11:46 BigPalm Views(290) Comments(0) Recommended(0)
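A minimal sketch of the perceptron described above, assuming the usual update rule θ := θ + α(y − h(x))x with the threshold function g (toy data and names are my own):

```python
import numpy as np

def g(z):
    # threshold function replacing the sigmoid: outputs exactly 0 or 1
    return 1.0 if z >= 0 else 0.0

def perceptron_train(X, y, alpha=1.0, epochs=10):
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            hi = g(np.dot(theta, xi))
            theta += alpha * (yi - hi) * xi   # same form as the LMS update
    return theta

# toy linearly separable data; first column is the intercept feature
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([0., 0., 1., 1.])
theta = perceptron_train(X, y)
preds = [g(np.dot(theta, xi)) for xi in X]
print(preds)  # all four training points classified correctly
```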

Classification and logistic regression
Abstract: 1. Guide Classification: this is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem, in which y can take on only two values, 0 and 1. 0 is also called the negative c... Read more

posted @ 2013-04-13 11:32 BigPalm Views(388) Comments(0) Recommended(0)
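A minimal logistic regression sketch for the binary problem above, assuming the standard sigmoid hypothesis and batch gradient ascent on the log-likelihood, with gradient X^T(y − h) (toy data and names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_fit(X, y, alpha=0.1, iters=1000):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta += alpha * X.T @ (y - h)   # gradient ascent on log-likelihood
    return theta

# toy binary data; first column is the intercept feature
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([0., 0., 1., 1.])
theta = logistic_fit(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
print(preds)  # all four training points classified correctly
```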

Locally weighted linear regression
Abstract: 1. Guide The leftmost figure shows the result of fitting y = θ0 + θ1 x1 to a dataset. We see that the data doesn't really lie on a straight line, and so the fit is not very good. This is called underfitting: there is only one feature, which is too few. So, we add an extra feature x1^2 and fit y = ... Read more

posted @ 2013-04-13 10:39 BigPalm Views(543) Comments(0) Recommended(0)
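Locally weighted linear regression can be sketched as weighted least squares, assuming the common Gaussian weight w(i) = exp(−(x(i) − x)^2 / (2τ^2)) and solving the weighted normal equations at each query point (the toy data and bandwidth τ below are my own illustration):

```python
import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    # Gaussian weights: points near the query dominate the fit
    w = np.exp(-((X[:, 1] - x_query) ** 2) / (2 * tau ** 2))
    W = np.diag(w)
    # weighted normal equations: theta = (X^T W X)^{-1} X^T W y
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return np.array([1.0, x_query]) @ theta

# data from a curve (y = x^2), which a single global line underfits
X = np.array([[1.0, x] for x in np.linspace(0.0, 4.0, 9)])
y = X[:, 1] ** 2
pred = lwr_predict(2.0, X, y)
print(pred)  # local linear fit tracks the curve near x = 2 (true value 4)
```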

Probabilistic interpretation
Abstract: 1. Guide When faced with a regression problem, why might linear regression, and specifically why might the least-squares cost function J, be a reasonable choice? In this section, we will give a set of probabilistic assumptions under which least-squares regression is derived as a very natural algorith... Read more

posted @ 2013-04-13 09:44 BigPalm Views(193) Comments(0) Recommended(0)
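The key step in that derivation can be checked numerically: under y = θ^T x + ε with ε ∼ N(0, σ^2), the negative log-likelihood equals J(θ)/σ^2 plus a constant that does not depend on θ, so maximizing likelihood is the same as minimizing J. A small sketch under those assumptions (data is randomly generated for illustration):

```python
import numpy as np

def J(theta, X, y):
    # least-squares cost: (1/2) * sum of squared residuals
    return 0.5 * np.sum((X @ theta - y) ** 2)

def neg_log_lik(theta, X, y, sigma=1.5):
    # Gaussian-noise negative log-likelihood: constant + J(theta) / sigma^2
    m = len(y)
    return (m / 2) * np.log(2 * np.pi * sigma ** 2) + J(theta, X, y) / sigma ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = rng.normal(size=20)
t1, t2 = np.array([1.0, -2.0]), np.array([0.5, 3.0])
# the theta-dependent part of the NLL is exactly J scaled by 1/sigma^2
diff = neg_log_lik(t1, X, y) - neg_log_lik(t2, X, y)
print(diff, (J(t1, X, y) - J(t2, X, y)) / 1.5 ** 2)
```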

The normal equations
Abstract: 1. Guide Gradient descent gives one way of minimizing J. Let's discuss a second way of doing so, this time performing the minimization explicitly and without resorting to an iterative algorithm. In this method, we will minimize J by explicitly taking its derivatives with respect to the θj's and setti... Read more

posted @ 2013-04-12 11:06 BigPalm Views(490) Comments(0) Recommended(0)
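Setting the derivatives to zero yields the closed-form solution θ = (X^T X)^{-1} X^T y; a minimal sketch (toy data is my own, and `solve` is used rather than an explicit inverse for numerical stability):

```python
import numpy as np

def normal_equations(X, y):
    # closed-form minimizer of J: theta = (X^T X)^{-1} X^T y,
    # computed by solving (X^T X) theta = X^T y
    return np.linalg.solve(X.T @ X, X.T @ y)

# toy data generated exactly by y = 1 + 2x (intercept in the first column)
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([1., 3., 5., 7.])
theta = normal_equations(X, y)
print(theta)  # recovers [1, 2] exactly, with no iteration
```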

Linear Regression
Abstract: 1. Guide Here, the x's are two-dimensional vectors in R^2. For instance, x_1^(i) is the living area of the i-th house in the training set, and x_2^(i) is its number of bedrooms. To perform supervised learning, we must decide how to choose h. As an initial choice, we decide to approximate y as a l... Read more

posted @ 2013-04-12 09:46 BigPalm Views(203) Comments(0) Recommended(0)
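The linear hypothesis and batch gradient descent on J can be sketched as follows, assuming the standard LMS update θ := θ − α X^T(Xθ − y); for simplicity this toy example uses a single feature plus an intercept rather than the two housing features:

```python
import numpy as np

def lms_gradient_descent(X, y, alpha=0.05, iters=2000):
    # batch gradient descent on J(theta) = (1/2) ||X theta - y||^2
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * X.T @ (X @ theta - y)   # batch LMS update
    return theta

# toy data generated exactly by y = 0.5 + 1.0 * x (intercept first)
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([0.5, 1.5, 2.5, 3.5])
theta = lms_gradient_descent(X, y)
print(theta)  # converges to [0.5, 1.0]
```

The step size α must be small enough for convergence; here 0.05 is well below the stability limit for this tiny design matrix.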

Supervised learning
Abstract: 1. Guide Suppose we have a dataset giving the living areas and prices of 47 houses from Portland, Oregon: We can plot this data: ... Read more

posted @ 2013-04-12 09:36 BigPalm Views(180) Comments(0) Recommended(0)
