Post category - Machine Learning

Abstract: Singular value decomposition. $A \in R^{m \times n}$ can be decomposed as $A = UDV^T$, where $U \in R^{m \times n}$, $D \in R^{n \times n}$, $V \in R^{n \times n}$; $D = \mathrm{diag}(\sigma_i)$, where the $\sigma_i$ are the singular values of $A$. The columns of $U$ are eigenvectors of $AA^T$; the columns of $V$ are eigenvectors of $A^TA$. Compute us… Read more
posted @ 2012-10-07 11:31 sidereal
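The decomposition above can be checked numerically. A minimal sketch with NumPy, using a hypothetical 4×3 matrix (not from the post) and the thin SVD, where the eigenvalues of $A^TA$ are the squared singular values:

```python
import numpy as np

# Hypothetical 4x3 matrix to illustrate the thin SVD A = U D V^T.
A = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [2., 0., 1.],
              [1., 1., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD: U is 4x3, Vt is 3x3
D = np.diag(s)                                    # singular values on the diagonal
A_rec = U @ D @ Vt                                # reconstruct A from the factors

# Eigenvalues of A^T A, sorted descending to match the order of s;
# they should equal the squared singular values sigma_i^2.
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
```

`full_matrices=False` requests the thin factorization matching the shapes stated in the abstract ($U \in R^{m \times n}$ rather than $R^{m \times m}$).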
Abstract: Stochastic gradient descent minimizes a cost function: $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$, while gradient ascent maximizes a likelihood function: $\theta_j := \theta_j + \alpha \frac{\partial}{\partial \theta_j} \ell(\theta)$. Read more
posted @ 2012-09-29 10:55 sidereal
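The descent update above can be sketched concretely. A minimal example (not from the post) applying the rule $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ to linear regression with the per-example squared-error cost $J(\theta) = \frac{1}{2}(x^T\theta - y)^2$, whose gradient is $(x^T\theta - y)\,x$:

```python
import numpy as np

def sgd_step(theta, x, y, alpha):
    """One SGD step: move theta against the gradient of the per-example cost."""
    error = x @ theta - y          # h_theta(x) - y
    return theta - alpha * error * x

# Toy single training example: bias term plus one feature (hypothetical values).
theta = np.zeros(2)
x, y = np.array([1.0, 2.0]), 3.0
for _ in range(200):
    theta = sgd_step(theta, x, y, alpha=0.1)
```

With this step size the residual shrinks by a constant factor each iteration, so `x @ theta` converges to `y`; flipping the sign of the update gives the gradient-ascent rule for a likelihood.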
Abstract: Bernoulli distribution: $y \in \{0,1\}$, $\phi = p(y=1)$, $p(y;\phi) = \phi^y (1-\phi)^{1-y}$. The mean of the Bernoulli is given by $\phi$. Read more
posted @ 2012-09-29 10:15 sidereal
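The pmf and its mean can be verified directly; a small sketch (the value $\phi = 0.3$ is an arbitrary assumption for illustration):

```python
def bernoulli_pmf(y, phi):
    """p(y; phi) = phi^y * (1 - phi)^(1 - y) for y in {0, 1}."""
    return phi**y * (1 - phi)**(1 - y)

phi = 0.3
# E[y] = 1 * p(1) + 0 * p(0), which should recover phi.
mean = 1 * bernoulli_pmf(1, phi) + 0 * bernoulli_pmf(0, phi)
```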
Abstract: $x, y \in R^n$, $x^T y = \sum_{i=1}^n x_i y_i \in R$; note how this expression generalizes. $X$ is an $m \times n$ matrix. The $j$-th diagonal element of $X^T X$ is $\sum_i X_{ij}^2$, so $\sum_i \sum_j X_{ij}^2 = \sum_j (X^T X)_{jj} = \operatorname{tr}(X^T X)$. $x^{(i)}$ is an $n \times 1$ vector; $\vec{y} \in R^m$; $X^T = [x^{(1)}\ x^{(2)}\ \cdots\ x^{(m)}]$, where $m$ is the number of training examples and $n$ … Read more
posted @ 2012-09-28 22:59 sidereal
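The trace identity $\sum_i \sum_j X_{ij}^2 = \operatorname{tr}(X^T X)$ from the abstract is easy to check numerically; a sketch on a hypothetical $3 \times 2$ matrix (values chosen arbitrarily):

```python
import numpy as np

# m = 3 training examples, n = 2 features (hypothetical data).
X = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])

sum_of_squares = (X**2).sum()   # sum_i sum_j X_ij^2
trace = np.trace(X.T @ X)       # sum_j (X^T X)_jj
```

Both quantities sum the same $mn$ squared entries, just grouped by column on the right-hand side.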