September 2012 Archives
Abstract: Stochastic gradient descent minimizes a cost function: $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j}J(\theta)$, while gradient ascent maximizes a likelihood function: $\theta_j := \theta_j + \alpha \frac{\partial}{\partial \theta_j}\ell(\theta)$.
Read more
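A minimal NumPy sketch of the two update rules, one descent step on a squared-error cost and one ascent step on a logistic log-likelihood; the function names and toy data here are my own illustration, not from the post:

```python
import numpy as np

def sgd_step_linear(theta, x_i, y_i, alpha):
    """One stochastic gradient DESCENT step for linear regression:
    theta_j := theta_j - alpha * d/dtheta_j J(theta), using a single example."""
    error = x_i @ theta - y_i           # h_theta(x) - y
    return theta - alpha * error * x_i  # subtract the gradient of the squared-error cost

def ascent_step_logistic(theta, x_i, y_i, alpha):
    """One stochastic gradient ASCENT step for logistic regression:
    theta_j := theta_j + alpha * d/dtheta_j l(theta), maximizing the log-likelihood."""
    h = 1.0 / (1.0 + np.exp(-(x_i @ theta)))  # sigmoid hypothesis
    return theta + alpha * (y_i - h) * x_i    # add the gradient of the log-likelihood

# Toy usage on a single training example
theta = np.zeros(3)
x_i, y_i = np.array([1.0, 2.0, -1.0]), 1.0
theta = sgd_step_linear(theta, x_i, y_i, alpha=0.01)
theta = ascent_step_logistic(theta, x_i, y_i, alpha=0.01)
```

The sign of the update is the only structural difference: descent moves against the gradient of $J(\theta)$, ascent moves along the gradient of $\ell(\theta)$.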
Abstract: Bernoulli distribution: $y \in \{0,1\}$, $\phi = p(y=1)$, $p(y;\phi) = \phi^y (1-\phi)^{1-y}$; the mean of the Bernoulli is given by $\phi$.
Read more
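A short sketch checking the pmf and the mean numerically (the helper name and the value of $\phi$ are assumptions for illustration):

```python
import numpy as np

def bernoulli_pmf(y, phi):
    """p(y; phi) = phi^y * (1 - phi)^(1 - y) for y in {0, 1}."""
    return phi ** y * (1 - phi) ** (1 - y)

phi = 0.3
# The mean E[y] = 0 * p(0) + 1 * p(1) equals phi
mean = 0 * bernoulli_pmf(0, phi) + 1 * bernoulli_pmf(1, phi)
print(mean)  # 0.3

# Empirical check with random draws
samples = np.random.rand(100_000) < phi
print(samples.mean())  # approximately 0.3
```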
Abstract: $x, y \in \mathbb{R}^n$, $x^T y = \sum_{i=1}^n x_i y_i \in \mathbb{R}$; note how this expression generalizes. $X$ is an $m \times n$ matrix; the $j$-th diagonal element of $X^T X$ is $\sum_i X_{ij}^2$, so $\sum_i \sum_j X_{ij}^2 = \sum_j (X^T X)_{jj} = \operatorname{tr}(X^T X)$. $x^{(i)}$ is an $n \times 1$ vector, $\vec{y} \in \mathbb{R}^m$, and $X^T = [x^{(1)}\ x^{(2)}\ \cdots\ x^{(m)}]$, where $m$ is the number of training examples and $n$ ...
Read more
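A quick NumPy check of the trace identity above, on a random design matrix (sizes chosen arbitrarily for the example):

```python
import numpy as np

# X is an m x n design matrix whose rows are the training examples x^(i)
m, n = 5, 3
X = np.random.randn(m, n)

# (X^T X)_{jj} = sum_i X_{ij}^2, so summing the diagonal gives the sum of all squared entries
lhs = np.sum(X ** 2)          # sum_i sum_j X_{ij}^2
rhs = np.trace(X.T @ X)       # tr(X^T X)
print(np.allclose(lhs, rhs))  # True
```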
