$f: R^{m \times n} \rightarrow R$, $A \in R^{m \times n}$. The gradient of $f$ with respect to $A$ is the matrix $\nabla_A f(A) \in R^{m \times n}$ whose $(i,j)$ entry is $\frac{\partial f(A)}{\partial A_{ij}}$. For example, $\nabla_x b^T x = b$.
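A quick numerical check of the last identity (my own NumPy sketch, not from the post; all variable names are mine): a central finite difference recovers $\nabla_x b^T x = b$.

```python
import numpy as np

# Finite-difference check of grad_x (b^T x) = b.
rng = np.random.default_rng(0)
n = 5
b = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda v: b @ v  # f(x) = b^T x
eps = 1e-6
grad = np.array([
    (f(x + eps * np.eye(n)[i]) - f(x - eps * np.eye(n)[i])) / (2 * eps)
    for i in range(n)  # np.eye(n)[i] is the i-th standard basis vector
])
print(np.allclose(grad, b))  # True: the gradient is b
```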
posted @ 2012-10-01 16:21 sidereal
Stochastic gradient descent minimizes a cost function $J(\theta)$ with the update $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j}J(\theta)$, while gradient ascent maximizes a likelihood function $\ell(\theta)$ with the update $\theta_j := \theta_j + \alpha \frac{\partial}{\partial \theta_j}\ell(\theta)$; the only difference is the sign of the step.
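A concrete sketch (mine, assuming the usual least-squares cost $J(\theta) = \frac{1}{2}(\theta^T x - y)^2$ per example, which is not spelled out in the post): the descent update becomes $\theta := \theta - \alpha(\theta^T x - y)\,x$.

```python
import numpy as np

# SGD on J(theta) = 1/2 (theta^T x - y)^2 per training example;
# dJ/dtheta = (theta^T x - y) x, so each step is
# theta := theta - alpha * (theta^T x - y) * x.
rng = np.random.default_rng(1)
true_theta = np.array([2.0, -3.0])
X = rng.standard_normal((200, 2))
y = X @ true_theta + 0.01 * rng.standard_normal(200)

theta = np.zeros(2)
alpha = 0.05
for _ in range(5):                 # a few passes over the training set
    for x_i, y_i in zip(X, y):
        theta -= alpha * (theta @ x_i - y_i) * x_i
print(theta)  # approaches [2, -3]
```

Flipping the minus to a plus turns the same loop into gradient ascent on $\ell(\theta)$.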
posted @ 2012-09-29 10:55 sidereal
Bernoulli distribution: $y \in \{0,1\}$, $\phi = p(y=1)$, so $p(y;\phi) = \phi^y (1-\phi)^{1-y}$. The mean of the Bernoulli is given by $\phi$.
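A short check (my sketch) that this pmf sums to one and has mean $\phi$:

```python
# Bernoulli pmf p(y; phi) = phi^y * (1 - phi)^(1 - y) for y in {0, 1}.
phi = 0.3
pmf = lambda y: phi**y * (1 - phi) ** (1 - y)
print(sum(pmf(y) for y in (0, 1)))      # 1.0: probabilities sum to one
print(sum(y * pmf(y) for y in (0, 1)))  # 0.3: the mean E[y] equals phi
```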
posted @ 2012-09-29 10:15 sidereal
$x, y \in R^n$, $x^T y = \sum_{i=1}^n x_i y_i \in R$; note how this identity extends to matrices. Let $X$ be an $m \times n$ matrix. The $j$-th diagonal element of $X^T X$ is $\sum_i X_{ij}^2$, so $\sum_i \sum_j X_{ij}^2 = \sum_j (X^T X)_{jj} = \operatorname{tr} X^T X$. Each $x^{(i)}$ is an $n \times 1$ vector, $\vec{y} \in R^m$, and $X^T = [x^{(1)}\; x^{(2)}\; \cdots\; x^{(m)}]$, where $m$ is the number of training examples and $n$ the number of features.
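A quick NumPy check (mine, on an arbitrary random matrix) of $\sum_i \sum_j X_{ij}^2 = \operatorname{tr} X^T X$:

```python
import numpy as np

# Verify sum_i sum_j X_ij^2 = sum_j (X^T X)_jj = tr(X^T X).
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))  # m = 4 training examples, n = 3 features
lhs = (X**2).sum()               # elementwise sum of squares
rhs = np.trace(X.T @ X)          # the j-th diagonal entry is sum_i X_ij^2
print(np.isclose(lhs, rhs))     # True
```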
posted @ 2012-09-28 22:59 sidereal