ML: Recommender Systems - Collaborative Filtering

Source: Machine Learning by Andrew Ng, Stanford University (Coursera)


Recommender Systems - Collaborative Filtering / Low Rank Matrix Factorization

notation:

$n_u$: number of users

$n_p$: number of products

$y^{(i,j)}$: the degree of preference of the $j$-th user for the $i$-th product      $Y = [y^{(i,j)}]_{n_p \times n_u}$

$r^{(i,j)}$: 1 if the $j$-th user has given a preference for the $i$-th product, 0 otherwise      $R = [r^{(i,j)}]_{n_p \times n_u}$

$n$: number of attributes of one product

$x^{(i)}$: attributes of the $i$-th product (to be learned), $x^{(i)} \in \mathbb{R}^n$

$$ X = \begin{bmatrix} - & (x^{(1)})^T & - \\ - & (x^{(2)})^T & - \\ & \vdots & \\ - & (x^{(n_p)})^T & - \end{bmatrix}_{n_p \times n} $$

$\theta^{(j)}$: parameters for the $j$-th user, $\theta^{(j)} \in \mathbb{R}^n$, with the $k$-th entry denoting the weight on the $k$-th attribute, i.e. $y_{predict}^{(i,j)} = (\theta^{(j)})^T x^{(i)}$

$$ \Theta = \begin{bmatrix} - & (\theta^{(1)})^T & - \\ - & (\theta^{(2)})^T & - \\ & \vdots & \\ - & (\theta^{(n_u)})^T & - \end{bmatrix}_{n_u \times n} $$
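With the notation above, all predictions can be computed at once; a minimal sketch (`Y_pred` is an illustrative name, not from the course):

% entry (i,j) of Y_pred is (theta^(j))^T x^(i)
% X is n_p x n and Theta is n_u x n, so Y_pred is n_p x n_u, matching Y
Y_pred = X * Theta';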

problem motivation:

Given the users' preferences $Y$ and the products' attributes $X$, the users' parameters $\Theta$ can be learned by (regularized) linear regression; one user at a time, this is an ordinary regression problem (see the sketch below). Similarly, given $Y$ and the users' parameters $\Theta$, the products' attributes $X$ can be learned.
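For a single user $j$ with $X$ fixed, this subproblem even has a closed form, since the per-user slice of the cost below is a ridge regression; a minimal sketch, assuming `X`, `Y`, `R`, `lambda`, and a user index `j` are in scope (`idx`, `A`, `b`, `theta_j` are illustrative names):

n = size(X, 2);
idx = find(R(:, j));                              % products user j has rated
A = X(idx, :);                                    % their attribute rows
b = Y(idx, j);                                    % user j's preferences for them
theta_j = (A' * A + lambda * eye(n)) \ (A' * b);  % normal equations of the regularized fit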

However, in most cases $X$ and $\Theta$ are both unknown. Collaborative filtering deals with this by learning them jointly: find a pair $X$, $\Theta$ that minimizes a single cost function $J$.

algorithm process:

1. mean normalization for original preference data $Y$

2. random initialization for products' attributes $X$ and users' parameters $\Theta$

3. use gradient descent to learn $X$ and $\Theta$ to minimize the cost function $J$

4. use the learned $X$ and $\Theta$ to predict preferences for unrated products (i.e. $r^{(i,j)}=0$), or to find similar products or users (an end-to-end sketch follows this list)
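Putting the four steps together, a minimal batch-gradient-descent sketch; `alpha` and the iteration count are arbitrary illustrative choices, `n_p`, `n_u`, `n`, `lambda`, and `R` are assumed given, and `Y_norm` / `mu` come from the mean-normalization sketch further below:

X = 0.1 * randn(n_p, n);               % step 2: small random initialization
Theta = 0.1 * randn(n_u, n);
alpha = 0.005;                         % learning rate (not specified in the notes)
for iter = 1:500                       % step 3: gradient descent on J
  E = (X * Theta' - Y_norm) .* R;      % errors on rated entries only
  grad_X = E * Theta + lambda * X;
  grad_Theta = E' * X + lambda * Theta;
  X = X - alpha * grad_X;
  Theta = Theta - alpha * grad_Theta;
end
Y_pred = X * Theta' + mu;              % step 4: add the per-product means back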

cost function and gradients:

$$ J(x^{(1)},x^{(2)},\cdots,x^{(n_p)},\theta^{(1)},\theta^{(2)},\cdots,\theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j):r^{(i,j)}=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})^2 + \frac{\lambda}{2} \sum_{i=1}^{n_p} \sum_{k=1}^{n} (x_k^{(i)})^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} (\theta_k^{(j)})^2 $$

% X: n_p x n, Theta: n_u x n, Y and R: n_p x n_u; the mask '.* R' keeps only rated entries
J = sum(sum( ( (X * Theta' - Y) .* R ) .^ 2 )) / 2 + lambda / 2 * sum(sum(X .^ 2)) + lambda / 2 * sum(sum(Theta .^ 2));

$$ \frac{\partial }{\partial x_k^{(i)}}J = \sum_{j: r^{(i,j)}=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})\theta^{(j)}_k + \lambda x^{(i)}_k $$

% n_p x n matrix; row i collects the partial derivatives w.r.t. x^(i) (explicit parentheses added for readability)
grad_X = ((X * Theta' - Y) .* R) * Theta + lambda * X;

$$ \frac{\partial }{\partial \theta_k^{(j)}}J = \sum_{i: r^{(i,j)}=1}((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})x^{(i)}_k + \lambda \theta^{(j)}_k $$

% n_u x n matrix; row j collects the partial derivatives w.r.t. theta^(j)
grad_Theta = ((X * Theta' - Y) .* R)' * X + lambda * Theta;
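A quick way to validate these vectorized one-liners is to compare them with finite differences of $J$ on a tiny random problem (a self-contained sketch; all sizes and names are arbitrary):

n_p = 4; n_u = 5; n = 3; lambda = 1.5; h = 1e-4;
X = randn(n_p, n); Theta = randn(n_u, n);
Y = randn(n_p, n_u); R = double(rand(n_p, n_u) > 0.5);
% J as a function of X alone, with Theta held fixed
J = @(Xv) sum(sum(((Xv * Theta' - Y) .* R) .^ 2)) / 2 ...
    + lambda / 2 * sum(sum(Xv .^ 2)) + lambda / 2 * sum(sum(Theta .^ 2));
grad_X = ((X * Theta' - Y) .* R) * Theta + lambda * X;  % analytic gradient
num_grad = zeros(size(X));
for k = 1:numel(X)                       % central difference, one entry at a time
  Xp = X; Xp(k) = Xp(k) + h;
  Xm = X; Xm(k) = Xm(k) - h;
  num_grad(k) = (J(Xp) - J(Xm)) / (2 * h);
end
disp(max(abs(num_grad(:) - grad_X(:))))  % should be on the order of 1e-9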

mean normalization:

If a user has given no preference to any product, the first term of the cost function $J$ plays no role for that user, because no $r^{(i,j)}=1$ appears in its sum. Minimizing the remaining regularization term then drives $\theta^{(j)}$ to all zeros, so every prediction for this user is zero.

To avoid this, mean-normalize $Y$: subtract $\mu_i$ from $y^{(i,j)}$ wherever $r^{(i,j)}=1$, where $\mu_i = \frac{1}{\sum_j r^{(i,j)}} \sum_{j:r^{(i,j)}=1} y^{(i,j)}$ is the average preference among the users who rated product $i$.

The prediction becomes $y_{predict}^{(i,j)} = (\theta^{(j)})^T x^{(i)} + \mu_i$. A user with no preferences (hence $\theta^{(j)}$ all zeros) is then predicted to give each product its average preference $\mu_i$.
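A minimal sketch of the normalization and the adjusted prediction (`mu`, `Y_norm`, `Y_pred` are my own names; the implicit broadcasting of the column vector `mu` requires Octave >= 3.6 or MATLAB >= R2016b):

mu = sum(Y .* R, 2) ./ max(sum(R, 2), 1);  % mean over raters only; max(., 1) avoids 0/0
Y_norm = (Y - mu) .* R;                    % subtract mu_i only where r(i,j) = 1
Y_pred = X * Theta' + mu;                  % add means back: an all-zero theta predicts mu_i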

finding similar products or users:

To evaluate how similar two products or users are, compare their learned attributes or parameters by computing $\left\| x^{(a)} - x^{(b)}\right\|^2$ or $\left\| \theta^{(a)} - \theta^{(b)}\right\|^2$; the smaller the distance, the more similar the pair.
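For example, ranking every product by its squared distance to a chosen product `a` (a sketch; `a`, `d`, `order`, `top5` are illustrative names, and the row-wise broadcasting has the same version requirements as above):

a = 1;                            % some product of interest
d = sum((X - X(a, :)) .^ 2, 2);   % ||x^(i) - x^(a)||^2 for every product i
[~, order] = sort(d);             % ascending: most similar first
top5 = order(2:6);                % skip order(1), which is product a itself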

posted @ 2022-07-16 19:46  Maaaaax