Notes - Coursera MachineLearning by Andrew NG - Week1

Notes - Coursera MachineLearning by Andrew NG - Week1

Week1-2014/03/07-hphp
欢迎赐教、讨论、转载，转载请注明原文地址~

Machine Learning Introduction

GeneralSpeaking - lecture 1,2

Many Application

Amazaon, Netfilx recommend system
Arthur Samuel, made a machine learn how to check. checkers , made a program play chess with itself , and know better of how to win .

Popular

and well .. currently large demands of talents. as one of the top 12 computer skill

Different types of learning algorithms

famous partial methods : supervised learning , unsupervised learning,

Course Main Goal

how to develop the best machine learning systems to get better performance.

Supervise learning - lecture 3

how to pick a model ? straight line or polynomial ?

Regression: Predict continous valued output

Classification problem ,

tumor size Vs malignant

Tumor size , age , Vs malignant or benign ,

could use more features to predict ( or regression ) : uniformity of cell shape , cell size ......

Statistically
compromized : 妥协的

Unsupervised Learning - lecture 4

clustering problem

google news , with one news , several diff urls are laid.

astronomical data analysis

Cocktail party problem

seperate voice source

use octave , could solve the problem quickly and briefly

Linear Regression with one variable

Model representation - lecture 5

Training set : m : number of training examples , x : input , y : output variable ,
y = h(x) , h : hypothesis
How do we represent H?

htheta(x) = theta0 + theta1(x)

univariant -- linear regression (a fancy name)

Cost Function - lecture 6

htheta(x) = theta0 + theta1(x)

how to choose two theta s

choose thetas so that h(x) is close to given examples.

                                       m
minimize =   1/2m Sum ( h(xi) - yi ) 2
theta0, theta 1       1

squared error function -- the most common coss function in regression .

Cost function intuition I - lecture 7

get better intuition what cost function is doing , and why we want to use it.

recap : focus , say briefly

simplified : theta 0 = 0
h(x) = theta1 * x
J(theta1) = 1/2m * Sum[i:1-m](theta1xi - yi) 2
when theta1 = 1 , J ( theta1 ) = 0
theta1 = 0.5 , J ( theta1 ) = 0.5 , J ( 0 ) = 14/6

Cost function intuition II - lecture 8

basic situation

contour plots : outline
theta0, theta1 != (0, x) or (x, 0) , with the cost function act as a 3D bowl , below

using : contour plots ( or contour figures ) .

? using such data , and such model , we could see that there a circle of "similar" point pairs of ( theta0, theta1) ,

on which they act the same, so , can we tell the difference of different pairs ?

Gradient descent algorithm - lecture 9
it is used all over machine learning
could minimizing arbitrary functions besides cost function
Basic Thoughts

- surface ->
EG: start at some point on the surface,
VS
==> start with diff starts , end in diff ends.[it is a property of Gradient descent algorithm ]
detailed description
alpha : learning rate [ if alpha is large , aggressive ]
calculus and derivative
keep in mind : simultaneously, at the same time
Gradient descent intuition - lecture 10
partial derivative vs derivative , different in mathematics , seems same in MachineLearning
tangent , slope of line
converge and diverge : using different alpha

when we get theta1 to a local minimum location , it will stop changing.
gradient descent algorithm could converge to a local minimum
if alpha is fixed , smaller and smaller steps , and no need to decrease alpha

Gradient descent for Linear Regression - lecture 11

use the basic algorithm to get minimum theta0, theta1.
cost function of linear function will always be a bowl-shaped function ==> convex function
above algorithm also called "Batch" Gradient Descent Algorithm
"Batch" : each step use all of the training data.
What's next - lecture 12
Extensions of the Gradient descent / Linear Regression ideas
how to choose alpha ,
how act using more features ?
harder to plot features.
Use Linear Algebra

matrix/ vector addition , substraction, multiplication
matrix inverse, matrics transpose
Matrices and Vectors
Dimension of matrix ： row * column , A_ij = matrix A i^th row , j^th colum
vector : only one column matrix , y_i = the i^th element ; as below 1-index as usual inference

Additions and Substractions
simple addition and substraction
scalar Multiplication
combination of oprands
Multiplication of matrix and vector

Details
mn ---> n 1 ====> m * 1
the i^th row multiply with corresponding column , then add whole up
Linear Regression
-- >
prediction = DataMatrix * Parameters
dfdsfds , the matrix computation will get more computation effecient than the simply loop in previous way.
Matrix - Matrix multiplication
An Example
Details

as the A multiply with B's each vector, and these new vectors combine to a new matrix C

the house affair

a good way to see hypothesis with different theta0, theta1s
great program languages ( top 10 popular language [ ?? How do they tell ] ) ---- have lots of linear algebra libraries
Matrix multiplication properties.
commutative property
ab = ba
does not match in matrix-multiplication ==> A * B != B * A
associative property
a * b * c = a * ( b * c )
A * B * C = A * ( B * C )
proven
Identity Matrix
Denoted I ( I _nn )
For any matrix A , A I = I * A = A
单位矩阵
Matrix inverse and transpose
Inverse
start with Identity : 1
3 * ( 1/ 3 ) or 3 * 3 -1 = 1
0 -1 ---> undefined
SO if A is a mm matrix and , and have an inverse A ^-1 , then -- > A A^-1 = I _(m*m)

How to compute inverse ?
How to tell if a matrix have an inverse ??
such that like [0 0 , 0 0 ] have no inverse
but in machine learning algorithms , we will get to deal with this situation later .
matrix transpose
eg:
terminology : Transpose
intuition

来自为知笔记(Wiz)

posted @ 2014-03-07 21:54 hphp 阅读(1030) 评论(0) 收藏举报

刷新页面返回顶部