Notes on 李宏毅's Machine Learning, Lecture 2: Classification

1. Classification

classification:   x->function->class n

how to do classification?

training data for classification:

  $(x^1, \hat{y}^1)\ (x^2, \hat{y}^2)\ (x^3, \hat{y}^3)\ (x^4, \hat{y}^4)$

ideal alternatives:

*function (model):

  $x \rightarrow g(x)$:   $g(x) > 0$ ——> class 1

                          $g(x) < 0$ ——> class 2

*loss function

      $L(f) = \sum_n \delta\big(f(x^n) \neq \hat{y}^n\big)$      the number of times $f$ gets an incorrect result on the training data

*find the best function

 this loss is not differentiable, so it cannot be minimized with gradient descent; example methods that can handle it: perceptron, SVM
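A minimal sketch of evaluating the 0-1 loss above, assuming a made-up 1-D threshold classifier and toy data:

```python
import numpy as np

# 0-1 loss L(f): count how many training examples f labels incorrectly.
def zero_one_loss(f, X, y):
    predictions = np.array([f(x) for x in X])
    return int(np.sum(predictions != y))

# hypothetical 1-D classifier: class 1 if g(x) = x - 0.5 > 0, else class 2
f = lambda x: 1 if x - 0.5 > 0 else 2

X = np.array([0.1, 0.7, 0.9, 0.3])   # made-up training inputs
y = np.array([2, 1, 1, 1])           # made-up ground-truth classes
print(zero_one_loss(f, X, y))        # -> 1 (only x = 0.3 is misclassified)
```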

2. Gaussian distribution

*Gaussian distribution function      $f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$

input: vector $x$     output: the probability density of sampling $x$

the shape of the function is determined by the mean vector $\mu$ and the covariance matrix $\Sigma$
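A direct NumPy evaluation of the density formula above, as a sketch (the mean, covariance, and test point are placeholders):

```python
import numpy as np

# evaluate the D-dimensional Gaussian density f_{mu,Sigma}(x) defined above
def gaussian_density(x, mu, sigma):
    d = len(x)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(sigma) ** 0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

mu = np.array([0.0, 0.0])      # placeholder mean vector
sigma = np.eye(2)              # placeholder covariance (identity)
print(gaussian_density(mu, mu, sigma))   # density at the mean: 1/(2*pi) ~ 0.159
```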

*maximum likelihood

a Gaussian with any mean $\mu$ and covariance matrix $\Sigma$ could generate these points, but with different likelihood

the likelihood of a Gaussian with mean $\mu$ and covariance matrix $\Sigma$ = the probability of that Gaussian sampling $x^1, x^2, x^3, \ldots, x^n$

likelihood function (to be maximized)       $L(\mu,\Sigma) = f_{\mu,\Sigma}(x^1)\, f_{\mu,\Sigma}(x^2)\, f_{\mu,\Sigma}(x^3) \cdots f_{\mu,\Sigma}(x^n)$

find the best parameters    $\mu^*, \Sigma^* = \arg\max_{\mu,\Sigma} L(\mu,\Sigma)$     $\mu^* = \frac{1}{n}\sum_i x^i$     $\Sigma^* = \frac{1}{n}\sum_i (x^i - \mu^*)(x^i - \mu^*)^T$
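A minimal sketch of these closed-form maximum-likelihood estimates, assuming a made-up data matrix `X` (one row per sample):

```python
import numpy as np

# made-up samples drawn around a known mean, for illustration
X = np.random.randn(100, 2) + np.array([1.0, -2.0])

mu_star = X.mean(axis=0)              # mu* = (1/n) sum_i x^i
diff = X - mu_star
sigma_star = diff.T @ diff / len(X)   # Sigma* = (1/n) sum_i (x^i - mu*)(x^i - mu*)^T

print(mu_star)      # close to [1, -2]
print(sigma_star)   # close to the identity matrix
```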

*classification with the Gaussian distribution

Bayes' rule     $P(c_1|x) = \frac{P(x|c_1)P(c_1)}{P(x|c_1)P(c_1) + P(x|c_2)P(c_2)}$

 $P(x|c_1) = f_{\mu^1,\Sigma^1}(x)$             $P(x|c_2) = f_{\mu^2,\Sigma^2}(x)$
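A sketch of computing the posterior $P(c_1|x)$ from the two class-conditional Gaussians via the Bayes' rule above, using SciPy's multivariate normal density; every parameter value here is a made-up placeholder:

```python
import numpy as np
from scipy.stats import multivariate_normal

def posterior_c1(x, mu1, sigma1, mu2, sigma2, p_c1, p_c2):
    l1 = multivariate_normal.pdf(x, mu1, sigma1) * p_c1   # P(x|c1) P(c1)
    l2 = multivariate_normal.pdf(x, mu2, sigma2) * p_c2   # P(x|c2) P(c2)
    return l1 / (l1 + l2)

mu1, mu2 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
sigma = np.eye(2)
x = np.array([0.5, 0.8])
print(posterior_c1(x, mu1, sigma, mu2, sigma, 0.5, 0.5))  # > 0.5 -> class 1
```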

*modifying the model

use different means $\mu^1, \mu^2$ but share a single covariance matrix $\Sigma^1 = \Sigma^2 = \Sigma$, to reduce the number of parameters: the parameter count of $\Sigma$ grows with the square of the dimension of $x$

shared estimate           $\Sigma = \frac{m}{m+n}\Sigma^1 + \frac{n}{m+n}\Sigma^2$      ($m$, $n$ = number of training examples in class 1 and class 2); sharing $\Sigma$ makes the decision boundary linear
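A sketch of the weighted-average shared covariance above, assuming made-up per-class data matrices `X1` and `X2`:

```python
import numpy as np

X1 = np.random.randn(80, 2) + np.array([1.0, 1.0])    # made-up class-1 samples
X2 = np.random.randn(20, 2) + np.array([-1.0, -1.0])  # made-up class-2 samples

def ml_cov(X):
    # per-class maximum-likelihood covariance estimate
    diff = X - X.mean(axis=0)
    return diff.T @ diff / len(X)

m, n = len(X1), len(X2)
sigma_shared = m / (m + n) * ml_cov(X1) + n / (m + n) * ml_cov(X2)
print(sigma_shared)
```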

*model flaw

the Naive Bayes classifier assumes all dimensions of $x$ are independent, so $P(x|c)$ becomes a product of 1-D Gaussians; this independence assumption may not hold for real data
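A sketch of that factorization: $P(x|c)$ as a product of independent 1-D Gaussians, one per dimension (the per-dimension parameters are made up):

```python
import numpy as np
from scipy.stats import norm

mu = np.array([1.0, -2.0, 0.5])    # placeholder per-dimension means for one class
std = np.array([1.0, 0.5, 2.0])    # placeholder per-dimension standard deviations

def naive_likelihood(x):
    # naive Bayes: product over dimensions of 1-D Gaussian densities
    return np.prod(norm.pdf(x, loc=mu, scale=std))

print(naive_likelihood(np.array([1.0, -2.0, 0.5])))   # density at the mean
```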

*posterior probability:

$P(c_1|x) = \frac{P(x|c_1)P(c_1)}{P(x|c_1)P(c_1)+P(x|c_2)P(c_2)} = \frac{1}{1+\frac{P(x|c_2)P(c_2)}{P(x|c_1)P(c_1)}} = \frac{1}{1+\exp(-z)} = \sigma(z)$, the sigmoid function

$z = \ln\frac{P(x|c_1)P(c_1)}{P(x|c_2)P(c_2)}$

*mathematical derivation

with a shared $\Sigma$, the quadratic terms cancel and $z$ is linear in $x$:   $z = w \cdot x + b$,   where $w^T = (\mu^1 - \mu^2)^T \Sigma^{-1}$ and $b = -\frac{1}{2}(\mu^1)^T \Sigma^{-1} \mu^1 + \frac{1}{2}(\mu^2)^T \Sigma^{-1} \mu^2 + \ln\frac{N_1}{N_2}$
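A sketch of this closed form: computing $w$ and $b$ directly from the class means, the shared covariance, and the class counts (all values below are placeholders):

```python
import numpy as np

mu1, mu2 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
sigma = np.eye(2)          # shared covariance
n1, n2 = 80, 20            # class example counts N1, N2

sigma_inv = np.linalg.inv(sigma)
w = sigma_inv @ (mu1 - mu2)          # w^T = (mu1 - mu2)^T Sigma^{-1}
b = (-0.5 * mu1 @ sigma_inv @ mu1
     + 0.5 * mu2 @ sigma_inv @ mu2
     + np.log(n1 / n2))

z = w @ np.array([0.5, 0.8]) + b
print(1 / (1 + np.exp(-z)))          # P(c1|x) = sigma(z)
```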

3. Logistic Regression

$P_{w,b}(c_1|x) = \sigma(z)$    $z = \ln\frac{P(x|c_1)P(c_1)}{P(x|c_2)P(c_2)} = w \cdot x + b$    $\sigma(z) = \frac{1}{1+\exp(-z)}$

*step 1  function set:     $f_{w,b}(x) = P_{w,b}(c_1|x)$

*step 2 loss function of Logistic Regression

training data    $x$:      $x^1\ \ x^2\ \ x^3\ \ x^4 \ldots x^n$

                   $\hat{y}$:    $c_1\ \ c_2\ \ c_1\ \ c_1 \ldots c_2$       ——>      $1\ \ 0\ \ 1\ \ 1 \ldots 0$

assume the training data is generated by $f_{w,b}(x) = P_{w,b}(c_1|x)$

$L(w,b) = f_{w,b}(x^1)\,(1-f_{w,b}(x^2))\,f_{w,b}(x^3)\,f_{w,b}(x^4) \cdots (1-f_{w,b}(x^n))$

maximizing the likelihood is the same as minimizing its negative log:    $w^*, b^* = \arg\max_{w,b} L(w,b) = \arg\min_{w,b}\big(-\ln L(w,b)\big)$

$-\ln L(w,b) = -\ln f_{w,b}(x^1) - \ln(1-f_{w,b}(x^2)) - \ln f_{w,b}(x^3) - \ln f_{w,b}(x^4) - \cdots - \ln(1-f_{w,b}(x^n))$

              $= \sum_n -\big(\hat{y}^n \ln f_{w,b}(x^n) + (1-\hat{y}^n)\ln(1-f_{w,b}(x^n))\big)$     the cross entropy between two Bernoulli distributions
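A minimal sketch of this cross-entropy loss for logistic regression; the data matrix, labels, and parameters are made-up placeholders:

```python
import numpy as np

def cross_entropy(w, b, X, y_hat):
    f = 1 / (1 + np.exp(-(X @ w + b)))   # f_{w,b}(x^n) for every n
    return -np.sum(y_hat * np.log(f) + (1 - y_hat) * np.log(1 - f))

X = np.array([[0.2, 1.1], [1.5, -0.3], [-0.7, 0.8]])
y_hat = np.array([1, 0, 1])                       # 1 = class 1, 0 = class 2
print(cross_entropy(np.zeros(2), 0.0, X, y_hat))  # = 3 ln 2 for w = 0, b = 0
```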

*step 3 find the best function

$\partial \ln f_{w,b}(x^n) / \partial w_i = \big(1-\sigma(z)\big)\,x_i^n$

$\partial \ln\big(1-f_{w,b}(x^n)\big) / \partial w_i = -\sigma(z)\,x_i^n$

$\partial \big(-\ln L(w,b)\big) / \partial w_i = \sum_n -\big(\hat{y}^n - f_{w,b}(x^n)\big)\,x_i^n$
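A sketch of gradient descent using the gradient above; the data, learning rate, and iteration count are made-up placeholders:

```python
import numpy as np

X = np.array([[0.2, 1.1], [1.5, -0.3], [-0.7, 0.8]])   # made-up inputs
y_hat = np.array([1.0, 0.0, 1.0])                       # made-up labels
w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(1000):
    f = 1 / (1 + np.exp(-(X @ w + b)))   # f_{w,b}(x^n)
    error = y_hat - f                    # (y^n - f_{w,b}(x^n))
    # descent step on -ln L: w_i <- w_i + lr * sum_n (y^n - f(x^n)) x_i^n
    w += lr * X.T @ error
    b += lr * error.sum()

print(w, b)
```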

4. Multi-class classification

*softmax

c1: $w^1, b^1$     $z_1 = w^1 \cdot x + b^1$       ——> $y_1 = e^{z_1}/\sum_j e^{z_j}$

c2: $w^2, b^2$     $z_2 = w^2 \cdot x + b^2$       ——> $y_2 = e^{z_2}/\sum_j e^{z_j}$

c3: $w^3, b^3$     $z_3 = w^3 \cdot x + b^3$       ——> $y_3 = e^{z_3}/\sum_j e^{z_j}$

softmax     $z_i$ ——> $y_i = e^{z_i}/\sum_j e^{z_j}$

the outputs of softmax are probabilities:   $0 < y_i < 1$,   $\sum_i y_i = 1$
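A minimal sketch of softmax as defined above (subtracting the max before exponentiating is a standard numerical-stability trick, not part of the lecture notes):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

y = softmax(np.array([3.0, 1.0, -3.0]))
print(y, y.sum())   # each y_i in (0, 1), and they sum to 1
```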

     ——> $z_1$ ——> softmax ——> $y_1$      loss function                              $\hat{y} = [1\ 0\ 0]^T$ if class 1

x   ——> $z_2$ ——> softmax ——> $y_2$     <————>                                       $\hat{y} = [0\ 1\ 0]^T$ if class 2

     ——> $z_3$ ——> softmax ——> $y_3$      $-\sum_i \hat{y}_i \ln y_i$ (cross entropy)      $\hat{y} = [0\ 0\ 1]^T$ if class 3

*a single logistic regression unit can be used to transform features

*cascading logistic regression models

x1 ——> $z_1$ ——> sigmoid ——> $x_1'$

                                                        ——> $z_3$ ——> sigmoid ——> y

x2 ——> $z_2$ ——> sigmoid ——> $x_2'$

           feature transformation          (cascaded units form a neural network)                 classification
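A minimal sketch of this cascade: two logistic units transform $(x_1, x_2)$ into $(x_1', x_2')$, and a third unit classifies; all weights below are made-up placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cascade(x, w1, b1, w2, b2, w3, b3):
    x1p = sigmoid(x @ w1 + b1)        # first logistic unit  -> x1'
    x2p = sigmoid(x @ w2 + b2)        # second logistic unit -> x2'
    hidden = np.array([x1p, x2p])     # transformed feature
    return sigmoid(hidden @ w3 + b3)  # third unit -> P(c1|x)

x = np.array([1.0, 0.0])
w = np.array([2.0, -1.0])
print(cascade(x, w, 0.0, -w, 0.0, np.array([1.0, 1.0]), -1.0))
```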
