Information gain and entropy

Suppose X can take one of m values: V1, V2, …, Vm

P(X=V1) = p1, …, P(X=Vm) = pm

What’s the smallest possible number of bits, on average, per symbol, needed to transmit a stream of symbols drawn from X’s distribution?

It’s

H(X) = -p1 log2 p1 - p2 log2 p2 - … - pm log2 pm

This quantity, H(X), is called the entropy of X.

“High Entropy” means X is from a uniform (boring) distribution: a histogram of its values would be flat.
“Low Entropy” means X is from a varied (peaks and valleys) distribution: a histogram would have many lows and one or two highs, so the values are more predictable.
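As a quick sanity check on the formula above, here is a minimal Python sketch (the function name entropy_bits is my own) computing H(X) for a uniform and a peaked distribution:

import math

def entropy_bits(probs):
    # H(X) = -sum(p * log2(p)) over the nonzero probabilities, in bits per symbol.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform ("boring") distribution over 4 values: maximum entropy, 2 bits per symbol.
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))   # 2.0
# Peaked distribution: low entropy, roughly 0.24 bits per symbol.
print(entropy_bits([0.97, 0.01, 0.01, 0.01]))   # ~0.24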

Specific Conditional Entropy H(Y|X=v)

The entropy of Y computed using only those records (samples) in which X has value v

Conditional Entropy H(Y|X)

The average of the specific conditional entropies of Y, weighted by the probability of each value of X:

H(Y|X) = Σj P(X=vj) · H(Y|X=vj)
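A short Python sketch of both definitions (the names label_entropy and conditional_entropy are my own): H(Y|X=v) is the entropy of the labels in the group of records where X = v, and H(Y|X) averages these group entropies weighted by group size.

import math
from collections import Counter, defaultdict

def label_entropy(labels):
    # Entropy (in bits) of the empirical distribution of a list of labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(xs, ys):
    # H(Y|X): split the records by their X value, compute the specific
    # conditional entropy H(Y|X=v) of each group, and average the results
    # weighted by P(X=v), i.e. by the relative size of each group.
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    n = len(xs)
    return sum((len(g) / n) * label_entropy(g) for g in groups.values())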

 

Information Gain

IG(Y|X): suppose I must transmit Y. How many bits per symbol would it save me, on average, if both ends of the line already knew X?

IG(Y|X) = H(Y) - H(Y|X)

Example:
• H(Y) = 1
• H(Y|X) = 0.5
• Thus IG(Y|X) = 1 – 0.5 = 0.5
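Using the sketch above, here is a toy dataset (made up for illustration, chosen so that it reproduces exactly these numbers):

# Hypothetical records: X = college major, Y = some yes/no attribute,
# arranged so that H(Y) = 1 and H(Y|X) = 0.5 as in the example.
X = ['Math', 'History', 'CS', 'Math', 'Math', 'CS', 'History', 'Math']
Y = ['Yes',  'No',      'Yes', 'No',  'No',   'Yes', 'No',     'Yes']

h_y  = label_entropy(Y)            # H(Y)   = 1.0  (4 Yes, 4 No)
h_yx = conditional_entropy(X, Y)   # H(Y|X) = 0.5
print(h_y - h_yx)                  # IG(Y|X) = H(Y) - H(Y|X) = 0.5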

