Information gain and entropy
Suppose X can take one of m values V1, V2, …, Vm, with
P(X=V1) = p1, …, P(X=Vm) = pm.
What is the smallest possible number of bits, on average, per symbol, needed to transmit a stream of symbols drawn from X's distribution?
It's the entropy of X:
H(X) = -p1 log2 p1 - p2 log2 p2 - … - pm log2 pm

“High Entropy” means X comes from a roughly uniform (boring) distribution: a histogram of its values would be flat, and each sample is maximally unpredictable.
“Low Entropy” means X comes from a varied (peaks and valleys) distribution: a few values are much more likely than the others, so each sample is easier to predict.
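As a sanity check on the formula and the two claims above, here is a minimal Python sketch (the function name entropy and the example distributions are illustrative, not from the original notes):

```python
import math

def entropy(probs):
    """H(X) in bits: -sum(p * log2(p)), skipping zero-probability values."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform (boring) distribution over 4 values: flat histogram, high entropy.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits

# Peaked (peaks and valleys) distribution over the same 4 values: low entropy.
print(entropy([0.85, 0.05, 0.05, 0.05]))   # ~0.85 bits
```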

Specific Conditional Entropy H(Y|X=v)
The entropy of Y computed over only those records (samples) in which X has the value v.
Conditional Entropy H(Y|X)
The average specific conditional entropy of Y, weighted by how likely each value of X is:
H(Y|X) = Σj P(X=vj) · H(Y|X=vj)
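A small sketch of both definitions, assuming the data arrives as a list of (x, y) record pairs; here entropy estimates H from observed symbols (empirical frequencies) rather than from explicit probabilities:

```python
from collections import Counter
import math

def entropy(values):
    """Empirical entropy in bits of a list of observed symbols."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def specific_conditional_entropy(records, v):
    """H(Y|X=v): entropy of Y over only the records in which X has value v."""
    return entropy([y for x, y in records if x == v])

def conditional_entropy(records):
    """H(Y|X): average of H(Y|X=v) over the values v of X, weighted by P(X=v)."""
    total = len(records)
    x_counts = Counter(x for x, _ in records)
    return sum((count / total) * specific_conditional_entropy(records, v)
               for v, count in x_counts.items())
```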


Information Gain
Suppose I must transmit Y. How many bits per symbol would it save me, on average, if both ends of the line already knew X?
IG(Y|X) = H(Y) - H(Y|X)
Example:
• H(Y) = 1
• H(Y|X) = 0.5
• Thus IG(Y|X) = 1 – 0.5 = 0.5
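Reusing the helpers from the sketch above, information gain is just the difference of the two entropies. The records below are a made-up dataset chosen so that H(Y) = 1 and H(Y|X) = 0.5, reproducing the example numbers:

```python
def information_gain(records):
    """IG(Y|X) = H(Y) - H(Y|X)."""
    ys = [y for _, y in records]
    return entropy(ys) - conditional_entropy(records)

records = [("math", "yes"), ("math", "yes"), ("math", "no"), ("math", "no"),
           ("history", "no"), ("history", "no"), ("cs", "yes"), ("cs", "yes")]
print(information_gain(records))   # 1.0 - 0.5 = 0.5
```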