**victormejia** (Posts: 40) | Posted 22:10 Feb 21, 2011
Does entropy always lie in the interval [0, 1]? Also, can information gain be negative?
**cysun** (Posts: 2935) | Posted 07:32 Feb 22, 2011
No. Min entropy is when all the records in D belong to the same class, in which case Entropy(D) = 0. Max entropy is when each of the N records in D belongs to a different class, in which case Entropy(D) = -N * (1/N) * log_2(1/N) = log_2(N), which is larger than 1 whenever N > 2.
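
A quick numeric check of both bounds (a minimal sketch; the `entropy` helper is just for illustration):

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a class distribution given as probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Min: every record in the same class -> Entropy(D) = 0.
print(entropy([1.0]))            # 0.0

# Max: records spread evenly over N classes -> log_2(N), > 1 whenever N > 2.
N = 8
print(entropy([1 / N] * N))      # 3.0, i.e. log_2(8)
```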
Yes. Consider a dataset D with 1000 c1 records and 1 c2 record. Entropy(D) should be pretty close to 0. Now split D into D1 = {999 c1 records} and D2 = {1 c1 record and 1 c2 record}, and you'll have a negative info gain.

*Last edited by cysun at 07:32 Feb 22, 2011.*
**cysun** (Posts: 2935) | Posted 09:40 Feb 22, 2011
Actually, after a little calculation, it turns out that the info gain in this case is still positive, because Entropy(D2) needs to be weighted by D2's share of the records, i.e. 2/1001. So maybe info gain cannot be negative. I need to think this over and see if there's a way to prove it.
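
Here's the calculation for that example as a quick script (a minimal sketch; the `entropy` and `info_gain` helpers are just illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(parent, splits):
    """Entropy(parent) minus the size-weighted entropy of the child sets."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(parent) - weighted

# The example from this thread: 1000 c1 records and 1 c2 record,
# split into D1 = {999 c1} and D2 = {1 c1, 1 c2}.
D  = ["c1"] * 1000 + ["c2"]
D1 = ["c1"] * 999
D2 = ["c1", "c2"]

print(entropy(D))               # ~0.0114
print(info_gain(D, [D1, D2]))   # ~0.0094 -- positive
```

Entropy(D2) is 1 bit, but weighted by 2/1001 it contributes only about 0.002, which is below Entropy(D) of about 0.0114, so the gain stays positive.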
**cysun** (Posts: 2935) | Posted 12:58 Feb 22, 2011
Kanthi found a proof by some CMU students for the 2-class case (please see attachment).
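
For anyone curious, the standard general argument (any number of classes, and not necessarily the one in the attachment) goes through concavity of entropy; a sketch, where w_i = |D_i|/|D| and p_i is the class distribution of child D_i:

```latex
% The parent's class distribution is the weighted average of the children's:
%   p = \sum_i w_i p_i, with w_i = |D_i| / |D|.
\[
  \text{Gain} \;=\; H\!\left(\textstyle\sum_i w_i p_i\right) \;-\; \sum_i w_i H(p_i) \;\ge\; 0,
\]
% because H is concave, so Jensen's inequality gives
% H(\sum_i w_i p_i) >= \sum_i w_i H(p_i).
```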
**victormejia** (Posts: 40) | Posted 10:37 Feb 23, 2011
Thanks! I was getting negative info gain, only to find a bug in my code. Now I only get positive info gain.