victormejia
Posts: 40
Posted 08:55 Feb 11, 2011
Is it safe to follow the pseudocode in the book on pg. 293? Which specific algorithm is that? Also, I've heard of an algorithm called C4.5 for generating decision trees. Can we use that to generate the decision tree as well?
cysun
Posts: 2935
Posted 09:48 Feb 11, 2011
All decision tree algorithms follow the general steps outlined in the pseudocode on page 293. Specific decision tree implementations differ mainly in their split quality measures (e.g., entropy vs. Gini index), the way they split (e.g., binary splits only or not), and their optimizations (e.g., pre- or post-pruning). Feel free to pick a specific decision tree algorithm such as C4.5 to implement; of course, you are not supposed to use existing source code. You can find a list of these algorithms in the Bibliographic Notes on page 378. The two best-known algorithms are C4.5 and CART. Both were among the Top Ten Data Mining Algorithms voted by the participants of the 2006 IEEE International Conference on Data Mining (ICDM).
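The general steps mentioned above (pick the best attribute by a split quality measure, partition the records, recurse) can be illustrated with a minimal ID3-style sketch in Python. This is an assumption-laden illustration, not the book's pseudocode, C4.5, or CART: it handles only categorical attributes, uses multiway splits, and does no pruning. Records are assumed to be dicts mapping attribute names to values, and all function names (`entropy`, `gini`, `grow_tree`, `classify`) and the `impurity` parameter are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(records, labels, attr, impurity):
    """Impurity reduction from a multiway split on one categorical attribute."""
    n = len(labels)
    groups = {}
    for rec, lab in zip(records, labels):
        groups.setdefault(rec[attr], []).append(lab)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


def majority(labels):
    """Most common class label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def grow_tree(records, labels, attrs, impurity=entropy):
    """Return a leaf label, or an internal node (attr, {value: subtree}, default_label)."""
    # Stop if the node is pure or there are no attributes left to split on.
    if len(set(labels)) == 1 or not attrs:
        return majority(labels)
    # Choose the attribute with the largest impurity reduction.
    best = max(attrs, key=lambda a: split_gain(records, labels, a, impurity))
    remaining = [a for a in attrs if a != best]
    branches = {}
    # One branch per value of `best` that actually occurs at this node.
    for value in sorted({rec[best] for rec in records}):
        idx = [i for i, rec in enumerate(records) if rec[best] == value]
        branches[value] = grow_tree([records[i] for i in idx],
                                    [labels[i] for i in idx],
                                    remaining, impurity)
    return (best, branches, majority(labels))


def classify(tree, record):
    """Walk the tree; fall back to the node's majority label for unseen values."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        if record[attr] not in branches:
            return default
        tree = branches[record[attr]]
    return tree
```

Passing `impurity=gini` instead of the default `entropy` changes only the split criterion, which is one of the main ways the reply says real algorithms differ; binary splits and pruning would be further changes to `grow_tree`.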
p0941
Posts: 95
Posted 12:41 Feb 16, 2011
Is this program (homework 3) supposed to work on any dataset, rather than being hard-coded for the car data? Thanks.
cysun
Posts: 2935
Posted 14:45 Feb 16, 2011
Your program only needs to work on the car dataset. Implementing a decision tree that works on any dataset is considerably more difficult because you'll need to handle numerical attributes. As for hard-coding dataset-specific information in your program, you can certainly hard-code the attributes and their possible values (i.e., car.c45-names), but obviously you should not hard-code the records (i.e., car.data) or any information derived from the records.
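A minimal Python sketch of that split follows: the schema is hard-coded while the records are read at run time. The attribute names, value lists, and class labels below are assumed to match the standard UCI Car Evaluation files (comma-separated records, class label last); check them against your own copy of car.c45-names. The names `ATTRIBUTES`, `CLASSES`, `parse_records`, and `load_records` are hypothetical.

```python
# ASSUMPTION: schema transcribed from the standard UCI car.c45-names; verify locally.
ATTRIBUTES = {
    "buying": ["vhigh", "high", "med", "low"],
    "maint": ["vhigh", "high", "med", "low"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}
CLASSES = ["unacc", "acc", "good", "vgood"]


def parse_records(lines):
    """Parse comma-separated records (class label last) into dicts plus labels."""
    names = list(ATTRIBUTES)
    records, labels = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        fields = line.split(",")
        if len(fields) != len(names) + 1:
            raise ValueError(f"line {lineno}: expected {len(names) + 1} fields, got {len(fields)}")
        *values, label = fields
        # Validate every value against the hard-coded schema.
        for name, value in zip(names, values):
            if value not in ATTRIBUTES[name]:
                raise ValueError(f"line {lineno}: bad value {value!r} for {name}")
        if label not in CLASSES:
            raise ValueError(f"line {lineno}: bad class {label!r}")
        records.append(dict(zip(names, values)))
        labels.append(label)
    return records, labels


def load_records(path="car.data"):
    """Read the records from disk at run time instead of embedding them."""
    with open(path) as f:
        return parse_records(f)
```

`load_records()` returns a list of attribute-to-value dicts plus a parallel list of class labels, so the only dataset-specific information compiled into the program is the schema itself.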