Abstract | This paper addresses an important problem in the use of induction systems for analyzing real-world data: the quality and reliability of the rules these systems generate. We discuss the significance of having a reliable and efficient rule quality measure. Such a measure can provide useful support in interpreting, ranking, and applying the rules generated by an induction system. A number of rule quality and statistical measures are selected from the literature, and their performance is evaluated on four sets of semiconductor data. The primary goal of this testing and evaluation is to investigate the performance of these quality measures based on (i) accuracy, (ii) coverage, (iii) positive error ratio, and (iv) negative error ratio of the rules selected by each measure. Moreover, the sensitivity of these quality measures to different data distributions is examined. In conclusion, we recommend Cohen's statistic as the best of the quality measures examined for this domain. Finally, we outline future work in this area. |
---|