"On Identifying Critical Nuggets Of Information During Classification Tasks,"
IEEE Trans. On Knowledge and Data Engineering,
Vol. xx, No. x, pp. xxx-xxx, xxxx. Early print on May 23, 2012.
Sathiaraj, D., and
E. Triantaphyllou *
*: E. Triantaphyllou is the Corresponding Author
In large databases, there may exist critical nuggets: small collections of records or instances that contain important domain-specific
information. This information can be used for future decision making, such as labeling critical, unlabeled data records and
improving classification results by reducing false positive and false negative errors. This work introduces the idea of critical nuggets,
proposes an innovative domain-independent method to measure criticality, suggests a heuristic to reduce the search space for finding
critical nuggets, and isolates and validates critical nuggets from some real-world data sets. Only a few subsets appear to qualify
as critical nuggets, underscoring the importance of finding them, and the proposed methodology can detect them. This work also identifies
certain properties of critical nuggets and provides experimental validation of those properties. Experimental results also indicate
that critical nuggets can help improve classification accuracy on real-world data sets.
Data Mining, Classification, Critical Nuggets, Outliers, Classification Accuracy, Class Boundary, Duality.