"On Identifying Critical Nuggets Of Information During Classification Tasks,"

IEEE Trans. on Knowledge and Data Engineering, Vol. xx, No. x, pp. xxx-xxx, xxxx. Early print on May 23, 2012.

Sathiaraj, D., and E. Triantaphyllou*

* E. Triantaphyllou is the corresponding author.

Abstract:
In large databases, there may exist critical nuggets - small collections of records or instances that contain domain-specific important information. This information can be used for future decision making, such as labeling critical, unlabeled data records and improving classification results by reducing false-positive and false-negative errors. This work introduces the idea of critical nuggets, proposes an innovative domain-independent method to measure criticality, suggests a heuristic to reduce the search space for finding critical nuggets, and isolates and validates critical nuggets from some real-world data sets. It seems that only a few subsets may qualify as critical nuggets, underscoring the importance of finding them; the proposed methodology can detect them. This work also identifies certain properties of critical nuggets and validates those properties experimentally. Experimental results also helped validate that critical nuggets can help improve classification accuracy on real-world data sets.

Key Words:
Data Mining, Classification, Critical Nuggets, Outliers, Classification Accuracy, Class Boundary, Duality.
