"A Feature Mining Based Approach for the Classification of Text Documents Into Disjoint Classes"

Information Processing and Management, Vol. 38, No. 4 (April 17, 2002 issue), pp. 583-604, 2002.

Salvador Nieto Sanchez, Evangelos Triantaphyllou, and Donald Kraft

Abstract:
This paper studies the problem of automatically classifying text documents into two mutually exhaustive and exclusive classes. A typical application of this new classification problem is the declassification of documents previously considered secret. The importance of the problem stems from the severe consequences of assigning a document to the wrong category. The problem was studied using the Vector Space Model (VSM) and a new data mining approach called the One Clause At a Time (OCAT) algorithm. The results of computational experiments on a sample of 2,897 text documents from the TIPSTER collection indicate that the OCAT approach has many advantages over the VSM in solving this new classification problem.

Key Words:
Document Classification, Document Indexing, Vector Space Model, Data Mining, OCAT Algorithm, Machine Learning.
