"A Feature Mining Based Approach for the Classification of Text Documents Into Disjoint Classes"
Information Processing and Management,
Vol. 38, No. 4 (April 17, 2002 issue), pp. 583-604, 2002.
Salvador Nieto Sanchez, Evangelos Triantaphyllou,
and Donald Kraft
This paper studies the problem of automatically classifying
text documents into two mutually exclusive and exhaustive
classes. A typical application of this new classification
problem is the declassification of documents previously
considered secret. The importance of this problem stems from
the severe consequences of classifying a document in the wrong
category. The problem was studied using the Vector Space
Model (VSM) and a new data mining approach called the One Clause
At a Time (OCAT) algorithm. The results of computational
experiments on a sample of 2,897 text documents from the TIPSTER
collection indicate that the OCAT algorithm has many advantages
over the VSM in solving this new classification problem.
Key Words: Document Classification, Document Indexing, Vector Space Model,
Data Mining, OCAT Algorithm, Machine Learning.