"A Feature Mining Based Approach for the Classification of Text Documents Into Disjoint Classes"
Information Processing and Management,
Vol. 38, No. 4 (April 17, 2002 issue), pp. 583-604, 2002.
Salvador Nieto Sanchez, Evangelos Triantaphyllou,
and Donald Kraft
Abstract:
This paper studies the problem of automatically classifying
text documents into two mutually exclusive and collectively
exhaustive classes. A typical application of this new classification
problem is the declassification of documents previously considered
secret. The importance of this problem stems from
the severe consequences of classifying a document in the wrong
category. This problem was studied by using the Vector Space
Model (VSM) and a new data mining approach called the One Clause
At a Time (OCAT) algorithm. The results of computational
experiments on a sample of 2,897 text documents from the TIPSTER
collection indicate that the OCAT algorithm has many advantages
over the VSM in solving this new classification
problem.
Key Words:
Document Classification, Document Indexing, Vector Space Model,
Data Mining, OCAT Algorithm, Machine Learning.
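
Illustrative sketch (not from the paper): the abstract names the Vector
Space Model as one of the two approaches compared. The Python below is a
minimal sketch of a generic VSM-style two-class classifier, assuming raw
term counts as weights, cosine similarity as the comparison, and
assignment to the class whose summed term vector is most similar. The
function names, the class labels "A" and "B", and the toy documents are
hypothetical; the paper's actual indexing and weighting choices may
differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def to_vector(text):
    """Represent a document as a sparse term-count vector (assumed weighting)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_profile(texts):
    """Sum the term vectors of a class; cosine ignores the overall scale."""
    total = Counter()
    for text in texts:
        total.update(to_vector(text))
    return total


def classify(text, profiles):
    """Return the label whose class profile is most similar to the document."""
    vec = to_vector(text)
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))


# Hypothetical toy training data for two disjoint classes.
profiles = {
    "A": class_profile(["missile guidance system design",
                        "missile launch codes and guidance"]),
    "B": class_profile(["cafeteria lunch menu schedule",
                        "holiday schedule and lunch hours"]),
}

label_1 = classify("guidance system for missile", profiles)
label_2 = classify("lunch schedule update", profiles)
```

With this toy data, the first query shares terms only with class "A" and
the second only with class "B", so each is assigned to the class whose
profile it overlaps.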