Data Mining and Knowledge Discovery in Industrial Engineering

A Special Issue of the Journal
Computers and Industrial Engineering
Vol. 43, No. 4, September 2002


Evangelos Triantaphyllou, T. Warren Liao, and S. Sitharama Iyengar
Guest Editors




TABLE OF CONTENTS

  1. Feature Extraction Using Rough Set Theory and Genetic Algorithms - An Application For The Simplification Of Product Quality Evaluation
    by Lian-Yin Zhai, Li-Pheng Khoo, and Sai-Cheong Fok


  2. A Scalable, Incremental Learning Algorithm for Classification Problems
    by Nong Ye and Xiangyang Li


  3. A Fuzzy Curved Search Algorithm for Neural Network Learning
    by Peitsang Wu


  4. Multiscale Approximation MEthods (MAME) to Locate Embedded Consecutive Subsequences - Its Applications in Statistical Data Mining and Spatial Statistics
    by Xiaoming Huo


  5. Simple Association Rules (SAR) and the SAR-Based Rule Discovery
    by Guoqing Chen, Qiang Wei, De Liu, and Geert Wets


  6. Mining Fuzzy Association Rules for Classification Problems
    by Y.-C. Hu, R.-S. Chen, and G.-H. Tzeng


  7. Visual Exploration of Production Data Using Small Multiples Design with Non-Uniform Color Mapping
    by Tien-Lung Sun and Wen-Lin Kuo


  8. A Data Mining Approach For Improving Polycythemia vera Diagnosis
    by Mehmed Kantardzic, Benjamin Djulbegovic, and Hazem Hamdan


  9. Data Mining Techniques For Improved WSR-88D Rainfall Estimation
    by T. B. Trafalis, A. White, B. Santosa, and M. B. Richman


  10. Knowledge Discovery Techniques for Predicting Country Investment Risk
    by Irma Becerra-Fernandez, Stelios H. Zanakis, and Steven Walczak


  11. Customer's Time-Variant Purchase Behavior and Corresponding Marketing Strategies: An Online Retailer's Case
    by Sung Ho Ha, Sung Min Bae, and Sang Chan Park


  12. Data Mining Corrosion From Eddy Current Non-Destructive Tests
    by Donald E. Brown and John R. Brence


  13. DIVA: A Visualization System for Exploring Document Databases For Technology Forecasting
    by Steven Morris, Zheng Wu, Camille DeYong, Sinan Salman, and Dagmawi Yemenu





ABSTRACTS:

     
  1. FEATURE EXTRACTION USING ROUGH SET THEORY AND GENETIC ALGORITHMS - AN APPLICATION FOR THE SIMPLIFICATION OF PRODUCT QUALITY EVALUATION


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 661-676.

    by Lian-Yin Zhai, Li-Pheng Khoo, and Sai-Cheong Fok

    School of Mechanical and Production Engineering
    Nanyang Technological University
    50 Nanyang Avenue
    Singapore 639798

    E-mail: mlyzhai@ntu.edu.sg
    E-mail: mlpkhoo@ntu.edu.sg
    E-mail: mscfok@ntu.edu.sg
    Web: http://www.ntu.edu.sg/MPE/Divisions/manufacturing/Faculty/mlpkhoo.htm


    ABSTRACT:
    Feature extraction is an important aspect of data mining and knowledge discovery. In this paper, an integrated feature extraction approach based on rough set theory and genetic algorithms is proposed. Based on this approach, a prototype feature extraction system is established and illustrated in an application to the simplification of product quality evaluation. The prototype system successfully integrates the capability of rough set theory to handle uncertainty with a robust search engine based on a genetic algorithm. The results show that it can remarkably reduce the cost and time consumed by product quality evaluation without compromising the overall specifications of the acceptance tests.

    KEY WORDS: Feature extraction, Rough sets, Genetic algorithms, Knowledge extraction.
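    A minimal Python sketch of the two ingredients named above, under stated assumptions: the rough-set positive region (objects whose condition-attribute values determine the decision unambiguously, giving the degree of dependency gamma) and a simple genetic algorithm that searches for a small attribute subset preserving it. The lexicographic fitness, the GA operators and parameter values, the function names, and the toy table are all hypothetical illustrations, not the authors' prototype system.

```python
import random
from collections import defaultdict


def positive_region_size(rows, attrs, decision):
    """Number of rows whose equivalence class under `attrs` has a single decision value."""
    decisions = defaultdict(set)  # attribute-value tuple -> decision values seen
    sizes = defaultdict(int)      # attribute-value tuple -> number of rows
    for r in rows:
        key = tuple(r[a] for a in attrs)
        decisions[key].add(r[decision])
        sizes[key] += 1
    return sum(sizes[k] for k, ds in decisions.items() if len(ds) == 1)


def dependency(rows, attrs, decision):
    """Rough-set degree of dependency gamma(attrs -> decision), in [0, 1]."""
    return positive_region_size(rows, attrs, decision) / len(rows)


def ga_reduct(rows, cond, decision, pop_size=20, generations=40, p_mut=0.1, seed=0):
    """Hypothetical GA: bit i selects cond[i]; returns the fittest attribute subset found."""
    rng = random.Random(seed)
    n = len(cond)

    def subset(bits):
        return [a for a, bit in zip(cond, bits) if bit]

    def fitness(bits):
        # Lexicographic: maximize the positive region first, then use fewer attributes.
        attrs = subset(bits)
        return (positive_region_size(rows, attrs, decision), -len(attrs))

    # Seed with the full attribute set so the full dependency is always in the population.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = [pop[0]]  # elitism: the best individual survives unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)  # binary tournament selection
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - b if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return subset(max(pop, key=fitness))
```

    Because of elitism and the seeded full attribute set, the returned subset always keeps the full degree of dependency; whether it is also a minimal subset (a reduct) is likely for small tables but not guaranteed, since the search is stochastic.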



     
  2. A SCALABLE, INCREMENTAL LEARNING ALGORITHM FOR CLASSIFICATION PROBLEMS


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 677-692.

    by Nong Ye and Xiangyang Li

    P. O. Box 875906
    Department of Industrial Engineering
    Arizona State University
    Tempe, Arizona, 85287, USA

    E-mail: nongye@asu.edu
    Phone: (480) 965-7812, Fax: (480) 965-8692
    Web: http://ceaspub.eas.asu.edu/ye/


    ABSTRACT:
    In this paper, a novel data mining algorithm, Clustering and Classification Algorithm-Supervised (CCA-S), is introduced. CCA-S enables the scalable, incremental learning of a non-hierarchical cluster structure from training data. This cluster structure serves as a function that maps the attribute values of new data to their target class, that is, it classifies new data. CCA-S utilizes both the distance and the target class of training data points to derive the cluster structure. In this paper, we first present the problems that many existing data mining algorithms for classification, such as decision trees and artificial neural networks, have with scalable and incremental learning. We then describe CCA-S and discuss its advantages in scalable, incremental learning. The results of applying CCA-S to several common data sets for classification problems are presented. They show that the classification performance of CCA-S is comparable to that of other data mining algorithms such as decision trees, artificial neural networks, and discriminant analysis.

    KEY WORDS: Data mining, Classification, Incremental learning, Scalability.
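    The core idea stated above, using both the distance and the target class of each training point to grow a cluster structure incrementally, can be sketched as follows. This is a simplified illustration under assumptions (Euclidean distance, a running-mean centroid update, nearest-cluster prediction); the class and method names are hypothetical, and the published CCA-S includes further details not reproduced here.

```python
import math


class SupervisedClusterer:
    """Hypothetical, simplified sketch of supervised incremental clustering.

    Each cluster keeps a centroid, a class label, and a point count, so new
    training data can be absorbed without revisiting earlier data.
    """

    def __init__(self):
        self.centroids, self.labels, self.counts = [], [], []

    def _nearest(self, x):
        """Index of the cluster whose centroid is closest to x (None if no clusters)."""
        best, best_d = None, math.inf
        for k, c in enumerate(self.centroids):
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = k, d
        return best

    def partial_fit(self, X, y):
        for x, label in zip(X, y):
            k = self._nearest(x)
            if k is not None and self.labels[k] == label:
                # Nearest cluster has the same class: absorb x and update the running mean.
                self.counts[k] += 1
                n = self.counts[k]
                self.centroids[k] = [c + (xi - c) / n for c, xi in zip(self.centroids[k], x)]
            else:
                # No cluster yet, or the nearest cluster has another class: start a new one.
                self.centroids.append(list(x))
                self.labels.append(label)
                self.counts.append(1)
        return self

    def predict(self, X):
        """Label each point with the class of its nearest cluster."""
        return [self.labels[self._nearest(x)] for x in X]
```

    Each call to `partial_fit` touches only the new points and the current clusters, which is what makes this style of learning incremental: a later batch can add clusters (for example, a point of class "b" landing near a class "a" cluster) without retraining on old data.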



     
  3. A FUZZY CURVED SEARCH ALGORITHM FOR NEURAL NETWORK LEARNING


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 693-702.

    by Peitsang Wu

    Department of Industrial Engineering and Management
    I-Shou University,
    Kaohsiung County
    Taiwan 84008, ROC

    E-mail: pwu@isu.edu.tw
    Web: http://www.im.isu.edu.tw/pwu/


    ABSTRACT:
    In this paper, we develop a curved search algorithm, which uses second-order information, as the learning algorithm for a supervised neural network. With the objective of reducing the training time, we introduce a fuzzy controller that adjusts the first- and second-order approximation parameters of the iterative method, further reducing the training time and avoiding the spikes in the learning curve that sometimes occur with a fixed step length. Computational results indicate a significant reduction in training time compared with the delta learning rule.

    KEY WORDS: Neural Networks, Fuzzy Control, Curved-Search Algorithm, Back Propagation Learning.



     
  4. MULTISCALE APPROXIMATION METHODS (MAME) TO LOCATE EMBEDDED CONSECUTIVE SUBSEQUENCES -- ITS APPLICATIONS IN STATISTICAL DATA MINING AND SPATIAL STATISTICS


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 703-720.

    by Xiaoming Huo

    School of Industrial and Systems Engineering
    Georgia Institute of Technology
    Atlanta, GA 30332-0205

    Web: http://www.isye.gatech.edu/~xiaoming/
    E-mail: xiaoming@isye.gatech.edu


    ABSTRACT:

          In statistical data mining and spatial statistics, many problems (such as detection and clustering) can be formulated as optimization problems whose objective functions are functions of consecutive subsequences. Some examples are (1) searching for a high-activity region in a Bernoulli sequence, (2) estimating an underlying boxcar function in a time series, and (3) locating a high-concentration area in a point process. A comprehensive search algorithm inevitably has high computational complexity. For example, for a length-$n$ sequence, the total number of possible consecutive subsequences is ${n+1 \choose 2} \approx n^2/2$, so a comprehensive search algorithm requires on the order of $n^2$ numerical operations.
          We present a multiscale-approximation-based approach. It is shown that, most of the time, this method finds exactly the same solution as a comprehensive search algorithm does. The derived Multiscale Approximation MEthods (MAMEs) have low complexity: for a length-$n$ sequence, the computational complexity of an MAME can be as low as $O(n)$. Numerical simulations verify these improvements.
          The MAME approach is particularly suitable for problems with large data sets. One known drawback is that this method does not guarantee the exact optimal solution in every single run. However, simulations show that as long as the underlying subjects possess statistical significance, an MAME finds the optimal solution with probability almost equal to one.

    KEY WORDS: Data mining, maximum likelihood estimate, multiscale approximation.
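    To make the complexity argument concrete, here is a Python sketch for example (1), a high-activity region in a 0/1 sequence, scored (as an assumption) by the excess number of ones over a baseline rate p0. The exhaustive search visits all n(n+1)/2 consecutive subsequences. The second function is a generic two-scale, coarse-to-fine heuristic with block size about sqrt(n), shown only to illustrate how approximating at a coarser scale can cut the cost to O(n); it is not the MAME algorithm from the paper. Like MAME, it can miss the exact optimum, here whenever the best block-aligned window has an endpoint more than one block away from the true optimum. All function names are hypothetical.

```python
import math


def prefix_sums(x):
    """s[i] = x[0] + ... + x[i-1], so any window sum costs O(1)."""
    s = [0]
    for v in x:
        s.append(s[-1] + v)
    return s


def excess(s, i, j, p0):
    """Assumed score: ones in x[i:j] minus the count expected at baseline rate p0."""
    return (s[j] - s[i]) - p0 * (j - i)


def exhaustive_search(x, p0):
    """Score every consecutive subsequence x[i:j]: n(n+1)/2 windows, O(n^2) work."""
    s, n = prefix_sums(x), len(x)
    best, count = (-math.inf, 0, 0), 0
    for i in range(n):
        for j in range(i + 1, n + 1):
            count += 1
            score = excess(s, i, j, p0)
            if score > best[0]:
                best = (score, i, j)
    return best, count


def coarse_to_fine_search(x, p0, block=None):
    """Generic two-scale heuristic (not the paper's MAME): O(n) work when block ~ sqrt(n)."""
    s, n = prefix_sums(x), len(x)
    b = block or max(1, math.isqrt(n))
    # Coarse scale: endpoints restricted to multiples of b, plus n.
    grid = list(range(0, n, b)) + [n]
    best = (-math.inf, 0, 0)
    for a in range(len(grid)):
        for c in range(a + 1, len(grid)):
            score = excess(s, grid[a], grid[c], p0)
            if score > best[0]:
                best = (score, grid[a], grid[c])
    # Fine scale: re-search every endpoint within one block of the coarse winner.
    _, i0, j0 = best
    for i in range(max(0, i0 - b), min(n, i0 + b) + 1):
        for j in range(max(i + 1, j0 - b), min(n, j0 + b) + 1):
            score = excess(s, i, j, p0)
            if score > best[0]:
                best = (score, i, j)
    return best
```

    On a 400-element sequence of zeros with ones at positions 150 to 249 and p0 = 0.1, the exhaustive search evaluates 80,200 windows, while the coarse-to-fine pass evaluates 1,891 (210 coarse plus 41 x 41 fine); both recover the window [150, 250) in this case.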



     
  5. SIMPLE ASSOCIATION RULES (SAR) AND THE SAR-BASED RULE DISCOVERY


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 721-734.

    by Guoqing Chen1, Qiang Wei1, De Liu2, and Geert Wets3

    1: School of Economics and Management
    Tsinghua University
    Beijing 100084 CHINA

    E-mail: chengq@em.tsinghua.edu.cn

    2: Center for Research on E-Commerce
    University of Texas at Austin,
    Austin, TX 78712


    3: Limburg University
    Universitaire Campus Bld D
    3590 Diepenbeek
    BELGIUM


    ABSTRACT:
    Association rule mining is one of the most important fields in data mining and knowledge discovery in databases (KDD). Rule explosion is a problem of concern, as conventional mining algorithms often produce too many rules for decision makers to digest. To address this, this paper concentrates on a smaller set of rules, namely, a set of simple association rules (SAR), each with a consequent containing only a single attribute. Such a rule set can be used to derive all other association rules, meaning that the original rule set based on conventional algorithms can be "recovered" from the simple rules without any information loss. The number of simple rules is much smaller than the number of all rules. Moreover, corresponding algorithms are developed so that certain forms of rules can be generated more efficiently based on simple rules.

    KEY WORDS: Data mining, Knowledge discovery from databases (KDD), Simple association rules.
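    One elementary identity illustrates why rules with a single-item consequent lose no confidence information: conf(X -> {y1, ..., yk}) = conf(X -> {y1}) x conf(X ∪ {y1} -> {y2}) x ... x conf(X ∪ {y1, ..., yk-1} -> {yk}), because each factor is a ratio of supports and the product telescopes to supp(X ∪ Y) / supp(X). The Python sketch below checks this with exact fractions on a toy basket data set; the function names and data are hypothetical, and the paper's recovery procedure may differ.

```python
from fractions import Fraction


def support(transactions, itemset):
    """Fraction of transactions containing every item of `itemset`."""
    items = frozenset(itemset)
    return Fraction(sum(1 for t in transactions if items <= t), len(transactions))


def confidence(transactions, antecedent, consequent):
    """conf(X -> Y) = supp(X ∪ Y) / supp(X); X must occur in at least one transaction."""
    x = frozenset(antecedent)
    return support(transactions, x | frozenset(consequent)) / support(transactions, x)


def confidence_from_simple_rules(transactions, antecedent, consequent):
    """Rebuild conf(X -> Y) by chaining rules whose consequent is a single item."""
    result = Fraction(1)
    left = frozenset(antecedent)
    for item in sorted(consequent):
        result *= confidence(transactions, left, {item})
        if result == 0:
            return result  # later antecedents would have zero support
        left = left | {item}
    return result
```

    For example, with the baskets {bread, milk}, {bread, butter, milk}, {bread, butter}, {milk}, and {bread, butter, milk, eggs}, conf(bread -> {butter, milk}) = 1/2, and the chain 3/4 x 2/3 built from two simple rules gives the same 1/2.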



     
  6. MINING FUZZY ASSOCIATION RULES FOR CLASSIFICATION PROBLEMS


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 735-750.

    by Y.-C. Hu1, R.-S. Chen1, and G.-H. Tzeng2 (Corresponding Author)

    1: Institute of Information Management
    National Chiao Tung University
    Hsinchu 300
    Taiwan ROC

    E-mail: ghtzeng@cc.nctu.edu.tw


    2: Institute of Management of Technology
    National Chiao Tung University
    Hsinchu 300
    Taiwan ROC


    ABSTRACT:
    The effective development of data mining techniques for discovering knowledge from training samples for classification is necessary in industrial engineering applications such as group technology. This paper proposes a learning algorithm, which can be viewed as a knowledge acquisition tool, to effectively discover fuzzy associative classification rules. The consequent of each rule is a single class label. The proposed learning algorithm consists of two phases: the first generates large fuzzy grids from the training samples by fuzzy partitioning of each attribute, and the second generates fuzzy associative classification rules from the large fuzzy grids. The proposed learning algorithm is implemented by scanning the training samples stored in a database only once and applying a sequence of Boolean operations to generate fuzzy grids and fuzzy rules; therefore, it can be easily extended to discover other types of fuzzy association rules. Simulation results on the iris data demonstrate that the proposed learning algorithm can effectively derive fuzzy associative classification rules.

    KEY WORDS: Data mining, Knowledge acquisition, Classification problems, Association rules.
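    A brute-force Python sketch of the two phases, under assumptions: each attribute is partitioned into k evenly spaced triangular fuzzy sets, a grid's membership for a sample is the product of its attribute memberships, fuzzy support is the average membership over all samples, and a rule grid -> class is kept when its fuzzy confidence (membership summed over samples of that class, divided by total membership) meets a threshold. It enumerates every grid directly rather than using the paper's single database scan with Boolean operations; all names and thresholds are illustrative.

```python
from itertools import combinations, product


def triangular_memberships(x, lo, hi, k):
    """Memberships of x in k evenly spaced triangular fuzzy sets on [lo, hi] (k >= 2)."""
    width = (hi - lo) / (k - 1)
    return [max(0.0, 1.0 - abs(x - (lo + i * width)) / width) for i in range(k)]


def grid_membership(sample, grid, bounds, k):
    """Product of memberships; `grid` is a tuple of (attribute index, fuzzy set index)."""
    m = 1.0
    for attr, term in grid:
        lo, hi = bounds[attr]
        m *= triangular_memberships(sample[attr], lo, hi, k)[term]
    return m


def mine_fuzzy_rules(samples, labels, bounds, k, min_sup, min_conf):
    """Return (grid, class, fuzzy support, fuzzy confidence) for every qualifying rule."""
    n_attr = len(bounds)
    rules = []
    for r in range(1, n_attr + 1):
        for attrs in combinations(range(n_attr), r):
            for terms in product(range(k), repeat=r):
                grid = tuple(zip(attrs, terms))
                m = [grid_membership(s, grid, bounds, k) for s in samples]
                sup = sum(m) / len(samples)
                if sup == 0 or sup < min_sup:
                    continue  # not a "large" fuzzy grid
                for cls in sorted(set(labels)):
                    conf = sum(mi for mi, y in zip(m, labels) if y == cls) / sum(m)
                    if conf >= min_conf:
                        rules.append((grid, cls, sup, conf))
    return rules
```

    With k = 2 ("low" and "high") on four corner samples of the unit square, where class A has a low first attribute, the grid "attribute 0 is low" yields the rule -> A with fuzzy support 0.5 and confidence 1.0, while "attribute 1 is low" is large but splits evenly between the classes and yields no rule at a 1.0 confidence threshold.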



     
  7. VISUAL EXPLORATION OF PRODUCTION DATA USING SMALL MULTIPLES DESIGN WITH NON-UNIFORM COLOR MAPPING


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 751-764.

    by Tien-Lung Sun1 and Wen-Lin Kuo2

    1: Department of Industrial Engineering and Management
    Yuan-Ze University
    Nei-Li
    Taiwan, R.O.C.
    Web: http://cadcam.iem.yzu.edu.tw/Professor/TLS/index-English.htm

    E-mail: tsun@saturn.yzu.edu.tw

    2: Department of Business Administration
    Chihlee Institute of Commerce
    Banchiau
    Taiwan, R.O.C.

    E-mail: wenlin.kuo@msa.hinet.net


    ABSTRACT:
    Visual data mining may overcome some of the flexibility problems often suffered by computer-centered data mining approaches. This can happen because human beings are brought into the information discovery loop, taking advantage of their natural strengths in creative thinking and rapid visual pattern recognition to discover information not defined a priori and to perform approximate reasoning that is hard for computer algorithms to do. This paper presents a novel visual exploration approach for mining abstract, multidimensional data stored in tables in a relational database. The visual image is constructed by converting each table into a visualization unit, called a table graph, and then assembling these table graphs to form a small multiples design. Different types of non-uniform color mappings for rendering this small multiples design can be generated automatically by minimizing the weight differences of colors in the visual image. These non-uniform color mappings are designed so that adjacent glyphs in a table graph with near underlying values are assigned the same color. As such, visual patterns that cannot be seen under the traditional uniform color mapping can be revealed. This enables users to examine the input tables from different perspectives. The proposed flexible visualization method has been applied to generate visual images from which users could quickly and easily compare the machine idle cost performance of alternative master production plans.

    KEY WORDS: Visual data mining, Data visualization, Production management.
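    As a simple illustration of why a non-uniform mapping can reveal structure that a uniform one hides, the Python sketch below compares equal-width (uniform) binning of values into color indices with equal-frequency (quantile) binning, one basic kind of non-uniform mapping. This is a swapped-in technique for illustration only: the paper derives its mappings by minimizing the weight differences of colors in the visual image, which is not reproduced here, and the function names and idle-cost values are hypothetical.

```python
import bisect


def uniform_bins(values, n_colors):
    """Equal-width bins: the color index grows linearly with the value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_colors or 1.0  # guard against all-equal values
    return [min(int((v - lo) / width), n_colors - 1) for v in values]


def quantile_bins(values, n_colors):
    """Equal-frequency bins: each color covers roughly the same number of glyphs."""
    ranked = sorted(values)
    cuts = [ranked[len(ranked) * i // n_colors] for i in range(1, n_colors)]
    return [bisect.bisect_right(cuts, v) for v in values]
```

    For hypothetical idle costs [1, 2, 3, 4, 5, 6, 7, 100] and four colors, `uniform_bins` returns [0, 0, 0, 0, 0, 0, 0, 3], so the single outlier hides the gradual rise among the first seven values, while `quantile_bins` returns [0, 0, 1, 1, 2, 2, 3, 3].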



     
  8. A DATA MINING APPROACH FOR IMPROVING POLYCYTHEMIA VERA DIAGNOSIS


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 765-774.

    by Mehmed Kantardzic1, Benjamin Djulbegovic2, and Hazem Hamdan1

    1: Computer Engineering and Computer Science Department
    J. B. Speed Scientific School
    University of Louisville
    Louisville, KY 40292

    Phone: (502) 852-3703
    E-mail: mmkant01@athena.louisville.edu

    2: Division of Blood and Bone Marrow Transplant
    H. Lee Moffitt Cancer Center & Research Institute
    University of South Florida
    Tampa, FL


    ABSTRACT:
    This paper presents a data mining approach to the extraction of new decision rules for Polycythemia Vera (PV) diagnosis, based on a reduced and optimized set of lab parameters. Ten laboratory and other clinical findings (8 parameters from the PVSG criteria, plus sex and HCT) on 431 PV patients from the original PVSG cohort, and records on 91 patients with other myeloproliferative disorders that can easily be misdiagnosed as PV, were included in this study. No significant differences were found in the correctness of diagnostic classification of patients using either a trained artificial neural network (ANN) (98.1%) or a support vector machine (SVM) (95%) versus using the PVSG diagnostic criteria, which are considered the "gold standard" for the diagnosis of PV. After reducing the original parameters of our dataset to only four (HCT, PLAT, SPLEEN, and WBC), we still obtained good classification results. New rules for improved differential diagnosis of PV are specified based on these four parameters. These rules may be used as a complement to the standard PVSG criteria, particularly in the differential diagnosis between PV and other myeloproliferative syndromes.

    KEY WORDS: Polycythemia Vera, Feature Extraction, Artificial Neural Networks, Support Vector Machines, Decision Rules, N-dimensional Visualization.



     
  9. DATA MINING TECHNIQUES FOR IMPROVED WSR-88D RAINFALL ESTIMATION


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 775-786.

    by T. B. Trafalis1, A. White1, B. Santosa1, and M. B. Richman2

    1: School of Industrial Engineering
    The University of Oklahoma
    202 W. Boyd, Ste 124
    Norman, OK 73019

    E-mail: ttrafalis@ou.edu
    E-mail: Andy.White@noaa.gov
    E-mail: bsant@ou.edu

    2: School of Meteorology
    The University of Oklahoma
    100 E. Boyd, Ste 1310
    Norman, OK 73019

    E-mail: mrichman@ou.edu


    ABSTRACT:
    The main objective of this paper is to utilize data mining and an intelligent system, Artificial Neural Networks (ANNs), to facilitate rainfall estimation. Ground truth rainfall data are necessary to apply intelligent systems techniques. A unique source of such data is the Oklahoma Mesonet. Recently, with the advent of a national network of advanced radars (i.e., WSR-88D), massive archived data sets have been created, generating terabytes of data. Data mining can draw attention to meaningful structures in the archives of such radar data, particularly if guided by knowledge of how the atmosphere operates in rain-producing systems. The WSR-88D digital database records contain three native variables: velocity, reflectivity, and spectrum width. However, current rainfall detection algorithms use only the reflectivity variable, leaving the other two unexploited. The primary focus of the proposed research is to capitalize on these additional radar variables at multiple elevation angles and multiple horizontal bins for precipitation prediction. Linear regression models and feed-forward ANNs are used for precipitation prediction. Rainfall totals from the Oklahoma Mesonet are used as the training and verification data. Results for the linear modeling suggest that, taken separately, reflectivity and spectrum width models are highly significant. However, when the two are combined in one linear model, it is not significantly more accurate than reflectivity alone. All linear models are prone to under-prediction when heavy rainfall occurs. The ANN results with reflectivity and spectrum width inputs show that a 250-5-1 architecture is the least prone to under-prediction of heavy rainfall amounts. When a three-part ANN (for light, moderate, and heavy rainfall) was applied to reflectivity, in addition to spectrum width, it estimated rainfall amounts the most accurately of all the methods examined.

    KEY WORDS: Back-propagation, Clustering, Data Mining Applications, Dimensionality Reduction, Exploratory Data Analysis, Feed-forward Neural Networks, Mean-Square Error, Neural Network Architectures, Pattern Recognition, Principal Component Analysis, Rainfall Estimation.



     
  10. KNOWLEDGE DISCOVERY TECHNIQUES FOR PREDICTING COUNTRY INVESTMENT RISK


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 787-800.

    by Irma Becerra-Fernandez1, Stelios H. Zanakis1 (Corresponding Author), and Steven Walczak2

    1: Florida International University
    Decision Sciences & Information Systems Department
    College of Business Administration
    Miami, FL 33199

    Phone: (305) 348-2830
    E-mail: zanakis@fiu.edu
    E-mail: becferi@fiu.edu

    2: University of Colorado at Denver,
    College of Business
    Denver, CO 80217-3364

    E-mail: swalczak@carbon.cudenver.edu


    ABSTRACT:
    This paper presents the insights gained from applying knowledge discovery in databases (KDD) processes to develop intelligent models that classify a country's investing risk based on a variety of factors. Inferential data mining techniques, such as C5.0, as well as intelligent learning techniques, such as neural networks, were applied to a dataset of 27 variables (economic, stock market performance/risk, and regulatory efficiencies) on 52 countries, whose investing risk category was assessed in a Wall Street Journal survey of international experts. The results of applying KDD techniques to the dataset are promising: the models successfully classified most countries in agreement with the experts' classifications. Implementation details, results, and future plans are also presented.

    KEY WORDS: Data mining, Knowledge discovery, Country investing risk.



     
  11. CUSTOMER'S TIME-VARIANT PURCHASE BEHAVIOR AND CORRESPONDING MARKETING STRATEGIES: AN ONLINE RETAILER'S CASE


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 801-820.

    by Sung Ho Ha, Sung Min Bae, and Sang Chan Park

    Dept. of Industrial Engineering
    Korea Advanced Institute of Science and Technology
    KOREA

    Web: http://captain.kaist.ac.kr
    E-mail: hash@major.kaist.ac.kr
    E-mail: loveiris@major.kaist.ac.kr
    E-mail: sangpark@cais.kaist.ac.kr


    ABSTRACT:
    Traditional customer relationship management (CRM) studies mainly focus on CRM at a specific point in time. Static CRM and the derived knowledge of customer behavior can help marketers redirect marketing resources for profit gain at that point in time. However, as time goes on, the static knowledge becomes obsolete. Therefore, CRM for an online retailer should be applied dynamically over time. Although the concept of buying-behavior-based CRM was advanced several decades ago, very few applications of dynamic CRM have been reported to date. In this paper, we propose a dynamic CRM model that uses data mining and a Monitoring Agent System (MAS) to extract longitudinal knowledge from customer data and to analyze customer behavior patterns over time for the retailer. Furthermore, we show that longitudinal CRM can be usefully applied to several managerial problems that any retailer may face.

    KEY WORDS: Customer Relationship Management, Data Mining, Electronic Commerce, Marketing Strategy, Markov Chains.
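    Since the key words mention Markov chains, one plausible way to express customer behavior patterns over time is a transition matrix between customer segments in consecutive periods. The Python sketch below estimates such a matrix from per-customer segment histories and projects a segment distribution forward. The segment names, the treatment of segments with no observed outgoing transitions as absorbing, and both functions are assumptions for illustration, not the paper's model or its Monitoring Agent System.

```python
from collections import Counter, defaultdict


def transition_matrix(histories):
    """Estimate P(next segment | current segment) from per-customer segment sequences."""
    counts = defaultdict(Counter)
    for seq in histories:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for cur, row in counts.items()}


def project(dist, P, steps=1):
    """Push a segment distribution `steps` periods ahead.

    Segments never observed leaving are treated as absorbing (an assumption).
    """
    for _ in range(steps):
        new = defaultdict(float)
        for state, p in dist.items():
            for nxt, q in P.get(state, {state: 1.0}).items():
                new[nxt] += p * q
        dist = dict(new)
    return dist
```

    Re-estimating the matrix each period and comparing it with earlier ones is one simple way to see when static segment knowledge has become obsolete.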



     
  12. DATA MINING CORROSION FROM EDDY CURRENT NON-DESTRUCTIVE TESTS


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 821-840.

    by Donald E. Brown1 and John R. Brence2

    1: Department of Systems and Information Engineering
    University of Virginia
    Charlottesville, VA 22903

    Phone: (804) 924-5393
    E-mail: brown@virginia.edu


    2: Department of Systems Engineering
    United States Military Academy
    West Point, NY 10996

    Phone: (845) 938-5535
    E-mail: fj672@usma.edu


    ABSTRACT:
    Quicker, more effective methods of corrosion prediction and classification can help to ensure a safe and operational transportation system for both civilian and military sectors. This is especially critical now as transportation providers attempt to meet the increased expense of repairing aging aircraft with smaller budgets. These budget constraints make it imperative to find corrosion and to correctly determine the appropriate time to replace corroded parts. If a part is replaced too soon, the result is wasted resources. However, if a part is not replaced soon enough, it could cause a catastrophic accident. The discovery of models that limit the possibility of a costly accident while optimizing resource utilization would allow transportation providers to focus their maintenance efforts efficiently. While our concern in this study was with aircraft, the results will also be useful to other transportation providers. This paper describes the discovery and comparison of empirical models to predict corrosion damage from non-destructive test (NDT) data. The NDT data were derived from eddy current (EC) scans of the United States Air Force's (USAF) KC-135 aircraft. While we might suspect a link between NDT results and corrosion, until now this link has not been formally established. Instead, the NDT data have been converted into false color images that are analyzed visually by maintenance operators. The models we discovered are quite complex and suggest that, with data mining approaches, we can sometimes handle noisy data more effectively through more complex models rather than simpler ones. Our results also show that while a variety of modeling techniques can predict corrosion with reasonable accuracy, regression trees are particularly effective in modeling the complex relationships between the eddy current measurements and the actual amount of corrosion.
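    Regression trees are grown by repeatedly choosing the split that most reduces squared error. The Python sketch below shows that single step, a CART-style split search on one feature; the function names, the eddy-current feature values, and the corrosion amounts in the usage are hypothetical, and this is not the authors' model.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total SSE) of the split x <= threshold that minimizes SSE.

    Returns (None, inf) when no split is possible (fewer than two distinct x values).
    """
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate equal feature values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, total)
    return best
```

    For hypothetical EC amplitudes [1, 2, 3, 10, 11, 12] paired with corrosion amounts [0, 0, 0, 5, 5, 5], `best_split` returns (6.5, 0.0); a full tree applies the same search recursively to each side and across all features.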





     
  13. DIVA: A VISUALIZATION SYSTEM FOR EXPLORING DOCUMENT DATABASES FOR TECHNOLOGY FORECASTING


    Computers and Industrial Engineering, Vol. 43, No. 4, pp. 841-xxx.

    by Steven Morris1, Zheng Wu1, Camille DeYong2, Sinan Salman2, and Dagmawi Yemenu2

    1: Dept. of Electrical and Computer Engineering
    202 Engineering So.
    Oklahoma State University
    Stillwater, OK 74078

    E-mail: samorri@okstate.edu
    FAX: (405) 744-9198

    2: Industrial Engineering and Management
    322F Engineering North
    Oklahoma State University
    Stillwater, OK 74078


    ABSTRACT:
    DIVA (for Database Information Visualization and Analysis system) is a computer program that helps perform bibliometric analysis of collections of scientific literature and patents for technology forecasting. Documents drawn from the technological field of interest are visualized as clusters on a two-dimensional map, permitting exploration of the relationships among the documents and document clusters and also permitting derivation of summary data about each document cluster. Such information, when provided to subject matter experts performing a technology forecast, can yield insight into trends in the technological field of interest. This paper discusses the document visualization and analysis process: acquisition of documents, mapping documents, clustering, exploration of relationships, and generation of summary and trend information. A detailed discussion of DIVA exploration functions is presented, followed by an example of visualization and analysis of a set of documents about chemical sensors.

    KEY WORDS: Technology forecasting, Information visualization, Knowledge discovery in databases (KDD), Data mining, Citation analysis, Document mapping, Bibliometrics, Scientometrics.











Send suggestions / comments to Dr. E. Triantaphyllou (trianta@lsu.edu).