Here is some summary information regarding the topic and this new book. By scrolling down on this page one can access much more detailed information regarding this book.

The importance of having efficient and effective methods for data mining and knowledge discovery (DM) is rapidly growing. This is due to the wide use of fast and affordable computing power and data storage media and also the gathering of huge amounts of data in almost all aspects of human activity and interest. While numerous methods have been developed, the focus of this book presents algorithms and applications using one popular method that has been formulated in terms of binary attributes, i.e., by Boolean functions defined on several attributes that are easily transformed into rules that can express new knowledge.

This book presents methods that deal with key data mining and knowledge discovery issues in an intuitive manner, in a natural sequence, and in a way that can be easily understood and interpreted by a wide array of experts and end users. The presentation provides a unique perspective into the essence of some fundamental DM issues, many of which come from important real life applications such as breast cancer diagnosis.

Applications and algorithms are accompanied by extensive experimental results and are presented in a way such that anyone with a minimum background in mathematics and computer science can benefit from the exposition. Rigor in mathematics and algorithmic development is not compromised and each chapter systematically offers some possible extensions for future research.

A Unique Book on Data Mining and Mathematical Logic. Data Mining and Knowledge Discovery Via Logic-Based Methods: Theory, Algorithms, and Applications


by Evangelos Triantaphyllou


This monograph is just published (June 2010)! The publisher is Springer. It is 350 pages long.
ISBN-10: 1441916296
ISBN-13: 978-1441916297





TABLE OF CONTENTS

Click here to download the Table of Contents as a PDF file


Foreword........................................................vii
Preface..........................................................xi
Acknowledgments................................................xvii 
List of Figures...............................................xxvii
List of Tables.................................................xxxi

___________________________________________________
PART I  Theory and Algorithmic Issues 
___________________________________________________

Chapter 1
INTRODUCTION......................................................3
1.1   	What is Data Mining and Knowledge Discovery?..............3
1.2	Some Potential Application Areas for Data Mining 
        and Knowledge Discovery...................................4
1.2.1 		Applications in Engineering.......................5
1.2.2		Applications In Medical Sciences..................5
1.2.3		Applications In the Basic Sciences................6
1.2.4		Applications In Business..........................6
1.2.5		Applications In the Political and 
		Social Sciences...................................7
1.3	The Data Mining and Knowledge Discovery Process...........7
1.3.1		Problem Definition................................7
1.3.2		Collecting the Data...............................9
1.3.3		Data Preprocessing...............................10
1.3.4		Application of the Main Data Mining and
		Knowledge Discovery Algorithms...................11
1.3.5		Interpretation of the Results of the Data 
		Mining and Knowledge Discovery Process...........12
1.4  	Four Key Research Challenges in Data Mining and 
	Knowledge Discovery......................................12
1.4.1		Collecting Observations about the Behavior 
		of a System......................................13
1.4.2  		Identifying Patterns from Collections of Data....14
1.4.3  		Which Data to Consider for Evaluation Next?......17
1.4.4  		Do Patterns Always Exist in Data?................19
1.5  	Concluding Remarks.......................................20
Move UP to the Top of the Webpage


Chapter 2
INFERRING A BOOLEAN FUNCTION FROM POSITIVE 
AND NEGATIVE EXAMPLES............................................21
2.1  	Introduction.............................................21
2.2	Some Background Information..............................22
2.3	Data Binarization........................................26
2.4   	Definitions and Terminology..............................29
2.5	Generating Clauses from Negative Examples Only...........32
2.6	Clause Inference as a Satisfiability Problem.............33
2.7	An SAT Approach for Inferring CNF Clauses................34
2.8	The One Clause At a Time (OCAT) Concept..................35
2.9   	A Branch-and-Bound Approach 
	for Inferring a Single Clause............................38
2.10   	A Heuristic for Problem Preprocessing....................45
2.11	Some Computational Results...............................47
2.12  	Concluding Remarks.......................................50
APPENDIX  	
The SAT Formulation for the Illustrative Example 
(the CNF case)...................................................52
Move UP to the Top of the Webpage


Chapter 3
A REVISED BRANCH-AND-BOUND APPROACH FOR INFERRING A
BOOLEAN FUNCTION FROM EXAMPLES...................................57
3.1	Some Background Information..............................57
3.2	The Revised Branch-and-Bound Algorithm...................57
3.2.1  		Generating a Single CNF Clause...................58
3.2.2		Generating a Single DNF Clause...................62
3.3	Some Computational Results...............................64
3.4	Concluding Remarks.......................................69
Move UP to the Top of the Webpage


Chapter 4
SOME FAST HEURISTICS FOR INFERRING A BOOLEAN FUNCTION
FROM EXAMPLES....................................................73
4.1	Some Background Information..............................73
4.2	A Fast Heuristic for Inferring a Boolean Function 
	From Complete Data.......................................75
4.3	A Fast Heuristic for Inferring a Boolean Function 
	From Incomplete Data.....................................80
4.4	Some Computational Results...............................84
4.4.1 		Results for the RA1 Algorithm on the 
		Wisconsin Cancer Data............................86
4.4.2	  	Results for the RA2 Heuristic on the 
		Wisconsin Cancer Data with Some Missing Values...91
4.4.3 		Comparison of the RA1 Algorithm and the B&B  
		Method by Using Large Random Data Sets...........92
4.5 	Concluding Remarks.......................................98
Move UP to the Top of the Webpage


Chapter 5
AN APPROACH TO GUIDED LEARNING OF BOOLEAN FUNCTIONS.............101
5.1  	Some Background Information.............................101
5.2	Problem Description.....................................104
5.3 	The Proposed Approach...................................105
5.4   	On the Number of Candidate Solutions....................110
5.5 	An Illustrative Example.................................111
5.6  	Some Computational Results..............................113 
5.7   	Concluding Remarks......................................122
Move UP to the Top of the Webpage


Chapter 6
AN INCREMENTAL LEARNING ALGORITHM FOR INFERRING 
BOOLEAN FUNCTIONS...............................................125
6.1  	Some Background Information.............................125
6.2  	Problem Description.....................................126
6.3  	Some Related Developments...............................127
6.4  	The Proposed Incremental Algorithm......................130
6.4.1		Repairing a Boolean Function that 
     		Incorrectly Rejects a Positive Example..........131
6.4.2		Repairing of a Boolean Function that 
  		Incorrectly Accepts a Negative Example..........133
6.4.3		Computational Complexity of the Algorithms 
		for the ILE Approach............................134
6.5  	Experimental Data.......................................134
6.6  	Analysis of the Computational Results...................135
6.6.1	 	Results on the Classification Accuracy..........136
6.6.2		Results on the Number of Clauses................139
6.6.3		Results on the CPU Times........................141
6.7  	Concluding Remarks......................................144
Move UP to the Top of the Webpage


Chapter 7
A DUALITY RELATIONSHIP BETWEEN BOOLEAN FUNCTIONS IN CNF 
AND DNF DERIVABLE FROM THE SAME TRAINING EXAMPLES...............147
7.1	Introduction............................................147
7.2 	Generating Boolean Functions in CNF and DNF.............147
7.3	An Example of Deriving Boolean Functions in 
	CNF and DNF.............................................148
7.4 	Some Computational Results..............................149
7.5  	Concluding Remarks......................................150
Move UP to the Top of the Webpage


Chapter 8
THE REJECTABILITY GRAPH OF TWO SETS OF EXAMPLES.................151
8.1	Introduction............................................151
8.2 	The Definition of the Rejectability Graph...............152
8.2.1	 	Properties of the Rejectability Graph...........153
8.2.2		On the Minimum Clique Cover of the 
		Rejectability Graph.............................155 
8.3   	Problem Decomposition...................................156
8.3.1		Connected Components............................156
8.3.2		Clique Cover....................................157
8.4     An Example of Using the Rejectability Graph.............158
8.5	Some Computational Results..............................160
8.6   	Concluding Remarks......................................170
Move UP to the Top of the Webpage

___________________________________________________
PART II  Application Issues 
___________________________________________________


Chapter 9
THE RELIABILITY ISSUE IN DATA MINING: THE CASE OF 
COMPUTER-AIDED BREAST CANCER DIAGNOSIS..........................173
9.1 	Introduction............................................173
9.2  	Some Background Information on Computer-Aided 
	Breast Cancer Diagnosis.................................173
9.3  	Reliability Criteria....................................175
9.4  	The Representation / Narrow Vicinity Hypothesis.........178
9.5 	Some Computational Results..............................181
9.6	Concluding Remarks......................................183
APPENDIX  I 
Definitions of the Key Attributes...............................185 
APPENDIX  II 
Technical Procedures............................................187
9.A.1	The Interactive Approach................................187
9.A.2	The Hierarchical Approach...............................188
9.A.3	The Monotonicity Property...............................188
9.A.4	Logical Discriminant Functions..........................189
Move UP to the Top of the Webpage


Chapter 10
DATA MINING AND KNOWLEDGE DISCOVERY BY MEANS OF 
MONOTONE BOOLEAN FUNCTIONS......................................191
10.1  	Introduction............................................191
10.2 	Background Information..................................193
10.2.1 		Problem Descriptions............................193
10.2.2 		Hierarchical Decomposition of Variables.........196
10.2.3 		Some Key Properties of Monotone 
		Boolean Functions...............................197
10.2.4 		Existing Approaches to Problem 1................201
10.2.5 		An Existing Approach to Problem 2...............203
10.2.6 		Existing Approaches to Problem 3................204
10.2.7 		Stochastic Models for Problem 3.................204
10.3 	Inference Objectives and Methodology....................206
10.3.1 		The Inference Objective for Problem 1...........206
10.3.2 		The Inference Objective for Problem 2...........207
10.3.3 		The Inference Objective for Problem 3...........208
10.3.4	Incremental Updates for the Fixed 					
	Misclassification Probability Model.....................208
10.3.5 		Selection Criteria for Problem 1................209
10.3.6 		Selection Criteria for 
		Problems 2.1, 2.2, and 2.3......................210
10.3.7 		Selection Criterion for Problem 3...............210
10.4 	Experimental Results....................................215
10.4.1 		Experimental Results for Problem 1..............215
10.4.2 		Experimental Results for Problem 2..............217
10.4.3		Experimental Results for Problem 3..............219
10.5 	Summary and Discussion..................................223
10.5.1 		Summary of the Research Findings................223
10.5.2 		Significance of the Research Findings...........225
10.5.3 		Future Research Directions......................226
10.6 	Concluding Remarks......................................227
Move UP to the Top of the Webpage


Chapter 11
SOME APPLICATION ISSUES ON MONOTONE BOOLEAN FUNCTIONS...........229
11.1	Some Background Information.............................229
11.2 	Expressing Any Boolean Function in Terms of 
	Monotone Ones...........................................229
11.3	Formulations of Diagnostic Problems as the Inference  
	of Nested Monotone Boolean Functions....................231
11.3.1		An Application to a Reliability 
		Engineering Problem.............................231
11.3.2		An Application to the Breast Cancer 
		Diagnosis Problem...............................232
11.4 	 	Design Problems.................................233
11.5 	 	Process Diagnosis Problems......................234
11.6	Three Major Illusions in the Evaluation of 
	the Accuracy of Data Mining Models......................234
11.6.1	 	First Illusion:  The Single Index 
		Accuracy Rate...................................235
11.6.2		Second Illusion:  Accurate Diagnosis 
		without Hard Cases..............................235
11.6.3		Third Illusion:  High Accuracy on Random 
		Test Data Only..................................236
11.7 	Identification of the Monotonicity Property.............236
11.8	Concluding Remarks......................................239
Move UP to the Top of the Webpage


Chapter 12
MINING OF ASSOCIATION RULES.....................................241
12.1  	Some Background Information.............................241
12.2  	Problem Description.....................................243
12.3 	Methodology.............................................244
12.3.1 		Some Related Algorithms.........................244
12.3.2  	Alterations to the RA1 Algorithm................245
12.4  	Computational Experiments...............................247
12.5  	Concluding Remarks......................................255
Move UP to the Top of the Webpage


Chapter 13
DATA MINING OF TEXT DOCUMENTS...................................257
13.1		Some Background Information.....................257
13.2		A Brief Description of the Document 
		Clustering Process..............................259
13.3		Using the OACT Approach to Classify 
		Text Documents..................................260
13.4		An Overview of the Vector Space Model...........262
13.5 	A Guided Learning Approach for the Classification of
		Text Documents..................................264
13.6		Experimental Data...............................265
13.7		Testing Methodology.............................267
13.7.1		The Leave-One-Out Cross Validation..............267
13.7.2		The 30/30 Cross Validation......................267
13.7.3		Statistical Performance of Both Algorithms......267
13.7.4		Experimental Setting for the Guided Learning 
		Approach........................................268
13.8	Results for the Leave-One-Out and the 30/30 Cross  
	Validations.............................................269
13.9	Results for the Guided Learning Approach................272
13.10  	Concluding Remarks......................................275
Move UP to the Top of the Webpage


Chapter  14
FIRST CASE STUDY:  PREDICTING MUSCLE FATIGUE FROM EMG SIGNALS...277
14.1 	Introduction............................................277
14.2	General Problem Description.............................277
14.3	Experimental Data.......................................279
14.4	Analysis of the EMG Data................................280
14.4.1		The Effects of Load and Electrode Orientation...280
14.4.2		The Effects of Muscle Condition, 
		Load, and Electrode.............................280
14.5	A Comparative Analysis of the EMG Data..................281
14.5.1 		Results by the OCAT / RA1 Approach..............282
14.5.2 		Results by the Fisher's Linear 
		Discriminant Analysis...........................283
14.5.3  	Results by Logistic Regression..................284
14.5.4 		A Neural Network Approach.......................285
14.6	Concluding Remarks......................................287
Move UP to the Top of the Webpage


Chapter 15
SECOND CASE STUDY:  INFERENCE OF DIAGNOSTIC RULES FOR 
BREAST CANCER...................................................289
15.1	Introduction............................................289
15.2	Description of the Data Set.............................289
15.3 	Description of the Inferred Rules.......................292
15.4 	Concluding Remarks......................................296
Move UP to the Top of the Webpage


Chapter 16
A FUZZY LOGIC APPROACH TO ATTRIBUTE FORMALIZATION: ANALYSIS 
OF LOBULATION FOR BREAST CANCER DIAGNOSIS.......................297
16.1 	Introduction............................................297
16.2 	Some Background Information on Digital
	Mammography.............................................297
16.3	Some Background Information on Fuzzy Sets...............299
16.4  		Formalization with Fuzzy Logic..................300
16.5		Degrees of Lobularity and Microlobularity.......306
16.6 	Concluding Remarks......................................308	
Move UP to the Top of the Webpage
 

Chapter 17 
CONCLUSIONS....................................................309
17.1	General Concluding Remarks.............................309
17.2	Twelve Key Areas of Potential Future Research on Data
	Mining and Knowledge Discovery from Databases..........310
17.2.1 		Overfitting and Overgeneralization.............310
17.2.2 		Guided Learning................................311
17.2.3		Stochasticity..................................311
17.2.4 		More on Monotonicity...........................311
17.2.5		Visualization..................................311
17.2.6 		Systems for Distributed Environments...........312
17.2.7 		Developing Better Exact Algorithms 
		and Heuristics.................................312
17.2.8 		Hybridization	and Other Algorithmic Issues...312
17.2.9 		Systems with Self Explanatory Capabilities.....313
17.2.10 	New Systems for Image Analysis.................313
17.2.11		Systems for Web Applications...................313
17.2.12		Developing More Applications...................314
17.3	Epilogue...............................................314
Move UP to the Top of the Webpage


References.....................................................317


Subject Index..................................................335


Author Index...................................................345


About the Author...............................................349

Move UP to the Top of the Webpage
 

Visit Dr. Triantaphyllou's Homepage
Dr. Triantaphyllou's Books / Special Issues web site     a new site!

Send suggestions / comments to Dr. E. Triantaphyllou (trianta@lsu.edu).