Pattern Recognition Algorithms for Data Mining

CRC Press, May 27, 2004, 280 pages
This valuable text addresses different pattern recognition (PR) tasks in a unified framework with both theoretical and experimental results. Tasks covered include data condensation, feature selection, case generation, clustering/classification, and rule generation and evaluation. Organized into eight chapters, the book begins by introducing PR, data mining, and knowledge discovery concepts. The authors proceed to analyze the tasks of multi-scale data condensation and dimensionality reduction. They then explore the problem of learning with support vector machines (SVMs), and conclude by highlighting the significance of granular computing for different mining tasks in a soft paradigm.

Contents

Introduction, p. 1
Multiscale Data Condensation, p. 29
Unsupervised Feature Selection, p. 59
Active Learning Using Support Vector Machine, p. 83
Rough-fuzzy Case Generation, p. 103
Rough-fuzzy Clustering, p. 123
Rough Self-Organizing Map, p. 149
Classification Rule Generation and Evaluation using Modular Rough-fuzzy MLP, p. 165
Role of Soft-Computing Tools in KDD, p. 201
Data Sets Used in Experiments, p. 211
References, p. 215
Index, p. 237
About the Authors, p. 243

Popular passages

Page 3 - Knowledge discovery in databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [1].
Page 10 - ... intelligent data analysis are not yet mature [1]. KDD can be regarded as one of the prime functions of data mining, which is a new generation of information processing technology. Data mining involves fitting models to, or determining patterns from, observed data. The fitted models play the role of inferred knowledge. Deciding whether the model reflects useful knowledge or not is a part of the overall KDD process, for which subjective human judgment is usually required. Typically, a data mining algorithm...
Page 107 - More formally, an information system can be denoted as a pair (U, A), where U is a non-empty finite set of objects called the universe and A is a non-empty finite set of attributes...
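To make the pair (U, A) concrete, here is a minimal sketch, assuming a hypothetical toy table (the objects, attributes, and values are invented for illustration). Objects that agree on every attribute in a chosen subset cannot be told apart, so grouping them yields the indiscernibility (equivalence) classes on which rough set approximations are built.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical toy information system (U, A): U holds object ids, A holds
# attribute names, and `table` gives each object's value for each attribute.
U: List[str] = ["x1", "x2", "x3", "x4", "x5"]
A: List[str] = ["headache", "temperature"]
table: Dict[str, Dict[str, str]] = {
    "x1": {"headache": "yes", "temperature": "high"},
    "x2": {"headache": "yes", "temperature": "high"},
    "x3": {"headache": "no", "temperature": "high"},
    "x4": {"headache": "no", "temperature": "normal"},
    "x5": {"headache": "no", "temperature": "normal"},
}


def indiscernibility_classes(
    universe: Sequence[str],
    attrs: Sequence[str],
    values: Dict[str, Dict[str, str]],
) -> List[List[str]]:
    """Group objects that have identical values on every attribute in attrs."""
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for obj in universe:
        key = tuple(values[obj][a] for a in attrs)
        groups[key].append(obj)
    return sorted(groups.values())


print(indiscernibility_classes(U, A, table))
# [['x1', 'x2'], ['x3'], ['x4', 'x5']]
print(indiscernibility_classes(U, ["temperature"], table))
# [['x1', 'x2', 'x3'], ['x4', 'x5']]
```

Dropping an attribute can only merge classes, never split them, which is why coarser attribute subsets give coarser partitions of U.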
Page 2 - The massive databases that we are talking about are generally characterized by the presence of not just numeric, but also textual, symbolic, pictorial and aural data. They may contain redundancy, errors, imprecision, and so on. KDD is aimed at discovering natural structures within such massive and often heterogeneous data. Therefore, PR plays a significant role in the KDD process. However, KDD is being visualized as not just being capable of knowledge discovery using generalizations and magnifications...
Page 6 - ... recursive nature of a grammar. A grammar (rewriting) rule can be applied any number of times, so it is possible to express in a very compact way some basic structural characteristics of an infinite set of sentences. Of course, the practical utility of such an approach depends upon our ability to recognize the simple pattern primitives and their relationships represented by the composition operations. The various relations or composition operations defined among subpatterns can usually be expressed...
Page 75 - SFS is a bottom-up search procedure where one feature at a time is added to the current feature set. At each stage, the feature to be included in the feature set is selected among the remaining available features which have not been added to the feature set.
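The SFS procedure described above can be sketched as a greedy loop. This is a minimal version, assuming the caller supplies a subset-scoring function (the hypothetical `criterion` parameter, higher is better); the book's own evaluation criterion is not reproduced here, and `toy_criterion` is invented purely for illustration.

```python
from typing import Callable, FrozenSet, List


def sequential_forward_selection(
    n_features: int,
    k: int,
    criterion: Callable[[FrozenSet[int]], float],
) -> List[int]:
    """Bottom-up SFS: start empty and add one feature per stage, choosing the
    remaining feature whose inclusion gives the highest criterion value."""
    selected: List[int] = []
    remaining = set(range(n_features))
    while len(selected) < k and remaining:
        current = frozenset(selected)
        # Score every feature not yet added, each tried alongside the current set.
        best = max(sorted(remaining), key=lambda f: criterion(current | {f}))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical criterion: per-feature relevance minus a penalty when the
# redundant pair (1, 3) is selected together.
relevance = [0.1, 0.9, 0.4, 0.7]


def toy_criterion(subset: FrozenSet[int]) -> float:
    penalty = 0.5 if {1, 3} <= subset else 0.0
    return sum(relevance[i] for i in subset) - penalty


print(sequential_forward_selection(4, 2, toy_criterion))  # [1, 2]
```

Feature 3 is individually stronger than feature 2 but is passed over at the second stage because the toy criterion treats it as redundant with feature 1. Because SFS never revisits a choice once made, it evaluates far fewer subsets than exhaustive search, at the cost of possibly missing the best subset.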
Page 223 - M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding interesting rules from large sets of discovered association rules.
Page 107 - ... B. Given this information, let the following problem be posed: identify the collection of infected students. Clearly, there cannot be a unique answer. But any set that is given as an answer must contain B̲ and at least one student from each class comprising B̄. In other words, it must have B̲ as its lower approximation and B̄ as its upper approximation.
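The lower and upper approximations in the passage above can be computed directly once the universe is partitioned into equivalence classes (here, school classes). A minimal sketch, assuming hypothetical student ids and class assignments: the lower approximation of a target set is the union of classes lying wholly inside it, and the upper approximation is the union of classes that overlap it.

```python
from typing import Iterable, Set, Tuple


def approximations(
    partition: Iterable[Set[str]], target: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return the (lower, upper) approximations of `target` with respect to
    the equivalence classes in `partition`."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for block in partition:
        if block <= target:  # class lies wholly inside the target set
            lower |= block
        if block & target:   # class overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical school: students s1..s6 grouped into three classes.
classes = [{"s1", "s2"}, {"s3", "s4", "s5"}, {"s6"}]

# Two different candidate sets of infected students...
answer_1 = {"s1", "s2", "s4"}
answer_2 = {"s1", "s2", "s3", "s5"}

# ...share the same approximations, so the approximations alone cannot
# single out one answer.
print(approximations(classes, answer_1) == approximations(classes, answer_2))  # True
```

This mirrors the passage's point: many sets are consistent with the same pair of approximations, which is exactly why the answer is not unique.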
Page 5 - The problem of classification is basically one of partitioning the feature space into regions, one region for each category of input.
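One simple way to see a classifier as a partition of feature space is a minimum-distance (nearest-mean) classifier, used here only as an illustration and not as a method from the book. Each category is summarized by the mean of its training points, and every point in the space falls into the region of the closest mean, so neighboring regions are separated by the perpendicular bisector between their means. The training points are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def class_means(samples: Dict[str, List[Point]]) -> Dict[str, List[float]]:
    """Average the training points of each category, coordinate by coordinate."""
    means: Dict[str, List[float]] = {}
    for label, pts in samples.items():
        dim = len(pts[0])
        means[label] = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
    return means


def classify(x: Point, means: Dict[str, List[float]]) -> str:
    """Assign x to the region (category) whose mean is nearest."""
    return min(means, key=lambda label: math.dist(x, means[label]))


# Hypothetical two-category training data in a 2-D feature space.
training = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "B": [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0)],
}
means = class_means(training)
print(classify((0.5, 0.5), means), classify((4.2, 3.9), means))  # A B
```

`math.dist` requires Python 3.8 or later.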
Page 128 - STING is a grid-based multiresolution clustering technique in which the spatial area is divided into rectangular cells. There are usually several levels of such rectangular cells corresponding to different levels of resolution, and these cells form a hierarchical structure: each cell at a high level is partitioned to form a number of cells at the next lower level. Statistical information regarding the attributes in each grid cell (such as the mean, maximum, and minimum values) is precomputed and...
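The hierarchical grid described above can be sketched as follows. This covers only the statistics-precomputation step, assuming 2-D points in the unit square, a quadtree-style split in which each cell has four children at the next lower level, and per-cell count, mean, minimum, and maximum of a single attribute; STING's query answering is not shown, and the data is hypothetical. Higher-level cells are filled by merging their children's statistics rather than by rescanning the data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class CellStats:
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, v: float) -> None:
        """Fold one raw attribute value into the cell."""
        self.count += 1
        self.total += v
        self.minimum = min(self.minimum, v)
        self.maximum = max(self.maximum, v)

    def merge(self, other: "CellStats") -> None:
        """Fold a child cell's summary into this (parent) cell."""
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)

    @property
    def mean(self) -> Optional[float]:
        return self.total / self.count if self.count else None


Cell = Tuple[int, int]


def build_grid(
    points: Iterable[Tuple[float, float, float]], levels: int
) -> List[Dict[Cell, CellStats]]:
    """points: (x, y, value) with x, y in [0, 1). Entry l of the result maps
    each non-empty cell (i, j) of the 2**l by 2**l grid at level l to its stats."""
    bottom = levels - 1
    n = 2 ** bottom
    grid: List[Dict[Cell, CellStats]] = [dict() for _ in range(levels)]
    for x, y, v in points:
        cell = (min(int(x * n), n - 1), min(int(y * n), n - 1))
        grid[bottom].setdefault(cell, CellStats()).add(v)
    # Each parent (i // 2, j // 2) aggregates its children, level by level.
    for level in range(bottom - 1, -1, -1):
        for (i, j), stats in grid[level + 1].items():
            grid[level].setdefault((i // 2, j // 2), CellStats()).merge(stats)
    return grid


grid = build_grid([(0.1, 0.1, 2.0), (0.2, 0.15, 4.0), (0.9, 0.8, 10.0)], levels=3)
root = grid[0][(0, 0)]
print(root.count, root.minimum, root.maximum)  # 3 2.0 10.0
```

Because count, sum, minimum, and maximum combine associatively, a parent's summary is exact even though it never sees the raw points.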

Authors: Sankar K. Pal and Pabitra Mitra
