Data Mining, Southeast Asia Edition
Our ability to generate and collect data has been increasing rapidly. Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has created an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge.
Like the first edition, voted the most popular data mining book by KDnuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and applications. This new edition substantially enhances the first edition, and new chapters have been added to address recent developments in mining complex types of data, including stream data, sequence data, graph-structured data, social network data, and multi-relational data.
Concept hierarchies can be used in an alternative form of data reduction where
we replace low-level data (such as raw values for age) with higher-level concepts
(such as youth, middle-aged, or senior). This form of data reduction is the topic ...
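As a rough illustration of replacing raw ages with higher-level concepts, the sketch below maps each age onto a three-level hierarchy. The cut points (29 and 59) are assumptions chosen only for the example; the excerpt does not say where youth ends or senior begins, and a real hierarchy would usually come from domain experts or from the data.

```python
# Hypothetical cut points: the excerpt does not define the boundaries between
# youth, middle-aged, and senior, so these thresholds are assumptions.
YOUTH_MAX = 29        # ages up to 29 -> "youth"
MIDDLE_AGED_MAX = 59  # ages 30..59 -> "middle-aged"; 60 and over -> "senior"


def generalize_age(age: int) -> str:
    """Replace a raw age with a higher-level concept from a 3-level hierarchy."""
    if age <= YOUTH_MAX:
        return "youth"
    if age <= MIDDLE_AGED_MAX:
        return "middle-aged"
    return "senior"


raw_ages = [13, 22, 35, 47, 52, 61, 70]
generalized = [generalize_age(a) for a in raw_ages]
print(generalized)
# ['youth', 'youth', 'middle-aged', 'middle-aged', 'middle-aged', 'senior', 'senior']

# The reduction: seven raw values collapse to at most three distinct concepts.
print(sorted(set(generalized)))  # ['middle-aged', 'senior', 'youth']
```

The generalized column carries less detail, which is the point: patterns mined over three concepts are usually easier to interpret than patterns over every individual age.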
Sometimes, each value $x_i$ in a set may be associated with a weight $w_i$, for $i = 1, \ldots, N$.
The weights reflect the significance, importance, or occurrence frequency ... A
major problem with the mean is its sensitivity to extreme (e.g., outlier) values.
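A minimal sketch of the weighted arithmetic mean, computed as $\sum_{i=1}^{N} w_i x_i$ divided by $\sum_{i=1}^{N} w_i$, followed by a small demonstration of the sensitivity to extreme values mentioned above. The function name `weighted_mean` and the salary figures are hypothetical, chosen only for illustration.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i) for i = 1..N."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical salaries (in thousands); weights act as occurrence frequencies.
salaries = [30, 36, 47, 50, 52]
counts = [2, 1, 3, 2, 2]
print(weighted_mean(salaries, counts))  # 44.1

# With unit weights this reduces to the ordinary mean.
typical = [30, 36, 47, 50, 52]
with_outlier = typical + [1000]  # one extreme value
print(weighted_mean(typical, [1] * len(typical)))            # 43.0
print(weighted_mean(with_outlier, [1] * len(with_outlier)))  # 202.5
```

A single outlier among six values pulls the mean from 43.0 to 202.5, far above every other value in the set, which is why the mean alone can be a misleading summary of skewed or noisy data.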
Data cleaning (or data cleansing) routines attempt to fill in missing values,
smooth out noise while identifying outliers, and correct inconsistencies in the
data. In this section, you will study basic methods for data cleaning. Section 2.3.1
looks at ...
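To make "smooth out noise" concrete, here is a sketch of one common smoothing technique, smoothing by bin means: sort the values, split them into equal-frequency bins, and replace each value with the mean of its bin. The function name, the bin size of 3, and the sample prices are assumptions for illustration, not taken from the excerpt.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort values, split into equal-frequency bins, replace each value by its bin mean."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]  # last bin may be smaller
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical price data; a bin size of 3 is an arbitrary choice.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Larger bins smooth more aggressively; the trade-off is that genuine variation inside a bin is lost along with the noise.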
Fill in the missing value manually: In general, this approach is time-consuming
and may not be feasible given a large data set with many missing values.
3. Use a global constant to fill in the missing value: Replace all missing attribute values ...
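A minimal sketch of the global-constant strategy, assuming missing values are represented as `None` and using the hypothetical label `"Unknown"` as the constant; the excerpt's own choice of constant is elided, so both are assumptions.

```python
MISSING = None               # assumed representation of a missing value
GLOBAL_CONSTANT = "Unknown"  # hypothetical placeholder label


def fill_with_constant(column, constant=GLOBAL_CONSTANT):
    """Replace every missing entry in a column with the same global constant."""
    return [constant if value is MISSING else value for value in column]


occupation = ["engineer", None, "teacher", None, "nurse"]
print(fill_with_constant(occupation))
# ['engineer', 'Unknown', 'teacher', 'Unknown', 'nurse']
```

The method is simple and cheap compared with filling values in manually, but every filled entry now shares one label, so a mining algorithm may treat "Unknown" as if it were a meaningful category.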
... Process. Missing values, noise, and inconsistencies contribute to inaccurate data.
So far, we have looked at techniques for handling missing data and for
smoothing data. “But data cleaning is a big job. What about data cleaning as a ...
8 Mining Stream, Time-Series, and Sequence Data
9 Graph Mining, Social Network Analysis, and Multirelational Data Mining
10 Mining Object, Spatial, Multimedia, Text, and Web Data
11 Applications and Trends in Data Mining
An Introduction to Microsoft's OLE DB for Data Mining