Data Mining, Southeast Asia Edition

Elsevier, April 6, 2006 - 800 pages

Our ability to generate and collect data has been increasing rapidly. Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has created an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge.

Like the first edition, voted the most popular data mining book by KDnuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and applications. This new edition substantially enhances the first edition, and new chapters have been added to address recent developments in mining complex types of data, including stream data, sequence data, graph-structured data, social network data, and multi-relational data.

A comprehensive, practical look at the concepts and techniques you need to know to get the most out of real business data
Updates that incorporate input from readers, changes in the field, and more material on statistics and machine learning
Dozens of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projects
Complete classroom support for instructors at the www.mkp.com/datamining2e companion site


From inside the book

Page xiv
... Partitioning Methods 401 7.4.1 Classical Partitioning Methods: k-Means and k-Medoids 402 7.4.2 Partitioning Methods in Large Databases: From k-Medoids to CLARANS 407 Hierarchical Methods 408 7.5.1 Agglomerative and Divisive Hierarchical ...

Page 38
... partitions, which are processed in parallel. The results from the partitions are then merged. Moreover, the high cost of some data mining processes promotes the need for incremental data mining algorithms that incorporate database ...

Page 51
... partitioning the data into smaller subsets, computing the measure for each subset, and then merging the results in order to arrive at the measure's value for the original (entire) data set. Both sum() and count() are distributive ...
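The partition-compute-merge idea in the snippet above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the book's code: the helper names `partition`, `local_sum_count`, and `merge_sum_count` are hypothetical, and the sample values are borrowed from the binning example on page 63. Each partition computes sum() and count() independently (so the partitions could be processed in parallel), and the merged values equal those computed over the whole data set.

```python
from typing import Iterable, List, Tuple

def partition(data: List[float], k: int) -> List[List[float]]:
    """Split data into at most k contiguous, roughly equal partitions."""
    if not data or k <= 0:
        return []
    size = -(-len(data) // k)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def local_sum_count(part: List[float]) -> Tuple[float, int]:
    """Compute the distributive measures sum() and count() on one partition."""
    return sum(part), len(part)

def merge_sum_count(partials: Iterable[Tuple[float, int]]) -> Tuple[float, int]:
    """Merge per-partition results: the sum of sums and the sum of counts."""
    total, n = 0.0, 0
    for s, c in partials:
        total += s
        n += c
    return total, n

# Hypothetical sample data (sorted values from the page 63 binning example).
data = [4, 8, 15, 21, 21, 24, 25, 28, 34]
parts = partition(data, 3)
partials = [local_sum_count(p) for p in parts]  # each call is independent
total, n = merge_sum_count(partials)
mean = total / n  # derived from the two merged distributive measures
print(total, n, mean)  # 180.0 9 20.0
```

The mean itself is not merged directly; it is recomputed from the merged sum and count, which is why keeping those two measures per partition is enough.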

Page 56
... partition Thus the computation of the variance and standard deviation is scalable in large databases. Graphic ... Partitioning rules for constructing histograms for numerical attributes are discussed in Section 2.5.4. In an equal-width ...
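One way to see why variance scales to large databases is that it can be assembled from per-partition count, sum, and sum of squares, all of which merge like sum() and count(). The sketch below assumes the population-variance form var = (1/N) * sum(x^2) - ((1/N) * sum(x))^2; the snippet does not show the book's exact formula, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

def partial_moments(part: List[float]) -> Tuple[int, float, float]:
    """Per-partition count, sum, and sum of squares (each is distributive)."""
    return len(part), float(sum(part)), float(sum(x * x for x in part))

def merge_variance(partials: List[Tuple[int, float, float]]) -> Tuple[float, float]:
    """Population variance and standard deviation from merged moments."""
    n = sum(c for c, _, _ in partials)
    if n == 0:
        raise ValueError("variance of an empty data set is undefined")
    s = sum(t for _, t, _ in partials)
    sq = sum(q for _, _, q in partials)
    var = sq / n - (s / n) ** 2  # (1/N) sum(x^2) - ((1/N) sum(x))^2
    var = max(var, 0.0)          # guard against tiny negative rounding error
    return var, math.sqrt(var)

# Hypothetical partitions of the page 63 sample values.
parts = [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
var, std = merge_variance([partial_moments(p) for p in parts])
```

This one-pass formula can lose precision when the mean is large relative to the spread; merging per-partition (count, mean, sum of squared deviations) is a numerically steadier alternative with the same scalability.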

Page 63
... Partition into (equal-frequency) bins: Bin 1: 4, 8, 15; Bin 2: 21, 21, 24; Bin 3: 25, 28, 34. Smoothing by bin means: Bin ... partitioned into equal-frequency bins of size 3 (i.e., each bin contains three values). In smoothing by bin means ...
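The binning example above can be reproduced with a short sketch: sort the values, cut them into bins of equal size, and replace every value in a bin by the bin's mean. The function names are hypothetical; the values are the ones listed in the snippet. If the number of values is not a multiple of the bin size, this sketch simply leaves a smaller last bin.

```python
from typing import List

def equal_frequency_bins(values: List[float], size: int) -> List[List[float]]:
    """Sort the values and cut them into consecutive bins of `size` values."""
    ordered = sorted(values)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def smooth_by_bin_means(bins: List[List[float]]) -> List[List[float]]:
    """Replace each value in a bin by the mean of that bin."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
smoothed = smooth_by_bin_means(bins)
print(smoothed)  # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
```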


Selected pages

1 Introduction	1

2 Data Preprocessing	47

3 Data Warehouse and OLAP Technology: An Overview	105

4 Data Cube Computation and Data Generalization	157

5 Mining Frequent Patterns, Associations, and Correlations	227

6 Classification and Prediction	285

7 Cluster Analysis	383

8 Mining Stream, Time-Series, and Sequence Data	467

9 Graph Mining, Social Network Analysis, and Multirelational Data Mining	535

10 Mining Object, Spatial, Multimedia, Text, and Web Data	591

11 Applications and Trends in Data Mining	649

An Introduction to Microsoft's OLE DB for Data Mining	691

Bibliography	703

Copyright

Other editions

Data Mining: Concepts and Techniques
Jiawei Han
2006

Common terms and phrases

accuracy aggregate algorithm AllElectronics applications approach Apriori association rules attribute Bayesian categorical cells Chapter class label classification cluster analysis clustering methods concept hierarchy constraints contains correlation count crosstab cuboid data analysis data cube data marts data mining system data set data streams data warehouse database systems decision tree defined described dimension table dimensions distribution document efficient example Figure frequent itemsets frequent patterns function given graph graph mining iceberg cube input k-medoids machine learning measure minimum support mining process multidimensional multimedia data multiple multirelational neural network node objects OLAP on-line outliers partition pattern mining performed prediction pruning regression relational database represented retrieval ROLAP scalable schema Section selection sequence similar snowflake schema space spatial data specified statistical stored stream data structure subset Suppose target techniques training tuples transaction transformation tuples typically values variables vector visualization

Popular passages

Page 230 - More formally, let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of literals, called items. Let $D$ be a set of transactions, where each transaction $T$ is a set of items such that $T \subseteq I$. Each item is a binary variable representing whether an item was bought.

Appears in 64 books from 1975-2008
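The definition in this passage maps naturally onto sets. The sketch below is illustrative only: the item names and transactions are hypothetical, and the `support` helper (the fraction of transactions $T$ in $D$ that contain a given itemset) is the standard notion behind the "minimum support" threshold rather than something stated in the passage itself. It also shows the binary-variable view, in which each transaction becomes a 0/1 vector over the items in $I$.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]

def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions T in D such that itemset is a subset of T."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Hypothetical transaction database D; every T is a subset of I.
D = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "bread", "butter"}),
]
I = sorted(set().union(*D))  # the item universe, in a fixed order

# Binary-variable view: 1 if the item was bought in that transaction.
binary = [[int(item in t) for item in I] for t in D]

print(support(frozenset({"milk", "bread"}), D))  # 0.5
```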

Page iii - Stored Procedures: A Complete Guide to SQL/PSM Jim Melton Principles of Multimedia Database Systems VS Subrahmanian Principles of Database Query Processing for Advanced Applications Clement T. Yu and Weiyi Meng Advanced Database Systems Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, Richard T. Snodgrass, VS Subrahmanian, and Roberto Zicari Principles of Transaction Processing Philip A. Bernstein and Eric Newcomer Using the New DB2: IBM's Object-Relational Database System Don Chamberlin Distributed...

Appears in 46 books from 1995-2006

Page 704 - Lin, HS Sawhney, and K. Shim. Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases.

Appears in 116 books from 1987-2008

Page 328 - Figure 1, includes an input layer, one or more hidden layers, and an output layer. The nodes in each layer are connected to each node in the adjacent layer.

Appears in 142 books from 1961-2008
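The layered, fully connected structure described in this passage can be illustrated with a forward pass. This is a minimal sketch, not the book's algorithm: the logistic (sigmoid) activation, the bias terms, the random weight initialization, and the class name `FeedForwardNet` are assumptions chosen for illustration. Each node in one layer receives a weighted input from every node in the previous layer.

```python
import math
import random
from typing import List

def sigmoid(z: float) -> float:
    """Logistic activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class FeedForwardNet:
    """Input layer, hidden layer(s), and output layer, fully connected."""

    def __init__(self, layer_sizes: List[int], seed: int = 0):
        rng = random.Random(seed)
        # weights[l][j][i]: weight from node i in layer l to node j in layer l+1
        self.weights = [
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
        ]
        # one bias per node in every non-input layer
        self.biases = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for n in layer_sizes[1:]]

    def forward(self, x: List[float]) -> List[float]:
        activations = x
        for W, b in zip(self.weights, self.biases):
            # every node j sums the outputs of all nodes in the previous layer
            activations = [
                sigmoid(sum(w * a for w, a in zip(row, activations)) + bj)
                for row, bj in zip(W, b)
            ]
        return activations

net = FeedForwardNet([3, 4, 2])  # 3 inputs, one hidden layer of 4 nodes, 2 outputs
out = net.forward([1.0, 0.0, 1.0])
```

Training (for example, adjusting the weights by backpropagation) is outside the scope of this sketch; it only shows how the connectivity pattern turns inputs into outputs.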

Page 414 - Thus the time complexity of the algorithm is O(n) where n is the number of posts in a given configuration.

Appears in 28 books from 1959-2008

Page iv - Interfaces, & the Incremental Approach Michael L. Brodie and Michael Stonebraker Atomic Transactions Nancy Lynch, Michael Merritt, William Weihl, and Alan Fekete Query Processing for Advanced Database Systems Edited by Johann Christoph Freytag, David Maier, and Gottfried Vossen Transaction Processing: Concepts and Techniques Jim Gray and Andreas Reuter Understanding the New SQL: A Complete Guide Jim Melton and Alan R.

Appears in 64 books from 1993-2006

Page 731 - E. Osuna, R. Freund, and F. Girosi, "An improved training algorithm for support vector machines."

Appears in 42 books from 1990-2007

Page 735 - G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases.

Appears in 41 books from 1995-2007

Page 706 - B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, Models and issues in data stream systems, in Proceedings of the 2002 ACM Symposium on Principles of Database Systems, June 2002, pp.

Appears in 62 books from 1994-2007

Page 718 - M. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential pattern mining with regular expression constraints.

Appears in 24 books from 1995-2007

References to this book

Geographic Data Mining and Knowledge Discovery
Harvey J. Miller, Jiawei Han
2003

Statistik
Günter Bamberg, Franz Baur, Michael Krapp
2008


About the author (2006)

Jiawei Han is Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Well known for his research in the areas of data mining and database systems, he has received many awards for his contributions in the field, including the 2004 ACM SIGKDD Innovations Award. He has served as Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data, and on editorial boards of several journals, including IEEE Transactions on Knowledge and Data Engineering and Data Mining and Knowledge Discovery.

Jian Pei is currently a Canada Research Chair (Tier 1) in Big Data Science and a Professor in the School of Computing Science at Simon Fraser University. He is also an associate member of the Department of Statistics and Actuarial Science. He is a leading researcher in the general areas of data science, big data, data mining, and database systems. His expertise is in developing effective and efficient data analysis techniques for novel data-intensive applications. He is recognized as a Fellow of the Association for Computing Machinery (ACM) for his "contributions to the foundation, methodology and applications of data mining" and as a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) for his "contributions to data mining and knowledge discovery." He is the editor-in-chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE), a director of the Special Interest Group on Knowledge Discovery in Data (SIGKDD) of the Association for Computing Machinery (ACM), and a general co-chair or program committee co-chair of many premier conferences.

Micheline Kamber is a researcher with a passion for writing in easy-to-understand terms. She has a master's degree in computer science (specializing in artificial intelligence) from Concordia University, Canada.

Bibliographic information

Title	Data Mining, Southeast Asia Edition (The Morgan Kaufmann Series in Data Management Systems)
Authors	Jiawei Han, Jian Pei, Micheline Kamber
Edition	2
Publisher	Elsevier, 2006
ISBN	0080475582, 9780080475585
Length	800 pages


