## Proceedings of the Sixth SIAM International Conference on Data Mining

The Sixth SIAM International Conference on Data Mining continues the tradition of presenting approaches, tools, and systems for data mining in fields such as science, engineering, industrial processes, healthcare, and medicine. The datasets in these fields are large, complex, and often noisy. Extracting knowledge requires the use of sophisticated, high-performance, and principled analysis techniques and algorithms, based on sound statistical foundations. These techniques in turn require powerful visualization technologies; implementations that must be carefully tuned for performance; software systems that are usable by scientists, engineers, and physicians as well as researchers; and infrastructures that support them.

### Contents

On the Necessary and Sufficient Conditions of a Meaningful Distance Function for High Dimensional | 12 |

Transform Regression and the Kolmogorov Superposition Theorem | 35 |

Deriving Private Information from Randomly Perturbed Ratings | 59 |

Automated Knowledge Discovery from Simulators | 82 |

Mining Control Flow Abnormality for Logic Error Isolation | 106 |

An Efficient Method for Generating | 130 |

K-Means Clustering over a Large Dynamic Network | 153 |

Algorithm and Analysis | 407 |

Mining Frequent Closed Itemsets Out of Core | 419 |

Local L2-Thresholding Based Data Mining in Peer-to-Peer Systems | 430 |

Collaborative Information Extraction and Mining from Multiple Web Documents | 442 |

Collaborative Document Clustering | 453 |

Cluster Description Formats, Problems and Algorithms | 464 |

Bayesian K-Means as a Maximization-Expectation Algorithm | 474 |

Cone Cluster Labeling for Support Vector Clustering | 484 |

Exploring Prototypes for Classification | 176 |

A Semantic Approach for Mining Hidden Links from Complementary and Noninteractive | 200 |

Mining Frequent Agreement Subtrees in Phylogenetic Databases | 222 |

Trend Relational Analysis and Grey-Fuzzy Clustering Method | 234 |

The Connected k-Center Problem | 246 |

Weighted Clustering Ensembles | 258 |

Clustering in the Presence of Bridge-Nodes | 270 |

A Top-Down Row Enumeration | 282 |

Mining Frequent Patterns by Differential Refinement of Clustered Bitmaps | 294 |

Discovery of Coevolving Spatial Event Sets | 306 |

Efficient Algorithms for Sequence Segmentation | 316 |

Density-Based Clustering over an Evolving Data Stream with Noise | 328 |

A Random Walks Method for Text Classification | 340 |

Efficient Mining of Temporally Annotated Sequences | 348 |

A Framework for Local Supervised Dimensionality Reduction of High Dimensional Data | 360 |

Segmentation and Dimensionality Reduction | 372 |

Probabilistic Multistate Split-Merge Algorithm for Coupling Parameter Estimates | 384 |

Item Sets That Compress | 395 |

A New Privacy-Preserving Distributed k-Clustering Algorithm | 494 |

Detecting the Change of Clustering Structure in Categorical Data Streams | 504 |

Transductive Denoising and Dimensionality Reduction Using Total Bregman Regression | 514 |

Fast Optimal Bandwidth Selection for Kernel Density Estimation | 524 |

On Approximate Solutions to Support Vector Machines | 534 |

Inference of Node Replacement Recursive Graph Grammars | 544 |

Health Monitoring of a Shaft Transmission System via Hybrid Models of PCR and | 554 |

A Systematic Cross-Comparison of Sequence Classifiers | 564 |

Graph-Based Methods for Orbit Classification | 574 |

Profiling Protein Families from Partially Aligned Sequences | 584 |

A Novel Framework for Incorporating Labeled Examples into Anomaly Detection | 594 |

Using Compression to Identify Classes of Inauthentic Texts | 604 |

Spatial Weighted Outlier Detection | 614 |

Mining Weighted Interesting Patterns with a Strong Weight and/or Support Affinity | 624 |

Finding Sequential Patterns from Massive Number of Spatiotemporal Events | 634 |
