Data Mining
Fall 2006
Chapter 8 Cluster Analysis
Zhi-Hua Zhou
Department of Computer Science

... otherwise exit

Hierarchical methods
A hierarchical method creates a hierarchical decomposition of the given set of data objects.
Two schemes:
  agglomerative: bottom-up, starts with each object forming a separate group
  divisive: top-down, starts with all the objects in the same cluster
Representatives:
  AGNES (AGglomerative NESting)
  DIANA (DIvisive ANAlysis)
  BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
  CURE (Clustering Using REpresentatives)
  ROCK (RObust Clustering using linKs)
  CHAMELEON

AGNES
AGglomerative NESting, an agglomerative method
Step 1: every object is placed into a cluster of its own
Step 2: merge the two clusters whose nearest objects have the minimum Euclidean distance
Step 3: if a single cluster containing all the objects has been reached, exit; otherwise go to Step 2

DIANA
DIvisive ANAlysis, a divisive method
Step 1: all the objects are placed in one cluster
Step 2: split the clusters according to the maximum Euclidean distance between the nearest objects in the clusters
Step 3: if each cluster contains only one object, exit; otherwise go to Step 2

Density-based methods
A density-based method creates clusters by continuing to grow a cluster so long as the density of the data objects in the neighborhood exceeds some threshold.
Representatives:
  DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
  OPTICS (Ordering Points To Identify the Clustering Structure)
  DENCLUE (DENsity-based CLUstEring)
  CLIQUE (CLustering In QUEst)
Basic idea: for each object of a cluster, the neighborhood of a given radius ε (called the ε-neighborhood) has to contain at least a minimum number of objects (MinPts).
Key concepts:
  An object P whose ε-neighborhood contains no fewer than MinPts objects is a core object with respect to ε and MinPts.
  An object M is directly density-reachable from object P with respect to ε and MinPts if M is within the ε-neighborhood of P and that neighborhood contains at least MinPts objects.
  An object Q is density-reachable from object P with respect to ε and MinPts if there is a chain of objects p1, ..., pn with p1 = P and pn = Q such that pi+1 is directly density-reachable from pi with respect to ε and MinPts.
  An object S is density-connected to object R with respect to ε and MinPts if there is an object O such that both S and R are density-reachable from O with respect to ε and MinPts.

DBSCAN

Grid-based methods
A grid-based method quantizes the object space into a finite number of cells which form a grid structure, and then performs clustering operations on the grid structure.
Representatives:
  STING (STatistical INformation Grid)
  WaveCluster
  CLIQUE (CLustering In QUEst)
STING:
  the spatial area is divided into rectangular cells
  there are usually several levels of cells, corresponding to different levels of resolution
  a cell at a high level is partitioned to form a number of cells at the next lower level

Model-based methods
A model-based method hypothesizes a model for each of the clusters, and finds the best fit of the data to that model.
Two schemes:
  statistical method: uses probability measures
  neural network method: uses competitive excitatory and inhibitory mechanisms
Representatives:
  COBWEB
  CLASSIT
  AutoClass
  Competitive learning
  SOM (Self-Organizing Maps)

To master data mining, you should read more and practice more.

Game Over
Hope you enjoy the course
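
Appendix sketch: the DBSCAN key concepts defined earlier (core object, directly density-reachable, density-reachable) can be turned into a small clustering routine. What follows is only a minimal illustration, not the course's own implementation. It assumes Euclidean distance, assumes that an object counts toward its own ε-neighborhood, and uses the hypothetical names region_query and dbscan, which do not come from the slides.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def region_query(points, i, eps):
    """Return indices of all objects in the eps-neighborhood of points[i].

    Assumption: the object itself is counted as part of its neighborhood.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each object with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core object; it may still be claimed later as a border object.
            labels[i] = NOISE
            continue
        # points[i] is a core object: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors)  # directly density-reachable from points[i]
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                # Border object: density-reachable from a core object, not core itself.
                labels[j] = cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # points[j] is also a core object, so the chain extends through it.
                frontier.extend(j_neighbors)
    return labels


# Two tight groups of four objects each, plus one isolated object.
demo_points = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11),
               (5, 5)]
demo_labels = dbscan(demo_points, eps=1.5, min_pts=3)
print(demo_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this example every object in a group of four has all four group members within distance 1.5, so each is a core object, and the two groups become clusters 0 and 1. The isolated object has only itself in its ε-neighborhood and is labeled noise. The linear scan in region_query keeps the sketch short; a spatial index would be the usual choice for larger data sets.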