dbscan
This is a pseudocode for DBSCAN from "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise".
And I have tried the pseudocode from the WIKI page. I prefer this.
ExpandCluster(SetOfPoints, Point, ClId, Eps, MinPts) : Boolean;
seeds:=SetOfPoints.regionQuery(Point,Eps);
IF seeds.size<MinPts THEN // no core point
SetOfPoint.changeClId(Point,NOISE);
RETURN False;
ELSE // all points in seeds are density-
// reachable from Point
SetOfPoints.changeClIds(seeds,ClId);
seeds.delete(Point);
WHILE seeds <> Empty DO
currentP := seeds.first();
result := SetOfPoints.regionQuery(currentP, Eps);
IF result.size >= MinPts THEN
FOR i FROM 1 TO result.size DO
resultP := result.get(i);
IF resultP.ClId IN {UNCLASSIFIED, NOISE} THEN
IF resultP.ClId = UNCLASSIFIED THEN
seeds.append(resultP);
END IF;
SetOfPoints.changeClId(resultP,ClId);
END IF; // UNCLASSIFIED or NOISE
END FOR;
END IF; // result.size >= MinPts
seeds.delete(currentP);
END WHILE; // seeds <> Empty
RETURN True;
END IF
END; // ExpandCluster
DBSCAN (SetOfPoints, Eps, MinPts)
// SetOfPoints is UNCLASSIFIED
ClusterId := nextId(NOISE);
FOR i FROM 1 TO SetOfPoints.size DO
Point := SetOfPoints.get(i);
IF Point.ClId = UNCLASSIFIED THEN
IF ExpandCluster(SetOfPoints, Point, ClusterId, Eps, MinPts) THEN
ClusterId := nextId(ClusterId)
END IF
END IF
END FOR
END; // DBSCAN
This can cluster the data based on density(or distance) consider noise, it's no complicated to implement.
I see that the most important thing that to get good result is to get good eps and minpts. and if your data is complicate the distance of two data point have so much influence on the result.
Good luck.
浙公网安备 33010602011771号