geometric median

The geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points. This generalizes the median, which has the property of minimizing the sum of distances for one-dimensional data, and provides a central tendency in higher dimensions.

也就是说,中位数就是一个数组里到所有其他数据点的距离之和达到最小值的点。n维的也一样。

一维的中位数满足这个性质,证明的话可以用反证法。可以证明的到的是,中位数往左一点或者往右一点都会造成距离之和增加,所以中位数是到其他点的距离之和最小。

$Geometric Median =\underset{y \in \mathbb{R}^n}{\operatorname{arg\,min}} \sum_{i=1}^m \left \| x_i-y \right \|_2$

然后,问题来了。。。

Q:Given set of points in 2d grid space. Find a grid point such that sum of distance from all the points to this common point is minimum.

eg: p1: [0, 0] p2: [3, 0] p3: [0, 3]

ans: r: [0,0]

sum: 0 + 3 + 3 = 6

这题naive 方法就是$O(n^2)$,求出所有点到其他点的距离之和,再取最小。

 这里指的是曼哈顿距离。manhattan distance. 欧式距离不好求,网上人家直接用kmeans。。

参考:

  • http://stackoverflow.com/questions/12934213/how-to-find-out-geometric-median
  • http://stackoverflow.com/questions/12905663/given-list-of-2d-points-find-the-point-closest-to-all-other-points/12905913#12905913

对于曼哈顿距离,可以先通过预处理,算出在x轴上,每个点到其他x的值的距离之和,这个开销在O(nlgn+2*n)。y轴的同理。

现在我们就能够在O(1)得到所有点到其他点的距离之和(曼哈顿距离)。所以就能够在O(n)中求出最小值了。(最大值都行啊)

 1 bool compareByX(const Point &p1, const Point &p2) {
 2     return p1.x < p2.x;
 3 }
 4 
 5 bool compareByY(const Point &p1, const Point &p2) {
 6     return p1.y < p2.y;
 7 }
 8 
 9 int maxDistance(vector<Point> &points) {
10     if (points.empty()) return 0;
11     sort(points.begin(), points.end(), compareByX);
12     int n = points.size();
13     vector<int> xdistances(n, 0), ydistances(n, 0);
14     for (int i = 1; i < n; ++i) {
15         xdistances[i] = xdistances[i - 1] + i * (points[i].x - points[i - 1].x);
16     }
17     int right = 0;
18     for (int i = n - 2; i >= 0; --i) {
19         right = right + (n - i - 1) * (points[i + 1].x - points[i].x);
20         xdistances[i] += right;
21     }
22 
23     // preprocessing based on y
24     sort(points.begin(), points.end(), compareByY);
25     for (int i = 1; i < n; ++i) {
26         ydistances[i] = ydistances[i - 1] + i * (points[i].y - points[i - 1].y);
27     }
28 
29     int top = 0;
30     for (int i = n - 2; i >= 0; --i) {
31         top = top + (n - i - 1) * (points[i + 1].y - points[i].y);
32         ydistances[i] += top;
33     }
34 
35     int max = 0;
36     for (int i = 0; i < n; ++i) {
37         if (xdistances[i] + ydistances[i] > max) {
38             max = xdistances[i] + ydistances[i];
39         }
40     }
41     return max;
42 }

 q神好叼,给他mock interview的时候答出O(n)的。

posted @ 2014-10-11 19:33  linyx  阅读(712)  评论(0编辑  收藏  举报