Find Median from Data Stream

The median is the middle value in an ordered integer list. If the size of the list is even, there is no single middle value, so the median is the mean of the two middle values.

Examples: 

[2,3,4], the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

  • void addNum(int num) - Add an integer from the data stream to the data structure.
  • double findMedian() - Return the median of all elements so far.

For example:

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3)
findMedian() -> 2
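
As a point of comparison for the heap-based solution discussed further down, the definition above can be followed almost literally by keeping all numbers in a sorted list. The sketch below is only a baseline and not the approach this post recommends; the class name `SortedListMedian` is made up for illustration.

```python
import bisect


class SortedListMedian:
    """Baseline: keep every number seen so far in a sorted Python list."""

    def __init__(self):
        self.nums = []

    def addNum(self, num):
        # Binary search finds the slot in O(log n), but inserting into a
        # list shifts the elements after it, so each call is O(n) overall.
        bisect.insort(self.nums, num)

    def findMedian(self):
        # Assumes at least one number has been added.
        n = len(self.nums)
        mid = n // 2
        if n % 2:
            return float(self.nums[mid])
        return (self.nums[mid - 1] + self.nums[mid]) / 2.0
```

Reading the median is O(1) here, but the O(n) insertion is the cost that a pair of heaps avoids.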

A classic problem: finding the median of a data stream.

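剑指offer analyzes this problem in detail. Starting from the choice of data container (judged by operation complexity), it settles on a combination of a max-heap and a min-heap. Each insertion costs O(log n) and reading the median costs O(1), which is among the better trade-offs across the candidate data structures.

The hard part is insertion. The sizes of the max-heap and the min-heap must never differ by more than 1, so insertions alternate: whichever heap grew on the previous insertion, the other heap grows on the current one. This keeps the two halves balanced. In addition, the max-heap holds the left (smaller) half of the numbers and the min-heap holds the right (larger) half, so every number in the min-heap must be greater than or equal to every number in the max-heap. Suppose we choose to grow the min-heap when the current count is even. We first push the new number onto the max-heap and then move the max-heap's top element into the min-heap, which preserves the ordering between the two heaps. When the current count is odd and the max-heap should grow, the procedure is symmetric: push onto the min-heap first, then move its top element into the max-heap.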
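
One Python-specific detail: the standard `heapq` module only provides a min-heap, so a max-heap is commonly simulated by storing negated values. A minimal sketch of that trick, with variable names chosen only for illustration:

```python
import heapq

low = []  # simulated max-heap: every value is stored negated
for x in [5, 1, 8, 3]:
    heapq.heappush(low, -x)

largest = -low[0]             # peek at the maximum without removing it
popped = -heapq.heappop(low)  # remove and return the maximum
next_largest = -low[0]        # the maximum of what remains
```

Here `largest` and `popped` are both 8, and `next_largest` is 5. Negating on the way in and again on the way out is all that is needed.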

The code is as follows:

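import heapq


class MedianFinder:
    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.minheap = []  # right (larger) half of the numbers
        self.maxheap = []  # left (smaller) half, stored negated since heapq is a min-heap

    def addNum(self, num):
        """
        Adds a num into the data structure.
        :type num: int
        :rtype: None
        """
        if (len(self.maxheap) + len(self.minheap)) % 2:
            # Odd count so far: maxheap should grow. Push onto minheap first,
            # then move minheap's smallest value over to maxheap.
            heapq.heappush(self.minheap, num)
            heapq.heappush(self.maxheap, -heapq.heappop(self.minheap))
        else:
            # Even count so far: minheap should grow. Push onto maxheap first,
            # then move maxheap's largest value over to minheap, so every value
            # in minheap stays >= every value in maxheap.
            heapq.heappush(self.maxheap, -num)
            heapq.heappush(self.minheap, -heapq.heappop(self.maxheap))

    def findMedian(self):
        """
        Returns the median of current data stream
        :rtype: float
        """
        if (len(self.maxheap) + len(self.minheap)) % 2:
            # Odd count: minheap holds the extra element, which is the median.
            return float(self.minheap[0])
        else:
            # Even count: average the two middle values (maxheap[0] is negated).
            return (self.minheap[0] - self.maxheap[0]) / 2.0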
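
To see how the two halves evolve, the sketch below replays the same insertion rule as a standalone function and records both halves after every step. The function name `trace_two_heaps` and its return format are assumptions made for this illustration; they are not part of the solution above.

```python
import heapq


def trace_two_heaps(stream):
    """Return (low_half, high_half, median) after each insertion."""
    low, high = [], []  # low: negated values (max-heap); high: min-heap
    steps = []
    for num in stream:
        if (len(low) + len(high)) % 2:
            # Odd count so far: low grows, routed through high.
            heapq.heappush(high, num)
            heapq.heappush(low, -heapq.heappop(high))
        else:
            # Even count so far: high grows, routed through low.
            heapq.heappush(low, -num)
            heapq.heappush(high, -heapq.heappop(low))
        if len(high) > len(low):
            median = float(high[0])
        else:
            median = (high[0] - low[0]) / 2.0
        # Store sorted copies so the halves are easy to read.
        steps.append((sorted(-x for x in low), sorted(high), median))
    return steps


for low_half, high_half, median in trace_two_heaps([1, 2, 3]):
    print(low_half, high_half, median)
# [] [1] 1.0
# [1] [2] 1.5
# [1] [2, 3] 2.0
```

The medians after the second and third insertions match the 1.5 and 2 from the example in the problem statement, and at every step the high half holds either the same number of elements as the low half or exactly one more.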

 

posted on 2016-07-08 16:31  Sheryl Wang