480. Sliding Window Median

The median is the middle value of an ordered integer list. If the size of the list is even, there is no single middle value, so the median is the mean of the two middle values.
Examples: 
[2,3,4] , the median is 3
[2,3], the median is (2 + 3) / 2 = 2.5
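
The definition above can be sketched in a few lines of C++. This is only an illustration: the helper name medianOfSorted is made up for this note, and it assumes the input is already sorted and non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Median of an already-sorted, non-empty list.
// (medianOfSorted is a hypothetical helper name, not part of the problem statement.)
double medianOfSorted(const std::vector<int>& sorted) {
    assert(!sorted.empty());
    const std::size_t n = sorted.size();
    if (n % 2 == 1)
        return sorted[n / 2];  // odd size: the single middle value
    // even size: mean of the two middle values; widen to double before adding
    // so that two large ints cannot overflow
    return (static_cast<double>(sorted[n / 2 - 1]) + sorted[n / 2]) / 2.0;
}
```

Widening to double before the addition matters for inputs close to the int limits, which is the same reason the solutions further down cast the heap tops to double.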
Given an array nums, there is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position. Your job is to output the median array for each window in the original array.
For example,
Given nums = [1,3,-1,-3,5,3,6,7], and k = 3.
Window position                Median
---------------               -----
[1  3  -1] -3  5  3  6  7       1
 1 [3  -1  -3] 5  3  6  7       -1
 1  3 [-1  -3  5] 3  6  7       -1
 1  3  -1 [-3  5  3] 6  7       3
 1  3  -1  -3 [5  3  6] 7       5
 1  3  -1  -3  5 [3  6  7]      6
Therefore, return the median sliding window as [1,-1,-1,3,5,6].
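
Before looking at the heap-based solutions, a brute-force baseline can be handy for checking results. The sketch below simply copies and sorts every window, so it costs roughly O(n * k log k). It is not the approach discussed in this post, and the function name medianSlidingWindowBrute is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Brute-force reference: sort a copy of each window and read the middle.
// (Hypothetical helper for cross-checking, not the heap-based method.)
std::vector<double> medianSlidingWindowBrute(const std::vector<int>& nums, std::size_t k) {
    assert(k >= 1 && k <= nums.size());
    std::vector<double> medians;
    for (std::size_t start = 0; start + k <= nums.size(); ++start) {
        std::vector<int> window(nums.begin() + start, nums.begin() + start + k);
        std::sort(window.begin(), window.end());
        // For odd k both indices coincide, so the mean is the middle value itself.
        const double lower = window[(k - 1) / 2];
        const double upper = window[k / 2];
        medians.push_back((lower + upper) / 2.0);
    }
    return medians;
}
```

Called with nums = [1,3,-1,-3,5,3,6,7] and k = 3, it should reproduce the table above.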
Note: 
You may assume k is always valid, i.e., k is always smaller than the input array's size for a non-empty array.


https://leetcode.com/problems/sliding-window-median/discuss/96348/Java-solution-using-two-PriorityQueues

Almost the same idea as Find Median from Data Stream https://leetcode.com/problems/find-median-from-data-stream/

1. Use two heaps to store the numbers: maxHeap for numbers smaller than the current median, and minHeap for numbers greater than or equal to the current median. A small trick is to always keep the size of minHeap equal to the size of maxHeap (when the count is even) or one element larger (when the count is odd). This makes the current median very easy to calculate.
2. Keep adding the number that enters on the right side of the sliding window and removing the number that leaves on the left side, appending the current median to the result each time.

 

Approach #2 Two Heaps! (Lazy Removal) [Accepted]

Intuition

The idea is the same as Approach #3 from 295. Find Median From Data Stream. The only additional requirement is removing the outgoing elements from the window.

Since the window elements are stored in heaps, deleting elements that are not at the top of the heaps is a pain.

Some languages (like Java) provide PriorityQueue implementations that allow removing arbitrarily placed elements. Such features are generally not efficient (Java's PriorityQueue.remove(Object) takes time linear in the queue size), nor is their portability assured.

Assuming that only the tops of heaps (and by extension the PriorityQueue class) are accessible, we need to find a way to efficiently invalidate and remove elements that are moving out of the sliding window.

At this point, an important thing to notice is that if the two heaps are balanced, only the tops of the heaps are actually needed to find the medians. This means that as long as we can somehow keep the heaps balanced, we can also keep some extraneous elements in them.

Thus, we can use hash-tables to keep track of invalidated elements. Once they reach the heap tops, we remove them from the heaps. This is the lazy removal technique.
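
The lazy removal technique can be shown in isolation on a single max-heap. The class below is a minimal sketch under stated assumptions: the name LazyMaxHeap and its members are invented for this note, and erase() assumes the caller only erases values that are actually present.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <unordered_map>

// A max-heap with lazy deletion (illustrative class, not from the original solution).
class LazyMaxHeap {
public:
    void push(int value) { heap_.push(value); ++live_; }

    // Mark one occurrence of `value` as removed; the caller guarantees it is present.
    void erase(int value) {
        assert(live_ > 0);
        ++pending_[value];  // remember the invalidated value
        --live_;
        prune();            // physically drop it only if it sits at the top
    }

    // Largest value that has not been erased.
    int top() const { assert(live_ > 0); return heap_.top(); }

    // Number of valid elements, which can be smaller than the physical heap size.
    std::size_t size() const { return live_; }

private:
    // Pop entries from the top while they are marked as removed.
    void prune() {
        while (!heap_.empty()) {
            auto it = pending_.find(heap_.top());
            if (it == pending_.end() || it->second == 0) break;
            --it->second;
            heap_.pop();
        }
    }

    std::priority_queue<int> heap_;
    std::unordered_map<int, int> pending_;  // value -> invalidated copies still in heap_
    std::size_t live_ = 0;
};
```

Because the bookkeeping is by value, duplicate numbers are interchangeable: it does not matter which physical copy gets popped later.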

An immediate challenge at this point is balancing the heaps while keeping extraneous elements. This is done by actually moving some elements to the heap which has extraneous elements, from the other heap. This cancels out the effect of having extraneous elements and maintains the invariant that the heaps are balanced.

NOTE: When we talk about keeping the heaps balanced, we are not referring to the actual heap sizes. We are only concerned with valid elements and hence when we talk about balancing heaps, we are referring to count of such elements.

Algorithm

  • Two priority queues:

    1. A max-heap lo to store the smaller half of the numbers
    2. A min-heap hi to store the larger half of the numbers
  • A hash-map or hash-table hash_table for keeping track of invalid numbers. It holds the count of the occurrences of all such numbers that have been invalidated and yet remain in the heaps.

  • The max-heap lo is allowed to store, at most, one more element than the min-heap hi. Hence if we have processed $k$ elements:

    • If $k = 2n + 1 \ (n \in \mathbb{Z})$, then lo is allowed to hold $n + 1$ elements, while hi can hold $n$ elements.
    • If $k = 2n \ (n \in \mathbb{Z})$, then both heaps are balanced and hold $n$ elements each.

    This gives us the nice property that when the heaps are perfectly balanced, the median can be derived from the tops of both heaps. Otherwise, the top of the max-heap lo holds the legitimate median.

NOTE: As mentioned before, when we talk about keeping the heaps balanced, the actual sizes of the heaps are irrelevant. Only the count of valid elements in both heaps matters.
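
The size rule and the "median from the tops" property can be sketched as two small helpers. The names splitWindow and medianFromTops are assumptions for illustration. The sketch covers only the initial window, before any lazy removal, so every element in the heaps is valid.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

using MaxHeap = std::priority_queue<int>;
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;

// Split one window so that lo holds ceil(k/2) elements and hi holds floor(k/2).
// (Hypothetical helper; assumes lo and hi start out empty.)
void splitWindow(const std::vector<int>& window, MaxHeap& lo, MinHeap& hi) {
    for (int x : window) lo.push(x);
    for (std::size_t j = 0; j < window.size() / 2; ++j) {
        hi.push(lo.top());  // move the current largest element of lo into hi
        lo.pop();
    }
}

// Median of a window of size k, read from the heap tops only.
double medianFromTops(const MaxHeap& lo, const MinHeap& hi, std::size_t k) {
    assert(!lo.empty());
    if (k % 2 == 1) return lo.top();  // odd k: lo holds the extra element
    assert(!hi.empty());
    return (static_cast<double>(lo.top()) + hi.top()) / 2.0;  // even k: mean of both tops
}
```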

  • Keep a balance factor. It indicates three situations:

    • balance = 0: Both heaps are balanced or nearly balanced.
    • balance < 0: lo needs more valid elements. Elements from hi are moved to lo.
    • balance > 0: hi needs more valid elements. Elements from lo are moved to hi.
  • Inserting an incoming number in_num:

    • If in_num is less than or equal to the top element of lo, then it can be inserted into lo. However, this unbalances hi (hi now has fewer valid elements relative to lo). Hence balance is incremented.

    • Otherwise, in_num must be added to hi. Obviously, now lo is unbalanced. Hence balance is decremented.

  • Lazy removal of an outgoing number out_num:

    • If out_num is present in lo, then invalidating this occurrence will unbalance lo itself. Hence balance must be decremented.
    • If out_num is present in hi, then invalidating this occurrence will unbalance hi itself. Hence balance must be incremented.

    • We increment the count of this element in hash_table.

    • Once an invalid element reaches either of the heap tops, we remove it and decrement its count in hash_table.
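    • C++ code (two heaps with lazy removal):
      #include <functional>
      #include <queue>
      #include <unordered_map>
      #include <vector>
      using namespace std;

      vector<double> medianSlidingWindow(vector<int>& nums, int k)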
      {
          vector<double> medians;
          unordered_map<int, int> hash_table;
          priority_queue<int> lo;                                 // max heap
          priority_queue<int, vector<int>, greater<int> > hi;     // min heap
      
          int i = 0;      // index of current incoming element being processed
      
          // initialize the heaps
          while (i < k)
              lo.push(nums[i++]);
          for (int j = 0; j < k / 2; j++) {
              hi.push(lo.top());
              lo.pop();
          }
      
          while (true) {
              // get median of current window
              medians.push_back(k & 1 ? lo.top() : ((double)lo.top() + (double)hi.top()) * 0.5);
      
              if (i >= static_cast<int>(nums.size()))
                  break;                          // break if all elements processed
      
              int out_num = nums[i - k],          // outgoing element
                  in_num = nums[i++],             // incoming element
                  balance = 0;                    // balance factor
      
              // number `out_num` exits window
              balance += (out_num <= lo.top() ? -1 : 1);
              hash_table[out_num]++;
      
              // number `in_num` enters window
              if (!lo.empty() && in_num <= lo.top()) {
                  balance++;
                  lo.push(in_num);
              }
              else {
                  balance--;
                  hi.push(in_num);
              }
      
              // re-balance heaps
              if (balance < 0) {                  // `lo` needs more valid elements
                  lo.push(hi.top());
                  hi.pop();
                  balance++;
              }
              if (balance > 0) {                  // `hi` needs more valid elements
                  hi.push(lo.top());
                  lo.pop();
                  balance--;
              }
      
              // remove invalid numbers that should be discarded from heap tops
              while (hash_table[lo.top()]) {
                  hash_table[lo.top()]--;
                  lo.pop();
              }
              while (!hi.empty() && hash_table[hi.top()]) {
                  hash_table[hi.top()]--;
                  hi.pop();
              }
          }
      
          return medians;
      }

       

    • Java code using two PriorityQueues. Note that remove(Object) is linear in the queue size, so each window step costs O(k).
      
      import java.util.Comparator;
      import java.util.PriorityQueue;

      public class Solution {
          // minHeap holds the larger half (one extra element when the count is odd)
          PriorityQueue<Integer> minHeap = new PriorityQueue<Integer>();
          // maxHeap holds the smaller half; the comparator reverses the natural order
          PriorityQueue<Integer> maxHeap = new PriorityQueue<Integer>(
              new Comparator<Integer>() {
                  public int compare(Integer i1, Integer i2) {
                      return i2.compareTo(i1);
                  }
              }
          );

          public double[] medianSlidingWindow(int[] nums, int k) {
              int n = nums.length - k + 1;
              if (n <= 0) return new double[0];
              double[] result = new double[n];

              // i runs one step past the end so the last window is also recorded
              for (int i = 0; i <= nums.length; i++) {
                  if (i >= k) {
                      result[i - k] = getMedian();
                      remove(nums[i - k]);
                  }
                  if (i < nums.length) {
                      add(nums[i]);
                  }
              }

              return result;
          }

          private void add(int num) {
              if (num < getMedian()) {
                  maxHeap.add(num);
              }
              else {
                  minHeap.add(num);
              }
              // restore: minHeap.size() == maxHeap.size() or one more
              if (maxHeap.size() > minHeap.size()) {
                  minHeap.add(maxHeap.poll());
              }
              if (minHeap.size() - maxHeap.size() > 1) {
                  maxHeap.add(minHeap.poll());
              }
          }

          private void remove(int num) {
              // remove(Object) scans the queue, so this step is linear in its size
              if (num < getMedian()) {
                  maxHeap.remove(num);
              }
              else {
                  minHeap.remove(num);
              }
              if (maxHeap.size() > minHeap.size()) {
                  minHeap.add(maxHeap.poll());
              }
              if (minHeap.size() - maxHeap.size() > 1) {
                  maxHeap.add(minHeap.poll());
              }
          }

          private double getMedian() {
              if (maxHeap.isEmpty() && minHeap.isEmpty()) return 0;

              if (maxHeap.size() == minHeap.size()) {
                  return ((double)maxHeap.peek() + (double)minHeap.peek()) / 2.0;
              }
              else {
                  return (double)minHeap.peek();
              }
          }
      }
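
For comparison, the linear-time remove() can be avoided without lazy deletion by replacing the two heaps with two ordered multisets, which support erasing an arbitrary element in O(log k). This is a different technique from both solutions above, sketched here in C++ with std::multiset. The function name medianSlidingWindowMultiset is an assumption, and the sketch follows the same convention as the Java code (hi holds the extra element when k is odd).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Two ordered multisets instead of two heaps: erase costs O(log k).
// (Illustrative alternative, not the code from the linked discussion.)
std::vector<double> medianSlidingWindowMultiset(const std::vector<int>& nums, std::size_t k) {
    assert(k >= 1 && k <= nums.size());
    std::multiset<int> lo, hi;  // lo: smaller half, hi: larger half (may hold one extra)
    std::vector<double> medians;

    // Restore hi.size() == lo.size() or hi.size() == lo.size() + 1.
    auto rebalance = [&]() {
        if (lo.size() > hi.size()) {
            auto it = std::prev(lo.end());  // largest element of lo
            hi.insert(*it);
            lo.erase(it);
        } else if (hi.size() > lo.size() + 1) {
            auto it = hi.begin();           // smallest element of hi
            lo.insert(*it);
            hi.erase(it);
        }
    };

    for (std::size_t i = 0; i < nums.size(); ++i) {
        // Incoming number goes to the half it belongs to.
        if (!hi.empty() && nums[i] >= *hi.begin()) hi.insert(nums[i]);
        else lo.insert(nums[i]);
        rebalance();

        if (i + 1 < k) continue;  // first window is not complete yet

        if (k % 2 == 1) medians.push_back(*hi.begin());
        else medians.push_back((static_cast<double>(*lo.rbegin()) + *hi.begin()) / 2.0);

        // Outgoing number: erase exactly one occurrence from the half that holds it.
        const int out = nums[i + 1 - k];
        if (out >= *hi.begin()) hi.erase(hi.find(out));
        else lo.erase(lo.find(out));
        rebalance();
    }
    return medians;
}
```

The trade-off is that a multiset is typically slower per operation than a binary heap, but every step stays logarithmic in k and no invalid elements accumulate.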

       

posted on 2018-11-08 02:19 by 猪猪🐷
