Finding the Median with Priority Queues, Implementing a Queue with Stacks, and Applying ADTs

Next, suppose we would like to invent a new ADT called MedianFinder, which is a collection of integers that supports finding the median of the collection.

MedianFinder

add(x); // adds x to the collection of numbers

median(); // returns the median from a collection of numbers

Describe how you could implement this ADT by using existing Java ADTs as building blocks. What’s the most efficient implementation you can come up with?

import java.util.Comparator;
import java.util.PriorityQueue;

public class MedianFinder {
    private PriorityQueue<Integer> smallerHalf;
    private PriorityQueue<Integer> largerHalf;

    public MedianFinder() {
        smallerHalf = new PriorityQueue<>(Comparator.reverseOrder());
        largerHalf = new PriorityQueue<>();
    }

    public void add(int x) {
        if (smallerHalf.isEmpty() || x <= smallerHalf.peek()) {
            smallerHalf.add(x);
        } else {
            largerHalf.add(x);
        }
        balanceHeaps();
    }

    private void balanceHeaps() {
        if (smallerHalf.size() > largerHalf.size() + 1) {
            largerHalf.add(smallerHalf.poll());
        } else if (largerHalf.size() > smallerHalf.size()) {
            smallerHalf.add(largerHalf.poll());
        }
    }

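    public double median() {
        if (smallerHalf.isEmpty()) {
            // Both heaps are empty, so there is no median to report.
            throw new IllegalStateException("median() called on an empty collection");
        }
        if (smallerHalf.size() == largerHalf.size()) {
            // Halve each term separately so the sum cannot overflow int.
            return smallerHalf.peek() / 2.0 + largerHalf.peek() / 2.0;
        } else {
            return smallerHalf.peek();
        }
    }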

    public static void main(String[] args) {
        MedianFinder test = new MedianFinder();
        test.add(8);
        test.add(7);
        test.add(6);
        test.add(5);
        test.add(4);
        test.add(3);
        test.add(2);
        double median = test.median();
        System.out.println(median);
    }
}

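This code implements a class named `MedianFinder` that finds the median of a collection of integers. It uses two priority queues (`PriorityQueue`): `smallerHalf` stores the smaller half of the numbers and `largerHalf` stores the larger half. A priority queue reads its head element in constant time and adds or removes an element in logarithmic time.

1. `public MedianFinder()`: The constructor, which initializes the two priority queues. `smallerHalf` is a max-heap (obtained by passing `Comparator.reverseOrder()` to reverse the default min-heap ordering) that stores the smaller half of the numbers; `largerHalf` is a min-heap that stores the larger half.

2. `public void add(int x)`: Adds the integer `x` to the collection. If `smallerHalf` is empty or `x` is less than or equal to its top element, `x` goes into `smallerHalf`; otherwise it goes into `largerHalf`. The method then calls `balanceHeaps()` to rebalance the two heaps.

3. `private void balanceHeaps()`: Rebalances the heap sizes so that `smallerHalf` holds either the same number of elements as `largerHalf` or exactly one more, which lets the median be read in constant time. If `smallerHalf` is larger than `largerHalf` by 2 or more, its top element moves to `largerHalf`. Conversely, if `largerHalf` is larger than `smallerHalf`, its top element moves to `smallerHalf`.

4. `public double median()`: Computes and returns the median. If the two heaps are the same size, the median is the average of the two top elements; if `smallerHalf` has one more element than `largerHalf`, the median is the top element of `smallerHalf`.

5. `public static void main(String[] args)`: The main method, which tests `MedianFinder` by adding some integers to a `MedianFinder` object and then computing and printing the median.

Because it is built on `PriorityQueue`, this implementation adds an integer in O(log n) time and finds the median in O(1) time, so it stays efficient on large inputs.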
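To make the size invariant concrete, the following sketch records the median after every insertion, using the same inputs as the `main` method above. The class name `RunningMedianTrace` and the helper `runningMedians` are hypothetical names introduced only for this illustration; the two-heap logic is meant to mirror `MedianFinder`.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

class RunningMedianTrace {
    // Hypothetical helper: returns the median of each prefix of values.
    public static double[] runningMedians(int[] values) {
        PriorityQueue<Integer> smallerHalf = new PriorityQueue<>(Comparator.reverseOrder()); // max-heap
        PriorityQueue<Integer> largerHalf = new PriorityQueue<>();                           // min-heap
        double[] medians = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            int x = values[i];
            if (smallerHalf.isEmpty() || x <= smallerHalf.peek()) {
                smallerHalf.add(x);
            } else {
                largerHalf.add(x);
            }
            // Restore the invariant: smallerHalf has as many elements as largerHalf, or one more.
            if (smallerHalf.size() > largerHalf.size() + 1) {
                largerHalf.add(smallerHalf.poll());
            } else if (largerHalf.size() > smallerHalf.size()) {
                smallerHalf.add(largerHalf.poll());
            }
            medians[i] = (smallerHalf.size() == largerHalf.size())
                    ? (smallerHalf.peek() + largerHalf.peek()) / 2.0
                    : smallerHalf.peek();
        }
        return medians;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runningMedians(new int[] {8, 7, 6, 5, 4, 3, 2})));
        // prints [8.0, 7.5, 7.0, 6.5, 6.0, 5.5, 5.0]
    }
}
```

Each insertion triggers at most one element transfer between the heaps, which is why the per-insertion cost stays logarithmic.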

The following example shows how to create a comparator and store it in a field of a class:

import java.util.Comparator;
import java.util.PriorityQueue;

public class CustomPriorityQueueExample {
    private Comparator<String> lengthComparator; // instance field of the class

    public CustomPriorityQueueExample() {
        lengthComparator = new LengthComparator(); // create the comparator instance in the constructor
    }

    public void processElements() {
        PriorityQueue<String> pq = new PriorityQueue<>(lengthComparator);

        pq.add("Java");
        pq.add("Python");
        pq.add("C");
        pq.add("JavaScript");

        while (!pq.isEmpty()) {
            String element = pq.poll();
            System.out.println(element);
        }
    }

    private class LengthComparator implements Comparator<String> {
        @Override
        public int compare(String s1, String s2) {
            return Integer.compare(s1.length(), s2.length());
        }
    }

    public static void main(String[] args) {
        CustomPriorityQueueExample example = new CustomPriorityQueueExample();
        example.processElements();
    }
}

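In this example, the comparator `lengthComparator` is stored in a field of the class and compares strings by length.

The constructor `CustomPriorityQueueExample()` instantiates `lengthComparator`. The `processElements()` method then creates a new `PriorityQueue` and passes the comparator to it, so the strings are polled from shortest to longest.

The private inner class `LengthComparator` implements the `Comparator<String>` interface and overrides `compare()` to define the ordering between strings.

Keeping the comparator in a field lets the same instance be reused wherever it is needed, instead of creating a new comparator each time.

The `main()` method creates a `CustomPriorityQueueExample` instance and calls `processElements()` to demonstrate a `PriorityQueue` with a custom comparator.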
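For a comparator this small, the standard library's `Comparator.comparingInt` factory can stand in for a hand-written class. The sketch below is one possible variant, not the only way to do it; the class name `LengthOrderSketch`, the constant `BY_LENGTH`, and the helper `drainInOrder` are hypothetical names. A `static final` field shares one comparator across all instances of the class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class LengthOrderSketch {
    // One shared comparator: shorter strings come out of the queue first.
    static final Comparator<String> BY_LENGTH = Comparator.comparingInt(String::length);

    // Hypothetical helper: adds all words to a priority queue and polls them in priority order.
    public static List<String> drainInOrder(List<String> words) {
        PriorityQueue<String> pq = new PriorityQueue<>(BY_LENGTH);
        pq.addAll(words);
        List<String> ordered = new ArrayList<>();
        while (!pq.isEmpty()) {
            ordered.add(pq.poll());
        }
        return ordered;
    }

    public static void main(String[] args) {
        System.out.println(drainInOrder(Arrays.asList("Java", "Python", "C", "JavaScript")));
        // prints [C, Java, Python, JavaScript]
    }
}
```

The sample strings all have different lengths on purpose: a `PriorityQueue` makes no promise about the relative order of elements that compare as equal.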

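import java.util.Stack;

public class QueueExample<T> {
    // A stack is last in, first out (LIFO); a queue is first in, first out (FIFO).
    // push(e) adds an element to the back of the queue.
    // poll() removes and returns the element at the front of the queue,
    // or returns null if the queue is empty.
    // Stack.pop() removes and returns the element on top of a stack.
    private Stack<T> inbox;   // receives newly pushed elements
    private Stack<T> outbox;  // holds elements in arrival order, oldest on top

    public QueueExample() {
        inbox = new Stack<T>();
        outbox = new Stack<T>();
    }

    public T poll() {
        // Refill outbox only when it is empty; reversing inbox restores arrival order.
        if (outbox.isEmpty()) {
            while (!inbox.isEmpty()) {
                outbox.push(inbox.pop());
            }
        }
        // Return null on an empty queue, as documented above, instead of letting pop() throw.
        return outbox.isEmpty() ? null : outbox.pop();
    }

    public void push(T item) {
        inbox.push(item);
    }

    public boolean isEmpty() {
        return inbox.isEmpty() && outbox.isEmpty();
    }
}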
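A short usage run may help show why the two-stack scheme preserves FIFO order even when pushes and polls are interleaved. The variant below swaps in `ArrayDeque` for `java.util.Stack`, since the `Stack` documentation recommends `Deque` implementations for stack use; the class name `TwoStackQueue` is hypothetical, and returning `null` from `poll()` on an empty queue is an assumption taken from the comments in the code above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TwoStackQueue<T> {
    private final Deque<T> inbox = new ArrayDeque<>();   // newest element on top
    private final Deque<T> outbox = new ArrayDeque<>();  // oldest element on top

    public void push(T item) {
        inbox.push(item);
    }

    public T poll() {
        if (outbox.isEmpty()) {
            // Pouring inbox into outbox reverses it, so the oldest element ends up on top.
            while (!inbox.isEmpty()) {
                outbox.push(inbox.pop());
            }
        }
        return outbox.poll(); // Deque.poll() returns null when the deque is empty
    }

    public boolean isEmpty() {
        return inbox.isEmpty() && outbox.isEmpty();
    }

    public static void main(String[] args) {
        TwoStackQueue<Integer> q = new TwoStackQueue<>();
        q.push(1);
        q.push(2);
        System.out.println(q.poll()); // prints 1
        q.push(3);                    // pushed while 2 is still waiting in outbox
        System.out.println(q.poll()); // prints 2
        System.out.println(q.poll()); // prints 3
        System.out.println(q.poll()); // prints null
    }
}
```

Every element is pushed and popped at most twice (once per stack), so `poll()` runs in amortized constant time even though a single call may move many elements. Note that `ArrayDeque` rejects `null` elements, which keeps a `null` return value unambiguous.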

Determine whether the given integer array A contains two numbers whose sum equals the given target value k.

import java.util.HashSet;

public class SumPairChecker {
    public static boolean hasSumPair(int[] A, int k) {
        HashSet<Integer> numSet = new HashSet<>();
        for (int num : A) {
            int target = k - num;
            if (numSet.contains(target)) {
                return true;
            }
            numSet.add(num);
        }
        return false;
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5};
        int target = 7;
        boolean hasPair = hasSumPair(array, target);
        System.out.println(hasPair);
    }
}

Find the k most common words in a document. Assume that you can represent this as an array of Strings, where each word is an element in the array. You might find using multiple data structures useful.

import java.util.*;

public class KMostCommonWords {
    public static List<String> findKMostCommonWords(String[] words, int k) {
        // Count how often each word occurs.
        Map<String, Integer> wordFrequency = new HashMap<>();
        for (String word : words) {
            wordFrequency.put(word, wordFrequency.getOrDefault(word, 0) + 1);
        }

        // Min-heap ordered by frequency: the least frequent entry is at the top.
        PriorityQueue<Map.Entry<String, Integer>> minHeap =
                new PriorityQueue<>(Comparator.comparingInt(Map.Entry::getValue));

        // Keep at most k entries by evicting the least frequent one.
        for (Map.Entry<String, Integer> entry : wordFrequency.entrySet()) {
            minHeap.offer(entry);
            if (minHeap.size() > k) {
                minHeap.poll();
            }
        }

        // Entries come out in ascending frequency; inserting at the front reverses them.
        List<String> result = new ArrayList<>();
        while (!minHeap.isEmpty()) {
            result.add(0, minHeap.poll().getKey());
        }

        return result;
    }
}

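`PriorityQueue<Map.Entry<String, Integer>> minHeap` declares a priority queue (a min-heap) whose elements have type `Map.Entry<String, Integer>`.

`Map.Entry` is an interface that represents a key-value pair. Here the key is a `String` and the value is an `Integer`, so `Map.Entry<String, Integer>` is a pair with a string key and an integer value.

`PriorityQueue` is a queue implementation in Java that orders its elements by priority. Here it acts as a min-heap: the element that compares lowest sits at the head of the queue.

`minHeap` is the priority queue instance that stores these pairs, where the key is a word and the value is that word's frequency in the document. Passing `Comparator.comparingInt(Map.Entry::getValue)` as the comparator orders the pairs by ascending frequency, so the lowest-frequency entry is always at the top of the heap.

This lets `minHeap` track the k most frequent words seen so far: whenever it grows beyond k entries, the entry at the top (the least frequent of them) is removed.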

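This code finds the k most frequent words in the given array of words. A step-by-step walkthrough follows:

1. Create an empty `HashMap` named `wordFrequency` to store each word and its frequency.
2. Iterate over the input array `words`:
- If `word` is already in `wordFrequency`, `getOrDefault` returns its current frequency, and that value plus 1 is stored back.
- If `word` is not yet in `wordFrequency`, `getOrDefault` returns 0, so the word is stored with an initial frequency of 1.
3. Create a priority queue `minHeap` that stores `Map.Entry<String, Integer>` pairs ordered by ascending frequency.
4. Iterate over each entry in `wordFrequency`:
- Add the current `entry` to `minHeap`.
- If the size of `minHeap` exceeds k, call `poll` to remove the top element, which is the entry with the lowest frequency.
5. Create an empty list `result` to hold the final answer.
6. While `minHeap` is not empty, repeat:
- Call `poll` to take one element from `minHeap` (the lowest-frequency entry remaining).
- Insert that entry's key (the word) at the front of `result`, so the final list is in descending order of frequency.
7. Return `result`, which contains the k most frequent words in descending order of frequency.

Overall, the code uses a `HashMap` to count each word's frequency and a priority queue (a min-heap) of at most k entries to keep the k most frequent words. For n words of which m are distinct, counting takes O(n) time, and each distinct word costs at most one `offer` and one `poll` on a heap of at most k + 1 entries, so counting and heap maintenance together take O(n + m log k) time, with m ≤ n.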
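A small end-to-end run can make these steps concrete. The sketch below repeats the same approach in a self-contained class; the class name `TopKWordsSketch`, the method name `topK`, and the sample input are hypothetical. It differs from the code above in two labeled ways: it counts with `Map.merge`, and it appends to the result and reverses once instead of inserting at index 0. The sample frequencies are distinct among the top words because this approach does not define an order between words with equal counts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopKWordsSketch {
    public static List<String> topK(String[] words, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // store 1, or add 1 to the existing count
        }
        // Min-heap by frequency: the least frequent candidate is at the top.
        PriorityQueue<Map.Entry<String, Integer>> minHeap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            minHeap.offer(e);
            if (minHeap.size() > k) {
                minHeap.poll(); // evict the least frequent of the k + 1 candidates
            }
        }
        List<String> result = new ArrayList<>();
        while (!minHeap.isEmpty()) {
            result.add(minHeap.poll().getKey()); // collected in ascending frequency
        }
        Collections.reverse(result); // most frequent first
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"to", "be", "or", "not", "to", "be", "to"};
        System.out.println(topK(words, 2)); // prints [to, be]
    }
}
```

Here "to" occurs three times and "be" twice, so neither can ever be the minimum of three candidates, and both survive every eviction regardless of the map's iteration order.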

for (Map.Entry<String, Integer> entry : wordFrequency.entrySet()) {
    minHeap.offer(entry);
    if (minHeap.size() > k) {
        minHeap.poll();
    }
}

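This loop iterates over every entry in `wordFrequency`, adds it to the priority queue `minHeap`, and keeps the queue's size from exceeding k.

It runs as follows:

1. For each `entry` in `wordFrequency`, perform the steps below.
2. Call `minHeap.offer(entry)` to add the current `entry` to `minHeap`.
3. Check whether the queue now holds more than k elements (`minHeap.size() > k`).
- If it does, the lowest-frequency element has to be removed.
- `minHeap.poll()` removes the top element of the priority queue, which is the entry with the lowest frequency.
- As a result, the queue always holds the (up to) k most frequent entries seen so far, and lower-frequency entries are discarded.
- Once every entry in `wordFrequency` has been processed, only the k most frequent words remain in the queue.
4. Repeat steps 2 and 3 until every entry in `wordFrequency` has been processed.

After the loop, `minHeap` holds the entries for the k most frequent words, with the least frequent of them at the top. Polling repeatedly yields them in ascending order of frequency, so inserting each polled word at the front of the result list produces the k most common words in descending order of frequency.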
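The same size-bounded min-heap pattern applies to any "keep the k largest" task. As a stripped-down illustration that is not part of the original code, the hypothetical helper `kLargest` below applies it to plain integers, so the eviction step can be followed without the map entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class BoundedHeapSketch {
    // Hypothetical helper: returns the k largest values in descending order.
    public static List<Integer> kLargest(int[] values, int k) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int v : values) {
            minHeap.offer(v);
            if (minHeap.size() > k) {
                minHeap.poll(); // drop the smallest of the k + 1 candidates
            }
        }
        List<Integer> result = new ArrayList<>(minHeap); // copied in heap order, not sorted
        result.sort(Collections.reverseOrder());         // largest first
        return result;
    }

    public static void main(String[] args) {
        // Heap contents after each step: [5], [1,5], [1,5,9], then 1 is evicted, then 3 is evicted.
        System.out.println(kLargest(new int[] {5, 1, 9, 3, 7}, 3)); // prints [9, 7, 5]
    }
}
```

Because the heap never holds more than k + 1 elements, each `offer` or `poll` costs O(log k) rather than O(log n).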

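import java.util.*;

public class PopularWords {
    // Prints the k most frequent words, most frequent first, using a max-heap over all distinct words.
    public static void topFivePopularWords(String[] words, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            if (!counts.containsKey(word)) {
                counts.put(word, 1);
            } else {
                counts.put(word, counts.get(word) + 1);
            }
        }
        // Order words by descending count, so the most frequent word is at the head.
        PriorityQueue<String> pq = new PriorityQueue<>(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return counts.get(b) - counts.get(a);
            }
        });
        for (String word : counts.keySet()) {
            pq.add(word);
        }
        // Stop early if there are fewer than k distinct words.
        for (int i = 0; i < k && !pq.isEmpty(); i++) {
            System.out.println(pq.poll());
        }
    }
}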

Posted 2023-05-15 21:49 by 哎呦_不想学习哟~
