Algorithms, Chapter 2: Lab Practice Report

1. Question

2. Analysis / Problem description
The task is to find the median of the union of two non-descending sequences of the same length in \(O(\log N)\) time.

The input contains 3 lines: the length \(N\) (\(0<N\leq100000\)) of the sequences on the first line, and the two non-descending sequences of length \(N\) on the second and third lines respectively, with their int elements separated by spaces.

The output is the median.

And here's how we can solve the problem.

Without the \(O(\log N)\) requirement, the easiest way to solve the problem is to merge the two sorted lists and take the element at position \(\lfloor\frac{M+1}{2}\rfloor\) of the merged list, where \(M = 2N\) is its length. The median is therefore the \(N\)-th element of the merged list.
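The merge-based baseline can be sketched as follows. This is only an illustration, not the code submitted for the task: the helper name `medianByMerge` and the use of `std::vector` are assumptions made here. Since only the first \(N\) elements of the merged order are needed, the sketch stops there instead of building the whole list.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original report): median of the union
// of two non-descending sequences of equal length n, found by merging.
// Assumes a.size() == b.size() and n > 0. Runs in O(n) time.
int medianByMerge(const std::vector<int>& a, const std::vector<int>& b)
{
	const std::size_t n = a.size();
	std::size_t i = 0, j = 0;
	int last = 0;
	// Walk through the merged order until n elements have been taken;
	// the n-th one (1-based) is the lower median of the 2n-element union.
	for (std::size_t taken = 0; taken < n; ++taken)
	{
		if (j >= n || (i < n && a[i] <= b[j]))
			last = a[i++];
		else
			last = b[j++];
	}
	return last;
}
```

For sample 1, `medianByMerge({1, 3, 5, 7, 9}, {2, 3, 4, 5, 6})` evaluates to 4. A function like this is also handy as a reference when testing a faster solution.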

Taking the \(O(\log N)\) requirement into account, we have to find another way. The task asks for the MEDIAN, that is, the MIDDLE element of the union, within \(O(\log N)\) time. Binary search is also about the MIDDLE and \(\log N\), so it is a likely candidate.

Unlike the original binary search, which processes one sequence at each step, we process the two sequences together in each recursion to meet the \(\log N\) requirement.

3. Algorithm description
Assume that the first sequence is seqA and the second seqB. Using the binary search, the key algorithm to find the median may be:

Calculate the median of seqA (medianA)
Calculate the median of seqB (medianB)
if in both sequences the first element is not less than the last element (each sequence holds a single value)
  return the smaller of the two first elements
else
  if medianA == medianB
    the median is medianA (or medianB)
  else
    for the sequence with the greater median
      take its left part (including the median) as the new sequence
    for the sequence with the smaller median
      take its right part (including the median) as the new sequence

Concretely, let's take sample 1 for an example.

The original state of the two sequences of sample 1 is:

seqA: 1 3 5 7 9
seqB: 2 3 4 5 6

Here medianA is \(5\) and medianB is \(4\). Since \(5 > 4\), we take the left part of seqA and the right part of seqB.

seqA: [1 3 5] 7 9 (median: 5)
seqB: 2 3 [4 5 6] (median: 4)

↓

seqA_1: 1 3 5
seqB_1: 4 5 6

Then we calculate the two medians again and apply the same strategy:

seqA_1: 1 [3 5] (median: 3)
seqB_1: [4 5] 6 (median: 5)

↓

seqA_2: 3 5
seqB_2: 4 5

And then

seqA_2: [3 5] (median: 3)
seqB_2: [4] 5 (median: 4)

↓

seqA_3: 3 5
seqB_3: 4

In the step from seqA_2, seqB_2 to seqA_3, seqB_3, the resulting sequences no longer have the same length. The strategy above works well for sequences of odd length, but not for sequences of even length. To avoid this problem, when the length is even we also omit the smaller of the two medians (here the 3 in seqA_2), which does not affect the result. The subsequences then become:

seqA_3: 5
seqB_3: 4

Now, for seqA_3:

5 (the first element) == 5 (the last element)

and for seqB_3:

4 (the first element) == 4 (the last element)

Thus the smaller element of {4, 5}, that is, 4, is returned as the median.

But why should we pick the smaller one as the median? Here's the explanation.

To calculate the median of the union of two 1-length sequences, we should merely take the smaller element from the sequences, since the union of the two 1-length sequences will be like \(\min\{a, b\}, \max\{a, b\}\), and the median, which is the first element here, is \(\min\{a, b\}\). Generally speaking, the median of a 2-length non-descending sequence is its first element, that is, the element on the left.

Concretely, for the example above, the union of the two sequences will be

4 5

According to the method, we pick the smaller one, 4, as the median. Let's check that this is correct. The position of the median in a list of length 2 is \(\lfloor\frac{2 + 1}{2}\rfloor = 1\), so the median is the first element, which is indeed the smaller of the two numbers.

From the process above we see omitting the smaller median of the original sequence really does not affect the result. Let's find out the reason.

Let's assume that we have 2 non-descending sequences with the length of 2N:

a_1 a_2 ... a_N ... a_2N
b_1 b_2 ... b_N ... b_2N

Let's say a_N < b_N and N > 2. According to the strategy introduced above:

a_1 a_2 ... [a_N ... a_2N] (length of the bracketed part: N + 1)
[b_1 b_2 ... b_N] ... b_2N (length of the bracketed part: N)

↓

a_N a_N+1 ... a_2N (length: N+1)
      b_1 ... b_N  (length: N)

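Then the next step is to calculate the medians of the two subsequences. Dropping a_N does not change the final answer. Since a_N < b_N, only a_1 ... a_N-1 and b_1 ... b_N-1 (at most 2N - 2 elements) can precede a_N in the union, so a_N sits at position 2N - 1 or earlier, while the median of the 4N-element union sits at position 2N. Once a_N is dropped, N elements have been removed from each end of the union (a_1 ... a_N below the median and b_N+1 ... b_2N above it), so the median of what remains is still the median of the whole.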

Then let's consider the condition that N = 1. Now the sequences will be:

a_1 a_2
b_1 b_2

Since a_1 < b_1,

[a_1 a_2]
[b_1] b_2

↓

a_1 a_2
b_1

And since

a_1 < a_2
a_1 < b_1

The union will be one of:

a_1 a_2 b_1
a_1 b_1 a_2

Therefore, a_1 cannot be the median, whatever the values of a_2 and b_1, and thus can be omitted.

Let's gain a deeper understanding by exploring the second sample.

The original sequences are:

seqA: -100 -10 1 1 1 1 (median: 1)
seqB: -50    0 2 3 4 5 (median: 2)

And after the first recursion, the two subsequences will be like:

seqA_1: 1 1 1 1
seqB_1: -50 0 2

Since the sequences being split have an even length (6), we should omit the smaller of the two medians, which is 1 here.

seqA_1:   1 1 1
seqB_1: -50 0 2

Since the two sequences have the same length now, we can continue the process.

seqA_1:  [1 1] 1  (median: 1)
seqB_1: -50 [0 2] (median: 0)

↓

seqA_2:  [1] 1  (median: 1)
seqB_2:  0 [2]  (median: 0)

↓  (omit the smaller median)

seqA_3:  1
seqB_3:  2

Therefore, the median is the smaller of {1, 2}, that is, 1.

With discussions above, our algorithm now will be:

Calculate the median of seqA (medianA)
Calculate the median of seqB (medianB)
if in both sequences the first element is not less than the last element (each sequence holds a single value)
  return the smaller of the two first elements
else
  if medianA == medianB
    the median is medianA (or medianB)
  else
    for the sequence with the greater median
      take its left part (including the median) as the new sequence
    for the sequence with the smaller median
      if the sequence has an odd length
        take its right part (including the median) as the new sequence
      else if the sequence has an even length
        take its right part (excluding the median) as the new sequence

And below is the implementation of the algorithm written in C++.

The recursion part:

int biSearchMedian(int *a, int aLeft, int aRight, int *b, int bLeft, int bRight)
{
	// calculate the 2 medians of the 2 sequences
	int medianA = (aRight + aLeft) / 2;
	int medianB = (bRight + bLeft) / 2;

	// each range holds a single value: the median of the union is the smaller one
	if (a[aLeft] >= a[aRight] && b[bLeft] >= b[bRight])
		return a[aLeft] < b[bLeft] ? a[aLeft] : b[bLeft];
	else
	{
		// find the median recursively
		if (a[medianA] == b[medianB])
			return a[medianA];
		else if (a[medianA] < b[medianB])
		{
			if ((aRight - aLeft + 1) % 2 == 1)  // an odd length
				return biSearchMedian(a, medianA, aRight, b, bLeft, medianB);
			else  // an even length
				return biSearchMedian(a, medianA + 1, aRight, b, bLeft, medianB);
		}
		else
		{
			if ((bRight - bLeft + 1) % 2 == 1)  // an odd length
				return biSearchMedian(a, aLeft, medianA, b, medianB, bRight);
			else  // an even length
				return biSearchMedian(a, aLeft, medianA, b, medianB + 1, bRight);
		}
	}
}

And the complete program:

#include<iostream>
using namespace std;

int biSearchMedian(int *a, int aLeft, int aRight, int *b, int bLeft, int bRight)
{
	// calculate the 2 medians of the 2 sequences
	int medianA = (aRight + aLeft) / 2;
	int medianB = (bRight + bLeft) / 2;

	// each range holds a single value: the median of the union is the smaller one
	if (a[aLeft] >= a[aRight] && b[bLeft] >= b[bRight])
		return a[aLeft] < b[bLeft] ? a[aLeft] : b[bLeft];
	else
	{
		// find the median recursively
		if (a[medianA] == b[medianB])
			return a[medianA];
		else if (a[medianA] < b[medianB])
		{
			if ((aRight - aLeft + 1) % 2 == 1)  // an odd length
				return biSearchMedian(a, medianA, aRight, b, bLeft, medianB);
			else  // an even length
				return biSearchMedian(a, medianA + 1, aRight, b, bLeft, medianB);
		}
		else
		{
			if ((bRight - bLeft + 1) % 2 == 1)  // an odd length
				return biSearchMedian(a, aLeft, medianA, b, medianB, bRight);
			else  // an even length
				return biSearchMedian(a, aLeft, medianA, b, medianB + 1, bRight);
		}
	}
}

int a[100000];
int b[100000];
int main(void)
{
	int n;  // the length
	cin >> n;

	// receive two sequences
	for (int i = 0; i < n; i++)
		cin >> a[i];
	for (int i = 0; i < n; i++)
		cin >> b[i];

	// calculate the median using the binary search algorithm
	int median = biSearchMedian(
		a, 0, n - 1,
		b, 0, n - 1);

	// output the median
	cout << median;

	return 0;
}
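One way to gain confidence in the boundary handling is to compare the recursion with a brute-force reference on many small random inputs. The sketch below is an illustration only: `medianBySorting` and `countMismatches` are hypothetical helpers introduced here, and `biSearchMedian` is repeated (with the same logic as above, slightly flattened) only so that the block compiles on its own.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Same logic as the report's biSearchMedian, repeated for self-containment.
int biSearchMedian(int *a, int aLeft, int aRight, int *b, int bLeft, int bRight)
{
	int medianA = (aRight + aLeft) / 2;
	int medianB = (bRight + bLeft) / 2;

	if (a[aLeft] >= a[aRight] && b[bLeft] >= b[bRight])
		return a[aLeft] < b[bLeft] ? a[aLeft] : b[bLeft];
	if (a[medianA] == b[medianB])
		return a[medianA];
	if (a[medianA] < b[medianB])
	{
		if ((aRight - aLeft + 1) % 2 == 1)  // an odd length
			return biSearchMedian(a, medianA, aRight, b, bLeft, medianB);
		return biSearchMedian(a, medianA + 1, aRight, b, bLeft, medianB);
	}
	if ((bRight - bLeft + 1) % 2 == 1)  // an odd length
		return biSearchMedian(a, aLeft, medianA, b, medianB, bRight);
	return biSearchMedian(a, aLeft, medianA, b, medianB + 1, bRight);
}

// Hypothetical reference: sort the union and take its n-th element (1-based).
int medianBySorting(const std::vector<int>& a, const std::vector<int>& b)
{
	std::vector<int> all(a);
	all.insert(all.end(), b.begin(), b.end());
	std::sort(all.begin(), all.end());
	return all[a.size() - 1];
}

// Hypothetical checker: runs `rounds` random cases and returns how many
// times the two functions disagree (0 means no mismatch was found).
int countMismatches(int rounds, unsigned seed)
{
	std::mt19937 rng(seed);
	std::uniform_int_distribution<int> lengthDist(1, 12);
	std::uniform_int_distribution<int> valueDist(-5, 5);  // small range forces ties
	int mismatches = 0;
	for (int r = 0; r < rounds; ++r)
	{
		const int n = lengthDist(rng);
		std::vector<int> a(n), b(n);
		for (int& x : a) x = valueDist(rng);
		for (int& x : b) x = valueDist(rng);
		std::sort(a.begin(), a.end());
		std::sort(b.begin(), b.end());
		if (biSearchMedian(a.data(), 0, n - 1, b.data(), 0, n - 1) != medianBySorting(a, b))
			++mismatches;
	}
	return mismatches;
}
```

Calling `countMismatches(2000, 12345u)` from a test harness should report 0. The small value range is chosen on purpose so that equal medians and runs of repeated values occur often.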

4. Another solution
When the length of the two sequences is even, there is another way to handle the problem. Let's call the solution discussed so far solution A, and the new one solution B. Instead of omitting the median of the longer subsequence, solution B adds one more element to the shorter subsequence, namely the element next to the median in the original sequence. However, it is less efficient: it needs more steps to find the median, and it requires special handling when the length of the sequences is 1.

Take the second sample as an example. The original sequences are:

seqA: -100 -10 1 1 1 1
seqB: -50    0 2 3 4 5

And according to the strategy discussed above

seqA: -100 -10 [1 1 1 1] (median: 1)
seqB: [-50    0 2] 3 4 5 (median: 2)

↓

seqA_1: 1 1 1 1
seqB_1: -50 0 2

And the solution B here adds 3 to the second subsequence:

seqA_1:   1 1 1 1
seqB_1: -50 0 2 3

The problem with this solution is that the subsequences can never become shorter than 2. When the length of the sequences is 2, calculating the new medians and splitting yields one subsequence of length 1 and another of length 2, and the shorter one is then padded back to length 2. Concretely, if we continue the process of the second sample with solution B:

seqA_1:   [1 1] 1 1
seqB_1: -50 [0 2 3]

↓

seqA_2: 1 1
seqB_2: 0 2 3

According to the strategy of the solution B, we add one more 1 to the first sequence:

seqA_2: 1 1 1
seqB_2: 0 2 3

And continue:

seqA_2: 1 [1 1]
seqB_2: [0 2] 3

↓

seqA_3: [1] 1
seqB_3: [0 2]

And the length of the sequences will remain 2: since \(0 < 1\), [0 2] becomes the subsequence of the second sequence and [1] the subsequence of the first sequence, and according to solution B a 1 is then added to the first subsequence, leaving the lengths unchanged. Thus, the recursion will be infinite and a stack overflow error will be raised.

To avoid this problem, solution B has to stop the recursion when the length of the sequences is 2 and find the median among the 4 remaining elements directly. Here's the code of solution B written in C++:

int biSearchMedian(int *a, int aLeft, int aRight, int *b, int bLeft, int bRight)
{
	// calculate the 2 medians of the 2 sequences
	int medianA = (aRight + aLeft) / 2;
	int medianB = (bRight + bLeft) / 2;

	if (aRight - aLeft == 1 || bRight - bLeft == 1)  // notice the condition
	{
		// notice the handling of the 2-length sequences:
		// find the median directly if the length of the subsequences is 2
		if (a[aLeft] < b[bLeft])
			return a[aRight] < b[bLeft] ? a[aRight] : b[bLeft];
		else
			return b[bRight] < a[aLeft] ? b[bRight] : a[aLeft];
	}
	else
	{
		// find the median recursively
		if (a[medianA] == b[medianB])
			return a[medianA];
		else if (a[medianA] < b[medianB])
		{
			if ((aRight - aLeft + 1) % 2 == 1)  // an odd length
				return biSearchMedian(a, medianA, aRight, b, bLeft, medianB);
			else  // an even length
				return biSearchMedian(a, medianA, aRight, b, bLeft, medianB + 1);
		}
		else
		{
			if ((bRight - bLeft + 1) % 2 == 1)  // an odd length
				return biSearchMedian(a, aLeft, medianA, b, medianB, bRight);
			else  // an even length
				return biSearchMedian(a, aLeft, medianA + 1, b, medianB, bRight);
		}
	}
}
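The 2-length base case above returns the second smallest of the four remaining elements, which is the lower median of a 4-element union. A minimal standalone sketch of that step, using a hypothetical helper `medianOfTwoPairs` that is not part of the report's code:

```cpp
#include <algorithm>

// Hypothetical helper: lower median (2nd smallest) of {a0, a1, b0, b1},
// assuming a0 <= a1 and b0 <= b1.
int medianOfTwoPairs(int a0, int a1, int b0, int b1)
{
	// The smallest of the four is the smaller head. The 2nd smallest is
	// then either the partner of that head or the other pair's head.
	if (a0 < b0)
		return std::min(a1, b0);
	return std::min(b1, a0);
}
```

For the pairs reached in the second sample, `medianOfTwoPairs(1, 1, 0, 2)` evaluates to 1, matching the result of solution A.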

Also, the solution B has to handle a case where the length of the input sequences is 1:

	if (n == 1)
	{
		int median = a[0] < b[0] ? a[0] : b[0];  // take the smaller one as the median
		cout << median;

		return 0;
	}

Since this case should not be part of the recursion, it is handled in the main function.

Therefore, we conclude that the solution B is more complicated than the solution A we discussed above. The different processes of these two solutions are mainly caused by their different strategies when encountering sequences with an even length. Solution A omits the median in the longer subsequence (i.e. the smaller median) and solution B increases the length of the shorter subsequence.

5. T(n) and S(n) / Time and space complexity analysis (with derivation)
Like the binary search, the algorithm has a time complexity of \(O(\log N)\) and uses \(O(1)\) auxiliary space.

Time complexity:
Divide: Split the sequences into 2 halves, thus \(O(1)\).
Conquer: The same as the binary search, thus \(T(\frac{N}{2})\).
Merge: Not needed here.
Thus, by the master theorem with \(a = 1\), \(b = 2\) and \(f(N) = O(1) = O(N^{\log_2 1})\):

\[\begin{aligned} T(N) &= T_{conquer} + T_{divide + merge} \\ &= T(\frac{N}{2}) + O(1) \\ &= O(N^{\log_2 1} \log N) \\ &= O(N^0 \log N) \\ &= O(\log N) \end{aligned} \]
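
The same bound can be checked by unrolling the recurrence directly. Assume, for illustration, that each level costs at most a constant \(c\) (a symbol introduced here) and that \(N\) is a power of 2:

\[\begin{aligned} T(N) &= T(\frac{N}{2}) + c \\ &= T(\frac{N}{4}) + 2c \\ &= \cdots \\ &= T(1) + c \log_2 N \\ &= O(\log N) \end{aligned} \]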

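Space complexity:
The algorithm allocates no extra arrays, and each call only uses a constant number of local variables. Every recursive call is a tail call, so the recursion can be replaced by a loop, and the auxiliary space is

\[\begin{aligned} S(N) &= O(1) \end{aligned} \]

This is the same as the binary search algorithm. Note that if the compiler does not eliminate the tail calls, the recursive version as written also occupies \(O(\log N)\) stack frames.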
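
Because every recursive call in solution A is a tail call, the recursion can be turned into a loop, which makes the \(O(1)\) space bound hold without relying on the compiler. The following is a sketch under that assumption. The function name `medianIterative` and its signature are introduced here for illustration and are not part of the submitted code.

```cpp
// Hypothetical loop form of solution A (not from the original report).
// a and b point to non-descending arrays of the same length n, n > 0.
int medianIterative(const int *a, const int *b, int n)
{
	int aLeft = 0, aRight = n - 1;
	int bLeft = 0, bRight = n - 1;

	// Stop once each range holds a single value.
	while (!(a[aLeft] >= a[aRight] && b[bLeft] >= b[bRight]))
	{
		const int medianA = aLeft + (aRight - aLeft) / 2;
		const int medianB = bLeft + (bRight - bLeft) / 2;

		if (a[medianA] == b[medianB])
			return a[medianA];

		// Both ranges always have the same length, so one parity check suffices.
		const bool evenLength = (aRight - aLeft + 1) % 2 == 0;
		if (a[medianA] < b[medianB])
		{
			// Keep the right part of a (dropping its median if the length
			// is even) and the left part of b.
			aLeft = medianA + (evenLength ? 1 : 0);
			bRight = medianB;
		}
		else
		{
			// Mirror image: left part of a, right part of b.
			aRight = medianA;
			bLeft = medianB + (evenLength ? 1 : 0);
		}
	}
	return a[aLeft] < b[bLeft] ? a[aLeft] : b[bLeft];
}
```

Each iteration roughly halves both ranges, so the loop runs \(O(\log N)\) times while keeping only four indices.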

6. Experience / Reflections (takeaways and open questions from this exercise)
Originally we merged the two sequences and took the element at position \(\lfloor\frac{M+1}{2}\rfloor\) of the merged list of length \(M\) as the median. However, this solution has a time complexity of \(O(N)\), which exceeds the required \(O(\log N)\).

From the discussion of solution B, we see that subtle details can make an algorithm better or worse. Boundary conditions deserve attention.

Including the task discussed here, I've encountered many problems whose solution is binary search. When a problem mentions halves, the middle, a median or \(\log N\), binary search is worth trying. It is a good example of the divide-and-conquer strategy.

posted @ 2019-09-22 01:28  Sola~