Smaller and Easier - Notes on Divide-and-Conquer


1. Smaller and Easier

The divide-and-conquer strategy is helpful when the original task is too complicated to solve directly, but its sub-problems are easier and share the same or a similar solution with the original task. The strategy repeatedly divides the task into sub-tasks of smaller size, until it reaches a sub-task that is small enough to be solved easily. Once the small-enough sub-tasks are solved, the bigger sub-tasks can be solved from their results, and finally the original task itself. From this description it's not hard to see that the solution has a recursive shape, so recursion is usually combined with this strategy to solve problems that are complicated but satisfy the conditions above.


2. Recursion

2.1. Introduction

To understand the concept of divide-and-conquer, it's necessary to introduce the concept of recursion. From the perspective of equations, a recurrence equation is an equation that has similar terms on both sides of the "=" sign, though their arguments may be different. Concretely,

f(n) = f(n - 1) + f(n - 2)

is a recurrence equation, since on both sides of the "=" sign there are f() terms, though the f() on the left has the argument n while those on the right have the arguments n - 1 and n - 2.

From the perspective of coding, a function can be recursive. A recursive function calls itself in its function body once or many times, with the same or different arguments. Concretely, if f() is a recursive function, its code will look like

int f(int n) {
  // other code

  return f(n - 1) + f(n - 2);
}

Note that the f() function invokes itself twice in its function body, as f(n - 1) and f(n - 2), with the different arguments n - 1 and n - 2.

2.2. Common Solution

There are some steps to solve a recursive problem.

  1. find out the equation
  2. convert the equation to code

An equation is needed to reflect the relationship between tasks of bigger sizes and sub-tasks of smaller sizes, and the equation is also helpful for writing the corresponding code. The major part of the code for a recursion problem is often a recursive function, as introduced above. Let's explore the two steps with an easy example: displaying the Fibonacci sequence.

2.2.1. Fibonacci

In mathematics, the Fibonacci numbers, commonly denoted \(F_n\), form a sequence, called the Fibonacci sequence, such that each number is the sum of the two preceding ones, starting from 0 and 1. The beginning of the sequence is thus:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

reference: Fibonacci number

2.2.2. Find Out the Equation

From the definition above and the beginning of the Fibonacci sequence it's not hard to work out its equation. Let's use the notation \(f(n)\) to denote the \(n^{th}\) number of the sequence. Since the \(n^{th}\) number is the sum of the two preceding ones, which are the \((n - 1)^{th}\) and the \((n - 2)^{th}\) numbers here, the equation to calculate the \(n^{th}\) number will be:

\[f(n) = f(n - 2) + f(n - 1) \]

The equation introduced above is valid when \(n\) is greater than 1. However, when \(n = 0\), \(n - 1 = -1\) and \(n - 2 = -2\), and when \(n = 1\), \(n - 1 = 0\) and \(n - 2 = -1\). Since \(n\) is the index of a number and counts from 0, the negative ones (-1, -2) are invalid. Thus we should work out other equations for \(n = 0\) and \(n = 1\). According to the beginning of the sequence, when \(n = 0\), the number is 0, and when \(n = 1\), the number is 1. Thus

\[\begin{aligned} f(n) &= 0 & {n = 0} \\ f(n) &= 1 & {n = 1} \end{aligned} \]

Combining the two equations with the one above, we get

$$ f(n)= \left \{ \begin{aligned} & 0 & {n = 0} \\ & 1 & {n = 1} \\ & f(n - 2) + f(n - 1) & {n > 1} \end{aligned} \right. $$

2.2.3. Convert the Equation to Code

It's also not hard to convert the equation into code. There are usually two steps:

  1. write the initial value of the equation
  2. write the recursive part

The initial values of the equation are the special cases, \(n = 0\) and \(n = 1\) here. When the condition for such a case is true, the recursion stops and some concrete implementation is executed, such as returning a value. The initial values correspond to the very small sub-tasks, which can be solved easily and directly, without further manipulation. The recursive part then recursively calls the function with the same or different arguments. For the Fibonacci task, the code will be:

long long fibonacci(int n) {
  // the initial value
  if (n == 0 || n == 1) {
    return n;
  }
  else {
    return fibonacci(n - 1) + fibonacci(n - 2);  // the recursive part
  }
}

The recursive function can be written with an if statement and an else statement, where the if branch deals with the initial values (\(n = 0\) and \(n = 1\) here) and the else branch handles the recursive part, as follows:

returnType recursiveFunction(parameterList) {
  if (specialCasesCondition) {
    do something with initialValues
  }
  else {
    recursiveFunction(argumentList);
  }
}

It turns out that writing the frame code above before writing other code helps to get the code correct.


3. Divide and Conquer

3.1. Introduction

The divide-and-conquer strategy can be used when the original task is hard to solve directly, but can be divided into one or several sub-tasks of smaller sizes with the same or similar solutions to the original task. After multiple rounds of division, the size of a sub-task becomes small enough for it to be solved directly. The small sub-task that can be solved directly corresponds to the initial value in Convert the Equation to Code.

3.2. Common Solution

There are 3 steps in the divide-and-conquer strategy:

  1. Divide the problem into a number of subproblems that are smaller instances of the same problem.
  2. Conquer the subproblems by solving them recursively. If the subproblem sizes are small enough, however, just solve the subproblems in a straightforward manner.
  3. Merge the solutions to the subproblems into the solution for the original problem.

reference: 分治法 ( Divide And Conquer ) 详解

The strategy will be introduced with the binary sort example.

3.2.1. Binary Sort

Binary sort is a variant of insertion sort that uses the binary search strategy. It needs only \(O(NlogN)\) comparisons, but still \(O(N^2)\) element movements overall (see Time Complexity below), and it has an \(O(1)\) space complexity, where \(N\) is the length of the array to be sorted.

When inserting the \(i^{th}\) value, binary search is applied to the preceding values at positions \(0\) ~ \(i - 1\) to find the position where the \(i^{th}\) value should be inserted. First the median of those values is compared with the \(i^{th}\) value. If the \(i^{th}\) value is smaller, binary search continues on the first half of the \(0\) ~ \(i - 1\) values, otherwise on the second half, until \(left > right\). Then the values after the insertion position are shifted right by one, and the \(i^{th}\) value is inserted into the correct position.

reference: 二分排序

3.2.2. Dividing

According to the introduction to binary sort, the sorted sublist in each recursion is divided into two halves. Though it is divided into two halves, only one of them is used to find the insertion position, so we get one sub-task with half of the original size. The code for dividing is:

int mid = (left + right) / 2;

3.2.3. Conquering

The conquering (recursive) part of binary sort is binary search. It first calculates the median. If the median equals the value to be inserted, the search stops, and the index after the median is returned. If the value to be inserted is smaller than the median, the first half of the list is searched next, otherwise the second half. Let left and right point to the two boundaries of the list being searched, respectively. Once \(left > right\), left has moved past right, which is invalid, so the position left points at is returned; this is the initial value here. Let's say the binary search function is binarySearch(int *a, int left, int right, int i), where a is the array and i is the index of the value to be inserted; the equation will be:

\[binarySearch(a, left, right, i)= \left \{ \begin{aligned} & left & {left > right} \\ & mid + 1 & {a[i] = a[mid]} \\ & binarySearch(a, left, mid - 1, i) & {a[i] < a[mid]} \\ & binarySearch(a, mid + 1, right, i) & {a[i] > a[mid]} \end{aligned} \right. \]

and its corresponding code is:

	int binarySearch(int *a, int left, int right, int i) {
		if (left > right)
			return left;  // the initial value
		int mid = (left + right) / 2;  // Divide
		if (a[i] == a[mid])
			return mid + 1;
		else if (a[i] < a[mid])
			return binarySearch(a, left, mid - 1, i);  // Conquer
		else
			return binarySearch(a, mid + 1, right, i);  // Conquer
	}

Concretely, say that the sequence is

1 2 5 7 9 3

Apparently the first 5 values are in order. We are going to insert the value 3 into somewhere in the sorted sequence

1 2 5 7 9 | 3

a)    1      2      5      7      9   |   3
     left          mid          right    

   current indices: left = 1, mid = 3, right = 5

   because: 3 (to be inserted) < 5 (a[mid])
   therefore: right = mid - 1 = 2

b)    1      2      5      7      9   |   3
     left  right
     mid

   current indices: left = 1, mid = 1, right = 2

   because: 3 (to be inserted) > 1 (a[mid])
   therefore: left = mid + 1 = 2

c)    1      2      5      7      9   |   3
            left
            right
            mid

   current indices: left = 2, mid = 2, right = 2

   because: 3 (to be inserted) > 2 (a[mid])
   therefore: left = mid + 1 = 3

d)    1      2      5      7      9   |   3
             right  left

   therefore right (index 2) < left (index 3)
   and 3 should be inserted after 2

   therefore index = 3 (left)

3.2.4. Merging

For the task here, merging means inserting every value into the sorted prefix to gradually put the whole list in order. The merging often takes the form of a for loop. Here's the merging part:

// Merge
for (int i = 1; i < len; i++) {  // len: length of the list
	// other code

	// shift values right by one
	int elemTmp = a[i];
	for (int j = i - 1; j >= pos; j--) {  // pos: the position to be inserted into
		a[j + 1] = a[j];
	}

	a[pos] = elemTmp;
}

3.2.5. Time Complexity

Generally, the equation to calculate the time complexity is:

\[\begin{aligned} T(N) &= a T(\frac{N}{b}) + O(N^d) \\ &= O(N^{log_b a}) + O(N^d) \end{aligned} \]

where \(a T(\frac{N}{b})\) (or \(O(N^{log_b a})\)) is the time for conquering and \(O(N^d)\) is the sum of the time for dividing and merging.

And a simple way to calculate it is as follows:

\[T(N)= \left \{ \begin{aligned} & O(N^d) & {d > log_b a} \\ & O(N^d log N) & {d = log_b a} \\ & O(N^{log_b a}) & {d < log_b a} \end{aligned} \right. \]

where \(N\) is the size of the task to be finished, \(a\) is the number of sub-tasks, and the size of the sub-task is \(\frac{1}{b}\) of the original task. Note that \(logN\) here is equivalent to \(log_2 N\).
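For instance, suppose a task of size \(N\) is divided into \(a = 2\) sub-tasks of size \(\frac{N}{2}\) (so \(b = 2\)), and dividing plus merging takes \(O(N)\) time (\(d = 1\)), as in merge sort. Since \(d = 1 = log_2 2 = log_b a\), the second case applies:

\[T(N) = 2 T(\frac{N}{2}) + O(N) = O(N log N)\]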

For binary sort, since the number of sub-tasks is 1 (see Dividing above), \(a = 1\). The size of the sublist is half of the original one, thus \(b = 2\). Calculating the median has a time complexity of \(O(1)\), and the for loop used for merging has an \(O(N^2)\) time complexity, so the sum of the time for dividing and merging is \(O(1) + O(N^2)\), which is dominated by \(O(N^2)\), especially when the amount of data to be processed is large. Hence, according to the equations above, the time complexity of binary sort is

\[\begin{aligned} T(N) &= 1 * T(\frac{N}{2}) + O(N^2) \\ &= O(N^{log_2 1}) + O(N^2) \\ &= O(N^0) + O(N^2) \\ &= O(N^2) \end{aligned} \]


4. Trade-off

It is a good choice to use the divide-and-conquer strategy for tasks that can be divided into smaller sub-tasks with similar or the same solutions, as discussed above. However, this strategy is not suitable for some problems, especially when they have many sub-tasks with the same inputs (the same argument list for the recursive function), or when they need a large number of recursions. The first kind of problem usually has a very bad time complexity, like \(O(2^N)\) or \(O(N^{log N})\), which is too slow to solve real-world problems. The second kind of problem requires very deep recursion, which may lead to the risk of stack overflow.

Let's first look at the Fibonacci example to explore the first kind of problem. When we use a recursive function to generate a Fibonacci sequence, we need more time than simply using a loop. Concretely, let's say we are to display the \(6^{th}\) number of the sequence, and write the fibonacci function as f(n) for short. According to the equation introduced above, we have

f(6) =  f(4) +                             f(5)
     = (f(2) +           f(3)) +          (f(3) +          f(4))
     = ((f(0) + f(1)) + (f(1) + f(2))) + ((f(1) + f(2)) + (f(2) + f(3)))

We can see that f(2) and f(3) are calculated more than once. Thus the recursive method for Fibonacci evaluates calls with the same argument list repeatedly, which requires much more time than just using a for loop. Actually, the recursive way has a time complexity of about \(O(2^N)\), which is pretty large. Now let's look at the for-loop method:

long long fibonacciLoop(int n) {
	long long i = 0, j = 1;
	for (int count = 1; count <= n; count++) {
		// cout << i << endl;
		long long tmp = j;
		j += i;
		i = tmp;
	}
	return i;  // i is now the nth number of the sequence
}

Clearly its time complexity is \(O(N)\), which is much faster than the recursive one.

Next let's see why problems requiring a large number of recursions risk stack overflow when the divide-and-conquer strategy is applied. We know that when a function is called, an activation record is pushed onto the call stack, and that recursion is in fact repeated function calls. When the recursion depth is large, the memory needed by the call stack may exceed its maximum limit, resulting in stack overflow.

Hence, it's suggested to choose a solution according to the properties of the problem.


5. Team Work

I worked in a pair with Yang Yizhou in the practice class of Chapter 2. The first and second questions were mainly written by me, and he found solutions to some bugs. He wrote half of the code of the third question in class and I continued writing it after class. At first we used the method of merging linear lists to solve the third question, which didn't satisfy our teacher's requirement that the time complexity of the third question should be \(O(NlogN)\). We turned to another method afterwards. The new method satisfied the requirement, but was not simple. I found another similar but simpler way to solve the problem with the help of the Internet. Working in pairs is more efficient, enabling us to find bugs and their solutions.

posted @ 2019-10-12 16:16  Sola~