[This series of blog posts analyzes and summarizes common data structures and their corresponding algorithms, and each post includes several relevant interview/written-test questions from first-tier Internet companies to consolidate what we have learned and help us fill in the gaps. Project address: https://github.com/absfree/Algo. Due to my limited personal level, there are inevitably unclear or inaccurate places in the description; I hope you will correct me. Thank you :)]

1. Introduction

Before studying data structures and algorithms further, we should first master the general methods of algorithm analysis. Algorithm analysis mainly covers the time and space complexity of an algorithm, but sometimes we care more about its actual running performance. In addition, algorithm visualization is a practical skill that helps us understand how an algorithm actually executes, and it is especially useful when analyzing more abstract algorithms. In this post we first introduce how to quantify the actual running performance of an algorithm by designing experiments, and then introduce the method for analyzing an algorithm's time complexity. We will also introduce the doubling-ratio experiment, which lets us predict the performance of a program very conveniently. At the end of the article we will work through a few interview/written-test questions from first-tier Internet companies to consolidate what we have learned and put it into practice.

2. General Methods of Algorithm Analysis

1. Quantifying the actual performance of an algorithm

Before introducing the analysis of time and space complexity, let us first look at how to quantify the actual running performance of an algorithm. The quantitative indicator we choose here is the algorithm's actual running time. This running time is usually related to the size of the problem the algorithm solves; for example, sorting 1 million numbers usually takes longer than sorting 100,000 numbers. Therefore, when we observe the running time of an algorithm, we must also consider the size of the problem it solves and observe how the running time grows as the problem size grows. Here we use the example from the book Algorithms (4th Edition) (Douban); the code is as follows:
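Below is a sketch along the lines of the ThreeSum program from Algorithms (4th Edition); the exact code used in the original post may differ in details, and the sketch assumes the StdIn/StdOut helper classes mentioned below.

```java
public class ThreeSum {
    // Count the triples (i, j, k) with i < j < k whose elements sum to 0.
    public static int count(int[] a) {
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                for (int k = j + 1; k < n; k++) {
                    if (a[i] + a[j] + a[k] == 0) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] a = StdIn.readAllInts();   // read the integers from standard input
        StdOut.println(count(a));
    }
}
```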
The two classes StdIn and StdOut used in the above code can be found at https://github.com/absfree/Algo. The function of the code is to count the number of triples of integers in an int[] array (read from standard input) whose sum is 0. The algorithm is very direct: traverse the array, take three numbers at a time, and if their sum is 0, increase the count by one; the returned count is the number of triples whose sum is 0. Here we take three files containing 1000, 2000, and 4000 integers respectively (these files can be found at the project address above) as input, and observe how the running time changes as the problem size grows. A direct way to measure the running time of a procedure is to record the current time once before it runs and once after; the difference between the two is its running time. When the procedure itself takes very little time to execute, this measurement may contain some error, but we can reduce or even ignore this error by executing the procedure many times and taking the average. Let's actually measure the running time of the above algorithm. The relevant code is as follows:
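As a rough illustration, a minimal timing harness in this spirit might look like the following (the class name ThreeSumTimer is mine, not from the original post):

```java
public class ThreeSumTimer {
    public static void main(String[] args) {
        // Read the input array (e.g. one of the 1000/2000/4000-integer files, redirected to stdin).
        int[] a = StdIn.readAllInts();

        // Record the time before and after the call; the difference is the running time.
        long start = System.currentTimeMillis();
        int count = ThreeSum.count(a);
        long elapsed = System.currentTimeMillis() - start;

        StdOut.println("count = " + count + ", time = " + elapsed + " ms");
    }
}
```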
We use 1000, 2000, and 4000 integers as input respectively, and the running results are as follows
From the above results we can see that each time the problem size doubles, the actual running time becomes about 8 times as long. Based on this observation we can make a conjecture: the running time of the program and the problem size N satisfy the relationship T(N) = k * N^3, since in such a relationship, when N doubles, T(N) becomes eight times as large as before. So does the running time of the ThreeSum algorithm really satisfy this relationship with the problem size? We will come back to this question after introducing the time complexity of algorithms.

2. Time complexity analysis of an algorithm

(1) Basic concepts

Regarding the time complexity of an algorithm, three related asymptotic notations are in common use: Big O (an asymptotic upper bound), Big Omega (an asymptotic lower bound), and Big Theta (an asymptotic tight bound). The one used most often in everyday algorithm analysis is Big O notation, which gives an "asymptotic upper bound" on the running time, that is, an upper limit on the worst-case running time of the algorithm. It is defined as follows: for f(n) and g(n), if there exist constants c and N0 such that for all n > N0 we have |f(n)| < c * g(n), then f(n) is said to be O(g(n)). Below we introduce the concrete method of analyzing the time complexity of an algorithm. If you are not familiar with Big O notation, I recommend reading this article first: http://blog.jobbole.com/55184/.

(2) Method for analyzing time complexity

In this section we take the ThreeSum program above as an example to introduce how to analyze the time complexity of an algorithm. For ease of reading, here is its count method again:
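(This is the count method from the ThreeSum sketch shown earlier; as noted, it may differ in detail from the code in the original post.)

```java
public static int count(int[] a) {
    int n = a.length;       // executed only once
    int count = 0;          // executed only once
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            for (int k = j + 1; k < n; k++) {
                if (a[i] + a[j] + a[k] == 0) {   // the key operation: executed most frequently
                    count++;
                }
            }
        }
    }
    return count;
}
```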
Before introducing the analysis method, let's first clarify what the running time of an algorithm depends on. Intuitively, the running time of an algorithm is the sum of the time spent executing all of its statements. In actual analysis, however, we do not need to account for every statement; what we should do is focus on the most time-consuming part, that is, the operations that are executed most frequently and cost the most. In other words, before analyzing the time complexity of a program we must first determine which statements take up most of its execution time, while operations that are expensive but executed only a constant number of times (regardless of the problem size) can be ignored. We select this key operation and estimate the time complexity of the algorithm by counting how many times it is executed. Let's walk through this process for the code above. First, the two statements at the top of the count method (the initialization of n and count) are executed only once, so we can ignore them. The rest of the method is a three-level nested loop whose innermost body is an if statement; this if statement is the most frequently executed statement in the code. Next, we only need to count how many times the if statement is executed to estimate the time complexity of the algorithm. The problem size here is N (the number of elements in the input array), and the number of executions of the if statement clearly depends on N. It is not difficult to work out that the if statement is executed N * (N - 1) * (N - 2) / 6 times (the number of ways to choose 3 elements out of N), so the time complexity of this algorithm is O(N^3). This also confirms our earlier conjecture about the relationship between running time and problem size, T(N) = k * N^3. From this we can also see that the time complexity of an algorithm describes how its running time grows as the problem size grows. In everyday use, Big O notation is often used not as a strict worst-case upper bound but to express an asymptotic upper bound on the running time in ordinary cases; when we do use Big O to describe the worst-case upper bound, we usually add the qualifier "worst case". Through the above analysis we can see that analyzing the time complexity of an algorithm takes only two steps (one step fewer than putting an elephant into a refrigerator :)): first, determine the key operation, i.e. the most frequently executed and most time-consuming operation; second, count how many times that key operation is executed. In the above example, no matter what integer array we feed in, the number of executions of the if statement stays the same, which means the running time of the algorithm does not depend on the particular input. However, the actual running time of some algorithms depends heavily on the input we give; we will come back to this below.

3. Expected running time of an algorithm

We can understand the expected running time of an algorithm as how long the algorithm takes to run under typical circumstances.
In many cases we care more about the expected running time of an algorithm than about its worst-case upper bound, because the worst and best cases occur with relatively low probability and what we encounter most often is the ordinary case. For example, although quick sort and merge sort both have time complexity O(n log n), at the same problem size quick sort is usually faster than merge sort, so the expected running time of quick sort is smaller than that of merge sort. In the worst case, however, the time complexity of quick sort degrades to O(n^2). Quick sort is thus an algorithm whose running time depends on the input; for this problem, we can avoid the worst case by shuffling the input array before sorting it.

4. The doubling-ratio experiment

Next we introduce the "doubling-ratio experiment" from the book Algorithms (4th Edition) (Douban). This method can simply and effectively predict the performance of a program and determine the approximate order of growth of its running time. Before formally introducing the experiment, let's briefly introduce the concept of "order of growth" (also quoted from Algorithms): we write ~f(N) to represent all functions that, when divided by f(N), approach 1 as N grows, and we write g(N) ~ f(N) to mean that g(N) / f(N) approaches 1 as N grows. The approximation we usually work with has the form g(N) ~ a * f(N), and we call f(N) the order of growth of g(N). Take the ThreeSum program as an example, and let g(N) denote the number of times the if statement is executed when the input array has size N. By the definition above, g(N) = N * (N - 1) * (N - 2) / 6 ~ N^3 / 6 (as N tends to positive infinity, g(N) divided by N^3 / 6 approaches 1), so the order of growth of g(N) is N^3; that is, the order of growth of the running time of ThreeSum is N^3. Now let's formally introduce the doubling-ratio experiment (the following content is mainly quoted from Algorithms, combined with some personal understanding). First, let's warm up with a small program:
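The following is a self-contained sketch of such a warm-up program, in the spirit of the book's DoublingTest (class and method names here are illustrative, and plain java.util.Random and System.currentTimeMillis are used instead of the book's helper classes):

```java
import java.util.Random;

public class DoublingTest {
    private static final Random RANDOM = new Random();

    // Time one ThreeSum.count() call on n random integers in [-1000000, 1000000].
    public static double timeTrial(int n) {
        int max = 1_000_000;
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = RANDOM.nextInt(2 * max + 1) - max;
        }
        long start = System.currentTimeMillis();
        ThreeSum.count(a);
        return (System.currentTimeMillis() - start) / 1000.0;   // elapsed time in seconds
    }

    public static void main(String[] args) {
        // Start at n = 250 and double the problem size each round, printing n and the running time.
        for (int n = 250; true; n += n) {
            double time = timeTrial(n);
            System.out.printf("%7d %7.1f%n", n, time);
        }
    }
}
```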
The above code will start with 250, double the size of the ThreeSum problem each time, and output the size of the problem and the corresponding running time after each ThreeSum run. The output of running the above program is as follows:
The reason the above output differs from the theoretical value is that the actual running environment is complex and changeable, so there will inevitably be deviations. The way to minimize this deviation is to run the program several times and take the average. With this warm-up program as a prelude, we can now formally introduce this method of "simply and effectively predicting the running performance of any program and determining the approximate order of growth of its running time". Its work is based on the DoublingTest program above, and the general process is as follows: develop an input generator that produces the various inputs possible in real situations; run the program repeatedly, doubling the input size each time, and record the ratio of each running time to the previous one; repeat until the ratio approaches a limit. The DoublingRatio procedure is as follows:
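A sketch of the ratio program, reusing the timeTrial method from the DoublingTest sketch above (names again illustrative):

```java
public class DoublingRatio {
    public static void main(String[] args) {
        double prev = DoublingTest.timeTrial(125);
        // Keep doubling the problem size; print n, the running time, and its ratio to the previous time.
        // Ratios for small n are noisy because the measured times are tiny.
        for (int n = 250; true; n += n) {
            double time = DoublingTest.timeTrial(n);
            System.out.printf("%7d %7.1f %5.1f%n", n, time, time / prev);
            prev = time;
        }
    }
}
```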
Running the ratio program, we can see that time/prev does converge to 8 (2^3). So why does the ratio of running times tend to a constant when we repeatedly run the program while doubling the input? The answer is the following doubling-ratio theorem: if T(N) ~ a * N^b * lg N, then T(2N) / T(N) ~ 2^b. The proof of this theorem is very simple; we only need to compute the limit of T(2N) / T(N) as N tends to positive infinity: T(2N) / T(N) = (a * (2N)^b * lg(2N)) / (a * N^b * lg N) = 2^b * (1 + 1 / lg N), which approaches 2^b as N grows. The form "a * N^b * lg N" basically covers the orders of growth of common algorithms (a and b are constants). It is worth noting that when the order of growth of an algorithm is N log N, a doubling-ratio experiment will suggest that the order of growth of its running time is about N. This is not a contradiction, because we cannot infer from the results of a doubling-ratio experiment that the algorithm conforms to a specific mathematical model; we can only roughly predict its performance (for N between 16,000 and 32,000, 14N is very close to N lg N).

5. Amortized analysis

Consider the ResizingArrayStack we mentioned in the earlier post on understanding linked lists in depth: a stack implemented on top of an array that supports dynamic resizing. Each time we push an element onto the stack, we check whether the underlying array is full; if it is, we create a new array twice the size of the original and copy all elements from the original array into it. When the array is not full, the cost of a push operation is O(1), but when a push fills the array, creating the new array and copying the elements makes the cost of that push jump to O(n). For this situation we obviously cannot simply say that the complexity of push is O(n). We usually say that the "average" complexity of push is O(1), because a resize that copies n elements is triggered only once per n push operations, so those n pushes share the cost; for each push in the sequence, the amortized cost is O(1). This method of recording the total cost of all operations and dividing it by the number of operations is called amortized analysis.

3. Trying out some interview questions from well-known companies

Earlier we introduced some techniques of algorithm analysis; now let's put what we have learned into practice and solve some interview/written-test questions about algorithm analysis from first-tier Internet companies.

【Tencent】The time complexity of the following algorithm is ____

Since this question asks us to analyze the time complexity of the algorithm, the first step is to determine the key operation, which here is clearly the if statement, so we only need to determine how many times it is executed. First, we can see that this is a recursive process: foo keeps calling itself until its argument is less than or equal to 1, at which point foo returns 1 and the if statement is no longer executed. From this we know that the if statement is executed n times, so the time complexity is O(n).

【JD.com】The time complexity of the following function is ____

This question is clearly harder than the previous one, so let's solve it step by step. Again, the key operation is the if statement, so we only need to determine how many times it is executed.
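The code for this question is not reproduced on this page; purely for illustration, a hypothetical function consistent with the recurrence analyzed below (the names f, n, m, o are mine) might look like this:

```java
// Hypothetical reconstruction for illustration only -- not the original test snippet.
// The key operation is the if statement; each call with n > 0 spawns two recursive calls.
void f(int n, int m, int o) {
    if (n > 0) {
        f(n - 1, m + 1, o);
        f(n - 1, m, o + 1);
    }
}
```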
The function in question keeps calling itself recursively while n > 0. What we need to do is determine how many times the if statement is executed before the recursion reaches its base case (that is, n <= 0). Let T(n, m, o) be the number of times the if statement is executed; then for n > 0 we have T(n, m, o) = T(n-1, m+1, o) + T(n-1, m, o+1). Since the base case does not depend on the parameters m and o, we can simplify this to T(n) = 2T(n-1), from which T(n) = 2T(n-1) = (2^2) * T(n-2) = ... = (2^n) * T(0). So the time complexity of the above algorithm is O(2^n).

【JD.com】The time complexity of the following program is ____ (where m > 1, e > 0)

The key operations of this algorithm are the two assignment statements inside the while loop, so we only need to count how many times they are executed. The loop body executes as long as x - y > e, and x = (x + y) / 2 keeps making x smaller (when y << x, executing this statement once makes x roughly half of its original value). If the value of y were fixed at 1, the loop body would execute ~log m times. In fact, y is reassigned to m / x at the start of each iteration, which is always larger than the value of y in the previous iteration, so the gap x - y shrinks even faster. Therefore the time complexity of the above algorithm is O(log m).

【Sogou】Assuming that the computation time of an algorithm can be expressed by the recurrence T(n) = 2T(n/2) + n, T(1) = 1, then the time complexity of the algorithm is ____

According to the recurrence given in the question, we can expand: T(n) = 2(2T(n/4) + n/2) + n = 4T(n/4) + 2n = ... = (2^k) * T(n / 2^k) + k * n. Taking 2^k = n, i.e. k = log n, this becomes n * T(1) + n * log n, so the time complexity of the algorithm is O(n log n).

4. References

Algorithms (4th Edition) (Douban)