1. Summary

This is the second article in a series on bad code. In it, I discuss how to evaluate the strengths and weaknesses of code as efficiently and objectively as possible.

After Part 1 was published, it turned out to be unexpectedly popular, and many readers vented about the problems in their own code. Recently our department organized a bootcamp, and I happened to be in charge of the code quality training. During the course we spent a lot of time discussing, improving, and polishing our own code. Although the fresh graduates paid close attention to code quality, the code they presented in the end still fell short of "excellent". The main reason is that they did not know what good code "should" look like.

2. What is good code?

The first step toward writing good code is understanding what good code is. I was stumped by this question while preparing the bootcamp course. I tried to find precise definitions that separate "excellent", "good", and "bad" code, but as I collected them, most descriptions of "what good code is" turned out to be impractical.

2.1. Definition of good code

A quick Internet search for "elegant code" turns up definitions like the following.

Bjarne Stroustrup, the creator of C++:
- The logic should be straightforward, so that bugs have nowhere to hide.
- Dependencies should be minimal, to ease maintenance.
- Error handling should be complete and follow a clear strategy.
- Performance should be close to optimal, so that nobody is tempted to make the code messy with unprincipled optimizations.
- Clean code does one thing well.

Grady Booch, author of Object-Oriented Analysis and Design with Applications:
- Clean code is simple and direct.
- Clean code reads like well-written prose.
- Clean code never obscures the designer's intent, but rather is full of crisp abstractions and straightforward lines of control.
Michael Feathers, author of Working Effectively with Legacy Code:
- Clean code always looks like it was written by someone who cares.
- There is nothing obvious you can do to make it better.
- The author of the code seems to have thought of everything.

All of this makes sense, but it is hard to apply when actually judging code, especially for newcomers. What exactly is "simple and direct code", or code with "nothing obvious to improve"?

In practice, many colleagues do face this problem: they are always uneasy about their own code, or they think it is very good while others think it is bad. More than once, a new colleague and I argued about code quality standards for several days in a row without either of us convincing the other: each insisted that his own standard for good code was the right one.

After countless code reviews, I think this picture summarizes it best (the picture is not reproduced here).

The criteria for code quality are somewhat like those for literary works. The quality of a novel, for example, is judged mainly by its readers, and many individual subjective opinions add up to a relatively objective assessment. It does not depend on measures such as the word count or the rhetorical devices the author used, which look completely objective but mean nothing.

But code differs from a novel in one respect: it has two kinds of readers, computers and programmers. As mentioned in the previous article, code can be understood and run by a computer even if no programmer can understand it.

Therefore code quality has to be analyzed along two dimensions: a subjective part, which is how well humans understand it, and an objective part, which is how it behaves when run on a computer.

Since there is a subjective part, there will be individual differences. The same piece of code will be judged differently depending on the skill level of the person reading it.
This is also the problem most newcomers face: they have no evaluation standard they can actually apply, so the quality of their code is hard to improve. Some articles on code quality talk about tendencies or principles, which are correct but offer little practical guidance. So in this article I try to express the criteria as evaluation methods that (I think) depend as little as possible on the evaluator's own skill level.

2.2. Readable code

After weighing it for a long time, I decided to put readability first: would a programmer rather take over a project that has bugs but is understandable, or one that has no bugs but is incomprehensible? If you chose the latter, you can close this page now and go do something more meaningful to you.

2.2.1. Word-for-word translation

Many books on code quality stress one point: programs are written first for people to read, and only then for machines to execute. I agree. To judge whether a piece of code is understandable, I usually ask the author to translate the code word by word into Chinese (that is, into plain natural language), try to form sentences, and then read those sentences to someone who has not seen the code. If that person understands, the readability of the code is basically acceptable.

The reason for this test is simple: it is how other people actually come to understand code. Readers go through it word by word and infer the meaning of each statement. If a statement cannot be understood on its own, they need the surrounding context. If the immediate context is still not enough, they may have to dig into the details of other parts to help them infer. In most cases, the more context you need in order to understand what a piece of code does, the worse the code is.
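The word-for-word test can be tried on a small, hypothetical example. The sketch below is not from the original article: the class `WordForWordDemo` and both method names are invented for illustration. The two methods behave identically, but only the second can be read aloud as a sentence without extra explanation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordForWordDemo {
    // Read aloud: "proc l, t: for each s in l, if the length of s is greater
    // than t, add s to r". The listener cannot tell what l, t, or r mean.
    static List<String> proc(List<String> l, int t) {
        List<String> r = new ArrayList<>();
        for (String s : l) {
            if (s.length() > t) {
                r.add(s);
            }
        }
        return r;
    }

    // Read aloud: "names longer than minLength: for each name in names, if the
    // length of the name is greater than minLength, add it to longNames".
    static List<String> namesLongerThan(List<String> names, int minLength) {
        List<String> longNames = new ArrayList<>();
        for (String name : names) {
            if (name.length() > minLength) {
                longNames.add(name);
            }
        }
        return longNames;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Al", "Grace", "Linus");
        System.out.println(proc(names, 3));            // prints [Grace, Linus]
        System.out.println(namesLongerThan(names, 3)); // prints [Grace, Linus]
    }
}
```

The behavior is the same in both versions; what changes is how much the reader has to be told separately before the code makes sense.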
The advantage of word-for-word translation is that it lets the author easily discover assumptions and readability traps that only the author knows about and that are not expressed in the code. Most code that cannot be translated word for word is bad code, for example code that only makes sense once you are told that "ms stands for messageService", that "ms.proc() sends a message", or that "tmp stands for the current file".

2.2.2. Follow the convention

Conventions cover how to organize code and documents, how to write comments, coding style, and so on. They matter a great deal for future maintenance. There is no mandatory standard for which conventions to follow, but I prefer the ones that more people follow.

Keeping your style consistent with open source projects is generally the most reliable choice; failing that, you can follow your company's internal coding style. However, if the company's internal style conflicts seriously with the style of current open source projects, it often means that the company's technology tends to be closed or has fallen behind.

In any case, following an existing convention is always better than inventing rules of your own, because it lowers the cost of understanding, communication, and maintenance. If a project invents strange rules, it may mean that the author has not read enough code.

Judging whether a project follows conventions usually requires some experience on the reader's part, or a static checking tool such as checkstyle. If you have no idea where to start, following Google is rarely a bad choice: you can refer to Google's style guides, some of which have Chinese translations.

There is also no need to agonize over what benefit a particular convention brings. It is like asking whether it is better to walk on the left or on the right: even if you reach a conclusion, it is meaningless. Most conventions can simply be followed.

2.2.3. Documentation and comments

Documentation and comments are very important parts of a program, and they are one of the ways to understand a project. In some cases the two overlap (Javadoc, for example, can be regarded as documentation).

The standard for documentation is simple: it can be found, and it can be understood. I mostly care about the following kinds of documents:
- A project introduction, covering what the project does, its authors, its directory structure, and so on. Readers should get a rough idea of what the project does within 3 minutes.
- A QuickStart for beginners. Readers should be able to build the code and try simple usage within 1 hour by following it.
- Detailed reference documents for users, such as interface definitions, parameter meanings, and design notes, so that readers can learn how to use the functions (or interfaces) from the documents.

Some comments are really documentation, such as the Javadoc mentioned above. Keeping them next to the source code is clearer for readers and reduces the effort of maintaining separate documents.

Other comments are not part of the documentation, such as comments inside a function. Their purpose is to explain what the author was thinking that the code itself cannot express, such as "why XXX is not done here" or "pay attention to XXX here".

I look first at the number of comments: a function should have neither too many nor none at all. In my experience, seeing one or two comments every few screens is normal. Too many comments may mean the code itself is not readable; no comments at all may mean some hidden logic is unexplained, and you should consider adding some.
Secondly, I look at the quality of comments: given readable code, a comment should provide information that the code does not. More documents and comments are not necessarily better, because they increase maintenance costs. For a discussion of this topic, see the section on simplicity.

2.2.4. Recommended reading

- Clean Code

2.3. Releasable code

A typical trait of code written by newcomers is that, for lack of experience maintaining projects, it fails to anticipate many situations. Nothing seems wrong during testing, but after release many unexpected situations show up; when a problem occurs, the author does not know where to start troubleshooting, or can only leave the system in an unstable state and rely on coincidences to keep it barely running.

2.3.1. Handling exceptions

Novice programmers generally have little awareness of exception handling, yet the real runtime environment is full of exceptions: servers crash, networks time out, users do unexpected things, and malicious people attack your system.

My first impression of how well a piece of code handles exceptions comes from its unit test coverage. Most exceptional situations are hard to reproduce in a development or test environment; even a professional testing team can hardly simulate all of them in an integration test environment.

Unit tests, by contrast, can simulate all kinds of abnormal situations fairly easily. If a module's unit test coverage is below 50%, it is hard to believe that the code has taken abnormal situations into account. Even if it has, those exception handling branches have never been verified, so how can we expect them to behave well when something goes wrong in production?

2.3.2. Handling concurrency

Many of the resumes I receive say things like "proficient in concurrent programming" or "familiar with multithreading".
In interviews, the candidates can talk about locks, mutexes, thread pools, synchronization, semaphores, and a pile of other terms. But when given a real scenario and asked to write a simple concurrent program, few of them can do it well.

Concurrent programming really is hard. If the difficulty of writing good synchronous code is 5, the difficulty of concurrent programming can reach 100. This is not alarmist. Many seemingly stable programs still fail under concurrency: for example, we recently ran into a Linux kernel crash caused by a synchronization problem when calling a system function.

The key to high-quality concurrent code is not whether some synchronization strategy has been applied, but whether shared resources are protected:
- Any memory access outside local variables carries a concurrency risk (object fields, static variables, and so on).
- Access to shared resources (caches, databases, and so on) also carries a concurrency risk.
- If the callee is not declared thread-safe, concurrency problems are very likely (Java's HashMap, for example).
- Any operation that depends on ordering has concurrency risks even if every single step is thread-safe (for example, deleting a record and then decrementing the record count).

The first three cases are easy to spot in the code itself; you only need to develop a sensitivity to calls that touch shared resources. The last case is often hard to see just by reading the code, and the two calls that conflict may not even be in the same program (for example, two systems reading and writing one database at the same time, or different modules of one program being called concurrently).
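The last case, where every step is thread-safe but the sequence is not, can be sketched as follows. This is a minimal illustration under stated assumptions: `RecordStore`, `deleteUnsafe`, and `deleteSafe` are hypothetical names invented here, and holding one lock across the whole sequence is only one of several possible fixes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class RecordStore {
    // Both fields are individually thread-safe.
    private final ConcurrentHashMap<String, String> records = new ConcurrentHashMap<>();
    private final AtomicInteger count = new AtomicInteger();

    synchronized void add(String id, String value) {
        if (records.put(id, value) == null) {
            count.incrementAndGet();
        }
    }

    // Risky: two threads can both pass the containsKey check for the same id
    // before either one removes it, so the count is decremented twice.
    boolean deleteUnsafe(String id) {
        if (records.containsKey(id)) {
            records.remove(id);
            count.decrementAndGet();
            return true;
        }
        return false;
    }

    // Safer sketch: the whole "remove, then decrement" sequence runs under one
    // lock, and the decrement happens only if this call removed the record.
    synchronized boolean deleteSafe(String id) {
        if (records.remove(id) != null) {
            count.decrementAndGet();
            return true;
        }
        return false;
    }

    int count() {
        return count.get();
    }

    public static void main(String[] args) {
        RecordStore store = new RecordStore();
        store.add("a", "first");
        store.add("b", "second");
        store.deleteSafe("a");
        store.deleteSafe("a"); // second delete of the same id is a no-op
        System.out.println(store.count()); // prints 1
    }
}
```

Nothing in `deleteUnsafe` looks wrong line by line, which is why this kind of problem is easy to miss in review.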
However, whenever code that accesses shared resources contains logic of the form "do A first, then do B" without locking, you should be on your guard.

2.3.3. Optimizing performance

Performance is an important indicator of a programmer's ability, and many programmers love to talk about the performance of their programs. But performance is hard to see directly from the code; you usually need performance testing tools or a run in a real environment to get results.

Looking at the code alone, there are two ways to assess execution efficiency:
- The time complexity of the algorithm. A program with high time complexity will inevitably run slowly as its input grows.
- Expensive single steps. Try to avoid individual operations that are slow, such as database access and I/O.

In practice we also see programmers who are too keen on optimization, at the cost of readability, added complexity, or a longer schedule. In such cases, the simple approach is to ask the author to explain where the program's bottleneck is, why it exists, and what the optimization gains.

Of course, whether the concern is under-optimization or over-optimization, the best way to judge performance is with data rather than by reading code. Performance testing is beyond the scope of this article, so I will not go into detail.

2.3.4. Logs

Logs determine how hard it is to troubleshoot a program when something goes wrong. Experienced programmers have probably all been there: while troubleshooting, the log lacks information, the value of some variable cannot be found, and there is no way to work out where the problem lies.

There are three evaluation criteria for logs:
- Are there enough logs? Every exception and external call needs a log entry, and so do the entry, exit, and key points of a call chain.
- Are the logs clear? This includes whether they are understandable, whether the style is consistent, and so on. The criteria are the same as for code readability, so I won't repeat them here.
- Do the logs contain enough information? This includes the call context, values returned by external calls, keywords to search by, and so on, to make analysis easier.

For online systems, log volume can generally be controlled by adjusting the log level, so as long as the logging code does not get in the way of reading, it is basically acceptable.

2.3.5. Further reading

- Release It!: Design and Deploy Production-Ready Software (don't read the Chinese edition; the translation is really bad)
- Numbers Everyone Should Know

2.4. Maintainable code

Compared with the first two kinds of code, the criteria for maintainable code are vaguer, because they concern the future, and newcomers generally find it hard to imagine what effect today's choices will have later. In my experience, though, it is usually enough to keep asking two questions:
- What if he resigns?
- What if he didn't do it that way?

2.4.1. Avoid duplication

Almost all programmers know they should avoid copying code, yet copied code inevitably shows up and becomes a killer of maintainability.

There are two kinds of code duplication: within a module and between modules. Either kind suggests, to some degree, a problem with the programmer's skill. Duplication within a module is the more serious: if large amounts of repeated code appear in the same file, its author is capable of writing almost anything.

You don't need to read the code over and over to find duplication.
Modern IDEs generally provide tools that check for duplicate code with a few clicks.

Besides code duplication, many new programmers who care about code quality are prone to another kind: information duplication.

I have seen newcomers who like to write a comment before every line of code, such as:
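A hypothetical illustration of this habit, written for this purpose rather than taken from the author (the class and method names are invented):

```java
class CommentEveryLine {
    static int totalInCents(int unitPriceInCents, int quantity) {
        // multiply the unit price by the quantity
        int total = unitPriceInCents * quantity;
        // return the total
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalInCents(250, 4)); // prints 1000
    }
}
```

Each comment only repeats what the line below it already says.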
It seems easy to understand, but a few years later, this code becomes:
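As a hypothetical illustration (names invented for this sketch, not the author's listing), a method whose line-by-line comments were never updated might end up like this, with comments that no longer match the code:

```java
class StaleComments {
    static int totalInCents(int unitPriceInCents, int quantity, int discountInCents) {
        // multiply the unit price by the quantity
        int total = unitPriceInCents * quantity - discountInCents;
        // return the total
        return Math.max(total, 0);
    }

    public static void main(String[] args) {
        System.out.println(totalInCents(250, 4, 100)); // prints 900
    }
}
```

The code now subtracts a discount and clamps the result at zero, but the comments still describe the old behavior.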
Then it might be changed to this:
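A hypothetical illustration of this stage (names invented for this sketch, not the author's listing), where old lines are commented out instead of deleted and comments start doing the job of version control:

```java
class CommentedOutHistory {
    static int totalInCents(int unitPriceInCents, int quantity, int discountInCents) {
        // multiply the unit price by the quantity
        // int total = unitPriceInCents * quantity;   // old version, keep just in case
        int total = unitPriceInCents * quantity - discountInCents;
        // return the total
        // return total;                              // changed: must not go negative
        return Math.max(total, 0);
    }

    public static void main(String[] args) {
        System.out.println(totalInCents(250, 4, 100)); // prints 900
    }
}
```

A reader can no longer tell at a glance which lines are live, which are history, and which comments are still true.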
As the project evolves, useless information piles up, until eventually nobody can tell which information is valid and which is not.

If several things in your project do the same job, for example comments that describe what the code is doing, or comments that stand in for version control, then that code cannot be called good code.

2.4.2. Module division

High cohesion within modules and low coupling between modules are the standards most designs follow. With sensible module division, complex functionality can be split into smaller pieces that are easier to maintain.

A first estimate of whether the module division is reasonable can be made from code length. A class longer than 2000 lines, or a function longer than two screens, is a fairly dangerous signal.

Dependencies are another indicator. If a module has too many dependencies, or even circular dependencies, it shows that the author planned the modules poorly, and it is very likely that in future maintenance a small change will ripple through the entire project. Many tools provide dependency analysis, such as the Dependencies Analysis function in IDEA. Learning to use them helps a great deal in evaluating code quality.

It is worth mentioning that poor module division usually comes with extremely low unit test coverage: unit tests for complex modules are very hard, or even impossible, to write. Checking unit test coverage directly is therefore also a fairly reliable way to evaluate module division.

2.4.3. Simplicity and abstraction

Whenever code quality comes up, adjectives such as "concise" and "elegant" are inevitably mentioned. The word "concise" actually covers a lot.
Avoiding duplication is being concise, and designing with sufficient abstraction is being concise. Every attempt to improve maintainability is really an attempt at subtraction.

Programmers with little experience often fail to see the importance of simplicity and enjoy tinkering with complicated things. But complexity is the natural enemy of maintainability, and it is also a threshold of programmer ability.

Programmers who have crossed that threshold can keep growing complexity under control, distill and abstract the essence of things, and reflect it in their design and code. The life cycle of a program is likewise a continuous iteration from simple to complex and then from complex back to simple.

I find it hard to give a simple, easy-to-use evaluation standard for this part. It is more a way of thinking, one that needs practice as well as understanding. Read more, think more, and talk with others more. Often, far more can be simplified than you originally expected.

2.4.4. Recommended reading

- Refactoring: Improving the Design of Existing Code
- Design Patterns: Elements of Reusable Object-Oriented Software
- Software Architecture Patterns: Understanding Common Architecture Patterns and When to Use Them

3. Conclusion

This article introduces some methods for evaluating code quality. Some are more objective, others more subjective. As noted earlier, evaluating code quality is partly a subjective matter. Although this article lists many evaluation methods, plenty of code that I consider fine has still drawn complaints from others. This article is therefore only a first draft, to be supplemented and improved over time.
Although everyone leans differently when judging code quality, the ability to judge it can be compared to a programmer's "taste", and the accuracy of the judgment grows with experience. Along the way, you need to keep thinking, keep learning, and stay critical.