Programmer Skill Hierarchy Model

Programming skill level

Programming skill level refers to a programmer's ability to design and write programs. It is the foundation of a programmer's craft.

Level 0 - Non-Programmer:

When a complete beginner encounters a problem, he is at a loss and has no idea how to write a program to solve it. In other words, he is still a layman and cannot yet be called a "programmer". The computer is still a mysterious black box to him.

Level 1 - Basic Programmer:

After learning programming for a period of time, when you receive a task, you can write a program to complete the task.

The code works under normal circumstances, but in real operation all sorts of bugs appear once special conditions are hit. In other words, you can develop demo software, but when the software is actually delivered to customers, you may well get scolded by them.

The programmer has written the program, but he himself doesn't know why it sometimes works properly and sometimes doesn't.

When a bug appears in operation, or the requirements change and code has to be modified or added, the program quickly becomes chaotic, the code bloated and full of bugs. Soon even the original developer is unwilling to keep maintaining it.

Level 2 - Data Structures:

After a period of programming practice, programmers come to appreciate the old saying "algorithms + data structures = programs". They use algorithms to solve problems, and they further realize that algorithms essentially depend on data structures: once a good data structure is designed, a good algorithm follows.

A good algorithm cannot grow out of a badly designed data structure.

I remember a foreign sage once said: "Show me your data structure!"

Level 3 - Object-Oriented:

After that, programmers come to appreciate the power of object-oriented programming. Most modern programming languages support it. But using an object-oriented language, using classes, or even inheriting from classes does not by itself mean you are writing object-oriented code.

I've seen a lot of procedural code written in Java, Python, and Ruby.

Only when you have mastered interfaces, polymorphism, and the relationships between classes and between objects can you truly claim to have mastered object-oriented programming techniques.

Even if you are using a traditional programming language that does not support object-oriented programming, as long as you have "objects" in your mind, you can still develop object-oriented programs.

For example, when I program in C, I consciously apply object-oriented techniques to design and write the program: I use structs to simulate classes, and I group the functions belonging to the same concept together to simulate methods. If you doubt that object-oriented code can be written in C, take a look at the Linux kernel; it is written in C, yet a strong "object" flavor runs between the lines of its source code.

It is not easy to truly master object-oriented programming techniques.

In my technical career, there are two obstacles that gave me the most headaches.

One hurdle was the transition from DOS to Windows development. For a long time I could not understand the concept of a framework. In the DOS era, everything was function libraries, and your program actively called the functions. In the Windows era, frameworks took their place: even your main program is actually called by the framework, and the UI thread gets messages from the operating system and sends them to your code for processing. The Spring framework that Java programmers know so well is also this kind of inverted-call framework.

Nowadays, because the term "framework" sounds impressive, many "class libraries"/"function libraries" call themselves "frameworks". In my opinion this is an abuse of the name.

"Class library"/"function library" means the code I wrote calls them.

"Framework" means that I register callback functions to the framework, and the framework calls the functions I write.

Another hurdle was object-orientation. For a long time I didn't know how to design the relationship between classes, and I couldn't design a class hierarchy well.

I remember reading a book by a foreign expert who described a very simple, practical object-oriented design technique: "Describe the problem, then take the nouns in the description and turn them into classes, and take the verbs and turn them into methods of those classes." Although this technique is quite useful, it is rather crude: it has no theoretical basis and is not rigorous, and if the problem is not described well, the resulting class hierarchy will be problematic.

There should be many ways to master object-oriented thinking. I got inspiration from relational databases to understand and master object-oriented design ideas.

In my view, a table in a relational database is really a class, and each row is an instance of the class, that is, an object. The relationships between tables are the relationships between classes. O-R mapping technology (such as Hibernate) maps object-oriented code onto database tables, which also shows that classes and tables are logically equivalent.

Since database design and class design are equivalent, to design an object-oriented system, you only need to use the design techniques of relational databases.

Relational database table design is quite simple:

1. Identify the relationships between tables, which are exactly the relationships between classes: one-to-one, one-to-many, many-to-one, or many-to-many.

2. Identify the fields of the table. An object has countless attributes (a person, for example: height, weight, gender, age, name, ID number, driver's license number, bank card number, passport number, Hong Kong and Macau travel permit number, employee number, medical history, marital history, and so on). When we write a program we only record the attributes we care about, and those attributes become the fields of the table, that is, the attributes of the class. Of three thousand rivers, take only one ladleful! (A sketch of this table-to-class mapping follows below.)
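As a rough Python sketch of the table-to-class correspondence (the Person and Order tables, their fields, and their relationship are invented for illustration):

```python
# "table == class, row == object": keep only the attributes we care about,
# and model the one-to-many relationship between the two tables.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Order:                 # table "orders"; many orders belong to one person
    order_id: int
    amount: float

@dataclass
class Person:                # table "persons"; only the fields we care about
    person_id: int           # primary key
    name: str
    id_number: str
    orders: List[Order] = field(default_factory=list)   # one-to-many relationship

# one row == one object
alice = Person(person_id=1, name="Alice", id_number="11010119900101123X")
alice.orders.append(Order(order_id=1001, amount=99.5))
```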

Level 4 - Design Patterns:

I once saw this sentence on the Internet: "If you don't have 100,000 lines of code, don't talk to me about design patterns." I agree with it.

I remember the first time I read the GoF book Design Patterns, I found that even though I had not known about design patterns before, I had already been using some of them in actual programming without realizing it. Design patterns are objective laws of programming; they were not invented by anyone but were first discovered by early senior programmers.

Without design patterns you can still write programs that meet the requirements, but once requirements change, your program will not be flexible enough and will be hard to carry forward. After a real program is delivered to customers there will certainly be further demands and feedback, and later versions will certainly add requirements. This is a reality no programmer can avoid.

When writing UI programs, whether for Web, Desktop, Mobile, or Game, you must use the MVC design pattern. Otherwise, your program will not be able to survive the subsequent changes in UI requirements.

The most important idea behind design patterns is decoupling, which can be achieved through interfaces. That way, when requirements change later, you only need to provide a new implementation class.
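A minimal Python sketch of decoupling through an interface (the Storage/FileStorage/MemoryStorage names are invented): the calling code depends only on the abstraction, so a new requirement means a new implementation class rather than changes to the callers.

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """The interface that callers depend on."""
    @abstractmethod
    def save(self, key: str, data: bytes) -> None: ...

class FileStorage(Storage):
    def save(self, key: str, data: bytes) -> None:
        with open(key, "wb") as f:
            f.write(data)

class MemoryStorage(Storage):
    def __init__(self):
        self.blobs = {}
    def save(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

def export_report(storage: Storage) -> None:
    # business code only knows the interface, not the concrete class
    storage.save("report.bin", b"...")

export_report(MemoryStorage())   # swap in FileStorage() later without touching export_report
```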

Most design patterns are object-oriented, so design patterns can be regarded as an advanced stage of object orientation. Only by mastering design patterns can you be said to have truly and thoroughly mastered object-oriented design skills.

When I learn a new language (including non-object-oriented languages, such as functional programming languages), I always look at how various design patterns are implemented in this language after understanding its syntax. This is also a trick for learning programming languages.

Level 5 - Language Expert:

After a period of programming practice, programmers have become quite proficient in a common programming language. Some have even become "language lawyers", good at explaining the usage and pitfalls of the language to other programmers.

Programmers at this stage are often loyal believers in the language they use, and often argue with users of other languages in communities and forums about which is the best programming language. They believe the language they use is the best in the world, bar none, and suitable for every scenario. They only have a hammer, so every task looks like a nail.

Level 6 - Multi-Language Expert:

Programmers at this stage have learned and mastered several programming languages, through work or simply out of interest in technology. They have absorbed the different design ideas of different languages and have a better understanding of each language's strengths and weaknesses.

They now believe that the programming language is not the most important thing; a language is just a basic skill.

They now choose different programming languages to solve problems based on the requirements of the task or the available resources, and no longer complain about not getting to use their favorite language for development.

There are many schools and philosophies of programming languages, and some languages support multiple programming paradigms at the same time.

Statically typed programming paradigm

In a statically typed language, variables must have their types specified. Representative languages: C, C++, Pascal, Objective-C, Java, C#, VB.NET, Swift, Go.

The benefits of doing this are:

1. The compiler can find type errors at compile time.

2. The compiler can improve performance by knowing type information during compilation.

This paradigm assumes that the programmer must know the type of every variable. If you don't know a variable's type, you can't just wing it: the compiler will report an error at compile time.

Swift and Go are statically typed languages, yet they do not always require explicit type annotations; the compiler can determine the types automatically through inference.

Dynamically typed programming paradigm

In a dynamically typed language, variables do not need their types specified; any variable can point to an object of any type. Representative languages: Python, Ruby, JavaScript.

The philosophy of dynamic typing can be summed up as duck typing. The duck test, attributed to James Whitcomb Riley, goes roughly: "When I see a bird that walks like a duck, swims like a duck, and quacks like a duck, I call that bird a duck."

This paradigm assumes that programmers must know the type of a variable and the methods and properties it supports. If you don't know the type of a variable, then you're out of luck! The program will crash when it runs! Who should you blame for the program crash? Blame yourself, you're not a qualified programmer!

The benefits of dynamic typing are:

There is no need to explicitly define interfaces or abstract types; as long as a type supports the required methods and properties, it works. Programs become quite flexible and simple. The interfaces/base classes that C++, Java, and C# treat as their lifeblood count for nothing in dynamic languages!
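A minimal Python sketch of that flexibility (the Duck and Person classes are invented): no interface is declared anywhere, and a type error only surfaces when the offending line actually runs.

```python
class Duck:
    def quack(self):
        return "quack"

class Person:
    def quack(self):          # not a Duck, but it quacks like one
        return "I'm quacking"

def make_it_quack(thing):
    return thing.quack()      # no type check; we just call the method

print(make_it_quack(Duck()))      # works
print(make_it_quack(Person()))    # works too, no interface required
# make_it_quack(42)               # AttributeError, but only when this line actually runs
```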

The disadvantages are:

1. If the type is incorrect, the compiler cannot find the error and the program crashes at runtime.

2. Because the compiler does not know the type of the variable, it cannot optimize performance.

Object-oriented programming paradigm

Object-oriented programming paradigm, which has been popular since the late 1970s. It supports classes and class instances as modules for encapsulating code. Representative languages: Smalltalk, C++, Objective-C, Java, C#, VB.NET, Swift, Go, Python, Ruby, ActionScript, OCaml.

Early programming languages were all procedural: sequence, selection, loops, and functions were built in. As codebases grew, people found it necessary to modularize code, putting the code for one concept into one file, which makes concurrent development and code management easier.

People also recognized the law "program = data structures + algorithms", so the data structures and functions corresponding to one concept should be placed together. This is the idea of the class.

The object-oriented programming paradigm has indeed greatly improved productivity and has therefore been widely adopted, so many languages support it at the language level.

Although C does not support the object-oriented paradigm at the language level, modern C development applies object-oriented, modular thinking: the data structures and functions for one concept are placed in one file and follow similar naming conventions.

Since C does not support object-oriented programming at the language level, many programmers have wanted to add object-oriented support to C. The representative results are C++ and Objective-C.

C++ is a separate language, but most of its language elements are compatible with C.

Objective-C is fully compatible with C. It adds a thin layer of syntactic sugar to support interfaces (what other languages call classes) and protocols (what other languages call interfaces); its first implementation was even just a precompiler for C. To be honest, apart from added syntax that does not quite fit the C style, Objective-C's object system is actually quite well designed. Jobs spotted it early on and took Objective-C under his wing, but because it stayed closed inside the Apple/NeXTSTEP world, few people knew about it. With the popularity of iOS, Objective-C has become famous worldwide in recent years.

Functional programming paradigm

Functional programming languages were invented by mathematicians, who believe that programs are mathematical functions. Representative languages: Lisp, Erlang, JavaScript, OCaml, Prog.

Many experts have advocated functional programming languages, believing them to be revolutionary. But I think they overestimate the power of the functional programming paradigm. I don't think the functional programming paradigm is superior to the object-oriented programming paradigm.

Functional programming languages are built around functions and have no concept of classes. But their functions are not like those of traditional procedural languages: they support closures.

In my opinion, the functions of functional languages, that is, closures, are really classes in disguise. As programming languages have developed, they all need modularization, which means combining "data structures" with "algorithms". Whatever the language, a programming style that does not combine the two has no future.

Object-oriented languages use the class to combine "data structure" and "algorithm". The core of a class is the data structure, its attributes, rather than the algorithm, its methods; in a class, the functions are attached to the attributes.

Functional languages use the closure to combine "data structure" and "algorithm": a function can access the variables of its enclosing scope, so the "attributes" are attached to the "function".

"Class" is essentially equivalent to "closure". Many object-oriented programming languages ​​now support closures. By observing their codes, we can find that they actually use "class" to implement "closure".

Which is easier to use, "class" or "closure"? Obviously "class".

Closures are more concise, so they are often used in object-oriented languages to replace anonymous classes. Writing a full class for something that has only one function is too much ceremony; a closure is more concise.
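A minimal Python sketch of the class-versus-closure equivalence (the counter example is invented): the same behavior written both ways, a class attaching a function to data and a closure attaching data to a function.

```python
class Counter:
    """Class version: the data (count) carries the function (increment)."""
    def __init__(self):
        self.count = 0
    def increment(self):
        self.count += 1
        return self.count

def make_counter():
    """Closure version: the function carries the captured data (count)."""
    count = 0
    def increment():
        nonlocal count        # the closure captures and mutates its enclosing state
        count += 1
        return count
    return increment

c1 = Counter()
c2 = make_counter()
print(c1.increment(), c1.increment())   # 1 2
print(c2(), c2())                       # 1 2
```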

Let me complain about OCaml. Its predecessor, Caml, is a pretty good functional language in its own right, but OCaml added a complete object-oriented mechanism on top, supporting both the object-oriented and functional paradigms. That can easily split your brain the way C++ does.

There are also many object-oriented language fans who find JavaScript annoying and keep wanting to add object-oriented support to it. ActionScript was one such attempt; I have used it, and it really is not much different from Java.

Let me also complain about ExtJS. When I was choosing a web front-end framework, I compared ExtJS and jQuery.

ExtJS was obviously developed by Java experts. It used JavaScript to simulate the design concept of Swing and created a UI library.

The developers of jQuery clearly understood JavaScript's functional side and built a UI library that plays to the strengths of a dynamic, functional language, which promptly buried ExtJS.

The story of ExtJS and jQuery shows how important multi-language skills are. The author of ExtJS was proficient in and loved Java, so he wielded JavaScript as if it were Java, a thankless effort.

Functional languages also have tricks such as tail recursion, which avoids stack overflow in recursive calls.

Template Programming Paradigm

Template programming uses types as parameters, so that one set of functions can support any number of types. Representative language: C++.

Template programming was invented while C++ was developing its container libraries: containers need to store objects of any type, hence the need for generics.

C++ templates generate type-specific code at compile time according to how the templates are used in the source. Besides C++, Java and C# have a similar mechanism called "generics", but the implementations differ greatly from C++ templates: the Java compiler does not generate new code, relying instead on type erasure and casts, while C# generics are reified by the runtime.

How do you store objects in containers in a language without templates/generics? You store them as a common base-class type (Java, C#) or as void* pointers (C) and cast them back to the actual type when you retrieve them. In dynamically typed languages you don't care about types at all: throw any object into the container and use it directly when you take it out.
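For comparison, a minimal sketch of the generics idea written in Python's type-hint notation (the Stack class is invented; Python itself checks nothing at runtime, but a static checker such as mypy would):

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")

class Stack(Generic[T]):
    """A container parameterized by the element type T."""
    def __init__(self) -> None:
        self._items: List[T] = []
    def push(self, item: T) -> None:
        self._items.append(item)
    def pop(self) -> T:
        return self._items.pop()

ints = Stack[int]()
ints.push(1)
print(ints.pop() + 1)
# ints.push("oops")   # a static type checker flags this; Python itself would not
```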

Some C++ experts built "template metaprogramming" on top of templates. Since template instantiation is done by the C++ compiler, template metaprogramming makes the compiler do the computation, so the result is already computed once compilation finishes. I don't know what it is good for beyond research and showing off.

Summary

I think there are several criteria for whether a language is worth learning:

1. Do you have to use it for work? Then you have to learn it, no question about it. After all, we all have to eat.

2. Do its language features feel refreshing? If so, it is worth the effort. For example, Go did away with exceptions and uses multiple return values instead, which I agree with. In fact I actively stopped using exceptions years ago: C does not support exceptions and lives perfectly well, so why do we need them? If an error occurs, return an error code; for unrecoverable errors, just abort the program. Moreover, exceptions violate a principle of procedural programming: a function should have one entry and one exit, and throwing an exception adds another exit.

3. Is it strong in a particular field? If you only have a hammer, you can only treat every task as a nail and pound away. With several tools in the toolbox, you will be much more comfortable facing different tasks.

Level 7 - Architecture Design

To design excellent software you also need to master architecture design. Some techniques of architecture design:

1. Layering

A piece of software is usually divided into:

Presentation layer - UI part

Interface layer - the communication interface to the backend services

Service layer - the actual service part

Storage layer - persistent storage part, stored in files or databases.

Layering decouples modules, supports parallel development, and makes the software easier to modify and to tune for performance.

2. SOA

Modules communicate over the network and are loosely coupled. Each module can be deployed independently, and more instances can be deployed to improve performance. Each module can be developed in a different language and on a different platform, and previously developed services can be reused. Commonly used SOA protocols include Web Services, REST, JSON-RPC, etc.

3. Performance bottleneck

1) Convert synchronous processing to asynchronous.

Use in-memory queues (e.g., Redis), workflow engines (e.g., jBPM), and so on. In-memory queues are fast but can lose data; a workflow engine persists each request to a database.

Converting synchronous requests to asynchronous ones solves basically 99.99% of performance problems.
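A minimal Python sketch of the synchronous-to-asynchronous conversion (Python's queue module stands in for Redis, and the handler/worker names are invented): the request handler only enqueues and returns immediately, while a background worker drains the queue.

```python
import queue
import threading

jobs: "queue.Queue[dict]" = queue.Queue()

def handle_request(order):
    jobs.put(order)                   # fast path: just enqueue
    return {"status": "accepted"}     # respond before the work is actually done

def process(order):
    print("processing", order)        # imagine the slow, expensive part here

def worker():
    while True:
        order = jobs.get()
        process(order)                # the slow part happens outside the request path
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
print(handle_request({"id": 1}))
jobs.join()                           # for this demo, wait for the background work
```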

2) Use single-machine parallel hardware for processing.

For example, use GPU, FPGA and other hardware to process and improve performance.

3) Use cluster computers for processing.

For example, a Hadoop cluster uses multiple computers to process data in parallel.

In your own software stack you can likewise deploy multiple instances of a module and process requests in parallel.

4) Use caching to satisfy requests. Once frequently used content is cached, a large share of user requests become simple memory reads, and performance improves dramatically.

Caching evokes "God's algorithm": as I recall, a practical cache performs only slightly worse than the theoretical optimum, which would require being God and foreseeing the future. Now that x86 CPUs have hit the clock-frequency ceiling, the main way to keep improving CPU performance is to enlarge the caches.
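A minimal Python sketch of the caching idea using functools.lru_cache (the load_page function and its cost are invented): repeated requests for the same content become memory lookups instead of recomputation or a database round trip.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def load_page(page_id: int) -> str:
    print("expensive fetch for", page_id)   # imagine a database or disk read here
    return f"<html>page {page_id}</html>"

load_page(42)   # slow: does the fetch
load_page(42)   # fast: served from the in-memory cache
print(load_page.cache_info())
```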

4. Big system, small implementation

Don't panic when you face a large system. Split it into multiple modules and multiple small programs, and solve it through SOA collaboration. This follows the Unix design philosophy: Unix grew a large number of small, single-purpose programs and encouraged users to combine them with pipes to meet their needs. Of course, pipe communication has too many restrictions and is not flexible enough, so today we can use URIs and SOA to let multiple programs collaborate. Applications on Android and iOS now collaborate through URIs. Isn't that a modern evolution of the Unix design philosophy?

5. Sharding

There is a trend now to "de-IOE": I for IBM mainframes, O for Oracle databases, E for EMC storage. In the past, large systems were often architected around IOE: an Oracle database deployed on a mainframe, with the Oracle database keeping its data on EMC storage. IOE represents today's most powerful computers, databases, and storage, yet one day even they cannot withstand a truly massive system.

The Oracle database follows a share-everything design: it can run on a cluster (with no more than 16 server nodes), and the whole cluster shares a single storage system.

The de-IOE movement marks the bankruptcy of the share-everything model; only share-nothing architectures let a system scale out without limit.

You can handle data of any size with MySQL, provided you know how to shard: split a large system into several small ones, spread across a number of cheap servers and storage devices. A more modern approach is to spread them across a large number of virtual machines.

Take the Ministry of Railways' 12306 ticketing site. A train ticket always belongs to a specific train, so if we shard by train, the 12306 site can be split into thousands of modules. One virtual machine can host several modules, and when certain trains become performance bottlenecks they can be migrated to dedicated virtual machines. Even if a few trains' services end up unavailable, the system as a whole will not go down.

The 12306 website has only one global part, which is user login. This can be handled by a third party, such as allowing users to log in using WeChat, Weibo, QQ, etc.

You can also implement the login service yourself, or shard it across several Redis servers. The Redis servers store each logged-in user's sessionId together with the userId, role, permissions, and so on. The sessionId is randomly generated, and a few of its bits can be used to identify which Redis server it lives on. After the user logs in, the sessionId is sent to the client; the client sends it back with every request, and the server looks it up on the corresponding Redis server to get the user information and process the request. If the sessionId cannot be found on the Redis server, the user is asked to log in again. Even if every registered user logged in at the same time, not much memory would be needed; and when session memory grows too large, the oldest sessions can be evicted, forcing those users to log in again. The number of simultaneously active users will never be that large.
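A minimal Python sketch of this session-sharding scheme, with invented parameters (plain dictionaries stand in for the Redis servers): a few bits of the random sessionId identify which shard holds the session, so lookups go straight to the right server.

```python
import secrets

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]   # stand-ins for 4 Redis servers

def create_session(user_info: dict) -> str:
    shard_no = secrets.randbelow(NUM_SHARDS)
    # append one hex digit so the shard can be recovered from the id itself
    session_id = f"{secrets.token_hex(16)}{shard_no:x}"
    shards[shard_no][session_id] = user_info
    return session_id

def lookup_session(session_id: str):
    shard_no = int(session_id[-1], 16) % NUM_SHARDS   # recover the shard from the id
    return shards[shard_no].get(session_id)           # None means: ask the user to log in

sid = create_session({"userId": 7, "role": "passenger"})
print(lookup_session(sid))
```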

Domain Knowledge Level

All the previous levels are about programming skills. Those are the basics, and by themselves they do not generate much value, yet too many programmers waste too much time on them.

Some programmers like to delve into programming languages. Whenever a new programming language comes out or an old language becomes popular, they will invest their energy in researching it. I am one of them, and I have wasted a lot of energy on programming languages ​​and tricks.

I think C++ is a huge pit. It began as an object-oriented C; later template programming was discovered, and template programming and then template metaprogramming were heavily promoted; recently the C++11 and C++14 standards added many new things, such as functional programming and type inference. C++ has become too complicated, and its many pitfalls consume a great deal of programmers' energy. When I use C++, I use only the object-oriented part and the template part, and avoid the overly clever features.

Computer science is a very broad subject. There are many areas of knowledge that need to be studied in depth before we can write valuable programs. Software must be integrated with the industry and put into practice to be valuable. You cannot write valuable programs just by studying programming skills without understanding domain knowledge.

There are many fields in computer science, some of them are listed below:

Storage----block devices, file systems, cluster file systems, distributed file systems, Fibre Channel SCSI, iSCSI, RAID, etc.

Networking----Ethernet, fiber-optic networks, cellular networks, Wi-Fi, VLANs, etc.

Computer architecture----mainly CPU instruction sets, such as x86 and ARM.

USB protocol----you need to understand URB packets.

PCI and PCI-E protocols----modern computer peripherals all speak PCI or PCI-E, and graphics cards now connect to the computer over PCI-E. Relatively speaking there is less to learn here, but virtualization work requires a deep understanding of PCI.

Image processing----image compression, real-time video encoding, etc.

3D games

Relational databases

NoSQL databases

Operating systems

Distributed operating systems

Compiler principles

Machine learning----much in demand now with big data!

Understanding this domain knowledge also means knowing the existing commercial hardware, commercial software, and open-source software in the field. In many cases there is already a ready-made tool for the job; you can finish the task with it instead of developing anything yourself, or sometimes by just combining existing tools with a few scripts.

For example, I wanted to implement a two-way synchronization task. I found an excellent open source software Unison and completed the task successfully by writing a configuration file. No code needed.

Another time, we needed to achieve high availability, and we easily implemented it by calling several open source software using Python.

To write an installer or customize an operating system, if you have the relevant operating-system domain knowledge you can get it done with a few lines of script.

People without domain knowledge may do a lot of unnecessary development, and may even discover after a long stretch of work that they are in a dead end.

In addition, solid domain knowledge can greatly improve your programming debugging and error-checking abilities. Knowing how compilers and programming language runtimes work will allow you to quickly modify your code based on compilation errors and warning messages.

Knowing the underlying mechanisms of the operating system lets you quickly find the root cause of runtime errors. For example, I once wrote a Windows upgrade service: a Windows service that needs to run a DOS script which replaces the service itself. I found that sometimes the script seemed to run but had no effect. After a whole night of investigation, I discovered that right after the Windows service was installed, the first execution of the script hit a permission problem: the logs looked correct, yet the script actually did nothing. But once the service program had been started once, everything was fine. This has to be something in the underlying security mechanism of Windows. Because I don't know the Windows kernel well, it took me a long time to find the problem, and I never did learn its root cause.

Level 0 - Domain Knowledge Novice

At this level you know little about the field: you use search engines to find introductory articles about the field's software and hardware, follow the instructions in those articles to configure and use the software, and can barely operate the existing software and hardware.

Level 1 - Domain Knowledge Practitioner

You understand the commonly used hardware in the field and have an in-depth grasp of how to configure and use the common software. You can skillfully build solutions from existing software and hardware and solve the various problems encountered in real work.

Level 2 - Domain Knowledge Expert

When you not only master the software and tools of the field and know how to use them, but also understand their principles, knowing why they are the way they are, you become a domain knowledge expert.

If you know the principles of network protocols, you can know where the problem may be when there is a problem with the network. Is it a MAC conflict, an IP conflict, or a network loop?

Only when you understand the principles of storage can you understand why one storage method is not suitable for virtualization, another storage method is suitable for virtualization, and another method is suitable for data backup.

Only if you know the PCI protocol can you know how to virtualize a hardware device.

Only if you know the network card's hardware protocol can you emulate a virtual NIC that a virtual machine can use normally.

Only when you know the video encoding formats and principles can you know which video format occupies the least bandwidth and which video format occupies the least CPU.

Only when you understand the Intel VT / AMD-V instruction sets can you know how virtualization is implemented.

Only when you understand that a workflow is really a state machine will you, when faced with a complex workflow, know how to design a workflow engine that meets the requirements.
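A minimal Python sketch of that idea (the states and events are invented): at its core, a workflow engine is a table of allowed transitions plus a function that applies events to states and rejects anything the workflow does not permit.

```python
TRANSITIONS = {
    ("draft",     "submit"):  "in_review",
    ("in_review", "approve"): "approved",
    ("in_review", "reject"):  "draft",
    ("approved",  "publish"): "published",
}

def apply(state: str, event: str) -> str:
    """Return the next state, or reject an event the workflow does not allow."""
    next_state = TRANSITIONS.get((state, event))
    if next_state is None:
        raise ValueError(f"event '{event}' not allowed in state '{state}'")
    return next_state

state = "draft"
for event in ("submit", "approve", "publish"):
    state = apply(state, event)
    print(event, "->", state)
```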

Level 3 - Scientist

You are an expert in domain knowledge, but your knowledge comes from books and other people.

If you are content with being a domain expert, you can only copy others and will never surpass them. Others may not be willing to share their research results with you, and by the time they do tell you, they may already have discovered a newer theory, with a new generation of products about to ship.

Scientists are people who explore the unknown, have the courage to innovate, and promote the progress of human society.

Legend has it that a Cisco executive once said, half-jokingly: "If Cisco stops developing new technologies, Huawei will lose its direction." This mocks Huawei for being merely a domain-knowledge expert, able to copy but not to surpass. I don't know Huawei's actual situation, but I hope it has since become a leader.

Irwin Jacobs saw how promising CDMA (code division multiple access) was for communications and founded Qualcomm. Qualcomm lives mainly on patent licensing fees and employs a large number of scientists to do research in communications. Some people call Qualcomm a patent troll. Those people do not understand the value of knowledge; in their eyes the fair price of Windows is 5 yuan, the price of a CD, and an iPhone should cost little more than a thousand yuan of bare hardware. If Qualcomm is just a patent troll, go and "troll" your own CDMA and LTE and show us!

The x86 architecture was not designed with virtualization in mind, which is why the so-called "virtualization holes" exist: certain privileged CPU instructions do not trap when executed inside a virtual machine, so control cannot be handed back to the host. As a result, virtual machines originally could not be run on x86 chips.

VMware was founded in 1998 by several American scientists who discovered that binary translation could be used to run virtual machines on x86 computers.

The Xen virtualization software was also invented by scientists. They found that if the guest operating system kernel and the host operating system were modified so that instructions falling into the "virtualization holes" were replaced by direct calls into the host, virtualization could be achieved and the virtual machine's performance greatly improved.

Later, Intel added the Intel VT instruction set to its chips and AMD added AMD-V, plugging the "virtualization holes". Then came the KVM virtualization software, which uses these CPU hardware instructions directly to implement virtualization.

When KVM executes CPU instructions it runs them directly on the physical CPU, so it is very efficient. But when the virtual machine accesses virtual peripherals, they must be emulated in software, so the virtual machine's I/O is very slow.

IBM scientist Rusty Russell, drawing on the experience of Xen, created the virtio technology: a set of virtual PCI devices plus drivers inside the guest. These virtual PCI devices expose device memory that the host can access and that the guest can access through the virtio driver. In other words, guest and host share a region of memory, which solves the virtual machine's I/O performance problem.

Here’s another search engine story:

A long time ago I wanted to add a search function to a program. I first implemented it with SQL queries, but it was far too slow. Then I found the open-source Lucene project, which builds inverted indexes in files and makes searching dramatically faster.
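A minimal Python sketch of an inverted index (the documents are invented; Lucene's real index is far more sophisticated, with analyzers, on-disk postings lists, and scoring): each word maps to the set of documents containing it, so a query becomes a couple of dictionary lookups and a set intersection instead of scanning every document.

```python
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs are rare",
}

# build the inverted index: word -> set of document ids
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def search(query: str) -> set:
    word_sets = [index.get(w, set()) for w in query.split()]
    return set.intersection(*word_sets) if word_sets else set()

print(search("quick brown"))   # {1, 3}
```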

Google's two founders discovered the secret hidden in HTML links: the link relationships between pages can be used to assign each page a weight. That is the PageRank algorithm. With it, Google's automated search engine defeated Yahoo's manually categorized directory.
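A minimal Python sketch of the PageRank idea (power iteration with a damping factor of 0.85 over a tiny invented link graph; the real algorithm also handles dangling pages and runs at web scale): a page's weight comes from the weights of the pages linking to it.

```python
links = {           # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

damping = 0.85
rank = {page: 1.0 / len(links) for page in links}

for _ in range(50):                      # iterate until the ranks settle
    new_rank = {page: (1 - damping) / len(links) for page in links}
    for page, outlinks in links.items():
        share = rank[page] / len(outlinks)   # each page splits its weight among its links
        for target in outlinks:
            new_rank[target] += damping * share
    rank = new_rank

print({page: round(r, 3) for page, r in rank.items()})
```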

So with inverted-index technology, PageRank, and a simple web crawler, we can build a search engine. But the Internet is huge and a large number of new web pages appear every day; building an inverted index of the entire Internet is hard.

Several years later, Google published three more papers: GFS, MapReduce, and Bigtable. The developers of the Lucene project then built the Hadoop project based on Google's MapReduce paper. MapReduce uses a large number of machines to store data and compute on it, finally aggregating the results. With Hadoop plus an inverted index plus PageRank you can build a search engine, and companies such as Yahoo and Baidu have built their own search engines on Hadoop.

However, the search engines of other companies are still not as good as Google's. We programmers know this best; for instance, I keep climbing over the firewall just to use Google.

Google Blackboard published a series of articles by Dr. Wu Jun that introduced a lot of machine learning knowledge. From them we can tell that Google actually uses machine learning to analyze the pages it collects. Google obviously will not publish that formula, and even if it did one day, you can be sure it would already have developed a more powerful secret formula, so a knockoff search engine would still not match Google.

Imitation is the necessary path to innovation; before becoming a leader in a field one must pass through the stages of learning and copying. But to become the industry leader, to become a champion, one must have the courage to overtake, to step onto the road of innovation, and to become a true scientist, a real expert!

Summary

Programming ability can be divided into two dimensions: one is the level of programming skills, and the other is the level of domain knowledge.

Some programmers spend all their energy improving their programming skills while knowing little about any domain, which is actually very harmful in daily work. Some requirements already have ready-made, open-source, free solutions, or can be solved quickly by combining a few existing pieces of software, yet they spend a great deal of time developing their own. Moreover, lacking domain knowledge, when a program hits an unexpected situation it is hard for them to locate the root cause quickly and fix the bug.
