When optimizing to scale to multiple cores

"There is no silver bullet in software development. All we can do is choose and balance;"

In the previous article ("Five Directions of Program Optimization"), we discussed five directions for optimizing a single-threaded program. Once a single core has been pushed to its limit, it is time to go multi-task.

The idea seems clear enough: break a single task into multiple tasks, let multiple CPUs work on them at the same time in parallel, and efficiency naturally improves.

However, it may not be that simple.

Granularity of task decomposition

First, we need to determine whether the task can be decomposed at all. Parsing a large number of files, for example, splits naturally into independent subtasks. A time-consuming serial computation, however, where each step depends on the previous result, is hard to split; work of that shape may need to be decomposed at a higher level instead.
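A minimal sketch of the two cases, in Python (the article names no language; `parse` and the sample documents are hypothetical stand-ins):

```python
# Decomposable: many independent files can be parsed concurrently.
# Non-decomposable: a serial chain where each step needs the previous result.
from concurrent.futures import ThreadPoolExecutor

def parse(text: str) -> int:
    # stand-in "parse": count the words in one document
    return len(text.split())

documents = ["a b c", "d e", "f g h i"]  # stand-ins for file contents

# Case 1: each document is an independent subtask, so a pool can fan out.
with ThreadPoolExecutor(max_workers=4) as pool:
    word_counts = list(pool.map(parse, documents))
print(word_counts)  # [3, 2, 4]

# Case 2: step n depends on step n-1's result, so this must run serially.
state = 0
for n in range(5):
    state = state * 2 + n
print(state)  # 26
```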

Data race

Programming is computation plus data. The computation now runs in parallel, but the data is still accessed from a single shared source, and access to shared resources creates contention.

If this contention is not controlled, the same data may be computed repeatedly (in read-heavy scenarios), or dirty data may be produced (in write-back scenarios).
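The write-back hazard can be sketched in Python with a hypothetical shared counter: without a lock, two threads could both read the same value, each add one, and write back the same result, losing an update. Guarding the read-modify-write serializes it:

```python
# Four threads increment one shared counter. The increment is a
# read-modify-write; the Lock makes that sequence atomic so no
# update from one thread overwrites another's.
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:          # serialize read-modify-write
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(10_000,))
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print(counter)  # 40000: no lost updates
```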

Introducing locks

To make data access orderly, locks need to be introduced to prevent dirty data.

Controlling the granularity of those locks is a topic that deserves careful thought.

For example, in read-heavy, write-light scenarios, a read-write lock can be significantly more efficient than a single lock that treats every access the same.
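Python's standard library has no reader-writer lock, so here is a minimal sketch of one (the classic first-readers variant; under sustained reads, writers can starve):

```python
# Minimal reader-writer lock: many readers may hold it at once;
# a writer waits until the last reader leaves.
import threading

class RWLock:
    def __init__(self):
        self._readers = 0
        self._mutex = threading.Lock()    # guards the reader count
        self._writer = threading.Lock()   # held for the whole read or write phase

    def acquire_read(self):
        with self._mutex:
            self._readers += 1
            if self._readers == 1:
                self._writer.acquire()    # first reader blocks writers

    def release_read(self):
        with self._mutex:
            self._readers -= 1
            if self._readers == 0:
                self._writer.release()    # last reader lets writers in

    def acquire_write(self):
        self._writer.acquire()

    def release_write(self):
        self._writer.release()

# usage: a writer gets exclusive access, readers then share access
rw = RWLock()
shared = {"hits": 0}
rw.acquire_write()
shared["hits"] += 1
rw.release_write()
rw.acquire_read()
value = shared["hits"]
rw.release_read()
print(value)  # 1
```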

Among the products we deal with daily, databases are masters of locking. When updating data, whether the engine locks a row, a page, or an entire table, the performance of the different granularities varies significantly.

Thundering Herd

Consider a thread pool in which multiple threads wait for the same lock; whichever thread obtains it starts working.

When the lock is released, waking up every waiting thread at once causes a thundering herd: all of them wake, one wins the lock, and the rest go back to sleep, having wasted a round of scheduling.

Solution:

Use a single thread to perform the blocking operation (for example, a single thread calling accept on the listening socket), so that only one thread is waiting for the lock at any time.

For more details, see "Client-Server Programming Method": a thread pool is created in advance, and each thread calls accept on the listening socket.

Data replication

Let each thread use its own data. With nothing shared, resource contention disappears, and so do the locks that guard it: duplicate the data into multiple copies and let each thread access only its own copy.

But this introduces a new problem: if each thread writes its copy back, how do we ensure the copies stay consistent?

After all, the copies all represent the same logical piece of data.

Once data consistency is a concern, synchronizing the multiple copies becomes a problem of its own.
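A minimal sketch of per-thread copies in Python (the counting workload is a hypothetical stand-in): each thread writes only its private slot, and the copies are reconciled once, at the end, in a single reduction.

```python
# Each thread accumulates into its own private slot, so there is no
# contention while the threads run; the consistency question is
# answered by merging the copies exactly once at the end.
import threading

N_THREADS = 4
local_totals = [0] * N_THREADS   # one private slot per thread

def count(idx: int) -> None:
    total = 0
    for _ in range(1000):
        total += 1               # touches only thread-private state
    local_totals[idx] = total

threads = [threading.Thread(target=count, args=(i,)) for i in range(N_THREADS)]
for t in threads: t.start()
for t in threads: t.join()

grand_total = sum(local_totals)  # the only point where the copies meet
print(grand_total)  # 4000
```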

Data Sharding

Let's change tack, then, and use data sharding instead of replication. Sharding is a natural next thought: since the computation was divided into many small tasks, the data can be divided the same way.

The data is split into shards, each storing different content with nothing in common.

This way there is no contention on data access, and because the shards hold different data, there is no copy-consistency problem to synchronize.
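A minimal sharded map, as a sketch of the idea in Python (`ShardedDict` is hypothetical): keys hash to one of N independent shards, each with its own lock, so threads touching different shards never contend.

```python
# Sharded map: each key is hashed to one shard; the lock covers one
# shard rather than the whole map, so operations on different shards
# proceed in parallel.
import threading

class ShardedDict:
    def __init__(self, n_shards: int = 4):
        self._shards = [{} for _ in range(n_shards)]
        self._locks = [threading.Lock() for _ in range(n_shards)]

    def _index(self, key) -> int:
        return hash(key) % len(self._shards)

    def put(self, key, value) -> None:
        i = self._index(key)
        with self._locks[i]:          # lock one shard, not the map
            self._shards[i][key] = value

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._shards[i][key]

d = ShardedDict()
d.put("a", 1)
d.put("b", 2)
print(d.get("a"), d.get("b"))  # 1 2
```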

However, sharding is far from being as good as it looks.

Sharding means each thread sees only a fragment of the data instead of the entire set, so it can process only that specific part. Threads thus lose their interchangeability: certain tasks can only run on the specific thread that owns the data.

And if a task needs to access all of the data, things get even more complicated.

It turns out that sharding pushes the problem up a level: the thread layer, and ultimately the business logic, must now be shard-aware.

That may be more complicated still.

So: if you want more speed by using multiple cores, you have to face more problems.

At the architecture level, the problems we face when scaling from a single machine out to a multi-machine cluster are much the same.

Reference: Reading Notes on "Large-Scale Website Technology Architecture" [2] - Architecture Patterns

There is no silver bullet in software development; all we can do is choose and balance.
