Best practices for cache penetration, cache concurrency, and hotspot caching

1. Introduction

When we use a cache, whether Redis or Memcached, we basically run into the following three problems:

  • Cache penetration
  • Cache concurrency
  • Cache invalidation


Cache penetration

When we use a cache in a project, we usually check the cache first. If the value exists, we return the cached content directly; if not, we query the database, cache the result, and return it. If the data being queried simply does not exist, every request falls through to the DB and the cache loses its purpose. Under heavy traffic, the DB may be overwhelmed.

So what is a good way to solve this problem?

If someone frequently attacks our application using a non-existent key, this is a vulnerability.

A clever way is to pre-set a value for this non-existent key.

For example, store a placeholder value such as "&&" under that key.

When "&&" is returned, the application knows this is a non-existent key and can decide whether to keep waiting or give up. If it keeps waiting, it requests the key again after a polling interval; once the value is no longer "&&", the key now has real data. This avoids passing the request through to the database and blocks a large number of similar requests at the cache layer.
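As a concrete illustration of the pre-set value idea, here is a minimal sketch in Java. A HashMap stands in for Redis (which would use GET/SET with a short TTL), and the key names are made up; the point is only how the "&&" sentinel short-circuits repeated lookups.

```java
import java.util.HashMap;
import java.util.Map;

public class SentinelCache {
    static final String MISSING = "&&"; // sentinel meaning "known not to exist"
    static final Map<String, String> cache = new HashMap<>(); // stand-in for Redis

    // Returns the real value, or null if the caller should wait or give up.
    static String lookup(String key) {
        String value = cache.get(key);
        if (value == null) {
            // First miss: the DB lookup (not shown) found nothing, so we
            // cache the sentinel to absorb the repeated requests that follow.
            cache.put(key, MISSING);
            return null;
        }
        if (MISSING.equals(value)) {
            return null; // key is known to be absent; do not touch the DB again
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(lookup("article:42"));    // prints null (sentinel stored)
        System.out.println(cache.get("article:42")); // prints &&
        cache.put("article:42", "real data");        // backend later fills the value
        System.out.println(lookup("article:42"));    // prints real data
    }
}
```

A request that still sees null can simply poll again after an interval, exactly as described above.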

Cache concurrency

Sometimes, when a website is under heavy concurrent load and a cache entry expires, multiple processes may query the DB and set the cache at the same time. If the concurrency is high enough, this can put excessive pressure on the DB and cause the cache to be updated over and over.

My current idea is to lock the cache query: if the KEY does not exist, acquire a lock, query the DB, populate the cache, and then release the lock. Other processes that find the lock held wait; after it is released, they either return the cached data or go on to query the DB themselves.

This situation is similar to the pre-set value problem mentioned earlier, except that the use of locks will cause some requests to wait.
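Within a single JVM, this lock-then-load idea can be sketched with ConcurrentHashMap.computeIfAbsent, which blocks concurrent callers for the same key until the first one has loaded it; loadFromDb is a hypothetical stand-in for the real query. A distributed setup would use a cache-level lock such as Redis setNX instead.

```java
import java.util.concurrent.ConcurrentHashMap;

public class LockedCacheLoad {
    static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    static int dbHits = 0; // counts how often the "DB" was actually queried

    // Hypothetical stand-in for the real database query.
    static String loadFromDb(String key) {
        dbHits++;
        return "value-for-" + key;
    }

    static String get(String key) {
        // Only one thread per key runs the mapping function; others wait for it,
        // then read the cached result, so the DB is queried once per key.
        return cache.computeIfAbsent(key, LockedCacheLoad::loadFromDb);
    }

    public static void main(String[] args) {
        get("user:1");
        get("user:1");
        System.out.println(dbHits); // prints 1: the second call hit the cache
    }
}
```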

Cache Invalidation

The main reason for this problem is high concurrency. Usually, when we set the expiration time of a cache, we may set it to 1 minute, 5 minutes, etc. When the concurrency is very high, a lot of caches may be generated at the same time, and the expiration time is the same. At this time, when the expiration time arrives, these caches may become invalid at the same time, and all requests will be forwarded to the DB, which may be overloaded.

So how to solve these problems?

One simple solution is to spread out cache expiration times. For example, we can add a random offset, say 1-5 minutes, on top of the base expiration time. This lowers the chance that many expiration times coincide, making collective failures much harder to trigger.
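The randomized expiration can be sketched as below. The 5-minute base TTL and the 1-5 minute jitter range are just the example numbers from above; the resulting TTL would be passed to the cache write, e.g. a Redis SETEX.

```java
import java.util.concurrent.ThreadLocalRandom;

public class JitteredTtl {
    static final int BASE_TTL_SECONDS = 300; // example base expiry: 5 minutes

    // Base TTL plus a random 1-5 minutes, so entries written at the same
    // moment do not all expire together.
    static int ttlWithJitter() {
        int jitterSeconds = ThreadLocalRandom.current().nextInt(60, 301);
        return BASE_TTL_SECONDS + jitterSeconds;
    }

    public static void main(String[] args) {
        // In real code: SETEX key <ttl> <value> with this TTL.
        System.out.println(ttlWithJitter()); // some value between 360 and 600
    }
}
```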

The second problem we discussed was for the same cache, and the third problem was for many caches.

In summary:

  • Cache penetration: querying data that is guaranteed not to exist. For example, querying a non-existent id in the article table hits the DB on every request; if someone exploits this maliciously, it can directly impact the DB.
  • Cache invalidation: if many cache entries expire within a short period, the pressure on the DB spikes. There is no perfect solution, but you can analyze user behavior and try to spread the expiration points evenly.

A cache avalanche occurs when a large number of cache misses hit the DB at once, for example heavy concurrent access to entries that have just expired.

Problem Summary

Question 1:

How to solve DB and cache consistency issues?

A: The question is really whether the cache is updated promptly after the database is modified. We have dealt with this in practice. The case where the database update succeeds but the cache update fails was mainly caused by the cache server being down. Failures caused by transient network problems can be handled with a retry mechanism. If the cache server is down, requests cannot reach it and will go directly to the database, and after modifying the database we cannot modify the cache. In that case we record the change in the database and start an asynchronous task that keeps checking whether the cache server can be reached. Once the connection succeeds, the task takes the modified data out of the database in order and writes the latest values to the cache one by one.

Question 2:

I'd like to ask about cache penetration. Suppose a user looks up an article by ID. Following the earlier approach, the cache KEY is pre-set to a placeholder value such as "&&". If a lookup for the ID returns that placeholder, what does "continue waiting for access" mean? When will the ID actually get the value the user needs?

A: What I described is mainly a scenario where the back end configures values and the front end reads them. If the front end cannot obtain a real value for the key, it waits or gives up. Once the relevant key and value are configured through the back-end interface, the placeholder "&&" is naturally replaced. In the case you mention, some process should set the value for this ID in the cache at some point, and new requests will then receive the latest ID and value.

Question 3:

In fact, if you use Redis, I saw a good example the other day: double keys. A subsidiary key is created at write time to mark when the data should be considered expired, and the data is reloaded when that time approaches. If you think this creates too many keys, you can store the expiry time in the main key's value and let the subsidiary key act only as a lock.

A: We have tried this solution before. This solution will generate duplicate data and requires controlling the relationship between the attached key and the key at the same time, which is somewhat complicated in operation.
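The double-key idea in this question can be sketched as follows, assuming the logical expiry is stored with the main key and the subsidiary key is used only as a reload lock. This is an illustrative in-memory version: Entry, reloadLock, and the synchronous "fresh" reload are stand-ins for two Redis keys and a background DB reload.

```java
import java.util.HashMap;
import java.util.Map;

public class LogicalExpiry {
    static class Entry {
        final String value;
        final long expireAt; // logical expiry time carried with the main key
        Entry(String value, long expireAt) { this.value = value; this.expireAt = expireAt; }
    }

    static final Map<String, Entry> cache = new HashMap<>();        // main key
    static final Map<String, Boolean> reloadLock = new HashMap<>(); // subsidiary key as lock

    static String get(String key, long now) {
        Entry e = cache.get(key);
        if (e == null) return null; // true miss: caller loads from the DB
        if (now >= e.expireAt && reloadLock.putIfAbsent(key, Boolean.TRUE) == null) {
            // Logically expired and we won the lock: reload. In real code this
            // would read the DB in the background; "fresh" is a placeholder.
            cache.put(key, new Entry("fresh", now + 60_000));
            reloadLock.remove(key);
        }
        // Everyone, including the reloader, keeps serving the old value for this
        // request, so the DB is never hit by a stampede.
        return e.value;
    }

    public static void main(String[] args) {
        cache.put("item:1", new Entry("old", 1_000));
        System.out.println(get("item:1", 500));   // prints old (not expired yet)
        System.out.println(get("item:1", 2_000)); // prints old (expired; refresh happens)
        System.out.println(get("item:1", 2_001)); // prints fresh
    }
}
```

As the answer notes, the cost is duplicated state: the main key, the subsidiary key, and their relationship all have to be kept consistent.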

Question 4:

What is the concept of multi-level cache?

A: Multi-level caching is what I mentioned in the article I sent you today: Ehcache and Redis are used together, with Redis as the second-level cache. There are consistency issues here too: if we need strong consistency, there is a time lag while the cache and the database synchronize, so in development we must analyze each scenario concretely. A second-level cache mitigates problems such as cache penetration and improves robustness: when the centralized cache has a problem, the application can keep running.

Note: The cache mentioned in this article can be understood as Redis.

2. Cache penetration and concurrency solutions

The above article introduces some common ideas about cache penetration and concurrency, but does not clarify the application scenarios of some ideas. Let's continue to explore them in depth. I believe many friends have read many similar articles before, but in the final analysis, there are two problems:

  • How to solve penetration
  • How to solve concurrency

When the concurrency is high, I actually do not recommend using the cache expiration strategy. I prefer that the cache always exists and the data in the cache system be updated through the background system to achieve data consistency. Some friends may question what to do if the cache system crashes. In this way, the database is updated but the cache is not updated, and a consistent state is not achieved.

The idea to solve the problem is:

If a cache update fails because of a network problem, it is recommended to retry a few times. If it still does not succeed, the cache system is considered faulty and unavailable. At this point the client inserts the data's KEY into a message system. The message system filters duplicates, so it only needs to guarantee that each KEY appears at most once. When the cache system recovers, the KEYs are taken out of the MQ in order, the latest data is read from the database, and the cache is updated.

Note: Before updating the cache, there is still old data in the cache, so it will not cause cache penetration.

The following figure shows the whole process of thinking:
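The retry-then-queue recovery path above can be sketched like this. A local queue and set stand in for the real message system; the set provides the duplicate-KEY filtering described above, and the DB/cache calls inside drain are only indicated in comments.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

public class RecoveryQueue {
    static final Queue<String> pending = new ArrayDeque<>(); // stand-in for the MQ
    static final Set<String> seen = new HashSet<>();         // filters duplicate KEYs

    // Called after a cache update has failed all its retries.
    static void enqueue(String key) {
        if (seen.add(key)) { // only the first occurrence of a KEY is queued
            pending.add(key);
        }
    }

    // Called once the cache system is reachable again.
    static void drain() {
        String key;
        while ((key = pending.poll()) != null) {
            seen.remove(key);
            // Real code would now read the latest row from the DB and write it
            // back to the cache, e.g. cache.set(key, db.load(key)).
        }
    }

    public static void main(String[] args) {
        enqueue("user:1");
        enqueue("user:1"); // duplicate, filtered out
        enqueue("user:2");
        System.out.println(pending.size()); // prints 2
    }
}
```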

After reading the above solution, many friends will ask: what should we do the first time the cache is used, or when the cache does not yet contain the data we need?

Ideas to solve the problem:

In this scenario, the client reads data from the cache by KEY. If the data is found, the process ends. If it is not found (several concurrent requests may miss at the same time), the cache system's setNX method is used to set a marker value; this works like acquiring a lock. Requests that fail to set the value sleep for a while, while the request that succeeds reads the database. If it obtains a value, it updates the cache and the process ends. The sleeping requests then wake up, read the data directly from the cache, and finish.

Looking at this process, there is still a loophole: what should we do if the database does not have the data we need either? Left unhandled, the request would loop forever, repeatedly querying the cache and then the database. Here we follow the earlier idea: if no data is read from the database, insert a "NULL" string into the cache, so that other requests can be answered directly based on "NULL" until the background system successfully inserts the data into the database, cleans up the NULL entry, and updates the cache.

The flow chart is as follows:
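Putting the two steps together (the setNX-style lock plus the "NULL" placeholder), the read path can be sketched as below. HashMaps stand in for Redis and the database; a real version would use Redis SETNX with an expiry, and a locked-out request would sleep and re-read the cache rather than return null immediately.

```java
import java.util.HashMap;
import java.util.Map;

public class GuardedLoad {
    static final String NULL_SENTINEL = "NULL"; // marks "row does not exist"
    static final Map<String, String> cache = new HashMap<>();  // stand-in for Redis
    static final Map<String, Boolean> locks = new HashMap<>(); // stand-in for setNX
    static final Map<String, String> db = new HashMap<>();     // stand-in for the table

    static String get(String key) {
        String v = cache.get(key);
        if (v != null) {
            return NULL_SENTINEL.equals(v) ? null : v; // "NULL" means known-missing
        }
        if (locks.putIfAbsent(key, Boolean.TRUE) != null) {
            // Another request holds the lock; here a real caller would sleep
            // briefly and then re-read the cache.
            return null;
        }
        try {
            String fromDb = db.get(key);
            // Cache the value, or "NULL" when the row does not exist, so later
            // requests stop looping between cache and DB.
            cache.put(key, fromDb == null ? NULL_SENTINEL : fromDb);
            return fromDb;
        } finally {
            locks.remove(key); // always release the lock
        }
    }

    public static void main(String[] args) {
        db.put("user:1", "alice");
        System.out.println(get("user:1"));       // prints alice (loaded from DB)
        System.out.println(get("user:9"));       // prints null (missing row)
        System.out.println(cache.get("user:9")); // prints NULL (sentinel cached)
    }
}
```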

Summary:

In actual work, we often combine the above two solutions to achieve the best effect. Although the second solution will also cause request blocking, it will only occur when it is used for the first time or there is no data in the cache temporarily. It has been tested in production and will not cause problems when the TPS is less than tens of thousands.

3. Hotspot Cache Solution

1. Cache usage background:

Let’s take a case study from the user center to illustrate:

Each user will first obtain his or her own user information, and then perform other related operations. There may be the following scenarios:

  • There will be a large number of the same users repeatedly accessing the project.
  • The same user will frequently access the same module.

2. Thought analysis

Because users are not fixed and the number of users is in the millions or even tens of millions, it is impossible for us to cache all user information. Through the first scenario, we can see some patterns, that is, there are a large number of identical users who visit repeatedly, but we don’t know which users are the repeat visitors.

If a user frequently refreshes and re-reads their information, it also puts great pressure on the database itself. Of course, we have protection mechanisms against malicious attacks, such as front-end controls or blacklists, which are not detailed here. If we use a cache, how can we keep the same user from frequently triggering reads of user information?

Please see the following figure:

We use the cache system to maintain a sorted queue. For example, for 1,000 users, the system updates each user's entry with the time of access, so the most recently active users rank highest. The system periodically filters out the last 200 users and then randomly takes 200 users from the database to join the queue. Each incoming request first looks the user up in this queue. On a hit, it reads the user's information from another cache data structure keyed by userId; on a miss, the user's request frequency is low. The Java pseudocode is as follows:

// Populate the sorted set (score = last-access time) and the per-user cache entries.
for (int i = 0; i < times; i++) {
    ExternalUser user = new ExternalUser();
    user.setId(i + "");
    user.setUpdateTime(new Date(System.currentTimeMillis()));
    CacheUtil.zadd(sortKey, user.getUpdateTime().getTime(), user.getId());
    CacheUtil.putAndThrowError(userKey + user.getId(), JSON.toJSONString(user));
}

// Read back the whole queue.
Set<String> userSet = CacheUtil.zrange(sortKey, 0, -1);
System.out.println("[sortedSet] - " + JSON.toJSONString(userSet));
if (userSet == null || userSet.size() == 0)
    return;

// Dump members together with their scores (access timestamps).
Set<Tuple> userSetS = CacheUtil.zrangeWithScores(sortKey, 0, -1);
StringBuffer sb = new StringBuffer();
for (Tuple t : userSetS) {
    sb.append("{member: ").append(t.getElement())
      .append(", score: ").append(t.getScore()).append("}, ");
}
System.out.println("[sortedcollect] - " + sb.toString().substring(0, sb.length() - 2));

// Fetch each user's cached details by userId.
Set<String> members = new HashSet<String>();
for (String uid : userSet) {
    String key = userKey + uid;
    members.add(uid);
    ExternalUser user2 = CacheUtil.getObject(key, ExternalUser.class);
    System.out.println("[user] - " + JSON.toJSONString(user2));
}
System.out.println("[user] - " + System.currentTimeMillis());

// Remove the collected members from the sorted set.
String[] keys = new String[members.size()];
members.toArray(keys);
Long rem = CacheUtil.zrem(sortKey, keys);
System.out.println("[rem] - " + rem);
userSet = CacheUtil.zrange(sortKey, 0, -1);
System.out.println("[remove - sortedSet] - " + JSON.toJSONString(userSet));
