WeChat data structure


A few days ago, @Fenng of "Gossip" posted three interview questions for product managers, all of them about WeChat.

The second question is a technical one. It is a little unexpected in a product manager interview, but it is a good question about product design and system analysis. System analysis is also part of our "Product Managers Learn Technology" series, and part of what we mean by "sublimating" technology.

We try to answer the question below, mainly to start a discussion. If you have a good answer, leave us a message.

What is the basic data structure design of Moments? How can comprehensive read permissions be supported while taking performance into account?

We will not discuss the basic message data, such as text, pictures, time, and location. This data has little to do with permissions or performance and can be treated as stored separately; handling it is purely technical work. Here we only discuss the data structures related to permissions and performance.

For permission management, WeChat uses "labels" to group users, and these label groups are the same ones used in the WeChat address book. In data terms, a "label" is attached to each relationship. Note that although a WeChat relationship looks two-way to users of the product (that is, mutual following), it is stored as separate relationship records for the two users involved: each person has their own "address book". You can see this from the fact that deleting a friend from your own address book does not remove you from theirs. This per-user, labeled address book is the basic data behind label grouping, and it is the foundation for the permission management of Moments described below.

For the messages that appear in a personal Moments timeline, the straightforward logic is: fetch all friends' messages, remove the ones you are not authorized to read, remove the ones from users you have blocked, and what remains is the timeline you see. With this logic, every refresh of Moments has to search the whole message pool for messages from the friends in your address book and then check, for each message found, whether you have permission to read it. This is clearly inefficient, all the more so given WeChat's traffic and data volume. So this data structure design is not feasible.
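The per-user address book described above can be sketched as follows. This is only an illustration under assumptions: the class name `AddressBook`, the `books` store, and the helper `make_friends` are hypothetical and are not taken from WeChat.

```python
from collections import defaultdict


class AddressBook:
    """One user's own contact list; every contact carries a set of labels."""

    def __init__(self):
        # friend_id -> set of label names chosen by the owner of this book
        self.contacts = defaultdict(set)

    def add_friend(self, friend_id, labels=()):
        # Creates the contact if needed and attaches any labels given.
        self.contacts[friend_id].update(labels)

    def remove_friend(self, friend_id):
        # Only this book changes; the other person's book is untouched.
        self.contacts.pop(friend_id, None)

    def members_of(self, label):
        return {f for f, labels in self.contacts.items() if label in labels}


# Hypothetical store: one address book per user, so one friendship = two records.
books = defaultdict(AddressBook)


def make_friends(a, b):
    books[a].add_friend(b)
    books[b].add_friend(a)


make_friends("alice", "bob")
books["alice"].add_friend("bob", labels={"colleagues"})
colleagues = books["alice"].members_of("colleagues")

books["alice"].remove_friend("bob")               # Alice deletes Bob ...
still_listed = "alice" in books["bob"].contacts   # ... but Bob still lists Alice
```

Storing the relationship once per side is what makes labels private to their owner and lets one side delete the other independently.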
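A minimal sketch of this read-time filtering, to make the cost visible. The `Message` fields and the function name `naive_timeline` are assumptions made for illustration; the point is only that every refresh walks the whole message pool and checks permissions per message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    msg_id: int
    author: str
    posted_at: int                        # e.g. seconds since epoch
    allowed: Optional[frozenset] = None   # None means "all friends may read"


def naive_timeline(reader, friends, blocked, message_pool):
    """Build the timeline at read time by scanning every message."""
    visible = []
    for msg in message_pool:                  # cost grows with the whole pool
        if msg.author not in friends:
            continue                          # not one of the reader's friends
        if msg.allowed is not None and reader not in msg.allowed:
            continue                          # reader lacks read permission
        if msg.author in blocked:
            continue                          # reader has blocked the author
        visible.append(msg)
    return sorted(visible, key=lambda m: m.posted_at, reverse=True)


pool = [
    Message(1, "bob", 100),
    Message(2, "carol", 200, frozenset({"dave"})),  # restricted to dave
    Message(3, "erin", 300),                        # erin is blocked by alice
    Message(4, "frank", 400),                       # frank is not a friend
    Message(5, "bob", 500),
]
result = naive_timeline("alice", {"bob", "carol", "erin"}, {"erin"}, pool)
visible_ids = [m.msg_id for m in result]
```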

[Figure: the read process on every refresh of Moments under the straightforward logic]

The general way to solve this performance problem is to spread the expensive computation out over ordinary time instead of doing it all at read time. The idea here is to prepare each user's timeline data in advance according to the permission settings, and then simply read the prepared content when it is needed (when the user refreshes Moments). So the answer is: besides storing one copy of the basic content mentioned above (text, pictures, and so on), the system also stores a copy of the timeline data for each user. Note that this is one copy per user. These per-user copies do not need to hold the complete message, only the message ID and, possibly, the time. When someone refreshes Moments, the system only has to read that person's own copy, without filtering the message pool or checking permissions.
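As a rough sketch of the storage layout just described, the message body is kept once and each user's timeline holds only (time, message ID) pairs. The names `messages`, `timelines`, `add_entry`, and `refresh` are hypothetical, and in-memory dictionaries stand in for whatever storage a real system would use.

```python
import bisect
from collections import defaultdict

# Message content is stored once, keyed by ID.
messages = {}
# One prepared timeline per user: a list of (posted_at, msg_id), oldest first.
timelines = defaultdict(list)


def add_entry(user_id, posted_at, msg_id):
    # Keep each timeline sorted by time so reads need no sorting.
    bisect.insort(timelines[user_id], (posted_at, msg_id))


def refresh(user_id, limit=20):
    """Read the user's own prepared copy; no pool scan, no permission check."""
    entries = timelines.get(user_id, [])[-limit:][::-1]   # newest first
    return [messages[msg_id] for _, msg_id in entries]


messages[1] = "bob: lunch photo"
messages[2] = "carol: sunset"
add_entry("alice", 200, 2)
add_entry("alice", 100, 1)
add_entry("dave", 200, 2)

alice_view = refresh("alice")
dave_view = refresh("dave")
```

The read path touches only one user's list, so its cost depends on how many entries are shown, not on the size of the message pool.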

How do you implement permission control?

When a user publishes a message, the read permissions are set using the labels described above, and the server writes the message into the timeline of every user who is permitted to receive it. In other words, permissions are resolved at the moment of publishing rather than at read time. This reduces the computation needed on each read and improves efficiency.
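The publish-time step can be sketched like this. Everything here is an assumption for illustration: the `address_books` layout, the `publish` function and its `visible_to_labels` parameter, and the choice to also write the entry into the author's own timeline.

```python
from collections import defaultdict
from itertools import count

address_books = defaultdict(dict)   # owner -> {friend_id: set of labels}
messages = {}                       # msg_id -> (author, content), stored once
timelines = defaultdict(list)       # user_id -> [(posted_at, msg_id), ...]
_next_id = count(1)


def publish(author, content, posted_at, visible_to_labels=None):
    """Resolve the audience now and write one small entry per permitted reader."""
    book = address_books[author]
    if visible_to_labels is None:
        audience = set(book)                       # all of the author's friends
    else:
        wanted = set(visible_to_labels)
        audience = {f for f, labels in book.items() if labels & wanted}
    msg_id = next(_next_id)
    messages[msg_id] = (author, content)
    # Assumes posts arrive in time order, so appending keeps timelines sorted.
    for user in audience | {author}:
        timelines[user].append((posted_at, msg_id))
    return msg_id


address_books["alice"] = {"bob": {"colleagues"}, "carol": {"family"}}
m = publish("alice", "team dinner", 100, visible_to_labels=["colleagues"])

# A label change after publishing does not touch timelines already written.
address_books["alice"]["carol"].add("colleagues")
carol_entries = timelines.get("carol", [])
```

In this sketch the permission check happens exactly once per message, and its result is frozen into the per-user timelines.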

[Figure: permission control at publishing time (schematic; the real process is more complicated)]

I won't go into database and table sharding here; just be aware that it exists. Technical designs of this kind sometimes also constrain the design of the product.

So how do we prove that what is said above is reasonable?

Interested readers can test it: first post a message with restricted read permission, for example visible only to people under a certain label. Then add a new person to that label. The new person cannot see the message, because permissions are assigned when the message is posted. The new person joined the label after the message was posted, so they missed the moment when the message's permissions were allocated; even though they are now in the label group, they still cannot see it.

That is the answer to the question above. The main point is really whether the limitations of the technical solution are taken into account during product design. I posted this answer on Zhihu, and someone asked: did the WeChat product team consider this issue when designing the feature, or did it become what it is now through continuous iteration? This is a good question. A good product manager should consider this kind of situation at design time, or at least have a plan for it, so as not to be caught helpless when problems arise or when R&D pushes back. In this case it does not matter whether WeChat considered it from the start or arrived at it by iteration, because Moments has been an iterative product all along. Its earliest permission management was separate from the address book and worked as a pure plug-in; now it shares the address book's grouping for permission management.

If the impact of the above technical design on product design is still unclear, here are two more questions (a good product manager should be able to ask questions as well as answer them ^_^):

1. Why can messages in Moments only be deleted, not edited?

My understanding is that this is a trade-off between product design and technical implementation. For Moments, which is mainly used to post photos and short, in-the-moment updates, editing is not a must-have. Under the design described above, however, editing is technically expensive. As explained earlier, permissions are resolved at publishing time. With an editing function, whenever a user changed the read permission during an edit, the entries previously written to the timelines of the permitted users would have to be deleted and rewritten. That is a large implementation cost, because the timeline data of every user involved in the message must be updated. The resulting trade-off is to let users delete and republish rather than provide editing. You may ask: doesn't deleting also require updating the timelines of the people involved? First, deleting is much simpler than writing; second, the entries in users' timelines may not actually need to be deleted. I won't explain the specific reasons here. If you want to know, leave us a message and we will explain separately.

2. Do the publish-time permission allocation rules described above take blocking into account? In other words, if user A has blocked user B's Moments, will a message posted by B still be written into the prepared data of A's timeline (the stored data, not what A sees in WeChat)?
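To make the cost of a permission edit concrete, here is a small sketch under the same assumed layout of per-user timelines holding (time, message ID) entries. The function names `audience_diff` and `apply_permission_edit` are hypothetical; the sketch only counts the timeline writes that a single edit would trigger.

```python
def audience_diff(old_audience, new_audience):
    """Who loses the message and who gains it when permissions change."""
    to_remove = old_audience - new_audience
    to_add = new_audience - old_audience
    return to_remove, to_add


def apply_permission_edit(timelines, entry, old_audience, new_audience):
    """Rewrite every affected timeline; returns the number of timelines touched."""
    to_remove, to_add = audience_diff(old_audience, new_audience)
    for user in to_remove:
        timelines[user].remove(entry)          # delete the old entry
    for user in to_add:
        timeline = timelines.setdefault(user, [])
        timeline.append(entry)                 # insert into the past ...
        timeline.sort()                        # ... and restore time order
    return len(to_remove) + len(to_add)


timelines = {
    "bob": [(100, 1)],
    "carol": [(100, 1)],
    "dave": [(150, 2)],
}
touched = apply_permission_edit(
    timelines, (100, 1),
    old_audience={"bob", "carol"},
    new_audience={"carol", "dave"},
)
```

With two or three friends this is trivial, but the number of timelines touched grows with the size of the audience, and each one is a write rather than a read.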

My answer first: publish-time permission control does not take blocking into account. As explained earlier, when a message is posted, the server writes it into the timelines of the people permitted to read it, according to the permissions the author set. If blocking also had to be considered at that point, the server would have to read the block list of every permitted reader and decide, list by list, whether to write the message into that person's timeline. That would clearly increase the amount of computation. So how is blocking implemented? There are two options. In the first, when the user refreshes Moments, the server filters the blocked people's messages out of the timeline data it has read (which still contains them). In the second, the server sends the timeline data to the client unchanged, and the client filters it against a locally stored block list and does not display the blocked messages. The two are almost equally efficient. Based on my experience of using WeChat, I tend to think the filtering is done by the client. There are ways to verify this, but I won't do so here. This approach to blocking also follows the earlier principle of spreading expensive computation out over ordinary time.

So how do we verify that this logic about blocking is correct? In the earlier test of publish-time permission allocation, people added to a label group later could not see messages previously published to that group. Blocking can be checked in a similar way: after you block a user, none of that user's messages are visible, but as soon as you unblock them, their messages appear in Moments again, including ones that were published earlier but that you had not read.

A final note: I am an outside observer of WeChat's design. The answer above is a system analysis made from a user's perspective; it does not mean that WeChat is actually designed this way, although the solution has been verified as far as possible. The answer does not involve any specific technology; it is only a line of system analysis.

I am glad to see that more and more product manager hiring is starting to emphasize technical ability. Recently, many technology-related questions have appeared in the product manager recruitment tests of major Internet companies, which suggests the industry has begun to recognize how technical ability supports product design. Once again, technical knowledge is not required for product design, but having it greatly improves efficiency.
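The blocking behavior discussed above can be sketched as a filter applied when the timeline is read. The same function could run on the server or on the client; which side WeChat uses is not known here. The names `visible_entries`, `authors`, and `blocked` are assumptions for illustration.

```python
def visible_entries(timeline, authors, blocked):
    """Hide entries whose author is blocked; the stored timeline is not modified."""
    return [(t, m) for t, m in timeline if authors[m] not in blocked]


# Stored timeline for one reader: blocked authors' entries are still in it.
timeline = [(100, 1), (200, 2), (300, 3)]
authors = {1: "bob", 2: "erin", 3: "bob"}   # msg_id -> author

blocked = {"erin"}
while_blocked = visible_entries(timeline, authors, blocked)

blocked.discard("erin")                      # unblock erin
after_unblock = visible_entries(timeline, authors, blocked)
```

Because the entries were never removed from the stored timeline, unblocking needs no backfill: the older messages reappear on the next read.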
