How to perform fuzzy query on an encrypted mobile phone number?

How to perform fuzzy query on an encrypted mobile phone number?

Preface

A few days ago, a friend from Knowledge Planet asked me a question: How to fuzzily query an encrypted mobile phone number?

We all know that when designing a system, considering the security of the system, it is necessary to encrypt some of the user's personal privacy information, such as login password, ID number, bank card number, mobile phone number, etc., to prevent the user's personal information from being leaked.

A long time ago, CSDN suffered a SQL injection, which resulted in the leakage of more than 6 million pieces of user information stored in plain text.

Therefore, when we design the system, we must consider encrypting and saving the user's privacy information.

Common symmetric encryption algorithms include AES, SM4, ChaCha20, 3DES, DES, Blowfish, IDEA, RC5, RC6, Camellia, etc.

Currently, the mainstream international symmetric encryption algorithm is AES, while the main one promoted in China is SM4.

No matter which algorithm is used, there is a big difference between the string before encryption and the string after encryption.

For example, the string before encryption: Su San said technology, using the key: 123, the encrypted string is: U2FsdGVkX1+q7g9npbydGL1HXzaZZ6uYYtXyug83jHA=.

How to perform fuzzy query on the encrypted string?

For example: suppose the keyword Su San is queried, the encrypted string is: U2FsdGVkX19eCv+xt2WkQb5auYo0ckyw.

The two encrypted strings generated above seem to be quite different, and there is no way to directly perform fuzzy query using the like keyword in the SQL statement.

So how do we implement the fuzzy query function of encrypted mobile phone numbers?

1 Load into memory once

To implement this function, the first method we may think of is: load the personal privacy data into the memory at one time and cache it, then decrypt it in the memory first, and then implement the fuzzy search function in the code.

picture

The advantage of doing this is that it is relatively simple to implement and the cost is very low.

But the problem is: if there is a lot of personal privacy data, the application server's memory may not be sufficient and OOM problems may occur.

There is another problem: data consistency.

If the user changes his/her mobile phone number and the database is updated successfully, the cache in the memory needs to be updated synchronously, otherwise the result of the user's query may be inconsistent with the actual situation.

For example: the database update was successful, but the cache update in memory failed.

Or your application is deployed on multiple server nodes, and some memory caches are updated successfully, while others are restarting, causing the update to fail.

This solution may not only cause OOM problems in the application server, but also greatly increase the complexity of the system. Overall, it is not worth the cost.

2 Using database functions

Since the database stores encrypted strings, another solution is to use the database function to decrypt them.

We can use MySQL's DES_ENCRYPT function to encrypt and DES_DECRYPT function to decrypt:

 SELECT DES_DECRYPT('U2FsdGVkX1+q7g9npbydGL1HXzaZZ6uYYtXyug83jHA=', '123');

The encryption and decryption of all user privacy information in the application system are implemented at the MySQL layer, and there is no inconsistency in encryption and decryption.

When saving data in this solution, only the data of a single user is operated, the data volume is relatively small, and the performance is good.

However, when fuzzy querying data, it is necessary to use the DES_DECRYPT function each time to decrypt all the data in a certain privacy information field of the user in the database, and then perform fuzzy query using the decrypted data.

If the amount of data in this field is very large, the performance of each query will be very poor.

3. Save in segments

We can split a complete string into multiple small strings.

Take the mobile phone number as an example: 18200256007. It is split into groups of 3 digits. The split strings are: 182, 820, 200, 002, 025, 256, 560, 600, 007, these are 9 groups of data.

Then create a table:

 CREATE TABLE `encrypt_value_mapping` ( `id` bigint NOT NULL COMMENT '系统编号', `ref_id` bigint NOT NULL COMMENT '关联系统编号', `encrypt_value` varchar(255) NOT NULL COMMENT '加密后的字符串' ) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='分段加密映射表'

This table has three fields:

  • id: system number.
  • ref_id: The system ID of the main business table, such as the system ID of the user table.
  • encrypt_value: the encrypted string after splitting.

When the user writes the mobile phone number, the split mobile phone number group data is also written together, which can ensure the consistency of data in the same transaction.

If you want to fuzzy query a mobile phone number, you can directly use the encrypt_value of encrypt_value_mapping to fuzzily query the ref_id of the user table, and then query the user information through ref_id.

The specific sql is as follows:

 select s2.id,s2.name,s2.phone from encrypt_value_mapping s1 inner join `user` s2 on s1.ref_id=s2.id where s1.encrypt_value = 'U2FsdGVkX19Se8cEpSLVGTkLw/yiNhcB' limit 0,20;

In this way, we can easily search for the mobile phone number we want through fuzzy query.

Note that the encrypt_value here uses an equal sign. Since it is an equal value query, the efficiency is relatively high.

Note: The mobile phone number queried through the SQL statement here is encrypted. Before the interface is returned to the front end, it needs to be decrypted uniformly in the code.

For security reasons, you can also use asterisks to add some interference items to the encrypted plain text password to prevent the mobile phone number from being leaked. The final content displayed to the user can be displayed as follows: 182***07.

4 Other fuzzy queries

What should I do if, in addition to the user's mobile phone number, there are other user privacy fields that require fuzzy query?

We can expand the encrypt_value_mapping table and add a type field.

This field indicates the type of data, such as: 1. Mobile phone number 2. ID card 3. Bank card number, etc.

In this way, if there is a business scenario where the ID card and bank card number modules are queried, we can distinguish them by the type field, or we can use this solution to write the data to the encrypt_value_mapping table, and finally query different grouped data according to different types.

If the amount of data in the business table is small, this solution can meet the needs.

However, if the amount of data in the business table is large, one mobile phone number needs to save 9 pieces of data, and one ID card or bank card number also needs to save many pieces of data, which will cause a sharp increase in the data in the encrypt_value_mapping table and may make this table very large.

The final consequence is that query performance is greatly affected.

So, what should we do in this situation?

5. Add fuzzy query field

If the amount of data is large, grouping all user privacy information fields into one table will greatly affect query performance.

So, how to optimize it?

A: We can add fuzzy query fields.

Let’s take mobile phone fuzzy query as an example.

We can add an encrypt_phone field next to the mobile phone number in the user table.

 CREATE TABLE `user` ( `id` int NOT NULL, `code` varchar(20) NOT NULL, `age` int NOT NULL DEFAULT '0', `name` varchar(30) NOT NULL, `height` int NOT NULL DEFAULT '0', `address` varchar(30) DEFAULT NULL, `phone` varchar(11) DEFAULT NULL, `encrypt_phone` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='用户表'

Then when we save the data, we concatenate the grouped data.

Let’s take the mobile phone number as an example:

18200256007 is split into groups of 3 digits. The split strings are: 182, 820, 200, 002, 025, 256, 560, 600, 007, these are 9 groups of data.

After grouping and encryption, the data is separated by commas and concatenated into the following data:,U2FsdGVkX19Se8cEpSLVGTkLw/yiNhcB,U2FsdGVkX1+qysCDyVMm/aYXMRpCEmBD,U2FsdGVkX19oXuv8m4ZAjz+AGhfXlsQk,U2FsdGVkX19VFs60R26BLFzv5nDZX40U,U2F sdGVkX19XPO0by9pVw4GKnGI3Z5Zs,U2FsdGVkX1/FIIaYpHlIlrngIYEnuwlM,U2FsdGVkX19s6WTtqn gdAM9sgo5xKvld,U2FsdGVkX19PmLyjtuOpsMYKe2pmf+XW,U2FsdGVkX1+cJ/qussMgdPQq3WGdp16Q.

In the future, you can directly use SQL to fuzzy query the encrypt_phone field:

 select id,name,phone from user where encrypt_phone like '%U2FsdGVkX19Se8cEpSLVGTkLw/yiNhcB%' limit 0,20;

Note that the encrypt_value here uses like.

Why do we use commas to separate them here?

Answer: This is to prevent direct string concatenation. In extreme cases, the data of two groups do not originally meet the fuzzy search conditions, but when they are concatenated together, some of the conditions are met.

Of course, you can also change the comma to other special characters according to the actual situation.

In addition, similar solutions can be used for other user privacy fields if fuzzy query functions are to be implemented.

Finally, although this article introduces a variety of solutions for encrypting mobile phone numbers to implement fuzzy query functions, we must choose according to the actual business scenario. There is no best solution, only the most suitable one.

<<:  Why does the Android system use Binder as the IPC mechanism?

>>:  Apple releases dual system update!

Recommend

Why do airplane meals taste so bad? It may not be the food that’s to blame

When you are on an airplane, you may have had thi...

Android 9 Pie gesture operation test: Apple iPhone X is still better

I enabled Android's gesture navigation as a b...

Catch single atoms in a "trap" to measure the age of ice cores

The scientific expedition team drilled ice cores ...

Upgraded XO Orgasm Training [Kama Sutra Classroom] - Li Ximo

We have helped more than 300,000 women solve and ...

How many of these 10 tips for promoting mini programs do you know?

If you care about mini programs , then you should...

App promotion plan: free channels and paid channels!

In the initial stage of a startup team, how shoul...

Snowball Irish Thrush Stock Trading Training 2021 Course Video

Snowball Irish Thrush Stock Trading Training 2021...

Master Fang Wenshan shares his lyrics writing course

Master Fang Wenshan responded that he is the best...

“Meituan Takeout” product analysis report!

Take-out has become a must-have for urbanites, sa...