Technical Analysis: How can Apple and Google’s “Health Code” track the epidemic while protecting privacy?

Apple co-founder Steve Jobs once felt betrayed by Google and fell out with its former chairman, Eric Schmidt.

Now that all of humanity is facing the threat of the new coronavirus, the two once-hostile companies have joined forces to develop a contact tracing technology together.

Because it can serve the functions of epidemiological tracing and infection risk alert to a certain extent, this technology is also nicknamed the American version of the "health code."

However, the Health Code and Contact Tracing are implemented very differently. The former relies more on big data and carries a high privacy risk, while the latter is based mainly on Bluetooth and is designed with privacy protection in mind.

Today, Silicon Star will show you how Contact Tracing works.

“Three codes in one”, anonymous tracking

Contact Tracing works roughly as follows:

Devices use low-power Bluetooth beacons to act as senders and receivers, exchanging and saving each other's information.

When a person A is diagnosed, the information of A's device is uploaded to a confirmed database in the cloud. All users will download the information from this cloud database every day and check it locally. If B has received the same information before, it can be considered that B is a contact of A.
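
The basic flow just described can be sketched in a few lines. This is only a toy model of the idea before any privacy mechanism is added: the `Phone` class, its method names and the placeholder identifiers are all hypothetical and are not part of the Apple and Google API.

```python
from dataclasses import dataclass, field


@dataclass
class Phone:
    """Toy model of one device (hypothetical, for illustration only)."""
    name: str
    sent: set = field(default_factory=set)    # identifiers this phone broadcast
    heard: set = field(default_factory=set)   # identifiers received from others

    def broadcast_to(self, other: "Phone", identifier: bytes) -> None:
        # Stand-in for a Bluetooth beacon: the sender remembers what it sent,
        # the receiver saves what it heard.
        self.sent.add(identifier)
        other.heard.add(identifier)

    def is_contact(self, confirmed_db: set) -> bool:
        # The check runs locally against the downloaded database.
        return bool(self.heard & confirmed_db)


a, b, c = Phone("A"), Phone("B"), Phone("C")
a.broadcast_to(b, b"identifier-from-A")   # A and B were near each other
c.broadcast_to(b, b"identifier-from-C")   # C never heard A

# A is diagnosed: A's information goes into the confirmed database.
confirmed_db = set(a.sent)

print(b.is_contact(confirmed_db), c.is_contact(confirmed_db))
```

In this sketch B is flagged and C is not, because only B's phone has a saved record that also appears in the confirmed database.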

Of course, if the unique identification information of the device is used directly, there will be greater privacy risks. Therefore, for the purpose of privacy protection, Apple and Google have painstakingly designed a "three-in-one" mechanism.

To make it easier for more readers to understand, we simply refer to these three codes as A, B, and C codes respectively.

First, each user's mobile phone will generate a fixed A code, which will not be uploaded;

From the A code, the phone derives a new B code every day, which is not uploaded under normal circumstances;

Since the goal is contact tracing, the phone has to broadcast to the outside world at regular intervals (the API recommends 15 minutes). What it broadcasts is the C code, which is generated from the B code and is replaced every 15 minutes, that is, every broadcast cycle. Under normal circumstances, this C code is the only one of the three that leaves the phone.

Since this Contact Tracing is implemented through an API, both iOS and Android devices can broadcast and receive from each other.

Through this design, Apple and Google hope to maximize the guarantee of:

  • The information exchanged is itself anonymized;
  • Even with these privacy protections in place, the information can still be matched and contacts identified when necessary.

A slightly more detailed explanation:

  • Code A is called the Tracing Key. It is generated when Contact Tracing is first started on the phone. It is 32 bytes long, unique to the device, and never changes. Although its name contains "Tracing", Code A is never uploaded. It is stored only on the phone and does not identify anyone. It serves only as the input for deriving Code B, from which Code C is derived in turn.

The A code is a random number generated using a cryptographic random number generator (CRNG).

The first time you use contact tracing software that a government, company or institution has built on the Apple and Google solution, your iPhone or Android phone automatically generates such a random, unique A code.

This code has nothing to do with your phone’s known identification codes, such as serial number or MAC address, and will not be uploaded, so there is almost no privacy risk.
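
As a minimal sketch of what "a random number from a cryptographic random number generator" means in practice, Python's standard `secrets` module can produce 32 such bytes. The function name `generate_tracing_key` is made up for illustration; the real key is generated inside the operating system, not by an app.

```python
import secrets

TRACING_KEY_BYTES = 32  # length of the A code given in the article


def generate_tracing_key() -> bytes:
    """Return 32 random bytes from the OS cryptographic random source."""
    return secrets.token_bytes(TRACING_KEY_BYTES)


a_code = generate_tracing_key()
print(len(a_code), a_code.hex())
```

Nothing about the device (serial number, MAC address) goes into the value, which is why the key by itself reveals nothing about the phone.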

The B code is called the Daily Tracing Key, which is derived from the A code using the HKDF function and is 16 bytes long. It is updated every 24 hours.
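
The derivation can be sketched with the standard library alone. The `hkdf_sha256` helper below follows the generic HKDF construction from RFC 5869; the way the day number is packed into the `info` field (the label `CT-DTK` plus a 32-bit little-endian day counter) is an assumption about the draft specification and should be checked against the published document.

```python
import hashlib
import hmac
import secrets
import struct

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand to `length` bytes."""
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def daily_tracing_key(tracing_key: bytes, day_number: int) -> bytes:
    """Derive the 16-byte B code for one day from the 32-byte A code."""
    # Assumed layout of `info`: ASCII label followed by the day number
    # (days since the Unix epoch) as a 32-bit little-endian integer.
    info = b"CT-DTK" + struct.pack("<I", day_number)
    return hkdf_sha256(tracing_key, b"", info, 16)


a_code = secrets.token_bytes(32)
b_today = daily_tracing_key(a_code, 18370)      # example day numbers
b_tomorrow = daily_tracing_key(a_code, 18371)
print(b_today.hex())
print(b_tomorrow.hex())
```

Because HKDF is one-way, someone who learns the B code for one day cannot work back to the A code or forward to the B codes of other days.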

Under normal circumstances the B code is not uploaded. Its main job is to serve as the input for generating the C code, and it only comes into play when a diagnosis is confirmed. If the user stays healthy and has no exposure risk, the B code simply remains on the phone and is never uploaded.

Suppose a user is diagnosed after having been in contact with other people over the past few days. The B codes from that user's phone for those days are then used as "Diagnosis Keys" and extracted for matching. This is the only time the B code is extracted. In every other case, the B code is never uploaded by default (automatically).

What about the C code that gets broadcast?

The C code is called the Rolling Proximity Identifier. It is a keyed-hash message authentication code (HMAC) computed from the B code. It is 16 bytes long, and a new value is broadcast to all surrounding devices roughly every 15 minutes via low-power Bluetooth. Phones with Contact Tracing installed can receive and save this code.

To protect privacy and avoid tracking, the Bluetooth MAC address of a device can change randomly starting from Bluetooth 4.2. In line with this, Contact Tracing will generate a new C code every time the Bluetooth MAC address changes.
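
A rough sketch of how a day's worth of C codes could be produced from one B code follows. The message layout (the label `CT-RPI` plus a one-byte interval counter) and the truncation of the HMAC-SHA256 output to 16 bytes are assumptions about the draft specification; the 96 intervals per day simply follow from the 15-minute cycle used in this article.

```python
import hashlib
import hmac

INTERVALS_PER_DAY = 24 * 60 // 15  # 96, assuming a 15-minute cycle


def rolling_proximity_identifier(daily_key: bytes, interval_number: int) -> bytes:
    """Derive the 16-byte C code for one interval of the day."""
    # Assumed message layout: ASCII label followed by the interval number
    # within the day as a single byte.
    message = b"CT-RPI" + bytes([interval_number])
    return hmac.new(daily_key, message, hashlib.sha256).digest()[:16]


b_code = bytes(range(16))  # placeholder B code, for demonstration only
c_codes = [rolling_proximity_identifier(b_code, n) for n in range(INTERVALS_PER_DAY)]

print(len(c_codes), "identifiers for one day")
print(c_codes[0].hex())
print(c_codes[1].hex())
```

Consecutive C codes look unrelated to anyone who does not hold the B code, so a passive listener cannot tell that two broadcasts came from the same phone.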

In addition to broadcasting, the phone also keeps the C codes it has received over the recent period, so that they can be checked when contacts are traced.

How is contact tracing done?

If you understood the previous section, this one will be easier to understand.

Suppose A is diagnosed and the tracing window is 14 days. The B codes from A's phone for the past 14 days are then uploaded to the cloud as "diagnosis keys".

Every user's phone downloads the diagnosis keys of all confirmed patients from the server once a day, then runs them locally through the same algorithm that was used to derive the C codes.

Suppose A and B spent some time in the same room during the past 14 days, so B's phone should have saved C codes from A. If a C code that B's phone computes appears among the records it has saved over the past 14 days, the contact is traced and verified.
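
The local check can be sketched as follows. The derivation function uses the same assumed layout as a plausible reading of the draft specification (label `CT-RPI`, one-byte interval counter, HMAC-SHA256 truncated to 16 bytes), and `find_exposures` and the variable names are hypothetical.

```python
import hashlib
import hmac
import secrets

INTERVALS_PER_DAY = 96  # assuming the 15-minute cycle used in this article


def rolling_proximity_identifier(daily_key: bytes, interval_number: int) -> bytes:
    # Assumed derivation of a C code from a B code.
    message = b"CT-RPI" + bytes([interval_number])
    return hmac.new(daily_key, message, hashlib.sha256).digest()[:16]


def find_exposures(diagnosis_keys, observed):
    """Re-derive every C code from the downloaded keys and look for saved ones."""
    matches = []
    for key in diagnosis_keys:
        for n in range(INTERVALS_PER_DAY):
            if rolling_proximity_identifier(key, n) in observed:
                matches.append((key, n))
    return matches


# The diagnosed patient uploads 14 days of B codes as diagnosis keys.
patient_keys = [secrets.token_bytes(16) for _ in range(14)]

# One phone heard the patient on day 3, interval 40, plus an unrelated device.
contact_observed = {
    rolling_proximity_identifier(patient_keys[3], 40),
    secrets.token_bytes(16),
}
# Another phone only ever heard unrelated devices.
stranger_observed = {secrets.token_bytes(16)}

contact_matches = find_exposures(patient_keys, contact_observed)
stranger_matches = find_exposures(patient_keys, stranger_observed)
print(len(contact_matches), len(stranger_matches))
```

The server only hands out the diagnosis keys; which phones find a match is known only to those phones.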

The Contact Tracing Bluetooth working-principle document released by Apple and Google includes a diagram illustrating this process.

Advantages and Disadvantages

In general, the most direct advantage of this technical implementation method is that it can maximize the protection of user privacy while meeting the functional requirements.

In the white paper on Contact Tracing encryption and Bluetooth working principles, the two companies pointed out that the generation cycle of the first two codes is fixed, which can prevent other applications from obtaining and using them for irrelevant tracking purposes.

The uploaded information does not include geolocation information and is strictly limited to the use of low-power Bluetooth beacons.

The C code is bound to the B code. Without the B code, it is meaningless to obtain the C code.

For example, if a government uses this API to develop a tracking tool, the administrator on the server side cannot know which other users a healthy user has contacted - the administrator can only do this for confirmed patients.

Even when there is a confirmed patient, the matching computation on the diagnosis keys that patient uploads runs only on other users' phones, not on the server.

Because the Bluetooth broadcast and scan intervals are fixed and each C code is only 16 bytes long, this approach is also fairly economical with power and storage. Its overall power draw should be modest. The exact broadcast and scan intervals can be set by the operator of the tracing tool, and Apple and Google recommend that operators take power consumption into account.
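
A back-of-the-envelope calculation gives a feel for the sizes involved. The 16-byte lengths, the 15-minute cycle and the 14-day window come from this article; the figure of 200 identifiers heard per day is a purely hypothetical number chosen for illustration.

```python
CODE_BYTES = 16          # size of one B code or one C code
TRACE_DAYS = 14          # tracing window used in the article's example
CYCLE_MINUTES = 15       # broadcast cycle described in the article
HEARD_PER_DAY = 200      # hypothetical: C codes a phone receives per day

intervals_per_day = 24 * 60 // CYCLE_MINUTES                  # C codes per phone per day
upload_bytes_per_patient = TRACE_DAYS * CODE_BYTES            # diagnosis keys uploaded
derived_codes_per_patient = TRACE_DAYS * intervals_per_day    # C codes to re-derive
stored_bytes = HEARD_PER_DAY * TRACE_DAYS * CODE_BYTES        # saved received C codes

print(intervals_per_day, upload_bytes_per_patient,
      derived_codes_per_patient, stored_bytes)
```

Under these assumptions, one patient's upload is 224 bytes, each phone re-derives 1,344 C codes per patient, and two weeks of received codes occupy about 45 KB.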

In the white paper, Apple and Google also warned operators of tracking tools not to extract other irrelevant metadata from users' phones.

This technical implementation also has some disadvantages.

The most direct disadvantage comes from the working principle of Bluetooth itself.

The maximum theoretical distance of low-power Bluetooth can reach 100 meters, and it also has a certain ability to penetrate walls (there is often no problem with a concrete wall, and the barrier effect of metal is stronger).

This means that if you merely "brushed past" a stranger a few dozen meters away in the well-ventilated open air, or were clearly in a different room from that person, there is a good chance you would "unfortunately" be flagged as a contact once that person is diagnosed, as long as both phones had Contact Tracing installed and Bluetooth turned on.

In addition, Contact Tracing has an obvious gap: even when a contact is found and notified, this mechanism alone cannot locate that person. Only the contact knows that he or she has become a contact. Whether that person then cooperates with home isolation ultimately depends on personal will, and the authorities have little power to enforce it.

Features such as locating and following up with contacts would have to be built by app developers themselves, but doing so would to some extent go against the privacy-protecting intent of Contact Tracing's design.

In response to this, the instructions provided by Apple and Google state that what to do next after the contact receives the notification requires further instructions from the health department's website or app.

There is also room for mischief: someone could deliberately generate and broadcast large numbers of forged C codes. Apart from taking up storage space on users' phones, however, this attack currently has no other apparent harm.

Of course, this is still a relatively early technology, and Apple and Google are not the actual implementers. The specific effect of its use depends on the government or other companies that adopt this technology.

The APIs for iOS and Android are already published on both companies' websites.