For Apple fans, the most discussed feature of the new iPhone X is its unlocking method: FaceID, the successor to TouchID. With the new generation of bezel-less phones, manufacturers have had to develop new ways to unlock a device without compromising the all-screen design. Some of Apple's competitors kept traditional fingerprint sensors, simply relocating them where they do not affect the form factor. Apple instead took the lead and reinvented the mechanism, introducing a simpler and faster method: the user just looks at the phone to unlock it.
Thanks to an advanced front-facing depth camera, the iPhone X can build a three-dimensional map of the user's face. An infrared camera additionally captures an image of the face, making the system more robust to changes in ambient light, color, and so on. Through deep learning, the phone learns the details of the user's face well enough to recognize the owner every time they glance at the device. Some people question the accuracy of this method: after all, a fingerprint does not change, while a face changes with beards, glasses, makeup, and so on. Surprisingly, Apple advertises FaceID as more secure than TouchID, quoting a false-match rate of about one in 1,000,000. The overall pipeline looks simple: capture an image of the user's face, then run face recognition with deep learning. I was curious how deep learning is applied at each step, and how each step is optimized to reach such high recognition accuracy. This article describes how I used Keras to implement a FaceID-like algorithm.

Understanding FaceID

“Neural networks give FaceID the ability to do more than just perform simple classification processes”

The first step is to carefully analyze how FaceID works on the iPhone X; Apple's white paper helps us understand it. With TouchID, a user first registers fingerprints by pressing the sensor several times. After collecting about a dozen touches at different positions, the smartphone completes enrollment. FaceID likewise requires the user to register facial information first, and the process is even simpler: the user looks at the phone and slowly rotates their head in a circle, so that the face is registered from different angles. This remarkably fast registration hides a number of underlying learning algorithms, which are introduced one by one below.

For a neural network, performing classification means learning to predict whether the face the phone sees is the one registered on it, so in principle it would need training data to learn to distinguish "true" from "false". But this differs from many typical deep learning setups. Traditional approaches train a model on a large dataset, which takes considerable time and computation; Apple cannot train a new classifier from scratch on the device every time a user enrolls. Instead, I believe FaceID relies on something like a Siamese convolutional neural network trained offline by Apple: faces are mapped into a low-dimensional latent space so that the distance between different people's faces is maximized, with a contrastive loss measuring the performance of the model.

From faces to neural networks

A Siamese neural network basically consists of two identical neural networks that share all their weights. This architecture can learn to compute distances between specific kinds of data, such as images. The idea is to pass face images through the Siamese network, mapping them into a low-dimensional feature space (an n-dimensional vector), and then train the network so that data points from the same person land as close together as possible, while data points from different people end up as far apart as possible.
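To make the training objective concrete, here is a minimal sketch of a contrastive loss in Keras. This is my own illustration, not Apple's code: the label convention (1 for same-person pairs, 0 for different people) and the margin value are assumptions.

```python
from tensorflow.keras import backend as K

def contrastive_loss(y_true, y_pred, margin=1.0):
    # y_true: 1.0 if the two images show the same person, 0.0 otherwise (assumed convention)
    # y_pred: Euclidean distance between the two embeddings
    y_true = K.cast(y_true, y_pred.dtype)
    # Pull same-person pairs together (penalize their squared distance)...
    same = y_true * K.square(y_pred)
    # ...and push different-person pairs at least `margin` apart.
    diff = (1.0 - y_true) * K.square(K.maximum(margin - y_pred, 0.0))
    return K.mean(same + diff)
```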
Ultimately, the network learns to extract the most meaningful features from the data and compress them into a compact vector, creating a meaningful mapping, much as an autoencoder learns a compact representation. Using this technique, one can train such a model on a large number of faces to recognize which faces are most similar. And, as Apple has presumably done, one can train on harder examples to make the network robust to identical twins, adversarial attacks (masks), and so on. One of the biggest advantages of this approach is that you get a plug-and-play model that can recognize a new user with no further training: it simply maps the images taken during the initial setup into the latent space. In addition, FaceID can adapt to changes in your appearance, both sudden ones (glasses, hats, makeup) and gradual ones (a growing beard): it does so by computing new reference vectors from the changed appearance and adding them to the stored set in the feature space.
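To illustrate the weight-sharing idea, below is a minimal Keras sketch of a Siamese wrapper around a shared embedding network. The 4-channel RGB-D input shape and all names here are assumptions for the example; any embedding backbone, such as the SqueezeNet-based one described later, could play the role of `base_network`.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras import backend as K

def build_siamese(base_network, input_shape=(200, 200, 4)):
    # Both inputs flow through the *same* network instance,
    # so the two branches share every weight by construction.
    img_a = layers.Input(shape=input_shape)  # assumed RGB-D shape
    img_b = layers.Input(shape=input_shape)
    emb_a = base_network(img_a)
    emb_b = base_network(img_b)
    # Euclidean distance between the two embeddings.
    distance = layers.Lambda(
        lambda t: K.sqrt(K.maximum(
            K.sum(K.square(t[0] - t[1]), axis=1, keepdims=True),
            K.epsilon()))
    )([emb_a, emb_b])
    return Model(inputs=[img_a, img_b], outputs=distance)
```

Such a model could then be compiled with the contrastive loss sketched earlier, e.g. `model.compile(loss=contrastive_loss, optimizer="adam")`.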
Implementing FaceID with Keras

For any machine learning project, the first thing you need is data. Building your own dataset takes a lot of time and effort, so instead I found an RGB-D face dataset by browsing the web. It consists of a series of faces in different orientations and with different facial expressions: the same kind of data the iPhone X depth camera produces. The final implementation is in a Jupyter Notebook on my personal GitHub page; I ran the experiments in a Colab notebook.

I built a convolutional neural network based on SqueezeNet. It takes pairs of RGB-D face images as input and outputs the distance between their embeddings. The network is trained with a contrastive loss, which minimizes the distance between photos of the same person and maximizes the distance between photos of different people.

After some training, the network is able to map each face to a 128-dimensional vector, so that photos of the same person are grouped together and sit as far as possible from everyone else's. Unlocking the device then reduces to computing the distance between the picture taken during the unlock attempt and the pictures stored during enrollment: if the distance is below a certain threshold (the lower the threshold, the more secure), the device unlocks.

I used the t-SNE algorithm to project the 128-dimensional embedding space into two dimensions, with each color corresponding to a different person. As you can see, the network has learned to group these images. The visualization produced with the PCA dimensionality-reduction algorithm is also quite interesting.

Experiment

The experiment simulates the entire FaceID cycle: first the user's face is enrolled; then, during the unlock phase, the model computes the distance between the face detected at unlock time and the previously enrolled face, checks whether it falls below a set threshold, and decides whether the phone should unlock.

Let's start by enrolling a user: I take a series of photos of the same person from the dataset and simulate the enrollment phase. The device computes the embeddings of these images and stores them in local memory.
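The enrollment step just described can be sketched in a few lines. The function and file names are hypothetical; the point is simply that the registration photos are pushed through the embedding network once, and only the resulting vectors are kept.

```python
import numpy as np

def enroll_user(embedding_model, enrollment_images):
    # Map every registration photo into the 128-d latent space
    # and keep the embeddings as the user's stored reference.
    batch = np.stack(enrollment_images)     # (n, H, W, 4) RGB-D images
    return embedding_model.predict(batch)   # (n, 128) reference vectors

# Hypothetical usage:
# stored = enroll_user(embedding_model, user_photos)
# np.save("enrolled_user.npy", stored)
```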
Now let's see what happens when the same user tries to unlock the device: different poses and facial expressions of the same user all yield low distances, around 0.30 on average. The face images of different users, by contrast, average a distance of about 1.10. A threshold of around 0.40 should therefore be enough to prevent strangers from unlocking your device (a minimal sketch of this decision rule appears at the end of the article).

Conclusion

This article walked through the basic working mechanism of FaceID-style unlocking, built on face embeddings and Siamese convolutional neural networks. The Python code for this article can be obtained here. I hope you found it helpful.

About the author

Norman Di Palo is a student at the University of Rome who focuses on artificial intelligence and robotics.
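As promised above, here is a minimal sketch of the unlock decision. The 0.40 threshold comes from the distances observed in the experiment (about 0.30 for the same user, about 1.10 for strangers); the function names are assumptions.

```python
import numpy as np

UNLOCK_THRESHOLD = 0.40  # sits between ~0.30 (same user) and ~1.10 (strangers)

def try_unlock(embedding_model, probe_image, stored_embeddings):
    # Embed the capture from the unlock attempt...
    probe = embedding_model.predict(probe_image[None, ...])        # (1, 128)
    # ...and compare it against every stored enrollment embedding.
    distances = np.linalg.norm(stored_embeddings - probe, axis=1)
    return distances.min() < UNLOCK_THRESHOLD
```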