For Apple fans, the most discussed feature of the new iPhone X is its unlocking method: FaceID, the successor to TouchID. With the new generation of bezel-less phones, manufacturers have had to develop new ways to unlock a device without compromising the all-screen design. Some of Apple's competitors kept traditional fingerprint sensors, simply relocating them where they do not affect the form factor. Apple instead took the lead and reinvented the mechanism, introducing a simpler and faster method: the user just looks at the phone to unlock it.
Thanks to an advanced front-facing depth camera, the iPhone X can build a three-dimensional map of the user's face. An infrared camera additionally captures an image of the face, making the system more robust to changes in ambient light, color, and so on. Through deep learning, the phone learns the details of the user's face well enough to recognize the owner every time they glance at the device. Some people question the accuracy of this method: after all, a fingerprint does not change, while a face changes with beards, glasses, makeup, and so on. Surprisingly, Apple advertises FaceID as more secure than TouchID, quoting a false-match rate of about one in 1,000,000. The overall pipeline looks simple: capture an image of the user's face, then run face recognition with deep learning. I was curious how deep learning is applied at each step, and how each step is optimized to reach such high recognition accuracy. This article describes how I used Keras to implement a FaceID-like algorithm.

Understanding FaceID

“Neural networks give FaceID the ability to do more than just perform simple classification processes”

The first step is to carefully analyze how FaceID works on the iPhone X; Apple's white paper helps us understand it. With TouchID, a user first registers fingerprints by pressing the sensor several times. After collecting about a dozen touches at different positions, the smartphone completes enrollment. FaceID likewise requires the user to register facial information first, and the process is even simpler: the user looks at the phone and slowly rotates their head in a circle, so that the face is registered from different angles. This remarkably fast registration hides a number of underlying learning algorithms, which are introduced one by one below.

For a neural network, performing classification means learning to predict whether the face the phone sees is the one registered on it, so in principle it would need training data to learn to distinguish "true" from "false". But this differs from many typical deep learning setups. Traditional approaches train a model on a large dataset, which takes considerable time and computation; Apple cannot train a new classifier from scratch on the device every time a user enrolls. Instead, I believe FaceID relies on something like a Siamese convolutional neural network trained offline by Apple: faces are mapped into a low-dimensional latent space so that the distance between different people's faces is maximized, with a contrastive loss measuring the performance of the model.

From faces to neural networks

A Siamese neural network basically consists of two identical neural networks that share all their weights. This architecture can learn to compute distances between specific kinds of data, such as images. The idea is to pass face images through the Siamese network, mapping them into a low-dimensional feature space (an n-dimensional vector), and then train the network so that data points from the same person land as close together as possible, while data points from different people end up as far apart as possible.
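To make the training objective concrete, here is a minimal sketch of a contrastive loss in Keras. This is my own illustration, not Apple's code: the label convention (1 for same-person pairs, 0 for different people) and the margin value are assumptions.

```python
from tensorflow.keras import backend as K

def contrastive_loss(y_true, y_pred, margin=1.0):
    # y_true: 1.0 if the two images show the same person, 0.0 otherwise (assumed convention)
    # y_pred: Euclidean distance between the two embeddings
    y_true = K.cast(y_true, y_pred.dtype)
    # Pull same-person pairs together (penalize their squared distance)...
    same = y_true * K.square(y_pred)
    # ...and push different-person pairs at least `margin` apart.
    diff = (1.0 - y_true) * K.square(K.maximum(margin - y_pred, 0.0))
    return K.mean(same + diff)
```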
Ultimately, the network learns to extract the most meaningful features from the data and compress them into a compact vector, creating a meaningful mapping, much as an autoencoder learns a compact representation. Using this technique, one can train such a model on a large number of faces to recognize which faces are most similar. And, as Apple has presumably done, one can train on harder examples to make the network robust to identical twins, adversarial attacks (masks), and so on. One of the biggest advantages of this approach is that you get a plug-and-play model that can recognize a new user with no further training: it simply maps the images taken during the initial setup into the latent space. In addition, FaceID can adapt to changes in your appearance, both sudden ones (glasses, hats, makeup) and gradual ones (a growing beard): it does so by computing new reference vectors from the changed appearance and adding them to the stored set in the feature space.
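To illustrate the weight-sharing idea, below is a minimal Keras sketch of a Siamese wrapper around a shared embedding network. The 4-channel RGB-D input shape and all names here are assumptions for the example; any embedding backbone, such as the SqueezeNet-based one described later, could play the role of `base_network`.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras import backend as K

def build_siamese(base_network, input_shape=(200, 200, 4)):
    # Both inputs flow through the *same* network instance,
    # so the two branches share every weight by construction.
    img_a = layers.Input(shape=input_shape)  # assumed RGB-D shape
    img_b = layers.Input(shape=input_shape)
    emb_a = base_network(img_a)
    emb_b = base_network(img_b)
    # Euclidean distance between the two embeddings.
    distance = layers.Lambda(
        lambda t: K.sqrt(K.maximum(
            K.sum(K.square(t[0] - t[1]), axis=1, keepdims=True),
            K.epsilon()))
    )([emb_a, emb_b])
    return Model(inputs=[img_a, img_b], outputs=distance)
```

Such a model could then be compiled with the contrastive loss sketched earlier, e.g. `model.compile(loss=contrastive_loss, optimizer="adam")`.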
Implementing FaceID with Keras

For any machine learning project, the first thing you need is data. Building your own dataset takes a lot of time and effort, so instead I found an RGB-D face dataset by browsing the web. It consists of a series of faces in different orientations and with different facial expressions: the same kind of data the iPhone X depth camera produces. The final implementation is in a Jupyter Notebook on my personal GitHub page; I ran the experiments in a Colab notebook.

I built a convolutional neural network based on SqueezeNet. It takes pairs of RGB-D face images as input and outputs the distance between their embeddings. The network is trained with a contrastive loss, which minimizes the distance between photos of the same person and maximizes the distance between photos of different people.

After some training, the network is able to map each face to a 128-dimensional vector, so that photos of the same person are grouped together and sit as far as possible from everyone else's. Unlocking the device then reduces to computing the distance between the picture taken during the unlock attempt and the pictures stored during enrollment: if the distance is below a certain threshold (the lower the threshold, the more secure), the device unlocks.

I used the t-SNE algorithm to project the 128-dimensional embedding space into two dimensions, with each color corresponding to a different person. As you can see, the network has learned to group these images. The visualization produced with the PCA dimensionality-reduction algorithm is also quite interesting.

Experiment

The experiment simulates the entire FaceID cycle: first the user's face is enrolled; then, during the unlock phase, the model computes the distance between the face detected at unlock time and the previously enrolled face, checks whether it falls below a set threshold, and decides whether the phone should unlock.

Let's start by enrolling a user: I take a series of photos of the same person from the dataset and simulate the enrollment phase. The device computes the embeddings of these images and stores them in local memory.
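The enrollment step just described can be sketched in a few lines. The function and file names are hypothetical; the point is simply that the registration photos are pushed through the embedding network once, and only the resulting vectors are kept.

```python
import numpy as np

def enroll_user(embedding_model, enrollment_images):
    # Map every registration photo into the 128-d latent space
    # and keep the embeddings as the user's stored reference.
    batch = np.stack(enrollment_images)     # (n, H, W, 4) RGB-D images
    return embedding_model.predict(batch)   # (n, 128) reference vectors

# Hypothetical usage:
# stored = enroll_user(embedding_model, user_photos)
# np.save("enrolled_user.npy", stored)
```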
Now let's see what happens when the same user tries to unlock the device: different poses and facial expressions of the same user all yield low distances, around 0.30 on average. The face images of different users, by contrast, average a distance of about 1.10. A threshold of around 0.40 should therefore be enough to prevent strangers from unlocking your device (a minimal sketch of this decision rule appears at the end of the article).

Conclusion

This article walked through the basic working mechanism of FaceID-style unlocking, built on face embeddings and Siamese convolutional neural networks. The Python code for this article can be obtained here. I hope you found it helpful.

About the author

Norman Di Palo is a student at the University of Rome who focuses on artificial intelligence and robotics.
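As promised above, here is a minimal sketch of the unlock decision. The 0.40 threshold comes from the distances observed in the experiment (about 0.30 for the same user, about 1.10 for strangers); the function names are assumptions.

```python
import numpy as np

UNLOCK_THRESHOLD = 0.40  # sits between ~0.30 (same user) and ~1.10 (strangers)

def try_unlock(embedding_model, probe_image, stored_embeddings):
    # Embed the capture from the unlock attempt...
    probe = embedding_model.predict(probe_image[None, ...])        # (1, 128)
    # ...and compare it against every stored enrollment embedding.
    distances = np.linalg.norm(stored_embeddings - probe, axis=1)
    return distances.min() < UNLOCK_THRESHOLD
```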