Artificial intelligence (AI) has become remarkably powerful. With high-performance hardware and software, and ubiquitous cameras serving as its eyes, it is no longer difficult for a machine to tell whether the animal in a picture is a cat or a dog, or to pick a target out of a crowd. On tasks such as image classification and object detection, accuracy has come close to perfect, in many cases surpassing human performance. Through face recognition and big data, an AI vision system can not only answer "Who are you?", but very likely also guess "Where do you come from?" and "Where are you going?" For these soul-searching "three ultimate questions of philosophy", you do not need to say a word; the AI already "knows".

If someone does not want to be spotted by the AI vision systems that are now everywhere, is there a good way out? Covering your face with a hood or mask, smashing the camera, or pulling the power cord will only make things worse and draw attention. The real trick is to deceive the AI quietly. But AI keeps getting smarter, so how can it be fooled? In fact, deceiving AI is not only feasible; there are many ways to do it.

Although the current generation of AI systems, represented by deep neural network models, can accomplish many tasks well, their internal structure is like a black box whose working mechanism researchers still do not fully understand. Such a complex system is bound to contain all sorts of loopholes. Catch just one of them and hit the right spot, and the AI will go haywire. Just as human eyes fall for visual illusions, AI vision systems often "see things that are not there". Even a system that normally performs very well can lose control completely once it is tricked. It behaves like an erratic student who breezes through the hardest problems on an exam, yet inexplicably stumbles on the "easy" ones.

Making a trivial, even imperceptible, change to an image so that an AI vision system reaches a wrong judgment is called an adversarial example attack. For electronic images on phones and computers, an attacker can change the value of any pixel at will, so altering the content is easy. The difficulty is that, on the one hand, the changes must be as subtle as possible and hard to notice, while on the other hand they must be strong enough to trigger the system's loopholes and mislead it into a wrong output. There are, however, plenty of optimization algorithms up to the task; the images they generate are called adversarial examples. The figure below shows the strange behavior of AI vision systems after such electronic "image fraud" attacks.

Figure 1: After electronic "image fraud" (adversarial example) attacks, AI vision systems behave strangely: to the human eye the panda picture is essentially unchanged after optimized colored "snow" is added, yet the AI insists it is not a panda but a gibbon; the Alps are seen as a dog and a pufferfish as a crab, which is simply absurd; the system cannot even recognize the most basic handwritten digits, insisting that 4 is 9, 3 is 7, and 8 is 1...
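The panda-to-gibbon example in Figure 1 is the classic demonstration of the fast gradient sign method (FGSM) from reference [1]. The following is only a minimal sketch of the idea, assuming PyTorch and a pretrained torchvision classifier; the model, the epsilon value, and the preprocessing shortcut are assumptions for illustration, not the exact setup behind the figure.

```python
# Minimal FGSM sketch (illustration only), assuming PyTorch and torchvision >= 0.13.
# Model choice, epsilon and preprocessing are assumptions, not the setup behind Figure 1.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

def fgsm_attack(image, true_label, epsilon=0.007):
    """Return an adversarial copy of a preprocessed 1x3xHxW image tensor."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Nudge every pixel a tiny step in the direction that increases the loss.
    return (image + epsilon * image.grad.sign()).detach()

# Usage sketch: x is a preprocessed image tensor, y its correct class index.
# x_adv = fgsm_attack(x, torch.tensor([y]))
# model(x_adv).argmax()   # often no longer the correct class, yet x_adv looks like x
```

The sign of the gradient tells the attacker, for every pixel, which direction of change hurts the model most; a tiny step in that direction is usually invisible to people but can flip the model's decision.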
It is as if the AI has had a few drinks too many [1, 2, 3]. It has to be said that, after an adversarial example attack, the AI's eyesight is really poor: it is "lying with its eyes wide open", and its recognition results are pure nonsense, far below standard. Besides its answers, an AI system also reports a confidence level indicating how sure it is that the result is correct, and the confidences attached to these wrong answers are sky-high, 99% or even 100%. At this point the AI vision system is clearly performing terribly, yet it remains supremely confident.

If tampering with electronic images counts as a "magic attack", then fooling AI about real-world objects calls for a "physical attack". A relatively simple physical attack is to stick a small patch, or draw a few lines, on a face, a traffic sign, or some other object. The changes are, of course, carefully designed to make the AI vision system fail; a rough sketch of how such a patch can be optimized appears below, after Figure 4.

Figure 2: A simple way to confuse an AI vision system by attaching small patches.

Don't underestimate this kind of prank. The "Stop" sign in the picture is misread by the AI as "Speed Limit 45". If a self-driving car were really deceived by such a sign, a traffic accident could follow, wrecking the car and costing lives. This vulnerability of AI vision systems deserves serious attention.

Figure 3: When a girl wears specially designed frames, the AI recognizes her as a different person.

Generally speaking, if someone can be recognized by an AI vision system, there is a high probability that they will still be recognized when wearing glasses. Not so with the special "glasses" above. These fake glasses have no lenses, only frames, and the gaudy decorative pattern on the frames is quite unusual. They look like a piece of contemporary art, but they are in fact "custom-made" to exploit the defects of the AI model, making the system see "two different people".

Adversarial example attacks on physical objects naturally also make use of optical means. Using a projector to cast an interfering light pattern onto the surface of an object is a simple and effective method. Using shadows, which can be found everywhere, looks even more natural and is better concealed: it only requires placing an occluding object in the right position so that the shadow takes the desired shape.

Figure 4: Optical adversarial attacks using a projector (left) and an object's natural shadow (right).
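The stickers in Figure 2 and the decorated frames in Figure 3 are typically produced by an optimization similar in spirit to the electronic adversarial examples above, except that the perturbation is confined to a small printable region and pushed toward a class chosen by the attacker. The sketch below is only an illustration of that idea, again assuming PyTorch; the model, patch size, position, learning rate and target class are all made-up values, and real physical attacks additionally account for viewing angle, lighting and printing distortions.

```python
# Simplified adversarial patch sketch (illustration only), assuming PyTorch and
# torchvision >= 0.13. Patch size, position, target class and learning rate are
# arbitrary made-up values; input normalization is omitted for brevity.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
target_class = 919                                      # hypothetical target label
patch = torch.rand(1, 3, 50, 50, requires_grad=True)    # small printable patch
optimizer = torch.optim.Adam([patch], lr=0.05)

def apply_patch(images, patch, top=20, left=20):
    """Paste the 50x50 patch onto a batch of images at a fixed position."""
    _, _, h, w = images.shape
    pad = (left, w - left - 50, top, h - top - 50)       # (left, right, top, bottom)
    canvas = F.pad(patch.clamp(0, 1), pad)               # patch on a black background
    mask = F.pad(torch.ones(1, 3, 50, 50), pad)          # 1 where the patch sits
    return images * (1 - mask) + canvas * mask

def patch_step(images):
    """One optimization step pushing every patched image toward target_class."""
    optimizer.zero_grad()
    logits = model(apply_patch(images, patch))
    labels = torch.full((images.shape[0],), target_class)
    loss = F.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: repeatedly call patch_step() on batches of photos of the object,
# then print the optimized patch and stick it onto the real object.
```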
Nowadays, smartphone cameras are becoming ever more capable, and the photos and videos they take are becoming ever clearer; even a budget phone can deliver satisfactory image quality. Yet what digital camera equipment faithfully records still differs from what the human eye sees. Not only does the AI software itself have vulnerabilities, the cameras that serve as the AI's visual input have vulnerabilities too. By "manipulating" the shooting process, an object that looks perfectly normal to the human eye can turn into a strange photo on the image sensor, misleading the AI's judgment.

To deceive the camera, the first trick is to exploit the difference between the spectral ranges of the human eye and the camera sensor. Red, orange, yellow, green, cyan, blue and violet light are called visible light because they fall within the range the human eye can see, while infrared light (wavelengths longer than red) and ultraviolet light (wavelengths shorter than violet) are invisible to us. The sensors in ordinary phones and cameras respond to roughly, but not exactly, the same range as the human eye: they can often detect certain wavelengths of infrared light that we cannot see.

Figure 5: Carefully arranged infrared light makes the AI keep misidentifying a person: images captured by the camera (first row) and the recognition results (second row).

Researchers have used infrared LEDs to project different light patterns onto a person's face. To a human observer nothing looks unusual, no matter how closely they look. In the captured photos, however, a purple blotch always appears on the face, inducing "face blindness" in the AI face recognition system, which mistakes the same person for several different people. Likewise, an ordinary paper QR code used for mobile payment can, when illuminated by an infrared laser from a hundred meters away, turn into a completely different code in the eyes of a phone camera, quietly becoming the entrance to a malicious website.

The second trick is to exploit the human eye's color fusion mechanism. When red and green light alternate rapidly, say at 60 frames per second, the eye cannot keep up with the flicker and sees only the yellow formed by fusing the two. The image sensor is more discriminating: at each moment it records either the red frame or the green frame, never the fused yellow. A projector can rapidly alternate two patterns, each containing the face of an impersonation target (Hillary, in the researchers' demonstration), and project them onto a real face. A human observer sees only the uniform light left after the two patterns cancel out, which hardly seems to affect the appearance of the face at all. But in photos taken by a phone or camera, a face heavily distorted by the projected pattern appears, and it is identified as the person in the pattern. It is as if the projector "puts makeup" on the face, makeup that flickers in and out of existence and is removed automatically whenever the face is not viewed through a camera.

Figure 6: Under a pair of rapidly alternating projection patterns, the face looks unchanged to the human eye, but to the camera it is a completely different, "made-up" face.
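The trick rests on the eye averaging over time while the camera samples individual frames, and a few lines of arithmetic make the difference concrete. The sketch below uses made-up numbers and a generic grayscale "pattern" rather than the actual red/green face patterns: two complementary frames cancel exactly in the time average the eye perceives, while any single frame the camera captures still contains the full hidden pattern.

```python
# Color-fusion illustration with made-up numbers: the eye averages rapidly
# alternating frames, while the camera records individual frames.
import numpy as np

hidden = np.random.rand(4, 4)            # stand-in for the pattern to be hidden
frame_a = 0.5 + 0.4 * (hidden - 0.5)     # first projected pattern
frame_b = 0.5 - 0.4 * (hidden - 0.5)     # complementary second pattern

eye_sees = (frame_a + frame_b) / 2       # temporal average: flat, featureless
camera_sees = frame_a                    # one exposure: pattern fully present

print("eye's deviation from flat illumination:", np.abs(eye_sees - 0.5).max())  # 0.0
print("camera's deviation:", np.abs(camera_sees - 0.5).max())                   # up to ~0.2
```

In the real attack the same cancellation happens separately in each color channel, so the observer perceives only a roughly uniform blend of the projected colors.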
The third trick is to exploit a quirk of the image sensor known as the rolling shutter. The retina in the human eye plays the same role as the sensor in a camera: both record the light signal of an image. The difference is that the retina records the whole two-dimensional image at once; when you look at a face, the ears, eyes, nose and mouth are all seen at the same instant. Many camera sensors do not work this way. They read the image out line by line, a scheme called a rolling shutter, so even a single image is recorded row after row. Because the ears, eyes, nose and mouth sit on different rows, they are captured not simultaneously but with slight time differences.

As a result, when a light source flickers rapidly between bright and dark, any row that happens to be read out during a dark moment comes out as a black line in the photo, and the whole picture ends up striped like a zebra. To the human eye the flicker is far too fast to perceive, and no black lines appear in what we see.

Figure 7: The rolling shutter effect produces black lines or colored streaks in photos taken under rapidly flickering light.

If instead three colors of light (red, green and blue) are used, and the on-off timing of each is computed carefully for every brief instant, the photo no longer shows simple black lines but colorful, rainbow-like stripes. Both the black lines and the colored stripes can confuse an AI system and stop it from working properly.
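Why flicker turns into stripes is easy to reproduce numerically: each image row is exposed at a slightly later instant, so rows whose exposure falls in the "off" half of the flicker cycle come out dark. The following sketch uses made-up values for the row readout time, flicker frequency and image size; they are not taken from the actual experiments.

```python
# Rolling-shutter flicker simulation with made-up parameters (illustration only).
import numpy as np

rows, cols = 480, 640
row_readout_time = 30e-6     # assumed delay between successive row exposures (s)
flicker_freq = 2000          # assumed on/off frequency of the light source (Hz)

scene = np.ones((rows, cols))                      # a uniformly lit white scene

row_times = np.arange(rows) * row_readout_time     # when each row is exposed
light_on = (np.sin(2 * np.pi * flicker_freq * row_times) > 0).astype(float)

photo = scene * light_on[:, None]    # rows exposed while the light was off go dark
eye_view = scene * light_on.mean()   # the eye just sees steady half brightness

print("fraction of dark rows in the photo:", 1 - light_on.mean())          # about 0.5
print("stripe period in rows:", int(1 / flicker_freq / row_readout_time))  # about 16
```

With these numbers one flicker cycle spans roughly 17 rows, so a dark band appears every dozen or so rows even though the scene never changes; replacing the single light with independently timed red, green and blue sources yields colored stripes instead of black ones.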
Besides exploiting the flaws of existing camera sensors, researchers have also tried a more proactive approach: like hiding soldiers inside the Trojan Horse, they add an extra processing module to the optical path of an otherwise normal camera. The module optically makes slight modifications to the light signal of the captured image. Normally, the light from the object passes through the camera lens and is projected directly onto the image sensor. In this unusual system, an additional module consisting of two lenses and a spatial light modulator sits between the lens and the sensor. The first lens uses light-field propagation to perform, in effect, a Fourier transform of the image; the spatial light modulator then adjusts the phase of the transformed field; and the second lens performs the inverse Fourier transform. Images processed by this module differ only slightly from those taken by a normal camera, but "the devil is in the details": cleverly designed small changes in the input are enough to disrupt the normal operation of an AI system.

Figure 8: A camera system with an embedded optical processor for generating adversarial images.

Of course, faced with all these adversarial example attacks, the designers of AI vision systems are not helpless. The two sides are like spear and shield: the sharper the spear, the stronger the shield must become. In recent years researchers have regularly held global AI attack-and-defense competitions, in which participants spar in simulated scenarios and compare techniques. As the saying goes, "the devil may rise one foot, but the Way rises ten": the ability of AI vision systems to resist adversarial attacks keeps improving, and the systems become more robust as one loophole after another is patched.

References

[1] I. J. Goodfellow, J. Shlens, and C. Szegedy, Explaining and Harnessing Adversarial Examples, arXiv:1412.6572 (2014)
[2] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, Boosting Adversarial Attacks with Momentum, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), 9185-9193 (2018)
[3] H. Ye, X. Liu, and C. Li, DSCAE: A Denoising Sparse Convolutional Autoencoder Defense Against Adversarial Examples, J. Ambient Intell. Human. Comput. 13, 1419-1429 (2022)
[4] J. Fang, Y. Jiang, C. Jiang, Z. L. Jiang, S.-M. Yiu, and C. Liu, State-of-the-Art Optical-Based Physical Adversarial Attacks for Deep Learning Computer Vision Systems, arXiv:2303.12249 (2023)
[5] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS '16), 1528-1540 (2016)
[6] A. Gnanasambandam, A. M. Sherman, and S. H. Chan, Optical Adversarial Attack, arXiv:2108.06247 (2021)
[7] Y. Zhong, X. Liu, D. Zhai, J. Jiang, and X. Ji, Shadows Can Be Dangerous: Stealthy and Effective Physical-World Adversarial Attack by Natural Phenomenon, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15324-15333 (2022)
[8] Z. Zhou, D. Tang, X. Wang, W. Han, X. Liu, and K. Zhang, Invisible Mask: Practical Attacks on Face Recognition with Infrared, arXiv:1803.04683 (2018)
[9] "Paper QR codes can also be tampered with remotely: traceless attacks from 100 meters away, turning them into malicious website entrances in seconds", QuantumBit WeChat Official Account, https://mp.weixin.qq.com/s/mNB-4mAfFCtcNtvSUW3x5Q
[10] M. Shen, Z. Liao, L. Zhu, K. Xu, and X. Du, VLA: A Practical Visible Light-Based Attack on Face Recognition Systems in Physical World, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 3(3), 103 (2019)
[11] Z. Chen, P. Lin, Z. L. Jiang, Z. Wei, S. Yuan, and J. Fang, An Illumination Modulation-Based Adversarial Attack Against Automated Face Recognition System, Information Security and Cryptology: 16th International Conference (Inscrypt 2020), 53-69 (2020)
[12] A. Sayles, A. Hooda, M. K. Gupta, R. Chatterjee, and E. Fernandes, Invisible Perturbations: Physical Adversarial Examples Exploiting the Rolling Shutter Effect, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 14661-14670 (2021)
[13] K. Kim, J. Kim, S. Song, J.-H. Choi, C. Joo, and J.-S. Lee, Engineering Pupil Function for Optical Adversarial Attacks, Optics Express 30(5), 6500-6518 (2022)
[14] Zhang Zihao, "Neural networks are too easy to fool? How did the Tsinghua team win three championships in the NIPS attack-and-defense competition?", Programmer Good Things WeChat Official Account, https://mp.weixin.qq.com/s/k0dCmIhwMsqvsR_Fhhy93A

Planning and production
Source: Light Science Forum / China Optics
Editor: Zhong Yanping