Can you spot a sheet of white A4 paper on a white desktop? Can a system know one second in advance that an action is threatening? If you are in danger, can you make an "SOS" gesture at a camera and have the police come to help? Does this sound too much like a sci-fi movie? DeepGlint can do it. Bill Gates exclaimed "This is very cool" after hearing the product introduction. The New York Police Department (NYPD) came looking for a solution, and Nvidia lists it as a customer as important as Xiaomi... Why is a Chinese company founded in 2013 attracting so much attention? DeepGlint is building computer eyes that can understand reality, and the first step is to protect our safety.

Start with security monitoring

In the summer of 2012, I left my schoolbag in a luggage locker at the library. After borrowing my books, I found that my wallet was missing. I called the police, checked the surveillance video, and confirmed that someone had taken my schoolbag out of the locker, removed the wallet, and put the bag back. Even so, there was still no news of my wallet by the time I graduated. I heard that the thief was a repeat offender whom the school had tried many times to catch, without success. If the camera could have identified this person, recognized him, and called the police automatically, maybe everything would have been different.

Security monitoring has long been expected to replace a great deal of manual labor, extend the reach of the human eye, and work in harsh environments. In practice, these systems still just transmit video signals in a closed loop over optical fiber, coaxial cable or microwave. They can show live images, but when a threat appears, someone still has to trigger the on-site alarm system to raise the alert. If no one is watching, the live images are meaningless. Even when the goal is only to collect evidence, the video has to be reviewed afterwards, and finding clues in blurry footage is an extremely arduous task.
Can these cameras understand the world the way our eyes do and detect dangers and anomalies on their own? People use their two eyes to obtain raw three-dimensional data, and the brain then processes that information to respond appropriately. For the past decade or so, researchers have believed that optical lenses plus computer algorithms could understand our world, but an optical lens loses one crucial piece of information about the three-dimensional world: depth.

Equipment used by DeepGlint

DeepGlint's device looks a little different from ordinary security monitoring equipment. Instead of the usual single dome camera, it has three units side by side: the one on the left is the same RGB camera found in ordinary security cameras, and the other two are a laser emitter and a receiver, which makes it look very similar to Microsoft Kinect.

Can we really understand our world through it? When Zhao Yong, CTO of DeepGlint, was still at Google, he believed that if computers were to understand images, they had to take the three-dimensional path. By emitting and receiving laser light, the device uses structured light to measure depth, which lets it perceive changes in three-dimensional space. But this is only the first step. The eye receiving light only supplies information. To truly "understand" an image, the brain has to convert the light signal into a neural signal. The core of DeepGlint is a complete system that converts raw data from the three-dimensional world into the basic data that computers can understand.

Let machines understand the world

"DeepGlint can do two things. The first concerns people: if there are a dozen or twenty people walking around in a space, such as a subway station, we can track the trajectory and speed of each pedestrian very accurately. The other is to recognize people's body movements at medium and long distances, and their hand movements at close range," He Bofei, CEO of DeepGlint, told GeekPark.
DeepGlint CEO He Bofei explains the device principle to GeekPark

Light travels in a straight line, so how does DeepGlint's device make sure that people blocking one another from view does not affect the system's judgment? Because people move continuously: they can neither appear out of thin air nor vanish into it. This is the premise of DeepGlint's algorithm. When one person is hidden behind another, the system keeps tracking until the "missing" person appears again.

So how does DeepGlint predict crime in advance? Does it model every abnormal behavior (pushing, bumping) and then match against the models? It doesn't have to be that complicated. Take violent behavior as an example. The system measures the speed, amplitude and intensity of the movements of the people in the space, and violent movements differ greatly in intensity from normal ones. By judging abnormal behavior from the amplitude of body movements, DeepGlint can raise an alarm 0.5 seconds or 1 second before the act occurs, as soon as a person's movements exceed the safety value.

At present, banks, and self-service ATM lobbies in particular, are DeepGlint's main application scenario. A system with learning capabilities is placed in an ATM environment. In about a month, it learns that most people come through the door, queue, walk to the machine, insert a card, press the keypad, wait a moment, take their money and leave, and it treats this sequence as normal behavior. If someone enters a branch on the urban-rural fringe of Beijing at 10 o'clock at night and squats in a corner instead of withdrawing money, the system considers this abnormal and reports it. Likewise, if someone makes a lot of movements around the card slot, they may be installing a card reader or a membrane keypad, and the system will flag that as abnormal too.
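The motion check described above, comparing how fast a tracked person moves against a safety value, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Sample` type, the `peak_speed` and `should_alert` functions, and the 3.0 m/s safety value are hypothetical and are not taken from DeepGlint's system, which the article does not describe at this level of detail.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One tracked 3D position of a person (hypothetical structure)."""
    t: float  # timestamp in seconds
    x: float  # position in metres
    y: float
    z: float


def peak_speed(track):
    """Return the largest speed (m/s) between consecutive samples."""
    best = 0.0
    for a, b in zip(track, track[1:]):
        dt = b.t - a.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        distance = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
        best = max(best, distance / dt)
    return best


def should_alert(track, safety_speed=3.0):
    """Flag a track whose peak speed exceeds the safety value.

    The 3.0 m/s default is an illustrative assumption, not a real setting.
    """
    return peak_speed(track) > safety_speed


# A slow walk versus a sudden lunge, both sampled every 0.1 s.
walk = [Sample(0.0, 0.0, 0.0, 1.0), Sample(0.1, 0.1, 0.0, 1.0), Sample(0.2, 0.2, 0.0, 1.0)]
lunge = [Sample(0.0, 0.0, 0.0, 1.0), Sample(0.1, 0.5, 0.0, 1.0)]
print(should_alert(walk), should_alert(lunge))
```

A real system would presumably combine several signals (amplitude, intensity, learned patterns of normal behavior) rather than a single speed threshold; the sketch only shows the thresholding idea.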
Although the product is called an unmanned security monitoring system, DeepGlint has no intention of replacing all monitoring staff with it. The human world is too complicated. Machines will free people from repetitive work, but the final decision still has to be made by a person. The DeepGlint system exists to make security personnel far more efficient, telling them "Hey, something is not right here, can you check whether there is a problem?" rather than replacing them.

Is three-dimensional data much larger, and can conventional computers really handle it? Yes, the total volume of 3D data is much larger than that of 2D data, so DeepGlint chose to structure all the data locally before uploading it to the cloud. The bandwidth used is no different from that of today's 2D security monitoring. As for whether existing computers can cope, that depends on the GPU, which is why NVIDIA values DeepGlint.

A computer vision + artificial intelligence company

In April 2013, just three months after its founding, DeepGlint received joint angel investment from ZhenFund and China Unicom. In June this year, DeepGlint received tens of millions of dollars in Series A investment from Sequoia Capital.

In the elevator, at the ATM, in the supermarket, surveillance cameras are everywhere. Would you guess that 10,000 "eyes" are watching us in Beijing's T3 terminal? The real number is five times that: 50,000. In the eyes of CEO He Bofei, security monitoring is a bigger market than smartphones, and banking is only one part of it. Once these projects have let DeepGlint work through the whole process, it will be only natural for the company to move into other industries in parallel. But no single team or company can change an entire industry. DeepGlint often calls itself a "computer vision + artificial intelligence company."
They hope that once security monitoring has succeeded as a "verification point", they can offer a platform based on computer vision, so that more people in more industries can access it and experience the unprecedented power this technology brings. In medicine, for example, heart surgery currently requires stopping the heart artificially and switching to extracorporeal circulation. Computer vision can synchronize the movement of the scalpel with the heartbeat, so that surgery is performed on a heart that is static relative to the blade. This application is at the experimental stage, and perhaps in the near future everyone will be able to benefit from it. The technology could also sense when elderly people living alone have an accident at home and alert their families in time, or gauge how well students are learning from their expressions in the classroom in order to improve the teaching plan... Computer vision with perception capabilities can bring more imagination to this world.
There is a big X on the ceiling of the DeepGlint conference room, representing the unknown. DeepGlint hopes to become an artificial intelligence company in the future. "At that stage, I hope to combine the cognitive and perceptual capabilities of computers to do something very interesting."

If you think DeepGlint has been "deified", that only shows that the computer vision field needs more attention and more participants. Compared with short-cycle consumer-facing (to-C) projects, the field of artificial intelligence is full of variables, so it is bound to be thornier and to hold more possibilities.

Geeks are people who spot a trend and then immerse themselves in working on it. This is how He Bofei interprets the "geek" spirit. DeepGlint is also constantly adjusting its pace and direction to fit reality, and every visit reveals new changes. Now that the Internet and the cloud have become the foundation, and machine learning and big data have become the norm, do you think the next trend will be artificial intelligence?