MIT proposes a network dissection framework to automatically peek into the black box of neural network training

New MIT technology helps illuminate the inner workings of neural networks trained on visual data.

Neural networks learn how to perform computational tasks by analyzing large training data sets, and they are responsible for many of today's best-performing artificial intelligence systems, such as speech recognition systems, automatic translators, and self-driving cars. But neural networks are black boxes, and once they are trained, even their designers do not understand how they work: what data they process and how they process it.

Two years ago, a team of computer vision researchers from MIT’s CSAIL lab described a way to peer into the black box of neural network training to recognize visual scenes. The method provided some interesting insights, but required sending the data to human reviewers via Amazon’s Mechanical Turk crowdsourcing service.

At this year's CVPR conference, CSAIL researchers will present a fully automated version of that system. Where the previous paper analyzed one type of neural network trained on one task, the new paper analyzes four networks trained on more than 20 tasks, including recognizing scenes and objects, colorizing grayscale images, and solving puzzles. Some of the newer networks are so large that analyzing them with the old method would have been prohibitively expensive.

The researchers also conducted several sets of experiments on these networks, which not only revealed characteristics of various computer-vision and computational-photography algorithms, but also provided some evidence about how the human brain itself is organized.

Neural networks take their name from a loose analogy to the human nervous system, which contains a large number of relatively simple but densely interconnected information-processing nodes. Like neurons, the nodes of a neural network receive signals from neighboring nodes and then either "fire," emitting their own signals, or remain silent. And, as with neurons, the strength of a node's firing response can vary.

In both papers, the MIT researchers took neural networks trained to perform computer-vision tasks and recorded how each node responded to different input images. They then selected the 10 input images that most strongly activated each node.
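The selection step described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's code: it assumes each node's response to each image has already been summarized as a single number (for example, its maximum activation over the image), and all names here are hypothetical.

```python
import numpy as np

def top_activating_images(activations, k=10):
    """For each node (unit), return the indices of the k images
    that activate it most strongly.

    activations: array of shape (num_images, num_units), where each
        entry summarizes one unit's response to one image.
    Returns an array of shape (num_units, k), strongest image first.
    """
    # argsort sorts images in ascending order of activation per unit;
    # keep the last k rows and reverse them to get descending order.
    order = np.argsort(activations, axis=0)
    return order[-k:][::-1].T
```

The same index arrays can then be used to pull out the actual images for inspection, whether by a human reviewer or by the automated labeling system.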

In the previous paper, the researchers sent the images to humans hired through Mechanical Turk and had them identify what the images had in common. In the new paper, the researchers used a computer system to do the same.

“We catalogued more than 1,100 visual concepts, such as green, earth texture, wood, human face, bicycle wheel, snowy mountain, and so on,” said David Bau, an MIT graduate student. “We took multiple datasets that others had developed and combined them with datasets that were densely labeled with visual concepts, and we got many, many labels, and we knew which pixel corresponded to which label.”

Other authors of the paper include co-first author Bolei Zhou, Antonio Torralba, a professor of electrical engineering and computer science at MIT, Aude Oliva, a senior research scientist at CSAIL, and Aditya Khosla, a Ph.D. student of Torralba who is now CTO of the medical computing company PathAI.

The dense labeling also let the researchers determine which pixels in which images elicited the strongest response from a given network node. Today's neural networks are organized into layers: data are fed into the first layer, which processes them and passes the results to the next layer, and so on. With visual data, the input image is broken into small patches, each of which is fed to a separate input node.

For any node in a given layer of one of their networks, the researchers could trace back the pattern of activations that triggered it and thus identify the specific image pixels it was responding to. Because their system could frequently match labels to those exact groups of pixels, it could characterize a node's behavior in great detail.
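In the spirit of the Network Dissection paper, the match between a node's high-activation region and the labeled pixels for a concept can be scored with intersection-over-union (IoU). The sketch below is a simplified illustration under assumed inputs (the activation map is already upsampled to the label mask's resolution, and the threshold is given); the function names are hypothetical.

```python
import numpy as np

def concept_iou(activation_map, concept_mask, threshold):
    """Score how well a unit's highly active region overlaps the
    pixels labeled with a given concept.

    activation_map: 2-D float array of the unit's activations,
        upsampled to the resolution of the label mask.
    concept_mask: 2-D boolean array, True where the concept appears.
    threshold: activation level above which the unit counts as 'on'.
    """
    active = activation_map > threshold
    intersection = np.logical_and(active, concept_mask).sum()
    union = np.logical_or(active, concept_mask).sum()
    return intersection / union if union > 0 else 0.0

def best_concept(activation_map, concept_masks, threshold):
    """Return the (label, IoU) pair of the concept whose labeled
    pixels best overlap the unit's active region."""
    scores = {name: concept_iou(activation_map, mask, threshold)
              for name, mask in concept_masks.items()}
    return max(scores.items(), key=lambda kv: kv[1])
```

A unit would then be reported as, say, a "wheel detector" if the wheel mask achieves the highest IoU across the labeled dataset.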

In the dataset, the researchers organized the visual concepts into a hierarchy: colors and textures at the lowest level, then materials, parts, objects, and scenes. Generally speaking, the lower layers of a neural network respond to simple visual features such as color and texture, while the higher layers respond to more complex concepts.

The hierarchy also let the researchers quantify where a neural network's attention goes when it is trained for a particular task. For example, a network trained to colorize black-and-white images devotes a large share of its nodes to recognizing textures. Likewise, a network trained to track objects across video frames focuses more on object recognition than a network trained to recognize scenes does; in the tracking network, many nodes are in effect dedicated to object recognition.

The researchers' experiments also shed light on a puzzle in neuroscience. Studies of human subjects who had electrodes implanted to treat neurological disorders have shown that individual neurons in the brain can fire in response to specific visual stimuli. This idea, once known as the grandmother-neuron hypothesis, is more familiar to today's neuroscientists as the Jennifer Aniston neuron hypothesis: researchers coined the name after finding that neurons in several patients tended to respond only to depictions of particular Hollywood celebrities.

Many neuroscientists dispute this interpretation. They argue that clusters of neurons, not individual neurons, underlie sensory recognition in the brain. On this view, the Jennifer Aniston neuron is just one of many neurons that fire together in response to images of Jennifer Aniston, and it may belong to many other clusters that respond to stimuli which simply haven't been tested.

Because the MIT researchers' analysis technique is fully automated, they were able to test whether something similar happens in neural networks trained to recognize visual scenes. In addition to identifying individual network nodes tuned to particular visual concepts, they also considered randomly selected combinations of nodes. But the combinations picked out far fewer visual concepts than the individual nodes did, roughly 80 percent fewer.
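The random-combination test can be sketched as follows: instead of scoring a single node's activation map, build a "virtual unit" as a random linear combination of the nodes' maps and score it against the concept labels in exactly the same way. The snippet below is a hypothetical illustration of that construction, not the paper's code.

```python
import numpy as np

def random_combination_map(unit_maps, rng):
    """Combine per-unit activation maps with random weights to form
    a 'virtual unit', to be scored like a real unit.

    unit_maps: array of shape (num_units, H, W), one spatial
        activation map per unit.
    rng: a NumPy random Generator supplying the weights.
    Returns a single (H, W) activation map.
    """
    weights = rng.standard_normal(unit_maps.shape[0])
    # Weighted sum over the units axis collapses (num_units, H, W)
    # down to one (H, W) map.
    return np.tensordot(weights, unit_maps, axes=1)
```

If individual units were no more interpretable than arbitrary directions in activation space, such virtual units would match labeled concepts about as often as real units do; the paper's finding is that they match far less often.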

“To me, this suggests that the neural network is actually trying to approximate a grandmother neuron,” Bau said. “It’s not trying to smear the idea of grandmother all over the place; it’s trying to assign it to a single neuron. That’s an interesting hint of a structure that most people don’t believe is that simple.”

Paper: Network Dissection: Quantifying Interpretability of Deep Visual Representations

Paper link: http://netdissect.csail.mit.edu/final-network-dissection.pdf

We propose a general framework, Network Dissection, to quantify the interpretability of CNN hidden representations by evaluating the correspondence between individual hidden units and a set of semantic concepts. Given a CNN model, our proposed method scores the semantics of each hidden unit in the intermediate convolutional layers against a large dataset of visual concepts. The semantically matched units are assigned a wide range of labels, from objects, parts, and scenes to textures, materials, and colors. We use the proposed method to test the hypothesis that the interpretability of a unit is equivalent to that of a random linear combination of units, and then apply it to compare the latent representations of different networks trained to solve different supervised and self-supervised tasks. We further analyze the effect of training iterations, compare networks trained from different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We show that the proposed method can reveal properties of CNN models and training methods that go beyond measures of their discriminative power.

