This article introduces an ICML 2016 paper on applying CNNs to graphs. ICML is a top conference in machine learning, and the paper, Learning Convolutional Neural Networks for Graphs, has strong theoretical and practical value. If you are not familiar with the graph data structure, see the basics introduced at the end of this article. CNNs have achieved state-of-the-art results in fields such as computer vision (CV) and natural language processing (NLP). The data in these fields can be called Euclidean data: grid-like or sequential structures that CNNs can process efficiently while extracting useful feature representations. For example, an image can be regarded as 2D grid data, and a speech signal as 1D grid data. However, many real problems involve non-Euclidean data, such as social and multimedia network data, chemical compound structures, biological gene and protein data, and knowledge graphs. This kind of data is graph-structured, and neural network architectures such as the standard CNN cannot process it effectively. The problem this paper solves is therefore how to process graph-structured data efficiently with a CNN.

The proposed algorithm is very simple: it converts a graph into a structure a CNN can process efficiently, in two main steps: 1. select a representative sequence of nodes from the graph; 2. for each selected node, construct a convolutional neighborhood (receptive field). The details of the algorithm are introduced below. The paper views an image as a special kind of graph, namely a grid graph, in which each pixel is a node. I suspect the motivation of the paper comes mainly from the desire to generalize CNNs from images to general graphs.
Let us first look at how a CNN operates on an image. The left part of Figure 3 shows the convolution operation of one neural-network layer: the bottom layer is the input feature map (or the original image), which is transformed by a convolution (here a 3×3 kernel, i.e., a receptive field of 9 in the paper's terms), so that 9 pixels of the bottom layer are weight-mapped to one pixel of the upper layer. The right part of Figure 3 shows the same bottom-layer input from a graph perspective: each convolution region can be regarded as a node (the center pixel) together with the set of nodes in its neighborhood, which are jointly weighted and mapped to a single value. The input feature map at the bottom can therefore be viewed as a square grid graph in which a sequence of nodes is chosen to represent the image and a regularized neighborhood graph is built around each of them (this neighborhood graph is exactly the area covered by the convolution kernel, i.e., the receptive field). Seen this way, as in Figure 1 of the paper, a 4×4 image can be represented as a graph with 4 selected nodes (labeled 1, 2, 3, 4 in the figure), where each node also carries a neighborhood the same size as the convolution kernel; convolving the image is then just convolving the neighborhoods of these 4 nodes. So for general graph data, one only needs to select nodes and determine a fixed-size neighborhood for each (matching the kernel size) in order to apply CNN convolution and obtain a feature representation of the graph. Note that Figure 4(b) shows the neighborhood of a node in (a): the receptive field is mapped to a vector the size of the convolution kernel in spatial order, left to right and top to bottom, and the convolution is then applied.
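To make the "image convolution is a graph-neighborhood operation" view above concrete, here is a minimal sketch in plain Python (a toy 4×4 image and a 3×3 Laplacian-style kernel, both made up for illustration). It computes one convolution step twice: once as an ordinary 2D weighted sum, and once phrased as a weighted sum over a center node and its 8 grid neighbors, showing the two views are identical.

```python
# Toy 4x4 "image" and 3x3 kernel (illustrative values, not from the paper).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
]

def conv_at(img, ker, r, c):
    """Ordinary 2D view: weighted sum of the 3x3 patch centred at (r, c)."""
    return sum(
        ker[i][j] * img[r - 1 + i][c - 1 + j]
        for i in range(3)
        for j in range(3)
    )

def conv_as_graph(img, ker, r, c):
    """Graph view: the centre node plus its 8 grid neighbours, each edge
    carrying the matching kernel weight, aggregated into one value."""
    neighbourhood = [
        (img[r + dr][c + dc], ker[dr + 1][dc + 1])
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
    ]
    return sum(value * weight for value, weight in neighbourhood)

print(conv_at(image, kernel, 1, 1))        # prints 0 for this kernel
print(conv_as_graph(image, kernel, 1, 1))  # identical result
```

The point of the two functions is only that a convolution's receptive field is nothing more than a node's neighborhood in a grid graph, which is exactly the property the paper generalizes to arbitrary graphs.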
In a general graph, however, nodes carry no spatial position information, and this is precisely the problem that must be solved when processing graph data. Based on the description above, the paper mainly does three things: 1. select appropriate nodes; 2. build a neighborhood for each node; 3. establish a unique mapping from the graph representation to a vector representation, so that nodes with similar structural features are mapped to similar positions in the vector. The algorithm has four steps. 1. Node Sequence Selection. First, for an input graph, a width w is determined (defined in Algorithm 1), which is the number of nodes to select; effectively it is the number of receptive fields (each selected node yields one receptive field, and the convolution stride equals the kernel size). How are the nodes selected? The paper says they are chosen according to a ranking (labeling) of the nodes in the graph, but gives few details on how to rank. The main method used is centrality: my understanding is that the more central a node is, the more important it is. "Central" here is not a spatial notion but a measure of how important a node is within the relationship structure. A simple example: the two drawings in Figure 5 represent the same graph. For the two different nodes marked in red, we compare their centrality by computing the distances from each node to all other nodes, taking the distance between two adjacent nodes to be 1.
For the red node on the left in Figure 5: 4 nodes are directly connected to it (distance 1 each, +4); 3 nodes are one step further away (distance 2 each, +6); another 3 are at distance 3 (+9); and the farthest node is at distance 4 (+4). The total distance for this node is therefore 23. In the same way, the total for the node on the right is 3 + 8 + 6 + 8 = 25. So when selecting nodes, the node on the left is clearly chosen first. Of course, this is only one way to rank and select nodes, and its shortcomings are also obvious; the paper does not discuss this part in much detail. 2. Neighborhood Assembly. Next, a receptive field must be determined for each selected node so that the convolution operation can be performed. Before that, however, the neighborhood of each node is collected, and the nodes of the receptive field are then chosen from it. Suppose the receptive-field size is k. For each node there are clearly two cases: the number of neighboring nodes is less than k, or there are more than k neighbors; how each case is handled is explained below. As shown in the figure, 6 nodes are selected. For each node, its direct neighbors (the 1-neighborhood) are gathered first; if these are not enough, indirect neighbors are added. If the 1-neighborhood already suffices, all of its nodes are placed in the candidate set, and the final selection is made through normalization in the next step. 3. Graph Normalization. Suppose the neighborhood-assembly step gives a node a candidate neighborhood of N nodes in total. N is generally not equal to k, so normalization selects nodes by their ranking labels and maps them into the vector in that order.
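The total-distance ranking used in the Figure 5 example above (sum of shortest-path distances to every other node; smaller total = more central, chosen first) can be sketched with a plain breadth-first search. The small example graph here is made up for illustration and is not the graph from Figure 5.

```python
from collections import deque

def total_distance(adj, start):
    """Sum of shortest-path distances from `start` to all other nodes,
    computed by BFS with unit edge weights (as in the Figure 5 example)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values())

# A small illustrative graph as adjacency lists (undirected).
adj = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "e"],
    "d": ["b", "e"],
    "e": ["c", "d", "f"],
    "f": ["e"],
}

# Rank nodes: smallest total distance first (most central first).
ranking = sorted(adj, key=lambda node: total_distance(adj, node))
print([(node, total_distance(adj, node)) for node in ranking])
```

Nodes in the middle of the graph (such as "b" and "e", total distance 8) rank ahead of the endpoints ("a" and "f", total distance 12), which matches the intuition that the left red node in Figure 5 beats the right one.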
If the node has too few neighboring nodes, all of them are selected and dummy nodes are added to fill the gap, still in ranked order; if N exceeds k, the lowest-ranked nodes are truncated. Figure 7 shows the whole process from node selection to constructing the receptive field; after normalization, the nodes can be mapped into a vector. The most important part of this step is therefore ranking the nodes. Figure 8 shows how the receptive field of an arbitrary node is constructed. The convolution-kernel size here is 4, so 4 nodes must be selected in the end, including the node itself, and these nodes must be labeled. There are many possible labelings, so which is best? As Figure 7 suggests, selecting 4 of these 7 nodes yields a collection of 4-node graphs. The author's criterion is: under a given labeling, draw two graphs at random from the collection and compute the expected difference between their distance in vector space and their distance in graph space; the smaller this expectation, the better the labeling. The paper formalizes this as an optimization over labelings. With the best labeling, the nodes can be mapped into an ordered vector, giving the node's receptive field, as shown at the far right of Figure 6. 4. Convolutional Architecture. The paper uses a 2-layer convolutional neural network that converts the input into a vector for the convolution operation; the details are shown in Figure 9. The gray blocks at the bottom are the network's input: each block is the receptive field of one node, i.e., the 4 nodes obtained above. Each node carries an n-dimensional feature vector (3 dimensions for a color image, perhaps a word vector for text, and so on;
here the feature dimension is n). The pink block is the convolution kernel: its size is 4, but its width must equal the feature dimension, so convolving it with one node's field yields a single value. The convolution stride is 4: each step convolves one node, and a stride of 4 moves exactly to the next node. (Note: in Figure 1 of the paper, the stride in (a) is 1, but after conversion to the structure in (b) it becomes 9.) There are M convolution kernels, so the feature map obtained after convolution has M channels, and the final result V1...VM is the feature representation of the graph, which can then be used for classification or regression tasks. Basics: a graph consists mainly of vertices and edges and has an adjacency matrix A; if its nodes are additionally described by features (Feat), it looks like the figure on the right below.
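The strided convolution just described can be sketched in plain Python (toy sizes and dummy values; this illustrates the mechanism, not the paper's implementation). With w selected nodes, receptive-field size k, and feature dimension n, the flattened input has w·k·n values; each of the M kernels has length k·n and is applied with stride k·n, so that each step consumes exactly one node's receptive field (the article counts the stride in nodes, hence "stride = 4" for k = 4).

```python
# Toy shapes: w selected nodes, field size k, feature dim n, M kernels.
w, k, n, M = 4, 4, 3, 2

# Flattened input: w receptive fields, each contributing k*n numbers
# (dummy values here; in practice these are the normalized node features).
inputs = [float(i % 5) for i in range(w * k * n)]

# M kernels, each of length k*n (arbitrary illustrative weights).
kernels = [[0.1 * (m + 1)] * (k * n) for m in range(M)]

def strided_conv(x, kernel, stride):
    """1D convolution without padding: dot product at every stride-th offset."""
    size = len(kernel)
    return [
        sum(kernel[j] * x[i + j] for j in range(size))
        for i in range(0, len(x) - size + 1, stride)
    ]

# Stride k*n in flattened units == "one node per step" in the article's terms.
feature_maps = [strided_conv(inputs, ker, stride=k * n) for ker in kernels]
print(len(feature_maps), [len(fm) for fm in feature_maps])  # M maps of w values
```

Each kernel produces one value per selected node, so every feature map has w entries, and the M maps together form the V1...VM representation fed to the classifier.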