Quickdraw’s CNN-RNN model

The Quickdraw model used in "Guess the Painting Song" is essentially a classification model. Its input is the coordinates of the stroke points plus a flag marking the starting point of each stroke. Several cascaded one-dimensional convolutions are applied first, followed by a BiLSTM layer whose outputs are summed over the sequence, and a final Softmax layer performs the classification. The overall network structure is shown in the figure.
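As a rough illustration of the last two steps described above (summing the recurrent outputs and classifying with Softmax), here is a minimal pure-Python sketch. The names `sum_over_time`, `dense`, and `softmax`, the toy numbers, the fully connected layer placed before the Softmax, and the idea of skipping padded time steps are all assumptions made for illustration; they are not taken from the original code.

```python
import math


def sum_over_time(outputs, length):
    """Sum the first `length` per-step output vectors, skipping padded steps."""
    total = [0.0] * len(outputs[0])
    for step in outputs[:length]:
        for i, value in enumerate(step):
            total[i] += value
    return total


def dense(vector, weights, bias):
    """Fully connected layer: one logit per class (hypothetical weights)."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


# Toy per-step outputs standing in for the BiLSTM: three real steps plus one
# padded step that must not contribute to the sum.
rnn_outputs = [[1.0, 0.0], [0.5, 0.5], [0.5, -0.5], [9.0, 9.0]]
summed = sum_over_time(rnn_outputs, length=3)

# Three hypothetical classes.
class_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
probs = softmax(dense(summed, class_weights, [0.0, 0.0, 0.0]))
predicted = probs.index(max(probs))
```

Summing over time gives a fixed-size vector regardless of how many points the drawing has, which is what lets a single Softmax layer classify sequences of different lengths.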
For the open source data and code, please refer to the reference document below. The network as a whole is relatively simple, and with its default parameters the final model accuracy is 75%, as shown in the figure below. This is not a demanding scenario, so that accuracy is good enough. Here I share a few interesting details that I noticed (corrections from experts are welcome).

Small details

Data preprocessing

For stroke-3 (x, y, n) data, Google's default TFRecord conversion normalizes the coordinates and takes the differences between consecutive points.
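The preprocessing just described can be sketched in a few lines of pure Python. This is a minimal sketch under stated assumptions, not the original conversion script: the function name `normalize_and_diff`, the per-axis scaling to the bounding box, the guard for zero-size drawings, and the choice to drop the first point after differencing are all assumptions made for illustration.

```python
def normalize_and_diff(points):
    """Scale a stroke-3 drawing to its bounding box, then take point deltas.

    points: list of (x, y, n) tuples, where n is the stroke flag.
    Returns a list of (dx, dy, n) tuples with one fewer entry than the input.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # Guard against a zero-width or zero-height drawing (assumption).
    scale_x = (max(xs) - min_x) or 1.0
    scale_y = (max(ys) - min_y) or 1.0

    # Size normalization: every drawing ends up in the unit square.
    norm = [((x - min_x) / scale_x, (y - min_y) / scale_y, n)
            for x, y, n in points]

    # Differences between consecutive points: the absolute position is lost.
    return [(x1 - x0, y1 - y0, n1)
            for (x0, y0, _), (x1, y1, n1) in zip(norm, norm[1:])]


# The same square drawn small near the origin and large elsewhere on the canvas.
small_square = [(0, 0, 1), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 0)]
large_square = [(10, 20, 1), (60, 20, 0), (60, 70, 0), (10, 70, 0), (10, 20, 0)]

small_input = normalize_and_diff(small_square)
large_input = normalize_and_diff(large_square)
```

In this toy case both squares produce exactly the same model input, so neither the size of the drawing nor its position on the canvas reaches the network.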
Why Normalization?

Similar to the role of BN at the input layer, it moves the data distribution out of the saturated region of the activation function and into the region where the gradient is larger.

We only care about the stroke trend of the drawing, not its size. In other words, drawing a large circle and drawing a small circle should make little difference to the input data.

Why take differences?

To remove the effect of the starting coordinate position: drawing the same shape in the middle of the canvas or in one of its four corners makes little difference at the input data level.

Convolutional Layer

Multiple one-dimensional convolutions (conv1d) are used in cascade, with a linear activation function and no pooling layer.
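To make the convolutional stage concrete, the following is a minimal pure-Python sketch of two cascaded one-dimensional convolutions with a linear activation and no pooling. The function name `conv1d_linear`, the "same" zero padding with stride 1, the odd kernel width, and all weights are assumptions chosen for illustration; the real model uses learned filters and far more channels.

```python
def conv1d_linear(seq, kernels, bias):
    """1-D convolution over time with 'same' zero padding and no activation.

    seq: list of T feature vectors, each of length C_in.
    kernels: list of C_out filters; each filter is K taps of C_in weights
             (K is assumed to be odd).
    Returns T output vectors of length C_out.
    """
    width = len(kernels[0])
    pad = width // 2
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + list(seq) + [zero] * pad

    out = []
    for t in range(len(seq)):
        window = padded[t:t + width]
        row = []
        for taps, b in zip(kernels, bias):
            acc = b
            for tap, vec in zip(taps, window):
                acc += sum(w * v for w, v in zip(tap, vec))
            row.append(acc)  # linear activation: the weighted sum is kept as-is
        out.append(row)
    return out


# Six (dx, dy, n) points as toy input.
seq = [(0.1, 0.0, 1.0), (0.2, 0.1, 0.0), (0.0, 0.3, 0.0),
       (-0.1, 0.2, 0.0), (-0.2, -0.1, 1.0), (0.0, -0.5, 0.0)]

# Hypothetical filters: layer 1 maps 3 -> 2 channels, layer 2 maps 2 -> 1.
layer1 = [
    [[0.5, 0.0, 0.1], [1.0, -1.0, 0.0], [0.5, 0.0, -0.1]],
    [[0.0, 0.5, 0.0], [-1.0, 1.0, 0.2], [0.0, 0.5, 0.0]],
]
layer2 = [[[0.3, -0.3], [1.0, 1.0], [0.3, -0.3]]]

hidden = conv1d_linear(seq, layer1, [0.0, 0.0])
features = conv1d_linear(hidden, layer2, [0.0])
```

Because there is no pooling, `features` still has one vector per input point, so the recurrent layer that follows sees the full-resolution sequence. With zero bias and no nonlinearity, the whole stack is a linear map of its input.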
Why do linear activation and removing the pooling layer improve accuracy by 2-3 points? What does a pooling layer actually do?
The author's (simple) understanding is that stick figures are already a highly abstract human representation of objects, so there is no need for a complex CNN to abstract features further, and the global features are captured by the subsequent RNN layer.

Small Thoughts

Google launched the web version of QuickDraw in November 2016, and it has recently become popular again thanks to the mini-program. A large amount of real user data had already been collected and was used to optimize this mini-program. What else can the model be used for? Recently I saw an article that studied the relationship between the drawing order used by people from different countries in this stick figure data and their national character. In addition, time series classification models have seen a lot of research and progress in anomaly analysis, handwriting recognition, speech recognition, and text classification.
When I was a graduate student, I studied anomaly analysis of computer users. I built a classification model based on the user's mouse trajectories and keyboard operations to identify whether the user themselves was operating the computer. Looking back, this model would probably work quite well on that earlier task. What other innovations are possible at the product level?
What other value can be mined from this drawing data? Drawing is a way for people to describe the world as they understand it. Starting from these simple sketches, we can learn how people understand objects and the world. At the simple end, this could be transferred to the high-level abstraction stage of current image recognition algorithms to improve certain tasks. At the more ambitious end, it could even be used to enhance machine reasoning and to learn the human ability to model objects and the world abstractly (a wild idea).