QuickDraw’s CNN-RNN model

The QuickDraw model used in "Guess the Painting Song" is essentially a classification model. Its input is the coordinates of the stroke points plus a flag marking the starting point of each stroke. The input passes through several cascaded one-dimensional convolutions and then a BiLSTM layer, whose outputs are summed. Finally, a softmax layer performs the classification. The overall network structure is shown in the figure:
For the open-source data and code, see the reference document below. The network is relatively simple, and with its default parameters the final model reaches 75% accuracy, as shown in the figure below. This is not a demanding scenario, so that result is good enough. Here are a few interesting details that I noticed (experts, please go easy on me).

Small details

Data preprocessing

For the stroke-3 format (x, y, n), the TFRecord data that Google provides by default already has its coordinates normalized and differenced, with each point stored as its offset from the previous point.
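The two preprocessing steps can be sketched in a few lines of NumPy. This is a minimal illustration, not Google's actual conversion script: the function name `preprocess`, the per-axis rescaling to [0, 1], and the convention that n = 1 flags the first point of each stroke are assumptions made here, and the second step is assumed to mean taking offsets between consecutive points. Every stroke is assumed to contain at least one point.

```python
import numpy as np

def preprocess(strokes):
    """Turn a list of strokes (each a list of (x, y) points) into rows of (dx, dy, n)."""
    xy = np.array([p for stroke in strokes for p in stroke], dtype=np.float64)
    n = np.zeros(len(xy))
    start = 0
    for stroke in strokes:
        n[start] = 1.0            # assumed convention: flag the first point of each stroke
        start += len(stroke)
    # Size normalization: rescale each axis of the drawing to the range [0, 1].
    lower = xy.min(axis=0)
    span = xy.max(axis=0) - lower
    span[span == 0] = 1.0         # avoid dividing by zero for perfectly flat drawings
    xy = (xy - lower) / span
    # Differencing: replace absolute positions with offsets from the previous point.
    deltas = np.diff(xy, axis=0, prepend=xy[:1])
    return np.column_stack([deltas, n])

# The same two-stroke shape drawn small in one corner and large elsewhere on the canvas.
small = [[(0, 0), (10, 0), (10, 10)], [(0, 10), (0, 0)]]
large = [[(100, 100), (150, 100), (150, 150)], [(100, 150), (100, 100)]]
print(np.allclose(preprocess(small), preprocess(large)))  # True
```

Both drawings produce the same input rows, which is the point of the preprocessing: the model sees the shape of the strokes, not their size or position.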
Why normalization?

Similar to the role of BN (batch normalization) at the input layer, normalization moves the data distribution out of the saturation region of the activation function and into the region with larger gradients.

We only care about the stroke trend of the drawing, not its size. In other words, drawing a large circle and drawing a small circle should look almost the same as input data.

Why differencing?

Storing each point as its offset from the previous one removes the effect of the starting coordinate position: drawing the same shape in the middle of the canvas or in one of its four corners makes little difference at the input-data level.

Convolutional layer

Several one-dimensional convolutions (conv1d) are cascaded, with a linear activation function and no pooling layer.
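To make the convolutional part concrete, here is a minimal NumPy sketch of cascaded conv1d layers with linear activation and no pooling. The helper names (`conv1d_same`, `conv_stack`), the three layer sizes, the zero "same" padding, and the random weights are all assumptions for illustration; this is not the model's actual code, and it leaves out the BiLSTM and softmax layers.

```python
import numpy as np

def conv1d_same(x, kernel, bias):
    """1-D convolution over time with stride 1, zero 'same' padding, and no activation.

    x: [T, C_in], kernel: [K, C_in, C_out], bias: [C_out] -> output: [T, C_out]
    """
    k = kernel.shape[0]
    left = (k - 1) // 2
    padded = np.pad(x, ((left, k - 1 - left), (0, 0)), mode="constant")
    out = np.empty((x.shape[0], kernel.shape[2]))
    for t in range(x.shape[0]):
        window = padded[t:t + k]  # [K, C_in] slice centred on time step t
        out[t] = np.tensordot(window, kernel, axes=([0, 1], [0, 1])) + bias
    return out

rng = np.random.default_rng(0)
layers = []
c_in = 3                          # (dx, dy, n) per point
for c_out, k in [(48, 5), (64, 5), (96, 3)]:   # illustrative (filters, kernel length) pairs
    layers.append((rng.normal(scale=0.1, size=(k, c_in, c_out)), np.zeros(c_out)))
    c_in = c_out

def conv_stack(x):
    for kernel, bias in layers:
        x = conv1d_same(x, kernel, bias)       # linear activation, no pooling in between
    return x

ink = rng.normal(size=(20, 3))    # a stand-in drawing with 20 points
features = conv_stack(ink)
print(features.shape)             # (20, 96)
```

Because there is no pooling and the stride is 1, the output still has one feature vector per input point, so the recurrent layer that follows sees the full sequence of the drawing.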
Why do linear activation and removing the pooling layer improve the result by 2-3 points? What does a pooling layer actually do?
My (simplistic) understanding is that stick figures are already a highly abstract human representation of objects, so there is no need for a complex CNN to abstract features further; the global features are captured by the subsequent RNN layer.

Small thoughts

Google launched the web version of QuickDraw in November 2016, and it has recently become popular again thanks to mini-programs. A large amount of real user data had already been collected and was used to optimize this mini-program.

What else can the model be used for? I recently saw an article that studied the relationship between the drawing order used by people from different countries in this stick-figure data and their national character. In addition, time-series classification models have seen a lot of research and progress in anomaly analysis, handwriting recognition, speech recognition, and text classification.
When I was a graduate student, I worked on anomaly analysis of computer users: I built a classification model on mouse trajectories and keyboard operations to identify whether it was the user themselves operating the computer. Looking back, this model should work quite well on that earlier task.

What other innovations are possible at the product level?
What other value can be mined from this drawing data? Drawing is a way for people to describe the world as they understand it. Starting from these simple sketches, we can learn how people understand objects and the world. At the simple end, this could be transferred to the high-level abstraction stage of current image recognition algorithms to improve certain tasks. At the more ambitious end, it could even be used to strengthen machine reasoning by learning the human ability to model objects and the world abstractly (a wild idea).