King's Counterattack: Using AI to Play WeChat Jump Jump and Automatically Grind Past 10,000 Points

Recently, the WeChat mini game Jump Jump has become popular all over the country. From children to adults, it seems everyone is playing it. As AI programmers who like to try everything, we wondered: can we use artificial intelligence (AI) and computer vision (CV) to play this game for us?

So we developed the WeChat Auto-Jump algorithm, which redefines the right way to play Jump Jump. Our algorithm not only far exceeds human level, it also far exceeds all currently known algorithms in both speed and accuracy; it is fair to call it the state of the art in the auto-jump field. Below we introduce the algorithm in detail.

The first step of the algorithm is to take a screenshot of the phone screen and to control touch input on the phone. Our GitHub repository details the configuration steps for Android and iOS phones.

GitHub address:

https://github.com/Prinsphield/Wechat_AutoJump

You just need to connect your phone to your computer and follow the instructions to complete the configuration. Once we have the screenshot, this becomes a simple vision problem: find the position of the little man and the center of the next platform to jump to.
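For Android, the screenshot and tap steps are typically driven through adb. Below is a minimal sketch of that loop, assuming adb is installed and USB debugging is enabled; the exact scripts in the repo may differ, iOS goes through WebDriverAgent instead, and the screen coordinates in jump() are placeholders.

import subprocess

import cv2

def get_screenshot(path='screen.png'):
    # Capture the screen on the device, then pull the image to the computer
    subprocess.check_call(['adb', 'shell', 'screencap', '-p', '/sdcard/screen.png'])
    subprocess.check_call(['adb', 'pull', '/sdcard/screen.png', path])
    return cv2.imread(path)

def jump(press_time_ms):
    # A long press is simulated as a swipe whose start and end points coincide
    subprocess.check_call(['adb', 'shell', 'input', 'swipe',
                           '500', '1600', '500', '1600', str(int(press_time_ms))])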

As shown in the figure, the green dot represents the current position of the character, and the red dot represents the target position.

Multiscale Search

There are many ways to solve this problem. To get onto the leaderboard quickly and easily, I started with multi-scale template search. I picked a random screenshot and cut out the little man, like the image below.

I also noticed that the little man's size varies slightly with his position on the screen, so I designed a multi-scale search: match the template at several sizes and keep the one with the highest confidence score.

The code for multi-scale search looks like this:

import cv2
import numpy as np

def multi_scale_search(pivot, screen, range=0.3, num=10):
    # Slide the template (pivot) over the screenshot at several scales and
    # keep the match with the highest normalized correlation score.
    H, W = screen.shape[:2]
    h, w = pivot.shape[:2]
    found = None
    for scale in np.linspace(1 - range, 1 + range, num)[::-1]:
        resized = cv2.resize(screen, (int(W * scale), int(H * scale)))
        r = W / float(resized.shape[1])   # maps back to original coordinates
        if resized.shape[0] < h or resized.shape[1] < w:
            break
        res = cv2.matchTemplate(resized, pivot, cv2.TM_CCOEFF_NORMED)
        loc = np.where(res >= res.max())
        pos_h, pos_w = list(zip(*loc))[0]
        if found is None or res.max() > found[-1]:
            found = (pos_h, pos_w, r, res.max())
    if found is None:
        return (0, 0, 0, 0, 0)
    pos_h, pos_w, r, score = found
    start_h, start_w = int(pos_h * r), int(pos_w * r)
    end_h, end_w = int((pos_h + h) * r), int((pos_w + w) * r)
    return (start_h, start_w, end_h, end_w, score)
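A usage sketch, assuming 'little_man.png' is the template cut out above and 'screen.png' is the current screenshot (the file names are placeholders):

screen = cv2.imread('screen.png')
pivot = cv2.imread('little_man.png')
start_h, start_w, end_h, end_w, score = multi_scale_search(pivot, screen)
man_h, man_w = end_h, (start_w + end_w) // 2   # bottom center of the box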

Let's give it a try. It works well: fast and accurate. In all my experiments, it never failed to find the little man.

Note, however, that the bottom center of the bounding box is not the little man's actual position; the true position is slightly above it.

The target platform can be found the same way, but we would need to collect templates for the many different platform types: round ones, square ones, convenience stores, manhole covers, prisms, and so on. With that many templates at multiple scales, the search becomes slow.

So we need to speed things up. First, notice that the target is always above the little man, so once we have located him we can discard everything below his position, shrinking the search space.

But this is not enough; we need to dig further into the game's design. The little man and the target platform are roughly symmetric about the center of the screen, which gives us a very good way to narrow the search space.

Assume a screen resolution of (1280, 720), with coordinates written as (height, width). If the bottom of the little man sits at (h1, w1), the point symmetric to it about the screen center is (1280 - h1, 720 - w1). Searching at multiple scales only within a square of side 300 centered on that point finds the target quickly and accurately.
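A minimal sketch of this trick, in (row, column) coordinates on a 1280×720 screenshot; the clamping to the image border is our addition:

def search_region(man_h, man_w, H=1280, W=720, size=300):
    # The target platform lies near the point symmetric to the little man
    # about the screen center
    center_h, center_w = H - man_h, W - man_w
    half = size // 2
    top, left = max(center_h - half, 0), max(center_w - half, 0)
    bottom, right = min(center_h + half, H), min(center_w + half, W)
    return top, left, bottom, right

# Run the multi-scale search only inside this region:
# top, left, bottom, right = search_region(man_h, man_w)
# roi = screen[top:bottom, left:right]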

The effect is shown in the figure below: the blue box is the (300, 300) search region, the red box is the detected platform, and the center of that rectangle is the target point.

Fast-Search

Playing the game carefully reveals a pattern: if the little man landed exactly at the center of the platform on the previous jump, a small white dot appears at the center of the next target platform, as in the picture shown above.

A closer look shows that the white dot's RGB value is (245, 245, 245). This suggests a very simple and efficient method: search for the white dot directly. The dot is a connected region, and the number of pixels with value exactly (245, 245, 245) is stably between 280 and 310, so we can use this to locate the target directly.

This only works when the previous jump landed dead center, but that doesn't matter: we try this cheap check first on every jump, and fall back to multi-scale search when it fails.
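A minimal sketch of fast-search, assuming the screenshot is a numpy array (the dot is gray, so BGR versus RGB channel order does not matter):

import numpy as np

def fast_search(screen):
    # Pixels whose value is exactly (245, 245, 245)
    mask = np.all(screen == (245, 245, 245), axis=-1)
    count = int(mask.sum())
    if 280 <= count <= 310:                      # the stable size range of the dot
        ys, xs = np.nonzero(mask)
        return int(ys.mean()), int(xs.mean())    # center of the white dot
    return None                                  # fall back to multi-scale search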

At this point the method runs essentially as a perpetual motion machine. Below is the state after about an hour and a half on my phone: 859 jumps, with the little man's position and the target position still computed correctly every time. I ended the run deliberately because the phone itself could no longer keep up.

The following is a demonstration of the effect:

Is this the end? Then what would separate us from amateur players? Now it's time for some serious academic work; non-combatants, please evacuate.

CNN Coarse-to-Fine Model

On iOS devices, fast-search cannot be used because of limitations of the screen-capture scheme: the screenshots returned by WebDriverAgent are compressed, so the image pixels are damaged and no longer carry the original values (the reason is unknown; readers who know the details are welcome to suggest improvements). For this reason, and to be compatible with devices of different resolutions, we used convolutional neural networks to build a faster and more robust object-detection model.

Below we introduce the algorithm in four parts: data collection and preprocessing, the coarse model, the fine model, and the cascade.

Data collection and preprocessing

Using our highly accurate multiscale-search and fast-search methods, we collected data from 7 runs, about 3,000 screenshots in total, each labeled with the target position. For each image we applied two different preprocessing pipelines, used to train the coarse model and the fine model respectively. Both are described below.

Coarse model data preprocessing

The only region of each screenshot that actually matters for the current decision is the center of the screen, where the little man and the target sit; the top and bottom of each screenshot carry no useful information.

Therefore we crop a 320×720 strip from both the top and the bottom of each 1280×720 screenshot, keeping only the central 640×720 region as training data.

We observed that in the game, every time the little man landed at the center of a target object, a white dot would appear at the center of the next target object.

Since fast-search means a large fraction of the training data contains white dots, and we did not want the dots to interfere with training, we removed the white dot from every image by filling its area with the solid color of the surrounding pixels.
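A minimal sketch of this preprocessing, assuming 1280×720 screenshots; the white-dot removal here uses OpenCV inpainting as one plausible way to "fill with the surrounding color", which may differ from the implementation in the repo:

import cv2
import numpy as np

def preprocess_coarse(img):
    img = img[320:960]                   # keep the central 640x720 region
    mask = np.all(img == (245, 245, 245), axis=-1).astype(np.uint8)
    if mask.any():                       # erase the white dot if present
        img = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
    return img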

Fine model data preprocessing

To further improve accuracy, we built a separate data set for the fine model: from each image in the training set we cut a 320×320 patch near the target point as training data.

To prevent the network from learning the trivial solution of always predicting the patch center, we applied a random offset of up to 50 pixels to each patch. The fine-model data also had white dots removed.
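A minimal sketch of the fine-model cropping, assuming (target_h, target_w) is the labeled target point on the 640×720 coarse image:

import numpy as np

def crop_fine(img, target_h, target_w, size=320, max_offset=50):
    # Random offset so the target is not always at the patch center
    dh, dw = np.random.randint(-max_offset, max_offset + 1, size=2)
    top = int(np.clip(target_h - size // 2 + dh, 0, img.shape[0] - size))
    left = int(np.clip(target_w - size // 2 + dw, 0, img.shape[1] - size))
    patch = img[top:top + size, left:left + size]
    label = (target_h - top, target_w - left)   # target relative to the patch
    return patch, label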

Coarse Model

We formulate this as a regression problem: the coarse model uses a convolutional neural network to regress the target location directly.

def forward(self, img, is_training, keep_prob, name='coarse'):
    with tf.name_scope(name):
        with tf.variable_scope(name):
            out = self.conv2d('conv1', img, [3, 3, self.input_channle, 16], 2)
            # out = tf.layers.batch_normalization(out, name='bn1', training=is_training)
            out = tf.nn.relu(out, name='relu1')
            out = self.make_conv_bn_relu('conv2', out, [3, 3, 16, 32], 1, is_training)
            out = tf.nn.max_pool(out, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
            out = self.make_conv_bn_relu('conv3', out, [5, 5, 32, 64], 1, is_training)
            out = tf.nn.max_pool(out, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
            out = self.make_conv_bn_relu('conv4', out, [7, 7, 64, 128], 1, is_training)
            out = tf.nn.max_pool(out, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
            out = self.make_conv_bn_relu('conv5', out, [9, 9, 128, 256], 1, is_training)
            out = tf.nn.max_pool(out, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
            out = tf.reshape(out, [-1, 256 * 20 * 23])
            out = self.make_fc('fc1', out, [256 * 20 * 23, 256], keep_prob)
            out = self.make_fc('fc2', out, [256, 2], keep_prob)   # (h, w) output
    return out
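The helpers make_conv_bn_relu and make_fc are not shown in this post; a minimal sketch of what they presumably look like in TensorFlow 1.x (assuming import tensorflow as tf and the conv2d wrapper used above), inferred from the shapes passed in:

def make_conv_bn_relu(self, name, x, kernel_shape, stride, is_training):
    # Convolution -> batch norm -> ReLU, with the given kernel and stride
    out = self.conv2d(name, x, kernel_shape, stride)
    out = tf.layers.batch_normalization(out, name='bn_' + name, training=is_training)
    return tf.nn.relu(out)

def make_fc(self, name, x, weight_shape, keep_prob):
    # Fully connected layer with dropout
    with tf.variable_scope(name):
        w = tf.get_variable('w', weight_shape,
                            initializer=tf.truncated_normal_initializer(stddev=0.01))
        b = tf.get_variable('b', weight_shape[-1:], initializer=tf.zeros_initializer())
        return tf.nn.dropout(tf.matmul(x, w) + b, keep_prob)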

After ten hours of training, the coarse model achieved an accuracy of 6 pixels on the test set, with an actual test accuracy of about 10 pixels and an inference time of 0.4 seconds on the test machine (MacBook Pro Retina, 15-inch, Mid 2015, 2.2 GHz Intel Core i7).
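The training objective is not shown here; a plausible sketch, assuming a plain L2 regression loss on the predicted (h, w) coordinates (model, img, and label are hypothetical placeholders):

pred = model.forward(img, is_training=True, keep_prob=0.75)   # shape (batch, 2)
loss = tf.reduce_mean(tf.reduce_sum(tf.square(pred - label), axis=1))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)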

This model alone can easily score over 1,000, far beyond human level and beyond most automatic algorithms, and more than enough for daily entertainment. But if you think we stopped there, you are wrong.

Fine Model

The fine model's structure is similar to the coarse model's, with slightly more parameters; it acts as a refinement step on top of the coarse model's prediction.

def forward(self, img, is_training, keep_prob, name='fine'):
    with tf.name_scope(name):
        with tf.variable_scope(name):
            out = self.conv2d('conv1', img, [3, 3, self.input_channle, 16], 2)
            # out = tf.layers.batch_normalization(out, name='bn1', training=is_training)
            out = tf.nn.relu(out, name='relu1')
            out = self.make_conv_bn_relu('conv2', out, [3, 3, 16, 64], 1, is_training)
            out = tf.nn.max_pool(out, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
            out = self.make_conv_bn_relu('conv3', out, [5, 5, 64, 128], 1, is_training)
            out = tf.nn.max_pool(out, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
            out = self.make_conv_bn_relu('conv4', out, [7, 7, 128, 256], 1, is_training)
            out = tf.nn.max_pool(out, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
            out = self.make_conv_bn_relu('conv5', out, [9, 9, 256, 512], 1, is_training)
            out = tf.nn.max_pool(out, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
            out = tf.reshape(out, [-1, 512 * 10 * 10])
            out = self.make_fc('fc1', out, [512 * 10 * 10, 512], keep_prob)
            out = self.make_fc('fc2', out, [512, 2], keep_prob)   # (h, w) output
    return out

After ten hours of training, the fine model reached an accuracy of 0.5 pixels on the test set, about 1 pixel in actual testing, with an inference time of 0.2 seconds on the same test machine.

Cascade

Cascading the two models, running the coarse model on the full image and then the fine model on a 320×320 crop around the coarse prediction, gives an overall accuracy of about 1 pixel with a total inference time of about 0.6 seconds.
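A minimal sketch of the cascade at inference time; coarse_model and fine_model are hypothetical wrappers around the two networks above, each returning an (h, w) prediction:

def predict(img, coarse_model, fine_model, size=320):
    ch, cw = coarse_model(img)              # rough location, ~10 px accuracy
    top = max(0, min(int(ch) - size // 2, img.shape[0] - size))
    left = max(0, min(int(cw) - size // 2, img.shape[1] - size))
    patch = img[top:top + size, left:left + size]
    fh, fw = fine_model(patch)              # refined location, ~1 px accuracy
    return top + fh, left + fw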

Summary

To address this problem, we used AI and CV techniques to build a complete solution that works on both iOS and Android devices. Users with a little technical background can configure and run it successfully.

We proposed three algorithms, Multiscale-Search, Fast-Search, and CNN Coarse-to-Fine, to solve this problem. Working together they achieve fast and accurate detection and jumping; by slightly tuning the jump parameters for their own device, a user can achieve a perpetual motion machine.
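The "jump parameters" boil down to a press-duration coefficient. A minimal sketch, assuming (as is common for this game, though not detailed above) that press time grows roughly linearly with jump distance; the coefficient value is illustrative and must be tuned per device:

import math

def press_time_ms(man_pos, target_pos, coeff=1.35):
    # Press time proportional to the on-screen distance between man and target
    distance = math.hypot(target_pos[0] - man_pos[0],
                          target_pos[1] - man_pos[1])
    return int(distance * coeff)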

At this point, it seems we can declare this problem solved: for the WeChat mini game Jump Jump, game over!

Friendly reminder: moderate gaming is good for the brain, but gaming addiction harms the body. The fun of a technical approach lies in the technology itself, not in the game leaderboard. Please treat both the rankings and the techniques in this article rationally, and let games add fun to your life.

Statement: the algorithm and open-source code in this article are released under the MIT license. Anyone using the algorithm for commercial purposes bears all resulting consequences themselves.
