On March 16, 2015, the official railway ticketing website 12306 introduced a new verification method on its login page. After entering a login name and password, users must now pick the correct pictures in an image verification code before they can log in. Reportedly, since this change no ticket-grabbing tool has been able to log in. What devastating news! I expect all the major Internet companies are busy developing new ticket-grabbing assistants to crack the new verification code. Below, I will walk through the design principles of the various kinds of verification codes and how they are cracked.

The first kind is the plain text verification code, a relatively primitive one. Strictly speaking it does not meet the definition of a verification code, because only automatically generated questions qualify. These text questions are drawn from a question bank of limited size, so cracking them is simple: refresh the page a few times to build up the question bank with the corresponding answers, use a regular expression to grab the question from the web page, and submit the matching answer. Some sites instead use randomly generated arithmetic, in the form random number, random operator from [+-*/], random number, "=?", which any programmer with elementary-school arithmetic can solve.

This kind of verification code is not completely useless. Many spam bots attack any form they come across and will not put that much effort into a single website, so it does stop them. Against someone who is determined to flood your website with spam, though, it is as good as having no verification code at all.
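The arithmetic variant described above can be sketched in a few lines. This is only a minimal illustration: the class name `ArithmeticCaptchaSolver`, the page snippet, and the exact question format ("12 + 7 = ?") are assumptions made for the example, not something taken from a real site.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArithmeticCaptchaSolver {
    // Assumed question format: "<number> <operator> <number> = ?".
    // Operands are limited to 4 digits so the result always fits in an int.
    private static final Pattern QUESTION =
            Pattern.compile("\\b(\\d{1,4})\\s*([+\\-*/])\\s*(\\d{1,4})\\s*=\\s*\\?");

    /** Finds the first arithmetic question in the page text and evaluates it. */
    static OptionalInt solve(String pageText) {
        Matcher m = QUESTION.matcher(pageText);
        if (!m.find()) {
            return OptionalInt.empty();
        }
        int a = Integer.parseInt(m.group(1));
        int b = Integer.parseInt(m.group(3));
        switch (m.group(2).charAt(0)) {
            case '+': return OptionalInt.of(a + b);
            case '-': return OptionalInt.of(a - b);
            case '*': return OptionalInt.of(a * b);
            case '/': return b == 0 ? OptionalInt.empty() : OptionalInt.of(a / b);
            default:  return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // Hypothetical page fragment containing the question.
        System.out.println(solve("<label>12 + 7 = ?</label>"));
    }
}
```

Integer division is used for "/", which is itself an assumption about how such a site would generate its questions.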
The second kind is the picture verification code, currently the more mainstream one:

This type makes recognition harder by sticking the characters together. The simple variant shown above is generally used by small websites.

Processing this type of verification code starts with image preprocessing. How do we remove the background interference? Notice that each digit or letter in the verification code is drawn in a single color. So divide the image into 5 vertical strips and compute the color distribution of each strip. Apart from white, the most frequent color in a strip is the color of its character, which makes the background easy to remove. Code:

    import java.awt.Color;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;
    import javax.imageio.ImageIO;

    // isWhite(int) is a helper that returns 1 for pixels treated as white
    // (a version of it appears in the full listing further below).
    public static BufferedImage removeBackgroud(String picFile) throws Exception {
        BufferedImage img = ImageIO.read(new File(picFile));
        // Drop the 1-pixel border around the image.
        img = img.getSubimage(1, 1, img.getWidth() - 2, img.getHeight() - 2);
        int width = img.getWidth();
        int height = img.getHeight();
        double subWidth = (double) width / 5.0;
        for (int i = 0; i < 5; i++) {
            // Count how often each non-white color occurs in strip i.
            Map<Integer, Integer> map = new HashMap<Integer, Integer>();
            for (int x = (int) (1 + i * subWidth); x < (i + 1) * subWidth && x < width - 1; ++x) {
                for (int y = 0; y < height; ++y) {
                    int rgb = img.getRGB(x, y);
                    if (isWhite(rgb) == 1)
                        continue;
                    map.merge(rgb, 1, Integer::sum);
                }
            }
            // The most frequent non-white color is taken as the character color.
            int max = 0;
            int colorMax = 0;
            for (Integer color : map.keySet()) {
                if (max < map.get(color)) {
                    max = map.get(color);
                    colorMax = color;
                }
            }
            // Binarize the strip: character color becomes black, everything else white.
            for (int x = (int) (1 + i * subWidth); x < (i + 1) * subWidth && x < width - 1; ++x) {
                for (int y = 0; y < height; ++y) {
                    if (img.getRGB(x, y) != colorMax) {
                        img.setRGB(x, y, Color.WHITE.getRGB());
                    } else {
                        img.setRGB(x, y, Color.BLACK.getRGB());
                    }
                }
            }
        }
        return img;
    }
This produces the following image:
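The next step is to scan the image column by column (vertically) and cut it into characters, then scan each piece row by row (horizontally) to trim it. After that, build the training set. Finally, because the characters have a fixed size, recognition works the same way as in "verification code recognition--1": a pixel-by-pixel comparison is sufficient. Source code:

    import java.awt.Color;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import javax.imageio.ImageIO;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.apache.commons.io.IOUtils;

    public class ImagePreProcess2 {

        // Lazily loaded training samples: character image -> character.
        private static Map<BufferedImage, String> trainMap = null;
        // Running counter used to give training files unique names.
        private static int index = 0;

        // Returns 1 if the pixel is dark (R + G + B <= 100), otherwise 0.
        public static int isBlack(int colorInt) {
            Color color = new Color(colorInt);
            if (color.getRed() + color.getGreen() + color.getBlue() <= 100) {
                return 1;
            }
            return 0;
        }

        // Returns 1 if the pixel is not dark (R + G + B > 100), otherwise 0.
        public static int isWhite(int colorInt) {
            Color color = new Color(colorInt);
            if (color.getRed() + color.getGreen() + color.getBlue() > 100) {
                return 1;
            }
            return 0;
        }
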
        // This captcha needs no background removal, so the image is only loaded.
        public static BufferedImage removeBackgroud(String picFile) throws Exception {
            BufferedImage img = ImageIO.read(new File(picFile));
            return img;
        }

        // Trims the rows above the first and below the last row that contains
        // a pixel classified as white by isWhite.
        public static BufferedImage removeBlank(BufferedImage img) throws Exception {
            int width = img.getWidth();
            int height = img.getHeight();
            int start = 0;
            int end = 0;
            // Scan from the top for the first row with a white pixel.
            findTop:
            for (int y = 0; y < height; ++y) {
                for (int x = 0; x < width; ++x) {
                    if (isWhite(img.getRGB(x, y)) == 1) {
                        start = y;
                        break findTop;
                    }
                }
            }
            // Scan from the bottom for the last row with a white pixel.
            findBottom:
            for (int y = height - 1; y >= 0; --y) {
                for (int x = 0; x < width; ++x) {
                    if (isWhite(img.getRGB(x, y)) == 1) {
                        end = y;
                        break findBottom;
                    }
                }
            }
            return img.getSubimage(0, start, width, end - start + 1);
        }

        // Cuts the image into characters using a per-column count of white pixels.
        public static List<BufferedImage> splitImage(BufferedImage img) throws Exception {
            List<BufferedImage> subImgs = new ArrayList<BufferedImage>();
            int width = img.getWidth();
            int height = img.getHeight();
            // weightlist.get(x) = number of white pixels in column x.
            List<Integer> weightlist = new ArrayList<Integer>();
            for (int x = 0; x < width; ++x) {
                int count = 0;
                for (int y = 0; y < height; ++y) {
                    if (isWhite(img.getRGB(x, y)) == 1) {
                        count++;
                    }
                }
                weightlist.add(count);
            }
            int i = 0;
            while (i < weightlist.size()) {
                // Measure a run of consecutive columns with more than one white pixel.
                // The bounds check avoids running past the last column.
                int length = 0;
                while (i < weightlist.size() && weightlist.get(i) > 1) {
                    i++;
                    length++;
                }
                if (length > 12) {
                    // Too wide for one character: treat it as two stuck together
                    // and cut the run in half.
                    subImgs.add(removeBlank(img.getSubimage(i - length, 0, length / 2, height)));
                    subImgs.add(removeBlank(img.getSubimage(i - length / 2, 0, length / 2, height)));
                } else if (length > 3) {
                    subImgs.add(removeBlank(img.getSubimage(i - length, 0, length, height)));
                }
                // Runs of 3 columns or fewer are discarded as noise.
                i++; // skip the column that ended the run
            }
            return subImgs;
        }

        // Loads the training samples from train2/ once and caches them.
        // The first character of each file name is the label of that sample.
        public static Map<BufferedImage, String> loadTrainData() throws Exception {
            if (trainMap == null) {
                Map<BufferedImage, String> map = new HashMap<BufferedImage, String>();
                File dir = new File("train2");
                File[] files = dir.listFiles();
                if (files == null) {
                    // listFiles() returns null when the directory does not exist.
                    throw new IllegalStateException("training directory train2 not found");
                }
                for (File file : files) {
                    map.put(ImageIO.read(file), file.getName().charAt(0) + "");
                }
                trainMap = map;
            }
            return trainMap;
        }

        // Recognizes one character by comparing it pixel by pixel with every
        // training sample and returning the label of the closest one.
        public static String getSingleCharOcr(BufferedImage img, Map<BufferedImage, String> map) {
            String result = "";
            int width = img.getWidth();
            int height = img.getHeight();
            // Smallest mismatch count so far; width * height is the upper bound.
            int min = width * height;
            for (BufferedImage bi : map.keySet()) {
                int count = 0;
                // Only the area covered by both images is compared.
                int widthmin = Math.min(width, bi.getWidth());
                int heightmin = Math.min(height, bi.getHeight());
                compare:
                for (int x = 0; x < widthmin; ++x) {
                    for (int y = 0; y < heightmin; ++y) {
                        if (isWhite(img.getRGB(x, y)) != isWhite(bi.getRGB(x, y))) {
                            count++;
                            // Already no better than the best match: stop early.
                            if (count >= min)
                                break compare;
                        }
                    }
                }
                if (count < min) {
                    min = count;
                    result = map.get(bi);
                }
            }
            return result;
        }

        // Recognizes a whole verification code: load, split, match each character.
        public static String getAllOcr(String file) throws Exception {
            BufferedImage img = removeBackgroud(file);
            List<BufferedImage> listImg = splitImage(img);
            Map<BufferedImage, String> map = loadTrainData();
            String result = "";
            for (BufferedImage bi : listImg) {
                result += getSingleCharOcr(bi, map);
            }
            // Save a copy named after the recognized text for manual checking.
            ImageIO.write(img, "JPG", new File("result2//" + result + ".jpg"));
            return result;
        }

        // Downloads 30 sample images into img2/
        // (uses Apache Commons HttpClient 3.x and Commons IO).
        public static void downloadImage() {
            HttpClient httpClient = new HttpClient();
            for (int i = 0; i < 30; i++) {
                GetMethod getMethod = new GetMethod("http://www.pkland.net/img.php?key=" + (2000 + i));
                try {
                    int statusCode = httpClient.executeMethod(getMethod);
                    if (statusCode != HttpStatus.SC_OK) {
                        System.err.println("Method failed: " + getMethod.getStatusLine());
                        continue; // do not save an error page as an image
                    }
                    String picName = "img2//" + i + ".jpg";
                    // try-with-resources closes both streams even if the copy fails.
                    try (InputStream inputStream = getMethod.getResponseBodyAsStream();
                         OutputStream outStream = new FileOutputStream(picName)) {
                        IOUtils.copy(inputStream, outStream);
                    }
                    System.out.println(i + "OK!");
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    getMethod.releaseConnection();
                }
            }
        }

        // Builds the training set from hand-labeled images in temp/.
        // The file name is the answer: its j-th character labels the j-th piece.
        public static void trainData() throws Exception {
            File dir = new File("temp");
            File[] files = dir.listFiles();
            for (File file : files) {
                BufferedImage img = removeBackgroud("temp//" + file.getName());
                List<BufferedImage> listImg = splitImage(img);
                // Keep only images that split cleanly into 4 characters.
                if (listImg.size() == 4) {
                    for (int j = 0; j < listImg.size(); ++j) {
                        ImageIO.write(listImg.get(j), "JPG", new File("train2//"
                                + file.getName().charAt(j) + "-" + (index++) + ".jpg"));
                    }
                }
            }
        }

        public static void main(String[] args) throws Exception {
            // Recognize the 30 downloaded images and print the results.
            for (int i = 0; i < 30; ++i) {
                String text = getAllOcr("img2//" + i + ".jpg");
                System.out.println(i + ".jpg = " + text);
            }
        }
    }
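The cutting step is easier to follow on a tiny example than on a real image. The sketch below is a simplified stand-in for the column scan described above, not a copy of the listing: it works on a hand-written grid instead of a BufferedImage, treats any column with at least one ink pixel as part of a character, and leaves out the width thresholds. The names `ProjectionSplitDemo`, `splitColumns`, and `parse` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ProjectionSplitDemo {
    /**
     * Returns one {startColumn, endColumnExclusive} pair for every run of
     * consecutive columns that contain at least one ink pixel.
     */
    static List<int[]> splitColumns(boolean[][] ink) {
        List<int[]> runs = new ArrayList<>();
        int height = ink.length;
        int width = height == 0 ? 0 : ink[0].length;
        int runStart = -1; // -1 means "not inside a character"
        // x == width acts as a blank sentinel column that closes a trailing run.
        for (int x = 0; x <= width; x++) {
            boolean hasInk = false;
            if (x < width) {
                for (int y = 0; y < height; y++) {
                    if (ink[y][x]) {
                        hasInk = true;
                        break;
                    }
                }
            }
            if (hasInk && runStart < 0) {
                runStart = x;                       // a character begins
            } else if (!hasInk && runStart >= 0) {
                runs.add(new int[] {runStart, x});  // a character ends
                runStart = -1;
            }
        }
        return runs;
    }

    /** Builds a grid from text rows; '#' marks an ink pixel. */
    static boolean[][] parse(String... rows) {
        boolean[][] grid = new boolean[rows.length][rows[0].length()];
        for (int y = 0; y < rows.length; y++) {
            for (int x = 0; x < rows[y].length(); x++) {
                grid[y][x] = rows[y].charAt(x) == '#';
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        for (int[] run : splitColumns(parse("##..#.", "#...##"))) {
            System.out.println(run[0] + ".." + run[1]);
        }
    }
}
```

Each run found this way would then be trimmed top and bottom and compared against the training samples, as the listing does.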
The verification codes of giants like BAT use interference lines, a mixture of bold and regular characters, common Chinese characters (there are about 5,000 commonly used Chinese characters, with complex strokes and many look-alikes, which makes them much harder than the 26 letters), a mixture of fonts such as Kaiti, Songti, and Youyuan, pinyin, and distorted glyphs. Having to recognize 13 Chinese characters accurately greatly increases the probability of failure.

Of course, in addition to the mainstream image verification codes, some websites offer voice verification codes for users with poor eyesight. Usually a machine-generated voice reads out a number. However, many programmers are lazy here: they record the 10 digits in advance and simply splice the recordings together at random when generating a code. The result looks like this:

The design principles are as follows:
• Overall effect
  • The number of characters is random within a certain range
  • The font size is random within a certain range
  • Wave distortion (random angle within a certain range)
• Anti-recognition
  • Don't over-rely on anti-recognition techniques
  • Don't use too large a character set: it makes for a poor user experience
• Anti-segmentation
  • Overlapping, touching characters work better than interference lines
• Backup plan
  • A completely different set of verification codes of the same strength

Now that the principles are known, cracking these codes becomes easy. The problem is that this time the 12306 verification code is made of actual pictures, so none of the methods above can be used. Does that mean it cannot be cracked? Some people think that the image library of the 12306 website is not very large and could be scraped in full and then cracked. Of course, for now this is just talk.
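To make the weakness of the spliced voice codes mentioned above concrete, here is a minimal sketch. It rests on strong assumptions that the article does not confirm: the audio is taken to be the raw, byte-for-byte concatenation of ten fixed recordings of equal length, with no file header or re-encoding. Real audio would need a tolerant comparison instead of exact equality. The names `SplicedAudioDemo` and `decode` are hypothetical.

```java
import java.util.Arrays;

class SplicedAudioDemo {
    /**
     * Decodes audio that is assumed to be an exact concatenation of the
     * given digit recordings (digitClips[d] is the recording of digit d).
     * Chunks that match no recording are reported as '?'.
     */
    static String decode(byte[] audio, byte[][] digitClips) {
        int clipLength = digitClips[0].length; // assumption: all clips have the same length
        if (audio.length % clipLength != 0) {
            throw new IllegalArgumentException("audio is not a whole number of clips");
        }
        StringBuilder digits = new StringBuilder();
        for (int offset = 0; offset < audio.length; offset += clipLength) {
            byte[] chunk = Arrays.copyOfRange(audio, offset, offset + clipLength);
            int match = -1;
            for (int d = 0; d < digitClips.length; d++) {
                if (Arrays.equals(chunk, digitClips[d])) {
                    match = d;
                    break;
                }
            }
            digits.append(match < 0 ? '?' : (char) ('0' + match));
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        // Stand-in "recordings": three bytes per digit, for illustration only.
        byte[][] clips = new byte[10][];
        for (int d = 0; d < 10; d++) {
            clips[d] = new byte[] {(byte) d, (byte) (d + 1), (byte) (d + 2)};
        }
        byte[] audio = {4, 5, 6, 0, 1, 2, 7, 8, 9};
        System.out.println(decode(audio, clips));
    }
}
```

The point of the sketch is only that ten fixed recordings give an attacker ten templates to match against, which is why the design principles call for more variation.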
There is one method that is both very advanced and very primitive, called "network coding" or "human coding". Some technical experts forward the verification code to "coding" software they have written themselves, where "coding workers" look at the image and type in the answer by hand. The answer is then sent back to the automatic registration machine to complete the verification. For now, this simple and crude method is enough to cope with the current situation.

Conclusion: 12306 has pulled out a killer move this time, wiping out all ticket-grabbing software in one stroke. The scalpers may be unhappy, but the rest of us can still buy tickets. This not only deals with the scalper problem, it also sets programmers a hard problem to solve.