Everyone knows the background: it's Chinese New Year, and red envelopes are flying everywhere. I learned Python just two days ago and was quite excited, so I studied how to grab Weibo red envelopes. Why Weibo red envelopes instead of Alipay red envelopes? Because I only know the Web; if I have energy left over, I may study the whack-a-mole algorithm later. Since I'm a Python beginner and this is only the third program I've written since learning the language, please don't point out the bad parts of the code. The key is the idea. Well, if there are bad parts in the idea, please don't point those out either. Look, IE has the nerve to set itself as the default browser, so surely I can get away with showing off a crappy article. I use Python 2.7; I hear there are big differences between Python 2 and Python 3, so friends who know even less than me should take note.

0x01 Thoughts

I'm too lazy to describe the idea in words, so I drew a sketch that I think you can understand. First, as usual, import a bunch of libraries that I don't really understand but can't do without:

```python
import re
import urllib
import urllib2
import cookielib
import base64
import binascii
import os
import json
import sys
import cPickle as p
import rsa
```
Then declare a few variables that will be used later:

```python
reload(sys)
sys.setdefaultencoding('utf-8')  # set the default character encoding to utf-8
luckyList = []  # the red envelope list
lowest = 10     # the lowest recorded payout we are willing to tolerate
```
The rsa library is used here. Python does not ship with it, so you need to install it from https://pypi.python.org/pypi/rsa/ — after downloading, run setup.py install, and then we can start development.

0x02 Weibo login

Grabbing a red envelope is only possible after logging in, so there must be a login function. Logging in itself is not the key part; the key is preserving cookies, which requires the cooperation of cookielib:

```python
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
```
This way, every request made through opener carries and updates the cookie state. I don't fully understand it, but it feels magical. Next we wrap two helper modules: one to simply GET data, the other to POST data. They differ by only a few parameters and could be merged into one function, but I'm lazy and stupid, and I don't want to and can't change the code.

```python
def getData(url):
    try:
        req = urllib2.Request(url)
        result = opener.open(req)
        text = result.read()
        text = text.decode("utf-8").encode("gbk", 'ignore')
        return text
    except Exception, e:
        print u'Request exception, url:' + url
        print e

def postData(url, data, header):
    try:
        data = urllib.urlencode(data)
        req = urllib2.Request(url, data, header)
        result = opener.open(req)
        text = result.read()
        return text
    except Exception, e:
        print u'Request exception, url:' + url
```
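A quick sanity check that the jar really is shared across calls (the URL here is a placeholder, not part of the project):

```python
# Any cookies set by the first response are stored in cj and replayed
# automatically on the second request, because both go through opener.
getData('http://weibo.com/')   # placeholder URL
getData('http://weibo.com/')   # same jar, so stored cookies are sent back
print len(cj)                  # how many cookies the jar currently holds
```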
With these two helpers we can GET and POST data. getData decodes and then re-encodes the text because the output was always garbled when I debugged under Win7, hence the encoding juggling. None of that is the point, though; the login function below is the core of the Weibo login.

```python
def login(nick, pwd):
    print u"----------Logging in----------"
    print "----------......----------"
    prelogin_url = 'http://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=%s&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.15)&_=1400822309846' % nick
    preLogin = getData(prelogin_url)
    servertime = re.findall('"servertime":(.+?),', preLogin)[0]
    pubkey = re.findall('"pubkey":"(.+?)",', preLogin)[0]
    rsakv = re.findall('"rsakv":"(.+?)",', preLogin)[0]
    nonce = re.findall('"nonce":"(.+?)",', preLogin)[0]
    su = base64.b64encode(urllib.quote(nick))
    rsaPublickey = int(pubkey, 16)
    key = rsa.PublicKey(rsaPublickey, 65537)
    message = str(servertime) + '\t' + str(nonce) + '\n' + str(pwd)
    sp = binascii.b2a_hex(rsa.encrypt(message, key))
    header = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)'}
    param = {
        'entry': 'weibo',
        'gateway': '1',
        'from': '',
        'savestate': '7',
        'userticket': '1',
        'ssosimplelogin': '1',
        'vsnf': '1',
        'vsnval': '',
        'su': su,
        'service': 'miniblog',
        'servertime': servertime,
        'nonce': nonce,
        'pwencode': 'rsa2',
        'sp': sp,
        'encoding': 'UTF-8',
        'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
        'returntype': 'META',
        'rsakv': rsakv,
    }
    s = postData('http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.15)', param, header)

    try:
        urll = re.findall("location.replace\(\'(.+?)\'\);", s)[0]
        login = getData(urll)
        print u"---------Login successful!-------"
        print "----------......----------"
    except Exception, e:
        print u"---------Login failed!-------"
        print "----------......----------"
        exit(0)
```
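The only genuinely opaque step above is the password encryption, so here it is in isolation. This is just a sketch: a freshly generated throwaway key stands in for Sina's real public key, and the servertime/nonce/password values are made up.

```python
import binascii
import rsa

servertime = '1423000000'   # dummy; really comes from the prelogin response
nonce = 'ABCDEF'            # dummy; really comes from the prelogin response
pwd = 'my_password'         # dummy password

# The real code builds the key as rsa.PublicKey(int(pubkey, 16), 65537);
# a throwaway keypair is used here so the sketch actually runs.
key, _priv = rsa.newkeys(1024)
message = str(servertime) + '\t' + str(nonce) + '\n' + str(pwd)  # the exact layout Sina expects
sp = binascii.b2a_hex(rsa.encrypt(message, key))  # hex ciphertext -> the 'sp' POST field
print sp
```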
The parameters and the encryption algorithm here are all copied from the Internet, and I don't really understand them. Roughly: first request a timestamp and public key, encrypt the password material with RSA, then submit everything to the Sina login interface. On success, Sina returns a Weibo URL that must be requested once more for the login state to fully take effect; after that, every subsequent request carries the current user's cookie.

0x03 Grabbing a single red envelope

Having logged into Weibo, I couldn't wait to find a red envelope to try — in the browser first, of course. After much clicking around I finally landed on a page with a grab button, pressed F12 to summon the debugger, and watched how the request was made. The requested address is http://huodong.weibo.com/aj_hongbao/getlucky, with two main parameters: ouid, the red envelope ID (visible in the URL), and share, which controls whether the result is shared to Weibo. There is also a _t, whose purpose I don't know. So in theory, submitting those three parameters to that URL completes the draw. In practice, however, the server magically replies with a string like this:

```
{"code": 303403, "msg": "Sorry, you do not have permission to access this page", "data": []}
```
Don't panic. Based on my many years of web development experience, the other side's programmer is probably checking the Referer. The fix is simple: copy all the headers from the browser's request.

```python
def getLucky(id):  # draw a red envelope
    print u"---Drawing red envelope:" + str(id) + "---"
    print "----------......----------"

    if checkValue(id) == False:  # doesn't meet the conditions (function defined below)
        return
    luckyUrl = "http://huodong.weibo.com/aj_hongbao/getlucky"
    param = {
        'ouid': id,
        'share': 0,
        '_t': 0
    }

    header = {
        'Cache-Control': 'no-cache',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Origin': 'http://huodong.weibo.com',
        'Pragma': 'no-cache',
        'Referer': 'http://huodong.weibo.com/hongbao/' + str(id),
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 BIDUBrowser/6.x Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest'
    }
    res = postData(luckyUrl, param, header)
```
In theory there is no problem, and in fact there was none. Once the draw completes we need to check the status: the returned res is a JSON string in which code 100000 means success and 901114 means today's draw limit has been reached; any other value is also a failure. So:

```python
    # (continuing inside getLucky, after res = postData(...))
    hbRes = json.loads(res)
    if str(hbRes["code"]) == '901114':  # today's draws are used up (the code may come back as a number, hence str())
        print u"---------The upper limit has been reached---------"
        print "----------......----------"
        log('lucky', str(id) + '---' + str(hbRes["code"]) + '---' + hbRes["data"]["title"])
        exit(0)
    elif str(hbRes["code"]) == '100000':  # success
        print u"---------Congratulations on your prosperity---------"
        print "----------......----------"
        log('success', str(id) + '---' + res)
        exit(0)

    if hbRes["data"] and hbRes["data"]["title"]:
        print hbRes["data"]["title"]
        print "----------......----------"
        log('lucky', str(id) + '---' + str(hbRes["code"]) + '---' + hbRes["data"]["title"])
    else:
        print u"---------Request error---------"
        print "----------......----------"
        log('lucky', str(id) + '---' + res)
```
Here, log is a little helper I wrote to record logs:

```python
def log(type, text):
    fp = open(type + '.txt', 'a')  # append to e.g. lucky.txt / success.txt
    fp.write(text)
    fp.write('\r\n')
    fp.close()
```
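For example (the ID and title below are made up), this appends one CRLF-terminated line to lucky.txt, creating the file on first use:

```python
log('lucky', '1234567890---901114---some red envelope title')  # invented values
```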
0x04 Crawling the red envelope list

With the single-envelope grab tested successfully, we reach the core module of the program: crawling the red envelope list. There must be many ways in, such as searching Weibo for various keywords, but I use the simplest one here: crawling the lists on the red envelope activity homepage (http://huodong.weibo.com/hongbao), from which everything is reachable by clicking around. Although the lists contain many links, they boil down to two kinds (ignoring the richest-envelope list): theme pages and ranking pages. Summon F12 again and analyze the two page formats. First the theme lists, for example http://huodong.weibo.com/hongbao/special_quyu. The red envelope information all sits in divs with the class name info_wrap, so we just fetch the page source, grab every info_wrap, and do a little processing to get that page's red envelope list. This calls for some regular expressions:

```python
def getThemeList(url, p):  # theme red envelopes
    print u"---------Page " + str(p) + "---------"
    print "----------......----------"
    html = getData(url + '?p=' + str(p))
    # NOTE: the HTML literals inside these two patterns were mangled in
    # transcription; the tag structure below is a best guess built around
    # the info_wrap class and the four capture groups the code relies on.
    pWrap = re.compile(r'<div class="info_wrap">(.+?)</div>', re.DOTALL)  # every info_wrap div
    pInfo = re.compile(r'.+?<span.+?>(.+?)</span>.+?<span.+?>(.+?)</span>.+?<span.+?>(.+?)</span>.+?href="(.+?)" class="btn"', re.DOTALL)  # cash, gift value, sent count, link
    List = pWrap.findall(html)
    n = len(List)
    if n == 0:
        return
    for i in range(n):  # walk every info_wrap div
        s = pInfo.match(List[i])  # extract the red envelope info
        info = list(s.groups(0))
        info[0] = float(info[0].replace('\xcd\xf2', '0000'))  # cash: GBK 万 (10,000) -> '0000'
        try:
            info[1] = float(info[1].replace('\xcd\xf2', '0000'))  # gift value: 万 -> '0000'
        except Exception, e:
            info[1] = float(info[1].replace('\xd2\xda', '00000000'))  # gift value: GBK 亿 (100 million) -> '00000000'
        info[2] = float(info[2].replace('\xcd\xf2', '0000'))  # number sent: 万 -> '0000'
        if info[2] == 0:
            info[2] = 1  # prevent division by zero
        if info[1] == 0:
            info[1] = 1  # prevent division by zero
        info.append(info[0] / (info[2] + info[1]))  # envelope value = cash / (recipients + gift value)
        luckyList.append(info)
    if 'class="page"' in html:  # a next page exists
        p = p + 1
        getThemeList(url, p)  # recurse to crawl the next page
```
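Those odd byte strings are just GBK-encoded Chinese units (remember getData re-encodes pages to GBK). The conversion trick is easier to see in isolation — and note that it only works cleanly for whole numbers:

```python
print float('3\xcd\xf2'.replace('\xcd\xf2', '0000'))      # '3万'   -> '30000'     -> 30000.0
print float('2\xd2\xda'.replace('\xd2\xda', '00000000'))  # '2亿'   -> '200000000' -> 200000000.0
print float('1.5\xcd\xf2'.replace('\xcd\xf2', '0000'))    # '1.5万' -> '1.50000'   -> 1.5 (not 15000!)
```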
Regular expressions are hard; it took me a long time to learn enough to write those two patterns. Note the extra info[4] appended to each entry: it's a rough measure I invented for how valuable a red envelope is. Why bother? Because there are far more envelopes than the four draws we get, so in this vast sea of red envelopes we must find the most valuable ones first. Three numbers are available for reference: cash amount, gift value, and number of recipients. Clearly, if the cash is small but the recipients are many, or the nominal gift value is absurdly high (some run to the billions), it isn't worth grabbing. So after much head-scratching I arrived at a weighting: envelope value = cash / (number of recipients + gift value). A worked example follows the next code block. The ranking pages work on the same principle: find the key tags and match them with regular expressions.

```python
def getTopList(url, daily, p):  # ranking-list red envelopes
    print u"---------Page " + str(p) + "---------"
    print "----------......----------"
    html = getData(url + '?daily=' + str(daily) + '&p=' + str(p))
    # Same caveat as getThemeList: the HTML literals in these patterns were
    # mangled in transcription and are reconstructed around the list_info class.
    pWrap = re.compile(r'<div class="list_info">(.+?)</div>', re.DOTALL)  # every list_info div
    pInfo = re.compile(r'.+?<span.+?>(.+?)</span>.+?<span.+?>(.+?)</span>.+?<span.+?>(.+?)</span>.+?href="(.+?)" class="btn rob_btn"', re.DOTALL)  # count, cash, gift value, link
    List = pWrap.findall(html)
    n = len(List)
    if n == 0:
        return
    for i in range(n):  # walk every list_info div
        s = pInfo.match(List[i])  # extract the red envelope info
        topinfo = list(s.groups(0))
        info = list(topinfo)
        info[0] = topinfo[1].replace('\xd4\xaa', '')  # strip GBK 元 (yuan)
        info[0] = float(info[0].replace('\xcd\xf2', '0000'))  # cash: 万 -> '0000'
        info[1] = topinfo[2].replace('\xd4\xaa', '')  # strip 元
        try:
            info[1] = float(info[1].replace('\xcd\xf2', '0000'))  # gift value: 万 -> '0000'
        except Exception, e:
            info[1] = float(info[1].replace('\xd2\xda', '00000000'))  # gift value: 亿 -> '00000000'
        info[2] = topinfo[0].replace('\xb8\xf6', '')  # strip GBK 个 (counter word)
        info[2] = float(info[2].replace('\xcd\xf2', '0000'))  # number sent: 万 -> '0000'
        if info[2] == 0:
            info[2] = 1  # prevent division by zero
        if info[1] == 0:
            info[1] = 1  # prevent division by zero
        info.append(info[0] / (info[2] + info[1]))  # envelope value = cash / (recipients + gift value)
        luckyList.append(info)
    if 'class="page"' in html:  # a next page exists
        p = p + 1
        getTopList(url, daily, p)  # recurse to crawl the next page
```
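To make the scoring concrete, here it is applied to two made-up envelopes; all numbers are invented for illustration:

```python
# envelope value = cash / (recipients + gift value); bigger is better
candidates = [
    [10000.0, 1.0, 50.0],          # 10k cash, no real gifts, few recipients
    [500.0, 100000000.0, 9000.0],  # little cash, "gift value" inflated into the hundreds of millions
]
for cash, gift, sent in candidates:
    print cash / (sent + gift)
# ~196.1 for the first, ~0.000005 for the second: the inflated one is worthless
```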
OK, now we can crawl both kinds of list pages. Next we need the list of lists, that is, the collection of all those list addresses, and then crawl them one by one:

```python
def getList():
    print u"---------Search target---------"
    print "----------......----------"

    themeUrl = {  # theme lists
        'theme': 'http://huodong.weibo.com/hongbao/theme',
        'pinpai': 'http://huodong.weibo.com/hongbao/special_pinpai',
        'daka': 'http://huodong.weibo.com/hongbao/special_daka',
        'youxuan': 'http://huodong.weibo.com/hongbao/special_youxuan',
        'qiye': 'http://huodong.weibo.com/hongbao/special_qiye',
        'quyu': 'http://huodong.weibo.com/hongbao/special_quyu',
        'meiti': 'http://huodong.weibo.com/hongbao/special_meiti',
        'hezuo': 'http://huodong.weibo.com/hongbao/special_hezuo'
    }

    topUrl = {  # ranking lists
        'mostmoney': 'http://huodong.weibo.com/hongbao/top_mostmoney',
        'mostsend': 'http://huodong.weibo.com/hongbao/top_mostsend',
        'mostsenddaka': 'http://huodong.weibo.com/hongbao/top_mostsenddaka',
        'mostsendpartner': 'http://huodong.weibo.com/hongbao/top_mostsendpartner',
        'cate': 'http://huodong.weibo.com/hongbao/cate?type=',
        'clothes': 'http://huodong.weibo.com/hongbao/cate?type=clothes',
        'beauty': 'http://huodong.weibo.com/hongbao/cate?type=beauty',
        'fast': 'http://huodong.weibo.com/hongbao/cate?type=fast',
        'life': 'http://huodong.weibo.com/hongbao/cate?type=life',
        'digital': 'http://huodong.weibo.com/hongbao/cate?type=digital',
        'other': 'http://huodong.weibo.com/hongbao/cate?type=other'
    }

    for (theme, url) in themeUrl.items():
        print "----------" + theme + "----------"
        print url
        print "----------......----------"
        getThemeList(url, 1)

    for (top, url) in topUrl.items():
        print "----------" + top + "----------"
        print url
        print "----------......----------"
        getTopList(url, 0, 1)
        getTopList(url, 1, 1)
```
0x05 Determining whether a red envelope is worth grabbing

This part is relatively simple. First search the page source for the keyword that marks a grab button, then look at the receiving ranking to see the highest payout on record. If the biggest amount anyone has received is only a few yuan, then bye-bye... The record for a red envelope can be viewed at http://huodong.weibo.com/aj_hongbao/detailmore?page=1&type=2&_t=0&__rnd=1423744829265&uid=<red envelope id>:

```python
def checkValue(id):
    infoUrl = 'http://huodong.weibo.com/hongbao/' + str(id)
    html = getData(infoUrl)

    if 'action-type="lottery"' in html or True:  # the grab button exists ("or True" bypasses the check; left in from testing)
        logUrl = "http://huodong.weibo.com/aj_hongbao/detailmore?page=1&type=2&_t=0&__rnd=1423744829265&uid=" + id  # ranking data
        param = {}
        header = {
            'Cache-Control': 'no-cache',
            'Content-Type': 'application/x-www-form-urlencoded',
            'Pragma': 'no-cache',
            'Referer': 'http://huodong.weibo.com/hongbao/detail?uid=' + str(id),
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 BIDUBrowser/6.x Safari/537.36',
            'X-Requested-With': 'XMLHttpRequest'
        }
        res = postData(logUrl, param, header)
        pMoney = re.compile(r'<span class="money">(\d+?.+?)\xd4\xaa</span>', re.DOTALL)  # payout amounts (GBK 元)
        luckyLog = pMoney.findall(res)  # search the ranking response, not the envelope page

        if len(luckyLog) == 0:
            maxMoney = 0
        else:
            maxMoney = float(luckyLog[0])

        if maxMoney < lowest:  # the biggest payout on record is below our threshold
            return False
    else:
        print u"---------One step slower---------"
        print "----------......----------"
        return False
    return True
```
0x06 Finishing work

The main modules are done; now we string all the steps together:

```python
def start(username, password, low, fromfile):
    global lowest  # without this, the assignment below would only create a local
    gl = False
    lowest = low
    login(username, password)
    if fromfile == 'y':
        if os.path.exists('luckyList.txt'):
            try:
                f = file('luckyList.txt')
                newList = []
                newList = p.load(f)
                print u'---------Loading list---------'
                print "----------......----------"
            except Exception, e:
                print u'Parsing the local list failed, crawling the online pages.'
                print "----------......----------"
                gl = True
        else:
            print u'luckyList.txt does not exist locally, crawling the online pages.'
            print "----------......----------"
            gl = True
    else:
        gl = True  # user chose not to use the cached list, so crawl fresh
    if gl == True:
        getList()
        from operator import itemgetter
        newList = sorted(luckyList, key=itemgetter(4), reverse=True)  # most valuable first
        f = file('luckyList.txt', 'w')
        p.dump(newList, f)  # save the crawled list so we don't have to crawl again next time
        f.close()

    for lucky in newList:
        if not 'http://huodong.weibo.com' in lucky[3]:  # not a red envelope link
            continue
        print lucky[3]
        id = re.findall(r'(\w*[0-9]+)\w*', lucky[3])  # pull the numeric envelope ID out of the URL
        getLucky(id[0])
```
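One thing worth spelling out: each luckyList entry is a flat list, which is why the code above sorts on index 4 and reads the URL from index 3. A made-up example entry:

```python
# [cash, gift_value, sent_count, grab_url, score]  -- values invented for illustration
entry = [10000.0, 1.0, 50.0, 'http://huodong.weibo.com/hongbao/1234567890', 196.08]
```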
Because re-crawling the whole red envelope list on every test run is a pain, I added code to dump the finished list to a file, so later runs can read the local list and go straight to grabbing. With start() built, all that's left is an entry point that feeds it a Weibo account:

```python
if __name__ == "__main__":
    print u"------------------Weibo Red Packet Assistant------------------"
    print "---------------------v0.0.1---------------------"
    print u"-------------by @All-powerful Soul Master----------------"
    print "-------------------------------------------------"

    try:
        uname = raw_input(u"Please enter your Weibo account: ".decode('utf-8').encode('gbk'))
        pwd = raw_input(u"Please enter your Weibo password: ".decode('utf-8').encode('gbk'))
        low = int(raw_input(u"Participate when the maximum cash received by the red envelope is greater than n: ".decode('utf-8').encode('gbk')))
        fromfile = raw_input(u"Do you want to use the red envelope list in luckyList.txt? (y/n) ".decode('utf-8').encode('gbk'))
    except Exception, e:
        print u"Parameter error"
        print "----------......----------"
        print e
        exit(0)

    print u"---------Program starts---------"
    print "----------......----------"
    start(uname, pwd, low, fromfile)
    print u"---------Program ends---------"
    print "----------......----------"
    os.system('pause')
```
0x07 Go away!

The basic crawler skeleton is complete. There is still plenty of room for improvement in the details: supporting batch login, refining the red envelope value algorithm, and no doubt many spots in the code itself could be optimized, but this is as far as my ability takes me. You've all seen the result: I wrote hundreds of lines of code and thousands of words, and all I got in return was a couple of Shuangseqiu (double-color ball) lottery tickets. What a rip-off! How could it be lottery tickets? (Aside: the author grew more and more agitated as he spoke and actually started crying. People nearby tried to console him: "Brother, it's not that serious. It's just a Weibo red envelope. I shook my phone until my hands hurt yesterday and didn't get a single WeChat red envelope.") Sigh. Actually, that's not why I'm crying. I'm sad because I'm already in my twenties and still doing something as pointless as writing a program to grab Weibo red envelopes. This is not the life I want at all!

Source code download: http://download..com/data/1984536