I have been using WeChat for several years and have accumulated quite a few WeChat friends, but do I really know them? Which city has the most of my friends? What is the ratio of male to female friends? What do my friends' signatures say? Let's get to know our WeChat friends better today.

Operating platform: Windows
Python version: Python 3.6
IDE: Sublime Text

1. Preparation

1.1 Library Introduction

WeChat friend information is only available after logging in to WeChat. This article uses the third-party library wxpy to log in and obtain that information. wxpy is built on itchat; it improves usability through a large number of interface optimizations and provides rich functional extensions.

Some common scenarios for wxpy:
- Control routers, smart homes and other gadgets with open interfaces
- Automatically send logs to your WeChat when running the script
- Add the group owner as a friend and automatically join the group
- Forward messages across accounts or groups
- Automatically chat with people
- Just for fun
In short, it can be used to automate all kinds of operations on a personal WeChat account.

1.2 wxpy library installation

wxpy supports Python 3.4-3.6 and 2.7. Replace "pip" in the commands below with "pip3" or "pip2" so that the package is installed for the intended Python version.

Download and install from the official PyPI source (may be slow or unstable in mainland China):
- pip install -U wxpy
Download and install from the Douban PyPI mirror (recommended for users in mainland China):
- pip install -U wxpy -i "https://pypi.doubanio.com/simple/"
1.3 Log in to WeChat

wxpy provides a robot object, Bot, which can be thought of as a Web WeChat client. Bot performs the login when it is initialized, and you need to scan a QR code with your phone to log in. The chats(), friends(), groups(), and mps() methods of Bot return, respectively, all chat objects, friends, group chats, and official accounts of the current robot. This article obtains all friend information through friends() and then processes the data.
- from wxpy import *
-
- # Initialize the robot and scan the code to log in
- bot = Bot()
-
- # Get all friends
- my_friends = bot.friends()
- print(type(my_friends))
The following is the output:
- Getting uuid of QR code.
- Downloading QR code.
- Please scan the QR code to log in.
- Please press confirm on your phone.
- Loading the contact, this may take a little while.
- <Login successfully as 王强>
- <class 'wxpy.api.chats.chats.Chats'>
A wxpy.api.chats.chats.Chats object is a collection of chat objects that can be searched or used for statistics. The fields that can be searched and counted include sex, province, city, signature, and so on.

2. Ratio of male to female WeChat friends

2.1 Statistics

Use a dictionary, sex_dict, to count the number of male and female friends.
- # Use a dictionary to count the number of male and female friends
- sex_dict = {'male': 0, 'female': 0}
-
- for friend in my_friends:
-     # Count by gender (1 = male, 2 = female)
-     if friend.sex == 1:
-         sex_dict['male'] += 1
-     elif friend.sex == 2:
-         sex_dict['female'] += 1
-
- print(sex_dict)
The following is the output:
- {'male': 255, 'female': 104}
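The counts above can be turned into the ratio asked about in the introduction. The following is only a minimal sketch: it works on a made-up list of integer sex codes rather than real wxpy friend objects, the helper name sex_ratio is hypothetical, and treating any code other than 1 or 2 as "unknown" is an assumption.

```python
from collections import Counter

# The article uses 1 for male and 2 for female.
# Treating every other value (for example 0) as "unknown" is an assumption.
SEX_LABELS = {1: 'male', 2: 'female'}


def sex_ratio(sex_codes):
    """Return (counts, percentages) for an iterable of integer sex codes."""
    counts = Counter(SEX_LABELS.get(code, 'unknown') for code in sex_codes)
    total = sum(counts.values())
    if total == 0:
        return {}, {}
    percentages = {label: round(100 * n / total, 1)
                   for label, n in counts.items()}
    return dict(counts), percentages


# Hypothetical sample that reproduces the counts printed above
sample = [1] * 255 + [2] * 104
counts, percentages = sex_ratio(sample)
print(counts)
print(percentages)
```

With real data, the list of codes could be built with something like [friend.sex for friend in my_friends].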
2.2 Data Presentation

This article uses an ECharts pie chart to present the data. Open http://echarts.baidu.com/echarts2/doc/example/pie1.html and you will see the following:

Figure 1. Original content of the ECharts pie chart example

As the figure shows, the data is on the left and the rendered chart is on the right. Other chart types use the same left-right layout. Take a look at the data on the left:
- option = {
- title : {
- text: 'Source of user access to a certain site' ,
- subtext: 'Purely fictitious' ,
- x: 'center'
- },
- tooltip : {
- trigger : 'item' ,
- formatter: "{a} <br/>{b} : {c} ({d}%)"
- },
- legend: {
- orient : 'vertical' ,
- x : 'left' ,
- data:[ 'direct access' , 'email marketing' , 'affiliate advertising' , 'video advertising' , 'search engine' ]
- },
- toolbox: {
- show : true ,
- feature : {
- mark: {show: true },
- dataView: {show: true , readOnly: false },
- magicType : {
- show: true ,
- type: [ 'pie' , 'funnel' ],
- option : {
- funnel: {
- x: '25%' ,
- width: '50%' ,
- funnelAlign: 'left' ,
- max : 1548
- }
- }
- },
- restore : {show: true },
- saveAsImage: {show: true }
- }
- },
- calculable : true ,
- series: [
- {
- name : 'Access source' ,
- type: 'pie' ,
- radius : '55%' ,
- center: [ '50%' , '60%' ],
- data:[
- {value:335, name : 'Direct access' },
- {value:310, name : 'Email Marketing' },
- {value:234, name : 'Alliance Advertising' },
- {value:135, name : 'Video Ad' },
- {value:1548, name : 'search engine' }
- ]
- }
- ]
- };
The curly braces after option = contain the data in a JSON-like format. Next, let's go through it:
- title: the title
  - text: title text
  - subtext: subtitle
  - x: title position
- tooltip: the tooltip shown when the mouse hovers over the pie chart
- legend: the legend
  - orient: orientation
  - x: legend position
  - data: legend entries
- toolbox: the toolbox, a row of icons in the upper right corner of the pie chart
  - mark: auxiliary line switch
  - dataView: data view, click to view the pie chart data
  - magicType: switch between pie chart and funnel chart
  - restore: restore
  - saveAsImage: save as image
- calculable: I don't know what it is used for yet.
- series: the main data
  - data: the data to be presented
The data formats of other chart types are similar and will not be analyzed in detail later. You only need to modify title, legend->data, and series->data. The modified data is:
- option = {
- title : {
- text: 'Gender ratio of WeChat friends' ,
- subtext: 'Real data' ,
- x: 'center'
- },
- tooltip : {
- trigger : 'item' ,
- formatter: "{a} <br/>{b} : {c} ({d}%)"
- },
- legend: {
- orient : 'vertical' ,
- x : 'left' ,
- data:[ 'Male' , 'Female' ]
- },
- toolbox: {
- show : true ,
- feature : {
- mark: {show: true },
- dataView: {show: true , readOnly: false },
- magicType : {
- show: true ,
- type: [ 'pie' , 'funnel' ],
- option : {
- funnel: {
- x: '25%' ,
- width: '50%' ,
- funnelAlign: 'left' ,
- max : 1548
- }
- }
- },
- restore : {show: true },
- saveAsImage: {show: true }
- }
- },
- calculable : true ,
- series: [
- {
- name : 'Access source' ,
- type: 'pie' ,
- radius : '55%' ,
- center: [ '50%' , '60%' ],
- data:[
- {value:255, name : 'Male' },
- {value:104, name : 'Female' }
- ]
- }
- ]
- };
After modifying the data, click the green refresh button on the page to get the following pie chart (you can change the theme to suit your preferences):

Figure 2. Gender ratio of friends

Move the mouse over the pie chart to see detailed data:

Figure 3. Detailed data on the gender ratio of friends
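Instead of typing the numbers into the option by hand, the legend and series entries can be generated from the counting dictionary. This is a sketch under stated assumptions: pie_option_data and its labels argument are hypothetical names introduced here, and the printed text is meant to be pasted after "data:" in the option shown above.

```python
import json


def pie_option_data(counts, labels):
    """Build the legend and series data lists for an ECharts pie option.

    counts: dict such as {'male': 255, 'female': 104}
    labels: dict mapping each key in counts to its display name
    """
    legend_data = [labels[key] for key in counts]
    series_data = [{'value': value, 'name': labels[key]}
                   for key, value in counts.items()]
    return legend_data, series_data


legend_data, series_data = pie_option_data(
    {'male': 255, 'female': 104},
    {'male': 'Male', 'female': 'Female'},
)
# json.dumps produces text that can be pasted into the option
print(json.dumps(legend_data, ensure_ascii=False))
print(json.dumps(series_data, ensure_ascii=False))
```

Using json.dumps with ensure_ascii=False keeps any Chinese names readable in the generated text.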
3. WeChat friends nationwide distribution map

3.1 Data Statistics
- # Use a dictionary to count the number of friends in each province
- province_dict = {'Beijing': 0, 'Shanghai': 0, 'Tianjin': 0, 'Chongqing': 0,
-     'Hebei': 0, 'Shanxi': 0, 'Jilin': 0, 'Liaoning': 0, 'Heilongjiang': 0,
-     'Shaanxi': 0, 'Gansu': 0, 'Qinghai': 0, 'Shandong': 0, 'Fujian': 0,
-     'Zhejiang': 0, 'Taiwan': 0, 'Henan': 0, 'Hubei': 0, 'Hunan': 0,
-     'Jiangxi': 0, 'Jiangsu': 0, 'Anhui': 0, 'Guangdong': 0, 'Hainan': 0,
-     'Sichuan': 0, 'Guizhou': 0, 'Yunnan': 0,
-     'Inner Mongolia': 0, 'Xinjiang': 0, 'Ningxia': 0, 'Guangxi': 0, 'Tibet': 0,
-     'Hong Kong': 0, 'Macau': 0}
-
- # Count friends by province; provinces not in the dictionary are skipped
- for friend in my_friends:
-     if friend.province in province_dict:
-         province_dict[friend.province] += 1
-
- # To make presentation easier, build the data as a JSON-style array
- data = []
- for key, value in province_dict.items():
-     data.append({'name': key, 'value': value})
-
- print(data)
The following is the output:
- [{'name': 'Beijing', 'value': 91}, {'name': 'Shanghai', 'value': 12}, {'name': 'Tianjin', 'value': 15}, {'name': 'Chongqing', 'value': 1}, {'name': 'Hebei', 'value': 53}, {'name': 'Shanxi', 'value': 2}, {'name': 'Jilin', 'value': 1}, {'name': 'Liaoning', 'value': 1}, {'name': 'Heilongjiang', 'value': 2}, {'name': 'Shaanxi', 'value': 3}, {'name': 'Gansu', 'value': 0}, {'name': 'Qinghai', 'value': 0}, {'name': 'Shandong', 'value': 7}, {'name': 'Fujian', 'value': 3}, {'name': 'Zhejiang', 'value': 4}, {'name': 'Taiwan', 'value': 0}, {'name': 'Henan', 'value': 1}, {'name': 'Hubei', 'value': 4}, {'name': 'Hunan', 'value': 4}, {'name': 'Jiangxi', 'value': 4}, {'name': 'Jiangsu', 'value': 9}, {'name': 'Anhui', 'value': 2}, {'name': 'Guangdong', 'value': 63}, {'name': 'Hainan', 'value': 0}, {'name': 'Sichuan', 'value': 2}, {'name': 'Guizhou', 'value': 0}, {'name': 'Yunnan', 'value': 1}, {'name': 'Inner Mongolia', 'value': 0}, {'name': 'Xinjiang', 'value': 2}, {'name': 'Ningxia', 'value': 0}, {'name': 'Guangxi', 'value': 1}, {'name': 'Tibet', 'value': 0}, {'name': 'Hong Kong', 'value': 0}, {'name': 'Macau', 'value': 0}]
It can be seen that the province with the most friends is Beijing. So why reorganize the data into this format? Because the ECharts map requires data in this format.

3.2 Data Presentation

Use an ECharts map to present the distribution of friends. Open the ECharts map example page and replace the data on the left with:
- option = {
- title : {
- text: 'WeChat friends nationwide distribution map' ,
- subtext: 'Real data' ,
- x: 'center'
- },
- tooltip : {
- trigger : 'item'
- },
- legend: {
- orient: 'vertical' ,
- x: 'left' ,
- data:[ 'Number of friends' ]
- },
- dataRange: {
- min : 0,
- max : 100,
- x: 'left' ,
- y: 'bottom' ,
- text:[ 'high' , 'low' ], // text, default is numeric text
- calculable : true
- },
- toolbox: {
- show: true ,
- orient : 'vertical' ,
- x: 'right' ,
- y: 'center' ,
- feature : {
- mark: {show: true },
- dataView: {show: true , readOnly: false },
- restore : {show: true },
- saveAsImage: {show: true }
- }
- },
- roamController: {
- show: true ,
- x: 'right' ,
- mapTypeControl: {
- 'china' : true
- }
- },
- series: [
- {
- name : 'Number of friends' ,
- type: 'map' ,
- mapType: 'china' ,
- roam: false ,
- itemStyle:{
- normal:{label:{show: true }},
- emphasis:{label:{show: true }}
- },
- data:[
- { 'name' : 'Beijing' , 'value' : 91},
- { 'name' : 'Shanghai' , 'value' : 12},
- { 'name' : 'Tianjin' , 'value' : 15},
- { 'name' : 'Chongqing' , 'value' : 1},
- { 'name' : 'Hebei' , 'value' : 53},
- { 'name' : 'Shanxi' , 'value' :2},
- { 'name' : 'Jilin' , 'value' : 1},
- { 'name' : 'Liaoning' , 'value' : 1},
- { 'name' : 'Heilongjiang' , 'value' : 2},
- { 'name' : 'Shaanxi' , 'value' : 3},
- { 'name' : 'Gansu' , 'value' :0},
- { 'name' : 'Qinghai' , 'value' :0},
- { 'name' : 'Shandong' , 'value' : 7},
- { 'name' : 'Fujian' , 'value' : 3},
- { 'name' : 'Zhejiang' , 'value' : 4},
- { 'name' : 'Taiwan' , 'value' :0},
- { 'name' : 'Henan' , 'value' : 1},
- { 'name' : 'Hubei' , 'value' : 4},
- { 'name' : 'Hunan' , 'value' : 4},
- { 'name' : 'Jiangxi' , 'value' : 4},
- { 'name' : 'Jiangsu' , 'value' : 9},
- { 'name' : 'Anhui' , 'value' :2},
- { 'name' : 'Guangdong' , 'value' : 63},
- { 'name' : 'Hainan' , 'value' : 0},
- { 'name' : 'Sichuan' , 'value' : 2},
- { 'name' : 'Guizhou' , 'value' : 0},
- { 'name' : 'Yunnan' , 'value' : 1},
- { 'name' : 'Inner Mongolia' , 'value' : 0},
- { 'name' : 'Xinjiang' , 'value' : 2},
- { 'name' : 'Ningxia' , 'value' : 0},
- { 'name' : 'Guangxi' , 'value' :1},
- { 'name' : 'Tibet' , 'value' : 0},
- { 'name' : 'Hong Kong' , 'value' : 0},
- { 'name' : 'Macau' , 'value' : 0}
- ]
- }
- ]
- };
Note two points:
- dataRange->max should be adjusted to fit the statistics
- series->data must use the name/value data format shown above
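One way to choose dataRange->max is to round the largest count up to a convenient step. This is only a sketch: the helper data_range_max and the step of 10 are assumptions for illustration, and the sample list is a shortened copy of values from the output in 3.1.

```python
import json
import math


def data_range_max(data, step=10):
    """Round the largest 'value' up to the next multiple of step (at least step)."""
    largest = max((item['value'] for item in data), default=0)
    return max(step, math.ceil(largest / step) * step)


# Shortened sample using values from the output in 3.1
data = [
    {'name': 'Beijing', 'value': 91},
    {'name': 'Hebei', 'value': 53},
    {'name': 'Guangdong', 'value': 63},
]
print(data_range_max(data))
# Text that can be pasted after "data:" in the series
print(json.dumps(data, ensure_ascii=False))
```

For the sample above the result is 100, which matches the max used in the option.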
After clicking the refresh button, the following map is generated:

Figure 4. Friends nationwide distribution map

From the map, you can see that my friends are mainly distributed in Beijing, Hebei and Guangdong. Interestingly, there is a slider on the left side of the map that represents the range of the map data. If we drag the upper handle all the way to the bottom, we can see the provinces where I have no WeChat friends:

Figure 5. Provinces without WeChat friends
In the same way, by narrowing the slider to a specific range, we can see on the map which provinces have exactly that many friends. Readers can try it out.

4. Friend signature statistics

4.1 Statistics
- import re
-
- def write_txt_file(path, txt):
-     '''
-     Append text to a txt file
-     '''
-     with open(path, 'a', encoding='gb18030', newline='') as f:
-         f.write(txt)
-
- # Collect signatures
- for friend in my_friends:
-     # Clean the data: keep only runs of Chinese characters, removing punctuation and other factors that affect word frequency statistics
-     pattern = re.compile(r'[一-龥]+')
-     filterdata = re.findall(pattern, friend.signature)
-     write_txt_file('signatures.txt', ''.join(filterdata))
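To see what the cleaning step in 4.1 does to a single signature, here is a small standalone sketch. The function name clean_signature and the sample text are made up for illustration; the empty-string fallback for a missing signature is an added assumption, not something the original code does.

```python
import re

# [\u4e00-\u9fa5] is the same character range as [一-龥] used in the article
CHINESE_RUN = re.compile(r'[\u4e00-\u9fa5]+')


def clean_signature(signature):
    """Keep only the Chinese characters of a signature, joined into one string."""
    return ''.join(CHINESE_RUN.findall(signature or ''))


# Hypothetical signature mixing Chinese, English and punctuation
print(clean_signature('Stay hungry, 保持热爱! 奔赴山海。'))
```

Everything that is not a Chinese character, including Latin letters, digits and punctuation, is dropped before the text is written to the file.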
The signature loop in 4.1 cleans friends' signatures and saves them. After it runs, a signatures.txt file is generated in the current directory.

4.2 Data Presentation

The data is presented through word frequency statistics and a word cloud, which give us a sense of our WeChat friends' attitudes toward life. This step uses the jieba, numpy, pandas, scipy, and wordcloud libraries. If they are not installed on your computer, run the following installation commands:
- pip install jieba
- pip install pandas
- pip install numpy
- pip install scipy
- pip install wordcloud
4.2.1 Reading txt files

We saved the friends' signatures to a txt file earlier; now we read them back:
- def read_txt_file(path):
-     '''
-     Read text from a txt file
-     '''
-     with open(path, 'r', encoding='gb18030', newline='') as f:
-         return f.read()
4.2.2 Stop words

Let's introduce a concept: stop words. Web pages contain many very common words, such as "in", "inside", "also", "of", "it", "for", and so on. These are stop words. Because they are used so frequently and appear on almost every web page, search engine developers ignore all of them. A site full of such words is effectively wasting resources, and in the same way they would dominate our word frequency statistics, so we filter them out. Search Baidu for stopwords.txt, download it, and put it in the same directory as the py file.
- import jieba
- import pandas as pd
-
- content = read_txt_file('signatures.txt')
- segment = jieba.lcut(content)
- words_df = pd.DataFrame({'segment': segment})
-
- stopwords = pd.read_csv("stopwords.txt", index_col=False, quoting=3, sep=" ", names=['stopword'], encoding='utf-8')
- words_df = words_df[~words_df.segment.isin(stopwords.stopword)]
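The filtering idea can also be shown without pandas. The following sketch uses only the standard library; remove_stop_words, the token list, and the tiny stop-word list are all hypothetical, and dropping blank tokens is an extra step the article's code does not perform.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens that are non-blank and not in the stop-word set."""
    stop_set = set(stop_words)
    return [t for t in tokens if t.strip() and t not in stop_set]


# Hypothetical tokens, as a word segmenter might return them,
# and a tiny made-up stop-word list
tokens = ['我', '的', '生活', '也', '是', '选择']
stop_words = ['的', '也', '是', '我']
print(remove_stop_words(tokens, stop_words))
```

Converting the stop-word list to a set keeps each membership check fast even when the list has thousands of entries.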
4.2.3 Word frequency statistics

Now for the main event: word frequency statistics using numpy:
- import numpy
-
- words_stat = words_df.groupby(by=['segment'])['segment'].agg({"count": numpy.size})
- words_stat = words_stat.reset_index().sort_values(by=["count"], ascending=False)
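The same count-and-sort step can be illustrated with the standard library alone. This sketch swaps in collections.Counter in place of the pandas and numpy approach used in the article; word_frequencies and the token list are hypothetical.

```python
from collections import Counter


def word_frequencies(tokens, top_n=100):
    """Return the top_n (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common(top_n)


# Hypothetical token list after stop-word removal
tokens = ['生活', '选择', '生活', '快乐', '生活', '选择']
print(word_frequencies(tokens, top_n=2))
```

Wrapping the result in dict() gives a word-to-count mapping, which is the shape of input the word cloud step expects.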
4.2.4 Word frequency visualization: word cloud

The word frequency statistics give us a ranking, but a ranking alone is not very vivid, so next we visualize it. We use the wordcloud library, which is described in detail on GitHub.
- import matplotlib.pyplot as plt
- from scipy.misc import imread
- from wordcloud import WordCloud, ImageColorGenerator
-
- # Set word cloud properties
- color_mask = imread('background.jfif')
- wordcloud = WordCloud(font_path="simhei.ttf",  # font that can display Chinese
-                       background_color="white",  # background color
-                       max_words=100,  # maximum number of words shown in the word cloud
-                       mask=color_mask,  # background image
-                       max_font_size=100,  # maximum font size
-                       random_state=42,
-                       # Default image size; if a background image is used,
-                       # the saved image follows the size of that image.
-                       # margin is the distance from the word edge.
-                       width=1000, height=860, margin=2)
-
- # Generate the word cloud. generate() accepts the full text; since the word
- # frequencies are already computed, generate_from_frequencies() is used here.
- word_frequence = {x[0]: x[1] for x in words_stat.head(100).values}
- print(word_frequence)
- word_frequence_dict = {}
- for key in word_frequence:
-     word_frequence_dict[key] = word_frequence[key]
-
- wordcloud.generate_from_frequencies(word_frequence_dict)
- # Generate color values from the background image
- image_colors = ImageColorGenerator(color_mask)
- # Recolor
- wordcloud.recolor(color_func=image_colors)
- # Save the image
- wordcloud.to_file('output.png')
- plt.imshow(wordcloud)
- plt.axis("off")
- plt.show()
The result is shown below (the left picture is the background image, and the right picture is the generated word cloud):

Figure 6. Comparison between the background image and the word cloud image

From the word cloud, we can analyze the characteristics of my friends:
- Do: action
- Life, life: love of life
- Happy: optimistic
- Choice: decision
- Professional: professional
- Love: love
5. Summary

This completes the analysis of WeChat friends. wxpy has many other features, such as chatting and viewing official account information. Interested readers are welcome to refer to the official documentation.

6. Complete code

The code above is fairly scattered. The complete code below wraps each functional module in a function:
- #-*- coding: utf-8 -*-
- import re
- from wxpy import *
- import jieba
- import numpy
- import pandas as pd
- import matplotlib.pyplot as plt
- from scipy.misc import imread
- from wordcloud import WordCloud, ImageColorGenerator
-
- def write_txt_file(path, txt):
-     '''
-     Append text to a txt file
-     '''
-     with open(path, 'a', encoding='gb18030', newline='') as f:
-         f.write(txt)
-
- def read_txt_file(path):
-     '''
-     Read text from a txt file
-     '''
-     with open(path, 'r', encoding='gb18030', newline='') as f:
-         return f.read()
-
- def login():
-     # Initialize the robot and scan the code to log in
-     bot = Bot()
-
-     # Get all friends
-     my_friends = bot.friends()
-
-     print(type(my_friends))
-     return my_friends
-
- def show_sex_ratio(friends):
-     # Use a dictionary to count the number of male and female friends
-     sex_dict = {'male': 0, 'female': 0}
-
-     for friend in friends:
-         # Count by gender (1 = male, 2 = female)
-         if friend.sex == 1:
-             sex_dict['male'] += 1
-         elif friend.sex == 2:
-             sex_dict['female'] += 1
-
-     print(sex_dict)
-
- def show_area_distribution(friends):
-     # Use a dictionary to count the number of friends in each province
-     province_dict = {'Beijing': 0, 'Shanghai': 0, 'Tianjin': 0, 'Chongqing': 0,
-         'Hebei': 0, 'Shanxi': 0, 'Jilin': 0, 'Liaoning': 0, 'Heilongjiang': 0,
-         'Shaanxi': 0, 'Gansu': 0, 'Qinghai': 0, 'Shandong': 0, 'Fujian': 0,
-         'Zhejiang': 0, 'Taiwan': 0, 'Henan': 0, 'Hubei': 0, 'Hunan': 0,
-         'Jiangxi': 0, 'Jiangsu': 0, 'Anhui': 0, 'Guangdong': 0, 'Hainan': 0,
-         'Sichuan': 0, 'Guizhou': 0, 'Yunnan': 0,
-         'Inner Mongolia': 0, 'Xinjiang': 0, 'Ningxia': 0, 'Guangxi': 0, 'Tibet': 0,
-         'Hong Kong': 0, 'Macau': 0}
-
-     # Count friends by province; provinces not in the dictionary are skipped
-     for friend in friends:
-         if friend.province in province_dict:
-             province_dict[friend.province] += 1
-
-     # To make presentation easier, build the data as a JSON-style array
-     data = []
-     for key, value in province_dict.items():
-         data.append({'name': key, 'value': value})
-
-     print(data)
-
- def show_signature(friends):
-     # Collect signatures
-     for friend in friends:
-         # Clean the data: keep only runs of Chinese characters, removing punctuation and other factors that affect word frequency statistics
-         pattern = re.compile(r'[一-龥]+')
-         filterdata = re.findall(pattern, friend.signature)
-         write_txt_file('signatures.txt', ''.join(filterdata))
-
-     # Read the file
-     content = read_txt_file('signatures.txt')
-     segment = jieba.lcut(content)
-     words_df = pd.DataFrame({'segment': segment})
-
-     # Read stop words
-     stopwords = pd.read_csv("stopwords.txt", index_col=False, quoting=3, sep=" ", names=['stopword'], encoding='utf-8')
-     words_df = words_df[~words_df.segment.isin(stopwords.stopword)]
-     print(words_df)
-
-     words_stat = words_df.groupby(by=['segment'])['segment'].agg({"count": numpy.size})
-     words_stat = words_stat.reset_index().sort_values(by=["count"], ascending=False)
-
-     # Set word cloud properties
-     color_mask = imread('background.jfif')
-     wordcloud = WordCloud(font_path="simhei.ttf",  # font that can display Chinese
-                           background_color="white",  # background color
-                           max_words=100,  # maximum number of words shown in the word cloud
-                           mask=color_mask,  # background image
-                           max_font_size=100,  # maximum font size
-                           random_state=42,
-                           # Default image size; if a background image is used,
-                           # the saved image follows the size of that image.
-                           # margin is the distance from the word edge.
-                           width=1000, height=860, margin=2)
-
-     # Generate the word cloud. generate() accepts the full text; since the word
-     # frequencies are already computed, generate_from_frequencies() is used here.
-     word_frequence = {x[0]: x[1] for x in words_stat.head(100).values}
-     print(word_frequence)
-     word_frequence_dict = {}
-     for key in word_frequence:
-         word_frequence_dict[key] = word_frequence[key]
-
-     wordcloud.generate_from_frequencies(word_frequence_dict)
-     # Generate color values from the background image
-     image_colors = ImageColorGenerator(color_mask)
-     # Recolor
-     wordcloud.recolor(color_func=image_colors)
-     # Save the image
-     wordcloud.to_file('output.png')
-     plt.imshow(wordcloud)
-     plt.axis("off")
-     plt.show()
-
- def main():
-     friends = login()
-     show_sex_ratio(friends)
-     show_area_distribution(friends)
-     show_signature(friends)
-
- if __name__ == '__main__':
-     main()
Author: Wang Qiang, a Python fanatic.