After analyzing 6,000 apps, I found that there are so many great apps that I haven’t used yet.

Abstract: The mobile Internet is more developed than ever and new apps appear in an endless stream, so their quality varies widely. Compared with ordinary apps, we would much rather use the conscientious ones, but finding them is not easy. This article uses the Scrapy framework to crawl more than 6,000 apps from the well-known app market Coolapk and, through analysis, picks out the best apps in each category. These can truly be called conscientious works, and using them will give you a brand-new experience on your phone.

1. Analysis background

1.1. Why choose Coolapk

If GitHub is a paradise for programmers, then Coolapk is a paradise for mobile app enthusiasts (also known as "gadget geeks"). Compared with traditional app download markets, Coolapk has three special features:

  • First, you can search for and download tools and software that are hard to find in other app markets, such as the terminal desktop "Aris", the most powerful Android reader "Jingdu Tianxia", and the RSS reader "Feedme" mentioned in a previous article.
  • Second, you can find cracked versions of many apps. We advocate "paying for good things", but some apps are simply exasperating, "Baidu Netdisk" for one, and for those a cracked version comes in handy.
  • Third, you can find historical versions of apps. Many people like to upgrade to the latest version the moment an update is released, but many apps keep getting more utilitarian: every update makes them more bloated and more crowded with ads. It is often better to go back to an earlier version that is small, streamlined, and ad-free.

As an app lover, I have found many good apps on Coolapk, yet the more I use it, the more I feel that what I know is only the tip of the iceberg. I wanted to find out just how many good things this site holds, and checking them one by one manually is obviously unrealistic, so I naturally turned to the best tool for the job: a crawler. To that end, I recently learned the Scrapy crawler framework and crawled about 6,000 apps from the site. Through analysis, I found high-quality apps in different fields. Let's take a look.

1.2. Analysis content

An overall analysis of 6,000 apps’ ratings, downloads, size and other indicators.

Based on daily usage scenarios, apps are divided into 10 categories, including system tools, information reading, social entertainment, etc., and high-quality apps in each category are selected.

1.3. Analysis Tools

  • Python
  • Scrapy
  • MongoDB
  • Pyecharts
  • Matplotlib

2. Data Capture

Since the Coolapk mobile App has anti-scraping measures in place and packet capture with Charles failed, we fall back on Scrapy to crawl the App information from the web version. The crawl ended on November 23, 2018, covering 6,086 apps and 8 fields per app: App name, download count, rating, number of ratings, number of comments, number of followers, size, and App category tags.

2.1. Target website analysis

This is the target webpage we want to crawl. Clicking on the page will reveal two useful pieces of information:

  • Each page lists 10 apps, and there are 610 pages in total, which means roughly 6,100 apps.
  • The page requests use GET, and the URL has only one parameter, an increasing page number, so constructing the page-turning URLs is very simple (a short sketch follows this list).
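
A minimal sketch of building those page URLs, assuming the ?page=N parameter that also appears in start_requests() later:

    # A sketch: generate the 610 page URLs from the single page-number parameter
    page_urls = ['https://www.coolapk.com/apk/?page=%s' % page for page in range(1, 611)]
    print(page_urls[0])    # the first page
    print(len(page_urls))  # 610 pages in total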

Next, let's see what information to capture. You can see that the main page displays information such as the App name, download volume, and rating. Click the App icon to enter the details page, and you can see that more complete information is provided, including: category tags, number of ratings, number of followers, etc. Since we need to classify and filter Apps later, category tags are very useful, so here we choose to enter each App homepage to capture the required information indicators.

Through the above analysis, we can determine the crawling process: first traverse the main pages and collect the detail-page URLs of the 10 apps on each page, then crawl the indicators of every app from its detail page. Traversing this way means fetching about 6,000 web pages, which is no small workload, so we will try the Scrapy framework for the job.

2.2. Introduction to Scrapy Framework

Before introducing the Scrapy framework, let's recall the Pyspider framework, which we previously used to crawl 50,000 articles from Huxiu.com. It is a crawler tool written by a well-known Chinese developer with more than 10K GitHub stars, but its overall functionality is relatively weak. Is there anything more powerful? Yes: the Scrapy framework covered here. With more than 30K GitHub stars, it is the most widely used crawler framework in the Python world, and one you really should know how to use.

There are many official documents and tutorials about Scrapy on the Internet. Here are a few.

  • Scrapy Documentation
  • Cui Qingcai's Scrapy column
  • Scrapy crawling
  • Scrapy crawls Douban movies

The Scrapy framework is more complex than Pyspider: it has distinct processing modules, and a project is made up of several files, with different parts of the crawler placed in different files. So when you first get started, the code can feel scattered and confusing. The following approach is recommended for getting up to speed with Scrapy quickly:

  • First, quickly go through the reference tutorial above to understand Scrapy's crawler logic and the purpose and coordination of each program.
  • Next, look at the two practical examples above to get familiar with how to write crawlers in Scrapy.
  • Finally, find a website that you are interested in as a crawler project. If you encounter something you don’t understand, read the tutorial or Google it.

This learning path is fast and effective, and far better than following tutorials without ever writing anything yourself. Next, we take Coolapk as an example and crawl it with Scrapy.

2.3. Capture data

First, we need to install the Scrapy framework. On a Windows system with Anaconda already installed, this is very simple: open the Anaconda Prompt command window and enter the following command, which automatically installs Scrapy and all the libraries it depends on.

    conda install scrapy
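
If Anaconda is not installed, the usual alternative is to install Scrapy with pip (on Windows some dependencies may need extra handling):

    pip install scrapy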

2.3.1. Create a project

Next, we need to create a crawler project, so we first switch from the root directory to the working path where the project needs to be placed. For example, the storage path I set here is: E:\my_Python\training\kuan, and then continue to enter the following line of code to create the kuan crawler project:

    # Switch working path
    e:
    cd E:\my_Python\training\kuan
    # Generate project
    scrapy startproject kuan

After executing the above command, a scrapy crawler project named kuan will be generated, which contains the following files:

    scrapy.cfg          # Scrapy deployment configuration file
    kuan                # The project module; code is imported from here
        __init__.py
        items.py        # Defines the structure of the crawled data
        middlewares.py  # Middlewares
        pipelines.py    # Data pipeline file, used for subsequent storage
        settings.py     # Settings file
        spiders         # Folder for the main crawler programs
            __init__.py

Next, we need to create the main crawler program kuan.py in the spiders folder, which is done by running the following two commands:

    cd kuan    # enter the kuan project folder just generated
    scrapy genspider kuan www.coolapk.com    # generate the main crawler file kuan.py

2.3.2. Declare item

After the project files are created, we can start writing the crawler program.

First, you need to pre-define the names of the field information to be crawled in the items.py file, as shown below:

    import scrapy

    class KuanItem(scrapy.Item):
        # define the fields for your item here:
        name = scrapy.Field()
        volume = scrapy.Field()
        download = scrapy.Field()
        follow = scrapy.Field()
        comment = scrapy.Field()
        tags = scrapy.Field()
        score = scrapy.Field()
        num_score = scrapy.Field()

These are the 8 fields we located on the web page earlier: name is the App name, volume is the size, download is the number of downloads, and so on. Once defined here, they can be used in the main crawling program that follows.

2.3.3. Crawling the main program

After the kuan project is created, the Scrapy framework automatically generates part of the crawler code. Next, we need to add the field-parsing logic for the crawled pages to the parse() method.

    import scrapy

    class KuanspiderSpider(scrapy.Spider):
        name = 'kuan'
        allowed_domains = ['www.coolapk.com']
        start_urls = ['http://www.coolapk.com/']

        def parse(self, response):
            pass

Open Dev Tools on the homepage, find the node position of each crawling indicator, and then use CSS, Xpath, regular expressions and other methods to extract and parse. Scrapy supports all these methods and you can choose any of them. Here we use CSS syntax to locate nodes, but it should be noted that Scrapy's CSS syntax is slightly different from the CSS syntax we used with pyquery before. Here are a few examples to compare and explain.

First, we locate the homepage URL node of the first APP. We can see that the URL node is located in the a node under the div node with the class attribute app_left_list. Its href attribute is the URL information we need. Here is a relative address, which will be the complete URL after splicing.

Next, we enter the Coolapk detail page and select the App name to locate it. We can see that the App name node sits in the text of the p node whose class attribute is detail_app_title.

After locating these two nodes, we can use CSS to extract field information. Here is a comparison between the conventional writing method and the writing method in Scrapy:

    # Conventional (pyquery) writing
    url = item('.app_left_list>a').attr('href')
    name = item('.list_app_title').text()

    # Scrapy writing
    url = item.css('::attr("href")').extract_first()
    name = item.css('.detail_app_title::text').extract_first()

As you can see, to get the href attribute or the text you need to use ::, for example ::text to extract text. extract_first() extracts the first matching element; if there are multiple elements, use extract(). With this, we can write the parsing code for the 8 fields.
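
For instance, a small illustration of the difference, reusing the .apk_left_span2 tag node that appears later in get_tags():

    # extract_first() returns a single string (or None when nothing matches),
    # while extract() returns a list of every matching result.
    first_tag = response.css('.apk_left_span2::text').extract_first()
    all_tags = response.css('.apk_left_span2::text').extract()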

First, we need to extract the URL list of apps on the home page, and then go to each app's detail page to further extract 8 fields of information.

    def parse(self, response):
        contents = response.css('.app_left_list>a')
        for content in contents:
            url = content.css('::attr("href")').extract_first()
            url = response.urljoin(url)  # concatenate the relative URL into an absolute URL
            yield scrapy.Request(url, callback=self.parse_url)

Here, we use the response.urljoin() method to join each extracted relative URL into a complete URL, and then use scrapy.Request() to construct a request for each App detail page. It takes two parameters: url and callback. The url is the detail-page URL, and callback is the callback function, which hands the response returned by each detail-page request to the parse_url() method dedicated to parsing the field content, as shown below:

    # Note: re must be imported at the top of kuan.py, and KuanItem imported from items.py
    def parse_url(self, response):
        item = KuanItem()
        item['name'] = response.css('.detail_app_title::text').extract_first()
        results = self.get_comment(response)
        item['volume'] = results[0]
        item['download'] = results[1]
        item['follow'] = results[2]
        item['comment'] = results[3]
        item['tags'] = self.get_tags(response)
        item['score'] = response.css('.rank_num::text').extract_first()
        num_score = response.css('.apk_rank_p1::text').extract_first()
        item['num_score'] = re.search('Total (.*?) ratings', num_score).group(1)
        yield item

    def get_comment(self, response):
        messages = response.css('.apk_topba_message::text').extract_first()
        result = re.findall(r'\s+(.*?)\s+/\s+(.*?) downloads\s+/\s+(.*?) people follow\s+/\s+(.*?) comments.*?', messages)  # \s+ matches one or more whitespace characters
        if result:  # not empty
            results = list(result[0])  # extract the first element of the list
            return results

    def get_tags(self, response):
        data = response.css('.apk_left_span2')
        tags = [item.css('::text').extract_first() for item in data]
        return tags

Here, two methods get_comment() and get_tags() are defined separately.

The get_comment() method extracts information of the four fields volume, download, follow, and comment through regular matching. The regular matching results are as follows:

    result = re.findall(r'\s+(.*?)\s+/\s+(.*?) downloads\s+/\s+(.*?) people follow\s+/\s+(.*?) comments.*?', messages)
    print(result)  # output the result information of the first page

    # The results are as follows:
    [('21.74M', '52.18 million', '24,000', '54,000')]
    [('75.53M', '27.68 million', '23 thousand', '30 thousand')]
    [('46.21M', '16.86M', '23K', '34K')]
    [('54.77M', '16.03 million', '38,000', '49,000')]
    [('3.32M', '15.3 million', '15,000', '3343')]
    [('75.07M', '11.27 million', '16 thousand', '22 thousand')]
    [('92.70M', '11.08M', '9167', '13K')]
    [('68.94M', '10.72 million', '5718', '9869')]
    [('61.45M', '9.35M', '11K', '16K')]
    [('23.96M', '9.25 million', '4157', '1956')]

Then results[0], results[1], and so on are used to extract the four pieces of information respectively. Taking volume as an example, here is the output for the first page:

    item['volume'] = results[0]
    print(item['volume'])

    # Output for the first page:
    21.74M
    75.53M
    46.21M
    54.77M
    3.32M
    75.07M
    92.70M
    68.94M
    61.45M
    23.96M

In this way, all the field information of the 10 apps on the first page is successfully extracted and yielded as items. Let's output the content:

    [
    {'name': '酷安', 'volume': '21.74M', 'download': '52.18万', 'follow': '24万', 'comment': '54万', 'tags': "['酷市场', '酷安', '市场', 'coolapk', '安装能必须']", 'score': '4.4', 'num_score': '14万'},
    {'name': '微信', 'volume': '75.53M', 'download': '27.68万', 'follow': '23万', 'comment': '30万', 'tags': "['微信', 'qq', '腾讯', 'tencent', '即时聊天', '安装能必须']", 'score': '2.3', 'num_score': '11万'},
    ...
    ]
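
As a quick sanity check before setting up database storage, Scrapy's built-in feed export can dump the yielded items to a file; the file name here is arbitrary:

    scrapy crawl kuan -o kuan_test.json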

2.3.4. Paginated crawling

Above, we crawled the first page of content. Next, we need to crawl all 610 pages. There are two ways to do this:

  • The first is to extract the node information of the page turning, then construct a request for the next page, and then repeatedly call the parse method for parsing, and repeat this cycle until the last page is parsed.
  • The second method is to directly construct the URL addresses of 610 pages, and then call the parse method in batches for parsing.

Here, we write the parsing code for two methods respectively. The first method is very simple. Just continue to add the following lines of code after the parse method:

    def parse(self, response):
        contents = response.css('.app_left_list>a')
        for content in contents:
            ...
        next_page = response.css('.pagination li:nth-child(8) a::attr(href)').extract_first()
        url = response.urljoin(next_page)
        yield scrapy.Request(url, callback=self.parse)

The second method is to define a start_requests() method before the parse() method at the beginning to generate 610 pages of URLs in batches, and then pass them to the following parse() method for parsing through the callback parameter in the scrapy.Request() method.

    def start_requests(self):
        pages = []
        for page in range(1, 611):  # there are 610 pages in total
            url = 'https://www.coolapk.com/apk/?page=%s' % page
            page = scrapy.Request(url, callback=self.parse)
            pages.append(page)
        return pages

The above is the idea of ​​crawling all pages. After crawling successfully, we need to store them. Here, I choose to store them in MongoDB. I have to say that compared with MySQL, MongoDB is much more convenient and hassle-free.

2.3.5. Storing Results

In the pipelines.py program, we define the data storage method. Some parameters of MongoDB, such as the address and database name, need to be stored separately in the settings.py setting file, and then called in the pipelines program.

    import pymongo

    class MongoPipeline(object):
        def __init__(self, mongo_url, mongo_db):
            self.mongo_url = mongo_url
            self.mongo_db = mongo_db

        @classmethod
        def from_crawler(cls, crawler):
            return cls(
                mongo_url=crawler.settings.get('MONGO_URL'),
                mongo_db=crawler.settings.get('MONGO_DB')
            )

        def open_spider(self, spider):
            self.client = pymongo.MongoClient(self.mongo_url)
            self.db = self.client[self.mongo_db]

        def process_item(self, item, spider):
            name = item.__class__.__name__
            self.db[name].insert(dict(item))
            return item

        def close_spider(self, spider):
            self.client.close()

First, we define a MongoPipeline() storage class, which defines several methods. Let's briefly explain them:

  • from_crawler() is a class method, marked with @classmethod; it is mainly used to obtain the parameters we set in settings.py:

        MONGO_URL = 'localhost'
        MONGO_DB = 'KuAn'
        ITEM_PIPELINES = {
            'kuan.pipelines.MongoPipeline': 300,
        }

  • The open_spider() method mainly performs some initialization operations. This method will be called when the Spider is opened.
  • The process_item() method is the most important method, which implements inserting data into MongoDB.

After completing the above code, enter the following line of command to start the entire crawler's crawling and storage process. If running on a single machine, it will take a long time to complete 6,000 web pages, so be patient.

    scrapy crawl kuan

Here are two additional points:

First, to reduce the load on the website, we had better add a delay of a few seconds between requests. This can be done by adding the following lines at the top of the KuanspiderSpider class:

    custom_settings = {
        "DOWNLOAD_DELAY": 3,  # delay of 3 s between requests; the default is 0, i.e. no delay
        "CONCURRENT_REQUESTS_PER_DOMAIN": 8  # the default is 8 concurrent requests per domain; it can be lowered appropriately
    }

Second, in order to better monitor the operation of the crawler program, it is necessary to set up an output log file, which can be achieved through Python's own logging package:

    import logging

    logging.basicConfig(filename='kuan.log', filemode='w', level=logging.WARNING,
                        format='%(asctime)s %(message)s', datefmt='%Y/%m/%d %I:%M:%S %p')
    logging.warning("warn message")
    logging.error("error message")

The level parameter sets the logging threshold. The severity levels from low to high are: DEBUG < INFO < WARNING < ERROR < CRITICAL. If you don't want the log file to record too much, set a higher level. Here it is set to WARNING, which means only messages at WARNING level or above are written to the log.

The datefmt parameter is added to add a specific time in front of each log, which is very useful.

Above, we have completed the capture of the entire data. With the data, we can start the analysis, but before that, we still need to simply clean and process the data.

3. Data cleaning

First, we read the data from MongoDB and convert it into a DataFrame, then take a look at the basic situation of the data.

    import pymongo
    import pandas as pd

    def parse_kuan():
        client = pymongo.MongoClient(host='localhost', port=27017)
        db = client['KuAn']
        collection = db['KuAnItem']
        # Convert the database records into a DataFrame
        df = pd.DataFrame(list(collection.find()))
        print(df.head())
        print(df.shape)
        print(df.info())
        print(df.describe())

From the first five rows output by df.head(), we can see that, apart from the score column, which is a float, all other columns are of object (text) type.

Some rows of the comment, download, follow, and num_score columns carry the suffix "万" (ten thousand), which has to be stripped and the values converted to numeric type. The volume column carries "M" and "K" suffixes; to unify the units, values in "K" need to be divided by 1,024 and expressed in "M".

The entire dataset has 6,086 rows × 8 columns, and no column has missing values.

The df.describe() method makes basic statistics on the score column. We can see that the average score of all apps is 3.9 (out of 5), the lowest score is 1.6, and the highest score is 4.8.

Next, we convert the above columns of text data into numerical data. The code is as follows:

    def data_processing(df):
        # Process the 5 columns 'comment', 'download', 'follow', 'num_score', 'volume':
        # convert values given in units of 万 (10,000) to plain numbers, then convert to numeric type
        suffix = '_ori'
        cols = ['comment', 'download', 'follow', 'num_score', 'volume']
        for col in cols:
            col_ori = col + suffix
            df[col_ori] = df[col]  # copy to keep the original column
            if not (col == 'volume'):
                df[col] = clean_symbol(df, col)   # process the original column to generate a new column
            else:
                df[col] = clean_symbol2(df, col)  # process the original column to generate a new column

        # Convert download back to units of 10,000
        df['download'] = df['download'].apply(lambda x: x / 10000)
        # Batch conversion to numeric type
        df = df.apply(pd.to_numeric, errors='ignore')
        return df

    def clean_symbol(df, col):
        # Convert values ending with the character "万" into plain numbers
        con = df[col].str.contains('万$')
        df.loc[con, col] = pd.to_numeric(df.loc[con, col].str.replace('万', '')) * 10000
        df[col] = pd.to_numeric(df[col])
        return df[col]

    def clean_symbol2(df, col):
        # Strip the "M" suffix
        df[col] = df[col].str.replace('M$', '')
        # Divide sizes given in "K" by 1024 to convert them to "M"
        con = df[col].str.contains('K$')
        df.loc[con, col] = pd.to_numeric(df.loc[con, col].str.replace('K$', '')) / 1024
        df[col] = pd.to_numeric(df[col])
        return df[col]

The above completes the conversion of several columns of text data. Let's check the basic situation:

The download column is the number of App downloads. The most-downloaded App has 51.9 million downloads, the least-downloaded has 0 (only a handful of apps), and the average is about 140,000. We can also see the following:

  • The volume column is the App size. The largest App is nearly 300M, the smallest is almost 0, and the average size is around 18M.
  • The comment column is the number of App comments; the largest exceeds 50,000, and the average is just over 200.

The above completes the basic data cleaning process. Next, we will conduct an exploratory analysis of the data.

4. Data Analysis

We mainly analyze App downloads, ratings, size and other indicators from two dimensions: overall and classified.

4.1. General situation

4.1.1. Download ranking

First, let’s take a look at the download volume of the App. Many times when we download an App, the download volume is a very important reference indicator. Since the download volume of most Apps is relatively small, the histogram cannot show the trend, so we choose to segment the data and discretize it into a bar chart. The drawing tool used is Pyecharts.

It can be seen that as many as 5,517 apps (accounting for 84% of the total) have less than 100,000 downloads, and only 20 apps have more than 5 million downloads. To develop a profitable app, user downloads are particularly important. From this point of view, most apps are in an awkward situation, at least on the Coolapk platform.

The code is implemented as follows:

    from pyecharts import Bar

    # Download distribution
    bins = [0, 10, 100, 500, 10000]
    group_names = ['<=100,000', '100,000-1,000,000', '1,000,000-5,000,000', '>5,000,000']
    cats = pd.cut(df['download'], bins, labels=group_names)  # segment with pd.cut(); download is in units of 10,000
    cats = pd.value_counts(cats)
    bar = Bar('App download number interval distribution', 'The vast majority of app downloads are less than 100,000')
    # bar.use_theme('macarons')
    bar.add(
        'Number of apps',
        list(cats.index),
        list(cats.values),
        is_label_show=True,
        is_splitline_show=False,
    )
    bar.render(path='download_interval.png', pixel_ration=1)

Next, let's take a look at the 20 most downloaded apps:
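
The chart itself is drawn with Pyecharts; as a minimal pandas sketch, such a top-20 list could be pulled from the cleaned DataFrame like this (df and the column names come from the cleaning step above, with download in units of 10,000):

    # Top 20 apps by download count (a sketch, not the article's original plotting code)
    top20_download = df.nlargest(20, 'download')[['name', 'download', 'score']]
    print(top20_download)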

As you can see, the Coolapk App itself is far ahead with more than 50 million downloads, nearly twice the 27 million of second-place WeChat. Such a huge lead is easy to understand: after all, it is the platform's own App, and if your phone doesn't have Coolapk installed, you are arguably not a real "gadget enthusiast". From the chart we can also see the following:

  • Among the TOP 20 apps, many are must-haves and can be considered popular apps.
  • From the App rating chart on the right, we can see that only 5 apps have a rating of more than 4 points (out of 5 points), and the vast majority of them have a rating of less than 3 points, or even less than 2 points. Is it because these App developers can't make good apps or don't want to make them at all?
  • Compared with other apps, RE Manager and Green Guardian are very prominent. Among them, RE Manager can still get 4.8 points (the highest score) with such a high download volume and its size is only a few MB, which is rare. What is a "conscientious app"? This is it.

For comparison, let's take a look at the 20 apps with the least downloads.

As you can see, compared with the apps with the most downloads above, these pale in comparison. The one with the least downloads, "Guangzhou Traffic Restriction Pass", has only 63 downloads.

This is not surprising. It may be that the App has not been promoted, or it may have just been developed. With such a small number of downloads, the rating is still good, and it can continue to be updated. I give a thumbs up to these developers.

In fact, this type of app is not embarrassing. The really embarrassing ones are those apps with a lot of downloads but the lowest ratings. They give people the feeling: "I am so bad, so be it. If you have the ability, don't use me."

4.1.2. Rating Ranking

Next, let's take a look at the overall score of the App. Here, the score is divided into the following 4 intervals, and corresponding levels are defined for different scores.
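
The exact interval boundaries come from the original chart; a hedged sketch of the binning with pd.cut, using illustrative cut points, might look like this:

    # A sketch of splitting scores into 4 levels; the boundaries below are
    # illustrative assumptions, not necessarily the article's exact cut points.
    score_bins = [0, 3, 3.5, 4, 5]
    score_levels = ['low (<3)', 'fair (3-3.5)', 'good (3.5-4)', 'high (>4)']
    score_cats = pd.cut(df['score'], bins=score_bins, labels=score_levels)
    print(score_cats.value_counts())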

Several interesting phenomena can be found:

  • There are very few apps with scores below 3 points, accounting for less than 10%. Yet among the 20 most-downloaded apps seen earlier, most of them, including WeChat, QQ, Taobao, and Alipay, score below 3 points, which is a bit embarrassing.
  • Apps with medium quality or medium scores are the most numerous.
  • The number of high-scoring apps with scores above 4 points accounts for nearly half (46%). This may be because these apps are indeed good, or it may be because the number of ratings is too small. In order to select the best, it is necessary to set certain screening thresholds in the future.

Next, let’s take a look at the 20 highest-rated apps. Many times, we download apps based on the feeling of “download the one with the highest rating”.

It can be seen that the 20 highest-rated apps all scored 4.8 points, including: RE Manager (appears again), Pure Light Rain Icon Pack, etc. There are also some less common ones, which may be good apps, but we still need to look at the download volume. Their download volumes are all above 10,000. With a certain amount of downloads, the ratings are relatively reliable, and we can download them with confidence to experience them.

After the above overall analysis, we have roughly found some good apps, but it is not enough, so we will subdivide them and set certain filtering conditions.

4.2. Classification

According to the app functions and daily usage scenarios, the apps are divided into the following 9 categories, and then the 20 best apps are selected from each category.

In order to find the best app possible, here are three conditions:

  • Rating no less than 4 points
  • Downloads no less than 10,000
  • A composite score is defined (total score = downloads × rating) and then normalized to a full score of 1,000 points as the reference indicator for ranking the apps (a sketch of this screening is shown below).
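
A minimal pandas sketch of this screening and ranking, where the category column is a hypothetical label added when manually grouping the apps into the categories above, and the other column names come from the cleaned DataFrame:

    # A sketch of the screening rules above (not the article's original code).
    candidates = df[(df['score'] >= 4) & (df['download'] >= 1)].copy()  # download is in units of 10,000
    candidates['total_score'] = candidates['download'] * candidates['score']
    # Normalize the composite score so that the maximum is 1,000 points
    candidates['total_score'] = candidates['total_score'] / candidates['total_score'].max() * 1000
    # Take the 20 highest-scoring apps within each (hypothetical) category
    top20_per_category = (candidates.sort_values('total_score', ascending=False)
                                    .groupby('category')
                                    .head(20))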

After selection, we got the 20 apps with the highest scores in each category. Most of these apps are indeed conscientious software.

4.2.1. System Tools

System tools include: input method, file management, system cleaning, desktop, plug-ins, lock screen, etc.

As you can see, the first place is the well-known old-fashioned file manager "RE Manager". It is only 5M in size. In addition to having all the functions of an ordinary file manager, its biggest feature is the ability to uninstall apps that come with the phone, but it requires Root.

The file analyzer of "ES File Explorer" is very powerful and can effectively clean up bloated mobile phone space.

The App "A Mu Han" is quite awesome. Just as its software introduction says, "It's better to have me than to have many things", when you open it, you will find that it provides dozens of practical functions, such as: translation, image search, express delivery query, making emoticons, etc.

"Super SU", "Storage Cleaner", "Lanthanum", "MT Manager" and "My Android Tools" are all highly recommended. In short, the apps on this list are worthy of being included in your mobile app usage list.

4.2.2. Social Chat

In the social chat category, "Share Weibo Client" ranks first. As a third-party client App, it is naturally better than the official version. For example, compared with the 70M size of the genuine version, it is only one-tenth of its size, and there are almost no advertisements. It also has many additional powerful functions. If you love to browse Weibo, then you might as well try this "Share".

The "Ji Ke" app is also quite good. If you scroll down, you can also see the "Bullet Messenger" which was very popular a while ago. It claims that it will replace WeChat, but it seems that it will not be able to do so in the short term.

You may find that common apps such as Zhihu, Douban, and Jianshu are not on this social list. This is because their ratings are relatively low, only 2.9, 3.5, and 2.9 points respectively, so they naturally cannot be included in this list. If you really want to use them, it is recommended that you use their third-party clients or historical versions.

4.2.3. Information reading

It can be seen that in the information reading category, "Jingdu Tianxia" firmly occupies the first place. I have previously written an article specifically to introduce it: The most powerful reader for Android.

Apps in the same category like "Duokan Reading", "Book Chasing Tool" and "WeChat Reading" also made the list.

In addition, if you often have a headache because you don’t know where to download e-books, you might as well try "Book Search Master" or "Laozi Book Search".

4.2.4. Audiovisual Entertainment

Next is the audio-visual entertainment section, where NetEase's "NetEase Cloud Music" takes the top spot without any pressure, a rare high-quality product from a major company.

If you love playing games, then you should try Adobe AIR.

If you are artistic, you will probably like the short video shooting app "VUE". You can definitely show off by posting your creations to your Moments.

The last one, "Hiby Music", is great. I recently discovered that it has a powerful function that can be used in conjunction with Baidu Netdisk. It can automatically identify audio files and play them.

4.2.5. Communication network

Next is the communication network category, which mainly includes: browser, address book, notification, mailbox and other subcategories.

Each of us has a browser on our mobile phones, and we use them in a variety of ways. Some people use the browser that comes with the phone, while others use big-name browsers such as Chrome and Firefox.

However, you will find that you may not have heard of the first three on the list, but they are really awesome, and it is most appropriate to describe them as "extremely simple and efficient, refreshing and fast". Among them, "Via" and "X Browser" are less than 1M in size, which is truly "small but complete", and highly recommended.

4.2.6. Photographic images

Taking photos and editing them is also a common function. You may have your own photo management software, but here I strongly recommend the first app "Quick Picture Browser". It is only 3M in size, but it can instantly find and load tens of thousands of photos. If you are a photo fanatic, you can open as many photos as you want with it. In addition, it has functions such as hiding private photos and automatically backing up Baidu Netdisk. It is one of the apps I have used the longest.

4.2.7. Documentation

We often need to write and take memos on our mobile phones, so naturally we need good document writing apps.

There is no need to say much about "Evernote", I think it is the best note-taking and summary app.

If you like to write in Markdown, then the exquisite app "Pure Writing" should be very suitable for you.

It is less than 3M in size yet offers dozens of functions, such as cloud backup, generating long images, and automatic spacing between Chinese and English, and even so it keeps a minimalist design. That is probably why its downloads have jumped from 20,000 to 30,000 in just two or three months. Behind this App is a developer who has given up several years of spare time to keep developing and updating it, which is worthy of admiration.

4.2.8. Travel, transportation and shopping

In this category, the first place is 12306. When it is mentioned, it reminds us of those weird verification codes. However, the App here is not from the official website, but developed by a third party. The most amazing function should be "Grab the Ticket". If you are still relying on posting on Moments to grab tickets, you might as well try it.

4.2.9. Xposed plugin

The last category is Xposed, which many people may not be familiar with, but many people should know about the red envelope grabbing and anti-withdrawal functions on WeChat. These awesome and unusual functions use various module functions in the Xposed framework. This framework is from the famous foreign XDA mobile phone forum. Some of the so-called software cracked by XDA masters that you often hear about come from this forum.

Simply put, after installing the Xposed framework, you can install some fun and interesting plug-ins in it. With these plug-ins, your phone can achieve more and greater functions. For example: it can remove advertisements, crack App payment functions, kill power-consuming self-starting processes, virtual phone positioning and other functions.

However, using this framework and these plug-ins requires flashing the phone and obtaining root access, so the bar to entry is a bit high.

5. Summary

This article used the Scrapy framework to crawl and analyze 6,000 apps on Coolapk. A Scrapy beginner may find the program structure rather scattered, so you can first try writing everything as ordinary functions in a single script and then split it into a Scrapy project; this also helps shift your thinking from a single program to a framework. I will write a separate article about that later.

Since the web version lists fewer apps than the mobile App, many useful apps are not included, such as Chrome, MX Player, and Snapseed. It is recommended to use the Coolapk App, where there are more fun things to find.

The above is the crawling and analysis process of the entire article. The article involves a lot of fine software. If you are interested, you can try to download and experience it.
