In the past, we used AFN (AFNetworking) to fetch JSON data, for example from http://news-at.zhihu.com/api/4/news/latest. But some sites, such as Baidu Tieba and Douban Reading below, do not provide an API for obtaining data.

Baidu Tieba: [image: Baidu Tieba data.png]

Douban Reading: [image: Douban reading data.png]

In that case we can parse their HTML to get the data we want.

Tool Preparation

We need two tools: Firefox and FireBug. You can download the Firefox browser from http://www.firefox.com.cn/download/, then install the FireBug plug-in from the Add-ons Manager, found in the menu in the upper right corner. FireBug has powerful JavaScript debugging capabilities and can also edit HTML and CSS in real time; it is a favorite tool of front-end developers. After installing it, click the bug icon in the upper right corner to debug the current web page with FireBug. If you don't know XPath, you can learn it from w3school's tutorial.

[image: Open FireBug.png]

Ono Open Source Library

Ono is an open source project on GitHub that helps us parse XML and HTML, and supports CSS selectors and XPath for finding specific nodes. You may not have heard of this library, but you certainly know its author: Mattt Thompson, the author of AFN and of the blog NSHipster.

A similar open source library for Swift is Ji. For Java or Android, you can use Jsoup.

Start

All preparations are done, so let's start coding. Create a new blank project. Note that you need to add App Transport Security Settings with Allow Arbitrary Loads set to YES in Info.plist to allow HTTP transmission.

[image: App allows Http.png]

Then use CocoaPods to add the third-party library: pod 'Ono'.

Here, the HTML data to be parsed comes from my blog. Create a Post class that inherits from NSObject to represent each article. Modify the .h file as follows:
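A minimal sketch of what such a header could look like. The property names title, postDate and postUrl are assumptions chosen for illustration; only the +getNewPosts method name appears later in this article.

```objc
// Post.h (sketch; the property names are illustrative assumptions)
#import <Foundation/Foundation.h>

@interface Post : NSObject

@property (nonatomic, copy) NSString *title;    // article title
@property (nonatomic, copy) NSString *postDate; // publication time, as text from the page
@property (nonatomic, copy) NSString *postUrl;  // link to the article

// Downloads the blog's HTML and returns the articles found in it.
+ (NSArray *)getNewPosts;

@end
```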
Import Ono in the .m file and add a constant for the URL.
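As a rough sketch, the top of Post.m might look like this. The constant name kUrlStr is hypothetical, and the address is an assumption based on the blog domain (http://BigPi.me) mentioned later in this article.

```objc
// Post.m (sketch)
#import "Post.h"
#import <Ono/Ono.h> // Ono's umbrella header, as installed by CocoaPods

// Page whose HTML will be parsed. Both the constant name and the exact
// address are assumptions made for this sketch.
static NSString * const kUrlStr = @"http://BigPi.me";
```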
Then we can use AFN to download the HTML at that URL, and use XPath to locate the nodes that represent each article. First open Firefox and FireBug, and click the FireBug element selector shown below.

[image: FireBug element selector.png]

Move the mouse over the page and click to select an article.

[image: Post data.png]

FireBug's HTML tree is now expanded, and we can see that each child tag contains the data for one article. We right-click their parent node and copy its XPath.

[image: Copy XPath.png]

The copied result is //*[@id="posts"]. Each child node under this node represents an article. Now let's use this XPath to get all the HTML data. Add the following method in Post.m:
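One way this method could be written is sketched below. To keep it short, it downloads the page with NSData's synchronous dataWithContentsOfURL: instead of AFN, so it should be called off the main thread. kUrlStr stands for the URL constant at the top of Post.m (a hypothetical name), and the Ono import is assumed to be in place.

```objc
// Post.m, inside @implementation Post (sketch)
+ (NSArray *)getNewPosts {
    NSMutableArray *posts = [NSMutableArray array];

    // Synchronous download, swapped in for AFN to keep the sketch small.
    NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:kUrlStr]];
    if (data == nil) {
        return posts;
    }

    NSError *error = nil;
    ONOXMLDocument *doc = [ONOXMLDocument HTMLDocumentWithData:data error:&error];
    if (doc == nil) {
        NSLog(@"Failed to parse HTML: %@", error);
        return posts;
    }

    // The node with id "posts" is the parent of all article nodes.
    ONOXMLElement *postsParent = [doc firstChildWithXPath:@"//*[@id='posts']"];
    for (ONOXMLElement *element in postsParent.children) {
        NSLog(@"%@", element); // log each article node
    }
    return posts;
}
```

At this stage the returned array is still empty; the method only logs each article node so the XPath can be checked.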
And call this method in ViewController.m:
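A minimal sketch of the call site. Running it on a background queue is an assumption made here so that a blocking download would not freeze the UI; the article does not say where the call is made.

```objc
// ViewController.m (sketch)
#import "ViewController.h"
#import "Post.h"

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];

    // getNewPosts may block while downloading, so keep it off the main thread.
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        NSArray *posts = [Post getNewPosts];
        NSLog(@"Fetched %lu posts", (unsigned long)posts.count);
    });
}

@end
```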
After running, check the console: we can already get the HTML of each article. Next we parse the specific data of each article. Switch to FireBug and expand the node of one of the articles.

[image: Article HTML node.png]

We can see that under the <h2 class="title"> node, the link tag contains the article's URL and title, and under the <div class="info"> node, a tag holds the time the article was published. We could right-click each of these nodes and copy its XPath, as before. But here we use relative XPath instead: every article has the same HTML structure, so our XPath expressions for the title and the publication time are written relative to each article node.
Next, let’s analyze the detailed data of each article. Add the following method to Post.m:
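A sketch of such a method, under stated assumptions: the method name postWithElement: and the property names are hypothetical, the title link is assumed to be an <a> directly under <h2 class="title">, and the date is assumed to be the first element under <div class="info">. Adjust both XPaths to the real page structure.

```objc
// Post.m, inside @implementation Post (sketch; requires the Ono import)
// Builds a Post from one article node. Both XPaths are relative to that node.
+ (instancetype)postWithElement:(ONOXMLElement *)element {
    Post *post = [[Post alloc] init];

    // Assumed path: the link inside <h2 class="title"> carries the URL and the title.
    ONOXMLElement *link = [element firstChildWithXPath:@"h2[@class='title']/a"];
    post.postUrl = [link valueForAttribute:@"href"];
    post.title = [link stringValue];

    // Assumed path: the first element inside <div class="info"> holds the date.
    ONOXMLElement *date = [element firstChildWithXPath:@"div[@class='info']/*[1]"];
    post.postDate = [date stringValue];

    return post;
}
```

Because the XPaths do not start with a slash, Ono evaluates them from the article node that was passed in, not from the document root.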
Then modify the +(NSArray*)getNewPosts method as follows:
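The modified method could look roughly like this. As before, kUrlStr and postWithElement: are hypothetical names used in this sketch, and the synchronous NSData download stands in for AFN.

```objc
// Post.m, inside @implementation Post (sketch)
+ (NSArray *)getNewPosts {
    NSMutableArray *posts = [NSMutableArray array];

    NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:kUrlStr]];
    if (data == nil) {
        return posts;
    }

    NSError *error = nil;
    ONOXMLDocument *doc = [ONOXMLDocument HTMLDocumentWithData:data error:&error];
    if (doc == nil) {
        NSLog(@"Failed to parse HTML: %@", error);
        return posts;
    }

    // Convert every child of the "posts" node into a Post object.
    ONOXMLElement *postsParent = [doc firstChildWithXPath:@"//*[@id='posts']"];
    for (ONOXMLElement *element in postsParent.children) {
        [posts addObject:[Post postWithElement:element]];
    }
    return [posts copy];
}
```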
Finally, the article URL we get from the HTML is a relative URL, such as /post/jazzhands/jazzhands-yuan-ma-shi-xian-fen-xi, so we prepend the domain name, http://BigPi.me, in the setter method:
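A minimal sketch of such a setter, assuming the property is named postUrl (a hypothetical name). The nil check is an addition for safety, since appending a nil string would raise an exception.

```objc
// Post.m, inside @implementation Post (sketch)
// Turns a relative path such as /post/... into an absolute URL string.
- (void)setPostUrl:(NSString *)postUrl {
    if (postUrl.length == 0) {
        _postUrl = nil; // nothing to prefix
        return;
    }
    _postUrl = [@"http://BigPi.me" stringByAppendingString:postUrl];
}
```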
We set a breakpoint at the position below to view the results:

[image: Code breakpoint.png]

Running it gives the following results:

[image: Crawl article data results.png]

So far we can use FireBug + Ono + XPath to parse HTML data. I used this method to fetch the HTML of our school's academic management system and built an app that tallies grades and calculates GPA.

Addendum

The demo for this article can be found at https://github.com/iShawnWang/BlogDemo/tree/master/ParseHTMLDemo