Zhihu Blue Ocean: A Guide to Mining 20 Million Traffic Opportunities

According to my (incomplete) local statistics on the keywords for which Zhihu ranks in Baidu's top 3:

Baidu PC keyword traffic: 127.43 million

Traffic Zhihu actually receives: 127.43 million x 0.15 (average click-through rate) = 19.11 million

And that is the PC side alone. A traffic opportunity of roughly 20 million is right in front of us.

The premise of making money through the Internet is to obtain traffic first, and traffic is more valuable than gold now.

The truth is, I have more than 5 friends around me who have made 60,000 to 250,000 yuan in profits in the past six months by relying on this opportunity.

We only need a pair of hardworking hands and a clear mind.


Why is there this traffic opportunity?

What exactly does traffic opportunity refer to?

How do we get traffic from it?

Now, let me open the door to this traffic for you.

Reading guide: Different from the various "cool articles" on the market, this article uses a narrative approach to describe how to "go from 0 to 1" according to my actual thinking. Friends need to think while reading. It is recommended to take a whole block of time (10-20 minutes) to read.

1. The game of capital

There is a saying circulating in the "jianghu", which roughly goes like this:

Baidu Daddy, the webmaster harvester and traffic interception expert, invested in Zhihu in August 2019 in a round led by Kuaishou. Afterwards, Baidu raised Zhihu's search weight, and Zhihu's traffic kept climbing.

When I read this passage, I questioned the amount of information I received. Why?

Friends who are familiar with communication should know a basic principle:

For everything, we should try to focus on factual judgments rather than value judgments

Because factual judgments are conclusive and can reach consensus, while value judgments depend on perspective and position and can be interpreted in many ways.

The investment event here is a description of facts, and the subsequent impact is a description of value.

However, there are many versions of such a simple factual description on the Internet, some of which state the wrong time point, and some of which state the wrong investor.

After verification, you will find that Baidu has also invested in Kuaishou, which may be another opportunity?

Sometimes ideas are derived from facts.

So regarding value judgment, is there really a traffic increase? Is it really an increase in power?

Let's verify with data directly (here we take Aizhan's data for the half year following the investment in August 2019; a slight error is not a big deal):

Keyword count data

From the keyword count data, we can observe the following two points:

Since mid-November 2019, traffic has grown dramatically, with the number of ranking keywords rising from 300,000 to 2.7 million, a 9-fold increase!

Since July 2020, traffic growth has slowed down, but the upward trend continues

So how did this traffic grow?

Indexing data

From the indexing data, we can observe the following two points:

Although the two datasets are measured on different bases, the number of indexed pages did not increase during the period of rapid traffic growth. In other words, pages that were already indexed moved up in the rankings for their search terms, which is solid proof of the weight increase.

Once the indexed pages cannot cover any more search terms, Baidu's targeted traffic to Zhihu will hit a ceiling, hiccup~

The analysis above can easily feel like "nonsense", because its result is basically the same as the information we received in the first place, and our brain does not bother to process the same information twice.

This is exactly the difference between the two ways of thinking: "induction" and "deduction".

Without verification, inductive thinking silently assumes that Baidu's weight increase is real, so every subsequent action rests on an assumption

In deductive thinking, every step rests on a premise that has been confirmed as "true". Think about what would happen if the analysis had come out the other way.

In this era of information explosion, we really need the ability to filter information, and independent thinking is particularly important. However, independent thinking does not mean that we have to put forward different opinions on everything.

Effective thinking must be based on sufficient knowledge accumulation, otherwise it is just blind thinking.

If you are in an unfamiliar field, learning from your peers is still a good choice

So although traffic growth has slowed down, Zhihu has not "swallowed" all of this huge traffic. There is certainly still an opportunity to ride this bonus period to gain traffic and make money.

Let's keep going!

2. SEO?

Indexing? Ranking? Weight raising? If these words raise questions, then you may not know much about SEO. Here is a brief description

SEO means adjusting a website according to the rules of search engines (hereinafter SE) to improve its ranking on the target search engine and thereby obtain traffic.

Indexing (inclusion): SE's crawler system crawls web pages and caches them on its servers

Weight: SE's comprehensive score for the site, the main basis for ranking

Ranking: The position of the cached page in the search results

All three of these change dynamically

So, how is search traffic generated?

First, the user enters a search term (query) and initiates a search request to the SE. The SE ranks the cached pages through an algorithm and returns them to the front end (browser). The user observes the search results and clicks on the pages from the search results according to his or her preferences.

If a page wants to have traffic, it must first be included (cached by SE), then ranked high (top 10), then people must search for it (search volume), and finally it must be attractive enough to make people want to click on it (title + description)

On click-through rate, Zhihu has an important built-in advantage. After years of positioning and development as a "knowledge-based" platform, users have built up a natural trust in the Zhihu brand. This can give it a higher click-through rate than average even when it does not rank in the top 3.

This time, Baidu sends Zhihu targeted traffic, and Zhihu converts that traffic more efficiently. It's wonderful!

3. Blue Ocean Problem + Blue Ocean Traffic

So where are our opportunities?

Honghong was short of money recently, so he searched Baidu for "how to make money quickly" (real data, just for example), and found that a certain page on Zhihu ranked first.

Then he clicked into the page with a trembling hand, and looked at the empty page, with a slight change in his facial expression.

This is such a bummer!

5 years of online earning experience has given me a keen sense of smell, this is an opportunity

So I gathered a million-level set of keywords plus Zhihu data, and after screening and analysis, I found that a considerable number of question pages have search traffic but show the following problems:

Answers do not resolve the search need

Low-quality answers

Few answers

The top answer has only N likes

So can we find such questions, write our own answers, and then make them rank higher and direct traffic to our own carriers (WeChat/official accounts, etc.)?

The answer is yes!

To sum up, we call the problems with search traffic and low competition "blue ocean problems", and the collection of these problem traffic is collectively called "blue ocean traffic"

Here's a little bomb, friends, try it first~ (SE rankings are dynamic, your actual search may be slightly different. In addition, considering the openness, I roughly selected an example)

BOOM! That's right, the search term is "gay". The same question ranks second on both PC and mobile. The average monthly search volume is 447,000 on mobile and 95,000 on PC, which adds up to about 540,000. The click-through rate at rank 2 is about 20%, which means this question gets roughly 100,000 visits of SEO traffic per month. What about the answers?

The top answer has only 58 likes. Is there a chance to move up? Yes! Is there a way to monetize it?

4. Break through cognitive limitations

Some of you may get impatient here and start thinking about how your industry should operate.

But what if there is no blue ocean traffic in your industry? Why do you have to do it in a field you are familiar with?

Traffic experts always think from a big picture perspective, that is, they think about problems from a global perspective.

This time, we want to analyze the distribution of Zhihu's overall search traffic. We will go wherever there is blue ocean traffic, not just to a certain question or industry.

Even Zhihu Good Things can be completely based on the thinking of blue ocean traffic

Always remember that we have only one goal, which is to make money

At the same time, this is also the main idea behind my writing the public account [TACE] (Traffic ACE, Traffic Master), but later I was busy with projects and rarely posted articles, ahem….

I have talked about a lot of things before, because I want to make it clear about the "Tao" level, that is, why we do it this way; and the "method" is dead, if the rules change, the method will immediately become invalid.

For example: When Tesla was first established, the cost of batteries was 10 times lower than that of the market at that time. Why was CEO Musk able to do this?

That's because his philosophy is "physics thinking" (first principles): breaking things down into their smallest units to find solutions (there is a TED talk on this)

However, 80% of people like to get the method directly, why?

Dad said he heard from his grandfather that hundreds of thousands of years ago, when humans were still in the hunting stage, the brain was born in order to survive.

The evolution of the brain takes millions of years, and the human race has only been around for about 200,000 years, which means we are still using the "old brain"

One of the most notable characteristics of the "old brain" is the principle of least effort. Humans are naturally inclined toward behaviors that consume less brain power, that is, they will not use their brains if they can avoid it, while learning principles requires a higher level of brain power.

Including me, whenever I feel too lazy to use my brain, I mock myself as a primitive man, ahem…

Then, let us step into the "battlefield" step by step.

5. Build a million-level vocabulary

The word library is a collection of user search terms and word attributes

We try to collect keywords from N channels as much as possible, because each channel or third-party platform has its limitations.

In the eyes of traffic experts, what lies in the vocabulary is not individual keywords, but RMB.

From the perspective of search traffic, in most cases, adding words equals adding traffic.

If you can find words that others can't find, you can get traffic that others can't get, and thus make money that others can't make.

Regarding the storage format, I personally recommend plain CSV: a local file with commas as delimiters. Compared with MySQL-like databases, it is much more convenient to query and analyze with Bash shell.

Channels for getting words:

5118, Aizhan, Chinaz (Webmaster's Home).

Let me use 5118 as an example

5.1 Mother Word Acquisition

1) Download

Download Zhihu's Baidu PC keywords and mobile keywords separately and process them separately

Friends without a membership can buy access on Taobao. Friends with the enterprise version are advised to export all the data.

In the next steps, we will start to involve some programming knowledge:

Bash shell (Linux) + Python

Because conventional tools can no longer meet the needs of this data calculation, we have to use the "mysterious" power of programming

I have developed all of them myself, and some simple Bash shell command lines are given directly in the article

But I believe that this alone will discourage 80% of people. But including me, who hasn't come up step by step from being a rookie?

Programming is really not that difficult, trust me! If you can, tell yourself to do that 20%

Also remember that we are not trying to become a professional programmer, but programming skills that can meet our current needs are good enough.

2) Initial processing

Transcoding (GBK > UTF-8): the data exported from 5118 is GBK-encoded, while Linux tools expect UTF-8

Only the keywords are output and no other columns are used, because the accuracy of third-party data is really unsatisfactory. For a site of 5118's scale, at least 100 million records would need updating every day, and that cost is real.

Only keywords ranked in the top 100 are kept. First, because data accuracy is low, we have to verify the data ourselves later anyway. Second, as mentioned earlier, rankings are dynamic and Baidu is raising Zhihu's weight, so there is a time gap between obtaining the data and verifying it, and the ranking may have changed during that gap.

bash shell (the grep patterns should match the header text and the out-of-top-100 marker exactly as they appear in your export):

iconv -c -f GB18030 -t UTF-8 input_file | grep -Ev "Entire domain Baidu PC keyword ranking list|Baidu index|outside 100" | awk -F, '{print $1}' > output_file

3) Keyword cleaning

Remove special symbols

This step is easily overlooked. Many people naturally trust the keyword data produced by different channels (including Baidu), but two variants of "traffic experts" that differ only by a special symbol can have very different search volumes.

Year replacement, for example, 2010 is replaced with 2020

Chinese length>=2 (optional)
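
The three cleaning rules above could be sketched in Python roughly as follows. This is only a minimal sketch under my own assumptions: "special symbols" is taken to mean anything other than Chinese characters, ASCII letters, and digits, a "year" is a standalone four-digit number from 2000 to 2019, and the function name clean_keyword is hypothetical.

```python
import re

# Assumption: a "year" is a standalone four-digit number from 2000 to 2019.
_OLD_YEAR = re.compile(r"(?<!\d)20[01]\d(?!\d)")
# Assumption: "special symbols" are all characters other than Chinese
# characters, ASCII letters, and digits.
_SYMBOLS = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff]+")
_CHINESE = re.compile(r"[\u4e00-\u9fff]")


def clean_keyword(keyword, current_year="2020", min_chinese=2):
    """Return the cleaned keyword, or None if it should be dropped."""
    # Replace old years first, while digits are still separated by symbols.
    cleaned = _OLD_YEAR.sub(current_year, keyword.strip())
    # Remove special symbols (including whitespace).
    cleaned = _SYMBOLS.sub("", cleaned)
    # Optional rule: require at least `min_chinese` Chinese characters.
    if len(_CHINESE.findall(cleaned)) < min_chinese:
        return None
    return cleaned
```

Set min_chinese=0 to skip the optional length rule.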

4) Remove sensitive words

You know which words are prohibited. Here we use the DFA algorithm, which takes less than 0.1s per keyword on average.
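
For friends who have not met the DFA approach before, here is a minimal sketch of the idea, not my production code: the sensitive words are compiled into a character trie, and every keyword is scanned against it. The class name, method names, and sample words are all hypothetical.

```python
class SensitiveWordFilter:
    """Character trie over the sensitive words, scanned from every start position."""

    _END = object()  # marker key: a sensitive word ends at this node

    def __init__(self, words):
        self.root = {}
        for word in words:
            if not word:
                continue  # ignore empty entries in the word list
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[self._END] = True

    def contains_sensitive(self, text):
        """Return True if any sensitive word occurs anywhere in `text`."""
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                node = node.get(ch)
                if node is None:
                    break  # no sensitive word continues with this character
                if self._END in node:
                    return True
        return False


def remove_sensitive(keywords, word_filter):
    """Keep only the keywords that contain no sensitive word."""
    return [k for k in keywords if not word_filter.contains_sensitive(k)]
```

Build the filter once from your own word list and reuse it for the whole file; construction is the expensive part, matching is cheap.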

5) Deduplication

Deduplication is a very important step, but it is memory-hungry: the file you want to deduplicate cannot be larger than the available memory.

The current solution is sort + uniq: first split the target file with split, sort the pieces one by one, and then merge them with sort + uniq to remove duplicates.

Memory usage is not reduced much this way, but computational efficiency improves.

Simplified bash shell version (write to a different file; redirecting into the input file would empty it before it is read):

sort input_file | uniq > output_file

bash shell big data version:

#!/bin/sh
# Sketch of the split + sort + merge steps described above.
# The chunk size of 1,000,000 lines is an assumed value; tune it to your memory.
input="$1"
output="$2"
# Split the input into chunks inside the words_split folder
split -l 1000000 "$input" words_split/part_
# Sort and deduplicate each chunk in place
for f in words_split/part_*; do
    sort -u -o "$f" "$f"
done
# Merge the sorted chunks and drop duplicates across chunks
sort -m -u words_split/part_* > "$output"
# Clean up the temporary chunks
rm -f words_split/part_*

Save it as filename.sh, create a words_split folder in the current directory, and then run the following command line, giving the paths of the input and output files

sh filename.sh input_file output_file

OK, the processing is complete. Now we have obtained two very "clean" mother word data, namely, Zhihu Baidu PC keywords and mobile keywords

5.2 Word Expansion

Word expansion means expanding the mother words we obtained, because one page may hit multiple related keywords

We can further assume that the words obtained from third-party platforms are only the ones those platforms managed to find, a subset of all the words Zhihu currently hits.

We need to find as many of the remaining words as possible, so that we can estimate the Baidu traffic of a question page more accurately.

Suppose there are two questions A and B. In your vocabulary, A hits 50 keywords with a total traffic of 10,000, and B hits 10 keywords with a traffic of 100.

Then you might ignore problem B and only deal with problem A.

However, question B actually hit 100 keywords and had a traffic of 100,000

In this way, due to the incompleteness of the data, the information gap is caused, and the opportunity to obtain these traffic is directly missed.

For example:

After expansion, this page hits 47 keywords in total, and the total PC + mobile traffic is 1.32 million. There were so many ads on it that Zhihu was forced to show a risk control reminder. Part of the data is displayed below

How about it? Are you beginning to feel the charm of data? Cheer up, let's keep going!

Since we only do Baidu traffic, we will only use Baidu to expand

1) Related search + drop-down box word capture

Many people only know how to use these two channels, but do not know the nature of these two channels:

Related Searches

Related searches are horizontal expansions, most of which are related expansions across keyword topics. There may be serious drift in topics. To ensure relevance, only one round of crawling is performed.

Drop-down box

The drop-down box is vertically expanded, and most of them add affixes to the end of the keywords.

The significance of clarifying the nature of the channel is that for text data such as keywords, there are only two directions for expansion. Other channel expansion methods are superpositions or variations of these two basic directions.

Because different devices may produce different data, the PC and mobile mother words need to be expanded separately, each on its own end.

That is, PC mother words are used to capture PC related searches + PC drop-down suggestions, and mobile mother words to capture mobile related searches + mobile drop-down suggestions

2) Baidu promotion background word expansion

The path is: Register/Login > Enter Search Promotion > Promotion Management > Keyword Planner > Keywords

Registration is free, and you can also use Aichi SEM tools/Duniu SEO tools, etc.

3) Word processing

First merge the keywords from each channel into one file

bash shell:

cat file1.txt file2.txt > all.txt

Then repeat the keyword cleaning and deduplication part of [5.1 Mother Word Acquisition]

5.3 Obtain keyword traffic

The keyword planner in Baidu promotion backend is also used, but the "traffic query" function is used.

This is the traffic data given by Baidu. The previous data was based on daily search volume, but now it has become monthly search volume, but it doesn’t matter.

Some friends may wonder why not grab the ranking and filtering data first to reduce the pressure of data volume in the next step?

Because the keyword planner can query 1,000 keywords at a time! 100,000 keywords only need to be queried 100 times!

And the actual test proves that the cookie obtained once can be used across days and maintain a valid login for 10+ hours (promise me, please be gentle)
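
The batching arithmetic above is simple enough to sketch. Assuming the keywords are already loaded into a list, a hypothetical helper like batches() cuts them into groups of at most 1,000 for submission; the submission itself is not shown here.

```python
def batches(items, size=1000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With 100,000 keywords this yields exactly 100 batches; a list that does not divide evenly simply ends with a shorter last batch.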

1) Traffic data acquisition

Post keyword data by simulating login

2) Data screening

Each end only retains keywords with search volume >= N (the value is customizable)

You can filter while acquiring the data, or do the filtering as a separate step afterwards. I personally recommend the latter: if the threshold turns out to be unreasonable, there is still room to re-screen.

bash shell:

awk -F, '$2>=100' file.txt > file_new.txt

5.4 Get keyword ranking

Get the ranking data of each end separately, and only keep results whose URL matches this pattern:

https://www.zhihu.com/question/{question ID}

For URLs of this form, store the keywords that rank in the top 10 together with the corresponding question URL
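
As a rough sketch of this filter, assume each ranking record is a (keyword, rank, url) tuple and that question IDs are purely numeric. Both are my assumptions for illustration, and the function name is hypothetical. Answer pages and column articles are deliberately excluded because they do not match the question URL pattern.

```python
import re

# Assumption: question IDs are numeric; an optional trailing slash is tolerated.
QUESTION_URL = re.compile(r"https://www\.zhihu\.com/question/(\d+)/?")


def keep_question_rankings(rows, max_rank=10):
    """Keep (keyword, rank, url) rows that are question pages ranked in the top `max_rank`."""
    kept = []
    for keyword, rank, url in rows:
        match = QUESTION_URL.fullmatch(url)
        if match and 1 <= rank <= max_rank:
            # Store the normalized question URL without a trailing slash.
            question_url = "https://www.zhihu.com/question/" + match.group(1)
            kept.append((keyword, rank, question_url))
    return kept
```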

5.5 Available traffic

Keyword traffic is not equal to the actual traffic that Zhihu question page can obtain

As mentioned earlier, search traffic has another click step before reaching the page, so we should calculate the available traffic. The formula is:

Available traffic = traffic X click rate

The click-through rate is estimated based on the ranking, but Baidu seems to have never released click-through rate data, ahem…

But we found a Google click-through rate data released by Sistrix on July 14, 2020, which analyzed more than 80 million keywords and billions of search results.

Although it is only statistics from the mobile terminal, it is not a big deal

Original text (English):


After calculating the available traffic for each keyword, our vocabulary is complete. Niceee!
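
The formula is easy to sketch in code. Note that the click-through rates in the table below are illustrative placeholders that I made up for the sketch, not the Sistrix figures; substitute whatever rates you trust. The names ASSUMED_CTR_BY_RANK and available_traffic are hypothetical.

```python
# Illustrative click-through rates by rank. These are placeholder assumptions,
# NOT the Sistrix figures; replace them with the rates you trust.
ASSUMED_CTR_BY_RANK = {
    1: 0.30, 2: 0.20, 3: 0.10, 4: 0.08, 5: 0.07,
    6: 0.05, 7: 0.04, 8: 0.03, 9: 0.03, 10: 0.02,
}


def available_traffic(search_volume, rank, ctr_by_rank=None):
    """Available traffic = search volume x click-through rate at that rank."""
    table = ASSUMED_CTR_BY_RANK if ctr_by_rank is None else ctr_by_rank
    # Ranks missing from the table (for example outside the top 10) count as zero.
    return search_volume * table.get(rank, 0.0)
```

Taking the earlier example (447,000 mobile plus 95,000 PC searches per month at rank 2, with the 20% rate quoted there), the result is about 108,000 visits per month.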

6. Zhihu data acquisition

The purpose of data acquisition is to make a preliminary judgment of a question's difficulty from the following dimensions (they correspond to [8.1 Data Filtering])

Data quality matters more than quantity. Too much data will only interfere with judgment.

Question views

Question followers (Zhihu on-site traffic)

Question creation time

Number of answers

Number of likes on the top answer

Word count of the top answer

Time of the top answer

So far, we have prepared all the basic data we need. Now you should have a keyword file of Baidu + Zhihu data. Good job!

If you have made it this far, I believe I would be very happy to meet a friend like you^_^

7. Data Analysis

7.1 Keyword Grouping

Faced with a large amount of disorganized data, we need to group related keywords and their corresponding question pages together in the form of keyword grouping.

1) jieba word segmentation

Use the Python jieba module to split each keyword into N terms. For example, "流量高手" is split into "流量" + "高手". Keywords containing the same term are considered one group.

2) Remove duplicate terms

Refer to the deduplication part of [5.1 Mother Word Acquisition]

3) Term data calculation

Match each term against the keywords and calculate the number of matching keywords (term frequency) and their total available traffic

SEO friends may feel familiar with this method. This method is similar to the "inverted index" of search engines. We actually use term as the index to classify Zhihu URLs.
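
The term index described above can be sketched like this. The segmenter is passed in as a function so the sketch runs without extra packages: in practice you would pass jieba.lcut, while whitespace_segment below is only a hypothetical stand-in for demonstration. The row layout (keyword, available traffic, question URL) is also my assumption.

```python
from collections import defaultdict


def whitespace_segment(keyword):
    """Stand-in segmenter for the demo; in practice pass jieba.lcut instead."""
    return keyword.split()


def build_term_index(rows, segment):
    """Build term -> {frequency, traffic, urls} from (keyword, traffic, url) rows."""
    index = defaultdict(lambda: {"frequency": 0, "traffic": 0, "urls": set()})
    for keyword, traffic, url in rows:
        # Count each term once per keyword, even if it appears twice in it.
        for term in set(segment(keyword)):
            entry = index[term]
            entry["frequency"] += 1
            entry["traffic"] += traffic
            entry["urls"].add(url)
    return dict(index)
```

Sorting the resulting terms by their traffic value gives a first view of which groups carry the most available traffic.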

Here's some random demo data:

7.2 Manual Classification

Grouping directly by term is grouping from the string perspective: simple and crude, but blind to semantic relationships

For example, the two terms "炒股" and "股票" both belong to the finance category, but term grouping puts them in two groups, so a final manual review is needed.

After classification, add up the term frequencies and total available traffic of the terms in each category to get the category totals

Then record it in the form of a mind map/table. Here is an example of a mind map:

But remember, don't group just for the sake of grouping. Terms that have no obvious relevance should not be grouped together, otherwise you will be asking for trouble.

8. Question screening

8.1 Data Filtering

Now we can pick a term from the category with the most available traffic. In the keyword file produced by sections [6-7], use Bash shell, or a search on the keyword column of the CSV in Excel, to find the keywords containing this term, and then filter by the indicators. The filtering values below are for reference only.

Question views (auxiliary)

Question followers (auxiliary)

Question creation time (auxiliary)

Number of answers <= 50

Number of likes on the top answer <= 100

Word count of the top answer < 800

Time of the top answer (auxiliary)

Available traffic >= 100
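
The hard indicators above can be applied in a few lines. This is a sketch only: the dictionary field names are hypothetical, and the default thresholds are just the reference values from the list.

```python
def is_blue_ocean(question, max_answers=50, max_top_likes=100,
                  max_top_words=800, min_traffic=100):
    """Apply the hard indicators; auxiliary indicators are left for manual review."""
    return (
        question["answers"] <= max_answers
        and question["top_likes"] <= max_top_likes
        and question["top_words"] < max_top_words
        and question["available_traffic"] >= min_traffic
    )


def screen_questions(questions, **thresholds):
    """Filter by the hard indicators, then sort by available traffic, highest first."""
    kept = [q for q in questions if is_blue_ocean(q, **thresholds)]
    return sorted(kept, key=lambda q: q["available_traffic"], reverse=True)
```

Keeping the thresholds as parameters makes it cheap to re-screen when a value turns out to be too strict or too loose.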

For example, after screening with the hard indicators, if a question page's views are far lower than its available traffic, it has few followers, the question was created recently, and the top answer was posted recently, then this type of question needs to be marked.

But why? Friends, you might as well think about it yourself first

Well, let me tell you: the audience in each category is limited. If the above conditions are reversed, part of the traffic has probably already been missed, so we must be ready to seize the opportunity while it is fresh.

After the screening is completed, you can sort in descending order by conditions such as [Available Traffic] or [Number of likes on the top answer], and the blue ocean questions will be clear at a glance

8.2 Manual screening

Manual analysis mainly handles content questions that data cannot settle, namely whether the top answer fails to satisfy the need behind the question. The main types of answers are:

1) The explicit need is met directly, but the user's implicit needs are not, so there is room for expansion


Q: “How often should a car be serviced?”

A: “I usually do maintenance once a quarter.”

A(new): "Different brands of cars have different maintenance times. I will list all brands below, maintenance items, engine oil selection, and maintenance pitfalls."

2) Indirect satisfaction

I just happened to find one, as shown above.

The answer explains what the keystroke wizard is, but does not show how to write the script

I believe that by now, you have found N problems in N categories, and then immediately start to analyze the problems > make an outline > xxxx…..

Stop! Please stop your behavior immediately, we still have one last step

9. Traffic Tracking

The last step of the Long March is very important, very important, very important

We mentioned 2 points earlier:

In the Keyword Planner of the Baidu promotion backend, traffic is measured monthly and is an estimate.

SEO page ranking is dynamic

This may cause the results to be unstable. You worked hard to collect data, write answers, and get rankings, but in the end there is no reading volume?

Therefore, we need to monitor how the page views grow to determine whether the page actually gets traffic, how much traffic it can get, and ultimately decide whether to answer these questions.

The monitoring interval can be one day, or finer, every N hours. How long to monitor is up to you. Of course, the longer the monitoring period, the more accurate the result.

For example, if the available traffic for a question is 150,000, then the average available traffic per day is about 5,000, and the available traffic for 3 days (excluding holidays) is 15,000.

Record the page views and compare them. As long as the fluctuation is not too large, it can be included in our answer list.
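
The comparison can be sketched as follows, assuming the page views of a question are recorded once per day. The 50% tolerance for "fluctuation is not too large" and the 30-day month are my own assumptions, and the function names are hypothetical.

```python
def daily_growth(view_counts):
    """Differences between consecutive daily page view totals (oldest first)."""
    return [later - earlier for earlier, later in zip(view_counts, view_counts[1:])]


def worth_answering(view_counts, monthly_available_traffic,
                    tolerance=0.5, days_per_month=30):
    """True if every day's growth stays within `tolerance` of the expected daily traffic."""
    expected_per_day = monthly_available_traffic / days_per_month
    growth = daily_growth(view_counts)
    if not growth:
        return False  # fewer than two records: nothing to compare yet
    return all(
        abs(g - expected_per_day) <= tolerance * expected_per_day
        for g in growth
    )
```

For the example above, 150,000 per month means about 5,000 expected per day, so recorded growth of 5,200, 4,700, and 5,200 over three days passes, while growth of around 50 per day does not.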

10. Last

If we raise our perspective to the level of the whole marketing process, we will find that acquiring blue ocean traffic is only the first step. Other parts, such as answer ranking, traffic paths, and monetization, come after it.

There are also many methods and techniques that can help us better utilize blue ocean traffic, such as data cross-calculation, advanced gameplay, etc.

However, expanding on all of this would take a lot of space. Due to time and energy constraints, we will talk about it next time.

Author: CashWar

Source: TACE
