Identify the three levels of channel fraud from operational data!

Judging how good, or how cunning, a channel is, and whether its users are real, whether they come from an offer wall, and whether they were generated by bots or by paid human click farms, is the biggest headache for operations staff and often for the whole company. It comes down to cost, and cost is a matter of life and death for a business.

So how do we tell real users from fake ones and manage channels effectively? From an operational perspective, the best strategy would of course be to pay channels according to the comprehensive value of the users they bring in. What is a user's comprehensive value? It is the total value the user creates for the company, which includes but is not limited to:

  • Direct profit value: purchases that generate revenue and potential profit;
  • Content value: for example, producing positive, high-quality content that indirectly creates value;
  • Communication value: for example, sharing the product in ways that bring in other users;
  • Derived value: time spent or attention, the advertising revenue it generates, and so on.

Every app sits in a different industry and weights these four values differently. So why do we, the app operators, never settle with channels on this basis, and instead negotiate and settle on the most easily faked metrics, such as downloads and activations? That is a separate question, which we will discuss in another article.

Because activation is the settlement point, the time window is very short, which makes telling real users from fake ones a huge challenge.

In the face of this challenge, three levels of anti-cheating capability have emerged.

The first level: wise after the event, clueless beforehand

To put rough numbers on this level: it takes more than 7 days to detect cheating, and about 40% of fake users can be caught, while the remaining majority of the fraud goes undetected. In plain terms, you know you have been cheated, but you don't know by how much. The methods at this level generally go like this:

First, look at the retention rate

From long experience, operators at this level have found that channels padding their volume tend to inject fake user activity at the key checkpoints: the next day, day 7, and day 30. As a result, the app's data at those checkpoints is significantly higher than at other times. The retention curve of real users is a smooth, roughly exponential decay. If the curve instead shows abrupt rises and falls, it basically means the channel has tampered with the data.
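The retention check can be sketched in a few lines. This is a minimal illustration, not a production detector: the function name `find_retention_spikes`, the `tolerance` value, and the sample figures are all assumptions made for the example. It relies only on the observation above that organic retention should decay, so any day on which retention rises noticeably is treated as suspicious.

```python
def find_retention_spikes(retention, tolerance=0.02):
    """Return the days whose retention is noticeably higher than the previous day's.

    retention: dict mapping day number (1, 2, 3, ...) to the share of the
    cohort still active on that day. Organic retention should only decay,
    so a rise larger than `tolerance` (an assumed value) is flagged.
    """
    days = sorted(retention)
    spikes = []
    for prev, cur in zip(days, days[1:]):
        if retention[cur] - retention[prev] > tolerance:
            spikes.append(cur)
    return spikes


# Hypothetical channel: retention jumps on day 7, a typical checkpoint.
channel = {1: 0.40, 2: 0.31, 3: 0.26, 4: 0.22, 5: 0.19, 6: 0.17, 7: 0.30, 8: 0.15}
print(find_retention_spikes(channel))  # prints [7]
```

Note that this sketch compares each day only with the day before it, so it cannot flag an inflated first data point; comparing against a fitted decay curve would be needed for that.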

Second, look at the user terminal information

  • Ranking of low-priced devices: rank the devices of the channel's new or active users and judge the ranking from experience. If a low-priced device ranks abnormally high, treat it as an anomaly and raise an alert.
  • Share of new operating system versions: after years of being burned by channels, operators found that many volume-padding studios lag behind in adapting to new operating system versions. Compare the operating system distribution of a channel's users with that of mobile Internet users as a whole.
  • Distribution and regularity of registered nicknames: many low-end fake registrations use nicknames that follow an obvious pattern. Every operator has run into this.
  • Location distribution of registered mobile phone numbers: most operators have seen this too. The phone numbers of users from a given channel not only belong to a single carrier in a single city, they are even consecutive numbers.
  • Network access type: for example, whether the shares of Wi-Fi, 2G, 3G, and 4G usage look normal.
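As one concrete example of these terminal checks, the consecutive phone number pattern is easy to test for. The sketch below is only an illustration under assumptions: the function name and the sample numbers are invented, and what run length counts as suspicious is left to the operator.

```python
def longest_consecutive_run(phone_numbers):
    """Length of the longest run of registered numbers that differ by exactly 1."""
    values = sorted({int(p) for p in phone_numbers})
    best = run = 1 if values else 0
    for prev, cur in zip(values, values[1:]):
        # Extend the current run on a +1 step, otherwise start a new run.
        run = run + 1 if cur - prev == 1 else 1
        best = max(best, run)
    return best


# Hypothetical registrations from one channel.
numbers = ["13800000001", "13800000002", "13800000003", "13912345678"]
print(longest_consecutive_run(numbers))  # prints 3
```

Among organic users, long runs of consecutive numbers should be rare, so a channel with many such runs deserves a closer look.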

Summary: operators at this level rely heavily on personal experience. Their tools and methods are unprofessional, their work is inefficient, and they waste manpower and resources. Worse, they discover problems late, and even slightly more sophisticated cheating goes undetected.

The second level: mending the fold after the sheep are lost, with some loss unavoidable

At this level, it takes about 2-7 days to tell real from fake, and roughly 40%-70% of the fake volume is identified. In plain terms, operators can identify a large share of fake users with reasonable certainty, and if the business terms are favorable, losses can be contained. They achieve this by adopting some professional methods:

First, single indicators

  • IP: whether it is a blacklisted IP or a proxy IP, checked against a large blacklist database;
  • IMEI: whether it is a blacklisted IMEI;
  • Mobile phone number: whether the number is invalid or on a blacklist;
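A minimal sketch of single-indicator checks follows. The blacklists here are hypothetical placeholders; a real deployment would load them from a maintained database. The Luhn check on the IMEI is an addition for illustration and is not something the article prescribes: an IMEI is 15 digits whose last digit is a Luhn check digit, so a malformed one can be rejected before any blacklist lookup.

```python
import ipaddress

# Hypothetical blacklists, for illustration only.
BLACKLISTED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLACKLISTED_IMEIS = {"490154203237518"}


def ip_is_blacklisted(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLACKLISTED_NETWORKS)


def imei_is_well_formed(imei):
    """True if the IMEI is 15 digits and passes the Luhn checksum."""
    if len(imei) != 15 or not imei.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(imei)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def single_indicator_flags(ip, imei):
    """Return the list of single-indicator flags raised for one user."""
    flags = []
    if ip_is_blacklisted(ip):
        flags.append("ip_blacklisted")
    if not imei_is_well_formed(imei):
        flags.append("imei_malformed")
    elif imei in BLACKLISTED_IMEIS:
        flags.append("imei_blacklisted")
    return flags


print(single_indicator_flags("203.0.113.7", "490154203237518"))
```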

Second, group indicators

  • IP: whether the geographic distribution of user IPs matches the prior data, covering both the distribution across domestic provinces and the distribution across overseas markets;
  • IMEI: whether the geographic distribution of user IMEIs matches the prior data, and whether the manufacturers those IMEIs represent are distributed randomly;
  • OS: whether the channel's operating system versions show the expected randomness and statistical profile, compared against the prior data;
  • Model: whether the model distribution matches the prior data and the share of recent smartphone shipments;
  • Location information: whether the share of users with location enabled, and the geographic distribution of the locations obtained, match the prior data, the geography the channel promised, and the app's actual distribution;
  • Carrier: whether the carrier distribution looks random and matches the normal distribution of domestic carriers and the distribution of overseas carriers;
  • Network access method: whether the shares of Wi-Fi, 2G, 3G, and 4G follow the same trend and characteristics as the prior data.
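All of the group indicators above share one shape: compare a channel's observed distribution with a prior distribution. One simple way to put a number on the gap is the total variation distance, sketched below. The choice of metric, the function name, and the sample figures are assumptions for illustration; the article does not say which statistic is used, and a chi-square test would be another reasonable option.

```python
def total_variation_distance(observed_counts, prior_shares):
    """Half the sum of absolute differences between observed and prior shares.

    observed_counts: dict of category -> user count for the channel.
    prior_shares: dict of category -> expected share (summing to 1).
    Returns a value between 0 (identical) and 1 (completely disjoint).
    """
    total = sum(observed_counts.values())
    if total == 0:
        raise ValueError("no observations")
    categories = set(observed_counts) | set(prior_shares)
    return 0.5 * sum(
        abs(observed_counts.get(c, 0) / total - prior_shares.get(c, 0.0))
        for c in categories
    )


# Hypothetical prior for network access method, and a channel that is
# almost entirely on Wi-Fi.
prior = {"wifi": 0.60, "4g": 0.30, "3g": 0.08, "2g": 0.02}
channel_counts = {"wifi": 95, "4g": 5}
print(round(total_variation_distance(channel_counts, prior), 2))  # prints 0.35
```

The threshold above which a channel counts as anomalous has to be calibrated on each operator's own historical data.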

Third, information consistency

Verify that the attributes a device reports are consistent with one another, including CPU, manufacturer, MAC address, IMEI, model, and operating system.
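A consistency check of this kind might look like the sketch below. The reference catalog, the model and vendor names, and the field names are all hypothetical; in practice the catalog would come from a real device database. The idea is simply that a forged or emulated device often reports a combination of attributes that no real device has.

```python
# Hypothetical reference catalog: model -> (manufacturer, CPU architecture).
DEVICE_CATALOG = {
    "Model-A1": ("VendorA", "arm64"),
    "Model-B2": ("VendorB", "arm64"),
}


def consistency_issues(report):
    """Compare a device report (dict with 'model', 'manufacturer', 'cpu')
    against the catalog and return the list of mismatches found."""
    expected = DEVICE_CATALOG.get(report["model"])
    if expected is None:
        return ["unknown_model"]
    manufacturer, cpu = expected
    issues = []
    if report["manufacturer"] != manufacturer:
        issues.append("manufacturer_mismatch")
    if report["cpu"] != cpu:
        issues.append("cpu_mismatch")
    return issues


# An emulator that claims to be a phone model but reports a desktop CPU.
suspect = {"model": "Model-A1", "manufacturer": "VendorA", "cpu": "x86_64"}
print(consistency_issues(suspect))  # prints ['cpu_mismatch']
```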

Generally speaking, general-purpose statistical analysis tools cannot perform the distribution checks above; these require professional anti-cheating or security software.

Operators at this level have moved beyond manual processing and reliance on personal experience and onto a professional path built on algorithms and data. The problem is that algorithmic capability and data accumulation vary widely between companies, so results vary greatly. Channels, in turn, gauge how capable each operator is and mix in a correspondingly different proportion of fake volume.

The third level: cutting losses immediately, with near-zero loss in sight

At this level, telling whether a user is real or fake takes 15 to 30 minutes. If the time window is extended to 24 hours, confidence is even higher. So how do they do it? In one sentence: combine hard and soft measures, because offense is the best defense!

First, hard measures

The so-called hard measures work on the hardware on the user side. An SDK installed on the user's device actively probes for changes in the hardware environment, anomalies in the operating system environment, and hijacking of application interfaces, so that the state of the device is known as early as possible. When an anomaly appears in the phone's hardware or system environment, all of that user's data is immediately and carefully reviewed on the back end.

  • Track the status of the device's IP and port, as well as the historical behavior of that IP;
  • Track the pairing between IMEI and IMSI. The two can roughly be thought of as a lock and its key: forging an IMEI is easy, but forging it together with a matching IMSI is extremely costly.

The whole process can be completed within 15-30 minutes, which leaves enough of a time window to push back against channels.
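Tracking the IMEI and IMSI pairing can be sketched as follows. The class name, the `max_partners` limit, and the identifiers are assumptions for the example. A limit above 1 is used because legitimate users do swap SIM cards or use dual-SIM phones; the right value would have to be tuned on real data.

```python
from collections import defaultdict


class PairingTracker:
    """Remember which IMSIs each IMEI has been seen with, and vice versa."""

    def __init__(self, max_partners=2):
        self.max_partners = max_partners  # assumed tolerance, not from the article
        self.imsis_by_imei = defaultdict(set)
        self.imeis_by_imsi = defaultdict(set)

    def observe(self, imei, imsi):
        """Record one sighting; return True if the pairing now looks suspicious."""
        self.imsis_by_imei[imei].add(imsi)
        self.imeis_by_imsi[imsi].add(imei)
        return (
            len(self.imsis_by_imei[imei]) > self.max_partners
            or len(self.imeis_by_imsi[imsi]) > self.max_partners
        )


tracker = PairingTracker()
# One forged IMEI reused with three different IMSIs.
results = [tracker.observe("imei-1", f"imsi-{i}") for i in range(3)]
print(results)  # prints [False, False, True]
```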

Second, soft measures

Building on the second level, they found through long-term data accumulation and research that the indicators differ in how strong a signal they give. So they assign a weight to every indicator and score each user in the group against them. When a user's cumulative score exceeds the threshold, the user is flagged as suspicious. In this way, a further judgment on the user's authenticity can be completed within 24 hours.
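The weighted scoring described here can be sketched as below. The indicator names, the weights, and the threshold are invented for illustration; the article gives no concrete values, and in practice they would be derived from accumulated data.

```python
# Hypothetical weights: stronger signals get larger values.
INDICATOR_WEIGHTS = {
    "ip_blacklisted": 4,
    "imei_imsi_mismatch": 5,
    "emulator_detected": 5,
    "regular_nickname": 2,
    "old_os_version": 1,
}
THRESHOLD = 5  # assumed cut-off


def suspicion_score(triggered):
    """Sum the weights of the indicators a user has triggered."""
    return sum(INDICATOR_WEIGHTS.get(name, 0) for name in triggered)


def is_suspicious(triggered, threshold=THRESHOLD):
    """A user is flagged when the cumulative score exceeds the threshold."""
    return suspicion_score(triggered) > threshold


print(is_suspicious({"regular_nickname", "old_os_version"}))  # prints False
print(is_suspicious({"ip_blacklisted", "regular_nickname"}))  # prints True
```

The design point is that no single weak signal condemns a user, while several weak signals together, or one strong signal plus a weak one, can.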

Reaching this level requires not only long-term accumulation of large amounts of data, kept up to date, but also powerful data processing algorithms and technical capabilities that span both software and hardware. If this article gets over 10,000 views and over 100 likes, we will share more details about each company's technology.
