Data Thief: How was Xiaomi and Pinduoduo's e-commerce data sold to Wall Street?

Just before Xiaomi went public, a Chinese fund manager on Wall Street opened an unread email that read:

"Want to know Xiaomi's sales data? We provide real-time data, classified by brand and product. Online data is obtained from Tmall and JD.com; product data includes mobile phones, sweeping robots, etc. In addition, we also provide a comparison between Xiaomi and other brand manufacturers. If you are interested, please click to reply."

The mysterious email instantly piqued the fund manager's interest.

After all, Wall Street plays an information game. In the stock market, whoever gets the news first can position themselves early and thus generate "alpha" (excess returns).

Just as he was wondering where the email came from, his eyes happened to glance at the signature: Sandalwood.

Judging by Sandalwood's official website, the company's main business looks like investment consulting. In fact, Sandalwood is a data trader.

01 “Data Thief”

Since the beginning of this year, more and more investors involved in Chinese stocks have begun to hear and talk about the mysterious "data thief".

In a narrow sense, the so-called "data thieves" are "alternative data companies" that use web technologies to obtain sales data of listed companies from e-commerce platforms, and then "clean" and organize the data before selling it to institutional investors.

Their workflow consists of four main steps: collecting data, cleaning data, analyzing data, and selling data.

As one of the typical representatives, Sandalwood was founded in 2015 by a Chinese founder named Tony and claims to be one of the largest providers of data on listed companies in Asia.
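The cleaning and analysis steps can be illustrated with a minimal sketch. It assumes collection has already produced raw rows of brand, product, and units sold; the sample rows, the field layout, and the function names are hypothetical and are not taken from any real vendor's pipeline.

```python
from collections import defaultdict

# Hypothetical raw rows as a collector might emit them: (brand, product, units_sold).
RAW_ROWS = [
    ("Xiaomi ", "phone", "120"),
    ("xiaomi", "robot vacuum", "30"),
    ("BrandB", "phone", "n/a"),   # unparseable sales count
    ("BrandB", "phone", "80"),
    ("Xiaomi ", "phone", "120"),  # exact duplicate of the first row
]

def clean(rows):
    """Normalize names, drop exact duplicates and rows whose count is not a number."""
    seen = set()
    cleaned = []
    for brand, product, units in rows:
        key = (brand.strip().lower(), product.strip().lower(), units.strip())
        if key in seen or not key[2].isdigit():
            continue
        seen.add(key)
        cleaned.append((key[0], key[1], int(key[2])))
    return cleaned

def units_by_brand(rows):
    """Analysis step: total units sold per brand."""
    totals = defaultdict(int)
    for brand, _product, units in rows:
        totals[brand] += units
    return dict(totals)

cleaned_rows = clean(RAW_ROWS)
totals = units_by_brand(cleaned_rows)
print(totals)
```

In practice the cleaning rules are where most of the effort goes; the duplicate and bad-value checks here are only the simplest possible stand-ins.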


Sandalwood is just one of the players in alternative data

In addition to actively crawling data from e-commerce platforms, Sandalwood's main job is to buy raw or cleaned data from multiple raw data companies and resell it.

Unlike those junior "data thieves", Sandalwood itself is a data platform, and claims that customers can access 7 unique data sources through this platform.

Sandalwood's clients are typically the buy side of the capital markets, the funds that invest in stocks and bonds on behalf of others - the most active participants in the search for excess returns.

To be more specific, Sandalwood's most valued clients are the quantitative funds among U.S. hedge funds, which prize data and use it to generate "alpha".

As we all know, Wall Street has never stopped pursuing "alpha". In the past 150 years, the source of "alpha" has changed every 10-20 years on average.

In the 1950s, the first hedge funds invented long/short stock strategies. In the 1980s, the edge came from mathematics and computers, which were far more powerful than handheld calculators. In the early 2000s, it came from high-frequency trading.

These strategies or tools once gave those who were able to use them first an advantage over others. But as they became more common, their advantages disappeared, and investors had to find new strategies and tools.

It now appears that in today's digital economy, the next source of "alpha" is the unique information hidden in vast amounts of data that was previously unknown to the financial markets.

Clients want to gain an edge from data, which places greater demands on the alternative data companies that Sandalwood represents. The data must be faster or more accurate than what is currently used, or it must provide unique insights that were not previously available.

Common data collection methods used by Sandalwood include web crawlers, credit card tracking, email cracking, geolocation software, satellites, and mainstream mobile apps. We will go through these one by one below.

However, in order to differentiate themselves from their competitors, all “data thieves” must desperately seek faster and more accurate data sources. To this end, some of these practitioners choose to trade directly with e-commerce platforms to obtain first-hand product sales data.

Where there is profit, there is trade. For data traders, what could be better than direct data obtained from e-commerce platforms? For e-commerce platforms, sitting on a huge amount of invaluable e-commerce data, why not monetize it?

With first-hand leading data, data traders can easily beat other competitors and help hedge funds gain "alpha".

02 Risks: Insider Trading and Privacy Protection

Hedge funds often spend a fortune on such useful information, and annual fees of hundreds of thousands of dollars paid to data companies seem to be no obstacle.

JPMorgan Chase estimates that the investment management industry spends between $2 billion and $3 billion on big data, and that this figure is growing at a double-digit annual rate.

Is such a hot data trading industry legal?

Previously, the U.S. Securities and Exchange Commission (SEC) had successfully prosecuted an insider trading case involving a data company. The case involved two data analysts who obtained material non-public information by analyzing credit card transactions.

Because they gained access without the consent of the data owners, they were prosecuted for insider trading and forced to pay more than $18 million in fines.

The SEC accused two data analysts employed by Capital One of searching a proprietary database of credit card transactions for at least 170 public companies between November 2013 and January 2015. The defendants, Bonan Huang and Nan Huang, used the data to trade stocks using options before the earnings reports of public companies were released.

Insider trading occurs when insiders who hold inside information about securities, or people who obtained such information illegally, use it to buy or sell securities themselves, recommend that others buy or sell, or disclose it so that others can trade on it, in order to make a profit or avoid a loss.

This involves several key concepts: materiality (whether the information can affect market value), dissemination, and fiduciary duty.

The risk of insider trading in the data trading industry chain lies in the fact that a considerable amount of data can bring advantages, that is, it can generate information that affects market value.

The problem is that because the data sets need to be purchased, some institutions have the channels to purchase them, but ordinary investors cannot access this information.

So while in theory data sets can be collected and purchased openly, in reality this is not the case. Therefore, in some cases, using or selling certain data, especially sales data of listed companies before their quarterly reports are released, may be suspected of insider trading.

In the United States, a conviction for insider trading requires not only proving that the information was material and non-public, but also proving a breach of fiduciary duty, meaning that the information was obtained without the owner's consent.

That condition is rarely met because many phone and credit card companies include clauses in their contracts that allow them to sell information. But as more data becomes available, the likelihood of inadequate consent increases, raising the risk of breaches of fiduciary duties.

In Europe, a breach of fiduciary duty does not have to be proven to establish insider trading, but the standards are higher in other areas.

Beyond that, privacy is a bigger issue. Have you forgotten the one that is still weighing on Facebook (NASDAQ: FB)?

03 Data collection methods: from crawlers to satellites

To win the favor of buyers, data companies use every possible means to collect data, including at least web crawlers, credit card tracking, email cracking, geolocation software, satellites, and mainstream mobile apps.

【1】Crawler data

Web crawlers are a commonly used method for collecting data. Many raw data collection companies search for potentially valuable information on public websites, social media, online communities, and email plugins.

Examples range from app downloads and user reviews, to the bookings that airlines and hotels receive through ticketing websites, to hints about consumer opinions and trends gleaned from social media sites.

Web crawlers can track everything from price trends for groceries to car sales, and analysts can assess new product launches and product life cycles by scraping product reviews on consumer websites.
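As a rough illustration of the review-scraping idea, the sketch below counts reviews per month from an already downloaded page, using only Python's standard `html.parser`. The HTML snippet, the `review` class name, and the `data-date` attribute are hypothetical; real pages use their own markup, and fetching them is subject to each site's terms.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical page fragment; real sites structure their review markup differently.
SAMPLE_HTML = """
<div class="review" data-date="2018-05-28">Great phone</div>
<div class="review" data-date="2018-06-03">Battery is fine</div>
<div class="review" data-date="2018-06-17">Fast delivery</div>
<div class="ad" data-date="2018-06-18">Sponsored</div>
"""

class ReviewDateParser(HTMLParser):
    """Collects the data-date attribute of every <div class="review">."""

    def __init__(self):
        super().__init__()
        self.dates = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "div" and attr_map.get("class") == "review":
            date = attr_map.get("data-date")
            if date:
                self.dates.append(date)

def reviews_per_month(html):
    """Return a Counter mapping 'YYYY-MM' to the number of reviews in that month."""
    parser = ReviewDateParser()
    parser.feed(html)
    return Counter(date[:7] for date in parser.dates)

monthly = reviews_per_month(SAMPLE_HTML)
print(sorted(monthly.items()))
```

A rising or falling monthly review count is the kind of crude proxy an analyst might read as a signal about a product's launch momentum or life cycle.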

For example, the data sales company Thinknum, shown in the figure below, not only provides leading data on many US listed companies, but also offers related investment analysis services.

Its data includes Tesla car inventory figures and the company's interactions with users on various social networks.

It also covers social network data for Xiaomi, which has just listed in Hong Kong. Of course, this is the interface of the free version, and I believe the content provided in the paid version will be richer.

Another data analysis company, Yipit Data, not only covers a number of listed Chinese concept stocks, but also has data on Pinduoduo, which has just submitted its prospectus!

【2】Credit card tracking data

Another important data source is the tracking of consumer credit cards, which can directly show the real identity information of consumers and the products they spend money on.

Although this data can only depict local sales trends, combined with other data sets it can give institutional investors a very important basis for judgment.

Credit card companies have thus become a gold mine. Credit card transaction data is one of the most valuable market segments and a top indicator of revenue for consumer companies.

【3】Exhaust data

Data exhaust refers to data that is a byproduct of a company’s record keeping. Many technology companies generate data exhaust as a byproduct of their core activities, such as bank records, supermarket scanner data or supply chain data.

This data exhaust is produced by a range of storable choices, actions and preferences, and takes forms such as log files, plugins, temporary files, and even the information generated for each process or transaction completed digitally.

The most valuable of these is interface exhaust: a data interface that a website once used and later abandoned but never removed. Some data companies can access these interfaces and obtain data on listed companies.

【4】Geolocation information

Smartphones are equipped with location services that allow us to use map or weather features, but also let mobile operators know where we are at all times.

This data is extremely valuable to institutional investors who want to understand what stores, hotels or restaurants we are visiting and look for clues to consumer trends.
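A minimal sketch of how location pings could be turned into a foot-traffic estimate: count the distinct devices seen within a fixed radius of a store, using the haversine great-circle distance. The store coordinates, the device IDs, and the 100-meter radius are all made-up illustration values.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Hypothetical store location and anonymized pings: (device_id, lat, lon).
STORE = (31.2304, 121.4737)
PINGS = [
    ("dev-a", 31.2304, 121.4737),  # at the store
    ("dev-a", 31.2305, 121.4738),  # same device, a few meters away
    ("dev-b", 31.2308, 121.4737),  # a few dozen meters north
    ("dev-c", 31.2404, 121.4737),  # roughly a kilometer north
]

def distinct_visitors(pings, store, radius_m=100.0):
    """Number of distinct devices with at least one ping inside the radius."""
    return len({
        device
        for device, lat, lon in pings
        if haversine_m(lat, lon, store[0], store[1]) <= radius_m
    })

visitors = distinct_visitors(PINGS, STORE)
print(visitors)
```

Counting distinct devices rather than raw pings keeps one lingering phone from inflating the estimate; a real data set would also need time windows and a way to scale the sample up to the full population.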

【5】Sensor and satellite data

Whether it is from satellites, smartphones, the Internet of Things, or others, sensor data is the fastest growing and increasingly valuable alternative data. Sensor data includes satellite imagery data, pedestrian and car traffic, and ship positions.

Sensor data is often unstructured and is a much larger stream of data than that generated by an individual or process. Satellite imaging is probably the most common example, but geolocation data is increasingly important as it is used to track foot traffic in retail stores.

Sensor data will become increasingly important as the Internet of Things, particularly the embedding of microprocessors and networking technologies into personal and commercial electronic devices, becomes more pervasive.

After all this, you may still be curious about how the fund manager from the opening dealt with the "data thief" and its "alternative data". By his own account, he printed out the email that night and put it in the folder for the next day's meeting.
