[51CTO.com original article] The WOT 2016 Big Data Technology Summit, hosted by 51CTO.com, was held at the Beijing JW Marriott Hotel on November 25-26, 2016. Since 2012, the WOT conference series has been held 12 times under the motto "focusing on technology and serving technical personnel". It has built up a large pool of expert speakers, earned the recognition of IT practitioners and technology enthusiasts, and become an important industry platform for technical sharing and networking.
At the real-time computing session of the WOT 2016 Big Data Technology Summit, [Umeng+] CDO Li Danfeng gave a talk entitled "Understanding the Business Code of Big Data from User Behavior Data". After the session, our reporter interviewed him about big data.
Li Danfeng worked in data analysis and mining in the United States for more than 10 years, at companies at the forefront of data applications such as Yahoo, Microsoft, and FICO. He has accumulated rich practical experience in data mining and machine learning, and the products he has worked on are widely used in finance, insurance, search, Internet advertising, and retail.
Reporter: Hello to everyone at 51CTO. We are at the WOT 2016 Big Data Technology Summit, and sitting next to me is [Umeng+] CDO Li Danfeng.
Li Danfeng: Hello everyone, I am Li Danfeng, CDO at [Umeng+]. I worked and lived in the United States for 18 years and returned to China in 2014 to join the Chinese Internet industry. In the United States I worked for several companies, including well-known ones such as Yahoo and Microsoft, and somewhat smaller ones such as the credit scoring company FICO and a consulting company. I graduated from Tsinghua University, then went to the United States and obtained a doctorate from the University of Illinois at Urbana-Champaign (UIUC).
Reporter: You have work experience in the financial industry. What applications do you think big data will have there?
Li Danfeng: The most important feature of big data in the financial industry is that it is "big", and by big I mean broad coverage. For [Umeng+], the most important feature is that our coverage is very large. We connect to the data of many companies, and [Umeng+] has a coverage rate upward of 80% to 90%. Based on this data, [Umeng+] has explored many fields, including some recent attempts in risk control.
On the surface, the link between [Umeng+] data and risk control does not look particularly direct, but behavioral data is inherently connected to it. What kind of person you are is not determined by how you behave in one specific field. Let me give a simple example. In the United States, if you want to defraud a bank, you start out as a citizen with very good credit: you borrow money and repay it on time, and your credit score climbs higher and higher. Once the score is high enough, you borrow a large sum and run away with it. Someone who does this only in the behavior directly tied to risk can deceive the system.
Behavioral data, however, captures a person's everyday behavior. Apart from spies, few people can keep up a pretense in every aspect of life, and this is where behavioral data becomes effective. The data reveals a person's characteristics without the person noticing. Combining those characteristics with other, strongly related data can help lenders make better risk judgments. The behavioral big data I mean here is the behavior of natural persons. When coverage is very large, it can play its full role in risk control.
Reporter: Is the use of user behavior data in financial risk control recognized by the industry?
Li Danfeng: Recently, [Umeng+] and Rong360 jointly built a model and launched a risk control scoring product based on Internet users' behavior. The product has been received quite well. To avoid privacy problems, users can sign an authorization agreement for each loan. Many P2P companies, for example, already ask users to provide their Taobao accounts so that some basic data can be used for model building. At that level, the risk to users of allowing behavioral data to inform the judgment is not great. As for current market acceptance, [Umeng+] is also waiting to see. Users are still testing the product that [Umeng+] and Rong360 modeled together. Risk control results do not show up immediately, of course, and we have the patience to wait. Perhaps by this time next year we will be able to judge better.
Reporter: Data is inseparable from the analysis of user behavior. What value does user behavior data have across the industry?
Li Danfeng: [Umeng+]'s exploration of risk control is of great significance and value. Behavioral data is difficult for users to falsify, and some inadvertent behaviors readily reflect a user's true nature. The most obvious application of behavioral data is in advertising and marketing. We also look forward to good results in finance, and we hope the data can play a big role in other scenarios. In traditional industries, for example, we help companies manage their customers and understand them better. This year we tried making predictions about user life cycle and user churn, and [Umeng+] will continue to work in this area. To return to my earlier point, [Umeng+] has the largest volume of user behavior data in China. [Umeng+] has a responsibility to tap the value of that data and apply it to business, to help companies manage data better.
Reporter: How can we better bring out the value and efficiency of data analysis?
Li Danfeng: First of all, we need to find an application for it. Doing pure data analysis with no actual application is putting the cart before the horse. By actual application I mean building a product with a large customer base, rather than working in a consulting style where each customer has different needs. Consulting is not the aim of [Umeng+]. [Umeng+] hopes to use its existing data volume to build a platform product from which many users benefit. That is the core of value mining.
Reporter: We have been talking about data mining. Whenever data mining comes up, machine learning is mentioned too. What is the difference between the two?
Li Danfeng: I think there is no difference between the two. Everyone defines machine learning, artificial intelligence, and data mining slightly differently, but in actual applications there is no essential difference in the methods and systems used. Our job title is data scientist, and a data scientist should be able to work across data mining, machine learning, and artificial intelligence. It is rare for someone to call themselves a data mining scientist or an artificial intelligence scientist. Personally, I see no difference between them.
Reporter: How does [Umeng+] achieve precise analysis of massive data?
Li Danfeng: Precision analysis at [Umeng+] is divided into several levels, and the definition of precision differs between them. For example, I often read financial news, but I never click on financial ads. To an advertiser, I am someone with no interest in financial ads. To a news outlet, I am very interested in the finance category. So the definition of precision is completely different in these two scenarios. For the news outlet, labeling me as interested in finance is very accurate. For advertising it is not. We need to distinguish between scenarios when mining data. In data science, the most important point about precision is to define the goal accurately.
The objective function is a very important thing to settle before building any model. Different scenarios define different goals, and so call for different training data to train the model. In the future, [Umeng+] will do better on precision in this respect. There are other kinds of precision analysis as well, including short-term behavior and long-term behavior. One advantage [Umeng+] has for precision is that we can use relatively reliable labeled data. With it we can build better models, for example for gender and age, whose labels come from these very reliable data sources. Of course, the most important thing is this: whatever the data or its volume, define what the goal and precision actually mean, so that different scenarios are treated differently. Different scenarios have different definitions of precision.
Reporter: Apart from these differences in precision analysis, what are [Umeng+]'s biggest advantages over other platforms?
Li Danfeng: [Umeng+] has three advantages:
Reporter: Thank you, Mr. Li Danfeng, for today's exclusive interview.
[51CTO original article; when reprinting on partner sites, please credit the original author and cite 51CTO.com as the source]