Today I want to talk to you about big data and behavior prediction models. Why do I want to talk about this? Mashang Consumer Finance is a licensed consumer finance company, which means it holds a business license issued by the China Banking Regulatory Commission. We are first of all a startup, because we are a very small team that started from scratch. We are also an Internet company, because our business is conducted online. At the same time, we are a big data company, and that is what I want to briefly introduce today.

Dr. Liu Zhijun, Deputy General Manager of Mashang Finance, was formerly the Senior Director of the Statistics and Analysis Department of Capital One, one of the top five banks in the United States. He has served as the top statistician of Equifax, a well-known credit reporting agency in the United States, and as an associate professor at the University of Mississippi. Liu Zhijun holds a Ph.D. from Pennsylvania State University and a bachelor's degree from the University of Science and Technology of China.

Our consumer finance business is essentially the same as that of other consumer finance and Internet companies, but the means may differ. Our business is built on data, including credit data from the central bank's credit bureau, social security data, Ministry of Public Security data, and data from the Internet. This large volume of diverse, high-dimensional, dynamic data supports the entire business, including product design, marketing, risk control strategy, customer management, and debt collection. Data gives us a basis for decision-making.

Let's talk about the nature of the business. Consumer finance has several characteristics. The first is small amounts. A personal consumer loan cannot be particularly large; the upper limit is set at 200,000 yuan. The second is dispersion.
We are not like banks, which do big business and may lend out hundreds of millions of yuan in a single deal, so their lending is more concentrated. Ours is dispersed, serving people across the whole country. The third is large scale. There are 1.4 billion people, and apart from minors, all of them are potential customers. The fourth is short terms. The predictions behind our decisions do not need to look 10 or 20 years ahead, only one or two years, or even a few months.

There are three types of problems. One is clustering, which groups customers into categories. Another is pattern recognition, where the target is set in advance. The third is prediction, which means predicting the behavior of a specific customer from the data you have. All three are ultimately prediction problems. In terms of data or statistics, prediction is very simple to state, but the solution is not that simple.

Many practical problems can be reduced to binary regression models. For us, for example, risk can be coded as 0 and 1, that is, present or absent. Concretely, there are two possibilities: I either get the loan back or I don't. The target variable, called Y, therefore takes the values 0 and 1. What to use for prediction depends on two conditions: what data you can collect, and how good that data is and how relevant it is to the problem you are trying to solve.

Now everyone is talking about big data. Everyone has data and thinks it is very valuable. It is indeed very valuable. However, how strongly it correlates with each problem remains to be verified. The stronger the correlation, the greater the value.

What is the predicted value? It is a probability. How do we set the problem up concretely? There is a performance window. The observations we use for prediction are taken at the beginning of the window.
For example, when we make risk predictions, we use the data available at the time the customer applies for a loan to predict how he will perform after the loan is issued. How long we need to observe him depends on your financial products and your specific business. For example, if your product is repaid in three installments over three months, you don't need to run the window for 12 months.

More generally, we have a general regression model. In consumer finance, for example, we can predict the amount of consumption, especially for credit cards. The amount borrowed on a credit card is closely related to profitability. We have a batch of real data on actual income, and we use related variables to predict and estimate that income. This can be used to build a model. In other words, we use one type of data to predict another type of data, and the problem becomes a regression model. As a regression model, it can be abstracted into a very simple form: a conditional expectation, the expectation of Y given X, written E[Y | X], where X is the data or variables you use for prediction. For prediction in the big data setting, we only care about correlation, not causality.

I won't go into detail about modeling methods because of time constraints, but I will list some methods you have often heard of. The more traditional and intuitive method is the parametric method. Put simply, you divide your predictor variables into small blocks and look at the average of the observed values of the target variable within each block. It's that simple.

Specific problems need specific analysis. You can only build a good model if you really understand the problem you want to solve. In my experience, the best method you will find is a hybrid, that is, a model built by combining many different methods.
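To make the block-averaging idea above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the speaker's actual model: the function names `binned_average` and `predict`, the toy scores, and the cut points are all hypothetical. With a 0/1 target, the average in each block is an estimate of the probability that Y = 1 for customers in that block, which is the conditional expectation E[Y | X] restricted to the block.

```python
from bisect import bisect_right


def binned_average(x, y, edges):
    """Estimate E[Y | X in block] by averaging y within each block of x.

    edges: sorted cut points; k cut points define k + 1 blocks.
    Returns a list of block means (None for an empty block).
    """
    sums = [0.0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for xi, yi in zip(x, y):
        block = bisect_right(edges, xi)  # index of the block xi falls into
        sums[block] += yi
        counts[block] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def predict(x_new, edges, block_means):
    """Predicted value for a new customer: the mean of the block they fall in."""
    return block_means[bisect_right(edges, x_new)]


# Hypothetical toy data (not from the talk):
# x is some score known at application time, the start of the performance window;
# y is 1 if the loan was not repaid within the window, else 0.
x = [520, 540, 580, 610, 630, 660, 690, 710, 740, 780]
y = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
edges = [600, 700]  # three blocks: below 600, 600 to 699, 700 and above

means = binned_average(x, y, edges)
print(means)
print(predict(650, edges, means))
```

On this toy data the middle block holds four customers, two of whom did not repay, so a new applicant with a score of 650 gets a predicted probability of 0.5. In practice the choice of blocks matters a great deal: blocks that are too narrow leave few observations to average, and blocks that are too wide blur real differences between customers.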
Modeling is important, but how you use the model is actually more important. A better model does beat an ordinary one. But we do not apply a single cutoff value; we have different risk policies for customer groups at different risk levels. So to apply the model in a more sophisticated way, we need to work out how to optimize along other dimensions, and used that way the model performs much better than a simple cutoff.

Let me talk about the common problems in modeling. These are real problems in the current domestic situation. The United States has problems too, but not of this type. One is data coverage. I know that many institutions and many large companies have data that is very valuable and difficult to share. This causes coverage problems: each data set covers one part of the population, and another data set covers another part. The second is that quality standards differ. Data may come from the same source, but after processing, the standards and quality are different. This causes a large number of missing values and a lot of sample bias. How to solve this is indeed a big problem we face, and I think it is also something that big data methods should address.

To sum up, the characteristics of consumer finance make it particularly well suited to using big data for behavior prediction. There are many methods, and the choice depends on your understanding of the business and of the methods. You can choose the most suitable method for your actual situation. Usually there is not just one method; you create your own by combining several methods into a hybrid. Building a model is not the end. The most important part is that your model must be fully validated, because a very important point here is correlation. If the correlation is not causal, it is very likely that you will not know what happened if the model fails. One day, the model will be useless and you will not know how it happened.
That is because the issue is one of correlation, not causality. Correlation arises under specific conditions. Once those conditions are gone, the correlation no longer exists. Therefore, validation and stability are very important. Another point is that modeling is important, but application is more important.

I hope that data sharing can be promoted faster and more widely, and I also hope that everyone will work together to solve the problems we currently face.