At the WOT "Internet +" Era Big Data Technology Summit hosted by 51CTO, Wang Tianqing, chief architect of Madai Finance at Shanghai Kai'an Information Technology Co., Ltd., gave a talk titled "Madai Finance Big Data Platform and Financial Risk Control Practice: Case Analysis". This article presents the highlights of that talk in written form:
I am very happy and honored to be here today. Our company was established not long ago. We have done some work in the direction of big data and have a few simple cases, which I would like to share with you as a starting point.

Madai Finance comes from CITIC Group and is mainly engaged in Internet consumer finance. In essence, P2P connects two Ps: one P is the borrower and the other P is the lender. People with money lend it to people who are short of money, and there is a lot to get right in between. The borrower is not necessarily an individual; it can also be a company. Madai Finance is our online financial management platform, and CTCF is our offline company that deals with the borrowers.

Let me briefly cover the industry background. Everyone is already familiar with P2P. After the unregulated growth of 2013 and 2014, the industry has gradually become more regulated this year. The People's Bank of China, together with 10 ministries and commissions including the China Banking Regulatory Commission, the China Securities Regulatory Commission, and the China Insurance Regulatory Commission, has jointly issued the "Guiding Opinions on Promoting the Healthy Development of Internet Finance". It makes four important points: encourage innovation, prevent risks, pursue benefits while avoiding harm, and develop healthily.

Internet finance has been around for a while, and risk prevention is the lifeblood of every company in it. Banks have a long history and a certain brand premium, and they are backed by the government. If an Internet finance company's website goes down, however, the first question everyone asks is whether the company has run off with the money. We discussed this with colleagues in the business departments and made a simple classification of industry risks. The first is information security, which is essentially the same as traditional information security.
The second is operational risk, the third is fraud risk, and the fourth is credit risk.

From a technical perspective, start with the data sources. First, we would like to obtain a lot of data, but we are not a bank, so the amount of data we get through cooperation with banks is very small. Second, we also try to get data from various other channels, but the correlation between these data sets is relatively weak. As for the characteristics of the data, the value density of each type is relatively low, because it is not real data or list data in the true sense, so the different types must be used together. The types are also complex: there is structured data in databases and semi-structured data in text form. And in data analysis, real-time analysis and real-time judgment are sometimes required.

In the final analysis, the concept of big data we are talking about has three Vs: the volume is very large, the variety is high, and the velocity is very fast. A large amount of data is generated very quickly.

The life cycle of big data has five steps. The first step is to obtain more data, either through cooperation with third parties or directly from users. The second step is to store all of it. Every type of data, including users' basic information, has historical versions, and we need to store all of them. The third step is to analyze the data with data mining algorithms, such as matrix analysis and association analysis. The fourth step is optimization: the results of machine analysis are not necessarily useful, so we need to see what should be adjusted. The last step is to create value.

Madai Finance has online and offline businesses. The online business runs in the cloud, and the offline business runs in the IDC. We use a virtualization platform, and we now also use Docker. The core data is in our IDC.
Some data that applications in the cloud need to access is written to the cloud and then synchronized to our IDC. For real-time data, we use Kafka and Spark. First, we collect the status and performance of all applications. We also send important key data, such as user login times and user withdrawal times, to Kafka. We use ELK for full-text search, and all the actual data is stored on HDFS.

We deployed this big data platform in the IDC, with HDFS underneath, and it also handles some interactive data. Data is divided into external data and internal data. External data is the billing data submitted by users, as well as some social data and credit data. These data are aggregated into HDFS. Then there are our internal systems, including the credit system, the accounting system, and the collection system, each with its own database. Their data is synchronized to HDFS on a regular schedule through Sqoop. Of course, we also do some data cleaning and aggregation.

There are two major application scenarios. One is traditional BI, where we use Tez to produce reports. The other is in-depth analysis and mining, for which we mainly use SAS, along with R and Python. Python has data mining libraries, and we use them directly. The results are turned into rules that can be applied to the business systems and drive their upgrades. That is roughly the process.

HDFS carries all our data. This is what we now call a platform that supports real-time analysis, batch processing, and historical analysis. All we have to do is answer the three great philosophical questions: who are you, where do you come from, and where are you going.

In short, risk control is a necessary condition for success in Internet finance. It is not a sufficient condition, but it is a necessary one. In the context of the Internet, data is diverse, massive, and needs to be processed in real time. Once a loss or risk has occurred, it is too late. You must make a judgment before the risk occurs.
Therefore, building a big data platform is an essential technical means for Internet finance; this effect cannot be achieved with traditional methods.
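
The talk describes sending key events such as login and withdrawal times into a stream and turning analysis results into rules that the business systems apply before a risk materializes. The sketch below only illustrates that idea and is not Madai Finance's actual system: the `Event` type, the event names, the 30-second threshold, and the rule itself are all hypothetical, and a plain Python list stands in for a Kafka topic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    user_id: str
    kind: str      # hypothetical event names: "login" or "withdrawal"
    at: datetime   # time the event occurred


# Hypothetical threshold: a withdrawal this soon after login goes to review.
QUICK_WITHDRAWAL_WINDOW = timedelta(seconds=30)


def flag_quick_withdrawals(
    events: Iterable[Event],
    window: timedelta = QUICK_WITHDRAWAL_WINDOW,
) -> List[Event]:
    """Return withdrawals made within `window` of the user's latest login.

    Withdrawals with no earlier login in the stream are flagged as well.
    """
    last_login: Dict[str, datetime] = {}
    flagged: List[Event] = []
    # Process events in time order, as a stream consumer would see them.
    for event in sorted(events, key=lambda e: e.at):
        if event.kind == "login":
            last_login[event.user_id] = event.at
        elif event.kind == "withdrawal":
            login_at = last_login.get(event.user_id)
            if login_at is None or event.at - login_at <= window:
                flagged.append(event)
    return flagged


# Example input: u1 withdraws 10 seconds after login, u2 after 5 minutes.
t0 = datetime(2015, 1, 1, 12, 0, 0)
sample_events = [
    Event("u1", "login", t0),
    Event("u1", "withdrawal", t0 + timedelta(seconds=10)),
    Event("u2", "login", t0),
    Event("u2", "withdrawal", t0 + timedelta(minutes=5)),
]
flagged_events = flag_quick_withdrawals(sample_events)
```

With the sample input, only the withdrawal by `u1` is flagged. In a real deployment the rule would be evaluated as events arrive, so that the decision is made before the withdrawal is processed rather than after.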