Statistical Research (统计研究), 2018, Vol. 35, Issue 12: 92-101. doi: 10.19343/j.cnki.11-1302/c.2018.12.008

A Study on Credit Scoring Based on Multi-source Data Integration

Fang Kuangnan & Zhao Mengluan

  • Online: 2018-12-25 Published: 2018-12-28

Abstract: With the development of information technology, data sources have become increasingly diverse. On the one hand, this makes it possible to characterize personal credit status more accurately and scientifically; on the other hand, the large number of sources and their complex structure pose a challenge to traditional credit reporting techniques. This paper proposes a personal credit scoring model based on multi-source data integration. The model performs modeling and variable selection on multiple data sets simultaneously, accounting for both the similarity and the heterogeneity among them. Simulation experiments show that the proposed integrative model has a clear advantage in both variable selection and classification performance. Finally, the integrative model is applied to personal credit scoring on two data sets from China, one urban and one rural.

Key words: Multi-source Data, Integrative Analysis, Logistic Regression, Credit Scoring
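
The abstract does not state which penalty the integrative model uses, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a group-lasso-type penalty applied across data sets: each variable's coefficients from all data sets form one group, so a variable is kept or dropped jointly (similarity), while each data set keeps its own coefficient values (heterogeneity). The function names `fit_integrative_logistic` and `simulate`, the tuning values, and the simulated data are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate(n, beta, rng):
    """Draw n observations from a logistic model with coefficients beta (illustrative data only)."""
    X = rng.standard_normal((n, len(beta)))
    y = (rng.random(n) < sigmoid(X @ beta)).astype(float)
    return X, y


def fit_integrative_logistic(datasets, lam=0.1, step=0.1, n_iter=500):
    """Proximal gradient descent for multi-dataset logistic regression.

    Assumed objective (not taken from the paper):
        sum_m mean_logloss_m(B[:, m], b0[m]) + lam * sum_j ||B[j, :]||_2
    datasets: list of (X, y) pairs sharing the same p columns.
    Returns B with shape (p, M) and intercepts b0 with shape (M,).
    """
    p = datasets[0][0].shape[1]
    M = len(datasets)
    B = np.zeros((p, M))
    b0 = np.zeros(M)
    for _ in range(n_iter):
        G = np.zeros_like(B)
        for m, (X, y) in enumerate(datasets):
            resid = sigmoid(X @ B[:, m] + b0[m]) - y
            G[:, m] = X.T @ resid / len(y)   # gradient of the mean log-loss
            b0[m] -= step * resid.mean()     # intercepts are not penalized
        B = B - step * G
        # Group soft-thresholding: shrink each variable's row across data sets,
        # setting the whole row to zero when its norm falls below step * lam.
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        B = B * np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    return B, b0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two simulated data sets: same two relevant variables, different effect sizes.
    data_a = simulate(1000, np.array([2.0, -2.0, 0.0, 0.0, 0.0, 0.0]), rng)
    data_b = simulate(1000, np.array([1.5, -1.0, 0.0, 0.0, 0.0, 0.0]), rng)
    B, b0 = fit_integrative_logistic([data_a, data_b])
    print(np.round(B, 2))  # rows = variables, columns = data sets
```

Under this assumed penalty, a row of `B` that is entirely zero corresponds to a variable excluded from every data set, while nonzero rows may still differ between columns. Other integrative penalties would give different selection behavior.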