Application of Data Mining Models in Credit Scoring for Small Business Owners

Statistical Research

Wang Lei et al.

Online: 2014-10-15 Published: 2014-10-14

Abstract

Abstract: International experience shows that credit scoring technology can effectively address the high cost, high risk, and information asymmetry of small business lending. This paper selects 12 data mining models applicable to credit scoring for small business owners, including a threshold Logistic model improved in this paper. Three micro-level bank customer data sets (with sample sizes of 30,488, 1,000, and 700) serve as case studies. The overall credit scoring performance of the 12 models is assessed using 10-fold cross-validation and expected misclassification cost. The results and robustness tests show that the improved threshold Logistic model outperforms the other approaches on several criteria, including predictive ability and expected misclassification cost, and that ensemble methods based on decision trees also perform well. The study offers a reference for domestic commercial banks building credit scoring models for loans to small business owners. Implementing such models could in turn help improve banks' micro-level financial statistics and, through them, the government's macro-level financial statistics.
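The evaluation procedure named in the abstract, 10-fold cross-validation scored by expected misclassification cost, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the label convention (1 = bad borrower, 0 = good borrower), the 5:1 cost ratio, the hypothetical `fit_predict` interface, and the toy `always_good` baseline. It does not implement the threshold Logistic model or any of the 12 models compared.

```python
import random
from statistics import mean


def expected_misclassification_cost(y_true, y_pred,
                                    cost_bad_as_good=5.0,
                                    cost_good_as_bad=1.0):
    """Average cost per applicant; labels: 1 = bad, 0 = good.

    The 5:1 cost ratio is an illustrative assumption, not a value
    reported in the paper.
    """
    pairs = list(zip(y_true, y_pred))
    bad_as_good = sum(1 for t, p in pairs if t == 1 and p == 0)
    good_as_bad = sum(1 for t, p in pairs if t == 0 and p == 1)
    total = cost_bad_as_good * bad_as_good + cost_good_as_bad * good_as_bad
    return total / len(pairs)


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_cost(X, y, fit_predict, k=10, seed=0):
    """Mean expected misclassification cost over k held-out folds.

    fit_predict(X_train, y_train, X_test) must return predicted labels;
    this interface is hypothetical and stands in for any scoring model.
    """
    costs = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train],
                            [y[i] for i in train],
                            [X[i] for i in fold])
        costs.append(expected_misclassification_cost(
            [y[i] for i in fold], preds))
    return mean(costs)


def always_good(X_train, y_train, X_test):
    """Toy baseline that accepts every applicant (predicts 0)."""
    return [0] * len(X_test)
```

Any candidate model can be plugged in through `fit_predict`, so that competing models are compared on the same folds and the same cost assumptions. Changing the cost ratio changes which model looks best, which is why the ratio should be stated alongside any reported result.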

Key words: Data Mining, Threshold Logistic, Small Business Owners, Credit Scoring

Wang Lei et al. Application of Data Mining Models in Credit Scoring for Small Business Owners [J]. Statistical Research, 2014, 31(10): 89-98.
