Statistical Research (统计研究)


Application of Data Mining Models in Credit Scoring for Small Business Owners

Wang Lei et al.

  • Online: 2014-10-15 Published: 2014-10-14

Abstract: International experience shows that credit scoring can effectively address the high cost, high risk, and information asymmetry of small business lending. This paper selects 12 data mining models applicable to credit scoring for small business owners, including a threshold Logistic model improved in this paper, and evaluates them on three bank customer-level data sets (sample sizes 30488, 1000, and 700). Overall scoring performance is assessed with 10-fold cross-validation and expected misclassification cost. The results and robustness tests show that the improved threshold Logistic model performs very well in predictive ability, expected misclassification cost, and other respects, and that ensemble methods based on decision trees also perform well. The study offers a reference for domestic commercial banks building credit scoring models for loans to small business owners. Implementing such models can also help improve banks' micro-level financial statistics and, in turn, the government's macro-level financial statistics.

Key words: Data Mining, Threshold Logistic, Small Business Owners, Credit Scoring
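The abstract names two evaluation devices, 10-fold cross-validation and expected misclassification cost, without giving formulas. The sketch below shows one common way to combine them. It is an illustration under stated assumptions, not the paper's procedure: the function names, the label coding (1 = bad applicant, 0 = good applicant), and the 5:1 cost ratio are all hypothetical choices made for the example.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def expected_misclassification_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Average cost per applicant, with 1 = bad and 0 = good (assumed coding).

    cost_fn: cost of accepting a bad applicant (assumed value).
    cost_fp: cost of rejecting a good applicant (assumed value).
    """
    pairs = list(zip(y_true, y_pred))
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return (cost_fn * fn + cost_fp * fp) / len(pairs)


def cross_validated_cost(features, labels, fit_predict, k=10,
                         cost_fn=5.0, cost_fp=1.0):
    """Mean expected misclassification cost over k held-out folds.

    fit_predict(train_x, train_y, test_x) is a hypothetical callback that
    trains any scoring model and returns 0/1 predictions for test_x.
    """
    fold_costs = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        preds = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in held_out],
        )
        fold_costs.append(expected_misclassification_cost(
            [labels[i] for i in held_out], preds, cost_fn, cost_fp))
    return sum(fold_costs) / len(fold_costs)


if __name__ == "__main__":
    # Toy data: 30 bad and 70 good applicants with a dummy feature.
    labels = [1] * 30 + [0] * 70
    features = [[0.0]] * 100

    # Baseline "model" that accepts every applicant.
    def accept_all(train_x, train_y, test_x):
        return [0] * len(test_x)

    print(cross_validated_cost(features, labels, accept_all))
```

With the accept-everyone baseline, every bad applicant is a costly error, so the cost works out to 5 × 0.30 = 1.5 per applicant. Any of the candidate models could be passed in as `fit_predict` and compared on this same scale.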