统计研究, 2021, Vol. 38, Issue (2): 99-113. doi: 10.19343/j.cnki.11-1302/c.2021.02.008


基于整合治愈率模型的信贷违约时点预测

范新妍 方匡南 郑陈璐 张志远   

  • 出版日期:2021-02-25 发布日期:2021-02-25

Prediction of Credit Default Point Based on Integrative Cure Rate Model

Fan Xinyan Fang Kuangnan Zheng Chenlu Zhang Zhiyuan   

  • Online:2021-02-25 Published:2021-02-25

摘要: 传统信用评分方法主要利用统计分类方法,只能预测借款人是否会发生违约,但不能预测违约发生的时点。治愈率模型是二分类和生存分析的混合模型,不仅可以预测是否会发生违约,而且可以预测违约发生的时点,比传统二分类方法可以提供更多的信息。另外,随着大数据的发展,数据源越来越多,针对相同或者相似任务,可以收集到多个数据集,本文提出了融合多源数据的整合治愈率模型,可以对多个数据集同时建模和估计参数,通过复合惩罚函数进行组间和组内双层变量选择,并通过促进两个子模型回归系数符号相同,提高模型的可解释性。通过数值模拟发现,所提方法在变量选择和参数估计上均有明显优势。最后,将所提方法应用于信用贷款的违约时点预测中,模型表现良好。

关键词: 多源数据, 整合治愈率模型, 违约日期预测, 信用评分

Abstract: Traditional credit scoring methods, which rely on statistical classification, can only predict whether a borrower will default, not when the default is likely to occur. The cure rate model, a mixture of two submodels (a binary classifier and a survival model), can predict not only whether a default will occur but also when, and therefore provides more information than traditional binary classification. Furthermore, with the development of big data, data sources have multiplied, and multiple datasets can be collected for the same or similar tasks. Motivated by this, this paper proposes an integrative cure rate model for multi-source data that models multiple datasets and estimates their parameters simultaneously. A composite penalty function performs bi-level variable selection, identifying important groups as well as important members within those groups. The model also encourages the coefficients of the two submodels to share the same sign, which improves interpretability. Numerical simulations show that the proposed method has clear advantages in both variable selection and parameter estimation. Finally, the proposed method is applied to predicting the default time of credit loans and performs well.

Key words: Multi-source Data, Integrative Cure Rate Model, Default Point Prediction, Credit Scoring
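
The abstract describes the cure rate model as a mixture of a binary classification submodel and a survival submodel. A minimal sketch of that general structure follows. It is not the paper's estimator: the logistic incidence model, the exponential proportional-hazards latency model, and all names (`default_probability`, `latency_survival`, `population_survival`, `gamma`, `beta`, `base_rate`) are assumptions chosen for illustration. The multi-dataset integration and the composite penalty are not shown.

```python
import math


def default_probability(x, gamma):
    """Incidence submodel (assumed logistic): P(borrower ever defaults | x)."""
    z = sum(g * xi for g, xi in zip(gamma, x))
    return 1.0 / (1.0 + math.exp(-z))


def latency_survival(t, x, beta, base_rate=0.1):
    """Latency submodel (assumed exponential baseline, proportional hazards):
    P(no default by time t | x, borrower eventually defaults)."""
    rate = base_rate * math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return math.exp(-rate * t)


def population_survival(t, x, gamma, beta, base_rate=0.1):
    """Mixture cure survival: borrowers who never default contribute 1 - pi,
    eventual defaulters contribute pi * S_u(t | x)."""
    pi = default_probability(x, gamma)
    return (1.0 - pi) + pi * latency_survival(t, x, beta, base_rate)
```

In this sketch `population_survival` starts at 1 when `t = 0` and levels off at `1 - pi` as `t` grows, which is the share of borrowers who never default. The incidence coefficients `gamma` answer whether a default occurs, and the latency coefficients `beta` answer when.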