Statistical Research (统计研究) ›› 2024, Vol. 41 ›› Issue (1): 148-156. doi: 10.19343/j.cnki.11-1302/c.2024.01.012


基于网络关系的分类变量预测研究

丁 月 方匡南 兰 伟 徐 顺   

  • Publication date: 2024-01-25  Release date: 2024-01-25

Research on Predicting Discrete Variables Based on Network Relationship

Ding Yue, Fang Kuangnan, Lan Wei, Xu Shun

  • Online: 2024-01-25  Published: 2024-01-25



Abstract: Traditional prediction methods usually model and predict responses from individual-level covariates and seldom consider the network structure linking individuals. Yet the connections between network nodes carry information that can help predict nodal responses. This study therefore proposes a network label propagation algorithm. Working within a semi-supervised learning framework, the algorithm takes the adjacency matrix as the basis for inferring nodal similarity, and infers the responses of unknown nodes from the connections between nodes and the responses of known nodes. The algorithm applies to incomplete network data whose response variable is categorical. Under the assumption that the network follows a stochastic block model, this study proves that the algorithm predicts the responses of unknown nodes consistently. Numerical simulations and an empirical data analysis show that the algorithm predicts well.

Key words: Incomplete Network, Network Imputation, Network Label Propagation, Categorical Variables, Credit Risk Rating
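
The abstract does not state the algorithm's update rule, so the following is only a generic sketch of label propagation on an adjacency matrix, not the authors' method. It draws a small network from a stochastic block model, hides most node labels, and repeatedly averages class scores over each node's neighbors while keeping the known labels fixed. The function names `simulate_sbm` and `propagate_labels`, the block sizes, the edge probabilities, and the iteration count are all illustrative assumptions.

```python
import random


def simulate_sbm(block_sizes, p_in, p_out, seed=0):
    """Draw an undirected graph from a simple stochastic block model.

    Nodes in the same block connect with probability p_in,
    nodes in different blocks with probability p_out.
    Returns (adjacency matrix as nested lists, block label per node).
    """
    rng = random.Random(seed)
    labels = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj, labels


def propagate_labels(adj, known, n_classes, n_iter=20):
    """Generic label propagation with clamped known labels (assumed rule).

    known maps node index -> class index for nodes whose response is observed.
    Returns one predicted class index per node.
    """
    n = len(adj)
    # Start from uniform class scores, then one-hot encode the known nodes.
    scores = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for i, k in known.items():
        scores[i] = [1.0 if c == k else 0.0 for c in range(n_classes)]

    for _ in range(n_iter):
        new_scores = []
        for i in range(n):
            neighbors = [j for j in range(n) if adj[i][j]]
            if i in known or not neighbors:
                # Known labels stay fixed; isolated nodes keep their scores.
                new_scores.append(scores[i])
                continue
            # Average the neighbors' class scores from the previous sweep.
            new_scores.append([
                sum(scores[j][c] for j in neighbors) / len(neighbors)
                for c in range(n_classes)
            ])
        scores = new_scores

    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]


# Usage: two blocks of 30 nodes, with every third node's label observed.
adj, truth = simulate_sbm([30, 30], p_in=0.5, p_out=0.05, seed=1)
known = {i: truth[i] for i in range(len(truth)) if i % 3 == 0}
pred = propagate_labels(adj, known, n_classes=2)
unknown = [i for i in range(len(truth)) if i not in known]
accuracy = sum(pred[i] == truth[i] for i in unknown) / len(unknown)
print(f"accuracy on unknown nodes: {accuracy:.2f}")
```

When within-block edges are much denser than between-block edges, most neighbors of an unknown node share its class, so neighbor averaging tends to recover the right label. This matches the intuition behind the stochastic block model assumption in the abstract, though the consistency result itself belongs to the paper's algorithm and not to this sketch.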