Statistical Research (统计研究), 2018, Vol. 35, Issue 11: 93-104. doi: 10.19343/j.cnki.11-1302/c.2018.11.008


基于分层模型的缺失数据插补方法研究

于力超 金勇进   

  • 出版日期:2018-11-25 发布日期:2018-11-23

Research on Comparison of Missing Data Imputation Methods Based on Multilevel Models

Yu Lichao & Jin Yongjin   

  • Online: 2018-11-25 Published: 2018-11-23

摘要: 大规模抽样调查多采用复杂抽样设计,得到具有分层嵌套结构的调查数据集,其中不可避免会遇到数据缺失问题,针对分层结构含缺失数据集的插补策略目前鲜有研究。本文将Gibbs算法应用到分层含缺失数据集的多重插补过程中,分别研究了固定效应模型插补法和随机效应模型插补法,进而通过理论推导和数值模拟,在不同组内相关系数、群组规模、数据缺失比例等情形下,从参数估计结果的无偏性和有效性两方面,比较不同方法的插补效果,给出插补模型的选择建议。研究结果表明,采用随机效应模型作为插补模型时,得到的参数估计结果更准确,而固定效应模型作为插补模型操作相对简便,在数据缺失比例较小、组内相关系数较大、群组规模较大等情形下,可以采用固定效应插补模型,否则建议采用随机效应插补模型。

关键词: 分层结构数据, 多重插补法, Gibbs算法, 固定效应模型, 随机效应模型

Abstract: Large-scale sample surveys usually adopt complex sampling designs, which yield survey data sets with a hierarchical, nested (multilevel) structure. Missing data are unavoidable in such surveys, yet imputation strategies for multilevel data sets with missing values have received little study. This paper applies the Gibbs algorithm to the multiple imputation of multilevel data with missing values and examines two approaches: imputation with a fixed effect model and imputation with a random effect model. Through theoretical derivation and numerical simulation, it compares the imputation performance of the two methods under different intraclass correlations, cluster sizes, and missingness rates, in terms of the unbiasedness and efficiency of the resulting parameter estimates, and gives recommendations for choosing an imputation model. The results show that using the random effect model as the imputation model yields more accurate parameter estimates, while the fixed effect imputation model is simpler to implement. The fixed effect imputation model can be used when the missingness rate is small, the intraclass correlation is large, and the cluster size is large; otherwise the random effect imputation model is recommended.

Key words: Multilevel structure data, Multiple imputation method, Gibbs algorithm, Fixed effect model, Random effect model
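
As an illustration of the kind of procedure the abstract describes, the following is a minimal sketch of Gibbs-based multiple imputation under a random intercept model. It is not the authors' implementation: the model form y_ij = b0 + b1*x_ij + u_j + e_ij, the flat prior on the coefficients, the weak inverse-gamma priors on the variances, and every name in the code (`gibbs_random_intercept_impute`, `burn_in`, `thin`, and so on) are assumptions made here for illustration. Only the outcome `y` is assumed to have missing values, coded as NaN.

```python
import numpy as np

def gibbs_random_intercept_impute(y, x, group, m=5, burn_in=200, thin=20, seed=0):
    """Return m completed copies of y drawn by a Gibbs sampler (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    miss = np.isnan(y)
    _, g = np.unique(np.asarray(group), return_inverse=True)  # cluster index 0..J-1
    J = g.max() + 1
    n_j = np.bincount(g, minlength=J)            # cluster sizes
    X = np.column_stack([np.ones_like(x), x])    # intercept + one covariate
    XtX_inv = np.linalg.inv(X.T @ X)
    N = y.size
    a0 = b0 = 0.01  # assumed weak inverse-gamma hyperparameters

    # Starting values: fill missing y with the observed mean.
    y_fill = np.where(miss, np.nanmean(y), y)
    beta = XtX_inv @ X.T @ y_fill
    u = np.zeros(J)
    sigma2 = np.var(y_fill - X @ beta)
    tau2 = sigma2

    draws = []
    for it in range(burn_in + m * thin):
        # 1. Impute missing y from its conditional distribution.
        mu = X @ beta + u[g]
        y_fill[miss] = mu[miss] + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
        # 2. Draw fixed coefficients given random intercepts.
        beta_hat = XtX_inv @ X.T @ (y_fill - u[g])
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        # 3. Draw random intercepts u_j (shrunk toward zero by tau2).
        resid = y_fill - X @ beta
        prec = n_j / sigma2 + 1.0 / tau2
        u_mean = np.bincount(g, weights=resid, minlength=J) / sigma2 / prec
        u = u_mean + rng.normal(size=J) / np.sqrt(prec)
        # 4. Draw residual variance (inverse-gamma via reciprocal gamma).
        sse = np.sum((resid - u[g]) ** 2)
        sigma2 = 1.0 / rng.gamma(a0 + N / 2, 1.0 / (b0 + sse / 2))
        # 5. Draw between-cluster variance.
        tau2 = 1.0 / rng.gamma(a0 + J / 2, 1.0 / (b0 + np.sum(u ** 2) / 2))
        # Keep every `thin`-th completed data set after burn-in.
        if it >= burn_in and (it - burn_in + 1) % thin == 0:
            draws.append(y_fill.copy())
    return draws
```

Each returned array is one completed data set; the analysis model would be fitted to each and the estimates pooled in the usual multiple-imputation way. A fixed effect imputation model in the same spirit would drop steps 3 and 5 and instead include cluster indicator columns in `X`, which is simpler but estimates each cluster effect without shrinkage.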