Statistical Research (统计研究), 2019, Vol. 36, Issue (1): 104-114. doi: 10.19343/j.cnki.11-1302/c.2019.01.009


Joint Modeling and Variable Selection from Zero-Inflated Count Data

Hu Yanan (胡亚南) & Tian Maozai (田茂再)

  • Publication date: 2019-01-25 Release date: 2019-01-16

Abstract: Zero-inflated count data violate the mean-variance relationship of the Poisson distribution. Such data can be described by a mixture distribution in which a proportion of the observations follows a Poisson distribution and the remainder are zeros drawn from a degenerate distribution. This paper studies joint modeling and variable selection for zero-inflated count data based on the adaptive elastic-net technique. For the zero-inflated Poisson distribution, latent variables are introduced to construct the complete likelihood of the zero-inflated Poisson model, which consists of two components: a zero-inflation part and a Poisson part. Because the covariates may be collinear and sparse, the objective function is obtained by adding an adaptive elastic-net penalty to the likelihood function. Sparse estimators of the regression coefficients are then obtained with the EM algorithm, and the Bayesian information criterion (BIC) is used to determine the optimal tuning parameter. The paper also gives a theoretical proof of the large-sample properties of the estimators together with a simulation study, and finally applies the proposed method to a real-data problem.
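To make the two-part structure described in the abstract concrete, the display below writes out a standard zero-inflated Poisson model with latent indicators. It is a sketch under assumptions, not the paper's own formulation: the logit and log links, the symbols $\pi_i$, $\mu_i$, $z_i$, $\theta = (\gamma, \beta)$, the weights $\hat{w}_j$, and the generic adaptive elastic-net form of the penalty with tuning parameters $\lambda_1, \lambda_2$ are all chosen here for illustration, since the abstract does not specify them.

```latex
% Mixture form: a structural zero with probability \pi_i, otherwise Poisson(\mu_i)
P(Y_i = y) = \pi_i \,\mathbf{1}\{y = 0\}
           + (1 - \pi_i)\,\frac{e^{-\mu_i}\mu_i^{y}}{y!},
\qquad y = 0, 1, 2, \dots

% Assumed links for the two parts (covariate vectors g_i and x_i)
\operatorname{logit}(\pi_i) = g_i^{\top}\gamma,
\qquad
\log \mu_i = x_i^{\top}\beta

% Complete-data log-likelihood with latent z_i = 1 for a structural zero;
% it splits into a zero-inflation term and a Poisson term
\ell_c(\gamma, \beta)
  = \sum_{i=1}^{n} \bigl[ z_i \log \pi_i + (1 - z_i)\log(1 - \pi_i) \bigr]
  + \sum_{i=1}^{n} (1 - z_i)\bigl[ y_i \log \mu_i - \mu_i - \log y_i! \bigr]

% E-step: conditional expectation of z_i given the observed count
\mathbb{E}[z_i \mid y_i]
  = \begin{cases}
      \dfrac{\pi_i}{\pi_i + (1 - \pi_i)e^{-\mu_i}}, & y_i = 0,\\[2ex]
      0, & y_i > 0
    \end{cases}

% Generic adaptive elastic-net objective on \theta = (\gamma, \beta)
Q(\theta) = -\ell_c(\theta)
          + \lambda_2 \sum_{j} \theta_j^{2}
          + \lambda_1 \sum_{j} \hat{w}_j \lvert \theta_j \rvert
```

Because $\ell_c$ separates into a term involving only $\gamma$ and a term involving only $\beta$, the M-step of an EM algorithm can update the two parts separately once $z_i$ is replaced by its conditional expectation.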

Key words: Zero-inflated Poisson Model, Variable Selection, Joint Modeling
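The latent-variable EM idea mentioned in the abstract can be illustrated on the simplest possible case. The sketch below is a deliberately reduced example and not the authors' implementation: it fits an intercept-only zero-inflated Poisson model, with no covariates, no adaptive elastic-net penalty and no BIC tuning. The function names `fit_zip_em` and `simulate_zip`, the starting values and the simulated data are hypothetical and exist only for this illustration.

```python
import math
import random


def simulate_zip(n, pi, mu, rng):
    """Draw n counts: a structural zero with probability pi, else Poisson(mu)."""
    out = []
    for _ in range(n):
        if rng.random() < pi:
            out.append(0)
            continue
        # Knuth's multiplication method for a Poisson draw (fine for small mu).
        limit, k, prod = math.exp(-mu), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        out.append(k)
    return out


def fit_zip_em(counts, n_iter=500, tol=1e-12):
    """EM for an intercept-only ZIP model; returns (pi_hat, mu_hat)."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    # Crude starting values (an assumption of this sketch).
    pi = 0.5 * n_zero / n
    mu = max(total / n, 1e-8)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        z0 = pi / (pi + (1.0 - pi) * math.exp(-mu))
        expected_structural = n_zero * z0
        # M-step: closed-form maximizers of the expected complete likelihood.
        new_pi = expected_structural / n
        new_mu = total / (n - expected_structural)
        converged = abs(new_pi - pi) + abs(new_mu - mu) < tol
        pi, mu = new_pi, new_mu
        if converged:
            break
    return pi, mu


# Usage on simulated data with known parameters (pi = 0.3, mu = 2.0).
rng = random.Random(0)
data = simulate_zip(5000, pi=0.3, mu=2.0, rng=rng)
pi_hat, mu_hat = fit_zip_em(data)
print(f"pi_hat={pi_hat:.3f}, mu_hat={mu_hat:.3f}")
```

In a regression setting, the closed-form M-step would be replaced by penalized fits of the zero-inflation and Poisson parts, which is where a penalty such as the adaptive elastic net would enter.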