Statistical Research (统计研究), 2021, Vol. 38, Issue (1): 147-160. doi: 10.19343/j.cnki.11-1302/c.2021.01.012

基于随机化适应性Lasso的高维变量选择

闫懋博 田茂再   

  • 出版日期:2021-01-25 发布日期:2021-01-26

Selection of High Dimensional Variables Based on Randomized Adaptive Lasso

Yan Maobo, Tian Maozai

  • Online: 2021-01-25 Published: 2021-01-26

摘要: Lasso等惩罚变量选择方法选入模型的变量数受到样本量限制。文献中已有研究变量系数显著性的方法舍弃了未选入模型的变量含有的信息。本文在变量数大于样本量即p>n的高维情况下,使用随机化bootstrap方法获得变量权重,在计算适应性Lasso时构建选择事件的条件分布并剔除系数不显著的变量,以得到最终估计结果。本文的创新点在于提出的方法突破了适应性Lasso可选变量数的限制,当观测数据含有大量干扰变量时能够有效地识别出真实变量与干扰变量。与现有的惩罚变量选择方法相比,多种情境下的模拟研究展示了所提方法在上述两个问题中的优越性。实证研究中对NCI-60癌症细胞系数据进行了分析,结果较以往文献有明显改善。

关键词: 随机化适应性Lasso, 高维变量选择, 选择性推断

Abstract: The number of variables that penalized variable selection methods such as the Lasso can select into the model is limited by the sample size. Existing methods in the literature for assessing the significance of variable coefficients discard the information contained in variables that were not selected into the model. For the high-dimensional setting in which the number of variables exceeds the sample size (p>n), this paper uses a randomized bootstrap method to obtain variable weights. When computing the adaptive Lasso, it constructs the conditional distribution given the selection event and removes variables whose coefficients are not significant, which yields the final estimate. The contribution of this paper is that the proposed method overcomes the limit on the number of variables that the adaptive Lasso can select and, when the observed data contain a large number of noise variables, effectively distinguishes true variables from noise variables. Simulation studies under a variety of scenarios show that the proposed method outperforms existing penalized variable selection methods on both of these problems. In the empirical study, the NCI-60 cancer cell line data are analyzed, and the results improve markedly on those in the previous literature.

Key words: Randomized Adaptive Lasso, High-Dimensional Variable Selection, Selective Inference
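
The abstract describes obtaining variable weights from a bootstrap and then fitting an adaptive (weighted) Lasso when p>n. The following is only a minimal sketch of that general idea, not the authors' algorithm: the weighting rule (inverse bootstrap selection frequency plus a small constant `eps`), the penalty levels, the synthetic data, and all function names are assumptions introduced for illustration, and the selective-inference step that tests coefficient significance is omitted.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=50):
    """Minimize (1/(2n))*||y - X b||^2 + lam * sum_j weights[j]*|b_j|
    by cyclic coordinate descent. X is a list of rows."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # equals y - X beta because beta starts at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def bootstrap_weights(X, y, lam, n_boot=20, eps=0.05, seed=0):
    """Hypothetical rule (an assumption, not from the paper):
    penalty weight = 1 / (bootstrap selection frequency + eps)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = weighted_lasso([X[i] for i in idx], [y[i] for i in idx], lam, [1.0] * p)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [1.0 / (c / n_boot + eps) for c in counts]


# Synthetic p > n data (illustrative only): 3 true variables, 37 noise variables.
rng = random.Random(1)
n, p = 20, 40
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * (row[0] + row[1] + row[2]) + rng.gauss(0.0, 0.5) for row in X]

weights = bootstrap_weights(X, y, lam=0.5)
beta_hat = weighted_lasso(X, y, lam=0.1, weights=weights)
selected = [j for j in range(p) if beta_hat[j] != 0.0]
```

Under this assumed rule, variables that are rarely selected across bootstrap resamples receive a larger penalty weight in the final fit, while frequently selected variables are penalized lightly. A practical implementation would likely replace the pure-Python solver with an established Lasso routine and choose the penalty levels by cross-validation.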