Statistical Research (统计研究): 117-128. doi: 10.19343/j.cnki.11-1302/c.2020.08.009


A Feature Screening for Ultra-high Dimensional Discriminant Analysis Using Rank-based Energy Distance (基于秩能量距离的超高维特征筛选研究)

He Shengmei (何胜美), Li Gaorong (李高荣), Xu Wangli (许王莉)

  • Online: 2020-08-21 Published: 2020-08-21

Abstract: Feature screening is a widely used method for fast dimension reduction in ultra-high dimensional data analysis. This paper first proposes a new feature screening procedure for ultra-high dimensional discriminant analysis, named RED-SIS, based on the rank-based energy distance. The procedure assumes neither a model structure nor finite moment conditions, and it is robust to heavy-tailed covariates. Second, the theoretical properties of the method are studied: under several mild regularity conditions, the sure screening property and the ranking consistency property are proved. The results show that RED-SIS can effectively handle feature screening for ultra-high dimensional discriminant analysis when the dimension p and the sample size n satisfy $\log p = O(n^{\alpha})$, and that, as the sample size increases, the probability that the screened feature set contains all truly important features tends to 1. Finally, Monte Carlo simulations are used to examine the finite-sample performance of the method and to compare it with existing ultra-high dimensional feature screening methods. The simulation results show that the method has a clear advantage for heavy-tailed data, and a real data analysis also supports the effectiveness of RED-SIS.

Key words: Ultra-high Dimensional Data, Feature Screening, Rank-based Energy Distance, Sure Screening Property
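
The screening idea summarized in the abstract can be illustrated with a minimal sketch. It assumes a binary class label and uses the ordinary two-sample energy distance computed on pooled ranks as the marginal utility; the abstract does not give the exact RED-SIS statistic, so this is a stand-in and may differ from the authors' definition. The function names (`rank_energy_utility`, `screen`, `demo`), the tie-breaking rule, and the simulated Cauchy data are all hypothetical choices made for illustration.

```python
import numpy as np


def rank_energy_utility(x, y):
    """Energy distance between the two classes of y, computed on pooled ranks of x.

    Assumption: this is the plain two-sample energy distance (V-statistic form),
    not necessarily the statistic defined in the paper.
    """
    n = x.shape[0]
    # Pooled ranks scaled to (0, 1]; ties are broken by order of appearance.
    ranks = (np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1) / n
    a, b = ranks[y == 0], ranks[y == 1]
    between = np.abs(a[:, None] - b[None, :]).mean()
    within_a = np.abs(a[:, None] - a[None, :]).mean()
    within_b = np.abs(b[:, None] - b[None, :]).mean()
    return 2.0 * between - within_a - within_b


def screen(X, y, d):
    """Score every column of X and return the indices of the d largest scores."""
    scores = np.array([rank_energy_utility(X[:, j], y) for j in range(X.shape[1])])
    keep = np.argsort(-scores)[:d]
    return keep, scores


def demo(seed=0):
    """Simulated heavy-tailed example: only feature 0 differs between classes."""
    rng = np.random.default_rng(seed)
    n, p = 200, 500
    y = np.repeat([0, 1], n // 2)
    X = rng.standard_cauchy((n, p))  # Cauchy draws have no finite mean
    X[:, 0] += 3.0 * y               # location shift in the informative feature
    return screen(X, y, d=10)


if __name__ == "__main__":
    keep, scores = demo()
    print("retained feature indices:", keep)
```

Because the utility depends on the covariate only through its ranks, it stays well defined even when the covariate has no finite moments, which is the kind of robustness to heavy tails the abstract describes.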