Statistical Research, 2019, Vol. 36, Issue (4): 119-128. doi: 10.19343/j.cnki.11-1302/c.2019.04.011


Sparse Principal Component Analysis with Fused Penalty

Zhang Bo & Liu Xiaoqian

Online: 2019-04-25 Published: 2019-04-22

Abstract: This paper studies sparse principal component analysis (PCA) with a fused penalty. The method targets data whose variables are naturally ordered, where adjacent variables are highly correlated or even exactly equal.

First, we take a regression perspective and propose a simple approach for obtaining sparse principal components (PCs). We introduce a generalized sparse PCA model, GSPCA, together with an algorithm for solving it. We prove that when the penalty is the L1 norm, GSPCA yields the same solution as SPC, an existing sparse PCA model.

Next, we combine the fused penalty with PCA to obtain a fused sparse PCA method. We formulate it in two forms, one based on penalized matrix decomposition (PMD) and one based on regression. We prove theoretically that the two forms yield the same solution, so we refer to both as the FSPCA model.

Simulation experiments show that FSPCA performs well on data sets in which adjacent variables are highly correlated or even equal. Finally, we apply FSPCA to handwritten numeral recognition. Compared with SPC, the PCs extracted by FSPCA are easier to interpret, which gives the model greater practical value.
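The abstract mentions a PMD form and a regression form but does not state either one. The block below is a hedged sketch of both kinds of objective. It assumes three things: a column-centred n × p data matrix X; the rank-one SPC criterion of Witten, Tibshirani and Hastie (2009), written here in constraint form; and a Lagrangian penalized rank-one approximation for the regression view. The penalty P, the tuning constants c, λ, λ1 and λ2, and the exact shape of each objective are illustrative assumptions. They are not the paper's definitions of GSPCA or FSPCA.

```latex
% PMD-style rank-one form (SPC uses P(v) = ||v||_1; the fused version adds a difference term)
\max_{u,v}\; u^{\top} X v
\quad \text{s.t.}\quad \|u\|_2 \le 1,\ \|v\|_2 \le 1,\ P(v) \le c,
\qquad
P_{\mathrm{fused}}(v) = \sum_{j=1}^{p} |v_j| + \lambda \sum_{j=2}^{p} |v_j - v_{j-1}| .

% Regression (penalized rank-one approximation) form with Lagrangian penalties
\min_{u,v}\; \tfrac12 \|X - u v^{\top}\|_F^2 + \lambda_1 \|v\|_1
  + \lambda_2 \sum_{j=2}^{p} |v_j - v_{j-1}|
\quad \text{s.t.}\quad \|u\|_2 = 1 .

% For fixed unit-norm u, expanding the Frobenius norm gives
\tfrac12 \|X - u v^{\top}\|_F^2
  = \tfrac12 \|X\|_F^2 - \tfrac12 \|X^{\top} u\|_2^2 + \tfrac12 \|v - X^{\top} u\|_2^2 ,
% so the v-update is a proximal step at z = X^T u. With \lambda_2 = 0 it is
v_j = \operatorname{sign}(z_j)\,\bigl(|z_j| - \lambda_1\bigr)_+ ,
% and for fixed v the u-update is u = Xv / \|Xv\|_2 .
```

The expansion shows why a regression-type formulation with an L1 penalty leads to soft-thresholding of X^T u, which is the same kind of update SPC uses. It illustrates the idea but does not reproduce the equivalence proof described in the abstract.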

Key words: Principal Component Analysis, Sparsity Method, Fused Penalty, Handwritten Numeral Recognition, Interpretability
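The alternating scheme behind fused sparse PCA can be sketched in pure Python. This is a minimal sketch under stated assumptions, not the authors' FSPCA algorithm.

It assumes the Lagrangian rank-one objective ½‖X − u v^T‖_F² + λ1‖v‖₁ + λ2 Σ_j |v_j − v_{j−1}| with ‖u‖₂ = 1. Each iteration makes two updates:

- u = Xv/‖Xv‖.
- v becomes the fused-lasso proximal operator evaluated at X^T u.

That proximal step uses a swapped-in technique. It first runs projected gradient on the dual of the one-dimensional total-variation problem, then soft-thresholds the result. This yields the fused-lasso signal approximator solution (Friedman, Hastie, Höfling and Tibshirani, 2007). The function names `fused_sparse_pc`, `fused_prox`, `tv_prox` and `soft_threshold`, the parameters `lam1` and `lam2`, and the toy data are all hypothetical.

```python
import math

# Hypothetical sketch: assumed Lagrangian formulation, NOT the paper's FSPCA algorithm.


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||v||_1."""
    return [x - t if x > t else x + t if x < -t else 0.0 for x in z]


def _primal_from_dual(y, z):
    """x = y - D^T z, where D is the first-difference operator (Dx)_j = x[j+1] - x[j]."""
    p = len(y)
    return [y[i] - ((z[i - 1] if i > 0 else 0.0) - (z[i] if i < p - 1 else 0.0))
            for i in range(p)]


def tv_prox(y, lam, iters=500):
    """Approximate argmin_x 0.5*||x - y||^2 + lam * sum_j |x[j+1] - x[j]|
    by projected gradient on the dual, whose variables z[j] lie in [-lam, lam]."""
    z = [0.0] * (len(y) - 1)
    for _ in range(iters):
        x = _primal_from_dual(y, z)
        # Step size 1/4 is safe because ||D D^T|| <= 4; then clip back into the box.
        z = [min(lam, max(-lam, z[j] + 0.25 * (x[j + 1] - x[j]))) for j in range(len(z))]
    return _primal_from_dual(y, z)


def fused_prox(y, lam1, lam2):
    """Fused-lasso signal approximator: TV prox followed by soft-thresholding."""
    return soft_threshold(tv_prox(y, lam2), lam1)


def fused_sparse_pc(X, lam1, lam2, iters=100):
    """Hypothetical rank-one fused sparse PC loading via alternating u- and v-updates.
    X is a list of n rows with p entries each, assumed column-centred."""
    n, p = len(X), len(X[0])
    # Heuristic starting vector (an assumption): the column norms of X.
    v = [math.sqrt(sum(X[i][j] ** 2 for i in range(n))) for j in range(p)]
    for _ in range(iters):
        Xv = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]
        norm_Xv = math.sqrt(sum(a * a for a in Xv))
        if norm_Xv == 0.0:
            return [0.0] * p              # X v vanished (e.g. v was thresholded to zero)
        u = [a / norm_Xv for a in Xv]     # u-step: u = Xv / ||Xv||
        z = [sum(X[i][j] * u[i] for i in range(n)) for j in range(p)]  # z = X^T u
        v = fused_prox(z, lam1, lam2)     # v-step: proximal (penalized regression) update
    norm_v = math.sqrt(sum(a * a for a in v))
    return [a / norm_v for a in v] if norm_v > 0.0 else [0.0] * p


if __name__ == "__main__":
    # Made-up toy data: variables 0-2 are identical copies, variables 3-5 are weak.
    f = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5]
    g = [0.1, 0.1, -0.1, -0.1, 0.0, 0.0]
    h = [0.0, 0.1, 0.0, -0.1, 0.1, -0.1]
    k = [0.1, -0.1, 0.1, -0.1, 0.0, 0.0]
    X = [[f[i], f[i], f[i], g[i], h[i], k[i]] for i in range(6)]
    print(fused_sparse_pc(X, lam1=0.5, lam2=0.1))
```

The penalty settings reduce to familiar cases:

- With `lam2 = 0`, the v-step is plain soft-thresholding of X^T u, an SPC-like L1 update.
- With `lam1 = lam2 = 0`, the loop reduces to power iteration for the leading right singular vector.

In the toy example, variables 0 to 2 receive equal loadings of about 0.577 each. The L1 term sets the loadings of the three weak variables to exactly zero.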