Sparse Principal Component Analysis with Fused Penalty

doi:10.19343/j.cnki.11-1302/c.2019.04.011

Statistical Research ›› 2019, Vol. 36 ›› Issue (4): 119-128. doi: 10.19343/j.cnki.11-1302/c.2019.04.011


Zhang Bo & Liu Xiaoqian

Online: 2019-04-25 Published: 2019-04-22


Abstract

This paper studies sparse principal component analysis (PCA) with a fused penalty, aimed at data whose features are naturally ordered and whose adjacent variables are highly correlated or even exactly equal. First, from a regression perspective, we propose a simple approach to obtaining sparse principal components (PCs): we introduce a generalized sparse PCA model (GSPCA) together with an algorithm for solving it, and prove that, when the penalty is the 1-norm, its solution coincides with that of SPC, an existing sparse PCA model. Next, we combine the fused penalty with PCA to obtain a fused sparse PCA method and give two model forms, one based on penalized matrix decomposition (PMD) and one based on regression. We prove that the two forms yield the same solution, and therefore refer to both as the FSPCA model. Simulations show that FSPCA performs well on datasets in which adjacent variables are highly correlated or even exactly equal. Finally, we apply FSPCA to handwritten numeral recognition and find that, compared with SPC, FSPCA extracts PCs that are more interpretable, which gives the model greater practical value.

Key words: Principal Component Analysis, Sparsity Method, Fused Penalty, Handwritten Numeral Recognition, Interpretability
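To make the two penalties named in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes the usual fused-lasso form of the penalty (an L1 term on the loadings plus an L1 term on differences between adjacent loadings) and illustrates the 1-norm case with a simplified alternating soft-thresholding update that uses a fixed threshold. The function names `soft_threshold`, `fused_penalty`, and `sparse_pc_l1`, and the parameters `lam`, `lam1`, `lam2`, are hypothetical and do not come from the paper, whose exact GSPCA and FSPCA formulations are not given on this page.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding: sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def fused_penalty(v, lam1, lam2):
    """Assumed fused-lasso-type penalty on a loading vector v:
    lam1 * sum_j |v_j| + lam2 * sum_{j>=2} |v_j - v_{j-1}|.
    The first term encourages sparsity, the second encourages
    adjacent loadings to be equal."""
    v = np.asarray(v, dtype=float)
    return lam1 * np.abs(v).sum() + lam2 * np.abs(np.diff(v)).sum()


def sparse_pc_l1(X, lam, n_iter=100):
    """Rank-one sparse loading vector under an L1 penalty (illustrative only).

    Alternates u <- Xv / ||Xv|| and v <- S(X^T u, lam) / ||S(X^T u, lam)||,
    where S is soft-thresholding. Using a fixed threshold lam is a
    simplification chosen for this sketch."""
    X = np.asarray(X, dtype=float)
    # Start from the leading right singular vector (the ordinary PCA loading).
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v = vt[0]
    for _ in range(n_iter):
        u = X @ v
        u_norm = np.linalg.norm(u)
        if u_norm == 0.0:
            break
        u = u / u_norm
        w = soft_threshold(X.T @ u, lam)
        w_norm = np.linalg.norm(w)
        if w_norm == 0.0:
            # Threshold too large: every loading was shrunk to zero.
            return np.zeros(X.shape[1])
        v = w / w_norm
    return v


# Toy data: the first four adjacent variables are (almost) identical copies
# of one latent score; the remaining six are pure low-level noise.
rng = np.random.default_rng(0)
scores = rng.normal(size=50)
pattern = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
X_demo = np.outer(scores, pattern) + 0.01 * rng.normal(size=(50, 10))

v_demo = sparse_pc_l1(X_demo, lam=1.0)
print(np.round(np.abs(v_demo), 3))
print(fused_penalty(v_demo, lam1=1.0, lam2=1.0))
```

On this toy data the L1 update zeroes the noise variables, while `fused_penalty` shows how a loading vector that is constant over a block of adjacent variables incurs a difference cost only at the block boundaries. Minimizing a fused penalty inside the update would need a dedicated solver, which this sketch deliberately omits.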

Zhang Bo & Liu Xiaoqian. Sparse Principal Component Analysis with Fused Penalty[J]. Statistical Research, 2019, 36(4): 119-128.

