Statistical Research (统计研究)

Generalized additive models for large data sets

Xu Yipin (许亦频) and Ni Ping (倪苹)

  • Online: 2016-04-15 Published: 2016-04-05

Abstract: We consider an application in electricity grid load prediction, where generalized additive models (GAMs) are appropriate but where the size of the data set can make fitting them with existing methods difficult or even infeasible. We therefore develop practical GAM fitting methods for large data sets in the case where the smooth terms in the model are represented by penalized regression splines. The methods use iterative update schemes to obtain factors of the model matrix while requiring only subblocks of the model matrix to be computed at any one time. We show that efficient smoothing parameter estimation can be carried out in a well-justified manner. The grid load prediction problem requires the model fit to be updated as new data become available, and some means of dealing with residual autocorrelation in the grid load series. Methods are provided for these problems, and parallel implementation is covered. The methods allow generalized additive models to be estimated for large data sets on modest computer hardware, and an empirical study of French electricity load prediction illustrates the utility of reduced rank spline smoothing methods for dealing with complex modelling problems.

Keywords: Correlated additive model, Electricity load prediction, Generalized additive model estimation
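
The idea of building a factor of the model matrix from subblocks can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain Gaussian penalized least squares problem, a fixed smoothing parameter `lam` (the paper estimates it), and a hypothetical second-difference penalty matrix `S`. The function names `update_qr` and `fit_blockwise` are invented for illustration. Each block of rows is folded into a running triangular factor `R` and vector `f`, so the full model matrix never has to be held in memory.

```python
import numpy as np

def update_qr(R, f, X_block, y_block):
    """Fold one block of rows into the running factor R and vector f.

    After the update, R.T @ R equals X.T @ X and R.T @ f equals X.T @ y
    for all rows seen so far.
    """
    if R is None:
        stacked_X, stacked_y = X_block, y_block
    else:
        stacked_X = np.vstack([R, X_block])
        stacked_y = np.concatenate([f, y_block])
    Q, R_new = np.linalg.qr(stacked_X)  # reduced QR of the stacked rows
    return R_new, Q.T @ stacked_y

def fit_blockwise(blocks, S, lam):
    """Penalized least squares fit using only one block of rows at a time."""
    R, f = None, None
    for X_block, y_block in blocks:
        R, f = update_qr(R, f, X_block, y_block)
    # Solve (X'X + lam * S) beta = X'y using only R and f.
    return np.linalg.solve(R.T @ R + lam * S, R.T @ f)

# Small synthetic check (all data here are made up for illustration).
rng = np.random.default_rng(0)
n, p, lam = 1000, 8, 1.0
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)
D = np.diff(np.eye(p), n=2, axis=0)   # second-difference operator
S = D.T @ D                           # hypothetical penalty matrix

blocks = [(X[i:i + 100], y[i:i + 100]) for i in range(0, n, 100)]
beta_block = fit_blockwise(blocks, S, lam)
beta_full = np.linalg.solve(X.T @ X + lam * S, X.T @ y)
```

In this sketch `beta_block` and `beta_full` agree up to floating-point error, while the blockwise route only ever factorizes a matrix with at most `p` plus one block's worth of rows. The same update also accommodates newly arriving data: a new block can be folded into the stored `R` and `f` without revisiting earlier rows.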