统计研究


基于最近邻分析的空气质量时空数据异常点识别

聂斌等   

  • 出版日期:2017-08-15 发布日期:2017-08-25

Outlier Detection from Air Quality Spatio-temporal Data Based on Nearest Neighbor Analysis

Nie Bin et al.   

  • Online:2017-08-15 Published:2017-08-25

摘要: 空气质量问题近年来受到广泛关注。由于空气质量数据具有在时间上连续、空间上相关的特点,所以提高了异常点识别难度。本文提出在时间维度上运用移动平均法,而在空间维度上运用反距离加权法对观测值进行预测并求残差,从而将时空数据的异常点识别问题转化为二维残差值的异常点检测问题。在残差值的二维空间中,通过最近邻算法计算每个点相对于多个邻近点的异常程度。当异常程度大于阈值的概率超过预定值时判定为异常点。通过仿真验证表明新方法具有良好的检出力。最后将新方法应用于北京市实际观测数据,取得了满意的识别效果。

关键词: 空气质量, 时空数据, 异常点识别, 最近邻分析

Abstract: Air quality has received widespread attention in recent years. Because air quality data are continuous in time and correlated in space, outliers in them are harder to detect. This paper predicts each observation with the moving average method in the time dimension and with the inverse distance weighting method in the space dimension, and computes the residual of each prediction, so that outlier detection in spatio-temporal data is transformed into outlier detection in two-dimensional residual values. In the two-dimensional residual space, a nearest neighbor algorithm computes the degree of anomaly of each point relative to multiple neighboring points. A point is declared an outlier when the probability that its degree of anomaly exceeds a threshold is greater than a predetermined value. Simulation results show that the new method has good detection power. Finally, the new method is applied to real observation data from Beijing and gives satisfactory detection results.

Key words: air quality, spatio-temporal data, outlier detection, nearest neighbor analysis
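The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the window length, the distance power, the number of neighbors, the thresholds, or the exact definition of the degree of anomaly, so every parameter value and the scoring rule below are assumptions chosen for illustration, not the authors' settings. Here the degree of anomaly of a point relative to a neighbor is taken, hypothetically, as the distance between them divided by that neighbor's own mean nearest-neighbor spacing, and the "probability" is approximated by the fraction of the k neighbors for which this ratio exceeds the threshold.

```python
import numpy as np

def temporal_residuals(x, window=3):
    """x has shape (n_stations, n_times). Each value is compared with the
    mean of the preceding `window` values at the same station. The first
    `window` columns stay NaN because no full window exists yet."""
    res = np.full(x.shape, np.nan)
    for t in range(window, x.shape[1]):
        res[:, t] = x[:, t] - x[:, t - window:t].mean(axis=1)
    return res

def spatial_residuals(x, coords, power=2.0):
    """Each value is compared with the inverse-distance-weighted mean of the
    other stations at the same time. Assumes all station coordinates differ."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)           # a station gets zero weight on itself
    w = 1.0 / d ** power
    w /= w.sum(axis=1, keepdims=True)     # weights of each row sum to 1
    return x - w @ x

def neighbor_outliers(points, k=5, ratio_threshold=3.0, min_fraction=0.8):
    """Assumed scoring rule (not taken from the paper): flag a point when the
    share of its k nearest neighbors with degree > ratio_threshold is
    greater than min_fraction."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    order = np.argsort(d, axis=1)[:, :k]              # k nearest neighbors
    knn_dist = np.take_along_axis(d, order, axis=1)   # shape (n, k)
    local_scale = knn_dist.mean(axis=1)               # typical spacing per point
    degree = knn_dist / local_scale[order]            # distance / neighbor spacing
    fraction = (degree > ratio_threshold).mean(axis=1)
    return fraction > min_fraction

# Synthetic demonstration: 6 hypothetical stations, 60 time steps, one spike.
rng = np.random.default_rng(0)
coords = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
n_times, window = 60, 3
trend = 50 + 5 * np.sin(2 * np.pi * np.arange(n_times) / 48)
x = trend + rng.normal(0.0, 1.0, size=(len(coords), n_times))
spike_station, spike_time = 2, 40
x[spike_station, spike_time] += 30.0                  # injected outlier

rt = temporal_residuals(x, window)[:, window:]        # drop the NaN columns
rs = spatial_residuals(x, coords)[:, window:]
points = np.column_stack([rt.ravel(), rs.ravel()])    # 2-D residual space
flags = neighbor_outliers(points)
spike_index = spike_station * (n_times - window) + (spike_time - window)
print("injected point flagged:", bool(flags[spike_index]))
print("total points flagged:", int(flags.sum()))
```

In this sketch a spike also distorts the moving averages that follow it and the spatial predictions of the other stations at the same time step, so a few secondary points may be flagged as well. How the paper handles such effects is not stated in the abstract.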