Statistical Research (统计研究) ›› 2013, Vol. 30 ›› Issue (8): 3-9.

• Articles •

Further Research about the Comprehensive Utilization of Benford’s Law and Other Methods to Evaluate the Statistical Data Quality

Liu Yunxia (刘云霞) & Zeng Wuyi (曾五一)

  • Online: 2013-08-15 Published: 2013-08-05

Abstract: Testing data quality with Benford’s law is an important method that is already widely used in practice. The method has some limitations, however. To address them, this paper further discusses how to combine Benford’s law with anomaly detection and data mining techniques, so that the specific samples that may have data quality problems, and the patterns they follow, can be identified. The combined method is then applied in an empirical analysis of the data quality of the main economic indicators of China's insurance industry for 2006-2011. The results indicate that the method is reasonable and effective.

Key words: Data Quality, Benford’s Law, Anomaly Detection, Data Mining
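
For readers unfamiliar with the Benford test mentioned in the abstract, the following is a minimal sketch of a first-digit check, not the procedure used in the paper. Under Benford’s law the leading digit d (1 to 9) occurs with probability log10(1 + 1/d). The sketch compares observed first-digit counts with these probabilities using a Pearson chi-square statistic. The function names and the choice of a chi-square test at the 5% level are assumptions made for illustration only; the abstract does not specify which test statistic the authors use.

```python
import math
from collections import Counter

# 5% critical value of the chi-square distribution with 8 degrees of freedom
# (9 digit classes minus 1). Choosing the 5% level is an assumption.
CHI2_CRIT_8DF_5PCT = 15.507


def benford_expected():
    """Benford first-digit probabilities: P(d) = log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading nonzero digit of a nonzero number, read from scientific notation."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    return int(format(abs(x), ".15e")[0])


def benford_chi_square(values):
    """Pearson chi-square statistic of observed first digits vs. Benford's law.

    Zeros are skipped because they have no leading digit.
    """
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("no nonzero values to test")
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


if __name__ == "__main__":
    # Hypothetical data: powers of two, whose first digits are known to
    # follow Benford's law closely.
    sample = [2 ** k for k in range(1, 201)]
    stat = benford_chi_square(sample)
    print(f"chi-square = {stat:.3f}, reject at 5%: {stat > CHI2_CRIT_8DF_5PCT}")
```

A statistic above the critical value only signals that the data set as a whole departs from Benford’s law. It does not point to individual records, which is the limitation the abstract says motivates combining the test with anomaly detection and data mining.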