Statistical Research (统计研究)

A Model of Compiling Price Index Based on the ‘Web Scraping’ Technology
(Original Chinese title: 一种基于网络爬虫技术的价格指数计算模型)

Sun Yibing (孙易冰) et al.

  • Online: 2014-10-15 Published: 2014-10-14

Abstract: In recent years, institutions in China and abroad have begun research on online-shopping price indexes based on big data. Following the methodology of the official CPI, this paper designs a price index calculation model based on ‘web scraping’ technology. By comparing the model's trial results with official CPI data, and by mining the characteristics of the raw data, we find that the model offers strong timeliness and high sensitivity.

Key words: Price Index, Web Scraping, Cluster Analysis, Power-Law Distribution