Statistical Research (统计研究)

Real Estate Price Prediction based on Searching Data

Dong Qian (董倩) et al.

  • Publication date: 2014-10-15 Release date: 2014-10-14

  • Online:2014-10-15 Published:2014-10-14

Abstract: The real estate industry is one of the economic drivers of the Chinese economy, and housing prices attract constant public attention. However, data published by government statistical agencies are usually delayed and therefore cannot meet public demand. This paper studies second-hand and new home prices in 16 large and medium-sized cities, including Beijing, Shanghai, Tianjin, and Chongqing, using the Baidu Search Index (BSI), drawn from China's largest search engine, as the data basis. We selected the 12 keywords that most influence second-hand housing prices and the 8 keywords that most influence new home prices. Using cross-validation, we fitted and forecast prices in both markets for each of the 16 cities with six models: Linear Regression, Regression Tree, Random Forests, Bagging, m-Boosting, and Support Vector Machine (SVM), and identified the best model for predicting price movements in each market. Among the six models, SVM and Random Forests predicted best, while the Regression Tree predicted worst. Among the key factors that influence second-hand housing prices, public attention centers on transactions and housing policy; among those that influence new home prices, it centers on price trends and real estate policy. We conclude that web search data can not only predict the housing price index well but also reveal trends and regularities in the behavior of economic agents. The approach is also timely: the predicted monthly real estate prices are available about two weeks before the official data are released.

Key words: Web Search Data, Real Estate Price Prediction, Cross Validation, Support Vector Machine (SVM), Random Forest
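
The abstract describes choosing among candidate models by cross-validation. The following is a minimal sketch of that idea only, not the authors' procedure: the search-index and price-index values are synthetic, the two stand-in models (a mean baseline and a one-predictor least-squares line) replace the six models named in the abstract, and the interleaved fold scheme and the names `fit_mean`, `fit_linear`, and `k_fold_mse` are assumptions made for illustration.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the training mean of the target."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares with a single predictor (closed form)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def k_fold_mse(fit, xs, ys, k=4):
    """Average held-out mean squared error over k interleaved folds."""
    fold_errors = []
    for fold in range(k):
        # Observation i is held out in fold (i mod k); the rest train the model.
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mean([(model(xs[i]) - ys[i]) ** 2 for i in test]))
    return mean(fold_errors)


# Synthetic monthly values (hypothetical, not taken from the paper).
search_index = [52, 61, 58, 70, 75, 69, 82, 88, 91, 85, 97, 103]
price_index = [100.4, 101.1, 100.9, 101.9, 102.2, 101.8,
               102.9, 103.3, 103.6, 103.1, 104.0, 104.6]

candidates = {"mean": fit_mean, "linear": fit_linear}
scores = {name: k_fold_mse(fit, search_index, price_index)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # model with the lowest held-out error
print(scores, best)
```

Because each model is scored only on observations it was not fitted to, a flexible model that merely memorizes the training months gains no advantage. For monthly series, a forward-chaining split that always tests on later months than it trains on may be a more appropriate choice than the interleaved folds assumed here.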