统计研究 ›› 2021, Vol. 38 ›› Issue (12): 145-156. doi: 10.19343/j.cnki.11-1302/c.2021.12.011


基于行政大数据的失业率估计:以某四百万人口城市为例

中国经济大数据研究组   

  • 出版日期:2021-12-25 发布日期:2021-12-25

Unemployment Rate Estimation with Administrative Big Data: Case Study in a City of 4-Million Population

China’s Economic Big Data Research Group   

  • Online: 2021-12-25 Published: 2021-12-25

摘要: 我国城镇登记失业率指标稳定在4%左右,难以较为准确反映就业动态;而劳动力调查样本量有限,城镇调查失业率对省以下各级行政区域代表性不足。本文将针对大数据的机器学习算法与针对传统统计数据的核算思想结合起来,基于某四百万人口城市2016—2018年的全样本行政大数据,利用机器学习算法,对每个城镇居民每个月的就业状态进行预测,再利用统计核算方法,估计出该城市的失业率。在个人层面,本文的模型在样本外测试集上的准确率达到96.7%。经过统计核算加总,本文估计的当地失业率在合理区间范围内,并表现出明显的周期性特征,对就业形势动态变化的刻画明显优于当地一年发布一次的登记失业率数据。本文基于个人层面的预测结果,进一步探讨了当地失业人口的性别与文化程度特征,以及再就业的时间规律。本文针对如何使用行政大数据辅助经济决策提出了新的范式,对大数据时代如何理解经济与制定政策具有参考意义。

关键词: 行政大数据, 机器学习, 统计核算, 失业率

Abstract: Although the unemployment rate is one of the major indicators of the economy, China's unemployment figures still face considerable skepticism. The register-based urban unemployment rate has been stable at around 4% with very little fluctuation over time, while the survey-based unemployment rate is calculated from a relatively small sample, which makes it difficult to deliver accurate provincial or prefecture-level estimates. In this paper, we address these challenges by combining machine learning algorithms with traditional national accounting methods to estimate the unemployment rate. Specifically, we train a machine learning model on full-sample administrative big data from 2016 to 2018 in a city of four million people to predict each urban resident's monthly employment status, and then estimate the city's unemployment rate with a national accounting method. At the individual level, our model achieves an accuracy of 96.7% on the out-of-sample test set. After aggregation within the national accounting framework, the estimated local unemployment rate fluctuates within a reasonable range and exhibits clear periodic characteristics. Our estimates capture changes in employment conditions better than the register-based unemployment rate, which is released only once a year. Using the individual-level predictions, we also study the gender and educational composition of the local unemployed population and the timing patterns of reemployment. Our paper proposes a new approach to using administrative big data to understand economic conditions and to facilitate policy-making in the age of big data.

Key words: Administrative Big Data, Machine Learning, National Accounts, Unemployment Rate
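
The aggregation step described in the abstract, going from individual monthly predictions to a city-level rate, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' accounting procedure: the function name `monthly_unemployment_rate`, the three status labels, and the record layout are hypothetical, and the rate is taken as the textbook ratio of predicted unemployed to predicted labor force (employed plus unemployed).

```python
from collections import defaultdict


def monthly_unemployment_rate(predictions):
    """Aggregate per-person monthly status predictions into a rate per month.

    `predictions` is an iterable of (person_id, month, status) tuples, where
    status is one of "employed", "unemployed", or "inactive". These labels
    are assumptions made for this sketch; the abstract does not specify them.
    """
    counts = defaultdict(lambda: {"employed": 0, "unemployed": 0})
    for _person_id, month, status in predictions:
        # People predicted to be outside the labor force are excluded
        # from both the numerator and the denominator.
        if status in ("employed", "unemployed"):
            counts[month][status] += 1

    rates = {}
    for month, c in sorted(counts.items()):
        labor_force = c["employed"] + c["unemployed"]
        if labor_force:
            rates[month] = c["unemployed"] / labor_force
    return rates
```

Called on the output of any individual-level classifier, the function returns a dictionary keyed by month, which is the kind of monthly series the abstract contrasts with a registered rate released once a year. Real administrative data would also need rules for who counts as an urban resident and as part of the labor force, which this sketch leaves out.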