Statistical Research (统计研究) ›› 2019, Vol. 36 ›› Issue (9): 93-. doi: 10.19343/j.cnki.11-1302/c.2019.09.008

Research on the Modeling Inference of Web Survey Samples in the Context of Big Data: Taking Propensity Score Inference of Generalized Boosted Model as an Example

Liu Zhan & Pan Yingli   

  • Online: 2019-09-25 Published: 2019-09-25

Abstract: With the growth of big data and the internet, web surveys are becoming increasingly widespread. However, most web survey samples are non-probability samples, so the traditional theory of probability-sampling inference is difficult to apply to them. Solving the inference problem for web survey samples is therefore an urgent need for the development of web surveys in the context of big data. This paper proposes, for the first time, a basic framework for addressing the problem from a modeling perspective. First, model the inclusion probabilities: propensity score models based on machine learning and variable selection can be built to estimate inclusion probabilities, which are then used to make inferences about the population. Second, model the target variable: parametric, non-parametric, or semi-parametric superpopulation models can be fitted directly to the target variable for estimation. Third, model both the inclusion probabilities and the target variable: propensity score models and superpopulation models can be combined through weighted estimation and hybrid inference. Finally, inclusion-probability modeling based on the generalized boosted model is used as an example to demonstrate a concrete solution.

Key words: Big Data, Web Survey Samples, Inclusion Probability, Target Variables, Modeling Inference
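
The three modeling routes named in the abstract can be illustrated with a minimal Python sketch. This is a generic textbook-style illustration, not the authors' estimators, which the abstract does not spell out. It assumes the inclusion probabilities have already been estimated (for example by a generalized boosted model), uses a simple group-mean outcome model as a stand-in for a superpopulation model, and runs on made-up toy data. All function and variable names are hypothetical.

```python
from statistics import mean


def ipw_mean(y, propensity):
    """Route 1: weight each respondent by 1 / estimated inclusion probability
    (a Hajek-type weighted mean)."""
    weights = [1.0 / p for p in propensity]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


def fit_group_means(x, y):
    """Toy outcome model (assumption): predict y by the sample mean of its x group."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return {g: mean(vals) for g, vals in groups.items()}


def model_based_mean(population_x, group_means):
    """Route 2: average the outcome model's predictions over the whole population."""
    return mean(group_means[xi] for xi in population_x)


def doubly_robust_mean(y, x, propensity, population_x, group_means):
    """Route 3: model prediction plus a propensity-weighted residual correction."""
    weights = [1.0 / p for p in propensity]
    correction = sum(
        w * (yi - group_means[xi]) for w, yi, xi in zip(weights, y, x)
    ) / sum(weights)
    return model_based_mean(population_x, group_means) + correction


# Hypothetical web sample: group 0 is over-represented relative to the population.
sample_x = [0, 0, 0, 1]
sample_y = [4.0, 6.0, 5.0, 9.0]
# Assumed to come from a fitted propensity model; hard-coded here for illustration.
sample_propensity = [0.5, 0.5, 0.5, 0.25]
# Hypothetical population covariates: 6 units in group 0, 4 units in group 1.
population_x = [0] * 6 + [1] * 4

group_means = fit_group_means(sample_x, sample_y)
naive = mean(sample_y)
ipw = ipw_mean(sample_y, sample_propensity)
model_based = model_based_mean(population_x, group_means)
doubly_robust = doubly_robust_mean(
    sample_y, sample_x, sample_propensity, population_x, group_means
)

print("naive sample mean:", naive)
print("inverse-propensity weighted:", ipw)
print("model-based prediction:", model_based)
print("doubly robust:", doubly_robust)
```

In this toy data the unweighted sample mean is pulled toward the over-represented group, while the three model-assisted estimates agree with one another because the assumed propensities and the group-mean model are mutually consistent. With real web survey data they would generally differ, and the quality of each depends on how well the propensity model or the outcome model is specified.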