### Research on the Modeling Inference of Web Survey Samples in the Context of Big Data: Taking Propensity Score Inference with the Generalized Boosted Model as an Example

Liu Zhan & Pan Yingli

• Online: 2019-09-25 Published: 2019-09-25

Abstract: With the growth of big data and the internet, web surveys are becoming increasingly widespread. However, most web survey samples are non-probability samples, so the traditional inference theory of probability sampling is difficult to apply to them. How to make inferences from web survey samples is therefore an urgent problem for the development of web surveys in the context of big data. This study proposes, for the first time, some basic ideas for solving the problem from a modeling perspective. First, inclusion probabilities can be modeled: propensity score models based on machine learning and variable selection are constructed to estimate them. Second, the target variables can be modeled: parametric, non-parametric, or semi-parametric superpopulation models of the target variables are established to estimate population quantities. Third, inclusion probabilities and target variables can be modeled together, through weighted estimation and hybrid inference that combine propensity score models with superpopulation models. Finally, inference based on inclusion probabilities estimated with a generalized boosted model is taken as an example to discuss concrete solutions to the modeling inference problem for web survey samples.
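The first idea in the abstract, estimating inclusion probabilities with a boosted propensity score model, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simulated population with a single covariate, a self-selected web sample, and a simple random reference sample whose design weights are known. In place of dedicated generalized boosted model software it uses a small hand-written booster of depth-1 trees with Newton-style updates. The function `fit_boosted_propensity`, the simulation settings, and the odds-based pseudo-weight `(1 - e) / e` are all illustrative assumptions.

```python
import math
import random


def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))


def fit_boosted_propensity(x, z, w, n_rounds=150, lr=0.1, min_leaf=20):
    """Boost depth-1 trees (stumps) on a weighted logistic loss.

    x: covariate values, z: 1 = web sample unit, 0 = reference sample unit,
    w: case weights. Returns a function x -> estimated P(z = 1 | x).
    Hypothetical helper, a stand-in for generalized boosted model software.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    pbar = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    f0 = math.log(pbar / (1.0 - pbar))  # start from the weighted log-odds
    F = [f0] * n
    stumps = []
    for _ in range(n_rounds):
        p = [sigmoid(f) for f in F]
        g = [wi * (zi - pi) for wi, zi, pi in zip(w, z, p)]  # negative gradient
        h = [wi * pi * (1.0 - pi) for wi, pi in zip(w, p)]   # curvature
        G, H = sum(g), sum(h)
        best = None
        gl = hl = 0.0
        for k in range(n - 1):  # scan split points in sorted order
            i, j = order[k], order[k + 1]
            gl += g[i]
            hl += h[i]
            if k + 1 < min_leaf or n - k - 1 < min_leaf or x[i] == x[j]:
                continue
            gr, hr = G - gl, H - hl
            gain = gl * gl / hl + gr * gr / hr
            if best is None or gain > best[0]:
                # leaf values are shrunken Newton steps
                best = (gain, 0.5 * (x[i] + x[j]), lr * gl / hl, lr * gr / hr)
        if best is None:
            break
        _, thr, left, right = best
        stumps.append((thr, left, right))
        F = [f + (left if xi <= thr else right) for f, xi in zip(F, x)]

    def predict(xi):
        return sigmoid(f0 + sum(l if xi <= t else r for t, l, r in stumps))

    return predict


# Simulated population (assumption): y depends on the covariate x.
random.seed(0)
N = 10000
pop_x = [random.random() for _ in range(N)]
pop_y = [2.0 + 3.0 * xi + random.gauss(0.0, 0.5) for xi in pop_x]
true_mean = sum(pop_y) / N

# Web sample: self-selection, units with small x are more likely to respond.
web = [i for i in range(N) if random.random() < 0.02 + 0.18 * (1.0 - pop_x[i])]
# Reference probability sample: simple random sample, design weight N / n_ref.
ref = random.sample(range(N), 500)

# Stack both samples; the weighted reference sample stands in for the population.
x = [pop_x[i] for i in web] + [pop_x[i] for i in ref]
z = [1] * len(web) + [0] * len(ref)
w = [1.0] * len(web) + [N / len(ref)] * len(ref)

predict = fit_boosted_propensity(x, z, w)
e_web = [predict(pop_x[i]) for i in web]
# The odds e / (1 - e) approximate the inclusion probability, so invert them.
pseudo_w = [(1.0 - e) / e for e in e_web]

naive_mean = sum(pop_y[i] for i in web) / len(web)
weighted_mean = sum(pw * pop_y[i] for pw, i in zip(pseudo_w, web)) / sum(pseudo_w)
print(f"true {true_mean:.3f}  naive {naive_mean:.3f}  weighted {weighted_mean:.3f}")
```

Because the reference units carry their design weights, the fitted odds of belonging to the web sample approximate the unknown inclusion probability, which is why the sketch weights each web respondent by the inverse odds. In this simulation the unweighted web mean is biased toward units with small `x`, and the pseudo-weighted mean should sit closer to the population mean.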