Statistical Research (统计研究)

Statistical Inference Problems of Non-probability Sampling under the Background of Big Data (大数据背景下非概率抽样的统计推断问题)

Jin Yongjin (金勇进), Liu Zhan (刘展)

  • Online: 2016-03-15 Published: 2016-03-21

Abstract:

When samples are drawn from big data, a sampling frame is often difficult to construct, so the resulting samples are non-probability samples, to which traditional sampling inference theory is hard to apply. Solving the statistical inference problem for non-probability sampling is therefore a serious challenge for survey sampling in the big data setting. This paper proposes a basic approach to the problem in three parts. First, sampling methods: sample selection based on sample matching, link-tracing sampling, and similar methods can make the non-probability sample approximate a probability sample, so that the statistical inference theory for probability samples can be used. Second, weight construction and adjustment: methods based on pseudo-design, models, and propensity scores can yield base weights similar to those of a probability sample. Third, estimation: hybrid probability estimation based on pseudo-design, models, and Bayesian methods can be considered. Finally, sample selection based on sample matching is taken as an example to discuss a concrete solution.

Key words: Big Data, Non-probability Sampling, Statistical Inference
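The abstract names sample selection based on sample matching as its worked example but gives no procedure. The sketch below illustrates one common form of the idea, not the authors' own method: draw a probability-style target sample from an auxiliary frame that holds covariates only, then replace each target unit with its nearest unused neighbour in the non-probability pool. The function names (`match_sample`, `run_demo`), the single covariate `age`, the outcome `y`, and all simulated data are assumptions made for illustration.

```python
import random


def match_sample(target_sample, pool, keys):
    """Nearest-neighbour matching without replacement.

    For each unit of the target sample, pick the closest still-unused unit
    of the non-probability pool, using squared Euclidean distance on `keys`.
    """
    available = list(range(len(pool)))
    matched = []
    for target in target_sample:
        if not available:
            raise ValueError("pool is smaller than the target sample")
        best = min(
            available,
            key=lambda i: sum((pool[i][k] - target[k]) ** 2 for k in keys),
        )
        available.remove(best)  # each pool unit is used at most once
        matched.append(pool[best])
    return matched


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def run_demo(seed=0):
    """Simulated comparison of a naive pool mean and a matched-sample mean."""
    rng = random.Random(seed)

    # Hypothetical finite population: the outcome y rises with age.
    population = []
    for _ in range(2000):
        age = rng.uniform(20, 80)
        population.append({"age": age, "y": age + rng.gauss(0, 5)})

    # Non-probability "big data" pool: younger units are more likely to
    # appear, so the pool over-represents low values of y.
    pool = [
        u for u in population
        if rng.random() < 0.2 + 0.7 * (80 - u["age"]) / 60
    ]

    # Target sample: a simple random sample from the frame, covariates only.
    target = [{"age": u["age"]} for u in rng.sample(population, 200)]

    matched = match_sample(target, pool, keys=["age"])
    return {
        "truth": mean(u["y"] for u in population),
        "naive": mean(u["y"] for u in pool),
        "matched": mean(u["y"] for u in matched),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.2f}")
```

In this simulated setting the matched sample inherits the covariate distribution of the target sample, so its mean of `y` should sit closer to the population mean than the naive pool mean does. The improvement depends on the matching covariates explaining the selection into the pool, which is an assumption of the sketch rather than a result from the paper.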