Statistical Research (统计研究), 2019, Vol. 36, Issue 7: 119-128. doi: 10.19343/j.cnki.11-1302/c.2019.07.010


网络社区发现算法在流动表建模中的设计与应用

孙旭等   

  • 出版日期:2019-07-25 发布日期:2019-07-29

Design and Application of Network Community Discovery Algorithm in Flow Table Modeling

Sun Xu et al.   

  • Online:2019-07-25 Published:2019-07-29

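Abstract: An intergenerational mobility table tabulates the joint frequencies of paired data on the social status of children and their parents, reflecting how advantages and disadvantages in the possession of social resources compare between the father's and the son's generation. Empirical examinations of the evolution of basic social characteristics such as wealth, class, and privilege all rely on the quantitative analysis of intergenerational mobility tables. The log-linear model is the basic tool for modeling mobility tables: by fitting the cell frequencies of the contingency table, it identifies strong and weak interaction effects between the row and column classifications and characterizes the interaction structure between the social statuses of father and son. This paper uses a complex-network community discovery algorithm to analyze the association structure between the social statuses of father and son. To address the insufficient fitting accuracy of reduced log-linear models, it proposes a new modeling approach: a community discovery algorithm mines the association relationships in the residual contingency table of the reduced log-linear model, and the discovered community effect is introduced into the original log-linear model as an additional parameter constraint to improve the fit to the data. Because the method adds only one parameter constraint to the original reduced log-linear model, the modeling result remains concise and theoretically meaningful, while the community effect supplements the original model's reading of the structure of the empirical data. The paper applies the method to an empirical intergenerational occupational mobility table drawn from Chinese General Social Survey data and explains the association pattern between the occupational classes of the child and parent generations fairly well.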

Keywords: community discovery algorithm, intergenerational mobility table, log-linear model, matrix spectral analysis

Abstract: An intergenerational mobility (flow) table records the joint frequencies of paired data on the social status of sons and their parents, and so reflects how advantages and disadvantages in the possession of social resources compare across the two generations. Empirical studies of how basic social characteristics such as wealth, class, and privilege evolve all rely on quantitative analysis of such tables. The log-linear model is the basic tool for modeling mobility tables: by fitting the cell frequencies of the contingency table, it identifies strong and weak interaction effects between the row and column classifications and describes the interaction structure between the social statuses of parent and son. In modeling empirical data, however, the reduced log-linear model often fails the goodness-of-fit test. Existing variable selection methods for linear equations can improve the fit, but their results are hard to reconcile with social mobility theory and offer no clear guidance for inducing and abstracting social mobility patterns. In a social mobility table, the social statuses of sons and parents form a social network. This paper applies a community discovery algorithm from complex network analysis to uncover the association structure between the social statuses of parent and son. To address the insufficient fitting accuracy of the reduced log-linear model, a new modeling approach is proposed: the community discovery algorithm is used to mine associations in the residual contingency table of the reduced log-linear model, and the discovered community effect is introduced into the original log-linear model as an additional parameter constraint to improve the fit to the data. Because the method adds only one parameter constraint to the original reduced log-linear model, the modeling result remains parsimonious and theoretically meaningful, while the community effect supplements the original model's interpretation of the structure of the empirical data. We apply the method to an empirical intergenerational occupational mobility table derived from Chinese General Social Survey data, and it explains the association pattern between the occupational classes of sons and parents fairly well.

Key words: Community Discovery Algorithm, Intergenerational Flow Table, Loglinear Model, Matrix Spectrum Analysis
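
The modeling idea described in the abstract (fit a reduced log-linear model, search its residual table for communities, then add the community effect as one extra parameter) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several assumptions that do not come from the paper: the reduced model is plain independence, the community search is a two-way split by the sign of the leading eigenvector of the symmetrized residual matrix, the extended model is fitted by iterative proportional fitting, and the 4x4 counts are hypothetical toy data rather than survey data. All function names are illustrative.

```python
import math

def independence_fit(table):
    """Expected counts under the independence log-linear model (assumed baseline)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[ri * cj / n for cj in cols] for ri in rows]

def deviance(obs, fit):
    """Likelihood-ratio statistic G^2 = 2 * sum(o * ln(o / fitted))."""
    return 2.0 * sum(o * math.log(o / f)
                     for row_o, row_f in zip(obs, fit)
                     for o, f in zip(row_o, row_f) if o > 0)

def spectral_split(obs, fit, iters=500):
    """Split the status categories into two groups by the sign of the leading
    eigenvector of B = (D + D^T) / 2, where D = observed - fitted.
    Assumes a square table whose rows and columns share the same categories."""
    k = len(obs)
    d = [[obs[i][j] - fit[i][j] for j in range(k)] for i in range(k)]
    b = [[(d[i][j] + d[j][i]) / 2.0 for j in range(k)] for i in range(k)]
    # Shifting by a Gershgorin bound makes every eigenvalue non-negative, so
    # power iteration converges to the largest algebraic eigenvalue of B.
    shift = max(sum(abs(x) for x in row) for row in b)
    if shift == 0:  # no residual structure to split
        return [0] * k
    v = [math.sin(i + 1.0) for i in range(k)]  # arbitrary non-constant start
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(k)) + shift * v[i] for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [0 if x >= 0 else 1 for x in v]

def fit_with_community(obs, labels, iters=500):
    """Fit log m_ij = mu + a_i + b_j + g * [i and j in the same community]
    by iterative proportional fitting over rows, columns and the
    within/between-community totals."""
    k = len(obs)
    fit = [[1.0] * k for _ in range(k)]
    for _ in range(iters):
        for i in range(k):  # match row totals
            s = sum(obs[i]) / sum(fit[i])
            fit[i] = [x * s for x in fit[i]]
        for j in range(k):  # match column totals
            s = sum(obs[i][j] for i in range(k)) / sum(fit[i][j] for i in range(k))
            for i in range(k):
                fit[i][j] *= s
        for flag in (True, False):  # match within- and between-community totals
            cells = [(i, j) for i in range(k) for j in range(k)
                     if (labels[i] == labels[j]) == flag]
            if not cells:
                continue
            s = sum(obs[i][j] for i, j in cells) / sum(fit[i][j] for i, j in cells)
            for i, j in cells:
                fit[i][j] *= s
    return fit

# Hypothetical toy mobility table (rows: parent's class, columns: son's class).
obs = [[50, 30, 10, 10],
       [30, 50, 10, 10],
       [10, 10, 50, 30],
       [10, 10, 30, 50]]

baseline = independence_fit(obs)
labels = spectral_split(obs, baseline)
fitted = fit_with_community(obs, labels)
g2_before = deviance(obs, baseline)
g2_after = deviance(obs, fitted)
print(labels, round(g2_before, 1), round(g2_after, 1))
```

In this sketch the extended model uses exactly one more parameter than the baseline, so the drop from `g2_before` to `g2_after` is bought with a single degree of freedom. That mirrors the abstract's point that the community effect improves the fit while keeping the model parsimonious.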