Statistical Research (统计研究) ›› 2012, Vol. 29 ›› Issue (6): 95-98.

An Improved SMOTE Algorithm for Re-Sampling Imbalanced Data Sets

Xue Wei (薛薇)

  • Online: 2012-06-15 Published: 2012-06-20

Abstract: Learning from imbalanced data sets is itself imbalanced, which typically shows up as poor classification performance on the negative class. This paper improves the SMOTE re-sampling algorithm by combining over-sampling and under-sampling, selecting nearest neighbors in a targeted way, and synthesizing samples under different strategies. Experiments show that, on imbalanced data sets processed by this algorithm, a classifier can achieve reasonably good classification performance on both the positive and the negative class.

Key words: SMOTE algorithm, re-sampling, imbalanced data set
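
The abstract does not spell out the improved algorithm, so the following is only a minimal sketch of the two building blocks it names: baseline SMOTE over-sampling (a synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbors) combined with plain random under-sampling of the majority class. It is not the paper's method. The function names `smote_oversample`, `random_undersample`, and `resample`, the `target` class size, and the default `k=5` are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, rng=None):
    """Baseline SMOTE: interpolate between a minority sample and a near neighbor."""
    rng = rng or random.Random()
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Brute-force k nearest minority-class neighbors of every minority sample.
    neighbors = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbors.append(others[:k])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(n)          # random base sample
        j = rng.choice(neighbors[i])  # one of its k nearest neighbors
        gap = rng.random()            # position along the connecting segment
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of the majority class, without replacement."""
    rng = rng or random.Random()
    return rng.sample(majority, n_keep)


def resample(minority, majority, target, k=5, seed=0):
    """Bring both classes to `target` samples (hypothetical combined scheme)."""
    rng = random.Random(seed)
    minority = list(minority)
    n_new = max(target - len(minority), 0)
    if n_new:
        minority = minority + smote_oversample(minority, n_new, k, rng)
    if len(majority) > target:
        majority = random_undersample(majority, target, rng)
    return minority, list(majority)
```

Calling `resample(minority, majority, target=40)` on 10 minority and 100 majority points would return 40 points per class, with the original minority samples kept first and the synthetic ones appended. Because every synthetic point lies on a segment between two existing minority samples, it never leaves the bounding box of the minority class.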