Statistical Research (统计研究) ›› 2005, Vol. 22 ›› Issue (2): 71-4.

• Article •

A Study on the Stability of Spam Identifiers Built with Continuous-Attribute Decision Trees

王星;谢邦昌   

  1. Renmin University of China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2005-02-15 Published: 2005-02-15

Abstract: Filtering spam is one of the most critical problems in Internet technology, and finding the most important attribute, or combination of attributes, that distinguishes normal email from spam is the bottleneck in spam discrimination. In recent years, decision trees have become popular because they are easy to interpret and can output explicit rules, and they have become a core technique for predicting spam. However, many well-known decision tree algorithms, such as C4.5 and CART, are not very robust: their output is unstable, which disturbs the construction of the classifier. In this paper, we study the robustness of the CART algorithm, point out the robustness problem that arises when a decision tree classifier is used to separate spam from normal email using interval (continuous) attributes, and then apply the bagging algorithm to obtain a more robust model while also improving on the performance of the initial models.
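
The bagging idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are a made-up toy set with a single continuous attribute, a one-split tree chosen by Gini impurity stands in for a full CART tree, and every function name and parameter below (`fit_stump`, `fit_bagging`, `n_models`, `seed`) is hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(xs, ys):
    """Fit a one-split tree on one continuous attribute (CART-style Gini split).

    Returns (threshold, left_label, right_label); x <= threshold goes left.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best = (score, threshold, majority(left), majority(right))
    if best is None:  # all x values equal: predict the majority class everywhere
        label = majority(ys)
        return (float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def fit_bagging(xs, ys, n_models=25, seed=0):
    """Fit one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Majority vote over the bootstrap models."""
    return majority([predict_stump(m, x) for m in models])


# Toy data (assumed): label 1 = spam, with two noisy labels near x = 0.5.
xs = [i / 20 for i in range(20)]
ys = [0] * 9 + [1, 0] + [1] * 9

single = fit_stump(xs, ys)
models = fit_bagging(xs, ys)

# The split point of a single tree depends on which rows it was trained on;
# printing the distinct thresholds shows how much it moves across resamples.
print("single-tree threshold:", single[0])
print("bootstrap thresholds:", sorted({m[0] for m in models}))
print("bagged prediction at x=0.95:", predict_bagging(models, 0.95))
```

Voting over many resampled trees means that no single split point decides the final label, which is the sense in which bagging is expected to give a more stable classifier than one tree.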