[Reading Notes] 2010 ICISTM Can You Judge a Man by His Friends? - Enhancing Spammer Detection on the Twitter Microblogging Platform Using Friends and Followers

Method

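This paper detects spammers on a microblogging platform by combining machine learning with trust propagation over the social network. The method has two steps. In the first step, a base learner (a classifier) is built from each user's basic attributes (features tied to that user alone), and this base learner, together with a manually labeled training set, is used to predict the class of the remaining users (spammer or non-spammer). In the second step, the results of the first step are combined with a trust propagation model over the user's social network to produce an extended attribute set for each user; a second learner is then built on these extended attributes and, again with the training samples, used to predict the class of the other users. The overall process is shown in the figure below.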
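The two steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the notes do not say which classifier or trust model the paper uses, so a nearest-centroid classifier stands in for the base learner and a "share of followers labeled spam" value stands in for the propagated attribute. The function names, the user IDs, and the toy feature values are all hypothetical.

```python
from statistics import mean

def train_centroids(rows, labels):
    """Stand-in base learner: one mean feature vector per class."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, l in zip(rows, labels) if l == cls]
        centroids[cls] = [mean(col) for col in zip(*members)]
    return centroids

def predict(centroids, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda cls: dist(centroids[cls]))

def extend(user, basic, followers, stage1):
    """Assumed extended attribute: share of the user's followers labeled spam in step 1."""
    fs = followers.get(user, [])
    spam_share = sum(stage1[f] == "spam" for f in fs) / len(fs) if fs else 0.0
    return basic[user] + [spam_share]

# Toy basic attributes: [follower-friend ratio, updates per day] (made-up values).
basic = {
    "a": [0.1, 9.0], "b": [0.2, 8.0],    # labeled spam
    "c": [1.5, 1.0], "d": [1.2, 2.0],    # labeled trusted
    "e": [0.15, 8.5], "f": [1.4, 1.5],   # unlabeled
}
train_labels = {"a": "spam", "b": "spam", "c": "trusted", "d": "trusted"}
followers = {"a": ["b", "e"], "b": ["a"], "c": ["d", "f"],
             "d": ["c"], "e": ["a", "b"], "f": ["c", "d"]}
train_users = list(train_labels)
y = [train_labels[u] for u in train_users]

# Step 1: base learner on basic attributes, applied to every user.
base = train_centroids([basic[u] for u in train_users], y)
stage1 = {u: predict(base, basic[u]) for u in basic}

# Step 2: second learner on basic + network-derived attributes.
extended = {u: extend(u, basic, followers, stage1) for u in basic}
second = train_centroids([extended[u] for u in train_users], y)
final = {u: predict(second, extended[u]) for u in basic if u not in train_labels}
```

The point of the sketch is the data flow: the second learner sees the output of the first one only through attributes computed over each user's neighborhood.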

Dataset

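Collecting spammer accounts: taken from the website twitspam.org (a site where users can submit Twitter accounts they suspect of spamming);
Collecting trusted accounts: the authors used the accounts that they themselves follow as trusted users.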

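In the end, the authors obtained 77 spammer accounts and 155 trusted accounts. In addition, information about each user's followers was collected (up to a limit of 200).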

Attribute sets

Basic attribute set
- follower-friend ratio
- number of posts marked as favorites
- friends added per day
- followers added per day
- account is protected?
- updates per day
- has url?
- number of digits in account name
- reciprocity
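Several of the basic attributes listed above can be derived from an account's name, friend list, follower list, and age. The sketch below is only an illustration: the notes do not give the paper's exact definitions, so the per-day rates are assumed to be totals divided by account age, and reciprocity is assumed to be the fraction of friends who follow back. The function name and the sample account are hypothetical.

```python
import re

def basic_features(name, friends, followers, updates, age_days):
    """Compute a subset of the basic attributes under the assumptions stated above."""
    friends, followers = set(friends), set(followers)
    days = max(age_days, 1)            # avoid division by zero for new accounts
    n_friends = max(len(friends), 1)   # avoid division by zero for accounts with no friends
    return {
        "follower_friend_ratio": len(followers) / n_friends,
        "friends_per_day": len(friends) / days,
        "followers_per_day": len(followers) / days,
        "updates_per_day": updates / days,
        "digits_in_name": len(re.findall(r"\d", name)),
        # Assumed definition: share of friends who also follow this account.
        "reciprocity": len(friends & followers) / n_friends,
    }

feats = basic_features(
    "deal4u2day",
    friends=["a", "b", "c", "d"],
    followers=["a", "x"],
    updates=300,
    age_days=10,
)
```

An account that follows many users but is rarely followed back gets a low ratio and low reciprocity, which is the kind of signal these attributes are meant to expose.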
Extended attributes 1: attributes related to friends and followers
- follower-friend ratio
- updates per day
- friends added per day
- followers added per day
- reciprocity
- account is protected?
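One plausible way to turn the list above into per-user attributes is to aggregate each attribute over a user's friends or followers. The notes do not state how the paper aggregates, so the averaging below is an assumption, and the function name, the `avg_` prefix, and the sample data are hypothetical.

```python
from statistics import mean

def neighbour_averages(user, neighbours, features, keys):
    """Average each attribute in `keys` over the user's neighbours that have known features."""
    group = [features[n] for n in neighbours.get(user, []) if n in features]
    return {f"avg_{k}": (mean(f[k] for f in group) if group else 0.0) for k in keys}

features = {
    "u1": {"updates_per_day": 2.0, "reciprocity": 0.5},
    "u2": {"updates_per_day": 4.0, "reciprocity": 1.0},
}
# "unknown" has no collected features and is skipped.
followers = {"target": ["u1", "u2", "unknown"]}

ext = neighbour_averages("target", followers, features,
                         ["updates_per_day", "reciprocity"])
```

The same function could be called once with the friend lists and once with the follower lists to obtain two groups of averaged attributes.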
Extended attributes 2: trust propagation