2014-11-20: PageRank study notes

Stanford CS224W Problem Set 0

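Yesterday I worked on Question 2. Given a dataset (stackoverflow-Java.txt), use the appropriate tools to find:

  • The number of weakly connected components
  • The number of nodes and edges in the largest weakly connected component
  • The IDs and scores of the three highest-ranked nodes under PageRank
  • The top three hubs and the top three authorities under HITS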
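
To make the first two items concrete, here is a minimal pure-Python sketch of what a weakly connected component is: a set of nodes that are mutually reachable once edge directions are ignored. The function name and the toy edge list are hypothetical and not taken from the problem set; the real solution uses SNAP instead.

```python
from collections import defaultdict, deque


def weakly_connected_components(edges):
    """Return a list of node sets, one per weakly connected component."""
    # Ignore direction: record every edge in both directions.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# Hypothetical toy edge list (not from stackoverflow-Java.txt).
toy_edges = [(1, 2), (3, 2), (4, 5)]
wccs = weakly_connected_components(toy_edges)
largest = max(wccs, key=len)
# Edges of the largest component: both endpoints lie inside it.
largest_edges = [(s, d) for s, d in toy_edges if s in largest and d in largest]
print(len(wccs), len(largest), len(largest_edges))
```

Nodes 1 and 3 are not connected by a directed path, but both touch node 2, so they end up in the same weakly connected component.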

The problem is fairly simple and mainly tests how well you know the tool. The details are explained in the comments of the code below.

##################################################################
# Project: Finding experts on the java programming
#           language on Stack-Overflow 
# Tool: SNAP 
# Version:1.0
# Date: 2014-11-19
# Author: Chuanting.Zhang
# Email: chuanting.zhang@gmail.com 
# Ps: Written by Burning at SDU
###################################################################
import snap

# Load the data; the file contains N rows and two columns (source, destination)
Graph = snap.LoadEdgeListStr(snap.PNGraph, 'stackoverflow-Java.txt', 0, 1)
print('Number of nodes in the network: %d' % Graph.GetNodes())

# For connected components, first declare a container to hold them
Components = snap.TCnComV()
snap.GetWccs(Graph, Components)
print('Number of weakly connected components in the network: %d' % Components.Len())

# Largest weakly connected component; GetMxWcc returns a graph
mxWcc = snap.GetMxWcc(Graph)
print('Number of edges and nodes in the largest weakly connected component: %d %d'
      % (mxWcc.GetEdges(), mxWcc.GetNodes()))

# PageRank
# GetPageRank produces node IDs with their scores, so store them in a hash table
pRankH = snap.TIntFltH()
snap.GetPageRank(Graph, pRankH)
# Sort node IDs by score in descending order
slist = sorted(pRankH, key=lambda key: pRankH[key], reverse=True)
# Print the ten highest-ranked nodes (the task only asks for the top three)
for item in slist[:10]:
    print('id: %7s, pagerank: %.6f' % (item, pRankH[item]))

# HITS
NIdHubH = snap.TIntFltH()
NIdAuthH = snap.TIntFltH()
snap.GetHits(Graph, NIdHubH, NIdAuthH)
# Top three hubs
slistHub = sorted(NIdHubH, key=lambda key: NIdHubH[key], reverse=True)
for item in slistHub[:3]:
    print('id: %7s, hub: %.6f' % (item, NIdHubH[item]))
# Top three authorities
slistAuth = sorted(NIdAuthH, key=lambda key: NIdAuthH[key], reverse=True)
for item in slistAuth[:3]:
    print('id: %7s, authority: %.6f' % (item, NIdAuthH[item]))
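
The HITS step at the end of the script is a single library call, so the idea behind it is easy to miss. The following is a minimal textbook sketch in plain Python, not SNAP's implementation: a node's authority score is the sum of the hub scores of the nodes linking to it, a node's hub score is the sum of the authority scores of the nodes it links to, and both vectors are rescaled to unit length after every update. The function name, the fixed iteration count, and the toy edge list are assumptions made for illustration.

```python
import math


def hits(edges, iterations=50):
    """Textbook HITS: return (hub, authority) score dicts for a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: add up the hub scores of all in-neighbors.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}

        # Hub update: add up the authority scores of all out-neighbors.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical toy graph: nodes 1 and 2 point at nodes 3 and 4.
hits_edges = [(1, 3), (1, 4), (2, 3)]
hub_scores, auth_scores = hits(hits_edges)
top_hub = max(hub_scores, key=hub_scores.get)
top_auth = max(auth_scores, key=auth_scores.get)
print('top hub: %s, top authority: %s' % (top_hub, top_auth))
```

In this toy graph node 1 links to both targets and becomes the strongest hub, while node 3 is linked from both sources and becomes the strongest authority.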

Although the problem is simple, the algorithms behind it still need to be properly understood.
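
As a starting point for understanding PageRank, here is a minimal power-iteration sketch in plain Python. It is a textbook version under stated assumptions, not what SNAP does internally: the damping factor 0.85 is a common choice, rank held by nodes without out-links is spread uniformly over all nodes, and the function name, tolerance, and toy graph are hypothetical.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=100):
    """Textbook power-iteration PageRank on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank sitting on dangling nodes (no out-links) is shared by everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every node passes its damped rank evenly along its out-links.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph (not from stackoverflow-Java.txt).
pr_edges = [('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'a'), ('d', 'c')]
scores = pagerank(pr_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
for node in ranking[:3]:
    print('id: %7s, pagerank: %.6f' % (node, scores[node]))
```

Node c collects links from three other nodes and ranks first, while node d has no incoming links and only receives the uniform baseline share.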

 

posted on 2014-12-08 13:17  湘江楚云