2014-11-20: PageRank Study Notes
Stanford CS224W Problem Set 0
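Yesterday I worked on the second problem: given a dataset (stackoverflow-Java.txt), use the relevant tools to find
- the number of weakly connected components
- the number of nodes and edges in the largest weakly connected component
- the IDs and scores of the top three nodes ranked by PageRank
- the top three hubs and the top three authorities ranked by HITS
The problem is fairly simple and mainly tests familiarity with the tool; the details are explained in the comments of the code below.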
##################################################################
# Project: Finding experts on the java programming
#          language on Stack-Overflow
# Tool: SNAP
# Version: 1.0
# Date: 2014-11-19
# Author: Chuanting.Zhang
# Email: chuanting.zhang@gmail.com
# Ps: Written by Burning at SDU
##################################################################
import snap
from snap import PNGraph
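# Load the data: the file has N rows and two columns (source, destination)
# Note: LoadEdgeListStr reads the IDs as strings and assigns its own integer
# node IDs, so the IDs printed below may differ from those in the file;
# snap.LoadEdgeList keeps integer IDs unchanged.
Graph = snap.LoadEdgeListStr(PNGraph, 'stackoverflow-Java.txt', 0, 1)
print('Number of nodes in the network: %d' % Graph.GetNodes())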
# When working with graph components, first create a container to hold them
Components = snap.TCnComV()
snap.GetWccs(Graph, Components)
print('Number of weakly connected components in the network: %d' % Components.Len())
# Largest weakly connected component
mxWcc = snap.GetMxWcc(Graph)  # returns a graph
print('Number of edges and nodes in the largest weakly connected component: %d %d' % (mxWcc.GetEdges(), mxWcc.GetNodes()))
# PageRank algorithm
# GetPageRank produces node IDs with their scores, so store them in a hash table
pRankH = snap.TIntFltH()
snap.GetPageRank(Graph, pRankH)
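# Sort in descending order of score (the first three entries answer the question)
slist = sorted(pRankH, key=lambda key: pRankH[key], reverse=True)
for item in slist[:10]:
    print('id: %7s, pagerank: %.6f' % (item, pRankH[item]))
# Hash tables for the HITS hub and authority scores
NIdHubH = snap.TIntFltH()
NIdAuthH = snap.TIntFltH()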
# HITS algorithm
snap.GetHits(Graph, NIdHubH, NIdAuthH)
slistHub = sorted(NIdHubH, key=lambda key: NIdHubH[key], reverse=True)
for item in slistHub[:3]:
    print('id: %7s, hub: %.6f' % (item, NIdHubH[item]))
slistAuth = sorted(NIdAuthH, key=lambda key: NIdAuthH[key], reverse=True)
for item in slistAuth[:3]:
    print('id: %7s, authority: %.6f' % (item, NIdAuthH[item]))
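To get a feel for what a call like snap.GetPageRank computes, here is a minimal pure-Python sketch of PageRank by power iteration. It is not SNAP's implementation: the function name pagerank, the toy edge list, the damping factor of 0.85, the tolerance, and the handling of nodes without out-links are all assumptions made for illustration, so the scores may differ slightly from what SNAP reports.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank on a directed edge list [(src, dst), ...]."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread evenly over all nodes
        # (an assumption of this sketch).
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every node passes an equal share of its rank along each out-link.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:  # stop once the scores have settled
            break
    return rank


# Hypothetical toy graph: a cycle 1 -> 2 -> 3 -> 1 plus an extra edge 4 -> 3.
toy_edges = [(1, 2), (2, 3), (3, 1), (4, 3)]
scores = pagerank(toy_edges)
for node in sorted(scores, key=scores.get, reverse=True)[:3]:
    print('id: %7s, pagerank: %.6f' % (node, scores[node]))
```

In the toy graph, node 3 has two incoming edges and ends up ranked first, while node 4 has none and only receives the small "teleport" share.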
Although the problem is simple, the algorithms involved still need to be properly understood.
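As one step toward understanding them, the following is a rough pure-Python sketch of the HITS iteration: a node's authority score is the sum of the hub scores of the nodes pointing at it, and its hub score is the sum of the authority scores of the nodes it points to, with both vectors normalized after every update. The names hits and normalize, the toy edge list, and the fixed iteration count are assumptions for illustration and are not taken from SNAP.

```python
import math


def normalize(scores):
    """Scale a score dict to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(edges, max_iter=20):
    """Iterative HITS on a directed edge list [(src, dst), ...]."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Authority update: add up the hub scores of the nodes linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        auth = normalize(new_auth)
        # Hub update: add up the authority scores of the nodes linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical toy graph: nodes 1 and 2 link to both 3 and 4; node 5 links to 4 only.
toy_edges = [(1, 3), (2, 3), (1, 4), (2, 4), (5, 4)]
hub, auth = hits(toy_edges)
for node in sorted(hub, key=hub.get, reverse=True)[:3]:
    print('id: %7s, hub: %.6f' % (node, hub[node]))
for node in sorted(auth, key=auth.get, reverse=True)[:3]:
    print('id: %7s, authority: %.6f' % (node, auth[node]))
```

In this toy graph node 4 comes out as the strongest authority because three nodes point at it, and nodes 1 and 2 tie as the strongest hubs because each points at both authorities.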