Summary: Single-site crawling and search test
1. Create a urls folder, then create a file named seed.txt inside it. In seed.txt, enter the site to crawl, for example: www.osu.edu
mkdir -p urls
cd urls
touch seed.txt
This creates a text file seed.txt under urls/. Put one URL per line in it, one for each site you want Nutch to crawl.
2. Edit conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME with osu.edu. The original line was: …
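The two steps above can be combined into one small shell script. This is a minimal sketch, not the post's own procedure, and it rests on several assumptions:
- You run it from the Nutch installation directory, where conf/crawl-urlfilter.txt exists as in older Nutch 1.x releases.
- The seed is written as a full URL, http://www.osu.edu/, because the injector is assumed to need a scheme.
- The variable names SITE and DOMAIN are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: prepare a single-site Nutch crawl (seed list + URL filter).
# Assumption: run from the Nutch install dir that contains conf/crawl-urlfilter.txt.
set -eu

SITE="www.osu.edu"   # hypothetical variable: host to crawl (example from the post)
DOMAIN="osu.edu"     # hypothetical variable: domain that replaces MY.DOMAIN.NAME

# Step 1: seed list, one URL per line.
# Assumption: the seed is written with an explicit http:// scheme.
mkdir -p urls
printf 'http://%s/\n' "$SITE" > urls/seed.txt

# Step 2: replace the MY.DOMAIN.NAME placeholder in the URL filter.
# -i.bak edits in place and keeps the original as crawl-urlfilter.txt.bak
# (this form works with both GNU and BSD sed).
if [ -f conf/crawl-urlfilter.txt ]; then
  sed -i.bak "s/MY\.DOMAIN\.NAME/${DOMAIN}/g" conf/crawl-urlfilter.txt
else
  echo "conf/crawl-urlfilter.txt not found; skipping step 2" >&2
fi
```

Run it once with sh. Because sed keeps the untouched filter as conf/crawl-urlfilter.txt.bak, you can diff the two files to confirm that the MY.DOMAIN.NAME placeholder was replaced.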
posted @ 2013-06-27 19:18 free_thinker