June 2013 Archive

Summary: Single-site crawl and search test. 1. Create a urls folder, create a seed.txt file inside it, and enter the site to crawl in seed.txt, for example www.osu.edu: mkdir -p urls; cd urls; touch seed.txt. This creates a text file seed.txt under urls/ that holds one URL per line for each site you want Nutch to crawl. 2. Edit conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME with osu.edu. The original line is: ... Read more
posted @ 2013-06-27 19:18 free_thinker Views (228) Comments (0) Recommended (0)
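
The two steps in the summary above can be sketched roughly as follows. This is only an illustration under stated assumptions: it runs in a scratch directory rather than a real Nutch install, the http:// prefix on the seed URL is an assumption (the post writes only www.osu.edu), and the filter line written into conf/crawl-urlfilter.txt is a stand-in, since the summary is cut off before showing the real file contents.

```shell
# Work in a scratch directory so a real Nutch install is left untouched (assumption).
workdir="$(mktemp -d)"
cd "$workdir"

# Step 1: create urls/seed.txt with one URL per line.
mkdir -p urls
# The scheme prefix is an assumption; the post lists the site as www.osu.edu.
printf '%s\n' 'http://www.osu.edu/' > urls/seed.txt

# Step 2: stand-in for conf/crawl-urlfilter.txt (real contents not shown in the summary).
mkdir -p conf
printf '%s\n' '+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/' > conf/crawl-urlfilter.txt

# Replace the placeholder domain with osu.edu, writing via a temporary file.
sed 's/MY\.DOMAIN\.NAME/osu.edu/' conf/crawl-urlfilter.txt > conf/crawl-urlfilter.txt.new
mv conf/crawl-urlfilter.txt.new conf/crawl-urlfilter.txt

# Show the results for inspection.
cat urls/seed.txt
cat conf/crawl-urlfilter.txt
```

Writing the sed output to a temporary file and then renaming it avoids relying on the in-place flag, whose syntax differs between sed implementations.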
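Summary: Installing and configuring the JDK. First, download the JDK from the official website. I saved it to /home/gsli/Downloads. Run sudo ./jdk-6u21-linux-i586.bin and it installs automatically. Once installation finishes, open /etc/profile and add the following at the end of the file: gsli@ubuntu:~/Downloads/jdk1.6.0_21$ sudo vi /etc/profile #set Java Environment export JAVA_HOME=/home/gsli/Downloads/jdk1.6.0_21 export CLASSPATH=".:$JAVA_HOME/lib:$CLA... Read more
posted @ 2013-06-25 16:35 free_thinker Views (199) Comments (0) Recommended (0)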
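
A quick way to check the JAVA_HOME setting from the summary above, without editing /etc/profile, might look like the sketch below. The scratch file is a hypothetical stand-in for /etc/profile, and only the JAVA_HOME line is reproduced because the CLASSPATH line is truncated in the summary. The path is the one from the post and will probably not exist on another machine, so the script only reports what it finds.

```shell
# Hypothetical scratch file standing in for /etc/profile (assumption).
profile_snippet="$(mktemp)"

# Only the JAVA_HOME line from the post is used; the CLASSPATH line is truncated there.
cat > "$profile_snippet" <<'EOF'
# set Java Environment
export JAVA_HOME=/home/gsli/Downloads/jdk1.6.0_21
EOF

# Load the settings into the current shell, as a login shell would do with /etc/profile.
. "$profile_snippet"
echo "JAVA_HOME=$JAVA_HOME"

# Report whether a java binary exists at that location.
if [ -x "$JAVA_HOME/bin/java" ]; then
  "$JAVA_HOME/bin/java" -version
else
  echo "no JDK found at $JAVA_HOME"
fi
```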