Notes on setting up Solr, a site search engine based on Apache Lucene



   [ 预备警员.10078 @ 2009-03-23 17:15:30 ]


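For work, I spent about a week, on and off, testing how to set up and configure Solr. This enterprise-grade site-wide search tool is a useful complement to dedicated search engines. Tools like this probably exist because even the best search engine struggles to index all of a site's valuable content promptly, effectively, and completely, and to organize and present it to callers according to a given set of rules.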

1. A first try of Lucene. Lucene can be downloaded from the Apache site: http://lucene.apache.org/

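After downloading a recent package and unpacking it, the bundled example makes it easy to get going. In particular, it covers Lucene's index and search services: with the two objects Indexer and Searcher you can build an index and run queries from the command line. The other interfaces are fairly rich as well. Since the rest of these notes focuses on configuring Solr, which is built on Lucene, I will keep the configuration of the underlying Lucene brief.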

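Lucene has fairly rich documentation that can be browsed online. Its contributors hold several patents in the search field and are experts in this area, so I trust that the underlying design is sound.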

2. Installing and configuring Solr

2.1 The existing platform environment
openSuSE Linux 10 on a DELL PE 2950, running an Apache+Resin+MySQL application stack.

Bringing Solr into the existing platform mainly involved changes in the following places:

2.2.1
Download the installation package from http://www.apache.org/dyn/closer.cgi/lucene/solr/
into a directory called /opt/src/ (if it does not exist, create it first with mkdir -p /opt/src).

shell> cd /opt/src
shell> wget "http://apache.mirror.phpchina.com/lucene/solr/1.3.0/apache-solr-1.3.0.tgz"
shell> tar xzvf apache-solr-1.3.0.tgz
shell> cd /opt/src/apache-solr-1.3.0

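This unpacks the archive and leaves it ready for use. It includes a Jetty web server, which works well with Solr, so you can start right away. Below is the Getting Started content copied from the Apache Solr wiki site, pasted here for reference (http://lucene.apache.org/solr/tutorial.html#Getting+Started):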

Overview
This document covers the basics of running Solr using an example schema, and some sample data.

Requirements
To follow along with this tutorial, you will need...

1. Java 1.5 or greater. Some places you can get it are from Sun, IBM, or BEA.
   Running java -version at the command line should indicate a version number starting with 1.5.
2. A Solr release.
3. FireFox or Mozilla is the preferred browser to view the admin pages, as the current stylesheet doesn't look good on Internet Explorer.

Getting Started
Please run the browser showing this tutorial and the Solr server on the same machine so tutorial links will correctly point to your Solr server.

Begin by unzipping the Solr release and changing your working directory to be the "example" directory. (Note that the base directory name may vary with the version of Solr downloaded.)

chrish@asimov:~solr$ ls
solr-nightly.zip
chrish@asimov:~solr$ unzip -q solr-nightly.zip
chrish@asimov:~solr$ cd solr-nightly/example/
Solr can run in any Java Servlet Container of your choice, but to simplify this tutorial, the example index includes a small installation of Jetty. In order to compile JSPs, this version of Jetty requires that you run "java" from a JDK, not from a JRE.
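
As a note outside the quoted tutorial: the two Java requirements it states (version 1.5 or greater, and a JDK rather than a JRE) can be checked before launching. The following is only a minimal sketch and is not part of the Solr distribution. The function names java_version_of and version_at_least_1_5 are hypothetical, and treating the presence of javac on the PATH as a sign of a JDK is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Solr) for checking the Java prerequisites.

# Print the quoted version string found in the first line of `java -version` output.
java_version_of() {
  printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Succeed if a dotted version string such as 1.5.0_22 is 1.5 or greater.
version_at_least_1_5() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  # Reject anything whose leading component is not a plain integer.
  case $major in ''|*[!0-9]*) return 1 ;; esac
  [ "$major" -gt 1 ] && return 0
  case $minor in ''|*[!0-9]*) return 1 ;; esac
  [ "$major" -eq 1 ] && [ "$minor" -ge 5 ]
}

# Run the checks only when a java binary is available.
if command -v java >/dev/null 2>&1; then
  found=$(java_version_of "$(java -version 2>&1)")
  if version_at_least_1_5 "$found"; then
    echo "java $found: version requirement met"
  else
    echo "java '$found': older than 1.5 or not recognized"
  fi
  # Assumption: a JDK puts javac on the PATH next to java.
  command -v javac >/dev/null 2>&1 || echo "javac not found: this may be a JRE, not a JDK"
fi
```

The parsing relies on the first line of the java -version output containing the version in double quotes; if that line is something else, the script reports the version as not recognized rather than guessing.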

To launch Jetty with the Solr WAR, and the example configs, just run the start.jar ...

chrish@asimov:~/solr/example$ java -jar start.jar
1 [main] INFO org.mortbay.log - Logging to org.slf4j.impl.SimpleLogger@1f436f5 via org.mortbay.log.Slf4jLog
334 [main] INFO org.mortbay.log - Extract jar:file:/home/chrish/solr/example/webapps/solr.war!/ to /tmp/Jetty__solr/webapp
Feb 24, 2006 5:54:52 PM org.apache.solr.servlet.SolrServlet init
INFO: user.dir=/home/chrish/solr/example
Feb 24, 2006 5:54:52 PM org.apache.solr.core.SolrConfig <clinit>
INFO: Loaded Config solrconfig.xml

...

1656 [main] INFO org.mortbay.log - Started SelectChannelConnector @ 0.0.0.0:8983

This will start up the Jetty application server on port 8983, and use your terminal to display the logging information from Solr.

You can see that the Solr is running by loading http://localhost:8983/solr/admin/ in your web browser. This is the main starting point for Administering Solr.
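
As a note outside the quoted tutorial: on a server without a browser, the same check can be made from a terminal. This is a minimal sketch that assumes curl is installed and that the admin page answers with HTTP status 200. The function names solr_admin_url and solr_is_up are hypothetical; the default host and port are the ones the tutorial uses.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Solr) for checking that the admin page answers.

# Build the admin URL; host and port default to the tutorial's localhost:8983.
solr_admin_url() {
  printf 'http://%s:%s/solr/admin/' "${1:-localhost}" "${2:-8983}"
}

# Succeed if the admin page answers with HTTP status 200 within five seconds.
solr_is_up() {
  status=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$(solr_admin_url "$@")") || status=000
  [ "$status" = "200" ]
}
```

After loading these functions, solr_is_up && echo "Solr is running" checks the local example server, and solr_is_up with a host and port as arguments checks another machine.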

Indexing Data
Your Solr server is up and running, but it doesn't contain any data. You can modify a Solr index by POSTing XML Documents containing instructions to add (or update) documents, delete documents, commit pending adds and deletes, and optimize your index.
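
As a note outside the quoted tutorial: the instructions listed above (add, delete, commit, optimize) are small XML messages sent to Solr's update URL. The following is a minimal sketch that assumes curl is installed and that the example server listens at http://localhost:8983/solr/update. The function names and the SOLR_UPDATE_URL variable are hypothetical, the id and name fields are assumed to exist in the example schema, and field values are inserted without XML escaping, so they must not contain <, > or &.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Solr) that build and send XML update messages.

# Assumed location of the update handler on the example server.
SOLR_UPDATE_URL=${SOLR_UPDATE_URL:-http://localhost:8983/solr/update}

# Build an <add> message for one document with an id and a name field.
# Values are not XML-escaped.
solr_add_xml() {
  printf '<add><doc><field name="id">%s</field><field name="name">%s</field></doc></add>' "$1" "$2"
}

# Build a <delete> message for the document with the given id.
solr_delete_xml() {
  printf '<delete><id>%s</id></delete>' "$1"
}

# POST one XML message to the update URL.
solr_post() {
  curl -s "$SOLR_UPDATE_URL" -H 'Content-Type: text/xml; charset=utf-8' --data-binary "$1"
}
```

For example, solr_post "$(solr_add_xml DOC1 'Test document')" followed by solr_post '<commit/>' adds a document and commits it, and solr_post '<optimize/>' asks Solr to optimize the index. Pending adds and deletes only take effect once a commit has been sent.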

The exampledocs directory contains samples of the types of instructions Solr expects, as well as a java utility for posting them from the command line (a post.sh shell script is also available, but for this tutorial we'll use the cross-platform Java client).

To try this, open a new terminal window, enter the exampledocs directory, and run "java -jar post.jar" on some of the XML files in that directory, indicating the URL of the Solr server:

chrish@asimov:~/solr/example/exampledocs$ java -jar post.jar solr.xml monitor.xml
SimplePostTool: version 1.2 ..
posted @ 2009-10-10 18:57  searchDM