Add Site Search to Your Website: Spring + Hibernate with Compass (Built on Lucene)

Add Site Search to Your Website: A Compass Getting-Started Tutorial

syxChina(syxchina.cnblogs.com)

 

 

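Compass (Built on Lucene) Getting-Started Tutorial

1 Preface

2 Introduction to Compass

3 Using Compass Standalone

4 Integrating Compass with Spring + Hibernate

4-1 JAR files

4-2 Configuration files

4-3 Source code

4-4 Notes

4-5 Testing

5 Summary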

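1 Preface

I have been learning some new things lately, hoping to add substance to my graduation project. After a long stretch of SSH (Struts + Spring + Hibernate) projects I also wanted to try something new and round out what I already knew, and search is unquestionably important. As a Java developer you have probably heard of Lucene, a top-level Apache open-source project. I have just finished learning it and know the basics well enough to use it, although I have to say the official documentation and examples are not great. Fortunately there is plenty of material online. While reading about Lucene I came across two projects built on it: Compass and Lucene-Nutch. Lucene can index and search whatever content you hand it, but to search a local database or web pages you first have to pull the data out, index it, and only then search. So you start to wonder whether you could search the database directly, using its contents as the index, with the index updated as rows are created, updated, and deleted. That is where Compass comes in. Compass is very convenient for site search, and its official Spring and Hibernate support makes it even more so. Lucene-Nutch builds on Lucene to search web pages. If there is interest I will share my experience with Lucene and Lucene-Nutch as a quick start; the rest can be left to the documentation and Google.

I should mention that Compass appears to have stopped being updated in 2009, and according to what I found online it only supports Lucene versions below 3.0. I don't know why such a good project stopped. I tried analyzers written for Lucene 3.0 and later and they do not work, so for Chinese I use JE-Analyzer.jar. My environment:

Spring 3.1.0 + Hibernate 3.6.6 + Compass 2.2.0.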

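2 Introduction to Compass

Compass is a powerful, transactional, high-performance object/search engine mapping (OSEM) framework and a Java persistence framework. Compass includes:

* a search engine abstraction layer (using the Lucene search engine),

* OSEM (Object/Search Engine Mapping) support,

* transaction management,

* a simple Google-like keyword query language,

* an extensible, modular framework,

* a simple API.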
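To make the "index follows the data" idea concrete, here is a minimal sketch in plain Java. It is not Compass or Lucene code: the class name MirrorIndexSketch, the in-memory "table", and the whitespace-style tokenizer are all assumptions made up for illustration. It only shows that every save or delete on the data also updates an inverted index, which is roughly the bookkeeping Compass takes off your hands.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MirrorIndexSketch {
    // Hypothetical stand-in for a database table: id -> text.
    private final Map<Integer, String> table = new HashMap<>();
    // Inverted index: lower-cased word -> ids of the rows containing it.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    /** Inserts or updates a row and mirrors the change into the index. */
    public void save(int id, String text) {
        delete(id); // an update is "remove the old entry, then add the new one"
        table.put(id, text);
        for (String word : tokens(text)) {
            index.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
        }
    }

    /** Deletes a row and removes its words from the index. */
    public void delete(int id) {
        String old = table.remove(id);
        if (old == null) {
            return;
        }
        for (String word : tokens(old)) {
            Set<Integer> ids = index.get(word);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) {
                    index.remove(word);
                }
            }
        }
    }

    /** Returns the ids of all rows containing the given word. */
    public Set<Integer> search(String word) {
        return index.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }

    // Naive tokenizer: lower-case, split on non-word characters.
    private static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        MirrorIndexSketch db = new MirrorIndexSketch();
        db.save(1, "Thinking in Java");
        db.save(2, "Effective Java");
        System.out.println(db.search("java")); // [1, 2]
        db.delete(1);
        System.out.println(db.search("java")); // [2]
    }
}
```

A real search engine adds analyzers, scoring, and on-disk storage on top of this, but the save/delete mirroring is the part that matters for site search.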

Official website: Google it.

3 Using Compass Standalone

Compass does not have to be integrated into Hibernate or Spring. The following example is taken from the web; here is the code:


 

@Searchable

public class Book {

private String id; // ID

private String title; // title

private String author; // author

private float price; // price

public Book() {

}

public Book(String id, String title, String author, float price) {

super();

this.id = id;

this.title = title;

this.author = author;

this.price = price;

}

@SearchableId

public String getId() {

return id;

}

@SearchableProperty(boost = 2.0F, index = Index.TOKENIZED, store = Store.YES)

public String getTitle() {

return title;

}

@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)

public String getAuthor() {

return author;

}

@SearchableProperty(index = Index.NO, store = Store.YES)

public float getPrice() {

return price;

}

public void setId(String id) {

this.id = id;

}

public void setTitle(String title) {

this.title = title;

}

public void setAuthor(String author) {

this.author = author;

}

public void setPrice(float price) {

this.price = price;

}

@Override

public String toString() {

return "[" + id + "] " + title + " - " + author + " $ " + price;

}

}

public class Searcher {

protected Compass compass;

public Searcher() {

}

public Searcher(String path) {

compass = new CompassAnnotationsConfiguration()//

.setConnection(path).addClass(Book.class)//

.setSetting("compass.engine.highlighter.default.formatter.simple.pre", "<font color='red'>")//

.setSetting("compass.engine.highlighter.default.formatter.simple.post", "</font>")//

.buildCompass();//

Runtime.getRuntime().addShutdownHook(new Thread() {

public void run() {

compass.close();

}

});

 

}

/**

* Adds a book to the index.

* @param book

*/

public void index(Book book) {

CompassSession session = null;

CompassTransaction tx = null;

try {

session = compass.openSession();

tx = session.beginTransaction();

session.create(book);

tx.commit();

} catch (RuntimeException e) {

if (tx != null)

tx.rollback();

throw e;

} finally {

if (session != null) {

session.close();

}

}

}

/**

* Removes a book from the index.

* @param book

*/

public void unIndex(Book book) {

CompassSession session = null;

CompassTransaction tx = null;

try {

session = compass.openSession();

tx = session.beginTransaction();

session.delete(book);

tx.commit();

} catch (RuntimeException e) {

if (tx != null)

tx.rollback();

throw e;

} finally {

if (session != null) {

session.close();

}

}

}

 

/**

* Rebuilds the index entry for a book.

* @param book

*/

public void reIndex(Book book) {

unIndex(book);

index(book);

}

 

/**

* Searches the index.

* @param queryString

* @return

*/

public List<Book> search(String queryString) {

CompassSession session = null;

CompassTransaction tx = null;

try {

session = compass.openSession();

tx = session.beginTransaction();

CompassHits hits = session.find(queryString);

int n = hits.length();

if (0 == n) {

return Collections.emptyList();

}

List<Book> books = new ArrayList<Book>();

for (int i = 0; i < n; i++) {

books.add((Book) hits.data(i));

}

hits.close();

tx.commit();

return books;

} catch (RuntimeException e) {

if (tx != null)

tx.rollback();

throw e;

} finally {

if (session != null) {

session.close();

}

}

}

}

public class Main {

static List<Book> db = new ArrayList<Book>();

static Searcher searcher = new Searcher("index");

 

public static void main(String[] args) {

add(new Book(UUID.randomUUID().toString(), "Thinking in Java", "Bruce", 109.0f));

add(new Book(UUID.randomUUID().toString(), "Effective Java", "Joshua", 12.4f));

add(new Book(UUID.randomUUID().toString(), "Java Thread Programing", "Paul", 25.8f));

long begin = System.currentTimeMillis();

int count = 30;

for(int i=1; i<count; i++) {

if(i%10 == 0) {

long end = System.currentTimeMillis();

System.err.println(String.format("Indexed [%d], remaining [%d], elapsed [%ds], estimated remaining [%ds].", i,count-i,(end-begin)/1000, (int)((count-i)*((end-begin)/(i*1000.0))) ));

}

String uuid = new Date().toString();

add(new Book(uuid, uuid.substring(0, uuid.length()/2), uuid.substring(uuid.length()/2), (float)Math.random()*100));

}

int n;

do {

n = displaySelection();

switch (n) {

case 1:

listBooks();

break;

case 2:

addBook();

break;

case 3:

deleteBook();

break;

case 4:

searchBook();

break;

case 5:

return;

}

} while (n != 0);

}

 

static int displaySelection() {

System.out.println("\n==select==");

System.out.println("1. List all books");

System.out.println("2. Add book");

System.out.println("3. Delete book");

System.out.println("4. Search book");

System.out.println("5. Exit");

int n = readKey();

if (n >= 1 && n <= 5)

return n;

return 0;

}

 

/**

* Adds a book to both the database and the index.

*

* @param book

*/

private static void add(Book book) {

db.add(book);

searcher.index(book);

}

 

/**

* Prints every book in the database.

*/

public static void listBooks() {

System.out.println("==Database==");

int n = 1;

for (Book book : db) {

System.out.println(n + ")" + book);

n++;

}

}

 

/**

* Adds a book entered by the user to the database and the index.

*/

public static void addBook() {

String title = readLine(" Title: ");

String author = readLine(" Author: ");

String price = readLine(" Price: ");

Book book = new Book(UUID.randomUUID().toString(), title, author, Float.valueOf(price));

add(book);

}

 

/**

* Deletes a book from both the database and the index.

*/

public static void deleteBook() {

listBooks();

System.out.println("Book index: ");

int n = readKey();

Book book = db.remove(n - 1);

searcher.unIndex(book);

}

 

/**

* Searches for books matching the entered keyword.

*/

public static void searchBook() {

String queryString = readLine(" Enter keyword: ");

List<Book> books = searcher.search(queryString);

System.out.println(" ====search results:" + books.size() + "====");

for (Book book : books) {

System.out.println(book);

}

}

 

public static int readKey() {

BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

try {

int n = reader.read();

n = Integer.parseInt(Character.toString((char) n));

return n;

} catch (Exception e) {

throw new RuntimeException(e);

}

}

 

public static String readLine(String propt) {

System.out.println(propt);

BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

try {

return reader.readLine();

} catch (Exception e) {

throw new RuntimeException(e);

}

}

}


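Inserting data and indexing it this way is slow; the approach in the next section is faster. Note that no analyzer is configured above, so the default one is used, which splits Chinese text into individual characters.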
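To show what "split into individual characters" means in practice, here is a rough imitation in plain Java. It is only a sketch under my own assumptions: UnigramSketch and its tokenize method are made-up names, and this is not the code of Lucene's default analyzer. It simply emits every Han character as its own token and keeps runs of other letters and digits together.

```java
import java.util.ArrayList;
import java.util.List;

public class UnigramSketch {

    /** Emits each Han character as a separate token; other letters/digits are grouped into words. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);                       // end any pending Latin word
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens);                       // punctuation or whitespace ends a word
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Compass全文搜索")); // [compass, 全, 文, 搜, 索]
    }
}
```

With tokens like these, a query for 搜索 can only match as the two separate characters 搜 and 索, which is why a dictionary-based Chinese analyzer is configured in the Spring setup that follows.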

4 Integrating Compass with Spring + Hibernate

4-1 JAR files

[Screenshots of the required JAR list not preserved]

4-2 Configuration files

Beans.xml

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="http://www.springframework.org/schema/beans"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context"

xmlns:aop="http://www.springframework.org/schema/aop" xmlns:tx="http://www.springframework.org/schema/tx"

xsi:schemaLocation="http://www.springframework.org/schema/beans

         http://www.springframework.org/schema/beans/spring-beans-3.0.xsd

         http://www.springframework.org/schema/context

         http://www.springframework.org/schema/context/spring-context-3.0.xsd

         http://www.springframework.org/schema/tx

      http://www.springframework.org/schema/tx/spring-tx-3.0.xsd

         http://www.springframework.org/schema/aop

         http://www.springframework.org/schema/aop/spring-aop-3.0.xsd">

<context:annotation-config />

<context:component-scan base-package="com.syx.compass"></context:component-scan>

<aop:aspectj-autoproxy></aop:aspectj-autoproxy>

 

<import resource="hibernate-beans.xml"/>

<import resource="compass-beans.xml"/>

</beans>

compass-beans.xml

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="...">

 

<!-- Main Compass configuration -->

<bean id="compass" class="org.compass.spring.LocalCompassBean">

<property name="compassSettings">

<props>

<prop key="compass.engine.connection">file://compass</prop><!-- where the index is stored -->

<prop key="compass.transaction.factory">

org.compass.spring.transaction.SpringSyncTransactionFactory</prop>

<prop key="compass.engine.analyzer.default.type">

jeasy.analysis.MMAnalyzer</prop><!-- analyzer (tokenizer) to use -->

<prop key="compass.engine.highlighter.default.formatter.simple.pre">

<![CDATA[<font color="red"><b>]]></prop>

<prop key="compass.engine.highlighter.default.formatter.simple.post">

<![CDATA[</b></font>]]></prop>

</props>

</property>

<property name="transactionManager">

<ref bean="txManager" />

</property>

<property name="compassConfiguration"  ref="annotationConfiguration" /> 

<property name="classMappings"> 

            <list> 

                <value>com.syx.compass.test1.Article</value> 

            </list> 

        </property> 

</bean>

<bean id="annotationConfiguration"

class="org.compass.annotations.config.CompassAnnotationsConfiguration">

</bean>

<bean id="compassTemplate" class="org.compass.core.CompassTemplate">

<property name="compass" ref="compass" />

</bean>

<!-- Keep the index in sync: changes to the database are mirrored to the index -->

<bean id="hibernateGps" class="org.compass.gps.impl.SingleCompassGps"

init-method="start" destroy-method="stop">

<property name="compass">

<ref bean="compass" />

</property>

<property name="gpsDevices">

<list>

<ref bean="hibernateGpsDevice"/>

</list>

</property>

</bean>

<!-- Hibernate GPS device: connects Compass to Hibernate -->

<bean id="hibernateGpsDevice"

class="org.compass.spring.device.hibernate.dep.SpringHibernate3GpsDevice">

<property name="name">

<value>hibernateDevice</value>

</property>

<property name="sessionFactory">

<ref bean="sessionFactory" />

</property>

<property name="mirrorDataChanges"> 

            <value>true</value> 

        </property> 

</bean>

<!-- Rebuild the index on a schedule (with Quartz) or when the Spring ApplicationContext starts -->

    <bean id="compassIndexBuilder" 

        class="com.syx.compass.test1.CompassIndexBuilder" 

        lazy-init="false"> 

        <property name="compassGps" ref="hibernateGps" /> 

        <property name="buildIndex" value="false" /> 

        <property name="lazyTime" value="1" /> 

    </bean> 

     <!-- Search service class -->

    <bean id="searchService"  class="com.syx.compass.test1.SearchServiceBean">

        <property name="compassTemplate"> 

            <ref bean="compassTemplate" /> 

        </property> 

    </bean> 

</beans>

hibernate-beans.xml

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="...">

 

<!-- DataSource -->

<bean id="dataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource">

<property name="driverClass" value="${jdbc.driverClassName}" />

<property name="jdbcUrl" value="${jdbc.url}" />

<property name="user" value="${jdbc.username}" />

<property name="password" value="${jdbc.password}" />

<property name="autoCommitOnClose" value="true" />

<property name="checkoutTimeout" value="${cpool.checkoutTimeout}" />

<property name="initialPoolSize" value="${cpool.minPoolSize}" />

<property name="minPoolSize" value="${cpool.minPoolSize}" />

<property name="maxPoolSize" value="${cpool.maxPoolSize}" />

<property name="maxIdleTime" value="${cpool.maxIdleTime}" />

<property name="acquireIncrement" value="${cpool.acquireIncrement}" />

<!-- <property name="maxIdleTimeExcessConnections" value="${cpool.maxIdleTimeExcessConnections}"/> -->

</bean>

<bean

class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">

<property name="locations">

<value>classpath:jdbc.properties</value>

</property>

</bean>

<!-- SessionFactory -->

<bean id="sessionFactory"

class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">

<property name="dataSource" ref="dataSource" />

<property name="annotatedClasses">

<list>

<value>com.syx.compass.model.Article</value>

<value>com.syx.compass.model.Author</value>

<value>com.syx.compass.test1.Article</value>

</list>

</property>

<property name="hibernateProperties">

<props>

<prop key="hibernate.dialect">org.hibernate.dialect.MySQLDialect</prop>

<prop key="hibernate.current_session_context_class">thread</prop>

<prop key="javax.persistence.validation.mode">none</prop>

<prop key="hibernate.show_sql">true</prop>

<prop key="hibernate.format_sql">false</prop>

<prop key="hibernate.hbm2ddl.auto">update</prop>

</props>

</property>

</bean>

<bean id="hibernateTemplate" class="org.springframework.orm.hibernate3.HibernateTemplate">

<property name="sessionFactory" ref="sessionFactory"></property>

</bean>

<bean id="txManager"

class="org.springframework.orm.hibernate3.HibernateTransactionManager">

<property name="sessionFactory" ref="sessionFactory" />

</bean>

</beans>

jdbc.properties

jdbc.driverClassName=com.mysql.jdbc.Driver

jdbc.hostname=localhost

jdbc.url=jdbc:mysql://localhost:3306/compass

jdbc.username=root

jdbc.password=root

cpool.checkoutTimeout=5000

cpool.minPoolSize=1

cpool.maxPoolSize=4

cpool.maxIdleTime=25200

cpool.maxIdleTimeExcessConnections=1800

cpool.acquireIncrement=5

log4j.properties

log4j.appender.stdout=org.apache.log4j.ConsoleAppender

log4j.appender.stdout.Target=System.out

log4j.appender.stdout.layout=org.apache.log4j.PatternLayout

log4j.rootLogger=error, stdout

4-3 Source code

@Searchable(alias = "article")

@Entity(name="_article")

public class Article {

private Long ID; // identifier

private String content; // body text

private String title; // article title

private Date createTime; // creation time

public Article(){}

public Article(Long iD, String content, String title, Date createTime) {

ID = iD;

this.content = content;

this.title = title;

this.createTime = createTime;

}

public String toString() {

return String.format("%d,%s,%s,%s", ID, title, content, createTime.toString());

}

@SearchableId

@Id

@GeneratedValue

public Long getID() {

return ID;

}

public void setID(Long id) {

ID = id;

}

@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)

public String getContent() {

return content;

}

public void setContent(String content) {

this.content = content;

}

@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)

public String getTitle() {

return title;

}

public void setTitle(String title) {

this.title = title;

}

@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)

public Date getCreateTime() {

return createTime;

}

public void setCreateTime(Date createTime) {

this.createTime = createTime;

}

}

public class CompassIndexBuilder implements InitializingBean {   

   

    // Whether to build the index; set to false to disable this builder.

    private boolean buildIndex = false;    

   

    // Delay before the indexing thread starts working, in seconds.

    private int lazyTime = 10;    

   

    // Compass GPS wrapper.

    private CompassGps compassGps;    

   

    // Indexing thread.

    private Thread indexThread = new Thread() {    

   

        @Override   

        public void run() {    

            try {    

                Thread.sleep(lazyTime * 1000);    

                System.out.println("begin compass index...");    

                long beginTime = System.currentTimeMillis();    

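                // Rebuild the index.

                // If the index files for the mapped entities already exist, a temporary index is built first

                // and replaces the old one once indexing completes.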

                compassGps.index();    

                long costTime = System.currentTimeMillis() - beginTime;    

                System.out.println("compass index finished.");

                System.out.println("took " + costTime + " milliseconds");

            } catch (InterruptedException e) {    

                e.printStackTrace();    

            }    

        }    

    };    

   

    /**  

     * Implements <code>InitializingBean</code>: starts the indexing thread once dependency injection is complete.

     */   

    public void afterPropertiesSet() throws Exception {    

        if (buildIndex) {    

            indexThread.setDaemon(true);    

            indexThread.setName("Compass Indexer");    

            indexThread.start();    

        }    

    }    

   

    public void setBuildIndex(boolean buildIndex) {    

        this.buildIndex = buildIndex;    

    }    

   

    public void setLazyTime(int lazyTime) {    

        this.lazyTime = lazyTime;    

    }    

   

    public void setCompassGps(CompassGps compassGps) {    

        this.compassGps = compassGps;    

    }    

}  

public class SearchServiceBean {

 

private CompassTemplate compassTemplate;

 

/** Searches the index. */

public Map find(final String keywords, final String type, final int start, final int end) {

return compassTemplate.execute(new CompassCallback<Map>() {

 

public Map doInCompass(CompassSession session) throws CompassException {

List result = new ArrayList();

int totalSize = 0;

Map container = new HashMap();

CompassQuery query = session.queryBuilder().queryString(keywords).toQuery();

CompassHits hits = query.setAliases(type).hits();

totalSize = hits.length();

container.put("size", totalSize);

int max = 0;

if (end < hits.length()) {

max = end;

} else {

max = hits.length();

}

 

if (type.equals("article")) {

for (int i = start; i < max; i++) {

Article article = (Article) hits.data(i);

String title = hits.highlighter(i).fragment("title");

if (title != null) {

article.setTitle(title);

}

String content = hits.highlighter(i).setTextTokenizer(CompassHighlighter.TextTokenizer.AUTO).fragment("content");

if (content != null) {

 

article.setContent(content);

}

result.add(article);

}

}

container.put("result", result);

return container;

}

});

}

 

public CompassTemplate getCompassTemplate() {

return compassTemplate;

}

 

public void setCompassTemplate(CompassTemplate compassTemplate) {

this.compassTemplate = compassTemplate;

}

}
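The find method above walks the hits from start to min(end, hits.length()). As a small aside, here is a sketch of that window calculation on its own, with one extra guard I added myself: a negative or oversized start is clamped too, so the loop can never ask for an index outside the result set. PageWindowSketch and clamp are hypothetical names, not part of Compass.

```java
public class PageWindowSketch {

    /** Returns {from, to} clamped into [0, total] with from <= to, for a loop "for (i = from; i < to; i++)". */
    static int[] clamp(int start, int end, int total) {
        int to = Math.max(0, Math.min(end, total));
        int from = Math.max(0, Math.min(start, to));
        return new int[] { from, to };
    }

    public static void main(String[] args) {
        int[] w = clamp(0, 100, 37);      // ask for the first 100 of 37 hits
        System.out.println(w[0] + ".." + w[1]); // 0..37
    }
}
```

An empty window (from equal to to) simply means the requested page is past the last hit.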

public class MainTest {

public static ClassPathXmlApplicationContext applicationContext;

private static HibernateTemplate hibernateTemplate;

@BeforeClass

public static void init() {

System.out.println("spring init...");

applicationContext = new ClassPathXmlApplicationContext("beans.xml");

hibernateTemplate = applicationContext.getBean(HibernateTemplate.class);

System.out.println("spring ok");

}

 

@Test

public void addData() {

System.out.println("addData");

//In compass-beans.xml, on bean id="compassIndexBuilder", set

//buildIndex=true lazyTime=1

//and the index is rebuilt automatically from the data in the database

try {

Thread.sleep(10000000);

} catch (InterruptedException e) {

e.printStackTrace();

}

}

@Test

public void search() {

String keyword = "全文搜索引擎";

SearchServiceBean ssb = applicationContext.getBean(SearchServiceBean.class);

Map map = ssb.find(keyword, "article", 0, 100);//the first search loads the dictionary

long begin = System.currentTimeMillis();

map = ssb.find(keyword, "article", 0, 100);//only the second search reflects the real search time

long end = System.currentTimeMillis();

System.out.println(String.format(

"Search: [%s], time (ms): %d, hits: %d", keyword, end-begin, map.get("size")));

List<Article> list = (List<Article>) map.get("result");

for(Article article : list) {

System.out.println(article);

}

}

}
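The search test above runs the query twice and only times the second run, because the first one pays the one-off cost of loading the analyzer's dictionary. The same warm-up-then-measure pattern can be pulled out into a small helper. This is just a sketch: WarmupTimingSketch and timeAfterWarmup are names I made up, and the task in main is a dummy placeholder rather than a real Compass search.

```java
import java.util.function.Supplier;

public class WarmupTimingSketch {

    /** Runs the task once untimed, then returns the elapsed milliseconds of a second run. */
    static <T> long timeAfterWarmup(Supplier<T> task) {
        task.get();                          // warm-up: absorbs one-off costs such as dictionary loading
        long begin = System.nanoTime();
        task.get();                          // the run that is actually measured
        return (System.nanoTime() - begin) / 1_000_000;
    }

    public static void main(String[] args) {
        int[] calls = { 0 };
        long ms = timeAfterWarmup(() -> {
            calls[0]++;                      // dummy work standing in for a search call
            return "result";
        });
        System.out.println("calls=" + calls[0] + ", elapsed(ms)=" + ms);
    }
}
```

In the test it would be called with something like () -> ssb.find(keyword, "article", 0, 100). One timed run is still a rough number; averaging several runs would be more reliable.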

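4-4 Notes

The index directory and the analyzer are configured in compass-beans.xml. For the tests, we add data directly in the database, build the index at startup, and measure the speed.

4-5 Testing

Using MySQL, I wrote a function that inserts test data: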

DELIMITER $$

CREATE

    FUNCTION `compass`.`addDateSyx`(num int(8))

    RETURNS varchar(32)

    BEGIN

declare i int(8);

set i = 0;

while ( i < num) DO

insert into _article (title,content, createTime) values (i, num-i, now());

set i = i + 1;

end while;

return "OK";

    END$$

DELIMITER ;

4-5-1 Test with 10,000 rows of identical Chinese data

Change the insert statement in the database function to:

insert into _article (title,content, createTime) values ('用compass实现站内全文搜索引擎()', 'Compass是一个强大的,事务的,高性能的对象/搜索引擎映射(OSEM:object/search engine mapping)与一个Java持久层框架.Compass包括

* 搜索引擎抽象层(使用Lucene搜索引荐),

* OSEM (Object/Search Engine Mapping) 支持,

* 事务管理,

* 类似于Google的简单关键字查询语言,

* 可扩展与模块化的框架,

* 简单的API.

 

如果你需要做站内搜索引擎,而且项目里用到了hibernate,那用compass是你的最佳选择。 ', now());

Insert the data:

select addDateSyx1(10000); -- with hibernate.hbm2ddl.auto=update set in the Hibernate configuration

Build the index:

(screenshots not preserved)

10,000 rows in 8045 ms, which is not bad.
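To put that figure in perspective, the throughput works out with simple arithmetic. The snippet below is only a worked calculation using the two numbers reported above; ThroughputSketch and rowsPerSecond are made-up names.

```java
public class ThroughputSketch {

    /** Converts a row count and an elapsed time in milliseconds into rows per second. */
    static double rowsPerSecond(long rows, long elapsedMs) {
        return rows * 1000.0 / elapsedMs;
    }

    public static void main(String[] args) {
        // 10,000 rows indexed in 8045 ms, as measured above.
        System.out.printf("%.0f rows/s%n", rowsPerSecond(10_000, 8_045)); // 1243 rows/s
    }
}
```

That is a little under one millisecond per row for this data set.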

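Index size: (screenshot not preserved)

Search: (screenshot not preserved)

The text was indeed tokenized into words. With the default analyzer, Chinese text is split into single characters and searching is fairly fast; with JE-Analyzer, 116 ms is still acceptable.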

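4-5-2 Test with 100,000 rows of identical Chinese data

Insert the data: (screenshot not preserved)

MySQL took roughly 12 s to insert the 100,000 rows.

Build the index: (screenshot not preserved)

The index size was about what I expected, but indexing took far longer than I had imagined, and I did not feel like running it again.

Search: (screenshot not preserved)

With 100,000 rows, 243 ms is quite good. It seems that once the index is built, searching is straightforward.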

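5 Summary

Compass is pleasant to use and should cover basic needs. I don't know why such a good project stopped being updated; had it continued, Hibernate Search might never have appeared.

Because Compass is no longer updated, features from Lucene 3.0 onward are unavailable, which is a pity. Compass can build the index automatically (manual CRUD on the index is also possible), but wrapping Lucene yourself to do what Compass does should give a reasonably good implementation. I look forward to someone taking that on.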

 

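References:

Implementing a site-wide full-text search engine with Compass

Compass revisited: integrating site search

Quickly adding search to your website with Compass

There was also a good article on ITEYE, but I accidentally closed the page...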

 

 

posted @ 2011-12-29 22:54  BuildNewApp