Full-Text Search: A Main Index + Delta Index Scheme

The sphinx configuration file mkmfcm.conf is as follows:

# Main index source
source mkmfcm_main
{
    type		= mysql
    sql_host		= localhost
    sql_user		= root
    sql_pass		= root
    sql_db		= mkmfcm
    sql_port		= 3306
    sql_query_pre	= SET NAMES utf8
    sql_query_pre	= REPLACE INTO sph_counter SELECT 1, MAX(forum_id) FROM forum
    sql_query		= SELECT forum_id as id, forum_title, forum_content FROM forum
}
# Delta index source (inherits from mkmfcm_main)
source mkmfcm_delta : mkmfcm_main
{
    sql_query_pre	= SET NAMES utf8
    sql_query		= SELECT forum_id as id, forum_title, forum_content FROM forum \
				WHERE forum_id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}
 
# Main index
index question_main
{
    source		= mkmfcm_main	# name of the corresponding source
    path		= /usr/local/coreseek/var/data/question_main
    docinfo		= extern
    mlock		= 0
    morphology		= none
    min_word_len        = 2
    html_strip		= 0
 
    charset_dictpath	= /usr/local/mmseg/etc/ # on BSD/Linux; the path must end with /
    charset_type	= zh_cn.utf-8
}
# Delta index (inherits from question_main)
index question_delta : question_main
{
    source		= mkmfcm_delta
    path		= /usr/local/coreseek/var/data/question_delta
}
 
# Global indexer settings
indexer
{
    mem_limit            = 128M
}
 
# searchd service definition
searchd
{
    listen		= 9312
    #listen		= 9306:mysql41
    read_timeout        = 5
    max_children        = 30
    max_matches		= 1000
    seamless_rotate	= 0
    preopen_indexes	= 0
    unlink_old		= 1
    pid_file		= /usr/local/coreseek/var/log/searchd_mysql.pid
    log			= /usr/local/coreseek/var/log/searchd_mysql.log
    query_log		= /usr/local/coreseek/var/log/query_mysql.log
    binlog_path		=	# empty value disables the binlog
}
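
The main source writes to a sph_counter table that the configuration above never defines. A minimal sketch of creating it, assuming the two-column layout implied by the queries in the config (counter_id, max_doc_id). The function name sph_counter_ddl is hypothetical and the INTEGER column types are an assumption; widen them if forum_id is a larger type.

```shell
#!/bin/sh
# Hypothetical helper: print the DDL for the counter table that the
# sql_query_pre of mkmfcm_main writes to. Column names come from the
# config above; the column types are an assumption.
sph_counter_ddl() {
    cat <<'SQL'
CREATE TABLE IF NOT EXISTS sph_counter (
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);
SQL
}

# This only prints the statement. To apply it, pipe it to the mysql
# client, for example:
#   sph_counter_ddl | mysql -u root -p mkmfcm
sph_counter_ddl
```

Row 1 of this table holds the highest forum_id seen when the main index was last built; the delta source then selects only rows above that value.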

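There are two schemes for keeping the indexes up to date:

Build all indexes for the first time: ./indexer -c /var/www/coreseek/mkmfcm.conf --all
Start the search daemon: ./searchd -c /var/www/coreseek/mkmfcm.conf
Scheme one:
    Merge the delta into the main index every day at 3 a.m.: ./indexer -c /var/www/coreseek/mkmfcm.conf --merge question_main question_delta --rotate
    Rebuild the delta index every 3 minutes: ./indexer -c /var/www/coreseek/mkmfcm.conf question_delta --rotate
    Pros and cons: merging is very fast, but a merge does not update the maximum document ID stored in sph_counter, so the delta keeps re-indexing rows that are already in the main index, which produces duplicates.
Scheme two:
    Rebuild the main index every day at 3 a.m.: ./indexer -c /var/www/coreseek/mkmfcm.conf question_main --rotate
    Rebuild the delta index every 3 minutes: ./indexer -c /var/www/coreseek/mkmfcm.conf question_delta --rotate
    Pros and cons: slow, and there is still duplicate data.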
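
Both schemes call the same indexer binary with the same config file, so the commands can be wrapped in one small script to keep the paths consistent. A minimal sketch, assuming the paths used elsewhere in this post; the function name reindex and the DRY_RUN variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Paths taken from this post; override them via the environment if
# your installation differs.
INDEXER="${INDEXER:-/usr/local/coreseek/bin/indexer}"
CONF="${CONF:-/var/www/coreseek/mkmfcm.conf}"

# reindex main|delta|merge
# With DRY_RUN=1 the indexer command is printed instead of executed.
reindex() {
    case "$1" in
        main)  set -- question_main --rotate ;;
        delta) set -- question_delta --rotate ;;
        merge) set -- --merge question_main question_delta --rotate ;;
        *)     echo "usage: reindex main|delta|merge" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Show the command that would run.
        printf '%s\n' "$INDEXER -c $CONF $*"
    else
        "$INDEXER" -c "$CONF" "$@"
    fi
}
```

Setting DRY_RUN=1 before calling reindex delta prints the delta command without touching the indexes, which is a convenient way to check the paths before scheduling anything.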

For now I am using scheme two while I keep looking for a better approach. Edit /etc/crontab and add the following entries:

0 3	* * *	root	cd /usr/local/coreseek/bin && ./indexer -c /var/www/coreseek/mkmfcm.conf question_main --rotate
*/3 *	* * *	root	cd /usr/local/coreseek/bin && ./indexer -c /var/www/coreseek/mkmfcm.conf question_delta --rotate

Restart cron:

/etc/init.d/cron restart
posted @ 2012-05-28 21:52  码农神说