When Erlang Meets Solr


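  In an interview, Joe Armstrong gave an account of "opening the black box" that left a deep impression on me: when doing X Windows development he did not use the corresponding libraries. Instead, once he understood how X Windows is implemented underneath, he chose to talk to the socket directly: "map those 20 messages onto Erlang terms, do a little magic, and then you can send messages straight to the window and they start doing things." [full interview] Back to today's task: how does Erlang use a Solr service? Once the problem is reduced to the data communication protocol, everything becomes clear, and it turns into a combination of technologies we already know. First, a brief introduction to Solr: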
 

  Solr 

 
   Solr (pronounced "solar") is an open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is highly scalable. Solr is the most popular enterprise search engine. Solr 4 adds NoSQL features.    Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat or Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages. Solr's powerful external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.
 
 
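  Solr is indeed a good choice for building a full-text search service: a Solr environment can be set up in minutes, with the IK analyzer and so on configured. So how does an Erlang application use the Solr service? From the Wikipedia introduction above we can pick out a few keywords: REST-like API, XML, JSON, HTTP. At this point everything is technology we already know, so let's dig in: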
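Since the interface is plain HTTP, a query can in principle be issued with nothing more than OTP's own httpc. The following is a minimal sketch, not part of esolr: the module name `solr_url_sketch`, its function names, the 5-second timeout and the idea of passing the select URL as an argument are all assumptions made for illustration.

```erlang
-module(solr_url_sketch).
-export([select_url/3, search/3]).

%% Build "<BaseUrl>?q=<Query>&..." with form-encoded parameters.
%% Params is a list of {Key, Value} string pairs, e.g. [{"wt", "json"}].
select_url(BaseUrl, Query, Params) ->
    BaseUrl ++ "?" ++ uri_string:compose_query([{"q", Query} | Params]).

%% Send the query as a plain HTTP GET and return whatever httpc returns.
%% Assumes a Solr instance is reachable at BaseUrl (hypothetical here).
search(BaseUrl, Query, Params) ->
    {ok, _} = application:ensure_all_started(inets),
    httpc:request(get, {select_url(BaseUrl, Query, Params), []},
                  [{timeout, 5000}], [{body_format, binary}]).
```

Only `select_url/3` is pure; `search/3` needs a running Solr core, so its result depends entirely on the server it talks to.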
 

esolr

 
     In 2008, ppolv (Pablo Polvorin) posted a Solr client module on trapexit.org [URL: http://forum.trapexit.org/viewtopic.php?t=13059] that covers the basic Solr operations:

     |> Add/Update documents esolr:add/2
     |> Delete documents esolr:delete/2
     |> Search esolr:search/3
 
First, let's see how to use the interfaces above:
 
%% Test code

-module(t).

-compile(export_all).


start()->
   SearchUrl="http://192.168.0.160:8080/solr/hear_me/select",
   UpdateUrl="http://192.168.0.160:8080/solr/hear_me/update",
   MltUrl="http://192.168.0.160:8080/solr/hear_me/mlt",
   {ok,Pid}=esolr:start([{select_url, SearchUrl}, {update_url, UpdateUrl}, {morelikethis_url, MltUrl}]),
   Pid.

search(SolrPid)->
  esolr:search("10",[{fields,"*,*"}],SolrPid).


add(SolrPid) ->
   esolr:add([{doc,[{id,"ai234"}, {text,<<"Look me mom!, I'm searching now">>}]}],SolrPid),
   esolr:add([{doc,[{id,"a3456"}, {text,<<"Look me mom!, I'm searching now">>}]}],SolrPid),
   esolr:commit(SolrPid).

The test results are as follows:

Eshell V5.9  (abort with ^G)
1> P=t:start().
<0.34.0>
2> t:add(P).
ok
3> esolr:search("searching",[{fields,"*,*"}],P).
{ok,[{"numFound",2},{"start",0}],
    [{doc,[{"id",<<"ai234">>},
           {"_version_",1440978100186775552}]},
     {doc,[{"id",<<"a3456">>},
           {"_version_",1440978100212989952}]}],
    []}
4> t:search(P).
{ok,[{"numFound",9},{"start",0}],
    [{doc,[{"c_type",1},
           {"c_tags",
            [<<"女人">>,
             <<230,148,190,229,188,131>>,
             <<"家庭">>,
             <<229,165,179,229,143,139>>,
             <<229,165,179,229,173,169,229,173,144>>,
             <<229,176,143,229,173,169,229,173,144>>,
             <<231,166,187,229,169,154>>,
             <<229,135,186,230,137,139>>,
             <<229,133,132,229,188,159>>]},
           {"c_pub_date",<<"2013-07-12T16:29:11.593Z">>},
           {"id",<<"97">>},
           {"_version_",1440342611812417536}]},
     {doc,[{"c_type",1},
           {"c_tags",
            [<<231,189,145,231,187,156>>,
             <<229,165,179,229,143,139>>,
             <<228,187,139,231,187,141>>,
             <<233,171,152,228,184,173>>,
             <<229,144,140,229,173,166>>,
             <<230,156,139,229,143,139>>,
             <<229,140,151,228,186,172>>,
 ..... ...

 

Implementation

  Opening up the code, the following function covers most of the technical points:
 
make_post_request(Request,PendingInfo,
State=#esolr{update_url=URL,pending=P,auto_commit=AC,dirty=Dirty},
Timeout) ->
     {ok,RequestId} = httpc:request(post,{URL,[{"connection", "close"}],"text/xml",Request},[{timeout,Timeout}],[{sync,false}]),
     Pendings = gb_trees:insert(RequestId,PendingInfo,P),
     if
          (AC == always) and Dirty -> 
                      CommitRequest = encode_commit(),
                      {ok,C_RequestId} = httpc:request(post,{URL,[{"connection", "close"}],"text/xml",CommitRequest},
                                                    [{timeout,State#esolr.commit_timeout}],[{sync,false}]),
                      Pendings2 = gb_trees:insert(C_RequestId,{auto,auto_commit},Pendings),
                      error_logger:info_report([{auto_commit,send}]),
                        {noreply,State#esolr{pending=Pendings2,dirty=false}};
         
          true -> {noreply,State#esolr{pending=Pendings}}
     end.
 
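First, inets:start() is called in the init phase. make_post_request issues its HTTP requests through httpc, and after every request the mapping from the RequestId to the caller (the From in {From,_}) is stored in a gb_trees tree. Later, in the handle_info clause, you can see the HTTP response message being received.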
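The bookkeeping described here can be isolated from the HTTP part. Below is a minimal sketch of a pending-request table built on gb_trees; the module name `pending_sketch` and its three functions are hypothetical and only illustrate the insert, lookup and delete cycle, with any term standing in for the request id that httpc would return.

```erlang
-module(pending_sketch).
-export([new/0, track/3, complete/2]).

%% An empty table of in-flight requests.
new() ->
    gb_trees:empty().

%% Remember who is waiting for the request identified by RequestId.
%% gb_trees:insert/3 crashes if the key is already present, which is
%% acceptable here because request ids are unique.
track(RequestId, Info, Pending) ->
    gb_trees:insert(RequestId, Info, Pending).

%% Called when a response arrives: hand back the stored info and
%% drop the entry, or report that the id is unknown (for example
%% because the entry was removed after a timeout).
complete(RequestId, Pending) ->
    case gb_trees:lookup(RequestId, Pending) of
        {value, Info} -> {ok, Info, gb_trees:delete(RequestId, Pending)};
        none -> {unknown, Pending}
    end.
```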
 
% @hidden
handle_info({http,{RequestId,HttpResponse}},State = #esolr{pending=P}) ->
     case gb_trees:lookup(RequestId,P) of
          {value,{Client,RequestOp}} -> handle_http_response(HttpResponse,RequestOp,Client),
                              {noreply,State#esolr{pending=gb_trees:delete(RequestId,P)}};
          none -> {noreply,State}
                    %% the requestid isn't here, probably the request was deleted after a timeout
     end;

parse_search_response(Response,Client) ->
     {value,{"response",{obj,SearchRespFields}},RestResponse} = lists:keytake("response",1, Response),
     {value,{"docs",Docs},RespFields} =  lists:keytake("docs",1,SearchRespFields),
     gen_server:reply(Client,{ok,RespFields,[{doc,DocFields} || {obj,DocFields}<-Docs],RestResponse}).
In parse_search_response, the call to gen_server:reply is what finally answers the original request.
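The pattern behind this is a deferred reply: handle_call returns noreply, keeps the From handle, and some later message triggers gen_server:reply. The sketch below shows that pattern in isolation. It is not esolr code: the module `deferred_reply_sketch` is hypothetical, a spawned helper process stands in for the asynchronous HTTP request, and a map is used instead of gb_trees for brevity.

```erlang
-module(deferred_reply_sketch).
-behaviour(gen_server).
-export([start_link/0, slow_upcase/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Blocks the caller until the server replies, like any gen_server:call.
slow_upcase(Pid, Text) ->
    gen_server:call(Pid, {upcase, Text}).

init([]) ->
    {ok, #{}}.

handle_call({upcase, Text}, From, Pending) ->
    Ref = make_ref(),
    Server = self(),
    %% Stand-in for an asynchronous HTTP request: a helper process
    %% sends the result back to the server as a message.
    spawn(fun() -> Server ! {done, Ref, string:uppercase(Text)} end),
    %% No reply yet; remember who asked.
    {noreply, Pending#{Ref => From}}.

handle_cast(_Msg, Pending) ->
    {noreply, Pending}.

handle_info({done, Ref, Result}, Pending) ->
    case maps:take(Ref, Pending) of
        {From, Rest} ->
            %% Only now does the blocked caller get its answer.
            gen_server:reply(From, {ok, Result}),
            {noreply, Rest};
        error ->
            {noreply, Pending}
    end;
handle_info(_Other, Pending) ->
    {noreply, Pending}.
```

The server stays free to accept further calls while earlier ones are still waiting, which is presumably why esolr combines asynchronous httpc requests with this style of reply.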
 
XML & JSON
 
Since XML has to be handled, the xmerl module is the natural choice; the encode_* family of functions mostly uses it to encode the data, for example:
 
Eshell V5.10.2  (abort with ^G)
1> xmerl:export_simple([{commit,[]}],xmerl_xml).
["<?xml version=\"1.0\"?>",[["<","commit","/>"]]]
2>
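The same call can encode a document for an update request. This is a sketch under assumptions: `encode_sketch:encode_add/1` is a hypothetical helper, not esolr's actual encode function, and it only handles string field names and string values, wrapped in the add/doc/field element layout of Solr's XML update messages.

```erlang
-module(encode_sketch).
-export([encode_add/1]).

%% Fields is a list of {Name, Value} pairs, both plain strings,
%% e.g. [{"id", "ai234"}, {"text", "hello"}].
encode_add(Fields) ->
    %% xmerl "simple form": {Tag, Attributes, Content} or {Tag, Content}.
    FieldElems = [{field, [{name, Name}], [Value]} || {Name, Value} <- Fields],
    Xml = xmerl:export_simple([{add, [{doc, FieldElems}]}], xmerl_xml),
    %% export_simple returns a deep list; flatten it into one string.
    lists:flatten(Xml).
```

Calling `encode_sketch:encode_add([{"id", "ai234"}])` yields a string that can be used as the body of a POST to the update URL.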

 Parsing the HTTP response also uses xmerl_scan and xmerl_xpath:

 
handle_http_response({{_HttpV,200,_Reason},_Headers,Data},Op,Client) ->
     {Response,[]} = xmerl_scan:string(binary_to_list(Data)),
     [Header] = xmerl_xpath:string("/response/lst[@name='responseHeader']",Response),
     case parse_xml_response_header(Header) of
          {ok,QTime} ->  parse_xml_response(Op,Response,QTime,Client);
          {error,Error} ->  response_error(Op,Client,Error)
     end;
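To see the scan and XPath steps on their own, here is a small sketch that pulls QTime out of a response header. The XML in `sample/0` is hand-written for illustration rather than captured from a server, and `xml_header_sketch:qtime/1` is a hypothetical stand-in for the parse_xml_response_header function referenced above, whose body is not shown in the post.

```erlang
-module(xml_header_sketch).
-export([qtime/1, sample/0]).
-include_lib("xmerl/include/xmerl.hrl").

%% A hand-written response shaped like a Solr XML response header.
sample() ->
    "<response>"
    "<lst name=\"responseHeader\">"
    "<int name=\"status\">0</int>"
    "<int name=\"QTime\">3</int>"
    "</lst>"
    "</response>".

%% Parse the document, then select the text node of the QTime element.
qtime(Xml) ->
    {Doc, _Rest} = xmerl_scan:string(Xml),
    Path = "/response/lst[@name='responseHeader']/int[@name='QTime']/text()",
    case xmerl_xpath:string(Path, Doc) of
        [#xmlText{value = Value}] -> {ok, list_to_integer(Value)};
        _ -> {error, no_qtime}
    end.
```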

 Besides XML, JSON also has to be parsed; for this the code uses RFC4627 (the rfc4627 JSON library for Erlang).
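The decoded JSON arrives as nested `{obj, Fields}` terms, which is the shape parse_search_response pattern-matches on. The sketch below replays that extraction on a hand-written term so it runs without the JSON library; the module `search_result_sketch` and the contents of `sample/0` are assumptions for illustration only.

```erlang
-module(search_result_sketch).
-export([docs/1, sample/0]).

%% A hand-written term in the {obj, Fields} shape used for decoded
%% JSON objects; not real server output.
sample() ->
    [{"responseHeader", {obj, [{"status", 0}]}},
     {"response", {obj, [{"numFound", 1},
                         {"start", 0},
                         {"docs", [{obj, [{"id", <<"ai234">>}]}]}]}}].

%% Split the response into search metadata, documents and whatever
%% else was in the top-level object.
docs(Response) ->
    {value, {"response", {obj, Fields}}, Rest} =
        lists:keytake("response", 1, Response),
    {value, {"docs", Docs}, Meta} = lists:keytake("docs", 1, Fields),
    {ok, Meta, [{doc, DocFields} || {obj, DocFields} <- Docs], Rest}.
```

The `{ok, Meta, Docs, Rest}` tuple has the same layout as the search results printed in the shell session earlier.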

 

Extensions

 
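This simple module... well, is it too bare-bones? And doesn't the code look rather old? It was later modified and used in the Zotonic project to implement search; I mentioned this CMS earlier when surveying Erlang web servers and web frameworks [URL: https://github.com/arjan/mod_search_solr]. That project refactored the original code and added many practical interfaces, such as pagination and a wrapper for the "MoreLikeThis" feature. You can get the code from GitHub and try it out; the Zotonic code base is rather large, so compile only the modules you need.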
 
 
OK, that's all for today.
 
 
 
Category: Erlang
Tags: erlang, solr
Posted on 2013-07-23 23:21 by HackerVirus