

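      My setup here is Solr 1.4.0 with SQL Server 2005. Some details may not carry over to other databases (if you get this working on another database, please share), but the same line of thinking should lead to an equivalent solution there.
      DataImportHandler imports data from the database for indexing mainly through JDBC. My understanding of JDBC was shallow: I had always assumed that JDBC reads the entire query result from the database in one go. It had not occurred to me that results are in fact delivered as a stream and can be fetched piece by piece. In other words, the client can take one row, then pull the next row from the stream, and so on until the result is exhausted (thanks to the expert who pointed this out). Since I am using SQL Server 2005, I first tried to find hints in its sqljdbc.jar (my attempt at reading the source did not succeed). From the jar I could roughly tell that SQL Server has a setting for this, so I went to Microsoft's MSDN site and found the answer (URL: http://msdn.microsoft.com/zh-cn/library/ms378663(SQL.90).aspx ):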

      Sets the default cursor type that is used for all result sets that are created by using this SQLServerDataSource object.
        public void setSelectMethod(java.lang.String selectMethod)
        A String value that contains the default cursor type.
      The selectMethod is the default cursor type that is used for a result set. This property is useful when you are dealing with large result sets and do not want to store the whole result set in memory on the client side. By setting the property to "cursor," you can create a server-side cursor that can fetch smaller chunks of data at a time. If the selectMethod property is not set, getSelectMethod returns the default value of "direct".
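
As a rough illustration of the chunked fetching described above, here is a minimal Python sketch. It is only an analogy: it uses Python's built-in sqlite3 module rather than the SQL Server JDBC driver, the in-memory `Test` table and the helper name `iter_rows` are made up for the example, and `chunk_size=3000` is chosen purely for illustration. The point is that the client never holds more than one chunk of rows at a time.

```python
import sqlite3


def iter_rows(conn, sql, chunk_size=3000):
    """Yield rows one at a time while holding at most chunk_size rows in memory."""
    cur = conn.cursor()
    cur.execute(sql)
    while True:
        chunk = cur.fetchmany(chunk_size)  # pull the next chunk from the cursor
        if not chunk:
            break
        for row in chunk:
            yield row


# Hypothetical in-memory table standing in for the real SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (Id INTEGER PRIMARY KEY, Title TEXT)")
conn.executemany(
    "INSERT INTO Test (Id, Title) VALUES (?, ?)",
    [(i, f"title-{i}") for i in range(1, 10001)],
)

count = 0
for row in iter_rows(conn, "SELECT Id, Title FROM Test ORDER BY Id"):
    count += 1  # a real indexer would build and send a document here
print(count)  # prints 10000
```

Calling `fetchall()` instead would materialize all 10000 rows in client memory at once, which is the behavior to avoid with large tables.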



<dataSource name="dsSqlServer" type="JdbcDataSource" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" batchSize="3000" url="jdbc:sqlserver://; DatabaseName=testDatabase;responseBuffering=adaptive;selectMethod=cursor" user="sa" password="12345" />

That's not really true. DataImportHandler streams the result of the database query and adds documents to the index as it goes, so it shouldn't load all of the database data into memory. Disabling autoCommit, warming queries and spellcheckers usually decreases the amount of memory required during the indexing process. Please share your hardware details, JVM options, solrconfig and schema configuration, etc.




      You can _batch_ import your data using the full-import command by providing an additional request parameter (see http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters ), e.g.
  query="SELECT * FROM my_table ORDER BY id LIMIT 1000000 OFFSET ${dataimporter.request.offset}"
  and then calling the full-import command several times:
  1) /dataimport?clean=true&offset=0
  2) /dataimport?clean=false&offset=1000000
  3) /dataimport?clean=false&offset=2000000



<entity name="TestEntity" dataSource="dsSqlServer" pk="Id" query="SELECT Id,Title,Author,Content,Url,AddOn FROM Test WHERE Id &gt;= ${dataimporter.request.offset} AND Id &lt; ${dataimporter.request.offset} + 10000" >

  1) /dataimport?clean=true&offset=0
  2) /dataimport?clean=false&offset=10000
  3) /dataimport?clean=false&offset=20000
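
The sequence of calls can also be generated rather than typed by hand. The following is a minimal sketch, not a finished tool: the helper name `batch_import_urls`, the base URL, and the `max_id` value are assumptions for illustration, and it assumes the offset advances by the same window size that the entity query adds to the offset. It only builds the URLs. As far as I know the import command returns before the import has finished, so a real driver would need to wait for one batch to complete (for example by polling `command=status`) before sending the next.

```python
from urllib.parse import urlencode


def batch_import_urls(base_url, max_id, window):
    """Return one full-import URL per Id window; only the first call cleans the index."""
    urls = []
    for offset in range(0, max_id + 1, window):
        params = {
            "command": "full-import",
            "clean": "true" if offset == 0 else "false",
            "offset": offset,
        }
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls


# Hypothetical values: adjust the base URL and the largest Id to your setup.
urls = batch_import_urls("http://localhost:8983/solr/dataimport", 25000, 10000)
for url in urls:
    print(url)
```

Only the first request passes `clean=true`, so later batches add to the index instead of wiping what earlier batches imported.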

Posted on 2012-08-02 20:54 by 刺猬的温驯