刘超觉先

February 12, 2010

Notes for Advanced Linux Programming - 3. Processes

Summary: 3. Processes Each process is identified by its unique process ID. Every process has a parent process. Processes are arranged in a tree, with the init process at its root. A program can obtain its own process ID with getpid() and the process ID of its parent with getppid(). #incl...

posted @ 2010-02-12 10:48 刘超觉先 Views(911) Comments(0) Recommendations(0)

February 11, 2010

Notes for Advanced Linux Programming - 1. Getting Started

Summary: 1. Getting Started 1.1. Compiling with GCC 1.1.1. Create the source code files (main.c) C source file: main.c #include <stdio.h> #include <stdlib.h> #include "reciprocal.hpp" int main (int argc, char **argv) { int i; i = atoi (argv[1]); printf ("The reciprocal of %d is %g\n", i, reciprocal (i)); return 0; } (rec..

posted @ 2010-02-11 11:52 刘超觉先 Views(973) Comments(0) Recommendations(0)

Notes for Advanced Linux Programming - 2. Writing Good GNU/Linux Software

Summary: 2. Writing Good GNU/Linux Software 2.1. Interaction With the Execution Environment 2.1.1. Command Line When a program is invoked from the shell, the argument list contains both the name of the program and any command-line arguments provided. % ls -s / The argument list has three element..

posted @ 2010-02-11 11:52 刘超觉先 Views(715) Comments(0) Recommendations(0)

February 8, 2010

Questions about Lucene (4): Four Ways to Influence How Lucene Scores Documents

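Summary: Set the Document Boost and Field Boost at indexing time; they are stored in the (.nrm) file. If you want certain documents and certain fields to matter more than others, so that a document or field containing the query term scores higher, you can set the document's and the field's boost values at indexing time. These values are written into the index files during indexing and stored in the normalization factor (.nrm) file; once set, they cannot be changed unless the document is deleted. If they are not set, Document Boost and Field Boost default to 1. Document Boost and Field Boost are set as follows: Document doc = new Document(); Field f = n...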

posted @ 2010-02-08 23:44 刘超觉先 Views(5422) Comments(2) Recommendations(0)

February 6, 2010

Questions about Lucene (3): The Vector Space Model and Lucene's Scoring Mechanism

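Summary: Question: Your article says: we then treat the weights (term weight) of all the terms in this document as a vector. Document = {term1, term2, ..., term N} Document Vector = {weight1, weight2, ..., weight N} Likewise, we treat the query as a simple document and also represent it as a vector. Query = {term1, term 2, ..., term N} Query Vector = {weight1, weight2, ..., weight N} We then treat the weights of all the terms in this document (term weight...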

posted @ 2010-02-06 13:05 刘超觉先 Views(5214) Comments(0) Recommendations(1)

Questions about Lucene (2): Stemming and Lemmatization

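Summary: Question: I tried out the stemming and lemmatization mentioned in the article. Reducing a word to its root form, such as "cars" to "car", is called stemming. Transforming a word into its root form, such as "drove" to "drive", is called lemmatization. The experiment did not succeed; the code is as follows: public class TestNorms { public void createIndex() throws IOException { Directory d = new SimpleFSDirectory(new File("d:/falconTest/lucene3/...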

posted @ 2010-02-06 13:04 刘超觉先 Views(6053) Comments(1) Recommendations(0)

February 3, 2010

Algorithms, Part 1: Age-Old Problems

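Summary: Search uses the following algorithms. Enumeration: list every state of the problem to find the ones that satisfy it. It suits simple problems with few states. Breadth-first search: starting from the initial node, expand the first layer of nodes according to the rules and check whether the target node is among them; if not, expand each first-layer node in turn to obtain the second layer, and keep expanding until the target node is found. It suits problems that ask for the fewest steps or the shortest solution sequence. Typically a queue is set up: put the start node in the queue, then take a node from the head of the queue and check whether it is the target; if not, expand it and append all the expanded nodes to the tail, then take the next node from the head, until the target node is found. Depth-first search: typically a stack sta...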

posted @ 2010-02-03 00:31 刘超觉先 Views(3086) Comments(0) Recommendations(1)

February 2, 2010

Lucene Study Notes, Part 4: Analysis of the Lucene Indexing Process (4)

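Summary: 6. Closing the IndexWriter object. Code: writer.close(); --> IndexWriter.closeInternal(boolean) --> (1) Write the index information from memory to disk: flush(waitForMerges, true, true); --> (2) Merge segments: mergeScheduler.merge(this); Segment merging will be discussed in a later chapter; here we discuss only the process of writing the index information to disk. Code: IndexWriter.flush(boolean triggerMerge, boolean flushDocStores, boole...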

posted @ 2010-02-02 02:02 刘超觉先 Views(6270) Comments(5) Recommendations(3)

Lucene Study Notes, Part 4: Analysis of the Lucene Indexing Process (3)

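Summary: 5. How DocumentsWriter manages the CharBlockPool, ByteBlockPool, and IntBlockPool caches. During indexing, DocumentsWriter stores term information in the CharBlockPool, and stores the document number (doc ID), term frequency (freq), and position (prox) information in the ByteBlockPool. In the ByteBlockPool, the cache is allocated in slices. Slices are organized in levels: the higher the level, the larger its slices, and all slices on the same level have the same size. nextLevelArray gives the level that follows the current one; the level after level 9 is still level 9, which means level 9 is the highest. le...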

posted @ 2010-02-02 02:01 刘超觉先 Views(6476) Comments(1) Recommendations(2)

Lucene Study Notes, Part 4: Analysis of the Lucene Indexing Process (2)

Summary: 3. Adding a document to the IndexWriter. Code: writer.addDocument(doc); -->IndexWriter.addDocument(Document doc, Analyzer analyzer) -->doFlush = docWriter.addDocument(doc, analyzer); --> DocumentsWriter.updateDocument(Document, Analyzer, Term) Note: --> denotes one level of function call. IndexWriter then calls DocumentsWriter.addDocument, which in turn calls Docume

posted @ 2010-02-02 01:59 刘超觉先 Views(10869) Comments(1) Recommendations(2)
