蝉音

Calling NLPIR (ICTCLAS2013) from Python for Chinese word segmentation

Environment: Windows 7, VS2008, Python 2.7.3
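A compiled extension only loads into an interpreter whose version and bitness match the build. The official Python 2.7 Windows builds are compiled with VS2008, which is why that compiler appears above. The sketch below checks this before importing the extension. The function name `check_interpreter` and the expected values (32-bit Python 2.7) are assumptions for illustration; set them to whatever your own Step 1 build targets. It runs on both Python 2.7 and Python 3.

```python
import platform
import sys

# ASSUMPTION: the NLPIR extension from Step 1 was built for 32-bit Python 2.7.
# Change these to match your own build.
EXPECTED_VERSION = (2, 7)
EXPECTED_BITS = '32bit'


def check_interpreter(expected_version=EXPECTED_VERSION, expected_bits=EXPECTED_BITS):
    """Return a list of human-readable mismatches; an empty list means OK."""
    problems = []
    actual_version = tuple(sys.version_info[:2])
    if actual_version != tuple(expected_version):
        problems.append('Python %d.%d found, extension built for %d.%d'
                        % (actual_version + tuple(expected_version)))
    # platform.architecture()[0] is e.g. '32bit' or '64bit'
    actual_bits = platform.architecture()[0]
    if actual_bits != expected_bits:
        problems.append('%s interpreter, extension built for %s'
                        % (actual_bits, expected_bits))
    return problems
```

A mismatch in bitness typically shows up as an `ImportError` ("DLL load failed") when the extension is imported, so running this check first gives a clearer message.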

Step 1: Following reference [2], wrap the NLPIR library as a Python extension.

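Step 2: Create a directory named "nlpir_demo" and copy the "nlpirpy_ext" folder produced at the end of Step 1 into ".../nlpir_demo/". Under Python 2.7 this folder must contain an __init__.py file (an empty one is enough) so that it can be imported as a package.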
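The copy in Step 2 can also be scripted. The following is a minimal sketch; the function name `setup_demo_dir` is hypothetical, and it assumes you pass it the path of the "nlpirpy_ext" folder from Step 1. Besides copying, it creates an empty `__init__.py` if the folder lacks one, since Python 2.7 does not treat a directory without it as a package.

```python
import os
import shutil


def setup_demo_dir(ext_src, demo_dir):
    """Copy the 'nlpirpy_ext' folder into demo_dir and make it a package.

    ext_src  -- path of the 'nlpirpy_ext' folder produced in Step 1
    demo_dir -- path of the 'nlpir_demo' directory (created if missing)
    Returns the path of the copied package folder.
    """
    if not os.path.isdir(demo_dir):
        os.makedirs(demo_dir)
    target = os.path.join(demo_dir, 'nlpirpy_ext')
    # copytree raises an error if target already exists, so a second run
    # fails loudly instead of silently mixing old and new files.
    shutil.copytree(ext_src, target)
    # Python 2.7 needs __init__.py to import nlpirpy_ext as a package.
    init_py = os.path.join(target, '__init__.py')
    if not os.path.exists(init_py):
        open(init_py, 'w').close()
    return target
```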

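Step 3: Building on the "seg.py" given at the end of reference [2], create a file named "C_NLPIR_ICTCLAS2013.py" in ".../nlpir_demo/nlpirpy_ext/" with the following content. Its purpose is to wrap NLPIR further into a Python class: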

```python
# -*- encoding: utf-8 -*-
import os

import NLPIR  # SWIG wrapper of the NLPIR library, built in Step 1


class C_NLPIR_ICTCLAS2013:
    def __init__(self, s_code='GBK'):
        # Data path handed to NLPIR_Init: the folder holding this file.
        # abspath() avoids an empty dirname when this file is run directly.
        dataurl = os.path.dirname(os.path.abspath(__file__))
        # Encoding names accepted by this class -> NLPIR encoding constants
        codes = {
            'GBK': NLPIR.GBK_CODE,
            'UTF-8': NLPIR.UTF8_CODE,
            'BIG5': NLPIR.BIG5_CODE,
            'GBK_FANTI': NLPIR.GBK_FANTI_CODE,
        }
        isinit = 0  # an unsupported s_code is reported as a failed init
        if s_code in codes:
            isinit = NLPIR.NLPIR_Init(dataurl, codes[s_code])
        if isinit:
            print 'NLPIR initialized successfully'
        else:
            print 'NLPIR initialization failed'

    def stringSeg(self, s_string, i_bPOStagged=0):
        """
        Function: Process one string.
        Parameters: @s_string - the string to be analyzed;
                    @i_bPOStagged - whether to add POS tags: 0 for no tagging, 1 for tagging; default 0.
        Return value: the result buffer.
        """
        return NLPIR.NLPIR_ParagraphProcess(s_string, i_bPOStagged)

    def fileSeg(self, s_sourceFile, s_targetFile, i_bPOStagged=0):
        """
        Function: Process one text file and save the result into another file.
        Parameters: @s_sourceFile - name of the source file to be analyzed;
                    @s_targetFile - name of the file that stores the result;
                    @i_bPOStagged - whether to add POS tags: 0 for no tagging, 1 for tagging; default 0.
        Return value: the processing speed if processing succeeded, otherwise false.
        """
        return NLPIR.NLPIR_FileProcess(s_sourceFile, s_targetFile, i_bPOStagged)

    def importUserDict(self, s_userDictFile):
        """
        Function: Import a user-defined dictionary from a text file.
        Parameters: @s_userDictFile - name of the text file holding the user dictionary.
        Return value: the number of lexical entries imported successfully.
        ???: What format does the user dictionary file use?
        """
        return NLPIR.NLPIR_ImportUserDict(s_userDictFile)

    def addUserWord(self, s_word):
        """
        Function: Add a word to the user dictionary.
        Parameters: @s_word - the word to add.
        Return value: 1 if the word was added, otherwise 0.
        """
        return NLPIR.NLPIR_AddUserWord(s_word)

    def saveTheUserDict(self):
        """
        Function: Save the user dictionary to disk.
        Parameters: none.
        Return value: 1 if saved successfully, otherwise 0.
        ???: Where on disk is the file saved?
        """
        return NLPIR.NLPIR_SaveTheUsrDic()

    def delUserWord(self, s_word):
        """
        Function: Delete a word from the user dictionary.
        Parameters: @s_word - the word to delete.
        Return value: -1 if the word does not exist in the user dictionary, otherwise the handle of the deleted word.
        """
        return NLPIR.NLPIR_DelUsrWord(s_word)

    def exit(self):
        """
        Function: Shut down NLPIR and release its resources.
        Return value: true if succeeded, otherwise false.
        """
        return NLPIR.NLPIR_Exit()


if __name__ == '__main__':
    O_C_NLPIR_ICTCLAS2013 = C_NLPIR_ICTCLAS2013('UTF-8')
    raw_input('\n~!')  # keep the console window open
    O_C_NLPIR_ICTCLAS2013.exit()
```
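The importUserDict docstring leaves the dictionary file format as an open question. One plausible layout is one entry per line, the word optionally followed by a space and a POS tag; treat that as an unverified assumption and check it against the NLPIR manual. Under that assumption, a user dictionary could be written with a helper like the hypothetical `write_user_dict` below, whose file encoding should match the code passed to the class (for example 'utf-8' for 'UTF-8', 'gbk' for 'GBK'). It runs on Python 2.7 and 3; under Python 2.7 the words and tags must be unicode strings.

```python
import io


def write_user_dict(path, entries, encoding='utf-8'):
    """Write (word, pos) pairs to path, one 'word pos' line per entry.

    ASSUMED format: pos may be None, in which case only the word is written.
    Returns the number of lines written.
    """
    count = 0
    with io.open(path, 'w', encoding=encoding) as f:
        for word, pos in entries:
            line = word if not pos else word + u' ' + pos
            f.write(line + u'\n')
            count += 1
    return count
```

The file could then be loaded with, for example, `O_C_NLPIR_ICTCLAS2013.importUserDict('userdict.txt')` (the file name is illustrative), whose return value reports how many entries were imported; if it reports zero, the assumed format is probably wrong.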

 

Step 4: In ".../nlpir_demo/", create a file named "NLPIR_demo.py" with the following content, which tries out the class C_NLPIR_ICTCLAS2013 defined in ".../nlpir_demo/nlpirpy_ext/C_NLPIR_ICTCLAS2013.py":

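```python
# -*- encoding: utf-8 -*-
from nlpirpy_ext.C_NLPIR_ICTCLAS2013 import C_NLPIR_ICTCLAS2013

if __name__ == '__main__':
    # Initialize NLPIR as UTF-8: this file is UTF-8, so the byte string
    # s_test below holds UTF-8 encoded text under Python 2.
    o_C_NLPIR_ICTCLAS2013 = C_NLPIR_ICTCLAS2013('UTF-8')
    raw_input('\n~!')  # pause to read the initialization message

    s_test = '1989年春夏之交的政治风波1989年政治风波24小时降雪量24小时降雨量863计划ABC防护训练APEC会议BB机BP机C2系统C3I系统C3系统C4ISR系统C4I系统CCITT建议'
    result = o_C_NLPIR_ICTCLAS2013.stringSeg(s_test)

    raw_input(result)  # show the segmented text and wait for Enter
    o_C_NLPIR_ICTCLAS2013.exit()
```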
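Because the demo runs NLPIR in UTF-8 mode, `result` is a UTF-8 byte string, while a Chinese-language Windows console usually uses code page 936 (GBK), so the output can appear garbled. A minimal sketch of a fix, using a hypothetical helper `utf8_to_console`, re-encodes the bytes for the console; falling back to GBK when the console encoding is unknown is an assumption.

```python
import sys


def utf8_to_console(raw, console_encoding=None):
    """Re-encode UTF-8 bytes for the console; unshowable chars become '?'."""
    # sys.stdout.encoding is e.g. 'cp936' on a Chinese Windows console and
    # may be None when output is redirected; GBK is an assumed fallback.
    enc = console_encoding or getattr(sys.stdout, 'encoding', None) or 'gbk'
    return raw.decode('utf-8').encode(enc, 'replace')
```

Under Python 2.7 the last lines of the demo could then use `print utf8_to_console(result)` instead of passing `result` straight to `raw_input`.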

 

Step 5: Run ".../nlpir_demo/NLPIR_demo.py", and that's it!
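The string returned by stringSeg can be post-processed in Python. The sketch below assumes the segmented words are separated by whitespace and, when i_bPOStagged is 1, each token has the form word/tag; confirm this against your own output. The name `parse_seg_result` and the tags in the test data are illustrative only. Splitting a UTF-8 byte string on whitespace is safe under Python 2, because UTF-8 multi-byte sequences never contain ASCII bytes.

```python
def parse_seg_result(result, tagged=False):
    """Split a segmentation result into a list of (word, tag) tuples.

    ASSUMPTION: tokens are whitespace-separated; with tagging on, each token
    is 'word/tag'. tag is None for untagged output.
    """
    pairs = []
    for token in result.split():
        # Split on the last '/', but never treat a leading '/' as the
        # separator, so a bare '/' token or the word '/' itself survives.
        if tagged and '/' in token[1:]:
            word, tag = token.rsplit('/', 1)
        else:
            word, tag = token, None
        pairs.append((word, tag))
    return pairs
```

For example, `parse_seg_result(o_C_NLPIR_ICTCLAS2013.stringSeg(s_test, 1), tagged=True)` would turn the tagged demo output into a list of pairs, provided the assumed format holds.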

Note: for SWIG, which reference [2] mentions, see reference [1], which points to two further articles on it.

References:

[1] SWIG usage examples in Python and Ruby (Python、Ruby中的SWIG使用案例), http://www.cnblogs.com/chanyin/p/3340780.html

[2] NLPIR (ICTCLAS2013), Python edition (NLPIR(ICTCLAS2013) Python版), http://www.nilday.com/nlpirictclas2013-python%E7%89%88/

Posted on 2013-09-26 20:25 by 蝉音