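Calling NLPIR - ICTCLAS2013 from Python for Chinese word segmentation
Environment: Windows 7, VS2008, Python 2.7.3
Step 1: Follow document [2] to wrap the NLPIR library as a Python extension.
Step 2: Create a new directory named "nlpir_demo", and copy the folder named "nlpirpy_ext" produced at the end of step 1 into ".../nlpir_demo/".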
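The copy in step 2 can also be scripted. The sketch below is only an illustration: the function name prepare_demo_dir and both path arguments are hypothetical, and it assumes the folder from step 1 is a plain directory. It also creates an empty "__init__.py" if the folder does not already contain one, because Python 2 only treats a directory as an importable package when that file exists, and step 4 imports "nlpirpy_ext" as a package.

```python
import os
import shutil


def prepare_demo_dir(ext_src, demo_root):
    """Copy the built extension folder into demo_root and make it importable.

    ext_src   -- hypothetical path to the "nlpirpy_ext" folder from step 1
    demo_root -- hypothetical path to the new "nlpir_demo" directory
    """
    if not os.path.isdir(demo_root):
        os.makedirs(demo_root)

    target = os.path.join(demo_root, "nlpirpy_ext")
    if not os.path.exists(target):
        shutil.copytree(ext_src, target)

    # Python 2 only imports a directory as a package if it has __init__.py.
    init_file = os.path.join(target, "__init__.py")
    if not os.path.exists(init_file):
        open(init_file, "a").close()

    return target
```

Calling the function a second time leaves an existing copy untouched, so it is safe to rerun.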
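Step 3: Starting from the "seg.py" provided at the end of document [2], create a new file named "C_NLPIR_ICTCLAS2013.py" in ".../nlpir_demo/nlpirpy_ext/" with the following content. Its purpose is to wrap NLPIR one level further, as a Python class: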
# -*- coding: utf-8 -*-
import os

import NLPIR


class C_NLPIR_ICTCLAS2013(object):
    def __init__(self, s_code='GBK'):
        # Pass this module's directory to NLPIR_Init as the data path.
        dataurl = os.path.dirname(os.path.abspath(__file__))
        isinit = 0
        if s_code == 'GBK':
            isinit = NLPIR.NLPIR_Init(dataurl, NLPIR.GBK_CODE)
        elif s_code == 'UTF-8':
            isinit = NLPIR.NLPIR_Init(dataurl, NLPIR.UTF8_CODE)
        elif s_code == 'BIG5':
            isinit = NLPIR.NLPIR_Init(dataurl, NLPIR.BIG5_CODE)
        elif s_code == 'GBK_FANTI':
            isinit = NLPIR.NLPIR_Init(dataurl, NLPIR.GBK_FANTI_CODE)
        # Any other s_code leaves isinit at 0 and is reported as a failure.
        if isinit:
            print 'NLPIR initialized successfully'
        else:
            print 'NLPIR initialization failed'

    def stringSeg(self, s_string, i_bPOStagged=0):
        """
        Function: Process one string.
        Parameters: @s_string - the string to be analyzed;
                    @i_bPOStagged - whether POS tagging is needed: 0 for no tags, 1 for tags; default: 0.
        Return Value: the result buffer holding the segmented text.
        """
        return NLPIR.NLPIR_ParagraphProcess(s_string, i_bPOStagged)

    def fileSeg(self, s_sourceFile, s_targetFile, i_bPOStagged=0):
        """
        Function: Process one text file and save the result into another file.
        Parameters: @s_sourceFile - the name of the source file to be analyzed;
                    @s_targetFile - the name of the file that stores the results;
                    @i_bPOStagged - whether POS tagging is needed: 0 for no tags, 1 for tags; default: 0.
        Return Value: the processing speed if processing succeeds, otherwise false.
        """
        return NLPIR.NLPIR_FileProcess(s_sourceFile, s_targetFile, i_bPOStagged)

    def importUserDict(self, s_userDictFile):
        """
        Function: Import a user-defined dictionary from a text file.
        Parameters: @s_userDictFile - the name of the text file holding the user dictionary.
        Return Value: the number of lexical entries imported successfully.
        Open question: what format should the user dictionary file use?
        """
        return NLPIR.NLPIR_ImportUserDict(s_userDictFile)

    def addUserWord(self, s_word):
        '''
        Function: Add a word to the user dictionary.
        Parameters: @s_word - the word to add.
        Return Value: 1 if the word is added successfully, otherwise 0.
        '''
        return NLPIR.NLPIR_AddUserWord(s_word)

    def saveTheUserDict(self):
        '''
        Function: Save the user dictionary to disk.
        Parameters: none.
        Return Value: 1 if the save succeeds, otherwise 0.
        Open question: where on disk is the dictionary saved?
        '''
        return NLPIR.NLPIR_SaveTheUsrDic()

    def delUserWord(self, s_word):
        '''
        Function: Delete a word from the user dictionary.
        Parameters: @s_word - the word to be deleted.
        Return Value: -1 if the word does not exist in the user dictionary, otherwise the handle of the deleted word.
        '''
        return NLPIR.NLPIR_DelUsrWord(s_word)

    def exit(self):
        '''
        Return Value: true if the shutdown succeeds, otherwise false.
        '''
        return NLPIR.NLPIR_Exit()


if __name__ == '__main__':
    O_C_NLPIR_ICTCLAS2013 = C_NLPIR_ICTCLAS2013('UTF-8')
    raw_input('\n~!')  # pause so the initialization message can be read
Step 4: In ".../nlpir_demo/", create a new file named "NLPIR_demo.py" with the following content, which tries out the class C_NLPIR_ICTCLAS2013 defined in ".../nlpir_demo/nlpirpy_ext/C_NLPIR_ICTCLAS2013.py":
# -*- coding: utf-8 -*-
from nlpirpy_ext.C_NLPIR_ICTCLAS2013 import C_NLPIR_ICTCLAS2013

if __name__ == '__main__':
    o_C_NLPIR_ICTCLAS2013 = C_NLPIR_ICTCLAS2013('UTF-8')
    raw_input('\n~!')  # pause so the initialization message can be read

    s_test = '1989年春夏之交的政治风波1989年政治风波24小时降雪量24小时降雨量863计划ABC防护训练APEC会议BB机BP机C2系统C3I系统C3系统C4ISR系统C4I系统CCITT建议'
    result = o_C_NLPIR_ICTCLAS2013.stringSeg(s_test)

    raw_input(result)  # show the segmented text and wait for Enter
    o_C_NLPIR_ICTCLAS2013.exit()  # release NLPIR when done
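The wrapper class exposes exit() to shut NLPIR down. One way to make sure it is always called, even if segmentation raises an error, might be a small context manager. This is only a sketch: nlpir_session is a hypothetical helper, and FakeSegmenter is a stand-in with the same method names as C_NLPIR_ICTCLAS2013 so that the example runs without the NLPIR extension installed.

```python
from contextlib import contextmanager


@contextmanager
def nlpir_session(segmenter_factory, s_code='UTF-8'):
    """Create a segmenter, hand it to the caller, and always call exit()."""
    segmenter = segmenter_factory(s_code)
    try:
        yield segmenter
    finally:
        # Runs even if the with-body raises, so shutdown is never skipped.
        segmenter.exit()


class FakeSegmenter(object):
    """Stand-in for C_NLPIR_ICTCLAS2013 (assumption: same method names)."""

    def __init__(self, s_code='GBK'):
        self.s_code = s_code
        self.closed = False

    def stringSeg(self, s_string, i_bPOStagged=0):
        # Placeholder only: normalizes whitespace, no real segmentation.
        return ' '.join(s_string.split())

    def exit(self):
        self.closed = True
        return True


with nlpir_session(FakeSegmenter, 'UTF-8') as seg:
    result = seg.stringSeg('a   b')
```

With the real class, C_NLPIR_ICTCLAS2013 would be passed in place of FakeSegmenter.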
Step 5: Run ".../nlpir_demo/NLPIR_demo.py", and that's it~!
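The string returned by stringSeg usually needs a little post-processing before it is useful. The sketch below assumes the result is whitespace-separated, and that with POS tagging switched on each token has the form "word/tag"; neither detail is stated in this post, so check it against the actual output. The name parse_segmented and the sample strings are hypothetical.

```python
# -*- coding: utf-8 -*-


def parse_segmented(result, pos_tagged=False):
    """Split a segmentation result into tokens.

    Assumes tokens are separated by whitespace and, when pos_tagged is
    true, that each token looks like "word/tag".
    """
    tokens = result.split()
    if not pos_tagged:
        return tokens

    pairs = []
    for token in tokens:
        # Split on the last "/" so a word that is itself "/" survives.
        word, sep, tag = token.rpartition('/')
        if sep:
            pairs.append((word, tag))
        else:
            # No tag found: keep the token and mark the tag as missing.
            pairs.append((token, None))
    return pairs
```

For example, parse_segmented('APEC/x 会议/n', pos_tagged=True) gives [('APEC', 'x'), ('会议', 'n')]; the tags in this sample are made up for illustration.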
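Note: for the SWIG tool mentioned in document [2], see document [1], which links to two further articles~!
References:
[1] "Python、Ruby中的SWIG使用案例" (SWIG usage examples in Python and Ruby), http://www.cnblogs.com/chanyin/p/3340780.html
[2] "NLPIR(ICTCLAS2013) Python版" (NLPIR (ICTCLAS2013), Python version), http://www.nilday.com/nlpirictclas2013-python%E7%89%88/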