jieba—parallel

jieba 并行处理进行测试，注意：并行分词仅支持默认分词器 jieba.dt 和 jieba.posseg.dt

import sys
import time
import jieba

jieba.enable_parallel()

#url = sys.argv[1]
content = open("/ssd/ailab-dataset/THUCNewsSubset/cnews.train.txt","rb").read()
t1 = time.time()
words = "/ ".join(jieba.cut(content))

t2 = time.time()
tm_cost = t2-t1

log_f = open("1.log","wb")
log_f.write(words.encode('utf-8'))

print('speed %s bytes/second' % (len(content)/tm_cost))

测试结果：

#把jieba.enable_parallel()注释掉了
[root@n6 jieba-parallel-test]# python test.py      
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.289 seconds.
Prefix dict has been built succesfully.
speed 259919.622884 bytes/second

#加上了jieba.enable_parallel()
[root@n6 jieba-parallel-test]# vi test.py
[root@n6 jieba-parallel-test]# vi test.py
[root@n6 jieba-parallel-test]# python test.py
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.263 seconds.
Prefix dict has been built succesfully.
speed 2215307.40079 bytes/second

加了并行，快很多哟！！！

posted on 2018-09-12 11:30 笨拙的忍者阅读(696) 评论(0) 收藏举报