November 19, 2018

Summary: def frequncy(data,n): import numpy as np import jieba.posseg as pog text = '' for i in np.arange(n): text += str(data.ix[i, 'comment']) stop_property Read more
posted @ 2018-11-19 17:29 happygril3 Views (373) Comments (0) Recommended (0)
Summary: import gensim from gensim.models import word2vec import logging import jieba import os import numpy as np def cut_txt(old_file): import jieba global cut_fil Read more
posted @ 2018-11-19 11:48 happygril3 Views (293) Comments (0) Recommended (0)
Summary: import jieba from jieba import analyse import numpy import gensim import codecs import pandas as pd import jieba.posseg as pog from gensim.models import Word Read more
posted @ 2018-11-19 11:36 happygril3 Views (602) Comments (0) Recommended (0)
Summary: import math from math import isnan import pandas as pd # jieba word segmentation; after cutting, the tokens carry a separator def jieba_function(sent): import jieba sent1 = jieba.cut(sent) s = [] for each in s Read more
posted @ 2018-11-19 10:48 happygril3 Views (498) Comments (0) Recommended (0)
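Summary: # extract keywords # vectorize the keywords # compute similarity from jieba import analyse import numpy import gensim # given an arbitrary string, get the positions of a character in the string and its total number of occurrences def get_char_pos(string,char): chPos=[] try: chPo Read more
posted @ 2018-11-19 10:32 happygril3 Views (4145) Comments (0) Recommended (0)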
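The body of get_char_pos is cut off in the summary above, so the author's actual implementation is not shown here. As a minimal sketch of what the comment describes (finding every position of a character in a string plus the total count), something like the following would work. The return shape, a (positions, count) tuple, is an assumption made for illustration and may differ from the original post.

```python
def get_char_pos(string, char):
    """Return (positions, count) for every occurrence of `char` in `string`.

    Assumption: the result is a tuple of the index list and its length;
    the original post's return value is not visible in the summary.
    """
    # Collect the index of each character that matches `char`.
    positions = [i for i, ch in enumerate(string) if ch == char]
    return positions, len(positions)


positions, total = get_char_pos("banana", "a")
print(positions, total)  # [1, 3, 5] 3
```

A character that never occurs simply yields an empty list and a count of 0, so no try/except is needed in this version.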
Summary: from gensim.models import Word2Vec from gensim.models.word2vec import LineSentence # convert the raw training corpus into a sentence iterator; each iteration returns a sentence as a list of words (UTF-8) def vcto Read more
posted @ 2018-11-19 10:30 happygril3 Views (356) Comments (0) Recommended (0)
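The comment in the summary above describes turning a raw corpus into a sentence iterator in which each iteration returns a list of words. To illustrate that idea without depending on gensim, here is a rough pure-Python sketch. SimpleLineSentence and demo are hypothetical names introduced for this example; the class is a simplified stand-in, not gensim's LineSentence, and it assumes one sentence per line with tokens already separated by whitespace (for Chinese text, after segmentation).

```python
import os
import tempfile


class SimpleLineSentence:
    """Hypothetical stand-in: iterate a corpus file, one sentence per line.

    Assumes tokens on each line are already separated by whitespace.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Re-open the file on every pass so the corpus can be iterated
        # more than once (training usually needs several passes).
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                words = line.split()
                if words:  # skip blank lines
                    yield words


def demo():
    """Write a tiny pre-segmented corpus to a temp file and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "corpus.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("我 喜欢 自然 语言 处理\n\n词 向量 很 有用\n")
        return list(SimpleLineSentence(path))


for sentence in demo():
    print(sentence)
```

Using a class with __iter__ rather than a one-shot generator is a deliberate choice here: the object can be iterated repeatedly, and the file is streamed line by line instead of being loaded into memory.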
