Post archive for November 19, 2018: Word frequency statistics ... - happygril3

Word frequency statistics

Excerpt:

    def frequncy(data,n):
        import numpy as np
        import jieba.posseg as pog
        text = ''
        for i in np.arange(n):
            text += str(data.ix[i, 'comment'])
        stop_property …

posted @ 2018-11-19 17:29 happygril3 Views (374) Comments (0) Recommendations (0)
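The excerpt above counts word frequencies over the first `n` rows of a `comment` column, apparently segmenting with `jieba.posseg` and filtering by part of speech (`stop_property`). A minimal stdlib-only sketch of the counting step follows. It assumes the caller supplies a `tokenize` function (whitespace splitting by default, standing in for jieba) and swaps the part-of-speech filter for a plain stop-word set; the name `word_frequency` is hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List


def word_frequency(comments: Iterable[str],
                   n: int,
                   tokenize: Callable[[str], List[str]] = str.split,
                   stop_words: frozenset = frozenset()) -> Counter:
    """Count word frequencies over the first n comments.

    `tokenize` defaults to whitespace splitting; Chinese text would need a
    segmenter here instead (e.g. a wrapper around jieba, not shown).
    """
    counts: Counter = Counter()
    for i, comment in enumerate(comments):
        if i >= n:  # only the first n comments, like the loop over np.arange(n)
            break
        for word in tokenize(str(comment)):
            if word not in stop_words:  # stand-in for the POS-based filter
                counts[word] += 1
    return counts


comments = ["good phone good price", "bad battery", "good screen"]
freq = word_frequency(comments, n=2, stop_words=frozenset({"price"}))
print(freq.most_common(2))  # [('good', 2), ('phone', 1)]
```

Note that `DataFrame.ix`, used in the excerpt, was removed in pandas 1.0; `.loc` (label-based) or `.iloc` (position-based) is the current replacement.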

wordvec_word similarity

Excerpt:

    import gensim
    from gensim.models import word2vec
    import logging
    import jieba
    import os
    import numpy as np

    def cut_txt(old_file):
        import jieba
        global cut_fil …

posted @ 2018-11-19 11:48 happygril3 Views (296) Comments (0) Recommendations (0)
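Word similarity in word2vec is usually measured as the cosine of the angle between two word vectors; with a trained gensim model, `model.wv.similarity(w1, w2)` returns this value. The sketch below computes the same quantity by hand. The vectors are hypothetical three-dimensional values; real word2vec vectors come from training and are larger (gensim's default is 100 dimensions).

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical embeddings standing in for trained word2vec vectors.
embeddings: Dict[str, Vector] = {
    "king": [0.9, 0.1, 0.3],
    "queen": [0.85, 0.15, 0.35],
    "apple": [0.1, 0.9, 0.0],
}

print(round(cosine_similarity(embeddings["king"], embeddings["queen"]), 3))
print(round(cosine_similarity(embeddings["king"], embeddings["apple"]), 3))
```

Cosine similarity ignores vector length, so it compares the direction of two words in the embedding space rather than their magnitude.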

Wordvec_sentence similarity

Excerpt:

    import jieba
    from jieba import analyse
    import numpy
    import gensim
    import codecs
    import pandas as pd
    import jieba.posseg as pog
    from gensim.models import Word …

posted @ 2018-11-19 11:36 happygril3 Views (602) Comments (0) Recommendations (0)
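One common way to turn word vectors into a sentence-level score, assumed here rather than taken from the truncated post, is to average the vectors of each sentence's words and compare the averages with cosine similarity. The sketch assumes sentences are already segmented into word lists (the original imports jieba for that), skips out-of-vocabulary words, and uses made-up two-dimensional embeddings; `average_vector` and `sentence_similarity` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def average_vector(words: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the vectors of in-vocabulary words; all zeros if none are known."""
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_similarity(s1: Sequence[str], s2: Sequence[str],
                        embeddings: Dict[str, Vector], dim: int) -> float:
    """Cosine similarity of the averaged word vectors of two segmented sentences."""
    return cosine_similarity(average_vector(s1, embeddings, dim),
                             average_vector(s2, embeddings, dim))


# Hypothetical embeddings; in practice these come from a trained word2vec model.
embeddings: Dict[str, Vector] = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "sat": [0.0, 1.0],
    "ran": [0.1, 0.9],
    "stock": [-1.0, 0.2],
}

print(sentence_similarity(["cat", "sat"], ["dog", "ran"], embeddings, dim=2))
print(sentence_similarity(["cat", "sat"], ["stock"], embeddings, dim=2))
```

Averaging is simple but discards word order; weighting each word (for example by TF-IDF) before averaging is a common refinement.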

Sentence similarity_tf/idf

Excerpt:

    import math
    from math import isnan
    import pandas as pd

    # jieba word segmentation: after cutting, the tokens are separated by a delimiter
    def jieba_function(sent):
        import jieba
        sent1 = jieba.cut(sent)
        s = []
        for each in s …

posted @ 2018-11-19 10:48 happygril3 Views (500) Comments (0) Recommendations (0)
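The TF-IDF approach weights each word by its frequency in a sentence times its inverse document frequency across all sentences, then compares the weighted vectors with cosine similarity. Below is a minimal stdlib sketch assuming pre-segmented sentences and the plain weighting tf(w, d) × log(N / df(w)); the post's own weighting is not visible in the excerpt, and libraries often use smoothed variants. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors: tf = count / doc length, idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({w: (c / total) * math.log(n_docs / df[w])
                        for w, c in counts.items()})
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as word -> weight dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Sentences assumed to be segmented already (the post uses jieba for this).
docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "cat", "lay", "on", "the", "rug"],
    ["stocks", "fell", "sharply", "today"],
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shared words -> positive score
print(cosine(vecs[0], vecs[2]))  # 0.0: no words in common
```

A word that occurs in every sentence gets an IDF of log(1) = 0 and contributes nothing, so words that appear everywhere cannot make two sentences look similar.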

word2vec_text similarity

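Excerpt:

    # Extract keywords
    # Vectorize the keywords
    # Compute similarity
    from jieba import analyse
    import numpy
    import gensim

    # Given any string, get the positions of a given character in it and its total number of occurrences
    def get_char_pos(string,char):
        chPos=[]
        try:
            chPo …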

posted @ 2018-11-19 10:32 happygril3 Views (4149) Comments (0) Recommendations (0)
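The helper `get_char_pos` in the excerpt is described as returning the positions of a character in a string together with its total number of occurrences. Its body is cut off, so the version below is only a guess at that contract: returning a `(positions, count)` tuple with 0-based positions is an assumption, not the author's code.

```python
from typing import List, Tuple


def get_char_pos(string: str, char: str) -> Tuple[List[int], int]:
    """Return (positions, count) of `char` in `string`; positions are 0-based.

    The tuple return shape is an assumption; the original body is truncated.
    """
    positions = [i for i, ch in enumerate(string) if ch == char]
    return positions, len(positions)


positions, count = get_char_pos("banana", "a")
print(positions, count)  # [1, 3, 5] 3
```

A single pass with `enumerate` finds every match without the `try`/`except` around `str.index` that the excerpt appears to start with.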

word2vec_training a model

Excerpt:

    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    # Convert the raw training corpus into a sentence iterator; each iteration yields one sentence as a list of words (UTF-8)
    def vcto …

posted @ 2018-11-19 10:30 happygril3 Views (358) Comments (0) Recommendations (0)
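The excerpt uses gensim's `LineSentence` to stream the training corpus as one sentence per line, each sentence a list of words. The stdlib sketch below, with the hypothetical class name `SegmentedCorpus`, reproduces that input format for a file that is already segmented (for example by jieba) with whitespace between tokens. It reopens the file on every iteration because `Word2Vec` scans the corpus more than once (to build the vocabulary, then once per training epoch), so a one-shot generator would be exhausted after the first pass.

```python
import os
import tempfile
from typing import Iterator, List


class SegmentedCorpus:
    """Re-iterable corpus: one pre-segmented sentence per line, whitespace-separated tokens."""

    def __init__(self, path: str, encoding: str = "utf-8") -> None:
        self.path = path
        self.encoding = encoding

    def __iter__(self) -> Iterator[List[str]]:
        # Reopen the file on every pass so the corpus can be read repeatedly.
        with open(self.path, encoding=self.encoding) as handle:
            for line in handle:
                tokens = line.split()
                if tokens:  # blank lines yield no sentence
                    yield tokens


# Usage: write a tiny segmented corpus to a temporary file and read it twice.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("我 喜欢 自然 语言 处理\n\nword2vec 训练 词 向量\n")

corpus = SegmentedCorpus(path)
first_pass = list(corpus)
second_pass = list(corpus)
print(first_pass)
print(first_pass == second_pass)  # True: each iteration rereads the file
os.remove(path)
```

Either this object or `LineSentence(path)` could then be passed as the `sentences` argument of `gensim.models.Word2Vec`.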
