happygril3

word2vec_model training

Summary:

    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence
    # Convert the raw training corpus into an iterator of sentences; each iteration yields one sentence as a list of words (UTF-8)
    def vcto

Read more

posted @ 2018-11-19 10:30 happygril3 Views (356) Comments (0) Recommended (0)

Sentiment analysis_positive and negative word lists

Summary:

    import jieba
    import numpy as np
    # Open a dictionary file and return its contents as a list
    def open_dict(Dict='hahah', path=r'C:\E\Textming\Textming/'):
        path = path + '%s.txt' % Dict
        dictionary = ope

Read more
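As a rough illustration of how a positive/negative word list can be turned into a sentiment label, here is a minimal sketch. The tiny word sets, the pre-tokenized input, and the names `lexicon_score` and `label` are all hypothetical; the post itself loads its dictionaries from text files and segments text with jieba, neither of which is reproduced here.

```python
# Hypothetical miniature lexicons; a real word list would be loaded from a file.
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "awful", "sad"}


def lexicon_score(tokens):
    """Return (positive hits - negative hits) for an already-tokenized text."""
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos - neg


def label(tokens):
    """Map the raw score to a coarse sentiment label."""
    score = lexicon_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label(["great", "but", "sad", "and", "happy"]))
```

A plain count like this ignores negation and degree words, so it should be read as a baseline rather than a complete scoring scheme.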

posted @ 2018-11-16 17:35 happygril3 Views (2277) Comments (1) Recommended (0)

Topic extraction_my own code

Summary:

    def cmp(e1, e2):
        # Output keywords sorted by their computed score; when scores are equal, sort by keyword
        import numpy as np
        res = np.sign(e1[1] - e2[1])
        if res != 0:
            return res
        else:
            a = e1[0] + e2[0]
            b = e2[0] + e1[0]

Read more
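The excerpt is cut off before the tie-breaking rule is complete, so the following is only a sketch of the stated idea (order by score, then by keyword) using the standard library. The comparator name `cmp_keywords`, the sample pairs, and the plain string comparison used for ties are assumptions, not the post's actual logic.

```python
from functools import cmp_to_key


def cmp_keywords(e1, e2):
    """Compare (keyword, score) pairs: by score first, then by keyword."""
    if e1[1] != e2[1]:
        return -1 if e1[1] < e2[1] else 1
    # Tie-break on the keyword string (assumed rule; the excerpt is truncated).
    return (e1[0] > e2[0]) - (e1[0] < e2[0])


# Hypothetical (keyword, score) pairs.
pairs = [("model", 0.5), ("topic", 0.9), ("corpus", 0.5)]

# Highest score first; Python 3 needs cmp_to_key to use an old-style comparator.
ranked = sorted(pairs, key=cmp_to_key(cmp_keywords), reverse=True)

# The same ordering expressed with a key function instead of a comparator.
ranked_by_key = sorted(pairs, key=lambda e: (e[1], e[0]), reverse=True)

print(ranked)
```

When the ordering is a simple lexicographic one like this, the key-function form is usually the more idiomatic choice in Python 3.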

posted @ 2018-11-16 17:14 happygril3 Views (383) Comments (0) Recommended (0)

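gensim_topic extraction

Summary:

    # https://blog.csdn.net/whzhcahzxh/article/details/17528261
    # From the gensim package, import corpora, models, and similarities, used respectively for building the corpus, the models, and similarity comparison
    from gensim import corpora, mod

Read more

posted @ 2018-11-16 16:38 happygril3 Views (935) Comments (0) Recommended (0)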

Snownlp

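Summary:

    from snownlp import SnowNLP
    text = '宝贝自拍很帅！！！注意休息～'
    s = SnowNLP(text)
    # Word segmentation
    print(s.words)
    # Part-of-speech tags
    for tag in s.tags:
        print(tag)
    # Sentiment score: the probability that the text is positive
    print(s.sentiments)
    # Keywords
    prin

Read more

posted @ 2018-11-16 15:48 happygril3 Views (198) Comments (0) Recommended (0)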

Word cloud

Summary:

    from scipy.misc import imread  # a function for handling images
    from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
    import matplotlib.pyplot as plt
    import pandas a

Read more

posted @ 2018-11-16 14:59 happygril3 Views (174) Comments (0) Recommended (0)

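Dimensionality reduction

Summary:

    # PCA: unsupervised, does not use class labels
    from sklearn.decomposition import PCA
    data_CPA = PCA(n_components=2).fit_transform(iris.data)
    # print('data_CPA', data_CPA)
    # Linear discriminant analysis: supervised, uses the data's

Read more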
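To make the PCA step less of a black box, here is a minimal sketch that reduces 2-D points to one component using the closed-form eigendecomposition of a 2x2 covariance matrix. It is a deliberately scaled-down stand-in for the 4-feature iris case in the post; the function name `pca_1d` and the sample points are hypothetical, and no class labels are used anywhere, which is what makes the method unsupervised.

```python
import math


def pca_1d(points):
    """Project 2-D points onto their first principal component.

    Returns the projected scores and the variance along that component.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)

    # Matching eigenvector; fall back to an axis when the features are uncorrelated.
    if b != 0:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    scores = [x * vx + y * vy for x, y in centered]
    return scores, lam


# Hypothetical sample data.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
scores, top_variance = pca_1d(points)
print(scores, top_variance)
```

The variance of the projected scores equals the largest eigenvalue, which is the sense in which the first component keeps as much variance as a single direction can.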

posted @ 2018-11-15 18:37 happygril3 Views (132) Comments (0) Recommended (0)

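Feature selection

Summary:

    # Feature selection
    # (1) filter
    # 1.1 Variance: first compute the variance of each feature, then select the features whose variance exceeds the threshold
    from sklearn.feature_selection import VarianceThreshold
    data_var = VarianceThreshold(threshold=3).f

Read more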
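The variance filter described in the comment can be sketched in plain Python as follows. This is an illustration of the rule (keep a feature only if its variance is above the threshold), not scikit-learn's implementation; the function name `variance_filter` and the sample rows are hypothetical, and population variance (dividing by n) is assumed.

```python
def variance_filter(rows, threshold):
    """Keep the columns whose population variance is strictly above threshold.

    Returns the kept column indices and the reduced rows.
    """
    n = len(rows)
    n_cols = len(rows[0])
    keep = []
    for j in range(n_cols):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        if var > threshold:
            keep.append(j)
    return keep, [[row[j] for j in keep] for row in rows]


# Hypothetical data: column 0 is constant, column 2 barely varies.
rows = [
    [1.0, 10.0, 5.0],
    [1.0, 20.0, 5.1],
    [1.0, 30.0, 4.9],
]
kept, reduced = variance_filter(rows, threshold=3)
print(kept, reduced)
```

Because variance depends on the scale of each feature, a single threshold such as 3 only makes sense when the features are measured in comparable units.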

posted @ 2018-11-15 18:37 happygril3 Views (256) Comments (0) Recommended (0)

Data standardization_1

Summary:

    from sklearn.datasets import load_iris
    iris = load_iris()
    # Z-score standardization
    from sklearn.preprocessing import StandardScaler
    data_standard = StandardScaler().fit

Read more
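For reference, Z-score standardization rescales each feature to zero mean and unit standard deviation via z = (x - mean) / std. The sketch below applies that formula to a single column in plain Python; the function name `z_score` and the sample values are hypothetical, and the population standard deviation (dividing by n) is assumed.

```python
import math


def z_score(column):
    """Standardize one column: subtract the mean, divide by the population std."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]


# Hypothetical feature values with mean 5 and population std 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = z_score(values)
print(scaled)
```

This sketch does not guard against a constant column, where the standard deviation is zero and the division would fail.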

posted @ 2018-11-15 18:14 happygril3 Views (145) Comments (0) Recommended (0)

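Bias and variance

Summary: Error (generalization error) = bias² + variance + noise. Bias: [how far the predictions deviate from the true values] -- [the algorithm's ability to fit] -- boosting. Boosting reduces the loss, so it can lower bias. The models in a boosted ensemble are not independent of one another, so boosting cannot significantly reduce variance. Variance: [how much the results fluctuate] -- Read more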
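The decomposition can be checked numerically with a small simulation. The sketch below is a toy setup, not anything from the post: it estimates a constant `THETA` from noisy samples with the estimator `shrink * sample mean`, then compares the measured squared error on a fresh noisy observation against bias² + variance + noise. All names and constants are hypothetical.

```python
import random
import statistics

random.seed(0)

THETA = 2.0       # true value being estimated (hypothetical)
SIGMA = 1.0       # standard deviation of the noise (hypothetical)
N_POINTS = 10     # size of each simulated training set
N_TRIALS = 20000  # number of simulated training sets


def simulate(shrink):
    """Return (bias^2, variance, noise, measured error) for shrink * sample mean."""
    estimates = []
    sq_errors = []
    for _ in range(N_TRIALS):
        sample = [random.gauss(THETA, SIGMA) for _ in range(N_POINTS)]
        est = shrink * sum(sample) / N_POINTS
        target = random.gauss(THETA, SIGMA)  # fresh noisy observation
        estimates.append(est)
        sq_errors.append((est - target) ** 2)
    bias_sq = (sum(estimates) / N_TRIALS - THETA) ** 2
    variance = statistics.pvariance(estimates)
    noise = SIGMA ** 2
    measured = sum(sq_errors) / N_TRIALS
    return bias_sq, variance, noise, measured


for shrink in (1.0, 0.5):
    b2, var, noise, err = simulate(shrink)
    print(f"shrink={shrink}: bias^2={b2:.3f} variance={var:.3f} "
          f"noise={noise:.3f} sum={b2 + var + noise:.3f} measured={err:.3f}")
```

With `shrink=1.0` the estimator is unbiased but has the larger variance; with `shrink=0.5` the variance drops while the squared bias grows, and in both cases the three terms should add up to roughly the measured error.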

posted @ 2018-11-15 15:44 happygril3 Views (428) Comments (0) Recommended (0)
