happygril3

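k-means clustering

Summary: Algorithm: (1) Randomly choose k initial center points. (2) Compute the distance from each data point to every center, and assign each point to the class of its nearest center. (3) Move each center to the mean position of the data points in its cluster. (4) Repeat steps (2) and (3) until the class centers barely change from one iteration to the next. Choosing k: elbow plot: within-group sum of squared errors, SSE (sum… Read more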
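The four steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the post's own code: the names `kmeans`, `sse`, `sq_dist`, the `tol` and `max_iter` parameters, and the choice to keep an old center when its cluster is empty are all assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Cluster a list of numeric tuples into k groups (hypothetical helper)."""
    rng = random.Random(seed)
    # (1) Pick k distinct data points as the initial centers.
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # (2) Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sq_dist(p, c) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # (3) Move each center to the mean of its cluster.
        #     Assumption: an empty cluster keeps its previous center.
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(old)
        # (4) Stop once no center moved by more than tol.
        shift = max(sq_dist(o, n) ** 0.5 for o, n in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, clusters


def sse(centers, clusters):
    """Within-group sum of squared errors, the quantity plotted in an elbow plot."""
    return sum(
        sq_dist(p, c)
        for c, members in zip(centers, clusters)
        for p in members
    )


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    found_centers, found_clusters = kmeans(data, 2)
    print(sorted(found_centers), sse(found_centers, found_clusters))
```

For an elbow plot, one would run `kmeans` for several values of k and plot `sse` against k, looking for the point where the curve flattens.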

posted @ 2018-11-26 11:28 happygril3 Views (238) Comments (0) Recommendations (0)

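Bayesian classification

Summary: Principle: based on conditional probability; suited to cases where the correlation between different dimensions is small; relatively easy to interpret. Formula: p(c|x) = p(c,x) / p(x) = p(x|c) * p(c) / p(x). Explanation: suppose an individual has n features, F1, F2, ..., Fn, and there are m categories, C1, C… Read more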
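The formula above can be checked numerically with a small sketch. The function names, the two class labels, and all probability values below are made up for illustration; the product over features in `naive_likelihood` is the usual naive independence assumption, which matches the remark that the method suits weakly correlated dimensions.

```python
def naive_likelihood(feature_probs):
    """p(x|c) as the product of p(F_i|c), assuming independent features."""
    result = 1.0
    for p in feature_probs:
        result *= p
    return result


def posterior(priors, likelihoods):
    """Return p(c|x) for every class c, given p(c) and p(x|c)."""
    # Joint probability: p(c, x) = p(x|c) * p(c)
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # Evidence: p(x) = sum over all classes of p(c, x)
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


if __name__ == "__main__":
    # Hypothetical numbers: two classes, two features.
    priors = {"C1": 0.4, "C2": 0.6}
    likelihoods = {
        "C1": naive_likelihood([0.8, 0.5]),
        "C2": naive_likelihood([0.1, 0.5]),
    }
    print(posterior(priors, likelihoods))
```

The predicted class is the one with the largest posterior; since p(x) is the same for every class, comparing p(x|c) * p(c) alone gives the same decision.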

posted @ 2018-11-22 18:11 happygril3 Views (336) Comments (0) Recommendations (0)

Loss functions

Summary: Least squares, squared loss: L(Y, f(X)) = ∑ [Y - f(X)]^2. Logistic regression, log loss: L(Y, P(Y|X)) = -log P(Y|X). Naive Bayes, 0/1 loss: L(Y, f(X)) = 1 if Y != f(X), 0 if Y = f(X). AdaBoost, exponential loss… Read more
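As a rough sketch, the three losses whose formulas appear in full above can be written directly in Python. The function names and the sample inputs are assumptions for illustration only.

```python
import math


def squared_loss(y_true, y_pred):
    """Sum over samples of (Y - f(X))^2."""
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred))


def log_loss(p_observed):
    """-log P(Y|X), where p_observed is the probability the model gave the observed label."""
    return -math.log(p_observed)


def zero_one_loss(y, f):
    """1 if the prediction differs from the label, otherwise 0."""
    return 0 if y == f else 1


if __name__ == "__main__":
    print(squared_loss([1, 2, 3], [1, 1, 5]))
    print(log_loss(0.5))
    print(zero_one_loss(1, 0))
```

The log loss grows without bound as the probability assigned to the observed label approaches 0, whereas the 0/1 loss only records whether the prediction was wrong.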

posted @ 2018-11-22 15:58 happygril3 Views (98) Comments (0) Recommendations (0)

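Sorting

Summary: Internal sorts (time complexity, space complexity, stability):
Insertion sorts
  Straight insertion sort: O(n^2), O(1), stable
  Shell sort: O(n*sqrt(n)) (depends on the gap sequence), O(1), unstable
Selection sorts
  Simple selection sort: O(n^2), O(1), unstable
  Heap sort: O(n*log(n)), O(1), unstable
Exchange sorts
  Bubble sort: O(n^2), O(1), … Read more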

posted @ 2018-11-22 09:57 happygril3 Views (65) Comments (0) Recommendations (0)

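Sorting_2

Summary:
import numpy as np
# Insertion sort (straight insertion sort): compare the new number with the numbers before it; if it is smaller than one of them, insert it at that position
def insert_sort(array):
    for i in range(len(array)):
        for j in range(i):
            if array[i]<array[j]… Read more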
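Because the excerpt is cut off, here is a separate minimal sketch of straight insertion sort for reference. It is not the post's `insert_sort`: the name `insertion_sort` and the choice to return a sorted copy instead of sorting in place are assumptions for this example.

```python
def insertion_sort(values):
    """Return a sorted copy of values using straight insertion sort."""
    result = list(values)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift every element larger than current one slot to the right.
        # The strict comparison keeps equal elements in their original order (stable).
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        # Drop current into the gap that was opened up.
        result[j + 1] = current
    return result


if __name__ == "__main__":
    print(insertion_sort([5, 2, 4, 6, 1, 3]))
```

The nested loops give O(n^2) comparisons in the worst case, and only a constant amount of extra space is needed beyond the copy made here for convenience.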

posted @ 2018-11-22 09:57 happygril3 Views (107) Comments (0) Recommendations (0)

Word frequency statistics

Summary:
def frequncy(data,n):
    import numpy as np
    import jieba.posseg as pog
    text = ''
    for i in np.arange(n):
        text += str(data.ix[i, 'comment'])
    stop_property… Read more

posted @ 2018-11-19 17:29 happygril3 Views (374) Comments (0) Recommendations (0)

wordvec_word similarity

Summary:
import gensim
from gensim.models import word2vec
import logging
import jieba
import os
import numpy as np
def cut_txt(old_file):
    import jieba
    global cut_fil… Read more

posted @ 2018-11-19 11:48 happygril3 Views (296) Comments (0) Recommendations (0)

Wordvec_sentence similarity

Summary:
import jieba
from jieba import analyse
import numpy
import gensim
import codecs
import pandas as pd
import jieba.posseg as pog
from gensim.models import Word… Read more

posted @ 2018-11-19 11:36 happygril3 Views (602) Comments (0) Recommendations (0)

Sentence similarity_tf/idf

Summary:
import math
from math import isnan
import pandas as pd
# jieba word segmentation: after cutting, the tokens are separated by a delimiter
def jieba_function(sent):
    import jieba
    sent1 = jieba.cut(sent)
    s = []
    for each in s… Read more

posted @ 2018-11-19 10:48 happygril3 Views (500) Comments (0) Recommendations (0)

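word2vec_text similarity

Summary:
# Extract keywords
# Vectorize the keywords
# Compute similarity
from jieba import analyse
import numpy
import gensim
# Given any string, get the positions of a given character in it and its total number of occurrences
def get_char_pos(string,char):
    chPos=[]
    try:
        chPo… Read more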
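The character-position helper described in the last comment is cut off in the excerpt. One possible way to write such a helper is sketched below; `char_positions` is a hypothetical name and this is not the post's `get_char_pos`.

```python
def char_positions(text, char):
    """Return (list of indices where char occurs in text, total number of occurrences)."""
    positions = [index for index, current in enumerate(text) if current == char]
    return positions, len(positions)


if __name__ == "__main__":
    print(char_positions("banana", "a"))
```

Returning an empty list and a count of 0 when the character is absent avoids the need for a `try` block around the lookup.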

posted @ 2018-11-19 10:32 happygril3 Views (4149) Comments (0) Recommendations (0)
