October 2018 Post Archive - 六盘水月照

Splitting a String into Fixed-Length Chunks with Python's re Regular Expressions

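Summary:

    import re

    def cut_text(text,lenth):
        # collect every full run of `lenth` characters
        textArr = re.findall('.{'+str(lenth)+'}', text)
        # append whatever is left after the last full chunk
        textArr.append(text[(len(textArr)*lenth):])
        return textArr

    print(cut_text('123456789abcdefg',3))

Read more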

posted @ 2018-10-16 15:35 六盘水月照 Views(995) Comments(0) Recommendations(0)
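The fixed-length split in the post above can also be written as a single regular expression. The sketch below is a variant, not the post's own code: it assumes that a trailing empty string is unwanted when the text length is an exact multiple of the chunk size, and that newlines should be kept inside chunks. The parameter name `length` is a choice made here.

```python
import re

def cut_text(text, length):
    """Split text into consecutive chunks of at most `length` characters."""
    if length <= 0:
        raise ValueError("length must be a positive integer")
    # {1,n} is greedy, so every chunk is full-sized except possibly the last.
    # re.DOTALL lets '.' match newlines, so multi-line text is not skipped.
    return re.findall('.{1,' + str(length) + '}', text, flags=re.DOTALL)

print(cut_text('123456789abcdefg', 3))
# ['123', '456', '789', 'abc', 'def', 'g']
```

Because the pattern itself captures the short final chunk, no separate remainder step is needed, and an input such as 'abcdef' with a chunk size of 3 yields two chunks rather than two chunks plus an empty string.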

Baidu AI Open Platform: Sentiment Analysis Example and a Fix for gbk Encoding Errors

Summary: f=open('test.txt','a+',encoding='utf-8') for index,row in cxzg.iterrows(): text=str(row['text']) text=text.encode('gb18030','ignore').decode('gbk','ignore') qgdict=client.sentimentClassif... Read more

posted @ 2018-10-16 15:32 六盘水月照 Views(919) Comments(0) Recommendations(0)
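The encoding step in the post above strips characters that cannot be represented in gbk before the text is sent for sentiment analysis. A minimal sketch of that idea follows. It assumes the goal is simply to drop non-gbk characters such as emoji; it round-trips through a single codec, which is a variation on the post's gb18030-then-gbk approach, and the helper name `to_gbk_safe` is hypothetical.

```python
def to_gbk_safe(text):
    """Return text with every character that gbk cannot encode removed."""
    # errors='ignore' silently drops unencodable characters during encoding;
    # decoding the resulting bytes with the same codec cannot fail.
    return text.encode('gbk', errors='ignore').decode('gbk')

sample = '这部电影很好看😀'
print(to_gbk_safe(sample))
# 这部电影很好看
```

Using one codec for both directions keeps the result easy to reason about: whatever survives encoding is guaranteed to decode back unchanged.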

Fetching Bilibili Danmaku (Bullet Comments) by cid

Summary: def biliget(cid): headers = { "Accept": "*/*", "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3", "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:55.0) G... Read more

posted @ 2018-10-16 15:31 六盘水月照 Views(1363) Comments(0) Recommendations(0)
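The summary above is cut off before the response is handled, so the following sketch only illustrates what parsing a danmaku payload might look like once it has been downloaded. It assumes the payload is XML in which each comment is a `<d>` element whose `p` attribute is a comma-separated list starting with the playback offset in seconds. The sample data, the function name `parse_danmaku`, and that field layout are assumptions made here, not taken from the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample payload; real data would come from the HTTP response.
SAMPLE_XML = """<i>
  <d p="12.5,1,25,16777215,1539675000,0,abcd1234,1">first comment</d>
  <d p="3.0,1,25,16777215,1539675100,0,efgh5678,2">second comment</d>
</i>"""

def parse_danmaku(xml_text):
    """Return (offset_seconds, text) pairs sorted by playback offset."""
    root = ET.fromstring(xml_text)
    comments = []
    for node in root.iter('d'):
        fields = node.get('p', '').split(',')
        try:
            # Assumed layout: the first field is the offset in seconds.
            offset = float(fields[0])
        except ValueError:
            offset = 0.0
        comments.append((offset, node.text or ''))
    comments.sort(key=lambda item: item[0])
    return comments

for offset, text in parse_danmaku(SAMPLE_XML):
    print(offset, text)
```

Sorting by offset puts the comments in the order they would appear during playback, which is usually more convenient for later text analysis than the order in which they were stored.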

python3 doc2vec Text Clustering Implementation

Summary: import sys #doc2vec import gensim import sklearn import numpy as np from gensim.models.doc2vec import Doc2Vec, LabeledSentence TaggededDocument = gensim.models.doc2vec.TaggedDocument def get... Read more

posted @ 2018-10-16 15:30 六盘水月照 Views(3421) Comments(0) Recommendations(0)

python3 LDA Topic Model and TFIDF Implementation

Summary: import codecs # topic model from gensim import corpora from gensim.models import LdaModel from gensim import models from gensim.corpora import Dictionary te = [] fp = codecs.open('input.txt','r') for line i... Read more

posted @ 2018-10-16 15:29 六盘水月照 Views(4915) Comments(0) Recommendations(0)
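The post above relies on gensim for both LDA and TFIDF. To show what the TFIDF weighting computes, here is a small standard-library sketch. It is not gensim's implementation and will not reproduce gensim's exact numbers: it assumes term frequency is the count divided by document length and inverse document frequency is the natural log of N / df. The function name `tfidf` and the toy documents are made up for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute a TF-IDF weight for every term of every tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

docs = [
    ['apple', 'banana', 'apple'],
    ['banana', 'cherry'],
    ['cherry', 'cherry', 'date'],
]
for doc_weights in tfidf(docs):
    print(doc_weights)
```

A term that occurs in every document gets weight 0 under this formula, which is the property that makes TFIDF useful for down-weighting common words before topic modeling.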

Mind your words and tone, and you stay far from coarseness and impropriety

kashikoi kawaii kliometrician

October 2018 Archive

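Splitting a String into Fixed-Length Chunks with Python's re Regular Expressions

Baidu AI Open Platform: Sentiment Analysis Example and a Fix for gbk Encoding Errors

Fetching Bilibili Danmaku (Bullet Comments) by cid

python3 doc2vec Text Clustering Implementation

python3 LDA Topic Model and TFIDF Implementation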

Announcements
