TextBlob Quick Start

安装

pip install textblob

import nltk
pip install nltk
nltk.download('punkt')   # 安装一些语料库, 国内安装有问题
nltk.download('averaged_perceptron_tagger')

基本操作

情感分析

该sentiment属性返回形式的namedtuple 。极性在[-1.0,1.0]范围内浮动。主观性是在[0.0,1.0]范围内的浮动,其中0.0是非常客观的,而1.0是非常主观的。Sentiment(polarity, subjectivity)

text = "I am happy today. I feel sad today."

from textblob import TextBlob
blob = TextBlob(text)
blob.sentences   # 能够分句

print(blob.sentences[0].sentiment)
print(blob.sentences[1].sentiment)
print(blob.sentiment)
# Sentiment(polarity=0.15000000000000002, subjectivity=1.0)

拼写纠正

>>> b = TextBlob("I havv goood speling!")
>>> print(b.correct())
I have good spelling!

返回的貌似是字符数组,最好用str转成字符串.

原理见 How to Write a Spelling Corrector

例子:amazon评论情感分析

1. 获取原始评论数据

import pandas as pd

df=pd.read_csv('hair_dryer.tsv', sep='\t', usecols=['review_body','review_date'])

df.head()

2. 由于评论中有大量的拼写错误,加个单词纠正

def correct(text):
  b = TextBlob(text)
  return str(b.correct())


df['corrected_review_body'] = df['review_body'].map(correct)
df.head()

3. 得到情感极性和主观性

def get_polarity(text):
  testimonial = TextBlob(text)
  return testimonial.sentiment.polarity

def get_subjectivity(text):
  testimonial = TextBlob(text)
  return testimonial.sentiment.subjectivity


df['polarity'] = df['review_body'].map(get_polarity)
df['subjectivity'] = df['review_body'].map(get_subjectivity)

print(df)

4. 统计词频

统计词频主要是依靠 collections.Counter(word_list).most_common() 方法来自动统计并排序,如下:

row = df.shape[0]
words = ""
for i in range(0, row):
  words = words + " " + p[i]

print(words)   #汇总每个评论

import collections
collections.Counter(words.split()).most_common(50)  #显示top50

具体得到某个单词的频数:

coll = collections.Counter(words.split())
coll["good"]

5. 保存结果到CSV

df.to_csv("情感分析.csv")

 

 

参考链接:

1. https://textblob.readthedocs.io/en/dev/quickstart.html#get-word-and-noun-phrase-frequencies

2. https://blog.csdn.net/heykid/article/details/62424513#%E7%BB%9F%E8%AE%A1%E8%AF%8D%E9%A2%91 

3. https://blog.csdn.net/sinat_29957455/article/details/79059436

posted @ 2020-03-07 19:18  Rogn  阅读(599)  评论(0编辑  收藏  举报