pandas: selecting the rows where a column contains specific text
1. The cell holds a single value (a scalar), so an exact comparison works
df_fintech = df_text[df_text['业务一级分类']=="金融科技"]
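2. The target text is only part of the cell's content
Go through the .str accessor and filter with contains, which does a substring (by default regex) match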
df_fintech = df_text[df_text['业务一级分类'].str.contains("金融科技")]
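The difference between cases 1 and 2 can be seen in a small sketch. The frame demo and its values are invented for illustration and are not the original dataset. The sketch also passes na=False, so missing values count as non-matches, and regex=False, so the search text is taken literally; both are optional choices here, not something the original snippet requires.

```python
import pandas as pd

# Hypothetical sample data; only the column name is borrowed from the text above
demo = pd.DataFrame({
    "业务一级分类": ["金融科技", "金融科技;电商", "电商", None],
})

# Case 1: exact equality only matches cells that are exactly the target
exact = demo[demo["业务一级分类"] == "金融科技"]

# Case 2: substring match also catches cells that merely contain the target.
# na=False treats missing values as "no match"; regex=False disables regex parsing.
partial = demo[demo["业务一级分类"].str.contains("金融科技", na=False, regex=False)]

print(len(exact), len(partial))
```

If the column may contain missing values, leaving out na=False tends to be a problem, because the resulting mask then holds NaN and cannot be used for boolean indexing.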
3. Select the rows whose column value belongs to a given set of values, using isin
df.loc[df['column_name'].isin(some_values)] # some_values is an iterable, e.g. a list
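A minimal sketch of isin on made-up data follows. The names df_demo, city and score are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical data for illustration
df_demo = pd.DataFrame({
    "city": ["Beijing", "Shanghai", "Hangzhou", "Shenzhen"],
    "score": [1, 2, 3, 4],
})

# Any list-like works here: list, set, tuple or another Series
some_values = ["Beijing", "Hangzhou"]

# Keep only the rows whose city is one of the listed values
selected = df_demo.loc[df_demo["city"].isin(some_values)]
print(selected)
```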
4. To combine several conditions, use &. Because & binds more tightly than >= or <=, each comparison must be wrapped in parentheses
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
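The range filter can be tried on a small made-up frame, as sketched below. The bounds A and B and the column value are placeholders. Series.between is shown as an alternative that returns the same rows for an inclusive range; it is not part of the original tip.

```python
import pandas as pd

df_demo = pd.DataFrame({"value": [1, 5, 10, 15, 20]})
A, B = 5, 15  # hypothetical bounds

# Parentheses are required: without them Python would evaluate
# `A & df_demo["value"]` first, because & binds tighter than >= and <=
in_range = df_demo.loc[(df_demo["value"] >= A) & (df_demo["value"] <= B)]

# Series.between is inclusive on both ends by default
in_range_between = df_demo.loc[df_demo["value"].between(A, B)]

print(in_range["value"].tolist())
```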
5. Select the rows whose column value is not equal to one or more given values
The idea is to invert the selection:
df.loc[df['column_name'] != 'some_value']
df.loc[~df['column_name'].isin(some_values)] # ~ negates the mask; some_values must be list-like, so for strings pass a list such as ['str1', 'str2'], not a bare string
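Both forms of the inverted selection are sketched below on invented data (df_demo and fruit are hypothetical names).

```python
import pandas as pd

df_demo = pd.DataFrame({"fruit": ["apple", "banana", "cherry", "apple"]})

# Exclude a single value with !=
not_apple = df_demo.loc[df_demo["fruit"] != "apple"]

# Exclude several values: isin needs a list-like, so the strings go in a list
excluded = ["apple", "cherry"]
remaining = df_demo.loc[~df_demo["fruit"].isin(excluded)]

print(not_apple["fruit"].tolist(), remaining["fruit"].tolist())
```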
Example of a more efficient way to search a pandas string column for several keywords at once (an application of method 2 above):
import pandas as pd
# create regex pattern out of the list of words
positive_kw = '|'.join(['rise','positive','high','surge'])
negative_kw = '|'.join(['sink','lower','fall','drop','slip','loss','losses'])
neutral_kw = '|'.join(['flat','neutral'])
# creating some fake data for demonstration
words = [
'rise high',
'positive attitude',
'something',
'foo',
'lowercase',
'flat earth',
'neutral opinion'
]
df = pd.DataFrame(data=words, columns=['words'])
df['positive'] = df['words'].str.contains(positive_kw).astype(int)
df['negative'] = df['words'].str.contains(negative_kw).astype(int)
df['neutral'] = df['words'].str.contains(neutral_kw).astype(int)
print(df)
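One thing to watch in the example above: contains matches substrings, so 'lowercase' is flagged as negative because it contains 'lower'. If whole-word matching is what you want, one possible approach (a sketch, not part of the original answer) is to wrap the alternation in \b word boundaries. The name negative_kw_whole and the sample frame are made up for the illustration.

```python
import re
import pandas as pd

negative_words = ['sink', 'lower', 'fall', 'drop', 'slip', 'loss', 'losses']

# Substring pattern, built the same way as in the example above
negative_kw_sub = '|'.join(negative_words)

# Escape each word and wrap the alternation in \b...\b so only whole words match.
# (?:...) is a non-capturing group, which avoids pandas' warning about match groups.
negative_kw_whole = r'\b(?:' + '|'.join(re.escape(w) for w in negative_words) + r')\b'

df_demo = pd.DataFrame({'words': ['lowercase', 'prices fall', 'lower bound']})
df_demo['negative_substring'] = df_demo['words'].str.contains(negative_kw_sub).astype(int)
df_demo['negative_whole_word'] = df_demo['words'].str.contains(negative_kw_whole).astype(int)
print(df_demo)
```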
6. Group with groupby and save each group as its own Excel file (get_group)
import pandas as pd

file_name = "总表.xlsx"
df = pd.read_excel(file_name, skiprows=1)  # skip the first row of the sheet
group = df.groupby("列标题")
# One output file per distinct (non-missing) value in the grouping column
for key in df["列标题"].dropna().unique():
    sub_df = group.get_group(key)
    # The file name is the group key followed by the number of rows in the group
    sub_df.to_excel(f"{key}{len(sub_df)}.xlsx")
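An alternative to calling get_group for each key is to iterate over the groupby object itself, which yields (key, sub-DataFrame) pairs. The sketch below uses invented data and, as an assumption made purely so it runs without an Excel engine installed, writes CSV files into a temporary folder instead of .xlsx files; swapping to_csv for to_excel should give the Excel variant.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data standing in for the spreadsheet
df_demo = pd.DataFrame({
    "category": ["A", "B", "A", "C", "A"],
    "amount": [10, 20, 30, 40, 50],
})

# Temporary folder, used only to keep the working directory clean
out_dir = Path(tempfile.mkdtemp())
written = []

# Iterating over a groupby yields (key, sub-DataFrame) pairs, sorted by key
for key, sub_df in df_demo.groupby("category"):
    path = out_dir / f"{key}{len(sub_df)}.csv"  # group key + row count
    sub_df.to_csv(path, index=False)
    written.append(path.name)

print(written)
```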
References: https://blog.csdn.net/weixin_43557139/article/details/109459352
https://www.coder.work/article/4980040