pu369com

Filtering rows in pandas where a column contains specific text

1. The cell content is a single, whole value (a scalar): use == for an exact match

df_fintech = df_text[df_text['业务一级分类']=="金融科技"]
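A minimal self-contained sketch of exact matching. The sample rows and the `amount` column are made up for illustration and stand in for the original df_text:

```python
import pandas as pd

# Hypothetical sample data standing in for df_text
df_text = pd.DataFrame({
    "业务一级分类": ["金融科技", "零售", "金融科技", "金融科技服务"],
    "amount": [10, 20, 30, 40],
})

# == compares each whole cell with the string, so "金融科技服务" is NOT selected
df_fintech = df_text[df_text["业务一级分类"] == "金融科技"]
print(df_fintech)
```

Only the rows whose value is exactly "金融科技" are kept; a cell that merely contains the text needs method 2.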

  

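2. The target text is only part of the cell content

Use the contains method of the .str accessor (if the column is not of string type, convert it with .astype(str) first). Pass na=False when the column may contain missing values; otherwise the NaN results cannot be used as a boolean mask.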

df_fintech = df_text[df_text['业务一级分类'].str.contains("金融科技", na=False)]
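A small sketch of substring matching with made-up data, showing three details that often matter in practice: na=False for missing cells, regex=False for a literal pattern (contains treats the pattern as a regular expression by default), and .astype(str) for a non-string column. The sample values and the `codes` series are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing cell and one value with regex metacharacters
df_text = pd.DataFrame({
    "业务一级分类": ["金融科技", "零售/金融科技", np.nan, "A+B 金融科技", "零售"],
})

# Substring match; na=False turns the missing cell into False instead of NaN,
# so the result is a pure boolean mask that can be used for indexing
mask = df_text["业务一级分类"].str.contains("金融科技", na=False)
df_fintech = df_text[mask]

# regex=False matches "A+B" literally; as a regex, "+" would mean "one or more A"
mask_literal = df_text["业务一级分类"].str.contains("A+B", regex=False, na=False)

# A non-string column (here integers) must be converted before using .str
codes = pd.Series([1201, 3412, 999])
has_12 = codes.astype(str).str.contains("12")

print(df_fintech)
print(df_text[mask_literal])
print(codes[has_12])
```

Note that .astype(str) turns NaN into the string "nan", so on a column with missing values it is safer to filter or fill them first.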

  

3. Select rows whose column value is one of a given set of values: use isin

df.loc[df['column_name'].isin(some_values)]  # some_values must be list-like, e.g. a list
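A minimal runnable version of the isin filter; the `city`/`sales` data is hypothetical:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"city": ["Beijing", "Shanghai", "Shenzhen", "Chengdu"],
                   "sales": [5, 3, 8, 2]})

some_values = ["Beijing", "Shenzhen"]  # a list; a tuple, set or Series also works
selected = df.loc[df["city"].isin(some_values)]
print(selected)
```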

  

4. Combine multiple conditions with &. Because & has higher precedence than >= and <=, wrap each condition in parentheses.

df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
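A sketch that makes the precedence point concrete, with hypothetical bounds A and B. It also shows Series.between as an equivalent shortcut for an inclusive range, | for "or", and what happens when the parentheses are left out:

```python
import pandas as pd

df = pd.DataFrame({"score": [55, 60, 75, 90, 100]})
A, B = 60, 90  # hypothetical bounds

# Each comparison is parenthesized so it is evaluated before &
in_range = df.loc[(df["score"] >= A) & (df["score"] <= B)]

# Series.between is an equivalent shortcut; both bounds are inclusive by default
in_range_between = df.loc[df["score"].between(A, B)]

# | means "or"; the same parenthesization rule applies
outside = df.loc[(df["score"] < A) | (df["score"] > B)]

# Without parentheses, Python evaluates A & df["score"] first, and the
# resulting chained comparison asks for the truth value of a whole Series
try:
    df.loc[df["score"] >= A & df["score"] <= B]
    raised = False
except ValueError:
    raised = True

print(in_range)
print("ValueError without parentheses:", raised)
```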

  

5. Select rows whose column value is not equal to a given value (or values)

Invert the selection:

df.loc[df['column_name'] != 'some_value']
df.loc[~df['column_name'].isin(some_values)]  # ~ negates the mask; isin needs a list-like, so pass ['str1', 'str2'] rather than a bare string
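A runnable sketch of both negation forms with a made-up `status` column, including the error raised when a bare string is passed to isin:

```python
import pandas as pd

df = pd.DataFrame({"status": ["ok", "error", "ok", "timeout", "skipped"]})

# One value to exclude: plain !=
not_ok = df.loc[df["status"] != "ok"]

# Several values to exclude: negate isin with ~, passing a list
exclude = ["error", "timeout"]
kept = df.loc[~df["status"].isin(exclude)]

# A bare string is rejected: isin only accepts list-like objects
try:
    df["status"].isin("error")
    bare_string_ok = True
except TypeError:
    bare_string_ok = False

print(not_ok)
print(kept)
```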

 

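Example: a more efficient way to search a pandas string column for several keywords (an extension of method 2 above). Joining the keywords with '|' builds one regex alternation, so a single contains call checks all of them at once.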

 

import pandas as pd

# create regex pattern out of the list of words
positive_kw = '|'.join(['rise','positive','high','surge'])
negative_kw = '|'.join(['sink','lower','fall','drop','slip','loss','losses'])
neutral_kw = '|'.join(['flat','neutral'])

# creating some fake data for demonstration
words = [
        'rise high',
        'positive attitude',
        'something',
        'foo',
        'lowercase',
        'flat earth',
        'neutral opinion'
        ]

df = pd.DataFrame(data=words, columns=['words'])

df['positive'] = df['words'].str.contains(positive_kw).astype(int)
df['negative'] = df['words'].str.contains(negative_kw).astype(int)
df['neutral'] = df['words'].str.contains(neutral_kw).astype(int)

print(df)
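The '|' alternation above matches substrings and is case-sensitive, so 'lowercase' is flagged as negative because it contains 'lower'. A sketch of a stricter variant, assuming whole-word, case-insensitive matching is what you want; build_pattern is a hypothetical helper, not a pandas function:

```python
import re
import pandas as pd

def build_pattern(keywords):
    """Hypothetical helper: one regex that matches any keyword as a whole word."""
    # re.escape neutralizes regex metacharacters inside keywords;
    # (?:...) is a non-capturing group, \b marks a word boundary
    return r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"

negative_words = ['sink', 'lower', 'fall', 'drop', 'slip', 'loss', 'losses']

df = pd.DataFrame({"words": ["lowercase", "prices fall", "lower costs", "Big Loss"]})

# Plain substring alternation, as in the example above (case-sensitive)
df["negative_substring"] = df["words"].str.contains("|".join(negative_words)).astype(int)

# Whole-word, case-insensitive alternation
df["negative_word"] = df["words"].str.contains(build_pattern(negative_words), case=False).astype(int)

print(df)
```

The non-capturing group avoids the pandas warning about match groups that a plain (...) group would trigger in contains.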

  

 

6. Split the data with groupby and save each group to its own Excel file (get_group)

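import pandas as pd  # reading/writing .xlsx also requires openpyxl

file_name = "总表.xlsx"  # master workbook
df = pd.read_excel(file_name, skiprows=1)  # skip the first row; the header is read from row 2
group = df.groupby("列标题")  # "列标题" = the column to group by
for key in group.groups:  # one key per distinct (non-NaN) value
    sub_df = group.get_group(key)
    count = len(sub_df)
    # File name = group key + row count, e.g. "A3.xlsx"; str() allows non-string keys
    sub_df.to_excel(str(key) + str(count) + ".xlsx")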
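Iterating a GroupBy object yields (key, sub-DataFrame) pairs directly, which removes the need for get_group and for a separate list of unique values. A sketch under that approach; split_by_column and save_groups are hypothetical helper names, and index=False (which leaves the DataFrame index out of the file) is a choice you may want to drop:

```python
from pathlib import Path

import pandas as pd

def split_by_column(df, column):
    """Hypothetical helper: {file_name: sub-DataFrame}, one entry per distinct value.

    File names follow the scheme above: group key + row count + ".xlsx".
    """
    result = {}
    # Rows whose key is NaN are dropped by groupby by default (dropna=True)
    for key, sub_df in df.groupby(column):
        result[f"{key}{len(sub_df)}.xlsx"] = sub_df
    return result

def save_groups(df, column, out_dir="."):
    """Hypothetical helper: write every group to out_dir (needs openpyxl for .xlsx)."""
    for name, sub_df in split_by_column(df, column).items():
        sub_df.to_excel(Path(out_dir) / name, index=False)
```

For example, save_groups(pd.read_excel("总表.xlsx", skiprows=1), "列标题") would write one file per distinct value of 列标题.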

  

 

References: https://blog.csdn.net/weixin_43557139/article/details/109459352

https://www.coder.work/article/4980040

posted on 2023-04-26 10:40  pu369com
