Regular Expressions

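. matches any single character except a newline.
^ matches the start of the string.
$ matches the end of the string.
* matches the preceding element zero or more times.
+ matches the preceding element one or more times.
? matches the preceding element zero or one time. Placed after another quantifier (*?, +?, ??, {n,m}?), it makes that quantifier non-greedy: matching stops as soon as the rest of the pattern can succeed.
{n} matches the preceding element exactly n times.
{n,} matches the preceding element at least n times.
{n,m} matches the preceding element at least n and at most m times.
[] matches any one of the characters inside the brackets.
() creates a group.
\d matches a digit.
\w matches a word character (letter, digit, or underscore).
\s matches a whitespace character (space, tab, newline, and similar).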
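
The two roles of ? in the list above are easy to mix up, so here is a small sketch of the difference. The sample strings are made up for illustration; only the standard re module is used.

```python
import re

text = "<a><b>"

# Greedy: .* takes as much as it can, so the match runs to the last '>'.
greedy = re.findall(r"<.*>", text)

# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.findall(r"<.*?>", text)

# On its own, ? simply makes the preceding element optional.
optional = re.findall(r"colou?r", "color colour")

# {n,m} bounds the repeat count: \d{2,3} takes 2 or 3 digits at a time.
digits = re.findall(r"\d{2,3}", "7 42 123 4567")

print(greedy)    # ['<a><b>']
print(lazy)      # ['<a>', '<b>']
print(optional)  # ['color', 'colour']
print(digits)    # ['42', '123', '456']
```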

' ' Everything between the quotes is the pattern.

match

import re

pattern = r"hello"
text = "hello world"
# re.match only tries to match at the start of the string
match = re.match(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match")
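
Since () creates a group, the match object can also hand back each group separately. A minimal sketch, with a two-word pattern chosen only for illustration:

```python
import re

m = re.match(r"(\w+) (\w+)", "hello world")

whole = m.group(0)   # the entire match
first = m.group(1)   # text captured by the first group
both = m.groups()    # all captured groups as a tuple
span = m.span()      # (start, end) indices of the entire match

print(whole, first, both, span)
```

In real code, check that m is not None before calling its methods, as the snippet above does with if match.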

search

import re

pattern = r"world"
text = "hello world"
# re.search scans the whole string and returns the first match
search = re.search(pattern, text)
if search:
    print("Found:", search.group())
else:
    print("Not found")
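
The difference between match and search shows up when the same pattern is run through both. The sketch below also includes re.fullmatch, which is not covered above and is shown only for comparison.

```python
import re

text = "hello world"

# re.match only tries at position 0, so "world" is not found.
at_start = re.match(r"world", text)

# re.search scans forward and finds it at index 6.
anywhere = re.search(r"world", text)

# re.fullmatch requires the pattern to cover the whole string.
whole = re.fullmatch(r"hello", text)

print(at_start)         # None
print(anywhere.span())  # (6, 11)
print(whole)            # None
```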

findall

import re

pattern = r"ab"
text = "ababab"
# re.findall returns every non-overlapping match as a list
matches = re.findall(pattern, text)
print(matches)
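
When the pattern contains groups, findall returns the groups instead of the whole match, which is a common surprise. A short sketch on a made-up string, with re.finditer added for cases where the match positions are needed:

```python
import re

text = "a=1, b=22, c=333"

# With two groups, each result is a tuple of the captured groups.
pairs = re.findall(r"(\w)=(\d+)", text)

# re.finditer yields match objects, so start/end positions are available.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]

print(pairs)      # [('a', '1'), ('b', '22'), ('c', '333')]
print(positions)  # [('1', 2), ('22', 7), ('333', 13)]
```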

sub

import re

pattern = r"apple"
text = "apple banana apple cherry"
# re.sub replaces every match of the pattern with "orange"
new_text = re.sub(pattern, "orange", text)
print(new_text)
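
re.sub has a few options beyond a plain replacement string. The sketch below shows some of them on the same sample text; the variable names are illustrative only.

```python
import re

text = "apple banana apple cherry"

# count=1 limits the replacement to the first match.
first_only = re.sub(r"apple", "orange", text, count=1)

# \1 and \2 in the replacement refer back to the captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# The replacement can be a function that receives the match object.
shouted = re.sub(r"apple", lambda m: m.group().upper(), text)

# re.subn also reports how many replacements were made.
replaced, n = re.subn(r"apple", "orange", text)

print(first_only)   # orange banana apple cherry
print(swapped)      # world hello
print(shouted)      # APPLE banana APPLE cherry
print(replaced, n)  # orange banana orange cherry 2
```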

Using regex in a DataFrame

Put the regex in a function, then apply that function to the column.

import pandas as pd
import re

data = {'200': ['abc (123)', 'def (456)', 'ghi (789)']}
df = pd.DataFrame(data)

# Define a function that removes parentheses and whatever is inside them
def remove_content_in_parentheses(text):
    return re.sub(r'\([^)]*\)', '', text)

# Apply the function to the '200' column
df['200'] = df['200'].apply(remove_content_in_parentheses)

# Print the processed DataFrame
print(df)
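
An alternative worth knowing is the pandas string accessor, which applies a regex to a whole column without a helper function. This is a sketch rather than a replacement for the approach above. Note that the function above leaves a trailing space ('abc '), so the pattern here adds \s* to remove the space before the parenthesis as well; that tweak is an assumption about the desired output.

```python
import pandas as pd

df = pd.DataFrame({'200': ['abc (123)', 'def (456)', 'ghi (789)']})

# Series.str.replace with regex=True treats the first argument as a pattern.
# \s* also removes the whitespace in front of the opening parenthesis.
df['200'] = df['200'].str.replace(r'\s*\([^)]*\)', '', regex=True)

print(df['200'].tolist())  # ['abc', 'def', 'ghi']
```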

Demo:

Remove the content inside the parentheses

import re

txt = "目前关于这一疾病状态（current disease state）的临床数据（limited clinical data）仍然有限。通过观察不同患者群体（patient group）的临床特征（clinical characteristics）"

# Use a regex to remove the English text inside the full-width parentheses （）
cleaned_txt = re.sub(r'（.*?）', '', txt)

print(cleaned_txt)
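
The pattern above only handles full-width parentheses （）. If the text mixes full-width and ASCII parentheses, one possible approach is to combine both forms with |. The sample sentence below is a shortened, hypothetical variant of the one above.

```python
import re

txt = "临床数据（limited clinical data）仍然有限，患者群体(patient group)的特征"

# Either a full-width pair or an ASCII pair; [^...]* stops at the first
# closing parenthesis, so it behaves like the non-greedy .*? used above.
pattern = re.compile(r"（[^）]*）|\([^)]*\)")

cleaned = pattern.sub("", txt)
print(cleaned)  # 临床数据仍然有限，患者群体的特征
```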

posted on 2024-03-12 12:33 by 黑逍逍
