Implementing a Multi-Column Sliding Window with NumPy

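A recent project required computing a beta value for every day in a date range, where each day's input is the time series covering the year leading up to that day.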

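Beta is a financial metric that measures systematic risk. It is the regression coefficient (the slope) obtained by regressing the portfolio return Y on the benchmark return X:

\[ Y = \alpha + \beta X \]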
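
For a single window, the slope and intercept of this regression have a closed form: beta is cov(X, Y) / var(X) and alpha is mean(Y) - beta * mean(X). The following is a minimal sketch of that calculation in plain NumPy. The helper name `ols_beta_alpha` and the toy numbers are my own illustration, not part of the original code.

```python
import numpy as np

def ols_beta_alpha(x, y):
    """Closed-form simple regression of y on x; returns (beta, alpha)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_c = x - x.mean()                 # centered benchmark returns
    y_c = y - y.mean()                 # centered portfolio returns
    beta = (x_c @ y_c) / (x_c @ x_c)   # cov(X, Y) / var(X)
    alpha = y.mean() - beta * x.mean()
    return beta, alpha

# Toy returns that lie exactly on Y = 0.5 + 2 X (illustrative numbers only)
x = np.array([0.01, -0.02, 0.03, 0.00, 0.015])
y = 0.5 + 2.0 * x
beta, alpha = ols_beta_alpha(x, y)
```

On noisy data the result should agree with `np.polyfit(x, y, 1)`, which returns the slope first and the intercept second.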

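The solution: take the data from one year before the start date through the end date, then slide a window of 365 over it, starting from the end date.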
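
To make the windowing concrete, here is a small sketch of what "sliding a window over several columns" produces. It uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) on a tiny made-up array with a window of 3 instead of 365. The array contents are purely illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# 6 rows, 2 columns: column 0 plays the role of X, column 1 of Y
v = np.arange(12.0).reshape(6, 2)
window = 3

# Slide along axis 0. The window axis is appended last,
# so the raw result has shape (n_windows, n_cols, window).
windows = sliding_window_view(v, window, axis=0)

# Reorder to (n_windows, window, n_cols) so that windows[i] equals v[i:i + window]
windows = windows.transpose(0, 2, 1)

# n rows and a window of w rows give n - w + 1 windows
n_windows = v.shape[0] - window + 1
```

Each `windows[i]` is a view onto the original rows, so no data is copied while building the windows.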

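Because pandas' rolling.apply currently supports only simple single-column window calculations, I first tried a for loop that sliced the data by date, but it was far too slow.
After a fair amount of reading, I ended up with a NumPy version of the sliding window.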

Below is a simplified example.

import pandas as pd
import numpy as np
from numpy.random import randn
from sklearn.linear_model import LinearRegression
from numpy.lib.stride_tricks import as_strided as stride

linreg = LinearRegression()  # one shared model, refit on every window

# Demo data: 20 calendar days of random "returns"
s = '2000-01-01'
e = '2000-01-20'
date_list = pd.date_range(s, e).tolist()
n_days = len(date_list)

df = pd.DataFrame()
df['THE_DATE'] = date_list
df['X'] = randn(n_days)  # benchmark return
df['Y'] = randn(n_days)  # portfolio return

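def roll_np(df: pd.DataFrame, apply_func: callable, window: int, columns: list, **kwargs):
    """Apply apply_func to every `window`-row slice of df, all columns at once."""
    return_col_num = len(columns)
    # Newest row first, so row 0 of each window is that window's latest date
    # (this relies on the index order matching the date order).
    df = df.sort_index(ascending=False)
    v = df.values

    dim0, dim1 = v.shape
    stride0, stride1 = v.strides

    # Zero-copy view of shape (n_windows, window, n_cols): stepping along either
    # of the first two axes advances by one row. Read-only, since windows overlap.
    stride_values = stride(v, (dim0 - (window - 1), window, dim1),
                           (stride0, stride0, stride1), writeable=False)

    # Float buffer; rows that never receive a result stay NaN and are dropped below
    result_values = np.full((dim0, return_col_num), np.nan)
    for idx, values in enumerate(stride_values, window - 1):
        res = apply_func(values, **kwargs)
        result_values[idx] = res
    res = pd.DataFrame(data=result_values, columns=columns).dropna().sort_values('THE_DATE').reset_index(drop=True)
    # THE_DATE sits in the float buffer as nanoseconds since the epoch; convert it back
    res['THE_DATE'] = res['THE_DATE'].apply(lambda x: pd.to_datetime(x))

    return res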

def rolling_beta(df):
    # Window of 6 rows for this small demo; the real task uses 365
    res = roll_np(df, cal_beta, 6, ['THE_DATE', 'beta', 'alpha'])
    return res

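def cal_beta(narr):
    # narr is one window: column 0 = date, column 1 = X, column 2 = Y
    _date = narr[0, 0]  # first row of the window, i.e. its latest date
    X = narr[:, 1].astype(float).reshape(-1, 1)
    Y = narr[:, 2].astype(float).reshape(-1, 1)
    linreg.fit(X, Y)
    beta = linreg.coef_[0, 0]
    alpha = linreg.intercept_[0]  # intercept_ has shape (1,) because Y is 2-D
    # Return plain floats (date as nanoseconds since the epoch) so the row fits
    # a float array; mixing datetime64, a float and a 1-element array is fragile.
    return np.array([float(_date.value), beta, alpha])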

if __name__ == '__main__':
    res = rolling_beta(df)
    print(res)
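
As an aside, the per-window model fit can be avoided altogether for a simple one-factor regression. The sketch below is an alternative under my own assumptions, not the method used above: it builds the windows with `sliding_window_view` (NumPy 1.20 and later) and computes beta and alpha for all windows at once from cov(X, Y) / var(X). The function name `rolling_beta_vectorized` and its parameters are hypothetical, and the column names simply mirror the demo data.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_beta_vectorized(df, window, x_col="X", y_col="Y", date_col="THE_DATE"):
    """Rolling regression of y_col on x_col; rows are labelled with each window's last date."""
    df = df.sort_values(date_col).reset_index(drop=True)
    # One row per window: shape (n_windows, window)
    x = sliding_window_view(df[x_col].to_numpy(dtype=float), window)
    y = sliding_window_view(df[y_col].to_numpy(dtype=float), window)
    x_c = x - x.mean(axis=1, keepdims=True)
    y_c = y - y.mean(axis=1, keepdims=True)
    beta = (x_c * y_c).sum(axis=1) / (x_c ** 2).sum(axis=1)  # cov(X, Y) / var(X)
    alpha = y.mean(axis=1) - beta * x.mean(axis=1)
    return pd.DataFrame({
        date_col: df[date_col].to_numpy()[window - 1:],  # last date of each window
        "beta": beta,
        "alpha": alpha,
    })

# Demo data shaped like the example above (seeded so the run is repeatable)
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", "2000-01-20")
demo = pd.DataFrame({
    "THE_DATE": dates,
    "X": rng.standard_normal(len(dates)),
    "Y": rng.standard_normal(len(dates)),
})
out = rolling_beta_vectorized(demo, window=6)
```

This keeps the dates in their own datetime column, so nothing has to be packed into a float array and converted back. It only covers the single-regressor case; a general `apply_func` still needs something like `roll_np`.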

posted @ 2022-08-30 16:42  Franciszw