
LSTM on the Ascend Platform: An All-Framework Guide[1]

Covers LSTM models across multiple frameworks: Keras, TensorFlow, torch (PyTorch), and ms (MindSpore).

Original development environment: Huawei Cloud Ascend, with MindSpore 1.8.

Available resources: CPU, NPU AI Core, NPU HBM

My workflow (the relationship between NPU HBM and NPU AI Core is roughly analogous to that between CPU and GPU):

① Keras model training

② Switch to TensorFlow model training

③ Switch to PyTorch (torch) model training

④ Switch to torch + NPU AI Core model training

⑤ Switch to MindSpore + NPU HBM + NPU AI Core model training

To sum up: a single algorithm task ended up cycling through just about every framework; the only combination missing is torch + GPU, which I may add later.

★ The focus is on ④ and ⑤; the first three are fairly routine.

1 Keras Model Training

First, distinguish two ways of writing training code. Regardless of framework, there are two styles: ① "open-box" training (my own term, not an official name, just to tell them apart): you define the training function yourself and hand-write the logic of the entire training loop; ② "closed-box" training: you simply call model.fit / model.train, set the hyperparameters, and never touch the training logic. Across the five training setups here, I gradually moved from closed-box to open-box training.

This tutorial uses an LSTM for time-series prediction. It starts with a plain LSTM and later switches to a bidirectional LSTM with an attention mechanism, which roughly mirrors my own project workflow from simple to complex.

1.1 Dataset

The data source is two .npy files: X: 4000×400×32; Y: 4000×400×1.

X: 4000 samples, each of shape 400×32.

Y: effectively 4000×1; the middle dimension of 400 is an expansion. For example, if sample x1 corresponds to the target value y1, that value is expanded into [y1, y1, ..., y1] with shape 400×1.
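
As a minimal sketch of what that expansion looks like (y_raw is a hypothetical 1-D target array introduced here only for illustration):

import numpy as np

# hypothetical raw targets: one scalar per sample, shape (4000,)
y_raw = np.random.rand(4000).astype(np.float32)

# repeat each target 400 times along a new middle axis -> shape (4000, 400, 1)
Y = np.repeat(y_raw[:, None, None], 400, axis=1)
print(Y.shape)   # (4000, 400, 1)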

Preprocessing: at the very least, normalize; any other processing depends on the characteristics of your own data.

Split the dataset 3:1:1 = train : validation : test = 2400 : 800 : 800.

# numpy arrays are fine here
train_X, train_Y = ...
valid_X, valid_Y = ...
test_X, test_Y = ...
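
A minimal sketch of one way to fill in those placeholders (the file names X.npy / Y.npy and the min-max normalization are assumptions on my part, not from the original):

import numpy as np

X = np.load('X.npy').astype(np.float32)   # assumed file name, shape (4000, 400, 32)
Y = np.load('Y.npy').astype(np.float32)   # assumed file name, shape (4000, 400, 1)

# per-feature min-max normalization (one possible preprocessing choice)
X = (X - X.min(axis=(0, 1))) / (X.max(axis=(0, 1)) - X.min(axis=(0, 1)) + 1e-8)

# 3:1:1 split = 2400 : 800 : 800
train_X, train_Y = X[:2400], Y[:2400]
valid_X, valid_Y = X[2400:3200], Y[2400:3200]
test_X, test_Y = X[3200:], Y[3200:]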

# Build the model (functional API)
from keras.layers import Input, Dense, LSTM
from keras.models import Model
from keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard
import matplotlib.pyplot as plt

input1 = Input(shape=(train_X.shape[1], train_X.shape[2]))   # input shape: (timesteps, features)
lstm = input1
lstm = LSTM(units=128, dropout=0.2, return_sequences=True)(lstm)
lstm = LSTM(units=64, dropout=0.2, return_sequences=True)(lstm)
output = Dense(1)(lstm)   # output dimension per time step
model = Model(input1, output)
model.compile(loss='mse', optimizer='adam')     # metrics=["mae"]
model.summary()

# Configure model checkpointing (save the best weights by validation loss)
check_point = ModelCheckpoint(filepath='model.h5', monitor='val_loss', save_best_only=True, mode='auto')

# Early stopping
early_stop = EarlyStopping(monitor='val_loss', patience=5, mode='auto')

# Uncomment the next line to enable TensorBoard visualization
# tensorboard_callback = TensorBoard(log_dir="board")

# Train the model
history = model.fit(train_X, train_Y, batch_size=64, epochs=20, verbose=2, validation_data=(valid_X, valid_Y), callbacks=[check_point, early_stop])

# Plot the loss curves
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='valid')
plt.legend()
plt.show()

# Predict on the test set
y_pred = model.predict(test_X)
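
Since ModelCheckpoint writes only the best weights to model.h5, it is usually better to reload that file before evaluating; a small sketch:

from keras.models import load_model

# reload the best checkpoint saved during training, then predict
best_model = load_model('model.h5')
y_pred = best_model.predict(test_X)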

2 TensorFlow Model Training

Almost identical to Keras, except the standalone keras package is no longer needed; keep in mind that Keras runs on top of TensorFlow. If keras is incompatible with your environment, you can use tf.keras directly. Also, standalone Keras has little to do with the Ascend platform, whereas TensorFlow has some level of Ascend integration.

Only the model-construction part needs to be replaced:

from tensorflow.keras.layers import Input, Dense, LSTM, Embedding, Bidirectional, Dropout
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K

# Alternative style: the Sequential API with a bidirectional LSTM
model = Sequential()
model.add(Bidirectional(LSTM(64, return_sequences=True), input_shape=(400, 32), merge_mode='concat'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')     # metrics=["mae"]
model.summary()

# The original functional-API style from the Keras section works here as well

3 PyTorch (torch) Model Training

The three splits (train / validation / test):

(2400, 400, 32) (2400, 400, 1)
(800, 400, 32) (800, 400, 1)
(800, 400, 32) (800, 400, 1)

# The torch model uses an LSTM hidden layer plus an attention mechanism, so the label length equals the hidden size; if the hidden size is changed to 32, change the 16 here to 32 as well
(2400, 400, 32) (2400, 16, 1)
(800, 400, 32) (800, 16, 1)
(800, 400, 32) (800, 16, 1)

Preprocessing

import time
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# The data are still numpy arrays here
train_x, train_y = ...
val_x, val_y = ...
test_x, test_y = ...
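
# (Added sketch, not from the original) The labels loaded above start as
# (N, 400, 1) with the same value repeated along the middle axis; keeping only
# the first hidden_size = 16 positions makes them match the model's (N, 16, 1) output.
train_y = train_y[:, :16, :]
val_y = val_y[:, :16, :]
test_y = test_y[:, :16, :]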

train_x, train_y = [torch.Tensor(xx) for xx in train_x], [torch.Tensor(yy) for yy in train_y]
val_x, val_y = [torch.Tensor(xx) for xx in val_x], [torch.Tensor(yy) for yy in val_y]
test_x, test_y = [torch.Tensor(xx) for xx in test_x], [torch.Tensor(yy) for yy in test_y]

train_dataset = torch.utils.data.TensorDataset(torch.stack(train_x), torch.stack(train_y))
val_dataset = torch.utils.data.TensorDataset(torch.stack(val_x), torch.stack(val_y))
test_dataset = torch.utils.data.TensorDataset(torch.stack(test_x), torch.stack(test_y))

batch_size = 16

# Build the data loaders; shuffle the training and validation sets, but leave the test set unshuffled so each run's results are comparable
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size)

Model construction (compared with the earlier versions, this adds an attention mechanism and changes the unidirectional LSTM to a bidirectional one)

class AttentionLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, batch_size):
        super(AttentionLSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.batch_size = batch_size
        self.isattention = False
        # bidirectional: whether to use a bidirectional LSTM
        # note: with a single recurrent layer (num_layers=1), the dropout argument has no effect and only triggers a warning
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True, bidirectional=True, dropout=0.4)
        # if unidirectional, the *2 in the two layers below is not needed
        self.attention = nn.Linear(hidden_size * 2, hidden_size)
        self.out = nn.Linear(hidden_size * 2, output_size)
        
    def forward(self, input):
        # LSTM encoding; the initial hidden/cell states default to zeros
        # (h_0/c_0 of shape (2, batch_size, hidden_size) could be built and passed
        # as self.lstm(input, (h_0, c_0)) if a custom initialization were wanted)
        output, (hidden, cell) = self.lstm(input)
        # compute attention weights over the time steps
        attn_weights = torch.softmax(self.attention(output), dim=1)
        # compute attention vectors: (batch, hidden_size, 400) x (batch, 400, hidden_size*2) -> (batch, hidden_size, hidden_size*2)
        attn_vectors = torch.bmm(attn_weights.transpose(1, 2), output)

        # apply ReLU to the attention vectors and a linear layer to get the final output of shape (batch, hidden_size, 1)
        output = torch.relu(attn_vectors.squeeze(1))
        output = self.out(output)
        return output
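
A quick sanity check of the architecture with random data (a sketch I added; the shapes follow the splits above):

# shape check with dummy data (added sketch, not in the original)
_model = AttentionLSTM(input_size=32, hidden_size=16, output_size=1, batch_size=16)
_dummy = torch.randn(16, 400, 32)      # (batch, timesteps, features)
print(_model(_dummy).shape)            # expected: torch.Size([16, 16, 1])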

Hyperparameters and the training loop

input_size = 32
hidden_size = 16
output_size = 1

lr = 0.001
epochs = 10
early_patience = 3

# create the model
model = AttentionLSTM(input_size=input_size, hidden_size=hidden_size, output_size=output_size, batch_size=batch_size)

# define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

# train the model
train_loss_list_epoch = []
valid_loss_list_epoch = []
train_loss_list_steps = []
valid_loss_list_steps = []
valid_loss_min = float('inf')   # best validation loss seen so far
bad_epoch = 0
t1 = time.time()
for epoch in range(epochs):
    t2 = time.time()
    model.train()
    train_loss_list_step = []
    for i, (inputs, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        train_loss_list_step.append(loss.item())
        optimizer.step()
    
    model.eval()
    valid_loss_list_step = []
    with torch.no_grad():
        for _valid_X, _valid_Y in val_loader:
            pred_Y = model(_valid_X)
            loss = criterion(pred_Y, _valid_Y)
            valid_loss_list_step.append(loss.item())
    train_loss_cur = np.mean(train_loss_list_step)
    valid_loss_cur = np.mean(valid_loss_list_step)
    train_loss_list_epoch.append(train_loss_cur)
    valid_loss_list_epoch.append(valid_loss_cur)
    train_loss_list_steps += train_loss_list_step
    valid_loss_list_steps += valid_loss_list_step
    t3 = time.time()
    print('Epoch [{}/{}], Train-loss: {:.4f}, Val-loss: {:.4f}, Epoch-Time {:.1f}s, Total-Time {:.1f}s'.format(epoch+1, epochs, train_loss_cur, valid_loss_cur, t3-t2, t3-t1))
    # early stopping
    if valid_loss_cur < valid_loss_min:
        valid_loss_min = valid_loss_cur
        bad_epoch = 0
    else:
        bad_epoch += 1
        if bad_epoch >= early_patience:
            print(" The training stops early in epoch {}".format(epoch))
            break
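
The Keras version saved its best weights through ModelCheckpoint; the loop above only tracks the best validation loss. A sketch of the PyTorch equivalent (the file name model.pt is my own choice, not from the original):

# added sketch: save the best weights whenever validation loss improves
# (place this inside the "if valid_loss_cur < valid_loss_min:" branch above)
torch.save(model.state_dict(), 'model.pt')   # 'model.pt' is an assumed file name

# after training, restore the best weights before evaluation
model.load_state_dict(torch.load('model.pt'))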

Loss curve visualization

plt.plot(train_loss_list_epoch, c='blue', label='train')
plt.plot(valid_loss_list_epoch, c='green', label='valid')
plt.legend()
plt.show()

Model prediction

# Evaluate on the test set
y_true = []
y_pred = []
model.eval()
with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = model(inputs)
        y_pred += [np.mean(qa) for qa in outputs.tolist()]
        y_true += [np.mean(qb) for qb in labels.tolist()]
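
To turn those prediction and label lists into numbers, a simple sketch computing test MSE and MAE (the metric choice is mine; the model was trained on MSE):

y_true_arr = np.array(y_true)
y_pred_arr = np.array(y_pred)
mse = np.mean((y_true_arr - y_pred_arr) ** 2)
mae = np.mean(np.abs(y_true_arr - y_pred_arr))
print('Test MSE: {:.4f}, Test MAE: {:.4f}'.format(mse, mae))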