MetaEmb Model Reproduction
Update 2022.12.4
My first reproduction misread parts of the paper, and the author's source code does not shuffle the data during training; this update fixes both issues.
--------------------------------------------------------------------------------------------------------------------------------
The original paper is *Warm Up Cold-start Advertisements: Improving CTR Predictions via Learning to Learn ID Embeddings*.
The core idea of the paper is to use data from old ads to train a generator whose input is an ad's side features and whose output is that ad's ID embedding, and then use this generator to produce the initial ID embedding for new ads. I used the MovieLens-1M dataset for this experiment.
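To make the idea concrete, here is a minimal sketch of what such an embedding generator could look like. This is my own illustration, not the author's code: it assumes the generator reuses the base model's embedding tables for the ad's content features (field names `Genres` and `Title`, the pooling scheme, and the layer sizes are all assumptions).

```python
import torch
import torch.nn as nn

class MetaEmbGenerator(nn.Module):
    """Illustrative generator: ad side features in, an initial ID embedding out.
    Field names, pooling and layer sizes are assumptions for this sketch."""
    def __init__(self, emb_dim=16):
        super().__init__()
        self.fc = nn.Linear(emb_dim, emb_dim)

    def forward(self, embeddings, x, g, t):
        # embeddings: the base model's per-field embedding tables (an nn.ModuleDict)
        # g, t: padded genre / title-token id tensors of shape (batch, num_tokens)
        # x (which also carries the MovieID and user fields) is not used in this sketch
        genre_emb = embeddings['Genres'](g).mean(dim=1)
        title_emb = embeddings['Title'](t).mean(dim=1)
        pooled = (genre_emb + title_emb) / 2
        return torch.tanh(self.fc(pooled))  # generated initial ID embedding
```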
The overall process consists roughly of the following steps.
1. Pre-train
The first step is to pre-train a model; I chose DeepFM as the base model.
```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def pre_train(model, train_dataloader, args):
    """Pre-train the base model (DeepFM) on the old-ad data."""
    model.train()
    pretrain_optimizer = torch.optim.Adam(model.parameters(), args.learning_rate)
    loss_func = nn.BCELoss(reduction='mean')
    tot_loss = 0.0
    tot_batch = 0
    for i in range(1):  # the original source code only pre-trains for a single epoch
        for x, y, g, t in train_dataloader:  # x: categorical features, g: genres, t: title tokens
            x, y, g, t = x.to(device), y.float().to(device), g.to(device), t.to(device)
            pred_y = model(x, g, t)
            loss = loss_func(pred_y, y)
            pretrain_optimizer.zero_grad()
            loss.backward()
            pretrain_optimizer.step()
            tot_loss += loss.item()
            tot_batch += 1
    print('pretrain loss:{:.4}'.format(tot_loss / tot_batch))
```
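The `MetaMovie` dataset wrapper and the call site are not shown above. Below is a minimal sketch of how pre-training might be driven, assuming `MetaMovie` simply wraps the (features, label, genres, title) tensors; `model`, `args`, `batchsize` and `get_data` are taken as already defined as in the later snippets, and `train_old` is a hypothetical placeholder for the old-ad split.

```python
from torch.utils.data import Dataset, DataLoader

class MetaMovie(Dataset):
    """Sketch: wraps pre-built tensors of categorical features, labels, genres and title tokens."""
    def __init__(self, x, y, g, t):
        self.x, self.y, self.g, self.t = x, y, g, t

    def __len__(self):
        return len(self.y)

    def __getitem__(self, i):
        return self.x[i], self.y[i], self.g[i], self.t[i]

# hypothetical driver: train_old stands in for the prepared old-ad data
train_y, train_x, train_t, train_g = get_data(train_old)
train_dataloader = DataLoader(MetaMovie(train_x, train_y, train_g, train_t),
                              batch_size=batchsize, shuffle=True)  # shuffle=True, per the 2022.12.4 update
pre_train(model, train_dataloader, args)
torch.save(model.state_dict(), "./model_parameter.pkl")  # reloaded later in step 3
```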
Test results after step 1:
```
pretrain loss:0.5515
[pre-train]
    test-test loss: 0.666380
[pre-train]
    test-test auc:  0.709859
```
2. Generator train
The second step is to train the ID embedding generator. Each iteration draws two disjoint training batches and performs a two-step update; the total loss is a weighted sum of loss_a and loss_b. Some of the details here follow the earlier MeLU model and I am not certain they are correct. I ran 3 epochs in total, each time choosing two different training sets.
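Written out, the two-step loss that the code below implements (following the paper's meta-learning formulation; $h_\theta$ is the generator, $\phi$ the generated ID embedding, $\eta_{cold}$ corresponds to `args.cold_lr` and $\alpha$ to `args.alpha`):

$$\phi = h_\theta(\text{ad features}), \qquad l_a = \mathrm{BCE}\big(f(D_a;\ \phi),\ y_a\big)$$

$$\phi' = \phi - \eta_{cold}\,\nabla_{\phi}\, l_a, \qquad l_b = \mathrm{BCE}\big(f(D_b;\ \phi'),\ y_b\big)$$

$$l_{meta} = \alpha\, l_a + (1-\alpha)\, l_b$$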
```python
# one round of generator training: two disjoint sets of old-ad data, a and b
train_y_a, train_x_a, train_t_a, train_g_a = get_data(_train_a)
train_y_b, train_x_b, train_t_b, train_g_b = get_data(_train_b)
train_dataloader_a = DataLoader(MetaMovie(train_x_a, train_y_a, train_g_a, train_t_a),
                                batch_size=batchsize, num_workers=0)
train_dataloader_b = DataLoader(MetaMovie(train_x_b, train_y_b, train_g_b, train_t_b),
                                batch_size=batchsize, num_workers=0)
loss_func = nn.BCELoss(reduction='mean')
model.train()
metagen.train()
for (x_a, y_a, g_a, t_a), (x_b, y_b, g_b, t_b) in zip(train_dataloader_a, train_dataloader_b):
    x_a, y_a, g_a, t_a = x_a.to(device), y_a.float().to(device), g_a.to(device), t_a.to(device)
    x_b, y_b, g_b, t_b = x_b.to(device), y_b.float().to(device), g_b.to(device), t_b.to(device)
    # step 1: generate the ID embedding from the ad's side features and score batch a with it
    metaemb = metagen(model.embeddings, x_a, g_a, t_a)
    pred_a = model(x_a, g_a, t_a, model_type='generator_train', metaemb=metaemb)
    loss_a = loss_func(pred_a, y_a)
    # one simulated warm-up gradient step on the generated embedding
    grad = torch.autograd.grad(loss_a, metaemb, retain_graph=True)
    metaemb = metaemb - args.cold_lr * grad[0]
    # step 2: score the disjoint batch b with the adapted embedding
    pred_b = model(x_b, g_b, t_b, model_type='generator_train', metaemb=metaemb)
    loss_b = loss_func(pred_b, y_b)
    # weighted meta loss; `optimizer` (defined elsewhere) updates the generator
    loss = loss_a * args.alpha + loss_b * (1 - args.alpha)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
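For completeness, the hyperparameters referenced through `args` (learning_rate, cold_lr, warm_lr, alpha, mini_batchsize) are never shown in these snippets. The sketch below only illustrates how they might be declared; the default values are my assumptions, not the ones used in the runs reported here.

```python
import argparse

# Illustrative defaults only -- not the actual values used in my experiments.
parser = argparse.ArgumentParser()
parser.add_argument('--learning_rate', type=float, default=1e-3)  # pre-train learning rate
parser.add_argument('--cold_lr', type=float, default=1e-4)        # simulated warm-up step in generator training
parser.add_argument('--warm_lr', type=float, default=1e-3)        # lr for warming up new-ad ID embeddings
parser.add_argument('--alpha', type=float, default=0.1)           # weight between loss_a and loss_b
parser.add_argument('--mini_batchsize', type=int, default=20)     # records per new ad in the warm-up batches
args = parser.parse_args([])
```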
Test results after step 2:
```
[Meta-Embedding]
    test-test loss: 0.658307
[Meta-Embedding]
    test-test auc:  0.711670
[Meta-Embedding]
    test-test loss: 0.641029
[Meta-Embedding]
    test-test auc:  0.720815
```
3. New-ad embedding train
This step trains the base model and the Meta-Embedding model separately. Note that this training only updates the ID embeddings of the new ads. The only difference between the base model and the paper's model is how those ID embeddings are initialised: the base model initialises a new ad's ID embedding randomly, while the paper's model initialises it with the generator's output.
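The `warmup_train` helper called in the driver code below is not shown above. Here is a minimal sketch under my assumptions: it runs BCE training on one warm-up batch for a few epochs, and because the optimizer passed in only holds `model.embeddings['MovieID']`, nothing but the new ads' ID embeddings is updated. The `epochs=5` default reflects the note in the summary that one epoch was not enough in my reproduction.

```python
def warmup_train(model, optimizer, dataloader, epochs=5):
    """Sketch: warm up new-ad ID embeddings on one warm-up batch.
    The optimizer holds only model.embeddings['MovieID'], so only
    the new ads' ID embeddings actually change."""
    loss_func = nn.BCELoss(reduction='mean')
    model.train()
    tot_loss, tot_batch = 0.0, 0
    for _ in range(epochs):
        for x, y, g, t in dataloader:
            x, y, g, t = x.to(device), y.float().to(device), g.to(device), t.to(device)
            loss = loss_func(model(x, g, t), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            tot_loss += loss.item()
            tot_batch += 1
    print('warmup_train loss:{:.4}'.format(tot_loss / tot_batch))
```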
```python
# ---------- baseline: randomly initialised new-ad ID embeddings ----------
model.load_state_dict(torch.load("./model_parameter.pkl"))  # restore the pre-trained model
warmup_train_optimizer = torch.optim.Adam(model.embeddings['MovieID'].parameters(), lr=args.warm_lr)

# warm-up batch a
test_dataloader_a = DataLoader(MetaMovie(test_x_a, test_y_a, test_g_a, test_t_a), batch_size=batchsize, shuffle=True)
warmup_train(model, warmup_train_optimizer, test_dataloader_a)
test_auc_test, test_loss_test = predict_on_batch(model, test_dataloader)
print("[baseline]\n\ttest-test loss:\t{:.4f}, improvement: {:.2%}".format(
    test_loss_test, 1 - test_loss_test / logloss_base_cold))
print("[baseline]\n\ttest-test auc:\t{:.4f}, improvement: {:.2%}".format(
    test_auc_test, test_auc_test / auc_base_cold - 1))

# warm-up batch b
test_dataloader_b = DataLoader(MetaMovie(test_x_b, test_y_b, test_g_b, test_t_b), batch_size=batchsize, shuffle=True)
warmup_train(model, warmup_train_optimizer, test_dataloader_b)
test_auc_test, test_loss_test = predict_on_batch(model, test_dataloader)
print("[baseline]\n\ttest-test loss:\t{:.4f}, improvement: {:.2%}".format(
    test_loss_test, 1 - test_loss_test / logloss_base_cold))
print("[baseline]\n\ttest-test auc:\t{:.4f}, improvement: {:.2%}".format(
    test_auc_test, test_auc_test / auc_base_cold - 1))

# warm-up batch c
test_dataloader_c = DataLoader(MetaMovie(test_x_c, test_y_c, test_g_c, test_t_c), batch_size=batchsize, shuffle=True)
warmup_train(model, warmup_train_optimizer, test_dataloader_c)
test_auc_test, test_loss_test = predict_on_batch(model, test_dataloader)
print("[baseline]\n\ttest-test loss:\t{:.4f}, improvement: {:.2%}".format(
    test_loss_test, 1 - test_loss_test / logloss_base_cold))
print("[baseline]\n\ttest-test auc:\t{:.4f}, improvement: {:.2%}".format(
    test_auc_test, test_auc_test / auc_base_cold - 1))

print("*" * 100)
print(" " * 100)
print("*" * 100)

# ---------- Meta-Embedding: generator-initialised new-ad ID embeddings ----------
model.load_state_dict(torch.load("./model_parameter.pkl"))  # restore the same pre-trained model
warmup_train_optimizer = torch.optim.Adam(model.embeddings['MovieID'].parameters(), lr=args.warm_lr)
test_dataloader_a = DataLoader(MetaMovie(test_x_a, test_y_a, test_g_a, test_t_a), batch_size=batchsize, num_workers=0)

# use the generator to produce ID embeddings for the new ads and write them into the embedding table
with torch.no_grad():
    for x, y, g, t in test_dataloader_a:
        x, y, g, t = x.to(device), y.float().to(device), g.to(device), t.to(device)
        for i in range(x.shape[0] // args.mini_batchsize):
            idx = i * args.mini_batchsize
            mid = x[idx, 0]  # MovieID is the first column of x
            model.embeddings['MovieID'].weight.data[mid] = metagen(model.embeddings, x[idx:idx+1], g[idx:idx+1], t[idx:idx+1])

# cold-start performance with the generated embeddings, before any warm-up
test_auc_test, test_loss_test = predict_on_batch(model, test_dataloader)
print("[Init]\n\ttest-test loss:\t{:.4f}, improvement: {:.2%}".format(
    test_loss_test, 1 - test_loss_test / logloss_base_cold))
print("[Init]\n\ttest-test auc:\t{:.4f}, improvement: {:.2%}".format(
    test_auc_test, test_auc_test / auc_base_cold - 1))

# warm-up batch a
test_dataloader_a = DataLoader(MetaMovie(test_x_a, test_y_a, test_g_a, test_t_a), batch_size=batchsize, shuffle=True)
warmup_train(model, warmup_train_optimizer, test_dataloader_a)
test_auc_test, test_loss_test = predict_on_batch(model, test_dataloader)
print("[Meta-Embedding]\n\ttest-test loss:\t{:.4f}, improvement: {:.2%}".format(
    test_loss_test, 1 - test_loss_test / logloss_base_cold))
print("[Meta-Embedding]\n\ttest-test auc:\t{:.4f}, improvement: {:.2%}".format(
    test_auc_test, test_auc_test / auc_base_cold - 1))

# warm-up batch b
test_dataloader_b = DataLoader(MetaMovie(test_x_b, test_y_b, test_g_b, test_t_b), batch_size=batchsize, shuffle=True)
warmup_train(model, warmup_train_optimizer, test_dataloader_b)
test_auc_test, test_loss_test = predict_on_batch(model, test_dataloader)
print("[Meta-Embedding]\n\ttest-test loss:\t{:.4f}, improvement: {:.2%}".format(
    test_loss_test, 1 - test_loss_test / logloss_base_cold))
print("[Meta-Embedding]\n\ttest-test auc:\t{:.4f}, improvement: {:.2%}".format(
    test_auc_test, test_auc_test / auc_base_cold - 1))

# warm-up batch c
test_dataloader_c = DataLoader(MetaMovie(test_x_c, test_y_c, test_g_c, test_t_c), batch_size=batchsize, shuffle=True)
warmup_train(model, warmup_train_optimizer, test_dataloader_c)
test_auc_test, test_loss_test = predict_on_batch(model, test_dataloader)
print("[Meta-Embedding]\n\ttest-test loss:\t{:.4f}, improvement: {:.2%}".format(
    test_loss_test, 1 - test_loss_test / logloss_base_cold))
print("[Meta-Embedding]\n\ttest-test auc:\t{:.4f}, improvement: {:.2%}".format(
    test_auc_test, test_auc_test / auc_base_cold - 1))
```
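`predict_on_batch` evaluates AUC and logloss on the full test set and is also not shown above. A minimal sketch, assuming it collects predictions over the dataloader and scores them with scikit-learn:

```python
from sklearn.metrics import log_loss, roc_auc_score

def predict_on_batch(model, dataloader):
    """Sketch of the evaluation helper: returns (auc, logloss) on the given data."""
    model.eval()
    preds, labels = [], []
    with torch.no_grad():
        for x, y, g, t in dataloader:
            x, g, t = x.to(device), g.to(device), t.to(device)
            preds.append(model(x, g, t).cpu())
            labels.append(y.float())
    preds = torch.cat(preds).numpy()
    labels = torch.cat(labels).numpy()
    return roc_auc_score(labels, preds), log_loss(labels, preds)
```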
Final results: the part above the separator is from the base model, the part below is from the model proposed in the paper.
```
warmup_train loss:0.6564
[baseline]
    test-test loss: 0.6425, improvement: 3.59%
[baseline]
    test-test auc:  0.7316, improvement: 3.06%
warmup_train loss:0.6285
[baseline]
    test-test loss: 0.6218, improvement: 6.69%
[baseline]
    test-test auc:  0.7490, improvement: 5.51%
warmup_train loss:0.6118
[baseline]
    test-test loss: 0.6034, improvement: 9.45%
[baseline]
    test-test auc:  0.7632, improvement: 7.51%
****************************************************************************************************
****************************************************************************************************
[Init]
    test-test loss: 0.6410, improvement: 3.80%
[Init]
    test-test auc:  0.7208, improvement: 1.54%
warmup_train loss:0.6272
[Meta-Embedding]
    test-test loss: 0.6204, improvement: 6.90%
[Meta-Embedding]
    test-test auc:  0.7400, improvement: 4.25%
warmup_train loss:0.6015
[Meta-Embedding]
    test-test loss: 0.6031, improvement: 9.50%
[Meta-Embedding]
    test-test auc:  0.7552, improvement: 6.39%
warmup_train loss:0.5865
[Meta-Embedding]
    test-test loss: 0.5879, improvement: 11.78%
[Meta-Embedding]
    test-test auc:  0.7678, improvement: 8.16%
```
Summary
This reproduction was quite a challenge for me. The source code released by the authors appears to be TensorFlow 1, while I reproduced the model in PyTorch; since TF1 and PyTorch differ a great deal, most of the work was re-implemented from the paper itself. As the results show, the proposed model does outperform the base model, so the reproduction can be considered reasonably successful. One thing I still wonder about the original code: why not train the pre-trained model to convergence, or at least for a few more epochs, so that it performs better (someone on GitHub has raised the same question)? There were also some issues during the reproduction: in the final stage of training the new ads' ID embeddings, a single epoch falls far short of the results in the paper, and I needed five epochs to reach the numbers above, whereas the original code gets even better results with just one epoch. I am not sure whether this is due to a bug in my code or to differences between TensorFlow and PyTorch. The authors' code does not shuffle the data when training the base model or during the warm-up phase, which is probably the reason.
Below is a table summary of the reproduction results; all improvements are relative to the pre-trained model.
Logloss improvements (I flipped the sign so that improvements read as positive, the opposite of the paper's convention):
| Model | Warm-Up phase: a | Warm-Up phase: b | Warm-Up phase: c |
|---|---|---|---|
| deepFM(base) | 3.59% | 6.69% | 9.45% |
| MetaEmb | 6.90% | 9.50% | 11.78% |
AUC improvements:
| Model | Warm-Up phase: a | Warm-Up phase: b | Warm-Up phase: c |
|---|---|---|---|
| deepFM(base) | 3.06% | 5.51% | 7.51% |
| MetaEmb | 4.25% | 6.39% | 8.16% |
It is also clear that the less warm-up data is available, the larger MetaEmb's advantage over the base model, which supports the claim that MetaEmb really does alleviate the cold-start problem.
