MMOE Paper Notes

Multi-Task Modeling in Recommender Systems

In recommender systems, we often need to predict several targets at once (e.g. clicks, likes, watch time, comments). The most straightforward approach is to train a separate model for each target, but this has two drawbacks:

  1. High resource cost: multiple models must be maintained over time, which is expensive.
  2. The tasks are correlated, and modeling each one in isolation cannot exploit the correlations between targets.


Shared-Bottom Modeling

To address these problems, the shared-bottom architecture shown in figure (a) above appeared early on: the tasks share a common bottom network, and separate task towers on top predict the individual targets. A single model can then serve multiple targets, and sharing the bottom lets the tasks learn from each other's correlations, improving generalization. However, this design has its own problem: when two tasks differ substantially, training them through the same bottom can cause gradient conflicts, producing the "seesaw" phenomenon, where one task's metric improves during iteration while another task's metric degrades.
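The shared-bottom structure described above can be sketched in a few lines of Keras. This is a minimal illustration, not the paper's exact configuration; the layer sizes and tower depth are assumptions chosen for brevity.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model


def shared_bottom_model(input_dim, bottom_units, num_tasks):
    inputs = Input(shape=(input_dim,))
    # One bottom network shared by every task
    shared = Dense(bottom_units, activation='relu')(inputs)
    outputs = []
    for t in range(num_tasks):
        # Each task gets its own small tower and sigmoid head
        tower = Dense(8, activation='relu')(shared)
        outputs.append(Dense(1, activation='sigmoid', name=f'task_{t}')(tower))
    return Model(inputs=inputs, outputs=outputs)


model = shared_bottom_model(input_dim=10, bottom_units=16, num_tasks=2)
preds = model.predict(np.random.randn(4, 10), verbose=0)
print(len(preds), preds[0].shape)  # 2 (4, 1)
```

Because every task's gradients flow through the single `shared` layer, conflicting tasks pull its weights in different directions, which is exactly where the seesaw effect comes from.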


The MMOE Model

To address the seesaw problem, the paper proposes the MMOE model (shown in figure (c) above). The bottom of MMOE consists of multiple experts; each task has its own gate that learns a weight distribution over the experts, and each task's representation is the weighted sum of the expert networks' outputs under its gate.
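In other words, for task k the combined representation is f_k(x) = Σ_i g_k(x)_i · f_i(x), where g_k is task k's softmax gate and f_i are the expert outputs. The weighted sum can be sketched in plain NumPy (shapes below are toy values chosen for illustration):

```python
import numpy as np

# Toy shapes: batch=4, 3 experts of width 16, 2 tasks
batch, num_experts, expert_units, num_tasks = 4, 3, 16, 2

# Stand-in for the stacked expert outputs: (batch, num_experts, expert_units)
expert_outputs = np.random.randn(batch, num_experts, expert_units)

# One gate per task; softmax over the expert axis so weights sum to 1
gate_logits = np.random.randn(num_tasks, batch, num_experts)
gate_weights = np.exp(gate_logits)
gate_weights /= gate_weights.sum(axis=-1, keepdims=True)

# task_reprs[k, b] = sum_i gate_weights[k, b, i] * expert_outputs[b, i]
task_reprs = np.einsum('kbi,bie->kbe', gate_weights, expert_outputs)
print(task_reprs.shape)  # (2, 4, 16)
```

Because each task learns its own gate, tasks that need different experts can diverge in their weights instead of fighting over one shared bottom.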


Code Implementation

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
import numpy as np


# Define the expert network
def expert_network(input_dim, expert_units):
    inputs = Input(shape=(input_dim,))
    x = Dense(expert_units, activation='relu')(inputs)
    return Model(inputs=inputs, outputs=x)


# Define the gate network
def gate_network(input_dim, num_experts):
    inputs = Input(shape=(input_dim,))
    x = Dense(num_experts, activation='softmax')(inputs)
    return Model(inputs=inputs, outputs=x)


# Define the MMOE model
def MMOE(input_dim, num_experts, expert_units, num_tasks):
    inputs = Input(shape=(input_dim,))
    experts = [expert_network(input_dim, expert_units) for _ in range(num_experts)]
    gates = [gate_network(input_dim, num_experts) for _ in range(num_tasks)]

    expert_outputs = [expert(inputs) for expert in experts] # [(batch_size, expert_units)]
    expert_outputs = tf.stack(expert_outputs, axis=1) # (batch_size, num_experts, expert_units)

    task_outputs = []
    for gate in gates:
        gate_output = gate(inputs) # (batch_size, num_experts)
        gate_output = tf.expand_dims(gate_output, axis=-1) # (batch_size, num_experts, 1)
        weighted_expert_output = expert_outputs * gate_output
        task_output = tf.reduce_sum(weighted_expert_output, axis=1)
        task_outputs.append(task_output)

    final_outputs = [Dense(1, activation='sigmoid')(task_output) for task_output in task_outputs]
    model = Model(inputs=inputs, outputs=final_outputs)
    return model


# Generate synthetic data
input_dim = 10
num_samples = 1000
num_tasks = 2
num_experts = 3
expert_units = 16

X = np.random.randn(num_samples, input_dim)
y = [np.random.randint(0, 2, num_samples) for _ in range(num_tasks)]

# Split into training and test sets
train_size = int(num_samples * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train = [y_task[:train_size] for y_task in y]
y_test = [y_task[train_size:] for y_task in y]

# Build the MMOE model
model = MMOE(input_dim, num_experts, expert_units, num_tasks)

# Compile the model
model.compile(optimizer='adam',
              loss=['binary_crossentropy'] * num_tasks,
              metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.1)

# Evaluate the model
# results = [total_loss, per-task losses..., per-task accuracies...]
results = model.evaluate(X_test, y_test)
losses = results[:num_tasks+1]
accuracies = results[num_tasks+1:]

print(f"Test losses: {losses}")
print(f"Test accuracies: {accuracies}")
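Once such a model is trained, a common diagnostic is to inspect each task's gate distribution to see whether the tasks actually specialize on different experts. The snippet below is a hypothetical, standalone sketch (the layer names and sizes are assumptions, not part of the model above): it builds gate networks directly and shows that each row is a softmax distribution over the experts.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Illustrative sizes matching the example above
input_dim, num_experts, num_tasks = 10, 3, 2

inputs = Input(shape=(input_dim,))
# One softmax gate per task, exposed as a named model output
gate_outputs = [
    Dense(num_experts, activation='softmax', name=f'gate_{t}')(inputs)
    for t in range(num_tasks)
]
gate_model = Model(inputs=inputs, outputs=gate_outputs)

weights = gate_model.predict(np.random.randn(4, input_dim), verbose=0)
print(weights[0].shape)        # (4, 3): one expert distribution per sample
print(weights[0].sum(axis=1))  # each row sums to ~1.0
```

If the gates of two tasks collapse to near-identical distributions, MMOE degenerates toward a shared bottom; clearly different distributions indicate the experts have specialized.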


References

https://zhuanlan.zhihu.com/p/527185153

posted @ 2021-08-31 00:42  AI_Engineer