MMOE Paper Reading Notes
Multi-Task Modeling in Recommender Systems
In recommender systems we often need to predict multiple objectives at once (e.g., clicks, likes, watch time, comments). The most straightforward approach is to train a separate model for each objective, but this has two drawbacks:
- High resource cost: multiple models must be served and maintained over time.
- The tasks are correlated, and modeling each objective in isolation cannot exploit the correlations among them.
Shared-Bottom Modeling
To address these problems, the shared-bottom architecture shown in (a) of the figure above appeared early on: the tasks share a common bottom network, and separate task towers on top predict the different objectives. A single model thus covers multiple objectives, and sharing the bottom lets the tasks exploit their correlations, improving generalization. But it has a weakness: when two objectives diverge, their gradients can conflict in the shared bottom during training, producing the "seesaw" phenomenon, where one task's metric improves across iterations while another's degrades.
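For reference, here is a minimal shared-bottom sketch in the same tf.keras style as the MMOE implementation below; the function name `shared_bottom` and the layer size `bottom_units` are illustrative assumptions, not from the paper:

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

def shared_bottom(input_dim, bottom_units, num_tasks):
    inputs = Input(shape=(input_dim,))
    # One bottom network shared by every task: this is where gradient
    # conflicts between dissimilar tasks can arise.
    shared = Dense(bottom_units, activation='relu')(inputs)
    # One small tower head per task on top of the shared representation.
    outputs = [Dense(1, activation='sigmoid', name=f'task_{i}')(shared)
               for i in range(num_tasks)]
    return Model(inputs=inputs, outputs=outputs)
```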
The MMOE Model
To mitigate the seesaw problem, this paper proposes the MMOE model (shown in (c) of the figure above). The bottom of MMOE consists of multiple expert networks; each task has its own gate that learns a weight distribution over the experts, and each task's representation is the gate-weighted sum of the expert outputs, which is then fed into that task's tower.
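In the paper's notation, for task $k$ with $n$ experts $f_i$, tower $h^k$, and a gate $g^k$ parameterized by a trainable matrix $W_{gk} \in \mathbb{R}^{n \times d}$:

$$
y_k = h^k\!\left(f^k(x)\right), \qquad
f^k(x) = \sum_{i=1}^{n} g^k(x)_i \, f_i(x), \qquad
g^k(x) = \operatorname{softmax}(W_{gk}\, x)
$$

The softmax makes each gate a probability distribution over the experts, so each task selects its own mixture of shared experts instead of being forced through a single shared bottom.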
Code Implementation

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
import numpy as np

# Expert network: one hidden layer mapping the input to expert_units.
def expert_network(input_dim, expert_units):
    inputs = Input(shape=(input_dim,))
    x = Dense(expert_units, activation='relu')(inputs)
    return Model(inputs=inputs, outputs=x)

# Gate network: softmax over the experts, so the weights sum to 1.
def gate_network(input_dim, num_experts):
    inputs = Input(shape=(input_dim,))
    x = Dense(num_experts, activation='softmax')(inputs)
    return Model(inputs=inputs, outputs=x)

# MMOE model: shared experts, one gate and one tower head per task.
def MMOE(input_dim, num_experts, expert_units, num_tasks):
    inputs = Input(shape=(input_dim,))
    experts = [expert_network(input_dim, expert_units) for _ in range(num_experts)]
    gates = [gate_network(input_dim, num_experts) for _ in range(num_tasks)]

    expert_outputs = [expert(inputs) for expert in experts]  # list of (batch_size, expert_units)
    expert_outputs = tf.stack(expert_outputs, axis=1)        # (batch_size, num_experts, expert_units)

    task_outputs = []
    for gate in gates:
        gate_output = gate(inputs)                          # (batch_size, num_experts)
        gate_output = tf.expand_dims(gate_output, axis=-1)  # (batch_size, num_experts, 1)
        # Weighted sum of the expert outputs under this task's gate.
        weighted_expert_output = expert_outputs * gate_output
        task_output = tf.reduce_sum(weighted_expert_output, axis=1)  # (batch_size, expert_units)
        task_outputs.append(task_output)

    # One sigmoid tower head per task.
    final_outputs = [Dense(1, activation='sigmoid')(task_output) for task_output in task_outputs]
    return Model(inputs=inputs, outputs=final_outputs)

# Generate simulated data
input_dim = 10
num_samples = 1000
num_tasks = 2
num_experts = 3
expert_units = 16
X = np.random.randn(num_samples, input_dim)
y = [np.random.randint(0, 2, num_samples) for _ in range(num_tasks)]

# Split into training and test sets
train_size = int(num_samples * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train = [y_task[:train_size] for y_task in y]
y_test = [y_task[train_size:] for y_task in y]

# Build the MMOE model
model = MMOE(input_dim, num_experts, expert_units, num_tasks)

# Compile the model
model.compile(optimizer='adam',
              loss=['binary_crossentropy'] * num_tasks,
              metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.1)

# Evaluate the model: with num_tasks outputs, evaluate() returns
# [total_loss, loss_1, ..., loss_n, acc_1, ..., acc_n].
results = model.evaluate(X_test, y_test)
losses = results[:num_tasks + 1]
accuracies = results[num_tasks + 1:]
print(f"Test losses: {losses}")
print(f"Test accuracies: {accuracies}")
```
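As a quick sanity check (assuming the definitions above are in scope), a standalone gate network should emit softmax weights that sum to 1 across the experts:

```python
# Illustrative check, reusing gate_network from the listing above.
gate = gate_network(input_dim=10, num_experts=3)
weights = gate(np.random.randn(4, 10).astype('float32'))  # (4, 3)
print(weights.numpy().sum(axis=1))  # each row sums to ~1.0
```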