Contribution
- They propose a novel Multi-gate Mixture-of-Experts model which explicitly models task relationships.
- They conduct control experiments on synthetic data. They report how task relatedness affects training dynamics in multi-task learning and how MMoE improves both model expressiveness and trainability.
- They conduct experiments on real benchmark data and a large-scale production recommendation system with hundreds of millions of users and items.
Each expert is a feed-forward network.
The gating networks take the input features and output softmax gates that assemble the experts with different weights, allowing different tasks to utilize the experts differently.
The results of the assembled experts are then passed into the task-specific tower networks.
In this way, the gating networks for different tasks can learn different mixture patterns over the experts and thus capture the task relationships.
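To make the gate-weighted assembly concrete, here is a tiny NumPy sketch (the expert outputs, gate logits, and dimensions are made up for illustration) showing how two task-specific softmax gates mix the same three expert outputs into different combinations:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Three experts, each producing a 2-dimensional representation for one input (toy values).
expert_outputs = np.array([[0.2, 1.0],    # expert 1
                           [0.9, -0.5],   # expert 2
                           [0.1, 0.3]])   # expert 3

# Each task has its own gate logits over the three experts (toy values).
gate_logits = {"task A": np.array([2.0, 0.1, -1.0]),
               "task B": np.array([-1.0, 0.5, 2.0])}

for task, logits in gate_logits.items():
    weights = softmax(logits)          # softmax gate for this task
    mixed = weights @ expert_outputs   # weighted sum of expert outputs
    print(task, "gates:", weights.round(3), "mixed:", mixed.round(3))
```

Because the two gates put their weight on different experts, the two tasks receive different mixtures of the same shared expert outputs.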
- They conduct a synthetic experiment.
Datasets: UCI Census-income dataset
Modeling Approaches
$$
\begin{align}
y_k = h^k(f(x)) \tag{1} \\
y = \sum_{i=1}^{n} g(x)_i f_i(x) \tag{5}
\end{align}
$$
- Eq (1) is the Shared-bottom Multi-task Model.
- $K$ tasks,
- shared-bottom network $f$,
- $K$ tower networks $h^k$.
- Eq (5) is the Original MoE Model.
  - $ f_i $ is one of the $n$ expert networks,
  - $ g $ is the gating network that ensembles the outputs of all experts (see the sketch after this list).
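As a reference point before MMoE, below is a minimal PyTorch sketch of the shared-bottom model of Eq (1) and the original one-gate MoE layer of Eq (5); the class names, layer sizes, and ReLU activations are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SharedBottom(nn.Module):
    """Eq (1): y_k = h^k(f(x)) -- one shared bottom f, K task towers h^k."""
    def __init__(self, input_dim, bottom_dim, tower_dim, num_tasks):
        super().__init__()
        self.bottom = nn.Sequential(nn.Linear(input_dim, bottom_dim), nn.ReLU())
        self.towers = nn.ModuleList(
            nn.Sequential(nn.Linear(bottom_dim, tower_dim), nn.ReLU(),
                          nn.Linear(tower_dim, 1))
            for _ in range(num_tasks))

    def forward(self, x):
        shared = self.bottom(x)                           # f(x), shared by all tasks
        return [tower(shared) for tower in self.towers]   # y_k = h^k(f(x))


class OneGateMoE(nn.Module):
    """Eq (5): y = sum_i g(x)_i * f_i(x) -- a single softmax gate over n experts."""
    def __init__(self, input_dim, expert_dim, num_experts):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
            for _ in range(num_experts))
        self.gate = nn.Linear(input_dim, num_experts, bias=False)   # gate logits

    def forward(self, x):
        expert_outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, n, expert_dim)
        weights = torch.softmax(self.gate(x), dim=-1)                   # g(x), (B, n)
        return (weights.unsqueeze(-1) * expert_outs).sum(dim=1)         # weighted expert sum
```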
The new model is called the Multi-gate Mixture-of-Experts (MMoE) model, where the key idea is to substitute the shared-bottom network $ f $ in Eq (1) with the MoE layer in Eq (5).
$$
\begin{align}
y_k = h^k(f^k(x)) \tag{6} \\
f^k(x) = \sum_{i=1}^{n} g^k(x)_i f_i(x) \tag{7} \\
g^k(x) = \text{softmax}(W_{gk} \cdot x) \tag{8}
\end{align}
$$
- Eq (5) is modified into Eq (7), since $g^k$ is a separate gating network for each task $k$ (a minimal sketch follows this list).
- $ W_{gk} \in \mathbb{R}^{n \times d} $ is a trainable matrix.
- $n$ is the number of experts
- $d$ is the feature dimension
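Putting Eqs (6)-(8) together, here is a minimal PyTorch sketch of an MMoE layer with per-task gates and towers; the dimensions, names, and single-hidden-layer experts are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MMoE(nn.Module):
    def __init__(self, input_dim, expert_dim, tower_dim, num_experts, num_tasks):
        super().__init__()
        # n shared experts f_i, each a feed-forward network (assumed one hidden layer).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
            for _ in range(num_experts))
        # One gating network per task: g^k(x) = softmax(W_gk x), W_gk in R^{n x d} (Eq 8).
        self.gates = nn.ModuleList(
            nn.Linear(input_dim, num_experts, bias=False) for _ in range(num_tasks))
        # One tower network h^k per task (Eq 6).
        self.towers = nn.ModuleList(
            nn.Sequential(nn.Linear(expert_dim, tower_dim), nn.ReLU(),
                          nn.Linear(tower_dim, 1))
            for _ in range(num_tasks))

    def forward(self, x):
        # Shared expert outputs, stacked as (batch, n, expert_dim).
        expert_outs = torch.stack([e(x) for e in self.experts], dim=1)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            weights = torch.softmax(gate(x), dim=-1)                  # g^k(x), (batch, n)
            mixed = (weights.unsqueeze(-1) * expert_outs).sum(dim=1)  # f^k(x), Eq (7)
            outputs.append(tower(mixed))                              # y_k = h^k(f^k(x)), Eq (6)
        return outputs

# Usage: two task outputs for a batch of 32 examples with d = 16 input features.
model = MMoE(input_dim=16, expert_dim=8, tower_dim=4, num_experts=3, num_tasks=2)
y1, y2 = model(torch.randn(32, 16))
```

Compared with the shared-bottom sketch above, the only structural change is the per-task gate, so each task mixes the shared experts with its own weights $g^k(x)$.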