Reference materials
PyTorch implementation: https://hungyuling.com/blog/fast-mixture-of-experts-in-pytorch/
From Sebastian Raschka: https://magazine.sebastianraschka.com/p/10-ai-research-papers-2023?open=false#%C2%A7mixture-of-experts
- https://magazine.sebastianraschka.com/i/141130005/mixtral-of-experts
- https://magazine.sebastianraschka.com/i/139848187/mixture-of-experts
YouTube video:
- https://www.youtube.com/watch?v=7yR5ScbK1qk - history of MoE
Some of the key publications and repositories:
- The Sparsely-Gated Mixture-of-Experts Layer (2017): https://arxiv.org/abs/1701.06538
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (2020): https://arxiv.org/abs/2006.16668
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (2022): https://arxiv.org/abs/2211.15841
- Mixture-of-Experts Meets Instruction Tuning (2023): https://arxiv.org/abs/2305.14705
Furthermore, if you are interested in trying MoE LLMs, check out the OpenMoE repository, which implements and shares open MoE LLMs: https://github.com/XueFuzhao/OpenMoE
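
To make the linked material a bit more concrete, here is a minimal sketch of a sparsely-gated top-k MoE layer in PyTorch, in the spirit of the 2017 Sparsely-Gated Mixture-of-Experts paper listed above. The module names, dimensions, and the simple per-expert loop are illustrative assumptions of mine, not code taken from any of the linked implementations (which use more elaborate batching, capacity limits, and load-balancing losses).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    """Minimal sparsely-gated top-k mixture-of-experts layer (illustrative sketch)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network, as in a Transformer FFN block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )
        # The router produces one logit per expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for routing.
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        # Pick the top-k experts per token and renormalize their gate weights.
        logits = self.router(tokens)                         # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)   # (num_tokens, top_k)
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        # Loop over experts; each expert only processes the tokens routed to it.
        for expert_id, expert in enumerate(self.experts):
            mask = indices == expert_id                      # (num_tokens, top_k)
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            expert_out = expert(tokens[token_ids])
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert_out

        return out.reshape(batch, seq_len, d_model)


# Usage: drop-in replacement for a dense feed-forward block.
if __name__ == "__main__":
    moe = SparseMoE(d_model=64, d_hidden=256, num_experts=8, top_k=2)
    y = moe(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

The per-expert Python loop keeps the routing logic easy to follow; production implementations such as those in the MegaBlocks and OpenMoE repositories instead batch the dispatch with sparse or grouped matrix multiplications for speed.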