Reference materials #

PyTorch implementation: https://hungyuling.com/blog/fast-mixture-of-experts-in-pytorch/ (see the sketch at the end of this section)

From Sebastian Raschka's research review: https://magazine.sebastianraschka.com/p/10-ai-research-papers-2023?open=false#%C2%A7mixture-of-experts

YouTube video:

  1. https://www.youtube.com/watch?v=7yR5ScbK1qk - history of MoE

Some of the relevant publications and repositories:

  1. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017) https://arxiv.org/abs/1701.06538

  2. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (2020) https://arxiv.org/abs/2006.16668

  3. MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (2022) https://arxiv.org/abs/2211.15841

  4. Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models (2023) https://arxiv.org/abs/2305.14705

  5. If you are interested in trying MoE LLMs, also check out the OpenMoE repository, which implements and shares open MoE LLMs: https://github.com/XueFuzhao/OpenMoE
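
To make the core idea behind these references concrete, here is a minimal PyTorch sketch of a top-k sparsely gated MoE layer in the spirit of the Shazeer et al. (2017) paper above. It is an illustrative sketch only, not the implementation from the linked blog post; all class names and hyperparameters (e.g. `TopKMoE`, `num_experts=8`, `top_k=2`) are my own assumptions, and it omits production concerns such as load-balancing losses and expert parallelism.

```python
# Minimal sketch of a top-k sparsely gated MoE layer (after Shazeer et al., 2017).
# Names and hyperparameters are illustrative assumptions, not from the linked posts.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for per-token routing
        tokens = x.reshape(-1, x.size(-1))
        logits = self.gate(tokens)                          # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize over chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Which tokens routed to expert e, and in which of their top-k slots
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = TopKMoE(d_model=64, d_hidden=256)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

The loop over experts keeps the sketch readable; the "fast" implementations referenced above (e.g. MegaBlocks) instead batch tokens per expert with grouped or block-sparse kernels to avoid this Python-level loop.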