Reference materials
PyTorch implementation: https://hungyuling.com/blog/fast-mixture-of-experts-in-pytorch/
From Sebastian Raschka: https://magazine.sebastianraschka.com/p/10-ai-research-papers-2023?open=false#%C2%A7mixture-of-experts
- https://magazine.sebastianraschka.com/i/141130005/mixtral-of-experts
- https://magazine.sebastianraschka.com/i/139848187/mixture-of-experts
YouTube video:
- https://www.youtube.com/watch?v=7yR5ScbK1qk - history of MoE
Some of the key publications and repositories:
- The Sparsely-Gated Mixture-of-Experts Layer (2017): https://arxiv.org/abs/1701.06538
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (2020): https://arxiv.org/abs/2006.16668
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (2022): https://arxiv.org/abs/2211.15841
- Mixture-of-Experts Meets Instruction Tuning (2023): https://arxiv.org/abs/2305.14705
Furthermore, if you are interested in trying MoE LLMs, check out the OpenMoE repository, which implements and shares open MoE LLMs: https://github.com/XueFuzhao/OpenMoE
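
To make the linked material a bit more concrete, here is a minimal sketch of a sparsely-gated top-k MoE layer in PyTorch, in the spirit of the 2017 Sparsely-Gated Mixture-of-Experts paper listed above. The module names, dimensions, and the simple per-expert loop are illustrative assumptions of mine, not code taken from any of the linked implementations (which use more elaborate batching, capacity limits, and load-balancing losses).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    """Minimal sparsely-gated top-k mixture-of-experts layer (illustrative sketch)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network, as in a Transformer FFN block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )
        # The router produces one logit per expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for routing.
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        # Pick the top-k experts per token and renormalize their gate weights.
        logits = self.router(tokens)                         # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)   # (num_tokens, top_k)
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        # Loop over experts; each expert only processes the tokens routed to it.
        for expert_id, expert in enumerate(self.experts):
            mask = indices == expert_id                      # (num_tokens, top_k)
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            expert_out = expert(tokens[token_ids])
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert_out

        return out.reshape(batch, seq_len, d_model)


# Usage: drop-in replacement for a dense feed-forward block.
if __name__ == "__main__":
    moe = SparseMoE(d_model=64, d_hidden=256, num_experts=8, top_k=2)
    y = moe(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

The per-expert Python loop keeps the routing logic easy to follow; production implementations such as those in the MegaBlocks and OpenMoE repositories instead batch the dispatch with sparse or grouped matrix multiplications for speed.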