Engineering Blog

Efficient MoE Pre-training at Scale on 1K AMD GPUs with TorchTitan