At 11 nodes and 87 AMD Instinct MI355X GPUs, we delivered 1,042,110 tokens per second in Offline, 1,016,380 tokens per second in Server, and 785,522 tokens per second in Interactive. Scale-out efficiency reached 93% in Offline, 93% in Server, and 98% in Interactive. Offline is the more conventional scenario to scale out. Server and Interactive are harder because they must keep meeting their latency requirements as the cluster grows, which makes these results especially compelling.
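For readers who want to sanity-check figures like these, the sketch below shows one common way to compute scale-out efficiency: measured throughput divided by the throughput that perfect linear scaling from a smaller baseline would give. This definition is an assumption, since the text above does not state the exact formula or baseline used, and the function names (`scale_out_efficiency`, `implied_linear_ideal`, `per_gpu_throughput`) are hypothetical.

```python
# Minimal sketch, assuming efficiency = measured / (linearly scaled baseline).
# The formula and all names here are illustrative assumptions, not the
# benchmark's official methodology.

def scale_out_efficiency(measured_tps: float, baseline_tps: float,
                         baseline_gpus: int, gpus: int) -> float:
    """Measured throughput as a fraction of ideal linear scaling."""
    ideal_tps = baseline_tps * gpus / baseline_gpus
    return measured_tps / ideal_tps


def implied_linear_ideal(measured_tps: float, efficiency: float) -> float:
    """Throughput that perfect linear scaling would imply for a given efficiency."""
    return measured_tps / efficiency


def per_gpu_throughput(measured_tps: float, gpus: int) -> float:
    """Average tokens per second contributed by each GPU."""
    return measured_tps / gpus


# Reported cluster results from the text: (tokens/s, efficiency).
REPORTED = {
    "Offline": (1_042_110, 0.93),
    "Server": (1_016_380, 0.93),
    "Interactive": (785_522, 0.98),
}
GPUS = 87

if __name__ == "__main__":
    for scenario, (tps, eff) in REPORTED.items():
        print(f"{scenario}: {per_gpu_throughput(tps, GPUS):,.0f} tokens/s per GPU, "
              f"implied linear ideal {implied_linear_ideal(tps, eff):,.0f} tokens/s")
```

Under this definition, an efficiency of 93% means the cluster delivers 93% of what the same number of GPUs would produce if each one ran as fast as it does in the baseline configuration. The implied ideal values are derived from the reported numbers under that assumption and are not measured results.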