Back in the olden days, I took a course called "Large Scale Scientific Computing". It was mostly about multiplying large matrices. I didn't think it would be remotely applicable to anything commercial.
I've only read the first section, but this sounds really impressive: a saving of up to 50% of the up to 17% of training time when using the Muon optimiser, so up to roughly 8.5% of basically pure improvement with no downside.