From TFA: https://arxiv.org/pdf/2510.06189
> We term this approach as AI-Driven Research for Systems (ADRS), which iteratively generates, evaluates, and refines solutions.
> The central thesis of this paper is that a new class of AI-driven approaches, which we term AI-Driven Research for Systems (ADRS), is beginning to show promising results in automated algorithm discovery, and will ultimately prompt a re-evaluation of the traditional role of systems researchers.
The pattern might be a familiar trick to those experienced with this kind of problem — you can see my thoughts on it here: https://news.ycombinator.com/item?id=45688236#45689440
I agree that this is a fairly simple problem. Experienced engineers—or anyone who has faced similar challenges—can quickly come up with such solutions. The key point, however, is that others might get stuck in their research simply because they don’t realize these quick solutions exist (“I don’t know what I don’t know”). AI helps bridge that gap by making expert-level knowledge accessible to every researcher, allowing them to focus more on exploring the truly unknown parts.
EDIT: The chutzpah of downvoting this is striking. The paper says "surpasses highly optimized algorithms engineered by human experts to achieve a 5.0x speedup" and https://news.ycombinator.com/item?id=45689663 links to a 2024 paper where humans discovered a 4.2x speedup using a snake pattern. The 2024 paper is not cited.
What "AI" is best at is enabling theft without crediting the true creators
> First, we evaluate DeepSeek's open-source EPLB implementation. This employs a greedy bin-packing strategy: experts are sorted by load in descending order, and each is placed onto the least-loaded GPU that has capacity (Figure 3a, Example 1). While simple, the solution is slow because it is written in Python and uses a for-loop to perform a linear search for the best-fit GPU choice.
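For concreteness, the greedy baseline described there looks roughly like the sketch below. This is my own illustration of the described strategy, not DeepSeek's actual EPLB code; the function name, the `capacity` parameter, and the data layout are all invented.

    # Illustrative sketch of the greedy bin-packing described above
    # (not DeepSeek's actual code; names and signature are invented).
    def greedy_place(loads, num_gpus, capacity):
        placement = {}                 # expert id -> gpu id
        gpu_load = [0.0] * num_gpus    # accumulated load per GPU
        gpu_slots = [0] * num_gpus     # experts already placed per GPU
        # Sort experts by load, heaviest first.
        for expert, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
            # Linear search for the least-loaded GPU that still has capacity
            # (this per-expert scan is the for-loop the paper calls out).
            best = min((g for g in range(num_gpus) if gpu_slots[g] < capacity),
                       key=lambda g: gpu_load[g])
            placement[expert] = best
            gpu_load[best] += load
            gpu_slots[best] += 1
        return placement

Replacing that per-expert linear scan with a heap or a vectorized argmin is presumably where most of the easy speedup comes from.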
This is because, when considering a load-balancing algorithm, unless the work being done (in this case by the GPU) lasts only a few ms, the speed of the load-balancing algorithm itself will never be the bottleneck. The post does not mention whether this is the case at all.
Also, I don't want to sound rude, but if all they managed to get is a 5x speedup over a simple Python algorithm, I don't think this is impressive at all...? Any rewrite of the 'dumb' algorithm in a language with more control over memory layout and cache locality should result in much better results.
The original algorithm was provided by DeepSeek, and our optimized implementation achieves a 92× speedup over it. The 5x number is a comparison against another baseline that has not yet been disclosed.
When integrating EPLB into vLLM, I discovered, somewhat unexpectedly, that the open-source algorithm consumes nearly half of the total time of a rearrangement step, with the remaining time spent transferring weights across GPUs. To address this, I applied OpenEvolve to the algorithm, setting the primary objective to improve speed while maintaining the same balance factor. It performed remarkably well. With additional optimizations to the weight transfer, the overall overhead is now almost negligible.
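To make that objective concrete, an evaluator for this kind of search could look roughly like the sketch below. This is an assumption-laden illustration: the `rebalance(loads, num_gpus)` entry point, the metric names, and the scoring rule are mine, not the actual OpenEvolve setup used here.

    import importlib.util
    import time

    # Hypothetical evaluator sketch: time a candidate rebalancing program and
    # only reward speed if it keeps the balance factor at least as good as the
    # baseline. The module interface and metric names are assumptions.
    def evaluate(program_path, loads, num_gpus, target_balance):
        spec = importlib.util.spec_from_file_location("candidate", program_path)
        candidate = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(candidate)

        start = time.perf_counter()
        placement = candidate.rebalance(loads, num_gpus)   # expert id -> gpu id
        runtime_ms = (time.perf_counter() - start) * 1000.0

        per_gpu = [0.0] * num_gpus
        for expert, gpu in placement.items():
            per_gpu[gpu] += loads[expert]
        balance = (sum(per_gpu) / num_gpus) / max(per_gpu)  # average / maximum

        score = 1.0 / runtime_ms if balance >= target_balance else 0.0
        return {"runtime_ms": runtime_ms, "balance_factor": balance, "score": score}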
Edit: also, a comparison against a fixed policy (nothing is faster than 0 ms of rebalancing!) and a random policy might be informative if your intent is to publish this as an improvement on the object problem, not just a demonstration of ADRS.
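For what it's worth, the random baseline could be as small as the sketch below (purely illustrative; names are made up), with the fixed policy simply being "keep the current placement and spend 0 ms".

    import random

    # Illustrative random-policy baseline (not from the paper): assign each
    # expert to a random GPU that still has capacity, then compare the
    # resulting balance factor and runtime against the evolved algorithm.
    def random_place(loads, num_gpus, capacity, seed=0):
        rng = random.Random(seed)
        slots = [0] * num_gpus
        placement = {}
        for expert in loads:
            gpu = rng.choice([g for g in range(num_gpus) if slots[g] < capacity])
            placement[expert] = gpu
            slots[gpu] += 1
        return placement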
https://softwaredoug.com/blog/2025/10/19/agentic-code-genera...
The odds of that working, though, are of course pretty near 0. But theoretically, it could happen.
It's a remote possibility, but it is a possibility, isn't it?
A sufficiently advanced discovery in, say, mathematics can only be understood by other mathematicians. Does that make it less of a discovery? So what's wrong if a machine discovers something that can only be analysed and proved by other machines?
> On average, it takes about 540 ms to re-balance the experts and achieves a load balance factor of 0.66 (calculated as the ratio of average to maximum tokens generated per GPU).
> ...
> We also consider a non-public reference implementation from a frontier lab that we have access to. This implementation avoids explicit iteration and reduces the rebalancing algorithm runtime to 19.6 ms while achieving the same balance factor as the open-source algorithm.
> ...
> The resulting algorithm matches the load balance factor of the other baselines while reducing runtime to just 3.7 ms, yielding a 5.0x speedup over the internal reference implementation.
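The balance factor quoted above is just average per-GPU load divided by maximum per-GPU load, e.g. (my own toy numbers, not from the paper):

    # Balance factor as defined above: average / maximum tokens per GPU.
    tokens_per_gpu = [1200, 900, 1500, 1000]
    balance_factor = (sum(tokens_per_gpu) / len(tokens_per_gpu)) / max(tokens_per_gpu)
    print(balance_factor)  # ~0.77; closer to 1.0 means better balanced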
Actually, I think the evaluator will be the most important part in making the whole pipeline work.
We are in the absolute worst timeline.