CUDA-L2: Surpassing cuBLAS performance for matrix multiplication through RL

56 points by dzign 3 hours ago | 4 comments | github.com

j2kun 2 hours ago

They claim the algorithm "discovered" the new techniques, but the methods described in section 5 do not seem all that novel to me. It smells like it could be "laundering" the literature [1] and reshuffling existing techniques. This is not inherently a bad thing, but I would hope that if it is borrowing existing techniques, the appropriate citation would eventually make it into this paper.

[1]: https://www.argmin.net/p/lore-laundering-machines

Q6T46nT668w6i3m 1 hour ago

You’re not kidding. I just looked. There isn’t anything novel in that section. I assumed from the description that they had found novel methods, but this is standard GPU Gems advice.

alyxya 2 hours ago

There generally aren't new techniques when optimizing something ubiquitous. Instead, there are a lot of ways to apply existing techniques to create new and better results. Most ideas are built on top of the same foundational principles.

slashdave 13 minutes ago

I am not sure about that. However, what is clear is that if there is a new technique, it will not be found by this LLM.

AlexCoventry 2 hours ago

In the future, we will all be Jürgen Schmidhuber. :-)

alyxya 2 hours ago

The chart confused me because I expected to see performance numbers for CUDA-L2 compared to the others, but instead it shows the percentage speedup of CUDA-L2 over each of them. In some sense, the bar chart effectively inverts the performance of torch.matmul and cuBLAS: the taller a baseline's bar, the slower that baseline is relative to CUDA-L2. 0% on the bar chart would only mean equal performance.
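For anyone else trying to read the chart, here is a minimal sketch of how a speedup percentage maps back to relative performance. It assumes the chart defines speedup as `baseline_time / candidate_time - 1`, which the thread does not confirm; the function names and the timings are made up for illustration and are not measurements from the repository.

```python
def speedup_percent(baseline_ms: float, candidate_ms: float) -> float:
    """Percent speedup of the candidate over the baseline.

    Assumed definition: (baseline_time / candidate_time - 1) * 100.
    """
    return (baseline_ms / candidate_ms - 1.0) * 100.0


def baseline_relative_throughput(speedup_pct: float) -> float:
    """Baseline throughput as a fraction of the candidate's throughput."""
    return 1.0 / (1.0 + speedup_pct / 100.0)


# Hypothetical timings in milliseconds (illustrative only).
candidate_ms = 1.00
baselines_ms = {"baseline_a": 1.25, "baseline_b": 1.10}

for name, t in baselines_ms.items():
    pct = speedup_percent(t, candidate_ms)
    frac = baseline_relative_throughput(pct)
    # A taller bar (larger pct) means a slower baseline (smaller frac).
    print(f"{name}: bar = {pct:.1f}%, baseline runs at {frac:.2f}x the candidate")
```

Under that assumed definition, a 0% bar means the two are equally fast, and a 25% bar means the baseline reaches 0.8x of the candidate's throughput.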

stonogo 2 hours ago

Am I reading this wrong, or does this only support FP16 inputs and compare its performance against an FP32 solver?
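Whether the benchmark really mixes precisions is not settled in this thread, but the question matters because the two formats are not interchangeable. A minimal sketch using only the standard library (the `struct` format characters `e` and `f` are IEEE 754 binary16 and binary32), with helper names that are hypothetical:

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round a Python float (binary64) to the given struct format and back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def relative_error(fmt: str, x: float) -> float:
    """Relative rounding error incurred by storing x in the given format."""
    return abs(round_to(fmt, x) - x) / abs(x)


x = 0.1
err_fp16 = relative_error("<e", x)  # IEEE 754 half precision
err_fp32 = relative_error("<f", x)  # IEEE 754 single precision

# Unit roundoff: 2**-11 for FP16, 2**-24 for FP32.
print(f"FP16 relative error: {err_fp16:.3e} (bound {2.0**-11:.3e})")
print(f"FP32 relative error: {err_fp32:.3e} (bound {2.0**-24:.3e})")
```

An FP16 kernel moves half the bytes per element and tolerates far larger rounding error, so a speed comparison is only apples-to-apples if the baseline is run with the same input and accumulation types.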

bgwalter 2 hours ago

-4 -4 -4 -4 -4

krapht 2 hours ago

This is a standard which few kernels will ever meet. I'd say requiring a numerical proof is the same as requiring no proof at all - because it won't ever happen unless you're validating silicon or something equally expensive.

Q6T46nT668w6i3m 1 hour ago

I guess it depends on your definition of proof, but I’d say the reasoning and justification sections of a TOMS article qualify, and that’s a standard nearly every popular library meets.