Audio is the one area small labs are winning
57 points | 2 days ago | 5 comments | amplifypartners.com | HN
tl2do
42 minutes ago
This matches my experience. In Kaggle audio competitions, I've seen many competitors struggle with basics like proper PCM filtering: anti-aliasing before downsampling, handling spectral leakage, and so on.

Audio really is a blue ocean compared to text/image ML. The barriers aren't primarily compute or data - they're knowledge. You can't scale your way out of bad preprocessing or codec choices.

When 4 researchers can build Moshi from scratch in 6 months while big labs consider voice "solved," it shows we're still in a phase where domain expertise matters more than scale. There's an enormous opportunity here for teams who understand both ML and signal processing fundamentals.
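
For readers less familiar with the "anti-aliasing before downsampling" step mentioned above, here is a minimal sketch in pure Python. It is not taken from the article or from any competition entry. The function names (`lowpass_taps`, `decimate`), the 101-tap Hamming-windowed sinc filter, and the 0.45 cutoff margin are all illustrative assumptions; in practice a library routine such as `scipy.signal.decimate` would be the usual choice.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Windowed-sinc FIR low-pass; cutoff is in cycles per sample (0 < cutoff < 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (a sinc), with the x == 0 limit handled.
        if x == 0:
            h = 2 * cutoff
        else:
            h = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # The Hamming window tapers the truncated sinc to lower its sidelobes.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def decimate(samples, factor, num_taps=101):
    """Low-pass filter the signal, then keep every factor-th sample."""
    # Assumed margin: put the cutoff slightly below the new Nyquist frequency.
    taps = lowpass_taps(num_taps, 0.45 / factor)
    half = (num_taps - 1) // 2
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k  # centered convolution (no delay for odd num_taps)
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out


def rms(xs):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))
```

With a 16 kHz signal and a factor of 4, a 3 kHz tone would alias to 1 kHz if the samples were simply subsampled (`samples[::4]`); with the filter in place it should be strongly attenuated instead, while a 500 Hz tone passes through nearly unchanged.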

dkarp
1 hour ago
There's too much noise at large organizations
echelon
21 minutes ago
They're focused on soaking up big money first.

They'll optimize down the stack once they've sucked all the oxygen out of the room.

Little players won't be able to grow through the ceiling the giants create.

bossyTeacher
1 hour ago
Surprised ElevenLabs is not mentioned
krackers
6 minutes ago
amelius
1 hour ago
Probably because the big companies have their focus elsewhere.
giancarlostoro
51 minutes ago
OpenAI being the death star and audio AI being the rebels is such a weird comparison, like what? Wouldn't the real rebels be the ones running their own models locally?
tl2do
37 minutes ago
True, but there's a fun irony: the Rebels' X-Wings are powered by GPUs from a company that's... checks relationships ...also supplying the Empire.

NVIDIA's basically the galaxy's most successful arms dealer, selling to both sides while convincing everyone they're just "enabling innovation." The real rebels would be training audio models on potato-patched RP2040s. Brave souls, if they exist.
