FilterHN

Ask HN: When has a "dumb" solution beaten a sophisticated one for you?

6 points | 17 hours ago | 10 comments

Recently built something where simple domain-specific heuristics crushed a fancy ML approach I assumed would win. This has me thinking about how often we reach for complex tools when simpler ones would work better. Occam's razor moments.

Anyone have similar stories? Curious about cases where knowing your domain beat throwing compute at the problem.

▲

atrettel

3 hours ago

I recently wrote a command-line full-text search engine [1]. I needed to implement an inverted index. I chose what seems like the "dumb" solution at first glance: a trie (prefix tree).

There are "smarter" solutions like radix tries, hash tables, or even skip lists, but for any design choice, you also have to examine the tradeoffs. A goal of my project is to make the code simpler to understand and less of a black box, so a simpler data structure made sense, especially since other design choices would not have been all that much faster or used that much less memory for this application.

I guess the moral of the story is to just examine all your options during the design stage. Machine learning solutions are just that, another tool in the toolbox. If another simpler and often cheaper solution gets the job done without all of that fuss, you should consider using it, especially if it ends up being more reliable.

[1] https://github.com/atrettel/wosp

▲

bawis

2 hours ago

What body of knowledge (books, tutorials etc) did you use while developing it?

▲

atrettel

1 hour ago

Before I started the project, I was already vaguely familiar with the notion of an inverted index [1]. That small bit of knowledge meant that I knew where to start looking for more information and saved me a ton of time. Inverted indices form the bulk of many search engines, with the big unknown being how you implement them. I just had to find an adequate data structure for my application.

To figure that out, I remember searching for articles on how to implement inverted indices. Once I had a list of candidate strategies and data structures, I used Wikipedia supplemented by some textbooks like Skiena's [2] and occasionally some (somewhat outdated) information from NIST [3]. I found Wikipedia quite detailed for all of the data structures for this problem, so it was pretty easy to compare the tradeoffs between different design choices here. I originally wanted to implement the inverted index as a hash table but decided to use a trie because it makes wildcard search easier to implement.
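To make the wildcard point concrete, here is a minimal sketch of a trie-backed inverted index. It is not the actual wosp code; the class and method names (`TrieIndex`, `search_prefix`, and so on) are made up for illustration, and it only handles a trailing wildcard like `comp*`.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # next character -> TrieNode
        self.postings = set()  # IDs of documents containing the word that ends here


class TrieIndex:
    """Hypothetical inverted index: each indexed word is a path in the trie."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, word, doc_id):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.postings.add(doc_id)

    def _find(self, prefix):
        # Walk down the trie; return None if the prefix is not present.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        # Exact match: only the postings stored at the final node.
        node = self._find(word)
        return set(node.postings) if node is not None else set()

    def search_prefix(self, prefix):
        # Trailing wildcard: union of all postings in the subtree under the prefix.
        start = self._find(prefix)
        result = set()
        stack = [start] if start is not None else []
        while stack:
            node = stack.pop()
            result |= node.postings
            stack.extend(node.children.values())
        return result


index = TrieIndex()
docs = {1: "compile the code", 2: "compute the index", 3: "search the index"}
for doc_id, text in docs.items():
    for word in text.split():
        index.add(word, doc_id)

exact = index.search("index")           # documents containing "index"
wildcard = index.search_prefix("comp")  # documents matching "comp*"
```

With a hash table, the `comp*` query would need a scan over every key; in the trie it is just a walk to one node plus a subtree traversal.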

After I developed most of the backend, I looked for books on "information retrieval" in general. I found a history book (Bourne and Hahn 2003) on the development of these kinds of search systems [4]. I read some portions of this book, and that helped confirm many of the design choices that I made. I actually was just doing what people traditionally did when they first built these systems in the 1960s and 1970s, albeit with more modern tools and much more information on hand.

The harder part of this project for me was writing the interpreter. I actually found YouTube videos on how to write recursive descent parsers to be the most helpful there, particularly this one [5]. Textbooks were too theoretical and not concrete enough, though Crafting Interpreters was sometimes helpful [6].
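For anyone who has not seen the technique, a recursive descent parser is roughly one function per grammar rule. The sketch below parses a small boolean query language that I am assuming purely for illustration (`AND`, `OR`, `NOT`, parentheses); it is not the wosp query syntax, and all names are hypothetical.

```python
import re

# Tokens are parentheses or runs of characters that are neither space nor parenthesis.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


class Parser:
    # Assumed grammar:
    #   expr   := term ("OR" term)*
    #   term   := factor ("AND" factor)*
    #   factor := "NOT" factor | "(" expr ")" | WORD

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() == "OR":
            self.advance()
            node = ("OR", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "AND":
            self.advance()
            node = ("AND", node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token == "NOT":
            return ("NOT", self.factor())
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in (")", "AND", "OR"):
            raise SyntaxError(f"unexpected token {token!r}")
        return ("WORD", token)


def parse_query(query):
    """Turn a query string into a nested-tuple syntax tree."""
    return Parser(TOKEN.findall(query)).parse()


tree = parse_query("cat AND (dog OR NOT bird)")
```

Because `term` is called from `expr`, `AND` binds tighter than `OR` without any explicit precedence table, which is most of what makes this style easy to follow.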

[1] https://en.wikipedia.org/wiki/Inverted_index

[2] https://doi.org/10.1007/978-3-030-54256-6

[3] https://xlinux.nist.gov/dads/

[4] https://doi.org/10.7551/mitpress/3543.001.0001

[5] https://www.youtube.com/watch?v=SToUyjAsaFk

[6] https://craftinginterpreters.com/

▲

conditionnumber

5 hours ago

Still happens all the time in certain finance tasks (e.g., trying to predict stock prices), but I'm not sure how long that will hold. As for why that might be, I don't think I can do any better than linking to this comment about a comment about your question: <https://news.ycombinator.com/item?id=45306256>.

I suspect that locating the referenced comment would require a semantic search system that incorporates "fancy models with complex decision boundaries". A human applying simple heuristics could use that system to find the comment.

In the "Dictionary of Heuristic" chapter, Polya's "How to Solve It" says this: *The feeling that harmonious simple order cannot be deceitful guides the discoverer both in mathematical and in other sciences, and is expressed by the Latin saying simplex sigillum veri (simplicity is the seal of truth).*

▲

acheong08

3 hours ago

For me, CP-SAT is the "dumb" solution that works in a lot of situations. Whenever a hackathon has a problem definable in constraints, that tends to be the first path I take, and it generally scores in the top 5.

▲

al_borland

7 hours ago

I have a silly little internal website I use for bookmarks, searching internal tools, and some little utilities. I keep getting pressure to put it into our heavy and bespoke enterprise CI/CD process. I’ve seen people quit over trying to onboard into this thing… more than one. It’s complete overkill for my silly little site.

My “dumb” solution is a little Ansible job that just runs a git pull on the server. It gets the new code and I’m done. The job also has an option to set everything up, so if the server is wiped out for some reason I can be back up and running in a couple minutes by running the job with a different flag.

▲

austin-cheney

13 hours ago

I occasionally see people complaining about long TypeScript compile times where a small code base can take multiple minutes (possibly 10 minutes). I think to myself WTF, because large code bases should take no more than 20 seconds on ancient hardware.

On another note, I recently wrote a large single-page app that is just a collection of functions, organized by page section according to a nearly flat TypeScript interface. It’s stupid simple to follow in the code and loads in as little as an eighth of a second. Of course that didn’t stop HN users from crying like children because I avoided their favorite framework.

▲

eastoeast

17 hours ago

I’m mostly a hardware engineer.

I needed to test pumping water through a special tube, but didn’t have access to a pump. I spent days searching how to rig a pump to this thing.

Then I remembered I could just hang a bucket of water up high to generate enough head pressure. Free instant solution!
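For a rough sense of scale: the static pressure at the tube inlet depends only on how far the water surface sits above it. Assuming, purely for illustration, fresh water and a bucket hung 2 m above the inlet (pick your own height), the gauge pressure with no flow is about:

```latex
\[
P = \rho g h \approx (1000\ \mathrm{kg/m^3})(9.81\ \mathrm{m/s^2})(2\ \mathrm{m}) \approx 19.6\ \mathrm{kPa} \approx 2.8\ \mathrm{psi}
\]
```

Once water is actually flowing, friction losses in the tubing reduce what you see at the far end, so treat this as an upper bound.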

▲

iamflimflam1

8 hours ago

I wrote a clone of Battlezone, the old Atari tank game. For the enemy tank “AI” I just used a simple state machine with some basic heuristics.

This gave a great impression of an intelligent adversary with very minimal code and low CPU overhead.
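To give a flavor of what that looks like, here is a minimal sketch of such a state machine. The states, thresholds, and inputs are all invented for this example and are not taken from the actual game.

```python
from enum import Enum, auto


class State(Enum):
    PATROL = auto()
    CHASE = auto()
    ATTACK = auto()
    RETREAT = auto()


# Hypothetical tuning constants (arbitrary game units).
SIGHT_RANGE = 200.0
FIRE_RANGE = 60.0
LOW_HEALTH = 25


def next_state(state, distance, health):
    """Return the enemy tank's next state given distance to the player and own health."""
    if state is State.PATROL:
        return State.CHASE if distance <= SIGHT_RANGE else State.PATROL
    if state is State.CHASE:
        if distance > SIGHT_RANGE * 1.5:
            return State.PATROL  # lost track of the player
        if distance <= FIRE_RANGE:
            return State.ATTACK
        return State.CHASE
    if state is State.ATTACK:
        if health < LOW_HEALTH:
            return State.RETREAT
        if distance > FIRE_RANGE:
            return State.CHASE
        return State.ATTACK
    # State.RETREAT: keep running until the player is out of sight.
    return State.PATROL if distance > SIGHT_RANGE else State.RETREAT


# Feed in a few (distance, health) observations and record the states visited.
observations = [(300, 100), (150, 100), (50, 100), (50, 20), (250, 20)]
state = State.PATROL
trace = []
for distance, health in observations:
    state = next_state(state, distance, health)
    trace.append(state)
```

The wider "lose the player" distance in `CHASE` is a small hysteresis trick so the tank does not flicker between patrolling and chasing right at the edge of its sight range.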

▲

commandersaki

15 hours ago

I remember Scalyr, at least before they were bought by SentinelOne, basically did a parallel / SIMD grep for each search query and consistently beat the likes of Splunk and Elasticsearch, which continually index the data.
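The shape of the idea, as a rough sketch: no index at all, just split the data and scan every chunk for the needle. This is not Scalyr's implementation; the function names are made up, there is no SIMD here, and threads in CPython will not give real CPU parallelism for pure-Python work, so read it as structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(lines, needle):
    # Brute-force substring scan over one chunk of log lines.
    return [line for line in lines if needle in line]


def parallel_grep(lines, needle, workers=4):
    """Split the lines into roughly equal chunks and scan each chunk in a worker."""
    if not lines:
        return []
    size = -(-len(lines) // workers)  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in chunk order, so matches stay in input order.
        results = pool.map(scan_chunk, chunks, [needle] * len(chunks))
        return [line for chunk in results for line in chunk]


logs = ["a error x", "b ok", "c error y", "d ok", "e error z"]
matches = parallel_grep(logs, "error", workers=2)
```

The tradeoff is that every query pays for a full scan, but ingestion costs almost nothing because there is no index to keep up to date.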

▲

helix90

17 hours ago

The common one I fought long ago was folks who always use regular expressions when what they want is a string match, a contains check, or some other string library function.
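A small Python illustration of the kind of thing I mean (the strings are invented for the example):

```python
import re

path = "/static/indexXhtml"

# Unescaped regex: "." matches any character, so this is a false positive.
regex_hit = re.search("index.html", path) is not None

# Plain substring test: no metacharacters, no escaping to forget.
plain_hit = "index.html" in path

filename = "report.final.csv"
is_csv = filename.endswith(".csv")       # instead of re.search(r"\.csv$", ...)
is_report = filename.startswith("report")  # instead of re.match(r"report", ...)
stem, _, ext = filename.rpartition(".")  # split on the last dot
```

The string methods say exactly what is being checked, and there is no pattern to get subtly wrong.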

▲

hahahahhaah

16 hours ago

I've seen people tripped up by DynamoDB-like stores, especially when they have a misleading SQL interface like Azure Tables.

You can't be "agile" with them; you need to design your data storage upfront. Like a system design interview :).

Just use Postgres (or friends) until you are webscale. Unless you really have a problem amenable to key/value storage.