Decision trees – the unreasonable power of nested decision rules
143 points | 4 hours ago | 6 comments | mlu-explain.github.io | HN
lokimedes
43 minutes ago
[-]
When I worked at CERN around 2010, Boosted Decision Trees were the most popular classifier, exactly due to the (potential for) explainability along with their expressive power. We had a cultural aversion to neural networks back then, especially if the model was used directly in a physics analysis. Times have changed…
reply
wodenokoto
24 minutes ago
[-]
Are boosted decision trees the same as a boosted random forest?
reply
boccaff
14 minutes ago
[-]
short answer: No.

longer answer: Random forests average multiple trees that are trained so as to reduce the correlation between them (bagging plus randomized trees). Boosting trains trees sequentially, with each new tree fit to the residuals of the ensemble built so far.

I am assuming you meant boosted decision trees (sometimes gradient boosted decision trees), since usually one boosts decision trees rather than forests. I think xgboost added boosted RF, and in principle you can boost any supervised model, but it is not usual.
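A minimal sketch of the distinction, assuming scikit-learn (the estimator names are real sklearn classes; the synthetic data is just for illustration):

```python
# Bagging vs boosting, as described above, using scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random forest: trees trained independently on bootstrap samples,
# with random feature subsets to decorrelate them; predictions averaged.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Gradient boosting: trees trained sequentially, each one fit to the
# residual errors of the ensemble built so far.
gb = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print(rf.score(X_te, y_te), gb.score(X_te, y_te))
```

Both reach similar accuracy here; the difference is in how the trees are produced, not in the prediction interface.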

reply
fooker
1 hour ago
[-]
Fun fact - single bit neural networks are decision trees.

In theory, this means you can 'compile' most neural networks into chains of if-else statements but it's not well understood when this sort of approach works well.
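A toy illustration of the claim, with made-up weights: with hard step activations, each neuron is a threshold test, so the whole forward pass can be unrolled into nested if-else branches.

```python
# Hypothetical example: a tiny network with step activations and its
# "compiled" if-else equivalent. All weights are invented for the demo.
import itertools

def step(x):
    return 1 if x >= 0 else 0

def net(x1, x2):
    # Two hidden units with step activation, one output unit.
    h1 = step(1 * x1 + 1 * x2 - 1)   # fires when x1 + x2 >= 1
    h2 = step(1 * x1 - 1 * x2)       # fires when x1 >= x2
    return step(h1 + h2 - 2)         # AND of the two hidden units

def compiled(x1, x2):
    # The same function as nested if-else branches: each branch is a
    # region of the input space cut out by the neurons' thresholds.
    if x1 + x2 >= 1:
        if x1 >= x2:
            return 1
        return 0
    return 0

# Spot-check that the two agree on a grid of inputs.
for x1, x2 in itertools.product([0.0, 0.3, 0.7, 1.0], repeat=2):
    assert net(x1, x2) == compiled(x1, x2)
```

For general networks with smooth activations the correspondence is less direct, which matches the comment's caveat that it's unclear when this works well.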

reply
Almondsetat
1 hour ago
[-]
Do you know of any software that does this? Or any papers on the matter? It could be a fun weekend project
reply
tomashubelbauer
1 hour ago
[-]
Made me think of https://github.com/xoreaxeaxeax/movfuscator. Would be definitely cool to see it realized even if it would be incredibly impractical (probably).
reply
kqr
1 hour ago
[-]
Experts' nebulous decision making can often be modelled with simple decision trees and even decision chains (linked lists). Even when the expert thinks their decision making is more complex, a simple decision tree better models the expert's decision than the rules proposed by the experts themselves.

I've long dismissed decision trees because they seem so ham-fisted compared to regression and distance-based clustering techniques, but decision trees are undoubtedly very effective.

See more in chapter seven of the Oxford Handbook of Expertise. It's fascinating!

reply
ablob
53 minutes ago
[-]
I once saw a visualization that basically partitioned decisions on a 2D plane. From that perspective, decision trees might just be a fancy word for kD-trees partitioning the possibility space and attaching an action to each volume.

Given that assumption, the nebulous decision making could stem from experts' decisions being more nuanced in the granularity of the surface separating two distinct actions. It may be a rough technique, but it should nonetheless lead to some pretty good approximations.
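The axis-aligned-partition view can be made concrete with scikit-learn (real sklearn API; the 2D points are made up): every split in a fitted tree is a single threshold cut on one coordinate, and the leaves are the resulting rectangles.

```python
# A fitted tree on 2D points is literally a set of threshold cuts on
# x or y, carving the plane into rectangles.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [2, 0], [2, 1], [3, 0], [3, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]          # classes separated around x = 1.5

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each printed line is one axis-aligned cut; the leaves are the regions.
print(export_text(clf, feature_names=["x", "y"]))
```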

reply
srean
45 minutes ago
[-]
You have this thing so backwards that it is embarrassingly hilarious. A little knowledge is a dangerous thing.

Decision trees predate kD-trees by a decade.

Both use recursive partitioning of the function domain, a fundamental and old idea.

reply
zelphirkalt
1 hour ago
[-]
Decision trees are great. My favorite classical machine learning algorithm or group of algorithms, as there are many slight variations of decision trees. I wrote a purely functional (kind of naive) parallelized implementation in GNU Guile: https://codeberg.org/ZelphirKaltstahl/guile-ml/src/commit/25...

Why "naive"? Because there is no such thing as NumPy or data frames in the Guile ecosystem to my knowledge, and the data representation is therefore probably quite inefficient.

reply
srean
33 minutes ago
[-]
What benefit do numpy or dataframes bring to decision tree logic over what is already available in Guile? Honest question.

Guile-like languages are very well suited to decision trees, because manipulating and operating on trees is their mother tongue. The only thing that would take a bit more work is compiling the decision tree into machine code, so one doesn't have to traverse a runtime structure; the compiled form is more efficient.

BTW take a look at Lush, you might like it.

https://lush.sourceforge.net/

https://news.ycombinator.com/item?id=2406325

If you are looking for vectors and tensors in Guile, there is this

https://wedesoft.github.io/aiscm/
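The compile-the-tree idea can be sketched in Python (a nested-tuple tree stands in for a Lisp-style structure here; the tree format and all names are invented for the example):

```python
# Illustrative sketch: emit straight-line if-else source for a decision
# tree instead of walking a runtime structure at prediction time.

# A node is ("split", feature_index, threshold, left, right)
# or ("leaf", label).
tree = ("split", 0, 1.5,
        ("leaf", "low"),
        ("split", 1, 0.5,
         ("leaf", "mid"),
         ("leaf", "high")))

def compile_tree(node, indent="    "):
    """Recursively emit Python source for nested if-else tests."""
    if node[0] == "leaf":
        return f"{indent}return {node[1]!r}\n"
    _, feat, thr, left, right = node
    return (f"{indent}if x[{feat}] <= {thr}:\n"
            + compile_tree(left, indent + "    ")
            + f"{indent}else:\n"
            + compile_tree(right, indent + "    "))

src = "def predict(x):\n" + compile_tree(tree)
namespace = {}
exec(src, namespace)          # turn the generated source into a function
predict = namespace["predict"]

print(predict([1.0, 0.0]))   # prints "low": falls into x[0] <= 1.5
```

In Guile the same idea would generate and compile an s-expression, which is arguably even more natural.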

reply
boccaff
12 minutes ago
[-]
Tree algorithms in sklearn use parallel arrays to represent the tree structure.
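Concretely (real sklearn API, toy data): a fitted tree exposes flat arrays indexed by node id rather than linked node objects.

```python
# The parallel-array representation: node i's children, split feature,
# and threshold all live at index i of flat arrays on clf.tree_.
from sklearn.tree import DecisionTreeClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

t = clf.tree_
print(t.node_count)        # total number of nodes
print(t.children_left)     # left child id per node (-1 at leaves)
print(t.children_right)    # right child id per node (-1 at leaves)
print(t.feature)           # split feature per node (-2 at leaves)
print(t.threshold)         # split threshold per node
```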
reply
xmprt
3 hours ago
[-]
Interesting website and great presentation. My only note is that the color contrast of some of the text makes it hard to read.
reply
thesnide
2 hours ago
[-]
Exactly my thought, and here the reader view of FF is a godsend.

Having 'accessible' content is not only for people with disabilities; it also helps with bad color taste.

Well, at least bad taste for readable content ;)

reply
moi2388
1 hour ago
[-]
That was beautifully presented!
reply