▲The most magical part of this transform is the search! First learned about this in a bioalgorithms course, and the really cool property is that for a string length l, you can search for the string in O(l) time. It has the same search time complexity of a suffix tree with O(n) space complexity (with a very low constant multiple). To this day it may be the coolest algorithm I've encountered.
reply▲I encountered the search version of this, which is turned suffix arrays, in grad school and was so taken by them I incorporated them as the primary search mechanism for the pi searcher. 25 years later it's still the best way to do it. Incredible insight behind bwt and suffix arrays.
reply▲A friend doing bioinformatics told me about this at uni, it was definitely one of those "i can't believe this is doable" sort of things.
reply▲For some reason I was really getting lost on step 2, “sort” it. They mean lexicographically or “sort it like each rotation is a word in the dictionary, where $ has the lowest character value”. Now that I see it, I’m a little embarrassed I got stuck on that step, but hopefully it helps someone else.
reply▲Noteworthy that the paper that was published describing this technique came at a time when IP lawyers had begun to destroy the world wrt to small business innovation. And that they released it unencumbered is a huge debt of gratitude that we haven’t really honored.
reply▲An interesting bit of trivia about the Burrows-Wheeler transform is that it was rejected when submitted to a conference for publication, and so citations just point to a DEC technical report, rather than a journal or conference article.
reply▲lancefisher10 hours ago
[-] This is a good article and a fun algorithm. Google made a great series of compression videos a while ago - albeit sometimes silly. For the Burrows-Wheeler one they interviewed Mike Burrows for some of the history.
https://youtu.be/4WRANhDiSHM reply▲I wonder if he came up with it when pondering permuted indexes. Which used to be a feature of the man pages system.
reply▲I've spent years playing with BWT and suffix sorting at school (some work can be found be names of archon and dark). It's a beautiful domain to learn!
Now I'm wondering: in the era of software 2.0, everything is figured out by AI. What are the chances AI would discover this algorithm at all?
reply▲No need to speculate about this algorithm in particular. It's been a few years since every programmer has had access to LLMs (that's a huge sample size!). Some LLMs are even branded as 'Research'. Plus they never get tired like humans do. We should be swimming in algorithms like this.
reply▲I remember randomly reading about this in the Hugi demoscene diskmag as a young teen, and it completely blew my mind (notably the reverse transform, apparently not covered in OP's article):
https://hugi.scene.org/online/coding/hugi%2013%20-%20cobwt.h...The author later wrote many now-famous articles, including A Trip Through the Graphics Pipeline.
reply▲raboukhalil11 hours ago
[-] Author here, nice to see the article posted here! I'm currently looking for ideas for other interactive articles, so let me know if think of other interesting/weird algorithms that are hard to wrap your head around.
reply▲The unintuitive part of BWT to me was always the reverse operation, which was the only thing the post didn't give intuition for :-)
reply▲SimplyUnknown1 hour ago
[-] I'm still not sure I get it. I think it is
1. Put the BWT string in the right-most empty column
2. Sort the rows of the matrix such that the strings read along the columns of the matrix are in lexicographical order starting from the top-row????
3. Repeat step 1 and 2 until matrix is full
4. Extract the row of the matrix that has the end-delimiter in the final column
It's the "sort matrix" step that seems under-explained to me.
reply▲Most BWT descriptions: now draw the rest of the owl.
Once upon a time I found one that did the entire round trip, I neglected to bookmark it. So when I eventually forgot how it worked, or wanted to show others, I was stuck. Never did find it.
reply▲raboukhalil11 hours ago
[-] Good point, thanks! I'll add a subsection about the intuition for that.
reply▲fair_enough11 hours ago
[-] Thanks, man! This helps a lot, because as you say the algorithm is not intuitive. The description of the type of data it is very good for is the best value-adding part of this writeup. Greatly appreciated!
reply▲This is great. The best article I have read on BWT and experimented was in Dr Dobbs Journal around 1996-97.
reply▲loloquwowndueo11 hours ago
[-] I was once mentioning this transform to someone and they were like “what? The Bruce Willis transform?” - I guess my pronunciation sucked that much.
reply▲Bukhmanizer10 hours ago
[-] Once in a lecture one of my friends asked “You keep mentioning Cooper Netties, who is Cooper Netties?” Everyone was very confused until someone figured out she was asking about Kubernetes.
reply▲For an article describing a compression algorithm this was very digestible and entertaining.
reply