Fun with Finite State Transducers
44 points
4 days ago
| 2 comments
| blog.yossarian.net
| HN
zokier
11 hours ago
[-]
I've been thinking that FST could be well suited for building routers for web frameworks. I.e. matching request path `/foo/42` to a set of route patterns like `/foo/{id}` etc. As I understand FST should be near perfect for this use and could potentially allow very good performance. Especially if you can construct the FST at compile time. Somewhat surprisingly I haven't seen anyone doing anything in that direction though, and FST literature is bit difficult if you don't have formal NLP background
reply
rickette
8 hours ago
[-]
Radix Trees are (often?) used for this purpose, for example in Chi: https://github.com/go-chi/chi. Coincidentally FSTs and Radix trees share some similarities.
reply
woodruffw
4 hours ago
[-]
> Coincidentally FSTs and Radix trees share some similarities

Indeed -- I think a useful way to comprehend FSTs is as a radix/prefix trie that also allows suffix compaction. There's probably a formal correlation proof between the two, but I don't know it off the top of my head.

reply
jyndbeb
16 hours ago
[-]
I'm not quite sure I understand how they are applicable, since access patterns itself can be arbitrary complex

For example how is github.event.pull_request.labels[another_variable].name handled?

reply
yorwba
15 hours ago
[-]
Since it only needs to classify the result as one of "fixed", "structured" or "arbitrary", it doesn't matter what the value of "another_variable" is, you'll always be in the same state at the end. Finite state transducers can handle an infinite number of possibilities with a finite number of states just fine.

It's a bit strange though that the author constructs the FST to only recognize a finite list of strings. Probably some of the matching logic that could have been part of the FST is pushed into the normalizer instead.

reply
woodruffw
4 hours ago
[-]
Author here.

> Probably some of the matching logic that could have been part of the FST is pushed into the normalizer instead.

Yep, exactly -- there's a great deal of prenormalization that in principle could be pushed into automata run over the FST instead. The reason I didn't do that is laziness, but it's the obvious next step :-)

reply