FilterHN

Fun with Finite State Transducers

44 points

by woodruffw

4 days ago

| past

| 2 comments

| blog.yossarian.net

| HN

▲

zokier

11 hours ago

[-]

I've been thinking that FST could be well suited for building routers for web frameworks. I.e. matching request path `/foo/42` to a set of route patterns like `/foo/{id}` etc. As I understand FST should be near perfect for this use and could potentially allow very good performance. Especially if you can construct the FST at compile time. Somewhat surprisingly I haven't seen anyone doing anything in that direction though, and FST literature is bit difficult if you don't have formal NLP background

▲

rickette

8 hours ago

[-]

Radix Trees are (often?) used for this purpose, for example in Chi: https://github.com/go-chi/chi. Coincidentally FSTs and Radix trees share some similarities.

▲

woodruffw

4 hours ago

[-]

> Coincidentally FSTs and Radix trees share some similarities

Indeed -- I think a useful way to comprehend FSTs is as a radix/prefix trie that also allows suffix compaction. There's probably a formal correlation proof between the two, but I don't know it off the top of my head.

▲

jyndbeb

16 hours ago

[-]

I'm not quite sure I understand how they are applicable, since access patterns itself can be arbitrary complex

For example how is github.event.pull_request.labels[another_variable].name handled?

▲

yorwba

15 hours ago

[-]

Since it only needs to classify the result as one of "fixed", "structured" or "arbitrary", it doesn't matter what the value of "another_variable" is, you'll always be in the same state at the end. Finite state transducers can handle an infinite number of possibilities with a finite number of states just fine.

It's a bit strange though that the author constructs the FST to only recognize a finite list of strings. Probably some of the matching logic that could have been part of the FST is pushed into the normalizer instead.

▲

woodruffw

4 hours ago

[-]

Author here.

> Probably some of the matching logic that could have been part of the FST is pushed into the normalizer instead.

Yep, exactly -- there's a great deal of prenormalization that in principle could be pushed into automata run over the FST instead. The reason I didn't do that is laziness, but it's the obvious next step :-)