https://lmsys.org/blog/2024-02-05-compressed-fsm/?ref=aidancooper.co.uk
The model described in your paper still spends inference on generating the JSON keys. Each key also becomes part of the growing context window, so the keys aren't free to generate.
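
To make the cost concrete, here is a rough sketch (not the paper's or the blog's actual implementation) that counts decode steps and context tokens for a small JSON object. `toy_tokenize`, `decode_cost`, and the 4-character "tokens" are made up for illustration; a real tokenizer would give different counts. The `jump_forward` flag stands in for the idea in the linked post, where text fully determined by the schema is appended without sampling it token by token.

```python
def toy_tokenize(text, chunk=4):
    """Hypothetical tokenizer: fixed-size character chunks stand in for real tokens."""
    return [text[i:i + chunk] for i in range(0, len(text), chunk)]


def decode_cost(segments, jump_forward):
    """Return (decode_steps, context_tokens) for a list of (kind, text) segments.

    kind is "fixed" for schema-determined text (keys, punctuation)
    or "value" for text the model actually has to choose.
    """
    steps = 0
    context = 0
    for kind, text in segments:
        n = len(toy_tokenize(text))
        context += n  # every token, fixed or generated, occupies the context
        if kind == "value" or not jump_forward:
            steps += n  # one decode step per token that is sampled
    return steps, context


# Assumed example object: {"name": "Ada", "age": 36}
segments = [
    ("fixed", '{"name": "'),
    ("value", "Ada"),
    ("fixed", '", "age": '),
    ("value", "36"),
    ("fixed", "}"),
]

naive = decode_cost(segments, jump_forward=False)
jump = decode_cost(segments, jump_forward=True)
print("token-by-token:", naive)
print("jump-forward:  ", jump)
```

In this toy setup, token-by-token decoding pays a decode step for every key token, while skipping the fixed text leaves only the value tokens to sample. The context size is the same either way: the keys still sit in the window, and the appended tokens still have to be processed as input, so this only removes the sampling cost, not the context cost.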