I like the power of `jq` and the fact that LLMs are proficient at it, but I find it downright impossible to come up with the right `jq` incantations myself. Has anyone here been in a similar situation? Which tool / language did you end up exposing to your users?
jq is a pipeline operating on a stream of independent JSON terms. The filter is reapplied to every element from the stream. Streams != lists; the latter are just a data type. `.` always points at the current element of the stream. Functions like `select` operate on separate items of the stream, while `map` operates on individual elements of a list. If you want a `map` over all elements of the stream: that's just what jq is, naturally :)
stream of a single element which is a list:
echo '[1,2,3,4]' | jq .
# [1,2,3,4]
unpack the list into a stream of separate elements:
echo '[1,2,3,4]' | jq '.[]'
# 1
# 2
# 3
# 4
echo '[1,2,3,4]' | jq '.[] | .' # same: piping into `.` is a no-op
only keep elements 2 and 4 from the stream, not from the array (there is no array left after `.[]`):
echo '[1,2,3,4]' | jq '.[] | select(. % 2 == 0)'
# 2
# 4
keep the array:
echo '[1,2,3,4]' | jq 'map(. * 2)'
# [2,4,6,8]
map over individual elements of a stream instead:
echo '[1,2,3,4]' | jq '.[] | . * 2'
# 2
# 4
# 6
# 8
printf '1\n2\n3\n4\n' | jq '. * 2' # same
This is how you can do things like:
printf '{"a":{"b":1}}\n{"a":{"b":2}}\n{"a":{"b":3}}\n' | jq 'select(.a.b % 2 == 0) | .a'
# {"b": 2}
`select` creates a nested "scope" for the current element in its parens, but restores the outer scope when it exits. Hope this helps someone else!
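One more pattern worth adding to the list above: to filter while keeping the array (instead of unpacking it into a stream first), you can wrap `select` inside `map`. A quick sketch:

```shell
# filter the elements of a list without leaving list-land:
# map applies select to each element; elements where the
# condition is false are simply dropped from the result list
echo '[1,2,3,4]' | jq -c 'map(select(. % 2 == 0))'
# [2,4]
```

This is equivalent to `[.[] | select(. % 2 == 0)]`, which repacks the filtered stream back into an array.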
(LLMs are already very adept at using `jq`, so I would think it preferable to be able to prompt a system that implements querying inside of source code with "this command uses the same format as `jq`".)
Yes, there are issues with ideologically motivated moderators, poorly cited articles, etc. But even with its flaws, it's an amazing resource provided to the public for free (as in coffee, and maybe as in speech also).
I've observed that too many users of jq aren't willing to take a few minutes to understand how stream programming works. That investment pays off in spades.
Are you interested in having help writing more scenarios? I've had a couple of ideas for similar kata-like exercises that I haven't shared publicly. Happy to send a PR or something if it would provide value.
```
$sum($myArrayExtractor($.context))
```
where `$myArrayExtractor` is your custom code.
---
Re: "how did it go"
We had a situation where we needed to generate EDI from json objects, which routinely required us to make small tweaks to data, combine data, loop over data, etc. JSONata provided a backend framework for data transformations that reduced the scope and complexity of the project drastically.
I think JSONata is an excellent fit for situations where companies need to do data transforms, for example when it's for the sake of integrations from 3rd-party sources; all the data is there, it just needs to be mapped. Instead of having potentially buggy code as integration, you can have a pseudo-declarative jsonata spec that describes the transform for each integration source, and then just keep a single unified "JSONata runner" as the integration handler.
It's nice because we can just put the JSONata expression into a db field, and so you can have arbitrary data transforms for different customers for different data structures coming or going, and they can be set up just by editing the expression via the site, without having to worry about sandboxing it (other than resource exhaustion for recursive loops). It really sped up the iteration process for configuring transforms.
It made my life a lot easier
Just use jq. None of the other ones is as flexible or widespread, and you just end up with frustrated users.
Which isn't to say jq is the best, or even good, but it's battle-tested and just about every conceivable query problem has been thrown at it by now.
I then switched to JavaScript / TypeScript, which I found much better overall: it's understandable to basically every developer, and LLMs are very good at it. So now in my app I have a button wherever a TypeScript snippet is required that asks the LLM for its implementation, and even "weak" models one-shot it correctly 99% of the time.
It's definitely more difficult to set up, though, as it requires a sandbox where you can run the code without fear. In my app I use QuickJS, which works very well for my use case, but might not be performant enough in other contexts.
obj.friends
  .filter(x => x.city == 'New York')
  .sort((a, b) => a.age - b.age)
  .map(item => ({ name: item.name, age: item.age }));
does exactly the same without any plugin. Am I missing something?
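For comparison, the same filter/sort/map chain in plain jq might look like this (the sample input here is invented for illustration):

```shell
# repack the filtered stream into an array, sort it, and project fields;
# `{name, age}` is jq shorthand for `{name: .name, age: .age}`
echo '{"friends":[{"name":"Ann","age":35,"city":"New York"},{"name":"Bob","age":28,"city":"New York"},{"name":"Cy","age":40,"city":"Boston"}]}' \
  | jq -c '[.friends[] | select(.city == "New York")] | sort_by(.age) | map({name, age})'
# [{"name":"Bob","age":28},{"name":"Ann","age":35}]
```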
To your point, abstractions often multiply and then hide the complexity, creating a facade of simplicity.
Things like https://jsonlogic.com/ work better if you wish to expose a REST API with a defined query schema or something like that, instead of accepting a query `string`. This seems better in that you get both a string format and a concrete JSON format, plus APIs to convert between them.
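To illustrate, a JsonLogic rule is itself just JSON data, so it can be validated against a schema, stored, or built up by a UI. A small sketch of the format (the field names are made up):

```json
{
  "and": [
    { ">":  [ { "var": "age" },  18 ] },
    { "==": [ { "var": "city" }, "New York" ] }
  ]
}
```

The `var` operator reads a field from the data object the rule is evaluated against, and operators like `and`, `>`, and `==` compose in the obvious way.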
Also, if you are building a filter interface, having a structured representation helps:
https://react-querybuilder.js.org/demo?outputMode=export&exp...
mapValues(mapKeys(substring(get(), 0, 10)))
This is all too cute. Why not just use JavaScript syntax? You can limit it to the exact amount of functionality you want for whatever reason it is you want to limit it.
Kudos for all the work, it's a nice language. I find writing parsers a very mind-expanding activity.
Admittedly I don't know that much about LLM optimization/configuration, so apologies if I'm asking dumb questions. Isn't the cost of needing to copy/paste that prompt in front of your queries a huge drag on net token efficiency? Like, wouldn't you need to do some hundred/thousand query translations just to break even? Maybe I don't understand what you've built.
Cool idea either way!
Helpful when querying JSON API responses that are parsed and persisted for normal, relational uses. Sometimes you want to query data that you weren’t initially parsing or that matches a fix to reprocess.
it might just be a very limited subset?
I implemented one day of advent of code in jq to learn it: https://github.com/ivanjermakov/adventofcode/blob/master/aoc...