Unfortunately I haven't had time to maintain it, so it is now archived and out of date. I hope that I might get a chance to update it again. More has happened since then, and there is low-hanging fruit to make it more usable, for example adding connector_arrow (https://github.com/aljazerzen/connector_arrow) support for other databases, etc.
Quick example of how things looked with pq:
```sh
$ pq --from i=invoices.csv "from i | take 5"
+------------+-------------+-------------------------------+-------------------------+--------------+---------------+-----------------+---------------------+-------+
| invoice_id | customer_id | invoice_date                  | billing_address         | billing_city | billing_state | billing_country | billing_postal_code | total |
+------------+-------------+-------------------------------+-------------------------+--------------+---------------+-----------------+---------------------+-------+
| 1          | 2           | 2009-01-01T00:00:00.000000000 | Theodor-Heuss-Straße 34 | Stuttgart    |               | Germany         | 70174               | 1.98  |
| 2          | 4           | 2009-01-02T00:00:00.000000000 | Ullevålsveien 14        | Oslo         |               | Norway          | 0171                | 3.96  |
| 3          | 8           | 2009-01-03T00:00:00.000000000 | Grétrystraat 63         | Brussels     |               | Belgium         | 1000                | 5.94  |
| 4          | 14          | 2009-01-06T00:00:00.000000000 | 8210 111 ST NW          | Edmonton     | AB            | Canada          | T6G 2C7             | 8.91  |
| 5          | 23          | 2009-01-11T00:00:00.000000000 | 69 Salem Street         | Boston       | MA            | USA             | 2113                | 13.86 |
+------------+-------------+-------------------------------+-------------------------+--------------+---------------+-----------------+---------------------+-------+
$ # When there is only one input table, it automatically becomes the source relation, i.e. `from i | ` is prepended to the query,
$ # so this can be simplified to:
$ pq --from invoices.csv "take 5"
...
```
In a previous HN comment [1] I showed how you can leverage PRQL to make your RDBMS's JSON functionality more ergonomic. For example:
```sh
> prqlc compile <<EOF - | duckdb
let get = path obj -> s"""{obj} -> {path}"""
let getstr = path obj -> s"""{obj} ->> {path}"""
let extract = obj path -> s"""json_extract({obj}, {path})"""
from [{data='{"duck": [1, 2, 3]}'}]
select { (data | get '$.duck[0]'), (data | getstr '$.duck[1]'), extract data '$.duck[2]'}
EOF
┌───────────────────────┬──────────────────────────┬───────────────────────────────────┐
│ "data" -> '$.duck[0]' │ ("data" ->> '$.duck[1]') │ json_extract("data", '$.duck[2]') │
│ json │ varchar │ json │
├───────────────────────┼──────────────────────────┼───────────────────────────────────┤
│ 1 │ 2 │ 3 │
└───────────────────────┴──────────────────────────┴───────────────────────────────────┘
```
More details in that post. Unfortunately, I don't think this really addresses the grandparent comment, though, because you're still using JSONPath-type expressions to unpack the JSON objects. If you really wanted to use PRQL for everything, you would have to first convert and flatten your JSON data into relational tables.
From a learning perspective, jq seems much more difficult than regex, for example; its learning curve is potentially steeper than that of CSS and XPath, which are other examples of languages for querying tree-like structures. While LLMs are welcome to write it (the jq) for me, my work involves relatively few JSON transformations, and for the most part handling these in Python/JS/Perl is okay as is.
I'm stating all this with much fascination for the jq language itself, as technology, but not as a tool that I find the need for on a daily basis. Besides, for me it is much easier to feed and transform data into Postgres (or even SQLite, which can also be challenging) rather than crunch it with pandas or R, where you can also find fourth-generation-language capabilities, but performance lags.
That was exactly the target audience.
Note that sq doesn't just handle relational DBs, it also has (varying quality) support for CSV, JSON, Excel, and so on. At the time (2013) I wasn't aware of a convenient one-liner mechanism for munging all of those together from the command line.
JQ is great because JSON didn't have a query language & needed one. JQ isn't the best query language - it's just the one that won adoption.
DBs already have query languages that are mostly superior to JQ's.
Developer here. I can't speak to what you see as the weird trend, but I can speak about sq's history:
- I created the first version of sq circa 2013 as my own personal dev tool to address some pain points (amongst other things: getting JSON out of various DBs from the command line in a consistent manner, and wanting an easy way to inspect DB schemas, i.e. "sq inspect")
- It was starting to be a minor PITA dealing with colleagues asking for the latest revision of the tool, so I open-sourced it. In 2016 or so I think?
- sq is FL/OSS and will remain so, no funding is sought, not even one of those "buy me a coffee" thingies
- I didn't create this HN post about sq, nor do I know the person who did. But thanks for sq's 15 mins of fame, kind stranger
Personally, I can see a spot for sq in my developer toolbox. Instead of reaching for DBeaver’s bloated interface, or psql’s perplexing TUI, I like the idea of picking up an sq multitool to answer some simple questions quickly about my databases.
Thanks again!
That’s really harsh and misguided. If people didn’t do “paper thin abstractions” projects on their own time for the simple pleasure of doing it we wouldn’t have 90% of the successful projects we have today.
Let people have fun and don’t judge their motives when they’re making something Open Source. I can guarantee the person just thought “this would be cool to have” and implemented it.
> the person just thought “this would be cool to have” and implemented it.
Correct. This is true of much OSS, or at least I've always suspected so.
This! Is! HN!
They’re not moving anyone’s goalposts; all usage is voluntary.
The fact that HN is mentioned is also confusing. If something had no value beyond paper thin abstraction I doubt we’d be seeing it on the front page with 500 votes.
Even if this project turned out to be less than hoped it seems counterproductive to complain about people creating and exploring. It’s a good thing to see and natural selection will do any sorting needed.
Why does pgcli exist? What about DBeaver, DataGrip, pgAdmin, Postico, LINQ, ORMs? It's almost as if people value different interfaces to databases.
Maybe it's not for you, but it's not hard to imagine that someone, beyond the author, might find it useful. Maybe it's just me but to dismiss those people as individuals that should just 'learn SQL' is a pretty rude thing to say.
Those who don't understand SQL are doomed to reinvent it, poorly
Any language that features the ability to embed/interpolate unescaped quotes is not secure enough to be used directly by professional developers. Just not worth it to track down all the potential injection attacks.
And if an abstraction lets you understand the structure of queries (eg for sharding them) and even writes the code for you and balances parentheses, flags other syntax errors at compile time etc. then it’s a net benefit.
And of course there is the Adapter Pattern to abstract away connection logic and differences between mysql, pg, etc.
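To make the injection point above concrete, here's a minimal sketch (Python with the stdlib sqlite3 module; the table and values are made up) contrasting quote interpolation with placeholders:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT)")
con.execute("INSERT INTO users VALUES ('alice')")

name = "x' OR '1'='1"  # hostile input containing an unescaped quote

# Interpolating the value embeds the quote into the SQL text, so the
# WHERE clause becomes always-true and every row comes back.
print(con.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall())

# A placeholder keeps the value out of the SQL text entirely; no rows match.
print(con.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall())
```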
I wrote the “Db” library 12 years ago in PHP and still use it.
I never released it on HN for reasons like this. But I might. Here it is, embedded in my larger library, but I might break it out:
https://github.com/Qbix/Platform/tree/main/platform/classes/...
Documentation:
https://qbix.com/platform/guide/database
https://qbix.com/platform/guide/models
One of the coolest features:
https://qbix.com/platform/guide/models#relations
If you read the above, please let me know your feedback!
Question: Should I release it as a Show HN?
And the irony is, of course, that to avoid SQL he never wants to use a relational DB, so a NoSQL DB it is.
Here's my theory: some developers see simple languages that are easy to learn and want to build something more complex to output that language. Maybe it's a subconscious thing.
HTML is another one: if explained simply, the basic use cases of HTML and CSS should be easy-ish for nearly anyone to pick up. The fact that we have so many over-engineered frameworks and libraries out there that generate these is evidence of over-complicating simple things. Maybe it's called resume-driven development? Maybe people see genuinely useful frameworks that get wide adoption and are wannabes?
The generated SQL is output in sq's logs, when logging is enabled.
Personally I really like working with SQL and find it quite elegant; I always encourage people to use it, as it really is a job superpower if you can quickly dig up issues directly in the DB. It has a long history, so for every question you have there are many answers, or avenues to ask them.
https://www.scattered-thoughts.net/writing/against-sql (by another former Materialize employee) is a good takedown.
It’s not a very composable or consistent language, so I think it makes total sense that we see so many abstractions over it.
FWIW, nothing forces an extension to do so. I'm pretty sure there are several that do DML using lower level primitives.
If you have cases where you might need to conditionally join, or have any reasonably complex filtering based on input, building up a plain SQL statement using string interpolation at the call site gets very messy very quickly.
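As a minimal sketch of how an abstraction tames that mess (Python with the stdlib sqlite3 module; the invoices table is made up), conditional filters can be accumulated as SQL fragments plus parameters instead of interpolated strings:

```python
import sqlite3

def find_invoices(con, country=None, min_total=None):
    # Accumulate WHERE fragments and their parameters side by side,
    # so values never get spliced into the SQL text.
    clauses, params = [], []
    if country is not None:
        clauses.append("billing_country = ?")
        params.append(country)
    if min_total is not None:
        clauses.append("total >= ?")
        params.append(min_total)
    sql = "SELECT billing_country, total FROM invoices"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return con.execute(sql, params).fetchall()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE invoices (billing_country TEXT, total REAL)")
con.executemany("INSERT INTO invoices VALUES (?, ?)",
                [("Germany", 1.98), ("Norway", 3.96), ("USA", 13.86)])
print(find_invoices(con))                                 # no filters: all rows
print(find_invoices(con, country="USA"))                  # one filter
print(find_invoices(con, country="USA", min_total=5.0))   # both filters
```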
Verbosity is in the eye of the beholder; To me, it's the verbosity of error handling that makes the algorithm clear. Of course I recognize with others that opinion probably changes depending on whether the person reading a given bit of code views error handling as part of the algorithm.
I think SQL is often quite awkward, at least when you look at people exploring alternatives, like https://prql-lang.org/
Not just “the standard” variety, but also all of the vendor-specific varieties that you’ll use in practice (MySQL/MariaDB, PostgreSQL, SQLite, Oracle, SQL Server, etc.): how the features offered by each differ, how the support for custom types differs, how the procedural languages differ, and the actual experience of using them. Seemingly half-baked error messages when compared to most imperative languages. Varying support for getting the query optimiser to do what you want (e.g. hints). Query plans that aren’t pleasant to stare at. Often no support for automated analysis of how things are running with suggestions for indices (e.g. Oracle had that sort of feature; it is sometimes helpful, but the automatically generated indices are, or at least last I checked were, treated as their own special thing and you couldn’t easily delete them). Even things like varying support for working with geospatial data, JSON, or time series data.
Not just that, but also the tooling (or the lack of it) - good luck debugging the execution of stored procedures in your DB or placing breakpoints in there, good luck hooking up observability/tracing solutions as easily as you would for your back end, good luck debugging why your database link calls take 100x longer when executed through JDBC in your app but not in the developer tooling.
Not that ORMs or NoSQL make everything much better; you’d just trade one set of annoyances for another, especially when you end up trying to generate dynamic SQL with the likes of MyBatis XML mappers. Don’t get me started on people over-fetching or ending up with N+1 problems with most ORMs, or executing bunches of queries against the DB just to retrieve some data that you might as well get in one go with a well-crafted DB view. Not that you can’t make your DB schema a mess as well with a liberal application of EAV or OTLT (or just attempting to make polymorphic links).
I think SQL as a language makes sense, but PRQL often feels more sensible. I feel like the current RDBMSes out there are very powerful, but that there are also aspects about them (and the differences between the ones you’ll use) that absolutely suck. It feels like where back end languages are getting progressively better DX, databases instead gave us… CTEs? Some syntactic sugar for JSON in PostgreSQL? Feels like they move painfully slow.
For what it's worth, that's why I welcome every new tool or attempt at improving things, even if they won't replace the usual enterprise stacks of some RDBMS, a JDBC/ODBC/whatever driver, and probably an ORM on the back end.
I've done data processing in Perl and Python before. THAT is painful. What I think people don't realize is that doing procedural "querying" in a typical PL is, like, 10x the amount of code as an equivalent SQL query. I don't see how the alternative is much better.
SQL just doesn't work like a typical PL, but that's a good thing.
Things that seem potentially worth it to me:
* seL4
* Google's SQL syntax tweak
* Rust
* GraalVM
* systemd
* Tree sitter
* LSP
* CMake
* Bazel
These all get you a step change improvement in comprehensibility, safety, or something else important.
Things that seem like more churn than they're worth:
* Noise protocol (relative to TLS)
* JMAP (compared to good old IMAP)
* Nim/Zig/etc.
* Wayland (fait accompli now, but still)
* Varlink
* Fish shell
* YAML/TOML
* Sq?
* Meson
I wish we, as an industry, invested more in improving existing technologies instead of continually abandoning and replacing working solutions.
I would move Zig (innovates a lot in language and compiler design) and JMAP (IMAP is horrible and needs to be replaced) up and CMake and Bazel down (I count them the same as Meson). I would say the jury is still out on Google's tweak and on seL4.
Additionally, if people like jq, they don't need a special tool like this. They can just get the database to output JSON and use jq - another tool isn't needed. In postgres you can do something like this:
```sql
select array_to_json(array_agg(row_to_json(r))) from (
  -- ... put your SQL query here ...
) r;
```
...and postgres will output a JSON array for you where each item in the array is a JSON map of a single row of the output. I'm sure other databases have similar functionality.
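If you want the same trick from a script rather than from psql, here's a minimal sketch with psycopg2 (the connection parameters and the inner query are placeholders, not from the original comment):

```python
import json
import psycopg2  # pip install psycopg2-binary

# Hypothetical connection details -- adjust for your environment.
conn = psycopg2.connect(host="localhost", dbname="mydb", user="me")
cur = conn.cursor()

# Wrap any query so postgres itself produces the JSON array.
cur.execute("""
    select array_to_json(array_agg(row_to_json(r)))
    from (select 1 as actor_id, 'PENELOPE' as first_name) r
""")
rows = cur.fetchone()[0]  # psycopg2 decodes the json column to Python objects
print(json.dumps(rows, indent=2))
```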
- no defined encoding, so it may be anything from US-ASCII through ISO-8859-x to UTF-8, and it's always guesswork what it actually is
- no definition of the separator sign, usually it can be a literal comma, a semicolon, or tabs
- escaping of the separator sign in column values is always fun
- escaping of newlines is even MORE fun
- line endings are not specified
Every piece of software out there has its own expectations about what specific kind of format it gets, and to make it worse, using "Microsoft Excel" as a target/source isn't foolproof either, because it behaves differently between the Mac and the Windows versions!
JSON in contrast is clearly defined in all of these issues and has battle-tested implementations in every programming language there is.
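To illustrate the two escaping bullets above: when both ends agree on the dialect, quoting does round-trip embedded delimiters, quotes, and newlines - the trouble is that tools often don't agree. A minimal sketch with Python's stdlib csv module (the sample values are made up):

```python
import csv
import io

rows = [["id", "note"],
        ["1", 'has a comma, a "quote", and\na newline']]

# Write: fields containing the delimiter, the quote char, or a newline
# get wrapped in quotes, with embedded quotes doubled.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())

# Read it back: the same dialect recovers the original values exactly.
buf.seek(0)
assert list(csv.reader(buf)) == rows
```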
For example, here's your list fully handled in postgres:

1. SET CLIENT_ENCODING TO 'value'; (e.g. 'UTF8')
2. COPY ... WITH (FORMAT csv, DELIMITER 'delimiter_character', QUOTE 'quote_character')

Now the output format is fully specified and everything just works fine (including for input into Excel).
The values in each record are separated by the DELIMITER character. If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character.
https://www.postgresql.org/docs/current/sql-copy.html

```sh
$ cat example.json | sq sql 'SELECT * FROM data'
actor_id  first_name  last_name  last_update
1         PENELOPE    GUINESS    2020-06-11T02:50:54Z
2         NICK        WAHLBERG   2020-06-11T02:50:54Z
```
One of the data source types is 'json', and the command 'sq sql ...' lets you execute SQL directly instead of using their default query language (SLQ).

While on the topic, how exactly does chdb relate to clickhouse-local?
At least not before Postgres 17; in the latter, json_table does provide this capability.
Fortunately we don’t have to see it so it’s not like it blocks my vision.
But I just wanted to note that the idea of “anything is good” is not really true, and I don't like its spread, as there's an opportunity cost. I think we need to spend more attention on evaluation, quality, and making good things than on the idea that even creating lots of bad things is somehow good.
DuckDB supports Postgres, MySQL, SQLite, and many other extensions.
Postgres: https://duckdb.org/docs/extensions/postgres
MySQL: https://duckdb.org/docs/extensions/mysql
SQLite: https://duckdb.org/docs/extensions/sqlite
You can try this yourself.
1. Clone this repo and create a postgres container with sample data: https://github.com/TemaDobryyR/simple-postgres-container
2. Install duckdb if you haven't and if you have just access it on the console: https://duckdb.org/docs/installation/index?version=stable&en...
3. Load the postgres extension: INSTALL postgres; LOAD postgres;
4. Connect to the postgres database: ATTACH 'dbname=postgres user=postgres host=127.0.0.1 password=postgres' AS db (TYPE POSTGRES, READ_ONLY);
5. SHOW ALL TABLES;
6. select * from db.public.transactions limit 10;
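The same steps also work scripted through DuckDB's Python API - a sketch assuming the sample container from step 1 is running and the transactions table from step 6 exists:

```python
import duckdb  # pip install duckdb

con = duckdb.connect()
con.execute("INSTALL postgres")
con.execute("LOAD postgres")

# Credentials match the sample container above; adjust for your setup.
con.execute(
    "ATTACH 'dbname=postgres user=postgres host=127.0.0.1 password=postgres' "
    "AS db (TYPE POSTGRES, READ_ONLY)"
)
print(con.sql("SELECT * FROM db.public.transactions LIMIT 10"))
```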
Trying to access SQL data without using SQL only gets you so far, and you can just use a basic SQL interface for that.
DuckDB extension support / dev experience is quite good now too. I've been working on some improvements (e.g. predicate pushdown) to the Iceberg extension and it's been pretty smooth.
If you use things like `array_to_json(array_agg(row_to_json(....)))` in your psql commands to output some rows to JSON, then sq's `--json` or `--jsonl` is quite a bit easier IMHO. If you know the exact SQL query you want to run you can just do `sq sql '....'` as well, but I agree there's not much point in doing that if you aren't taking advantage of some other sq feature.
Developer here. There's a few features other than the query stuff that I still think are pretty handy.
The "sq inspect" stuff isn't easy to do with the standard CLI tools, or at least wasn't when I started working on sq back in 2013 or so.
I also regularly make use of the ability to diff the metadata/schema of different DB instances (e.g. "sq diff @pg_prod @pg_qa").
Looks like an absolutely great (and necessary!) utility, which will automate many future workflows and dataflows, save countless hours of time collectively for many people en masse, and therefore change the world (allow more people to get more done in less time!) much like Unix, shell scripting, grep, sed, awk and Perl gave the world...
Congratulations on writing what no doubt will become one of the major Unix/Windows/MacOS/Other OS/Linux shell scripting commands in the future, if it isn't already!
Well done!
Substring matching is on my short list (also totally open to a PR!).
As a person who is not part of the tech scene, and lurks in the tech space out of interest, I appreciate this guidance. For the longest time I didn’t know nginx was pronounced Engine-X; I called it N-jinx.
I suspect the uptake on the "seek" pronunciation is about 2%, if I'm being generous
Looks kinda neat for when I don't want or need anything more than bash for a script.
Trying to understand: is the main purpose of this to use jq syntax for cataloging-like functionality and/or cross-querying? It's quite a few lines of code, but you inspect the database catalogs and offer a layer on top of that? I mean, how much data is actually leaving the database?
[0] https://github.com/Florents-Tselai/liteJQ
[1] https://github.com/Florents-Tselai/pgJQ
Another POV: all the developers I know who thrive the most and have found the most success rate aesthetic concerns the lowest when looking at their tools. That is to say, the packaging or aesthetic coherence with some broader philosophy matters less than other factors.