For the past four months I've been working (part-time, this is OSS after all) on Harlequin, a SQL IDE for DuckDB that runs in your terminal. I built this because I work in data, and I often found myself reaching for the DuckDB CLI to quickly query CSV or Parquet data, but then hitting a wall as my queries got more complex and my result sets got larger.
Harlequin is a drop-in replacement for the DuckDB CLI that runs in any terminal (even over SSH), but adds a browsable data catalog, full-powered text editor (with multiple buffer support), and a scrollable results viewer that can display thousands of records.
Harlequin is written in Python, using the Textual framework. It's licensed under MIT.
Yesterday I released v1.0.0: you can try it out with `pip install harlequin`, or visit https://harlequin.sh for docs and other info.
`select * from sql_auto_complete('select ra')`
Every item in the form takes four lines (I think?), whereas in a more traditional curses UI they would be packed to one line per item; the scrollbar could easily have been avoided here. A smaller nitpick: that style of toggle switch is also form over function; I'd find a traditional [X] far clearer and less ambiguous.
It's a TUI! It should be buzzing with numbers, packed with information, sparing with space and using every pixel possible. btop is a great example imo — one of the best.
I've tried to be really thoughtful about using space and getting value out of every pixel. There are some features in Harlequin that make it nice on small screens, even with the padding:
1. Scrolling is supported in every widget
2. You can show or hide the left side bar with ctrl+b or F9
3. You can enter "full screen mode" in the query editor or results viewer with F10
Beyond the main screen, the interface could be more compact, especially the export and help modals. I'm considering tweaks to the design.
No. It should be useful, and if it’s not “buzzing with numbers” and causing me a seizure, well, thank fucking God.
If you want something like that, go write it yourself. I, for one, find the folks writing quiet terminal interfaces to be a massive relief and a tremendous respite from the madness of our seemingly increasingly incoherent, incompetent, and distracting world.
Essentially it's:
1. Data sources from places such as S3, SFTP, and RDS
2. Use DuckDB to load most of these with only extensions (I don't believe there's one for SFTP, so we just have some Python code to pull the files out)
3. Transform the data however we'd like with DuckDB
4. Convert the DuckDB table to PyArrow
5. Save to S3 with delta-rs
FWIW, we also have all of this execute outside our orchestration, on an EC2 instance. This allows us to scale vertically.
Last time I checked, DuckDB didn't have the concept of a metastore, so do you have an internal convention for table locations and folder structure?
What do you use for reports/visualizations? Notebooks?
Reports and viz vary a lot; the finance department uses Tableau, whereas for more 'data sciencey' stuff we normally just use notebooks.
My only work with it to date has been load, analyze, and (usually) delete to refresh, so I don't require any DB mutability. Outside of mutability, I'm not sure if there are any obvious wins with the format.
My stack is now JSON files containing the search data. At present I'm loading these into DuckDB and pre-aggregating due to a bug I found in DuckDB, not because it was needed for performance.
For those asking what DuckDB is: columnstore databases like DuckDB may be slower at data ingestion, but are very quick at multi-GB sums, counts, and aggregations.
And they all get SO close; just take a look at the SetEdit screenshots: https://setedit.sourceforge.net/
I don't know if it can be of any help for you, but at the moment on the terminal I mostly use:
- Tilde: https://github.com/gphalkes/tilde
- Textadept: https://github.com/orbitalquark/textadept
Or "Geany IDE" in a desktop environment (while waiting for lapce.dev to get better). I tend to stay away as much as possible from VS Codium, but everyone else seems to love it and has already forgotten about Atom; few seem to realise how Microsoft really is.
Maybe the plot twist is that you have to accept in your heart that "writing text on anything is the real IDE", and transcend to writing in nano!
For example, Tilde. It seems nice and maybe it has nice features, but what about these features is so highly dependent on being accessible through a "menu"? Do you reach for the "menu" all the time? Or are there fancy "dialogs"?
This feels like Vim, but with some strange plugin that provides a "menu". I'd think why not just jump ahead and ditch the quasi-GUI and just learn some shortcuts?
Honestly not trying to be difficult here. I want to understand the mindset, because you are definitely not alone.
Project looks rad too, but I'm just here to appreciatively bike shed.
What is the advantage of using this instead of something well-proven and battle-tested like an in-memory SQLite database?
(The large spark cluster was still ingesting the data when I was done with my analysis, only working on my local laptop)
`pipx install harlequin`
From the pipx documentation:
> This automatically creates a virtual environment, installs the package, and adds the package's associated applications (entry points) to a location on your PATH.
It makes the command available globally, but sandboxes the package in its own virtual environment.
Take 1: SQLite is to Postgres what DuckDB is to Snowflake/BigQuery.
Take 2: In the same way that SQLite is an in-process/in-memory OLTP database, DuckDB is an in-process/in-memory OLAP database.
I should mention some caveats/exceptions/notes about my statements above:
- there are OLAP projects out there that use Postgres as their basis.
- HTAPs are DBs that allow you to define tables as either row-based or columnar.
- DuckDB works with SQLite tables, and its SQL dialect is heavily based on Postgres SQL
- duckdb 0.9.0 is being released next week :)
- It seems duckdb is poised to become an almost ubiquitous component of the analytical stack. Apache Iceberg, dbt, ...
How are those generated?