Harlequin: DuckDB IDE for the terminal
309 points
11 months ago
| 17 comments
| harlequin.sh
| HN
tconbeer
11 months ago
[-]
Hi everyone! I made this. Tried posting it to Show yesterday, glad this thread is getting more momentum!

For the past four months I've been working (part-time, this is OSS after all) on Harlequin, a SQL IDE for DuckDB that runs in your terminal. I built this because I work in Data, and I found myself often reaching for the DuckDB CLI to quickly query CSV or Parquet data, but then hitting a wall when using the DuckDB CLI as my queries got more complex and my result sets got larger.

Harlequin is a drop-in replacement for the DuckDB CLI that runs in any terminal (even over SSH), but adds a browsable data catalog, full-powered text editor (with multiple buffer support), and a scrollable results viewer that can display thousands of records.

Harlequin is written in Python, using the Textual framework. It's licensed under MIT.

Yesterday I released v1.0.0: you can try it out with `pip install harlequin`, or visit https://harlequin.sh for docs and other info.

reply
ayhanfuat
11 months ago
[-]
Lovely tool. I'll certainly try it out. The code fragments in the documentation seem to be not selectable, though (or maybe it is not highlighting correctly).
reply
tconbeer
11 months ago
[-]
it's an issue with the highlight color. I'll get that fixed shortly.
reply
avinassh
11 months ago
[-]
This looks very cool! how is autocomplete feature implemented?
reply
tconbeer
11 months ago
[-]
Right now there is no autocomplete inside the text editor, but that will be added soon. DuckDB itself provides an extension that adds a table function that can be queried for completions like so:

select * from sql_auto_complete( 'select ra' )

reply
zokier
11 months ago
[-]
I'm not in love with this style of UI design in terminals:

https://harlequin.sh/_app/immutable/assets/export.a0e81d27.p...

Every item in the form takes 4 lines (I think?) whereas in more traditional curses UI they would be packed to one line per item, the scrollbar could have easily been avoided here. Smaller nitpick but that style of toggle switches is also form over function; I'd find traditional [X] far more clear/less ambiguous

reply
rochak
11 months ago
[-]
+1. Lazygit is one of the few tools that does the TUI just right. Super functional UI + Easy discoverability = Chef's kiss.
reply
tconbeer
11 months ago
[-]
TIL about LazyGit. Looks excellent. Thanks for the tip, and the UI feedback
reply
ElCapitanMarkla
11 months ago
[-]
Don’t forget lazydocker if you use docker
reply
queuebert
11 months ago
[-]
Agree. The best curses interfaces make liberal use of highlighting and color, rather than box drawing characters, which take up too much real estate. Besides lazygit, which rocks, I would even point to the old DOS versions of MS Word, which had popups that were information dense but readable.
reply
ptman
11 months ago
[-]
Color can be tricky too. Colorblind or bad contrast. Terminals do bold, italic, underline and strikethrough.
reply
tconbeer
11 months ago
[-]
fair feedback!
reply
lazzlazzlazz
11 months ago
[-]
I love this and will definitely try it out! Although I admit I'm a little puzzled when people simultaneously want to do a TUI but also design things so there is generous (excessive, actually) whitespace around everything to create the illusion of "minimalism" or "comfort".

It's a TUI! It should be buzzing with numbers, packed with information, sparing with space and using every pixel possible. btop[1] is a great example imo — one of the best.

[1]: https://github.com/aristocratos/btop

reply
tconbeer
11 months ago
[-]
Thanks for the feedback; gotten some of the same here.

I've tried to be really thoughtful about using space and getting value out of every pixel. There are some features in Harlequin that make it nice on small screens, even with the padding:

1. Scrolling is supported in every widget 2. You can show or hide the left side bar with ctrl+b or F9 3. You can enter "full screen mode" in the query editor or results viewer with F10

Beyond the main screen, the interface could be more compact. Especially the export and help modals. I'm considering tweaks to the design.

reply
leoh
11 months ago
[-]
> It should be buzzing with numbers, packed with information, sparing with space and using every pixel possible.

No. It should be useful, and if it’s not “buzzing with numbers” and causing me a seizure, well, thank fucking God.

If you want something like that, go write it yourself. I, for one, find the folks writing quiet terminal interfaces to be a massive relief and a tremendous respite from the madness of our seemingly increasingly incoherent, incompetent, and distracting world.

reply
quadrature
11 months ago
[-]
If anyone here is using DuckDB in production i'd love to hear what your stack looks like over the entire lifecycle of extract->transform->load.
reply
zlurker
11 months ago
[-]
We orchestrate our ETL pipelines with dagster. We only use duckdb in a few of them but are slowly replacing pandas etls with it. For some of our bigger jobs we use spark instead.

Essentially it's: 1. Data sources from places such as s3, sftp, rds 2. Use duckdb to load most of these with only extensions (I dont believe there's one for sftp, so we just have some python code to pull the files out.) 3. transform the data however we'd like with duckdb. 4. convert the duckdb table to pyarrow 5. Save to s3 with delta-rs

FWIW, we also have this all execute externally from our orchestration on an EC2 instance. This allows us to scale vertically.

reply
quadrature
11 months ago
[-]
This is very cool!.

Last time I checked duckdb didn't have the concept of a metastore so do you have an internal convention for table locations and folder structure ?.

What do you use for reports/visualizations? notebooks ?.

reply
zlurker
11 months ago
[-]
Yeah, dagster has a concept of metadata and assets so we have some code that'll map dagster's own logical representation to physical s3 locations.

Reports and viz varies a lot, the finance department uses tableau where as for more 'data sciencey' stuff we normally just use notebooks.

reply
quadrature
11 months ago
[-]
Hadn't heard of that dagster feature, thats really cool.
reply
tconbeer
11 months ago
[-]
It's great as: 1. An ephemeral processing engine. For example, I have a machine learning pipeline where I load data into a DataFrame, and then I can use DuckDB to execute SQL on my DataFrame (I prefer both the syntax and performance to Pandas). 2. A data lake processing engine. DuckDB makes it very easy to interact with partitioned files. 3. A lightweight datastore. I have one ETL pipeline where I need to cache the data if an API is unavailable. I just write the DataFrame to a DuckDB database that is on a mounted network filesystem, and read it back when I need it.
reply
0cf8612b2e1e
11 months ago
[-]
On a similar point, are people using the actual duck database format or sticking with Parquet? I love everything about DuckDB, but I feel more comfortable keeping things in an existing format.

My only work with it to date has been to load-analyze-usuallydelete to refresh, so I do not require any db mutability. Outside of mutability, not sure if there are any obvious wins with the format.

reply
tconbeer
11 months ago
[-]
It's a bit faster and easier than parquet, but right now the format is unstable, which is a huge downside and makes it unsuitable for medium/long-term storage. After DuckDB v1, they'll keep the format stable and then I think its popularity will increase dramatically.
reply
pbowyer
11 months ago
[-]
I'm using it for site-search analytics. Originally I was using MariaDB but because we do search-as-you-type there's a complex aggregation query to identify the "real" searches, and with 1000 entries that was taking 10s of seconds to run. Materialized views would've fixed that but...

My stack is now JSON files containing the search data. At present I'm loading these into DuckDB and pre-aggregating due to a bug I found in DuckDB, not because it was needed for performance.

reply
thenipper
11 months ago
[-]
I've been using it for taking output from our data validation steps and bundling that up with the data that was validated into one neat artifact we can download if there is an issue and explore manually.
reply
souldeux
11 months ago
[-]
duckDB is one of the built-ins for count.co, which i've come to love
reply
whoopdeepoo
11 months ago
[-]
We use it to sort parquets out of core and then the arrow interface to read into Python and export as geoparquets
reply
NortySpock
11 months ago
[-]
This looks super handy, I will definitely take a moment to try it out.

For those asking what DuckDB is: columnstore databases like DuckDB may be slower at data ingestion, but are very quick at multi-GB sums, counts, and aggregations.

reply
pjot
11 months ago
[-]
This looks great! I’ve been using Motherduck for a while now, glad to see more things being built with it in mind.
reply
DesiLurker
11 months ago
[-]
I need this but for C++, I kinda miss the old borland turbo C++ ide. mostly what I care about is terminal access and code navigation but other than some vim + plugins monstrosity I cant find any.
reply
malloc-0x90
11 months ago
[-]
Same. Thousands text editors, but not a single IDE.

And they all get SO close, just give a look to SetEdit screenshots: https://setedit.sourceforge.net/

I don't know if it can be of any help for you, but at the moment on the terminal I mostly use:

- Tilde: https://github.com/gphalkes/tilde

- Textadept: https://github.com/orbitalquark/textadept

Or "Geany IDE" on desktop environment (while waiting for lapce.dev to get better), I tend to stay away as much as possible from VS Codium, but everyone else seems to love it and already forgot about Atom, few seems to realise how Microsoft really is.

Maybe the plot twist is that you have to accept in your heart that "writing text on anything, is the real IDE", and transcend to writing on nano!

reply
SanderNL
11 months ago
[-]
A rich text-based UI feels like it has all the downsides of a GUI while substantially lacking the upsides.

For example, Tilde. It seems nice and maybe it has nice features, but what about these features is so highly dependent on being accessible through a "menu"? Do you reach for the "menu" all the time? Or are there fancy "dialogs"?

This feels like Vim, but with some strange plugin that provides a "menu". I'd think why not just jump ahead and ditch the quasi-GUI and just learn some shortcuts?

Honestly not trying to be difficult here. I want to understand the mindset, because you are definitely not alone.

reply
whobre
11 months ago
[-]
There’s Motor, but I don’t know if the project is still being maintained.

https://github.com/rofl0r/motor

reply
bdcravens
11 months ago
[-]
Very complete IDE - would be very useful for other RDBMSes.
reply
tconbeer
11 months ago
[-]
Thanks! Hopefully coming to v2 ... https://github.com/tconbeer/harlequin/issues/230
reply
jamestimmins
11 months ago
[-]
Absolutely love the logo. I'd like to see more projects/startups choosing logos/themes with this level of personality.

Project looks rad too, but I'm just here to appreciatively bike shed.

reply
aargh_aargh
11 months ago
[-]
I don't mean to minimize the effort and contribution of this project in any way, and I understand the author is scratching their own itch, but I'm curious why this is specific to DuckDB? From skimming the homepage, it has generic querying capabilities. Isn't there a generic DB driver that can be swapped out and the TUI used on top of another DBMS?
reply
tconbeer
11 months ago
[-]
DuckDB provides some really nice python APIs that other databases don't have, so it was a very easy way for me to get started. I'm considering/planning extending to other DBs using plugins. More info here: https://github.com/tconbeer/harlequin/issues/230
reply
niels_bom
11 months ago
[-]
I quite like the dbcli.com family of TUI DBMS tools. I personally use pgcli a lot.
reply
mavam
11 months ago
[-]
Apache Arrow recently introduced ADBC for this.
reply
sspiff
11 months ago
[-]
Perhaps slightly off topic, but I'm not familiar with DuckDB. I don't really understand what it is for, and their site is doesn't really explain it well either.

What is the advantage of using this instead of something well-proven and battle tested like an in memory SQLite database?

reply
victorbjorklund
11 months ago
[-]
DuckDB is a columnar db and focused on analysis queries. So it is more optimized for those workflows. Of course for many things sqlite is fine but if you have a really large db with complex queries duckdb should be faster.
reply
filleokus
11 months ago
[-]
DuckDB also compresses the data. I've ingested ≈ 1 TB of apache-like access logs into a duckdb file that was small enough to load completely into ram on my local machine (like 10-15 GB). It was blazingly fast to query over.

(The large spark cluster was still ingesting the data when I was done with my analysis, only working on my local laptop)

reply
sghiassy
11 months ago
[-]
Love the passion and enthusiasm behind this project. Keep it up!
reply
tconbeer
11 months ago
[-]
Thank you!
reply
wodenokoto
11 months ago
[-]
Kinda off topic but how do I install these pip sourced command line applications such that they are available regardless of which virtual environment I am using?
reply
duckmysick
11 months ago
[-]
As mentioned in Getting Started, with `pipx`:

`pipx install harlequin`

From the pipx documentation:

> This automatically creates a virtual environment, installs the package, and adds the package's associated applications (entry points) to a location on your PATH.

It makes the command available globally, but sandboxes the package in its own virtual environment.

https://github.com/pypa/pipx

reply
tschellenbach
11 months ago
[-]
what are the benefits of DuckDB?
reply
chrisjc
11 months ago
[-]
Databases can usually be split into one of two types; OLTP (row-based) or OLAP (columnar). OLTPs are used mostly for transactional workloads whereas OLAP is mostly used for analytics.

Here goes...

Take 1: sqlite is to Postgres, what duckdb is to Snowflake/BigQuery.

Take 2: In a similar way that sqlite is an in-process/memory OLTP, duckdb is an in-process/memory OLAP.

I should mention some caveats/exceptions/notes about my statements above:

- there are OLAP projects out there that use Postgres as their basis.

- HTAPs are DBs that allow you to define tables as either row-based or columnar.

- duckdb works with sqlite tables, and its SQL is heavily based on postgres SQL

- duckdb 0.9.0 is being released next week :)

- It seems duckdb is poised to become an almost ubiquitous component of the analytical stack. Apache Iceberg, dbt, ...

reply
maxmcd
11 months ago
[-]
it’s like a columnar store sqlite, better performance for analytical data
reply
pacbard
11 months ago
[-]
It’s similar to sqlite but can open multiple file formats.
reply
bdcravens
11 months ago
[-]
In addition to the other examples given, it's a quick way to run sql commmands against a CSV, and lets you quickly export to Parquet as well run queries against Parquet files. If you're going to be using Parquet but want to run some quick tests (think setting up an Athena database) it's nice to do it ad hoc rather than having to setup a full ETL process first.
reply
hobs
11 months ago
[-]
I like it for writing easy sql locally transforming data frames without having to think too hard about it.
reply
erksa
11 months ago
[-]
Love the SVG's for showing of themes.

How are those generated?

reply
tconbeer
11 months ago
[-]
Textual provides a utility for screenshotting apps. I loop over a list of themes and take the screenshots. You can see the code for that here: https://github.com/tconbeer/harlequin/blob/e666319b7a964f026...
reply
erksa
11 months ago
[-]
Thanks!
reply
tiffanyh
11 months ago
[-]
Love the simplicity yet functionality.
reply
Scarbutt
11 months ago
[-]
Is there something like this for XML?
reply