Is the actual UI open source, or is that something MotherDuck is allowing to be used by this while remaining proprietary? Right now it doesn't appear like this would work without an internet connection.
At least it's hosted on duckdb.org and not MotherDuck, but I really would expect to see that source somewhere. Disappointing, unless I've missed it.
Breadcrumbs in the extension src: https://github.com/duckdb/duckdb-ui/blob/963e0e4d4c6f84b2536...
> Jeff Raymakers — Today at 9:25 AM
> The language in the blog post is misleading, and we're going to correct it.
> The UI extension is open source, but the UI itself is not.
Something like this will basically destroy legacy platforms like Tableau, SAS, etc.
Maybe the closed-source UI is downloaded on first execution and then cached locally?
Or is this a web app that loads from the remote URL each time?
See the note just above this link on data locations and the optional, explicit opt-in to MotherDuck:
But yeah, I can't find docs or source for the UI. And the extension docs refer to MotherDuck's own UI: https://motherduck.com/docs/getting-started/motherduck-quick...
So, the way this is set up is a bit confusing.
> Be sure you trust any URL you configure, as the application can access the data you load into DuckDB.
That’s certainly not what I would expect if someone gave me a “local UI” for some database. I’ve only just once toyed with DuckDB and was planning to look at it more - looks like I'll need to keep my guard up and see what actually is “local” and doesn’t ship my data to a remote URL.
> The repository does not contain the source code for the frontend, which is currently not available as open-source. Releasing it as open-source is under consideration.
Make it opt-in, or not installed by default, please; it's so hazardous.
(Someone could write an actually open-source UI extension for DuckDB, but that would require a lot of investment that so far only MotherDuck has been able to provide.)
If you want to support a real open-source UI, take a look.
I have similar concerns for Astral. Frankly, they're single-handedly unshittifying Python, and it would be a tragedy if they ran out of money and we were back to dealing with pip.
Rill has better built-in visualizations and pivot tables, and is overall a polished product with open-source code in Go/Svelte. But the DuckDB UI has very nice Jupyter-notebook-style "cells" for editing SQL queries.
Rill is fully open-source under the Apache license. [2]
[1] https://blobs.duckdb.org/events/duckcon6/mike-driscoll-rill-...
Pygwalker does open-source descriptive statistics and charts from pandas dataframes: https://github.com/Kanaries/pygwalker
ydata-profiling does open-source Exploratory Data Analysis (EDA) with Pandas and Spark DataFrames and integrates with various apps: https://github.com/ydataai/ydata-profiling #integrations, #use-cases
jupyterlite-xeus installs packages specified in an environment.yml from emscripten-forge: https://jupyterlite-xeus.readthedocs.io/en/latest/environmen...
emscripten-forge has xeus-sqlite and pandas and numpy and so on; but not yet duckdb-wasm: https://repo.mamba.pm/emscripten-forge
https://youtu.be/_IqvrFWY7ZM?si=1ux9SGUsh4kDs-ff
Alongside several great talks, including Rusty Conover presenting Airport (Arrow + DuckDB) and Christophe Blefari (Bl3f) introducing a new, lightweight orchestrator called yato.
we're using Perspective in crabwalk[0] (it's like dbt, but built specifically for DuckDB and written in Rust) and it's amazing paired with DuckDB. Near-instant loads for hundreds of thousands of rows, and you can keep everything in Arrow.
It does look interesting, but for the local ETL use case I'm missing the pitch over just keeping my own collection of SQL scripts. Presumably the all-local case needs less complexity. Unless the idea is that this will eventually support more connectors/backends and work as a full dbt replacement?
* Built-in column level lineage (i.e. dump in 20 .sql files and crabwalk automatically figures out lineage)
* Visualize the lineage
* Clean handling of input / output (e.g. simply specify @config output and you can export results to parquet, csv, etc.)
* Tests are not yet implemented, but crabwalk will have built-in support for tests (e.g. uniqueness, joins, etc.)
we're using it in our product (https://www.definite.app/), but only for lineage right now.
https://github.com/EduardoVernier/eduardovernier.github.io/b...
https://youtu.be/Bf-MRxhNMdI?list=PLy5Y4CMtJ7mKaUBrSZ3YgwrFY... (see the GIT method)
Why some software is so difficult to install beats me.
Just a few days ago I was looking for existing column explorers like the ones on Kaggle Datasets, but I wasn't able to find anything. And this one by DuckDB is better!
It seems nobody else besides them cares.
Seeing data distribution, unique values, min/ max/ percentiles is so easy and powerful.
Really commend whoever came up with that.
It's a bit of a shame this metadata can't itself be queried; it would be immensely useful for automatic data profiling/QA at scale.
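(For what it's worth, DuckDB's SUMMARIZE statement gets part of the way there in plain SQL, exposing similar per-column statistics as an ordinary result set. A minimal sketch, with my_table as a stand-in name:)

    -- per-column min/max, approximate distinct count, and median,
    -- queryable like any other table expression
    SELECT column_name, "min", "max", approx_unique, q50
    FROM (SUMMARIZE my_table);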
See its demo:
https://manzt.github.io/quak/?source=https://pub-2fc10ef6724...
And foremost - thank you for designing this awesome component!
Observable's column summary feature is very nice! But I do think there's a very common lineage around these kinds of diagnostics, which motivated both Observable's and ours. See Jeff Heer's Profiler paper[1] for more.
I'm very passionate about this area because I think "first mile problems" are underserved by most tools, but they take the longest to work out.
We had to do some gnarly things[2] to make this feature work well; and there's a lot of room to make it scale nicely and cover all DuckDB data types.
[1] http://vis.stanford.edu/papers/profiler
[2] https://motherduck.com/blog/introducing-column-explorer/
I find the ease and intuitiveness of navigating it, as well as the clarity of the information presented even at the density of a small window or many columns, outstandingly pleasant.
Kudos to you!
I am somewhat at odds with it being a default extension built into the DuckDB release. This is still a feature/product coming from a company other than the makers of DuckDB [1], though they did announce a partnership with the makers of this UI [2]. Whilst DuckDB has so far thrived without VC money, MotherDuck has (at least) $100M in VC funding [3].
I guess I'm wondering where the lines are between free and open source work and commercial work here. My assumption has been that the line is between what DuckDB ships and what others in the community do. This release seems to change that.
Yes, I do like and use nice, free things. And I understand that things have to be paid for by someone - sometimes that someone is even me. I guess I'd like clarification on the future of DuckDB as its popularity and reach grow.
[2] https://duckdblabs.com/news/2022/11/15/motherduck-partnershi...
[3] https://motherduck.com/blog/motherduck-open-for-all-with-ser...
edit: I don't want to leave this negative-sounding post here without an addendum. I'm just concerned about the future monetization strategy and roadmap of DuckDB. DuckDB is a good, useful, versatile tool. I mainly use it from Python through Jupyter, in the browser, and natively. I haven't felt the need for commercial services (plus purchasing them in my professional setting is too convoluted). This UI, whilst undoubtedly useful, seems to lean towards the commercial side. I merely wanted some clarity on what it might entail. I do wish DuckDB and its community even greater, better things, with requisite compensation for those who work to ensure them.
To be specific, the work we did was:
* Add the -ui command to the shell. This executes a SQL query (CALL start_ui()). The query that gets executed can be customized by the user through the .ui_command option - e.g. by setting .ui_command my_ui_function().
* The ui extension is automatically installed and loaded when the start_ui function is executed - similar to other trusted extensions we distribute. The automatic install and load can be disabled through configuration (SET autoinstall_known_extensions=false, SET autoload_known_extensions=false) and is also disabled when SET enable_external_access=false.
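(Putting those pieces together, everything described above boils down to a handful of statements, all taken from the description:)

    -- what `duckdb -ui` executes by default (customizable via .ui_command):
    CALL start_ui();
    -- the opt-outs mentioned above:
    SET autoinstall_known_extensions = false;
    SET autoload_known_extensions = false;
    SET enable_external_access = false;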
Then there's the (to me) entirely new feature of an extension providing an HTTP proxy to an external web service. This part could have been explained more prominently.
Edit: the OP states that it's a "built-in local UI for DuckDB" and that a "full-featured local web user interface is available out-of-the-box". These statements made me think this feature comes with the release binary, not that it's an extension.
To clarify my point: for me it's not the possible confusion about what this plugin does or how, but what this collaboration means for the future of DuckDB's no-cost and commercial usage.
We have collaborated with MotherDuck on streamlining the experience of launching the UI through auto-installation, but the DuckDB Foundation still remains in full control of DuckDB and the extension ecosystem. This has no impact on that.
For further clarification:
* The auto-installation mechanism is identical to that of other trusted extensions - the auto-installation is triggered when a specific function is called that does not exist in the catalog - in this case the `start_ui` function. See [1]. The query I mentioned just calls that function. The only special feature here is the addition of the CLI flag (and what that flag executes is user-configurable).
* The HTTP server is necessary for the extension to function, as the extension needs to communicate with the browser. The server is open-source as part of the extension code [2]. The server (1) fetches web resources (javascript/css) from ui.duckdb.org, and (2) communicates with localhost to coordinate the UI with DuckDB. Outside of these, the server doesn't interface with any other external web services.
[1] https://github.com/duckdb/duckdb/blob/main/src/include/duckd...
I realized that the extension provides an HTTP API to DuckDB. Is this perhaps to become the official way to use DuckDB through HTTP? For me this is much more interesting than any one particular UI.
I went looking and found that there's a community extension with similar functionality: https://duckdb.org/community_extensions/extensions/httpserve...
An official, supported HTTP API with stable schema versioning would be a nice addition.
I'm OK with this. Commercial open source projects need a business model. I get why this can be controversial, but the ecosystem needs to find ways to fund future development and I'm willing to compromise on purity if it means people are getting paid for their work.
(Actually, it looks like the UI feature may depend on loading closed-source assets over the Internet? If so, that changes my comfort level a lot; I'm not keen on that compromise.)
I actually really like the close partnerships in theory because they align incentives, but this crosses the line by not being open enough. The tight MotherDuck integration with DuckDB for externally hosted DuckDB/MotherDuck databases is fine and good: preferential treatment where the software makes it easy to use the sponsoring service. The local UI which is actually fully dependent on the external service is off-putting. It's a little less bad because it's an extension, but it's still worrying from a governance and principles perspective.
You've been able to self-host Deno KV for over a year now.
To characterize this as a rug pull is unfair IMO.
There is always going to be some overlap between open source contributions and commercial interests, but unless a real problem emerges, like core features getting locked behind paywalls, there is no real cause for concern. If that happens, then sure, let's talk about it and raise the issue in a public forum. But for now it is just a nice convenience feature that some people (like me) will find useful.
There's another way this could have gone. DuckDB Labs might have published the extension as an official HTTP API for all to use. Then, simultaneously, MotherDuck would announce support for it in their UI - now with access to any and all databases, whether in-browser, anywhere through the official HTTP API, or in their managed cloud service.
I for one would like an HTTP API for some things that currently necessitate rolling my own in Python. I don't yet see much need for the UI. I'm not looking for a public, multiuser service, just something I can use locally that doesn't have to live inside a process (such as Python or a web browser). There's such an API in the extension now, but it's undocumented and in C++ [1]. There's also the option of using the 3rd-party community extension that also provides an HTTP API [2]. Then there's one that supports remote access with Arrow Flight, but gRPC only, it seems [3]. An official, stable version would be nice.
[1] https://github.com/duckdb/duckdb-ui/blob/main/src/http_serve...
[2] https://duckdb.org/community_extensions/extensions/httpserve...
Something I haven't found yet is a small Swiss Army knife for time-series-type data: system and network monitoring, sensors, and market data.
I usually put everything in Prometheus, but it's awkward.
I would really love to find something I can query intuitively with SQL, that has very basic plotting capability, can read/parse some log files, can be queried without having to deal with REST/JSON, and supports adding data via pushes.
I am wondering if this is not within DuckDB's broad capabilities...
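(The SQL and ingestion parts at least seem within reach. A minimal sketch, assuming your collectors append to a hypothetical metrics.csv with ts/host/value columns:)

    -- load pushed samples, then compute 5-minute averages per host
    CREATE TABLE metrics AS
        SELECT * FROM read_csv_auto('metrics.csv');
    SELECT time_bucket(INTERVAL '5 minutes', ts) AS bucket,
           host,
           avg(value) AS avg_value
    FROM metrics
    GROUP BY ALL
    ORDER BY bucket;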
You can either drag & drop data or use remote data sources via HTTPS.
this is a first release. we know there are going to be tons of feature requests (including @antman’s request for simple charts). feel free to chime in on this thread and we’ll keep an eye on it!
meanwhile, hope you enjoy this release! we had lots of fun building it.
Is this really the case? The repo doesn't seem to have any UI elements?
* Being able to specify a db at startup would be pretty cool. I'm teaching a class on SQL this summer and I'm already envisioning workflows where a gatekeeper proxy spins up duckdb-ui containers on-demand for users as they log in/out, and it would be much better if the UI can be pre-seeded with an existing database.
* This is maybe a big ask, but markdown cells in notebooks would be nice. Again thinking of my classroom use-case, it would be nice to distribute course materials (lessons/exercises/etc) as notebooks, but that's a bit of a non-starter without markdown or some equivalent content-centric cell type.
* Not a feature request, I just have to say I'm a big fan of how well it handles displaying massive datasets with on-demand streaming and all. I was imagining that I'd have to introduce the `LIMIT` clause right off the bat so that people don't swamp themselves with 100k's of rows, but if I end up using this then maybe I can hold off on that and introduce it at a more natural time.
Regardless, this is great and I definitely have uses for it outside the class I mentioned, so thanks!
duckdb -ui pre_seeded_db.db
duckdb -ui -init startup.sql
where startup.sql contains various statements/commands like `.open pre_seeded_db.db`. Alternatively, place statements/commands in `~/.duckdbrc` and just run `duckdb -ui`.
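(For reference, the init file can mix dot commands and SQL; a sketch with hypothetical file/table names:)

    -- startup.sql: dot commands and SQL side by side
    .open pre_seeded_db.db
    CREATE OR REPLACE TABLE exercises AS
        SELECT * FROM 'exercises.parquet';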
I've been trying to build a small card game with Supabase and I'm sorta stuck...
1. https://github.com/manifold-systems/manifold/blob/master/doc...
Spark is for when you have hundreds of machines' worth of processing to do.
Absolutely agree. However, most uses of Spark I've seen in my career are people who only think they have hundreds of machines' worth of processing to do.
1) Biting off more than they can chew,
2) Putting significant effort into something that's outside of their core value proposition,
3) Leaning more in the direction of supporting things with a for profit company that gradually cannibalizes the open source side.
Maybe I'm being too cynical. I hope I'm wrong.
you might have a point on #3, but they need to pay the bills somehow.
I doubt they’ll ever enshittify DuckDB core. It’s clear they’re only aiming for better integration with their paid service via peripherals like the UI to improve the experience, but you also don’t need to use it?
It’s all extensions that you could develop yourself, in the end.
Azure Data Studio can connect to a variety of databases and has completions, but it tends to forget if you've set a cell to output a plot. It also doesn't have good functionality for carrying results over from one cell to the next.
Jupyter notebooks don't have any kind of autocompletion against a database (at least to my knowledge), but you do get a lot of control over how you store things between cells and how you display them.
This DuckDB UI looks great, and while DuckDB can read a lot of files, I'm not sure it has enough connectors to be a general database exploration notebook.
https://blobs.duckdb.org/slides/goto-amsterdam-2024-duckdb-g...
It works as a replacement for / complement to dataframe libraries due to its speed and (vertical) scalability. It's lightweight and dependency-free, so it also works as part of data processing pipelines.
*EDIT*
One useful thing I thought of with this: if you do a lot of development work on an iPad Pro and/or in devcontainers, this could be useful as a UI. I have a bookmarks repository that is just a couple of Python scripts and a collection of JSON files. This would make it useful to spin up a Codespace on GitHub and query the files.
DuckDB is a fast analytical database system.
We also have some added bonuses for query profiling and data exploration like the Column Explorer.
The easiest way to give it a whirl is to run `duckdb -ui` from the CLI.
Let us know if you have any other questions!
At the risk of harping on a tired topic, have you thought about embedding an AI query generator? For the ad-hoc queries I mostly use DuckDB for, I've found it's almost always fastest to paste the schema into ChatGPT, tell it what I'm looking for, then paste the response back into the DuckDB CLI, but the whole process isn't very ergonomic.
I think I’m sort of after duckbook.ai, but with access to a local duckdb.
Given the above, I'm not sure it supports SSH functionality. Since it exposes an API, there is probably a way to access it, but the easiest solution is probably the one you don't want, which is to open the expected port and just hit it up in a browser. You could open it only to your (office/VPN) IP address; that way at least you're only exposing the port to yourself.
And re-reading a bit, it does appear to support remote data warehouses, as it has MotherDuck integration, and that is what MotherDuck is. Someone will probably add an interface to make this kind of thing possible for privately hosted DBs. The question is whether it will be dynamic via SSH tunnel or exclusively API-driven, and whether it depends on the closed-source (I think?) MotherDuck authentication system.
ssh -F ssh.config -L 4213:localhost:4213 dev 'DUCKDB_HTTPPORT=4213 ~/.duckdb/cli/latest/duckdb -ui'
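(That is: forward local port 4213 to the remote host over SSH, launch the UI there with its HTTP port pinned to 4213 via the env var, then browse to http://localhost:4213 on your own machine.)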
PS: Not associated with the DuckDB team at all; I just love DuckDB so much that I shill for them when I see them on HN.
What sort of thing should I be working on, to think "oh, maybe I want this DuckDB thing here to do this for me?"
I guess I don't really get the "that you want to learn something about" bit.
If you're using Excel Power Query and XLOOKUPs, then it's similar but dramatically faster and without the Excel autocorrection nonsense.
If you're doing data processing that fits on your local machine, e.g. the 50 MB, 10 GB, 50 GB CSVs kind of thing, then it should be your default.
If you're using pandas/numpy, this is probably better/faster/easier.
Basically, if you're doing one-time data mangling tasks with quick Python scripts or Excel or similar, you should probably be looking at SQLite/DuckDB.
For bigger or repeatable jobs, consider it a competitor to doing things with multiple CSV/JSON files.
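(For example, DuckDB will query a CSV in place, with no import step. A sketch, with sales.csv and its columns made up:)

    -- aggregate a raw CSV file directly, no load required
    SELECT region, sum(amount) AS total
    FROM 'sales.csv'
    GROUP BY region
    ORDER BY total DESC;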
- data you’ve pulled from an API, such as stock history or weather data,
- banking records you want to analyze for patterns, trends, unauthorized transactions, etc
- your personal fitness data, such as workouts, distance, pace, etc
- your personal sleep patterns (data retrieved from a sleep tracking device),
- data you’ve pulled from an enterprise database at work — could be financial data, transactions, inventory, transit times, or anything else stored there that you might need to pull and analyze.
Here’s a personal example: I recently downloaded a publicly available dataset that came in the form of a 30 MB csv file. But instead of using commas to separate fields, it used the pipe character (‘|’). I used DuckDB to quickly read the data from the file. I could have actually queried the file directly using DuckDB SQL, but in my case I saved it to a local DuckDB database and queried it from there.
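(Roughly this, with dataset.csv standing in for the real filename:)

    -- query the pipe-delimited file directly...
    SELECT * FROM read_csv('dataset.csv', delim = '|', header = true) LIMIT 10;
    -- ...or persist it into a local DuckDB database first
    CREATE TABLE dataset AS
        SELECT * FROM read_csv('dataset.csv', delim = '|', header = true);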
Hope that helps.
- Am I doing data analysis?
- Is it read-heavy, write-light, using complex queries over large datasets?
- Is the dataset large (several GB to terabytes or more)?
- Do I want to use parquet/csv/json data without transformation steps?
- Do I need to distribute the workload across multiple cores?
If any of those are a yes, I might want DuckDB.
- Do I need to write data frequently?
- Are ACID transactions important?
- Do I need concurrent writers?
- Are my data sets tiny?
- Are my queries super simple?
If most of the first questions are no and some of these are yes, SQLite is the right call.
It works great against local files, but my favorite DuckDB feature is that it can run queries against remote Parquet files, fetching just the ranges of bytes it needs to answer the query using HTTP range requests.
This means you can run, e.g., a count(*) against a 15 GB Parquet file from your laptop and only fetch a few hundred KB of data (if that).
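(A minimal sketch; the URL is a placeholder, and httpfs is the extension that enables HTTP(S) reads:)

    INSTALL httpfs;
    LOAD httpfs;
    -- only the Parquet footer plus the needed byte ranges get fetched
    SELECT count(*) FROM read_parquet('https://example.com/big.parquet');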
If you're developing in the data space, you should consider your "small data" scenarios (e.g. the vast majority of our clients have < 1 GB of analytical data; Snowflake etc. is overkill). Building a DW that lives entirely in a local browser session is possible now; that's a big deal.
duckdb -ui data.parquet
duckdb -ui data.sqlite
`duckdb -ui sqlitedb.db` should work because DuckDB can read SQLite files. If it doesn't autoload the extension, you can add INSTALL/LOAD statements to your ~/.duckdbrc.
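(Something along these lines in ~/.duckdbrc should do it:)

    INSTALL sqlite;
    LOAD sqlite;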
MotherDuck lets you run a fleet of DuckDB instances as a managed cloud service.
It's just a matter of time until there's a paywall in front of this. Hook people on something, then demand money.
Refreshing to see neither a Loom recording nor a high-budget video set in a Japandi-style office designed to go viral.