Show HN: Postgres as a VectorDB GUI
157 points | 2 days ago | 10 comments | github.com
wenc
2 days ago
This is good, but it would also be worth mentioning that you're using UMAP for dimensionality reduction with the cosine metric.

https://github.com/Z-Gort/Reservoirs-Lab/blob/main/src/elect...

Dimensionality reduction from n >> 2 dimensions to 2 dimensions can be very fickle, so the hyperparameters matter. Your visualization can change significantly depending on the choice of metric.

https://umap-learn.readthedocs.io/en/latest/parameters.html
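
To make the point about metric choice concrete, here is a minimal sketch in plain Python (no UMAP involved) showing that the same query point can have a different nearest neighbor under Euclidean and cosine distance. The point names and coordinates are made up for illustration. Since neighbor-based methods build their layout from nearest neighbors, a change like this can plausibly change the resulting picture.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector length, compares direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, points, dist):
    """Return the name of the point closest to `query` under `dist`."""
    return min(points, key=lambda name: dist(query, points[name]))


# Hypothetical toy data: one point shares the query's direction but is far
# away, the other is close by but points in a different direction.
query = [1.0, 1.0]
points = {
    "same_direction_far": [10.0, 10.0],
    "different_direction_near": [1.0, 0.0],
}

nearest_euclid = nearest(query, points, euclidean)
nearest_cosine = nearest(query, points, cosine_distance)
print(nearest_euclid, nearest_cosine)
```

The two metrics disagree on which neighbor is closest, which is the kind of difference that can propagate into a 2-D projection.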

You may want to consider projecting to more than 2 dimensions too. You may ask, how does one visualize more than two dimensions? Through a scatterplot matrix of 2 axes at a time.

https://seaborn.pydata.org/examples/scatterplot_matrix.html

These are used in PCA-type multivariate analyses to visualize latent variables in more than 2 dimensions, 2 dimensions at a time. Some clustering behavior that cannot be seen in 2 axes might be seen in higher dimensions. We used to do this in our lab to find anomalies in high dimensions.
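
As a rough sketch of the "2 axes at a time" idea, the snippet below enumerates the panels a scatterplot matrix would contain for a k-dimensional projection and slices the points for each panel. The helper names and the toy points are assumptions for illustration; a plotting library such as seaborn would draw the panels, which is left out here.

```python
from itertools import combinations


def axis_pairs(n_components):
    """All unordered pairs of axes, one pair per off-diagonal panel."""
    return list(combinations(range(n_components), 2))


def project_pairs(points, n_components):
    """Map each axis pair (i, j) to the 2-D coordinates shown in that panel."""
    panels = {}
    for i, j in axis_pairs(n_components):
        panels[(i, j)] = [(p[i], p[j]) for p in points]
    return panels


# Hypothetical 3-D projection: the second point is an outlier on axis 2 only.
points = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.1, 0.1, 0.0)]
panels = project_pairs(points, 3)

for pair, coords in panels.items():
    print(pair, coords)
```

In the (0, 1) panel the outlier sits exactly on top of the first point, and it only separates in panels that include axis 2, which is why looking at a single 2-D view can hide structure.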

isoprophlex
2 days ago
About fickleness... indeed, I've found this kind of problematic when running large-d text embeddings through UMAP -- it always comes out spherical, blob-shaped, without any obvious segregation in the low-d projected space.

IMO it's very difficult to make a "fire and forget" embedding interpreter. Maybe I never found the right parameters to umap but the results of running it (or any dimension reduction algo) always left me a bit underwhelmed.

antman
2 days ago
Have you tried PaCMAP? It should be better and faster.

wenc
1 day ago
Thanks for the pointer to PaCMAP.

I just tried it. My verdict?

PaCMAP >= UMAP >> t-SNE.

UMAP captures the basic pattern, but PaCMAP makes it crisper.

isoprophlex
1 day ago
Wow, thanks for that!

gregncheese
2 days ago
I have yet to find a better tool than the old TensorFlow projector: https://projector.tensorflow.org/

Granted, it requires you to prepare your data as TSV files first.
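
For anyone who hasn't done the TSV step, here is a minimal sketch using only the standard library. It assumes the layout the projector's load dialog describes as I understand it: one tab-separated vector per line with no header, plus a metadata file with one label per line (no header when there is a single column). The function name and sample data are made up.

```python
import csv
import io


def to_projector_tsv(vectors, labels):
    """Return (vectors_tsv, metadata_tsv) as strings.

    Assumed layout: vectors have no header row; single-column metadata has
    no header row either, and row i of the metadata describes vector i.
    """
    if len(vectors) != len(labels):
        raise ValueError("need exactly one label per vector")

    vec_buf = io.StringIO()
    vec_writer = csv.writer(vec_buf, delimiter="\t", lineterminator="\n")
    for vector in vectors:
        vec_writer.writerow(vector)

    meta_buf = io.StringIO()
    meta_writer = csv.writer(meta_buf, delimiter="\t", lineterminator="\n")
    for label in labels:
        meta_writer.writerow([label])

    return vec_buf.getvalue(), meta_buf.getvalue()


# Hypothetical sample data.
vectors_tsv, metadata_tsv = to_projector_tsv(
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    ["cat", "dog"],
)
print(vectors_tsv)
print(metadata_tsv)
```

Writing the two strings to files such as vectors.tsv and metadata.tsv (names are arbitrary) is then a plain `open(...).write(...)`.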

wenc
2 days ago
That is indeed an excellent tool. It allows one to dynamically adjust and recompute UMAP and t-SNE.

z-gort
2 days ago
Let me know if anyone has any thoughts. If I could go back, I might not have gone with Electron.

Doing dimensionality reduction locally posed a few challenges in terms of application size--the idea was that by analyzing just a few thousand randomly sampled points you can get an idea of your data through a local GUI where you interact with your data and see some correlated metadata.
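
For illustration only (this is not taken from the project's code), one standard way to draw a fixed-size uniform random sample from a stream of rows without loading them all is reservoir sampling (Algorithm R). The function name, the `seed` parameter, and the row source below are assumptions.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Uniformly sample up to k items from an iterable in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1) by drawing a
            # slot in [0, i] and replacing only if it falls in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


# Hypothetical usage: pretend range() is a cursor over 100,000 row ids.
picked = reservoir_sample(range(100_000), 3_000, seed=0)
print(len(picked))
```

Memory stays proportional to k rather than to the table size, which fits the "few thousand points in a local GUI" constraint. In Postgres itself, `ORDER BY random() LIMIT n` or `TABLESAMPLE` are alternatives with different trade-offs.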

Not sure if there's too much need for an individual GUI to go along with Postgres as a VectorDB, maybe people just do analysis separate from a normal "GUI"? But maybe not.

What do you think?

maxchehab
2 days ago
Just some quick feedback: I can't copy and paste into the connection URL input form on a Mac.

Once loaded, I get the error "Table must contain a UUID column for vector visualization."

I'm assuming it's trying to find an ID column for grouping? Can we manually specify this? My ID columns are varchars.

garybake
2 days ago
Same here. I'm using langchain which creates a varchar id column. It also has different collections on the same table.
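
A hypothetical sketch of what "manually specify the ID column" could look like: prefer an explicit user choice, and only fall back to guessing by type. None of this is the tool's actual logic; the function name, the type preference order, and the example schema (loosely modeled on the varchar-id table described above) are all assumptions.

```python
def pick_id_column(columns, preferred=None):
    """Choose an id column from a {column_name: postgres_type} mapping.

    An explicit `preferred` name always wins; otherwise guess by type.
    """
    if preferred is not None:
        if preferred not in columns:
            raise ValueError(f"no such column: {preferred!r}")
        return preferred

    # Assumed fallback order; the first matching type is used.
    for wanted in ("uuid", "character varying", "text", "bigint", "integer"):
        for name, type_name in columns.items():
            if type_name == wanted:
                return name
    raise ValueError("no usable id column found; pass preferred=...")


# Hypothetical schema: varchar row ids plus a uuid that identifies the
# collection, not the row.
columns = {
    "id": "character varying",
    "collection_id": "uuid",
    "embedding": "vector",
}

guessed = pick_id_column(columns)
chosen = pick_id_column(columns, preferred="id")
print(guessed, chosen)
```

The guess lands on the collection's UUID rather than the row id, which is one argument for letting the user override the choice instead of requiring a UUID column.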

redwood
2 days ago
Have folks seen https://atlas.nomic.ai/ <-- absolutely beautiful vector visualization

Alifatisk
2 hours ago
It seems to require signing up just to view it.

dcreater
1 day ago
A proprietary hosted solution that gains as I uncover insights in my data? Hard pass.

abadid
23 hours ago
Why use PostgreSQL instead of columnar databases that are likely to perform way better for these types of analytical workloads?

paddy_m
2 days ago
README suggestions:

Put the animated GIF at the top.

Add subtitles to the GIF explaining what you're doing.

dcreater
1 day ago
If I had a nickel for every GUI/viz tool that buries the image/video or straight up doesn't have it in the README... It lends credence to the popular opinion that engineers don't know how to communicate.

ddtaylor
2 days ago
Does this use pgvector?

z-gort
2 days ago
It lets you visualize any column with type "EMBEDDING", and I think the only way to get that is through pgvector/pgvectorscale.
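
As a hedged sketch of how a client might find and read such columns: to my knowledge pgvector's own column type is named `vector`, it shows up in `information_schema.columns` with `udt_name = 'vector'`, and values come back in a bracketed text form like `[1,2,3]` when read as text. The query and the parser below rely on those assumptions and are not taken from the project.

```python
import json

# Assumption: pgvector columns are reported with udt_name = 'vector'.
FIND_VECTOR_COLUMNS_SQL = """
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE udt_name = 'vector'
ORDER BY table_schema, table_name, column_name
"""


def parse_vector_text(text):
    """Parse a vector in the assumed '[1,2,3]' text form into floats."""
    values = json.loads(text)
    if not isinstance(values, list):
        raise ValueError(f"not a vector literal: {text!r}")
    return [float(v) for v in values]


# Hypothetical value as it might arrive from a driver that returns text.
embedding = parse_vector_text("[1,2.5,-3]")
print(embedding)
```

Running the query needs a live connection through whatever Postgres driver is in use; only the parsing half is exercised here.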

samanthasu
2 days ago
That is excellent visualization!

dmezzetti
2 days ago
Very interesting, thanks for sharing!

thangngoc89
2 days ago
As a non-native English speaker who is not very familiar with vector databases, I find the title very ambiguous. I understood it as Postgres being a GUI for some VectorDB. Upon closer inspection, I realized that "Postgres as a VectorDB" is the full name. Maybe shorten it to something else. Just my 2 cents.

colechristensen
2 days ago
It’s just plain bad grammar; the title should be:

“Show HN: Reservoirs Lab, a Postgres VectorDB GUI”

monsieurbanana
2 days ago
I think the confusing term is "VectorDB", which sounds like the name of an existing product. "A vector db GUI powered by Postgres"?