Show HN: Postgres as a VectorDB GUI
162 points | 10 months ago | 10 comments | github.com | HN
wenc
10 months ago
This is good, but it would also be good to mention that you're using UMAP for dimensionality reduction with the cosine metric.

https://github.com/Z-Gort/Reservoirs-Lab/blob/main/src/elect...

Dimensionality reduction from n >> 2 dimensions to 2 dimensions can be very fickle, so the hyperparameters matter. Your visualization can change significantly depending on the choice of metric.

https://umap-learn.readthedocs.io/en/latest/parameters.html
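A minimal sketch of why the metric matters, using plain Python rather than UMAP itself. The toy vectors and the helper names (`euclidean`, `cosine_distance`, `nearest`) are made up for illustration and are not taken from the project. The same query point gets a different nearest neighbor depending on the metric, and neighbor relationships are what UMAP-style methods build their layout from.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector length, compares direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, points, metric):
    """Return the name of the point closest to `query` under `metric`."""
    return min(points, key=lambda name: metric(query, points[name]))


# Hypothetical toy data: one point shares the query's direction but is far
# away, the other is close by but points in a different direction.
points = {
    "same_direction_far": [10.0, 10.0],
    "other_direction_near": [1.0, 0.0],
}
query = [1.0, 1.0]

print(nearest(query, points, euclidean))        # other_direction_near
print(nearest(query, points, cosine_distance))  # same_direction_far
```

With text embeddings, where vector length often carries little meaning, this is one reason cosine tends to be the default choice, but it is still worth stating which metric a plot was made with.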

You may want to consider projecting to more than 2 dimensions too. You may ask, how does one visualize more than two dimensions? Through a scatterplot matrix of 2 axes at a time.

https://seaborn.pydata.org/examples/scatterplot_matrix.html

These are used in PCA-type multivariate analyses to visualize latent variables in more than 2 dimensions, 2 dimensions at a time. Some clustering behavior that cannot be seen in 2 axes might be seen in higher dimensions. We used to do this in our lab to find anomalies in high dimensions.
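To make the "2 axes at a time" idea concrete, here is a small sketch in plain Python. It assumes the points have already been reduced to 3 components; the coordinates in `embedded` and the helper names `axis_pairs` and `project` are hypothetical. Each axis pair becomes one panel of the scatterplot matrix, which a plotting library would then draw.

```python
from itertools import combinations


def axis_pairs(n_components):
    """All unordered pairs of axes, one per scatterplot-matrix panel."""
    return list(combinations(range(n_components), 2))


def project(points, i, j):
    """Keep only axes i and j of every point."""
    return [(p[i], p[j]) for p in points]


# Hypothetical 3-component output of a dimensionality reduction step.
# The two groups overlap on axes 0 and 1 and only separate on axis 2.
embedded = [
    (0.0, 0.1, 0.0),
    (0.1, 0.0, 0.1),
    (0.0, 0.0, 5.0),
    (0.1, 0.1, 5.1),
]

panels = {pair: project(embedded, *pair) for pair in axis_pairs(3)}

for pair, coords in panels.items():
    print(pair, coords)
```

In this toy data the (0, 1) panel shows a single blob, while the (0, 2) and (1, 2) panels show two groups, which is the kind of structure a single 2-D projection can hide.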

isoprophlex
10 months ago
About fickleness... indeed, I've found this a kinda problematic thing when running large-d text embeddings through UMAP -- it always comes out spherical, blob-shaped, without any obvious segregation in the low-d projected space.

IMO it's very difficult to make a "fire and forget" embedding interpreter. Maybe I never found the right parameters for UMAP, but the results of running it (or any dimension reduction algo) always left me a bit underwhelmed.

antman
10 months ago
Have you tried PaCMAP? It should be better and faster.

wenc
10 months ago
Thanks for the pointer to PaCMAP.

I just tried it. My verdict?

PaCMAP >= UMAP >> t-SNE.

UMAP captures the basic pattern but PaCMAP makes it crisper.

isoprophlex
10 months ago
Wow, thanks for that!

gregncheese
10 months ago
I have yet to find a better tool than the old TensorFlow projector: https://projector.tensorflow.org/

Granted, it requires you to prepare your data as TSV files first.
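For anyone who hasn't done the TSV step before, a rough sketch of it in Python. This assumes the usual projector layout as I understand it: one vector per line with tab-separated values and no header, plus an optional metadata file with one label per line (and no header when there is only a single column). The function name `to_projector_tsv` and the sample data are made up for illustration.

```python
import csv
import io


def to_projector_tsv(vectors, labels):
    """Return (vectors_tsv, metadata_tsv) as strings.

    Labels must not contain tabs or newlines, since each label
    occupies exactly one line of the metadata file.
    """
    if len(vectors) != len(labels):
        raise ValueError("need exactly one label per vector")

    vec_buf = io.StringIO()
    writer = csv.writer(vec_buf, delimiter="\t", lineterminator="\n")
    for vec in vectors:
        writer.writerow(vec)  # one embedding per line, no header row

    meta_buf = io.StringIO()
    for label in labels:
        meta_buf.write(f"{label}\n")  # single column, so no header row

    return vec_buf.getvalue(), meta_buf.getvalue()


# Hypothetical sample data.
vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
labels = ["cat", "dog"]

vectors_tsv, metadata_tsv = to_projector_tsv(vectors, labels)
print(vectors_tsv)
print(metadata_tsv)
```

Writing the two strings to files (for example `vectors.tsv` and `metadata.tsv`) gives you something you can upload through the projector's load dialog.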

wenc
10 months ago
That is indeed an excellent tool. It allows one to dynamically adjust and recompute UMAP and t-SNE.

z-gort
10 months ago
Let me know if anyone has any thoughts... If I could go back, I might not have gone with Electron.

Doing dimensionality reduction locally posed a few challenges in terms of application size--the idea was that by analyzing just a few thousand randomly sampled points you can get an idea of your data through a local GUI where you interact with your data and see some correlated metadata.
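As an illustration of the sampling idea only, here is one way to draw a fixed-size uniform random sample from rows that arrive as a stream, such as a database cursor. This is the textbook reservoir sampling algorithm (Algorithm R), not necessarily what the app does internally; `reservoir_sample`, its `seed` parameter, and the stand-in row source are assumptions for the sketch.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Uniformly sample up to k items from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1) by picking
            # a slot in [0, i] and replacing only if it falls inside.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


# Stand-in for rows streamed from a table; a real cursor would go here.
rows = range(100_000)
sample = reservoir_sample(rows, 3000, seed=0)
print(len(sample))  # 3000
```

The appeal for a local GUI is that memory use stays bounded by `k` no matter how large the table is, and only the sampled points need to go through dimensionality reduction.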

Not sure if there's much need for a dedicated GUI to go along with Postgres as a VectorDB. Maybe people just do their analysis separately from a normal "GUI"? But maybe not.

What do you think?

maxchehab
10 months ago
Just some quick feedback: I can't copy & paste into the connection URL input form on a Mac.

Once loaded, I get the error "Table must contain a UUID column for vector visualization."

I'm assuming it's trying to find an ID column for grouping? Can we manually specify this? My ID columns are varchars.

garybake
10 months ago
Same here. I'm using LangChain, which creates a varchar id column. It also has different collections in the same table.

redwood
10 months ago
Have folks seen https://atlas.nomic.ai/ <-- absolutely beautiful vector visualization

dcreater
10 months ago
A proprietary hosted solution that stands to gain as I uncover insights in my data? Hard pass.

Alifatisk
10 months ago
Seems to require signing up just to view it.

paddy_m
10 months ago
README suggestions:

Put the animated gif at the top

Add subtitles to the gif explaining what you're doing.

dcreater
10 months ago
If I had a nickel for every GUI/viz tool that buries the image/video or straight up doesn't have it in the README... It lends credence to the popular opinion that engineers don't know how to communicate.

abadid
10 months ago
Why use PostgreSQL instead of columnar databases that are likely to perform way better for these types of analytical workloads?

ddtaylor
10 months ago
Does this use pgvector?

z-gort
10 months ago
It lets you visualize any column with type "EMBEDDING", and I think the only way to get that is through pgvector/pgvectorscale.

samanthasu
10 months ago
That is an excellent visualization!

dmezzetti
10 months ago
Very interesting, thanks for sharing!

thangngoc89
10 months ago
As a non-native English speaker who is not very familiar with vector databases, I find the title very ambiguous. I understood it as Postgres being a GUI for some VectorDB. Upon closer inspection, I realized that "Postgres as a VectorDB" is the full name. Maybe shorten that to something else. Just my 2 cents.

colechristensen
10 months ago
It’s just plain bad grammar; the title should be

“Show HN: Reservoirs Lab, a Postgres VectorDB GUI”

monsieurbanana
10 months ago
I think the confusing term is "VectorDB", which sounds like the name of an existing product. "A vector db GUI powered by Postgres"?