https://github.com/Z-Gort/Reservoirs-Lab/blob/main/src/elect...
Dimensionality reduction from n >> 2 dimensions down to 2 can be very fickle, so the hyperparameters matter. Your visualization can change significantly depending on the choice of metric.
https://umap-learn.readthedocs.io/en/latest/parameters.html
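One concrete reason the metric matters: UMAP first builds a k-nearest-neighbour graph under the chosen `metric`, so swapping the metric changes which points count as neighbours before any layout happens. The toy sketch below (hypothetical 2-D vectors, standard library only, no UMAP involved) shows a query whose single nearest neighbour differs between Euclidean and cosine distance.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def knn(points, i, k, metric):
    """Indices of the k nearest neighbours of points[i] under `metric`."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: metric(points[i], points[j]))
    return others[:k]

# Hypothetical toy vectors.
vectors = [
    (1.0, 0.0),   # 0: query point
    (10.0, 0.5),  # 1: almost the same direction, much larger magnitude
    (0.8, 0.6),   # 2: close in absolute terms, different angle
    (0.0, 1.0),   # 3: orthogonal to the query
]

print("euclidean:", knn(vectors, 0, 1, euclidean))      # nearest is index 2
print("cosine:   ", knn(vectors, 0, 1, cosine_distance))  # nearest is index 1
```

In umap-learn itself the switch is the `metric` argument, e.g. `umap.UMAP(metric="cosine")`; with normalized embeddings the two metrics agree more closely, with raw ones they can diverge as above.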
You may also want to consider projecting to more than 2 dimensions. How does one visualize more than two dimensions? With a scatterplot matrix that shows 2 axes at a time.
https://seaborn.pydata.org/examples/scatterplot_matrix.html
These are used in PCA-type multivariate analyses to visualize latent variables in more than 2 dimensions, 2 dimensions at a time. Some clustering behavior that cannot be seen on 2 axes may show up in higher dimensions. We used to do this in our lab to find anomalies in high-dimensional data.
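A minimal numeric sketch of the scatterplot-matrix idea, assuming a hypothetical 3-component embedding: one point sits exactly on top of an inlier in the (0, 1) panel, so a plain 2-D plot hides it, but it stands out in both panels involving component 2. The per-panel nearest-neighbour distance used here is just a crude isolation score chosen for illustration, not the method the commenter's lab used; seaborn's `pairplot` draws the same panels visually.

```python
import math
from itertools import combinations

def panels(points, n_components):
    """Yield (i, j, projected_points) for every pair of axes, as a scatterplot matrix would."""
    for i, j in combinations(range(n_components), 2):
        yield i, j, [(p[i], p[j]) for p in points]

def nn_distance(pts, idx):
    """Distance from pts[idx] to its nearest other point in this projection."""
    return min(math.dist(pts[idx], q) for k, q in enumerate(pts) if k != idx)

# Hypothetical embedding: a 5x5 grid in components 0 and 1, flat (0.0) in component 2,
# plus one point that deviates only along component 2.
points = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
points.append((2.0, 2.0, 6.0))  # lands on grid point (2, 2) in the (0, 1) panel
suspect = len(points) - 1

for i, j, proj in panels(points, 3):
    print(f"axes ({i}, {j}): nearest-neighbour distance = {nn_distance(proj, suspect):.1f}")
```

The (0, 1) panel reports 0.0 for the suspect, while (0, 2) and (1, 2) both report 6.0, which is the kind of structure you only see once you keep a third component and look at it two axes at a time.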
IMO it's very difficult to make a "fire and forget" embedding interpreter. Maybe I never found the right parameters for UMAP, but the results of running it (or any dimensionality reduction algorithm) always left me a bit underwhelmed.
I just tried it. My verdict?
PacMap >= UMAP >> t-SNE.
UMAP captures the basic pattern, but PacMap makes it crisper.
Granted, it requires you to prepare your data as TSV files first.
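The thread doesn't say exactly what TSV layout that tool expects, so the sketch below assumes a common one: a vectors file with one embedding per line as tab-separated floats, and a separate one-column metadata file with a header row. The function name `write_tsv` and the `label` header are hypothetical; adjust both to whatever the tool actually reads.

```python
import csv
import io

def write_tsv(vectors, labels, vec_out, meta_out):
    """Write one vector per row (tab-separated floats) plus a matching metadata file (assumed layout)."""
    vec_writer = csv.writer(vec_out, delimiter="\t", lineterminator="\n")
    for v in vectors:
        vec_writer.writerow(v)
    meta_writer = csv.writer(meta_out, delimiter="\t", lineterminator="\n")
    meta_writer.writerow(["label"])  # header row; an assumption about the target tool
    for label in labels:
        meta_writer.writerow([label])

# Usage with in-memory buffers; pass open(..., "w", newline="") file objects for real files.
vec_buf, meta_buf = io.StringIO(), io.StringIO()
write_tsv([[0.1, 0.2], [0.3, 0.4]], ["a", "b"], vec_buf, meta_buf)
print(vec_buf.getvalue())
print(meta_buf.getvalue())
```

Using the `csv` module rather than joining strings by hand means any tab or newline inside a label gets quoted instead of silently breaking the row alignment between the two files.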
Doing dimensionality reduction locally posed a few challenges in terms of application size. The idea was that by analyzing just a few thousand randomly sampled points, you can get a sense of your data through a local GUI where you interact with it and see correlated metadata.
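For the "few thousand randomly sampled points" step, one option when rows arrive as a stream (e.g. iterating a server-side cursor) is reservoir sampling, sketched below as Algorithm R. This is a swapped-in illustration, not necessarily what Reservoirs Lab does; inside Postgres, `TABLESAMPLE` or `ORDER BY random() LIMIT n` are the usual server-side alternatives.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Algorithm R: uniform random sample of k items from an iterable of unknown length."""
    rng = rng or random.Random()
    sample = []
    for n, item in enumerate(stream):
        if n < k:
            sample.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, n)    # inclusive: n + 1 possible values
            if j < k:                # keep the new item with probability k / (n + 1)
                sample[j] = item
    return sample

# Usage: sample 3,000 row ids from a stream of 100,000 with a fixed seed.
ids = reservoir_sample(range(100_000), 3000, random.Random(0))
print(len(ids))
```

The appeal for a local GUI is that memory stays at k items no matter how large the table is, and the table size never has to be known up front.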
I'm not sure how much need there is for a standalone GUI to go along with Postgres as a vector DB. Maybe people just do their analysis separately from a normal "GUI"? But maybe not.
What do you think?
Once loaded, I get the error "Table must contain a UUID column for vector visualization."
I'm assuming it's trying to find an ID column for grouping? Can we manually specify this? My ID columns are varchars.
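Judging only from the error message, the tool seems to want a column of Postgres type `uuid`; whether it can be pointed at another column is a question for the author. A possible workaround, sketched below with hypothetical names: if the varchar IDs already hold UUID strings, the column could be cast (e.g. `ALTER TABLE t ALTER COLUMN id TYPE uuid USING id::uuid`); otherwise a stable UUID can be derived per ID with `uuid5` and stored in an extra column.

```python
import uuid

def is_uuid(value) -> bool:
    """True if uuid.UUID accepts the string (hyphenated, braced, urn:uuid: or bare 32-hex forms)."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
ID_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")

def uuid_for_id(varchar_id: str) -> uuid.UUID:
    """Deterministically map a varchar ID to a version-5 (SHA-1 based) UUID."""
    return uuid.uuid5(ID_NAMESPACE, varchar_id)

print(is_uuid("order-42"))
print(uuid_for_id("order-42"))
```

Because `uuid5` is deterministic, re-running the backfill gives every row the same UUID, so the derived column stays consistent with the original varchar IDs.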
Put the animated GIF at the top.
Add subtitles to the GIF explaining what you're doing.
“Show HN: Reservoirs Lab, a Postgres VectorDB GUI”