Ask HN: Alternatives to Vector DB?
21 points | 1 month ago | 22 comments
A while back I was looking for a vector database that would work across Windows, Mac, and Linux. Some of the options required specific processors, such as Intel. Are there any alternatives to a vector DB that run cross-platform and are easy to set up?
romanhn
1 month ago
There's Postgres with the pgvector extension.

bilater
1 month ago
Yup, keep it simple. Hybrid search on Supabase with keyword + pgvector. Use a good embedding model with inner product (not cosine, so it's faster). There was a good article about lessons learned in this space recently: https://news.ycombinator.com/item?id=43299659
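
A minimal sketch of why inner product can stand in for cosine, assuming the embedding model returns unit-length (L2-normalized) vectors: cosine similarity is the dot product divided by both norms, so when both norms are 1 the scores are identical and the division can be skipped. In pgvector, `<#>` is the (negative) inner-product operator and `<=>` is cosine distance. The vectors below are made-up toy values, not real embeddings.

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    """Plain inner product."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: inner product divided by both L2 norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length, as many embedding models already do."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

# Toy vectors (hypothetical, not real embeddings), normalized up front.
query = normalize([0.2, 0.9, 0.1])
docs = {
    "doc_a": normalize([0.1, 0.8, 0.3]),
    "doc_b": normalize([0.9, 0.1, 0.0]),
    "doc_c": normalize([0.3, 0.7, 0.2]),
}

# On unit vectors the two scores agree, so the rankings agree too.
by_dot = sorted(docs, key=lambda k: dot(query, docs[k]), reverse=True)
by_cos = sorted(docs, key=lambda k: cosine(query, docs[k]), reverse=True)
print(by_dot, by_cos)
```

On unnormalized vectors the two rankings can differ, so the shortcut only holds if the model (or your ingestion code) normalizes the embeddings.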

saturn_vk
1 month ago
Do you have recommendations for embedding models?

dghlsakjg
1 month ago
This would be my vote.

There are probably more performant options, but for universality, you are best off going with a popular SQL DB that has vector extensions or built-in vector support. SQLite is another option if you want even more portability.

muzani
1 month ago
pgvector is very performant at small scale, especially if they don't expect to update the knowledge base constantly and don't care about millisecond differences in speed. It's also probably the only major vector DB with ACID guarantees. And it's substantially cheaper than many others.

I haven't looked into SQLite and such, but many of the other SQL-type options were not good last I checked. It also takes quite a bit of effort to dig through them, because each one shows research that ranks itself as the best lol.

redskyluan
1 month ago
How does Postgres perform on Windows? I've never tried it.

tmaly
1 month ago
I used Postgres on Windows years ago, back around version 7, so I suspect it still works well.

codingmoney
1 month ago
You can try Milvus, Weaviate, or Qdrant. They all support multiple platforms and are relatively easy to set up.

miZero0
1 month ago
+1 for Weaviate. It’s free, open source, cross platform and has a very active developer community with good docs/events/hackathons. Really friendly people to deal with in their community Slack too.

carlbren
1 month ago
I use usearch for the engine (https://github.com/unum-cloud/usearch/). It's fast. The vectors are stored in a DB and added to usearch at startup for retrieval. There is also FAISS.
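
A stdlib-only sketch of the pattern described above (vectors persisted in a database, loaded into an in-memory structure at startup). Assumptions: SQLite as the database, float32 vectors stored as BLOBs via the `array` module, and a brute-force dot-product scan standing in for usearch's index. The table and function names are hypothetical, and usearch's own API is not shown.

```python
import sqlite3
from array import array

def save_vectors(conn: sqlite3.Connection, rows: dict[int, list[float]]) -> None:
    """Persist vectors as float32 BLOBs keyed by an integer id."""
    conn.execute("CREATE TABLE IF NOT EXISTS vectors (id INTEGER PRIMARY KEY, vec BLOB)")
    conn.executemany(
        "INSERT OR REPLACE INTO vectors (id, vec) VALUES (?, ?)",
        [(key, array("f", vec).tobytes()) for key, vec in rows.items()],
    )
    conn.commit()

def load_index(conn: sqlite3.Connection) -> dict[int, array]:
    """At startup, read every stored vector back into memory."""
    index = {}
    for key, blob in conn.execute("SELECT id, vec FROM vectors"):
        vec = array("f")
        vec.frombytes(blob)
        index[key] = vec
    return index

def search(index: dict[int, array], query: list[float], k: int = 2) -> list[int]:
    """Brute-force top-k by dot product (a stand-in for an ANN engine such as usearch)."""
    scores = {key: sum(a * b for a, b in zip(vec, query)) for key, vec in index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

conn = sqlite3.connect(":memory:")
save_vectors(conn, {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]})
index = load_index(conn)
print(search(index, [1.0, 0.2]))
```

Keeping the DB as the source of truth means the in-memory index can always be rebuilt from it, which avoids the kind of two-store sync problem mentioned later in the thread.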

geuis
1 month ago
Postgres with pgvector extension. Have been using it in production for months and it works great.

bobosha
1 month ago
We used Qdrant in production. It's a solid vector DB offering and I highly recommend it. However, we are moving everything to Postgres with pgvector for simplicity, i.e., fewer moving parts. It was a PITA keeping data synced between pgsql <> qdrant.

bitforge
1 month ago
mind sharing what your embedding generation architecture looks like and what data sync issues you've encountered?

babyent
1 month ago
Why not a Kafka pipeline?

schreiaj
1 month ago
SQLite via libsql is my go to for lightweight stuff.

Has vector embedding columns out of the box.

tmaly
1 month ago
Thanks! I had no idea this existed.

andre-z
1 month ago
Qdrant runs on Linux/Mac/Windows and on x86/ARM processors

yawnxyz
1 month ago
If you have small data requirements (less than 100 MB), I just run FAISS for embeddings and store the rows of data and vectors in JSON.

I then have a small python script to run a vector similarity algo like cosine similarity or whatever. It's not the fanciest or most efficient, but it works surprisingly well.

I use it to search/rank my own blog posts (~300+) for relevancy, so the entire thing is only about 10 MB. It will probably get super slow for really large datasets.
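
A minimal sketch of the "small Python script" part of this setup, assuming each JSON row holds a `text` field next to a precomputed `vector` field (hypothetical field names). FAISS and the embedding step are left out; the vectors are toy values standing in for whatever the embedding model produces.

```python
import json
import math

# Hypothetical JSON store: each row keeps its text next to its embedding.
ROWS_JSON = """
[
  {"text": "Setting up Postgres on Windows", "vector": [0.9, 0.1, 0.0]},
  {"text": "Notes on RAG with markdown", "vector": [0.1, 0.9, 0.2]},
  {"text": "Trip report", "vector": [0.0, 0.2, 0.9]}
]
"""

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(rows, query_vector, top_k=2):
    """Score every row against the query and return the best matches."""
    scored = [(cosine(row["vector"], query_vector), row["text"]) for row in rows]
    scored.sort(reverse=True)
    return scored[:top_k]

rows = json.loads(ROWS_JSON)
for score, text in rank(rows, [0.2, 0.8, 0.1]):
    print(f"{score:.3f}  {text}")
```

Every query touches every row, which is why this stays fast at a few hundred posts but slows down linearly as the dataset grows.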

samber
1 month ago
Elasticsearch is a good bet, if you need to use multiple filters with your queries, and when you grow above the acceptable size of an in-memory database.

OutOfHere
1 month ago
If you strictly want things that are not vector databases, you can choose to categorize each item with multi-hierarchical categories. To query, just filter for items belonging to the categories that interest you. As a bonus, this approach is a lot more interpretable than using embeddings.
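
A minimal sketch of the category approach, assuming categories are written as slash-separated paths (a hypothetical convention) and that an item matches if any of its paths falls under a requested category. Because matching is a prefix check on human-readable labels, it is easy to see why an item came back.

```python
def under(path: str, category: str) -> bool:
    """True if `path` equals `category` or is nested beneath it."""
    return path == category or path.startswith(category + "/")

def query(items: dict[str, set[str]], wanted: set[str]) -> list[str]:
    """Return items having at least one category path under any wanted category."""
    return sorted(
        name
        for name, paths in items.items()
        if any(under(p, w) for p in paths for w in wanted)
    )

# Hypothetical notes, each filed under several hierarchies at once.
items = {
    "pgvector-setup.md": {"databases/sql/postgres", "ai/rag/storage"},
    "sqlite-tips.md": {"databases/sql/sqlite"},
    "prompting.md": {"ai/llm/prompts"},
    "trip.md": {"personal/travel"},
}

print(query(items, {"databases/sql"}))
print(query(items, {"ai/rag", "personal"}))
```

The trade-off is that someone (or something) has to assign the categories up front, whereas embeddings capture similarity without a hand-built taxonomy.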

jankovicsandras
1 month ago
Postgres is a good idea.

Shameless plug: https://github.com/jankovicsandras/plpgsql_bm25

BM25 search implemented in PL/pgSQL, there's also an example of Hybrid (BM25+pgvector) search in the repo.
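
For readers unfamiliar with BM25, here is a small pure-Python sketch of standard Okapi BM25 scoring, assuming whitespace tokenization, the common defaults k1 = 1.2 and b = 0.75, and the Lucene-style IDF with +1 inside the log. It is not the linked PL/pgSQL implementation, only an illustration of the scoring function it is named after; the hybrid step (combining these scores with pgvector similarity) is not shown.

```python
import math
from collections import Counter

def bm25_scores(docs: list[str], query: str, k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query (lowercased whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF variant with +1 inside the log so it never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "postgres with pgvector for vector search",
    "sqlite is a small embedded database",
    "hybrid search combines bm25 keyword search with vector search",
]
print(bm25_scores(docs, "vector search"))
```

The third document repeats "search", so it outranks the first despite being longer; the second shares no query terms and scores zero.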

BenoitP
1 month ago
Brute force it. GEMM routines can find the best dot product among 300k vectors in well under a second.
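
A sketch of exact brute-force search with NumPy, whose `@` operator on float32 matrices dispatches to BLAS matrix-multiply routines. The corpus is random data at the 300k scale from the comment; the dimension of 64 and the batch of 8 queries are assumptions chosen to keep the demo light, and no timing is claimed since it varies by machine.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, dim, n_queries = 300_000, 64, 8  # sizes are illustrative assumptions

# Corpus and queries as float32 matrices, rows L2-normalized so dot product = cosine.
corpus = rng.standard_normal((n_docs, dim), dtype=np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
queries = rng.standard_normal((n_queries, dim), dtype=np.float32)
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

# One matrix multiply scores every query against every document: (n_queries, n_docs).
scores = queries @ corpus.T

# Exact top-k per query: argpartition finds the k best in linear time,
# then only those k are sorted in descending score order.
k = 5
top_k = np.argpartition(-scores, k, axis=1)[:, :k]
rows = np.arange(n_queries)[:, None]
top_k = top_k[rows, np.argsort(-scores[rows, top_k], axis=1)]
print(top_k[0], scores[0, top_k[0]])
```

Batching several queries into one matrix turns many matrix-vector products into a single GEMM, which is where BLAS gets most of its speed.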

vismit2000
1 month ago
Exactly. See this for details: https://news.ycombinator.com/item?id=43162995

vismit2000
1 month ago
You might also want to take a look at: The best way to use text embeddings portably is with Parquet and Polars - https://news.ycombinator.com/item?id=43162995

abraxas
1 month ago
Postgres with pgvector is a decent implementation by now, and it is likely available on most setups that Postgres runs on.

You can also use faiss if you want it all in memory at all times and have the RAM to support it.

What's your use case and the volume of vectors you want to look up?

tmaly
1 month ago
I have a knowledge store of all my notes in markdown. I wanted a way to incorporate them with RAG.

abraxas
1 month ago
How many vectors is that? If it's under 100K, keep it all in memory. Hell, even the low millions would probably be OK. You can literally write them to a flat file, load it into memory, and do a full scan for every lookup, and it will be fast enough. If not, use FAISS with HNSW indexing and it will be screaming fast.
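
A stdlib-only sketch of the flat-file idea: vectors are written back to back as raw float32 values to one binary file (a hypothetical layout with a fixed, known dimension), loaded into memory in a single read, and each lookup is a full scan that keeps the top k with `heapq`. This is the no-index baseline; FAISS with HNSW would replace the scan once it gets too slow.

```python
import heapq
import os
import tempfile
from array import array

DIM = 3  # assumed fixed embedding dimension for this toy example

def write_flat(path: str, vectors: list[list[float]]) -> None:
    """Write all vectors, back to back, as raw float32 values."""
    flat = array("f", (x for v in vectors for x in v))
    with open(path, "wb") as f:
        flat.tofile(f)

def load_flat(path: str) -> list[array]:
    """Read the whole file into memory and split it back into vectors."""
    flat = array("f")
    with open(path, "rb") as f:
        flat.frombytes(f.read())
    return [flat[i : i + DIM] for i in range(0, len(flat), DIM)]

def full_scan(vectors: list[array], query: list[float], k: int = 2) -> list[int]:
    """Score every vector by dot product and return the row numbers of the best k."""
    scores = ((sum(a * b for a, b in zip(v, query)), row) for row, v in enumerate(vectors))
    return [row for _, row in heapq.nlargest(k, scores)]

path = os.path.join(tempfile.mkdtemp(), "vectors.f32")
write_flat(path, [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
vectors = load_flat(path)
print(full_scan(vectors, [0.0, 1.0, 0.0]))
```

Row numbers double as IDs here, so the text for each vector would live in a parallel file or table in the same order.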

jusob
1 month ago
I've been using sqlite-vec, a module for sqlite3: https://github.com/asg017/sqlite-vec

softwaredoug
1 month ago
There’s a bazillion python libraries and regular databases / search engines with vector indices that run cross platform. Postgres, hnswlib, Elasticsearch, etc.

Prosammer
1 month ago
Check out LanceDB; it looks awesome for this.

redskyluan
1 month ago
Running everything on the Windows platform is not easy... maybe just run it in a container?

usgroup
1 month ago
The correct answer is duckdb with the vec extension :-)

kledru
1 month ago
Two years ago, Spotify's Annoy library was good enough for small RAG, and it stored everything in a single file...

bschmidt80
1 month ago
Can I ask what the use case is?

Always fascinated to hear why people are using vector DBs, especially outside AI embeddings

tmaly
1 month ago
I wanted something simple and easy to set up for RAG.

bschmidt80
1 month ago
LlamaIndex VectorStoreIndex defaults to in-memory which allows you to get up and running very quickly - had pretty great experiences with their TS repo and even contributed a little to it.

With fromDocuments -> VectorStoreIndex -> asQueryEngine, it will be in-memory.

Easily add pgvector, as others recommend here, when/if you need to persist embedded data.

ayende
1 month ago
Starting in v7.0, RavenDB has vector search and AI integration. It runs on Windows, Linux, and Mac.