Zvec: A lightweight, fast, in-process vector database
153 points
by dvrp
2 days ago
| 8 comments
https://zvec.org/en/
simonw
11 hours ago
[-]
Their self-reported benchmarks have them outperforming Pinecone by 7x in queries per second: https://zvec.org/en/docs/benchmarks/

I'd love to see those results independently verified, and I'd also love a good explanation of how they're getting such great performance.

reply
ashvardanian
10 hours ago
[-]
8K QPS is probably quite trivial on their setup with a 10M dataset. I rarely use comparably small instances & datasets in my benchmarks, but with 100M-1B-vector datasets on a larger dual-socket server, 100K QPS was easily achievable in 2023: https://www.unum.cloud/blog/2023-11-07-scaling-vector-search... ;)

Typically, the recipe is to keep the hot parts of the data structure in SRAM (the CPU caches) and to use a lot of SIMD. At the time of those measurements, USearch used ~100 custom kernels for different data types, similarity metrics, and hardware platforms. The upcoming release of the underlying SimSIMD micro-kernels project will push this number beyond 1000, so we should be able to squeeze out a lot more performance later this year.
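
For intuition only (this is not the actual SimSIMD/USearch code), a minimal NumPy sketch of that recipe: keep the vectors in one contiguous, cache-friendly array and score everything in a single vectorized (SIMD-backed) pass:

    import numpy as np

    rng = np.random.default_rng(0)
    # One contiguous float32 matrix: rows stream through the CPU caches.
    vectors = np.ascontiguousarray(rng.random((10_000, 256), dtype=np.float32))
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # normalize once

    def cosine_top_k(query, k=10):
        q = (query / np.linalg.norm(query)).astype(np.float32)
        scores = vectors @ q                     # one SIMD-backed matrix-vector pass
        top = np.argpartition(scores, -k)[-k:]   # O(n) partial select, no full sort
        return top[np.argsort(scores[top])[::-1]]

    print(cosine_top_k(rng.random(256, dtype=np.float32)))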

reply
antirez
1 hour ago
[-]
It is absolutely possible, and not even that hard. If you use Redis Vector Sets you will easily see 20k-50k queries per second (depending on hardware) with tens of millions of entries, and the results don't get much worse as you scale further. Of course, all of that is serving data from memory, as Vector Sets do. Note: I'm not talking about the RediSearch vector store, but the new "vector set" data type I introduced a few months ago. The HNSW implementation of Vector Sets (AGPL) is quite self-contained and easy to read if you want to see how to achieve similar results.
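
If you want to try it, a minimal sketch using raw commands through redis-py (command syntax per the Redis vector set docs; dedicated client helpers may not exist yet in your client library):

    import redis

    r = redis.Redis()

    # VADD key VALUES <dim> <components...> <element> -- insert into the HNSW graph
    r.execute_command("VADD", "docs", "VALUES", "3", "0.1", "0.9", "0.3", "doc:1")
    r.execute_command("VADD", "docs", "VALUES", "3", "0.8", "0.1", "0.2", "doc:2")

    # VSIM key VALUES <dim> <components...> COUNT <k> -- k nearest elements
    print(r.execute_command("VSIM", "docs", "VALUES", "3", "0.1", "0.9", "0.3", "COUNT", "2"))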
reply
itake
1 hour ago
[-]
Pinecone scales horizontally (which creates overhead, but accommodates more data).

A better comparison would be with Meta's FAISS.

reply
panzi
9 hours ago
[-]
pgvectorscale claims even more. I'd like to see someone verify that too.
reply
rvz
1 hour ago
[-]
Exactly. We should always ask these sorts of questions and take self-reported benchmarks with a grain of salt until independent sources can verify the claims, rather than trusting results from the creators themselves. Otherwise we are in biased territory.

This sort of behaviour is now absolutely rampant in the AI industry.

reply
aktuel
3 hours ago
[-]
I recently discovered https://www.cozodb.org/ which also has vector search built-in. I just started some experiments with it, but so far I'm quite impressed. It's not in active development at the moment, but it already seems well rounded for what it is, so depending on the use case that may not matter, or may even be an advantage. Also, with today's coding agents it shouldn't be too hard to scratch your own itch if needed.
reply
clemlesne
12 hours ago
[-]
Has anyone compared this with USearch (https://github.com/unum-cloud/USearch)?
reply
neilellis
10 hours ago
[-]
I would like to see that too. USearch is amazingly fast: 44M embeddings searched in under 100ms.
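
For reference, a small example of the USearch Python API (as documented in the unum-cloud/USearch README):

    import numpy as np
    from usearch.index import Index

    index = Index(ndim=256, metric="cos", dtype="f32")        # HNSW under the hood
    keys = np.arange(10_000)
    vectors = np.random.rand(10_000, 256).astype(np.float32)
    index.add(keys, vectors)                                  # batch insert

    matches = index.search(vectors[0], 10)                    # top-10 neighbors
    print(matches.keys, matches.distances)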
reply
OfficialTurkey
9 hours ago
[-]
I haven't been following the vector db space closely for a couple years now, but I find it strange that they didn't compare their performance to the newest generation serverless vector dbs: Pinecone Serverless, turbopuffer, Chroma (distributed, not the original single-node implementation). I understand that those are (mostly) hosted products so there's not a true apples-to-apples comparison with the same hardware, but surely the most interesting numbers are cost vs performance.
reply
wittlesus
6 hours ago
[-]
Genuine question for anyone running in-process vector search in production: when do you reach for something like this vs. an external service?

The appeal of in-process is obvious — no network hop, simpler deployment, lower latency on small-to-medium datasets. But I'm curious about the operational story. How do you handle index updates while serving queries? Is there a write lock during re-indexing, or can you do hot swaps?

The mceachen comment about sqlite-vec being brute-force is interesting too. For apps with under ~100K embeddings, does the algorithmic difference even matter in practice, or is the simpler deployment story more important than raw QPS?
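
For the update question, one common pattern (a sketch, assuming your index library builds immutable indexes) is to rebuild off to the side and atomically swap the reference, so queries never block on re-indexing:

    import threading

    class HotSwappableIndex:
        def __init__(self, build_fn):
            self._build_fn = build_fn        # build_fn(vectors) -> queryable index
            self._index = None
            self._swap_lock = threading.Lock()

        def rebuild(self, vectors):
            new_index = self._build_fn(vectors)   # slow part, off the hot path
            with self._swap_lock:                 # the swap itself is instant
                self._index = new_index

        def search(self, query, k=10):
            index = self._index              # snapshot the reference; in-flight
            return index.search(query, k)    # queries keep using the old index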

reply
yawnxyz
5 hours ago
[-]
useful for adding semantic search to tiny bits of data, e.g. collections of research papers in a folder on my computer, etc.

for web stuff, e.g. communities/forums/docs/small sites, which usually don't even have 1M rows of data, precomputing embeddings, storing them, and running a small vector search like this somewhere is much simpler/cheaper than running external services

it avoids the operational hassle of dealing with a dozen-plus external services, logins, and APIs, even if they're free

(I do like Mixedbread for that, but I'd prefer it to be on my own lightweight server or serverless deployment)
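
in practice that can be as small as: embed the corpus once offline, save the matrix next to your app, and brute-force it at query time (embed() below is a stand-in for whatever embedding model you use):

    import numpy as np

    # offline, once:
    #   embeddings = np.stack([embed(doc) for doc in docs]).astype(np.float32)
    #   np.save("embeddings.npy", embeddings)

    embeddings = np.load("embeddings.npy")          # e.g. shape (50_000, 384)
    embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

    def search(query_vec, k=10):
        q = query_vec / np.linalg.norm(query_vec)
        scores = embeddings @ q                     # brute force is fine at this scale
        return np.argsort(scores)[::-1][:k]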

reply
dev_l1x_be
2 hours ago
[-]
I think the question is really: can I turn my search problem into an in-process vector search problem where I can scale with the number of processes?
reply
NitpickLawyer
4 hours ago
[-]
These engagement bots are getting tiresome...
reply
cjonas
10 hours ago
[-]
How does this compare to DuckDB's vector capabilities (the vss extension)?
reply
jgalt212
7 hours ago
[-]
Yes, there's nothing on that or sqlite-vec (both of which would seem to be apples-to-apples comparisons).

https://zvec.org/en/docs/benchmarks/

reply
mceachen
7 hours ago
[-]
I maintain a fork of sqlite-vec (because there hasn't been any activity on the main repo for more than a year). sqlite-vec is great for lower-dimensionality or lower-cardinality datasets, but know that it's brute-force: query latency scales linearly with table size. You only avoid full table scans if you add filterable columns to your vec0 table and include them in your WHERE clause. There's no approximate (probabilistic) lookup algorithm in sqlite-vec.
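
For anyone curious, a sketch of what that filtered query looks like (syntax per the sqlite-vec README; metadata columns on vec0 tables are a relatively recent addition, so check your version):

    import sqlite3
    import sqlite_vec

    db = sqlite3.connect("app.db")
    db.enable_load_extension(True)
    sqlite_vec.load(db)      # load the sqlite-vec extension

    db.execute("""CREATE VIRTUAL TABLE IF NOT EXISTS vec_docs
                  USING vec0(embedding float[384], category text)""")

    rows = db.execute(
        """SELECT rowid, distance FROM vec_docs
           WHERE embedding MATCH ?   -- KNN against the query vector
             AND k = 10              -- top-k constraint
             AND category = ?        -- filter column narrows the scan
           ORDER BY distance""",
        (sqlite_vec.serialize_float32([0.0] * 384), "blog"),
    ).fetchall()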
reply
_pdp_
11 hours ago
[-]
I thought you need memory for these things and CPU is not the bottleneck?
reply
binarymax
10 hours ago
[-]
I haven't looked at this repo, but new techniques taking advantage of NVMe and io_uring make on-disk performance really good without needing to keep everything in RAM.
reply
skybrian
11 hours ago
[-]
Are these sorts of similarity searches useful for classifying text?
reply
stephantul
3 hours ago
[-]
Yes. This is known as a k-NN (k-nearest-neighbors) classifier. k-NN classifiers are usually worse than other simple classifiers, but they're trivial to update and use.

See e.g., https://scikit-learn.org/stable/auto_examples/neighbors/plot...
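
A minimal version over precomputed embeddings (following the scikit-learn example above; the .npy files are stand-ins for your own data):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X_train = np.load("train_embeddings.npy")   # (n_samples, dim) embeddings
    y_train = np.load("train_labels.npy")       # one class label per row

    clf = KNeighborsClassifier(n_neighbors=5, metric="cosine")
    clf.fit(X_train, y_train)                   # "training" just stores the vectors

    X_new = np.load("new_embeddings.npy")
    print(clf.predict(X_new))                   # majority vote of the 5 neighbors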

reply
CuriouslyC
11 hours ago
[-]
Embeddings are good at partitioning document stores at a coarse-grained level, and they can be very useful for documents where there's a lot of keyword overlap and the semantic differentiation is distributed. They're definitely not a good primary recall mechanism, and they often don't even fully pull their weight for their cost in hybrid setups, so it's worth running evals for your specific use case.
reply
visarga
4 hours ago
[-]
"12+38" won't embed close to "50", as you said they capture only surface level words ("a lot of keyword overlap") not meaning, it's why for small scale I prefer a folder of files and a coding agent using grep/head/tail/Python one liners.
reply
neilellis
10 hours ago
[-]
Yes, also for semantic indexes. I use one for person/role/org matching, so that CEO == chief executive ~= managing director. That's good when you have grey data and multiple lookup data sources that use different terms.
reply
esafak
11 hours ago
[-]
You could assign the cluster based on what the k nearest neighbors are, if there is a clear majority. The quality will depend on the suitability of your embeddings.
reply
OutOfHere
11 hours ago
[-]
It depends entirely on the quality and suitability of the embedding vectors you provide. Even with a long embedding vector from a recent model, my estimate is that the classification will be better than random but not very accurate. You would typically do better by asking a large model directly for a classification. The good thing is that it is often easy to create a small human-labeled dataset and estimate the confusion matrix for each approach.
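
Something like this, with toy labels standing in for your human-labeled set:

    from sklearn.metrics import accuracy_score, confusion_matrix

    y_true = ["spam", "ham", "spam", "ham"]     # human labels (toy stand-ins)
    y_pred = ["spam", "spam", "spam", "ham"]    # labels from k-NN / LLM / etc.

    print(confusion_matrix(y_true, y_pred))
    print(accuracy_score(y_true, y_pred))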
reply