I'd love to see those results independently verified, and I'd also love a good explanation of how they're getting such great performance.
Typically, the recipe is to keep the hot parts of the data structure in SRAM, i.e. the CPU caches, and to use a lot of SIMD. At the time of those measurements, USearch used ~100 custom kernels for different data types, similarity metrics, and hardware platforms. The upcoming release of the underlying SimSIMD micro-kernels project will push that number beyond 1,000, so we should be able to squeeze out a lot more performance later this year.
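For a sense of how that kernel variety surfaces at the API level, here's a minimal sketch using USearch's Python bindings; the dimensionality, metric, and dtype values are arbitrary examples, and the exact parameter names should be double-checked against the current docs:

    import numpy as np
    from usearch.index import Index

    index = Index(
        ndim=256,       # vector dimensionality (arbitrary here)
        metric="cos",   # cosine; inner product, L2, Hamming, etc. select other kernels
        dtype="f16",    # storage precision; f32 / i8 / b1 trade accuracy for speed
    )

    keys = np.arange(10_000)
    vectors = np.random.rand(10_000, 256).astype(np.float32)
    index.add(keys, vectors)

    matches = index.search(vectors[0], 10)  # top-10 nearest neighbors
    print(matches.keys, matches.distances)

As I understand it, switching the dtype or the metric is what selects a different SimSIMD kernel under the hood, which is where the data-type/metric/platform combinations come from.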
A better comparison would be with Meta's FAISS.
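For anyone wanting to run that comparison themselves, a rough FAISS HNSW baseline looks something like this (sizes and parameters below are made up, not from any benchmark in the article):

    import numpy as np
    import faiss

    d = 256
    xb = np.random.rand(100_000, d).astype("float32")  # database vectors (illustrative)
    xq = np.random.rand(100, d).astype("float32")      # query vectors

    index = faiss.IndexHNSWFlat(d, 32)   # HNSW graph with 32 links per node
    index.hnsw.efSearch = 64             # recall / latency knob at query time
    index.add(xb)
    distances, ids = index.search(xq, 10)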
This sort of behaviour is now absolutely rampant in the AI industry.
The appeal of in-process is obvious — no network hop, simpler deployment, lower latency on small-to-medium datasets. But I'm curious about the operational story. How do you handle index updates while serving queries? Is there a write lock during re-indexing, or can you do hot swaps?
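I don't know how USearch handles this internally, but the generic pattern I've seen for the hot-swap case is double-buffering: build the replacement index off to the side, then atomically publish the new reference. A minimal sketch, assuming any index object with a search() method:

    import threading

    class HotSwapIndex:
        # Generic double-buffering sketch, not a description of any library's internals.
        def __init__(self, index):
            self._index = index
            self._lock = threading.Lock()

        def search(self, vector, k=10):
            with self._lock:            # grab the current reference; queries never wait on a rebuild
                index = self._index
            return index.search(vector, k)

        def swap(self, new_index):
            with self._lock:            # publish the freshly built index atomically
                self._index = new_index
            # the old index is freed once in-flight queries drop their references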
The mceachen comment about sqlite-vec being brute-force is interesting too. For apps with under ~100K embeddings, does the algorithmic difference even matter practically, or is the simpler deployment story more important than raw QPS?
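For scale: a brute-force pass over 100K normalized embeddings is a single matrix-vector product, typically single-digit milliseconds on a laptop, so at that size the algorithmic difference is mostly academic. A quick NumPy sketch (sizes are illustrative):

    import numpy as np

    n, d = 100_000, 384   # e.g. a small sentence-transformer; numbers are made up
    embeddings = np.random.rand(n, d).astype(np.float32)
    embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

    def top_k(query, k=10):
        scores = embeddings @ query                  # cosine similarity on normalized vectors
        top = np.argpartition(scores, -k)[-k:]       # O(n) partial selection of the k best
        return top[np.argsort(scores[top])[::-1]]    # sort just those k by score

    print(top_k(embeddings[0]))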
For web stuff, e.g. community sites, forums, docs, and other small sites that usually don't even have 1M rows of data, precomputing the embeddings, storing them, and running a small vector search like this somewhere is much simpler and cheaper than running external services.
It's about not having to deal with the operational hassle of a dozen+ external services, logins, and APIs, even if they're free.
(I do like Mixedbread for that, but I'd prefer it to run on my own lightweight server or serverless deployment.)
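As a concrete sketch of that setup: embed everything once at deploy time (with Mixedbread, a local model, whatever), persist a single index file next to the app, and load it in-process at request time. The method names below are from USearch's Python bindings as I remember them, and the file names are placeholders:

    import numpy as np
    from usearch.index import Index

    # Build step, run at deploy time: "embeddings.npy" is assumed to hold the
    # precomputed document embeddings, one row per document.
    embeddings = np.load("embeddings.npy").astype(np.float32)
    index = Index(ndim=embeddings.shape[1], metric="cos")
    index.add(np.arange(len(embeddings)), embeddings)
    index.save("docs.usearch")

    # Serving step, on the small server or serverless function: load the file
    # (or memory-map it with view()) and answer queries with no external service.
    index = Index(ndim=embeddings.shape[1], metric="cos")
    index.load("docs.usearch")
    query_embedding = embeddings[0]            # stand-in; embed the user's query the same way
    matches = index.search(query_embedding, 5)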
See e.g., https://scikit-learn.org/stable/auto_examples/neighbors/plot...