Main features:
- Lightweight: the base package only depends on NumPy
- Unified interface: use any of the supported algorithms and backends through a single interface. HNSW, Annoy, FAISS, and many more algorithms and libraries are supported
- Easy evaluation: evaluate the performance of your backend with a simple function to measure queries per second vs recall
- Serialization: save and load your index for persistence
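To make the "queries per second vs recall" measurement concrete, here is a minimal NumPy-only sketch of that kind of evaluation. It is not the package's own API: `exact_neighbors`, `recall_at_k`, `evaluate`, and `toy_search` are hypothetical names for illustration, and the "backend" is a toy that runs an exact search over a random half of the data.

```python
import time
import numpy as np

def exact_neighbors(data, queries, k):
    """Brute-force ground truth: indices of the k nearest points (Euclidean)."""
    # Squared distances via ||q||^2 - 2 q.d + ||d||^2
    d2 = (queries**2).sum(1, keepdims=True) - 2.0 * queries @ data.T + (data**2).sum(1)
    return np.argsort(d2, axis=1)[:, :k]

def recall_at_k(approx, exact):
    """Fraction of true neighbors that the approximate search returned."""
    hits = sum(len(np.intersect1d(a, e)) for a, e in zip(approx, exact))
    return hits / exact.size

def evaluate(search_fn, data, queries, k=10):
    """Time search_fn(query, k) over all queries and compare to ground truth."""
    exact = exact_neighbors(data, queries, k)
    start = time.perf_counter()
    approx = [search_fn(q, k) for q in queries]
    elapsed = time.perf_counter() - start
    return {"recall": recall_at_k(approx, exact), "qps": len(queries) / elapsed}

# Toy stand-in for an ANN backend (an assumption, not a real library):
# exact search restricted to a random half of the data set.
rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 16))
queries = rng.normal(size=(50, 16))
subset = rng.choice(len(data), size=1000, replace=False)

def toy_search(query, k):
    d2 = ((data[subset] - query) ** 2).sum(axis=1)
    return subset[np.argsort(d2)[:k]]

report = evaluate(toy_search, data, queries, k=10)
print(report)
```

Since the toy backend only sees half of the points, its recall should land somewhere around 0.5; a real backend would be plugged in by wrapping its query call in a function with the same `(query, k)` shape.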
After working with a large number of ANN libraries over the years, we found it increasingly cumbersome to learn the interface, features, quirks, and limitations of every library. After writing custom evaluation code to measure the speed and performance for the 100th time to compare libraries, we decided to build this as a way to easily use a large number of algorithms and libraries with a unified, simple interface that allows for quick comparison and evaluation.
We are curious to hear your feedback! Are there any algorithms that are missing that you use? Any extra evaluation metrics that are useful?
I think the next step would be to add some sugar that lets you run a random or fixed grid of hyper-parameters and get a report of accuracy and speed for your specific data set.
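A rough sketch of what such a fixed-grid report could look like, assuming a toy index rather than any real backend: `ToyIVF`, its `n_lists` / `n_probe` parameters, and `grid_report` are all hypothetical names made up for this example, and only the fixed-grid case is covered (a random search would sample parameter combinations instead of enumerating them).

```python
import itertools
import time
import numpy as np

def sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a and rows of b."""
    return (a**2).sum(1, keepdims=True) - 2.0 * a @ b.T + (b**2).sum(1)

class ToyIVF:
    """Toy inverted-file index, a hypothetical stand-in for a real backend."""

    def __init__(self, data, n_lists, n_probe, seed=0):
        rng = np.random.default_rng(seed)
        self.data = data
        self.n_probe = n_probe
        # Randomly chosen data points act as centroids (no k-means refinement).
        self.centroids = data[rng.choice(len(data), size=n_lists, replace=False)]
        assignment = sq_dists(data, self.centroids).argmin(axis=1)
        self.lists = [np.flatnonzero(assignment == i) for i in range(n_lists)]

    def search(self, query, k):
        # Probe the n_probe closest lists, then rank their members exactly.
        order = sq_dists(query[None, :], self.centroids)[0].argsort()[: self.n_probe]
        candidates = np.concatenate([self.lists[i] for i in order])
        d2 = sq_dists(query[None, :], self.data[candidates])[0]
        return candidates[d2.argsort()[:k]]

def grid_report(data, queries, grid, k=10):
    """Evaluate every parameter combination in the grid; return one row each."""
    exact = sq_dists(queries, data).argsort(axis=1)[:, :k]
    rows = []
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        index = ToyIVF(data, **params)
        start = time.perf_counter()
        found = [index.search(q, k) for q in queries]
        elapsed = time.perf_counter() - start
        hits = sum(len(np.intersect1d(f, e)) for f, e in zip(found, exact))
        rows.append({**params, "recall": hits / exact.size, "qps": len(queries) / elapsed})
    return rows

rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 16))
queries = rng.normal(size=(50, 16))
rows = grid_report(data, queries, {"n_lists": [16, 64], "n_probe": [1, 4, 16]})
for row in rows:
    print(row)
```

Each row pairs one parameter combination with its recall and queries per second, so the usual trade-off (more probes, higher recall, lower speed) can be read off for your own data.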
1. When you say backends, do you plan to integrate, like a client, with some "vector" stores? 2. Also, any benchmarks? 3. Lastly, why Python?
2: we adopted the same methodology as ann-benchmarks for our evaluation, so technically the benchmarks there are valid for the backends we support. However, it's a good suggestion to add those explicitly to the repo; I'll add a todo for that.
3: mainly because a: it's the language we are the most comfortable developing in, b: it's the most widely used and adopted language for ML, and c: (almost) all the algorithms we support are written in C/C++/Cython already.
So these are nearest neighbor search implementations, not database backends.
Hybrid search is a really cool idea though; it's not something we support at the moment, but definitely something we could investigate and add as an upcoming feature, thanks for the suggestion!