The concept sounded absurd—storing text in video? But modern video codecs have been optimized for compression over decades. So, I converted text into QR codes, then encoded those as video frames, letting H.264/H.265 handle the compression.
The results were surprising. 10,000 PDFs compressed down to a 1.4GB video file. Search latency was around 900ms compared to Pinecone’s 820ms—about 10% slower. However, RAM usage dropped from over 8GB to just 200MB, and it operates entirely offline without API keys or monthly fees.
Technically, each document chunk is encoded into QR codes, which become video frames. Video compression handles redundancy between similar documents effectively. Search works by decoding relevant frame ranges based on a lightweight index.
You get a vector database that’s just a video file you can copy anywhere.
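For intuition, the write path described above is only a few lines. Below is a minimal sketch using the qrcode and opencv-python packages; the codec, frame size, and index format are placeholders chosen for illustration, not necessarily what memvid actually does.

```python
import json
import cv2
import numpy as np
import qrcode

def chunks_to_video(chunks, out_path="memory.mp4", frame_size=512, fps=15):
    # Codec choice is illustrative; a real H.264/H.265 writer depends on the local OpenCV/FFmpeg build.
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps,
                             (frame_size, frame_size))
    index = []  # lightweight frame -> chunk index stored next to the video
    for frame_no, chunk in enumerate(chunks):
        qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_H)  # ~30% EC overhead
        qr.add_data(chunk)
        qr.make(fit=True)
        matrix = np.array(qr.get_matrix(), dtype=np.uint8)     # True = dark module
        gray = ((1 - matrix) * 255).astype(np.uint8)           # dark modules -> black pixels
        gray = cv2.resize(gray, (frame_size, frame_size), interpolation=cv2.INTER_NEAREST)
        writer.write(cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR))   # one QR code per frame
        index.append({"frame": frame_no, "preview": chunk[:40]})
    writer.release()
    with open(out_path + ".index.json", "w") as f:
        json.dump(index, f)

chunks_to_video(["first chunk of text", "second chunk of text"])
```

Retrieval is the reverse: look up candidate chunks in the index, seek to those frames with cv2.VideoCapture, and decode them with a QR reader such as cv2.QRCodeDetector.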
You'd probably have a smaller database and better results crunching text into a zip file, or compressed rows in a sqlite database, or any other simple random-access format.
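To make that concrete, here's the "compressed rows in SQLite" version; the schema and helper names are mine, not taken from any particular tool.

```python
import sqlite3
import zlib

db = sqlite3.connect("chunks.db")
db.execute("CREATE TABLE IF NOT EXISTS chunks (id INTEGER PRIMARY KEY, body BLOB)")

def put(chunk_id: int, text: str) -> None:
    # Compress each chunk individually so any row can be read back on its own.
    db.execute("INSERT OR REPLACE INTO chunks VALUES (?, ?)",
               (chunk_id, zlib.compress(text.encode("utf-8"))))
    db.commit()

def get(chunk_id: int) -> str:
    (blob,) = db.execute("SELECT body FROM chunks WHERE id = ?", (chunk_id,)).fetchone()
    return zlib.decompress(blob).decode("utf-8")

put(1, "some document chunk " * 50)
print(get(1)[:40])
```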
The vector database previously used must have been very inefficient.
Especially if it was taking ~800 ms to do a search. At that speed, you'd probably be better off storing the documents as plain text, without the whole inefficient QR/H264 round-trip.
In Go, I once implemented a naive brute-force cosine search (linear scan in memory), and for 1 million 350-dimensional vectors, I got results in under 1 second too IIRC.
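For reference, the same brute-force scan sketched in Python/numpy rather than Go (random data, sizes mirroring the comment):

```python
import numpy as np

# 1M x 350 float32 is ~1.4 GB of RAM; pre-normalize so cosine similarity is a dot product.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((1_000_000, 350)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

query = rng.standard_normal(350).astype(np.float32)
query /= np.linalg.norm(query)

scores = vectors @ query                      # one matrix-vector product scans everything
top10 = np.argpartition(-scores, 10)[:10]     # unordered top-10 indices
print(sorted(top10.tolist(), key=lambda i: -scores[i]))
```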
I ended up just setting up OpenSearch, which gives you hybrid semantic + full-text search out of the box (BM25 + kNN). In my tests, it gave better results than semantic search alone, something like +15% better retrieval.
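For anyone curious what that looks like: a rough sketch with the opensearch-py client, combining a match (BM25) clause and a knn clause in one bool query. The index name, field names, and this particular way of mixing the scores are my assumptions; OpenSearch also offers a dedicated hybrid query type with a score-normalization pipeline.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

def hybrid_search(query_text, query_vector, k=10):
    # "text" is assumed to be a regular analyzed field, "embedding" a knn_vector field.
    body = {
        "size": k,
        "query": {
            "bool": {
                "should": [
                    {"match": {"text": {"query": query_text}}},                # BM25 side
                    {"knn": {"embedding": {"vector": query_vector, "k": k}}},  # vector side
                ]
            }
        },
    }
    return client.search(index="docs", body=body)
```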
Definitely not. None of the "redundancy" between, or within, texts (e.g. repeated phrases) is apparent in a sequence of QR-code images: QR encoding masks the data and interleaves it with error-correction codewords, so chunks that share text don't produce visually similar frames.
I've only played with TF-IDF/BM25 as opposed to vector searches, but there's no way your queries should be taking so long on such a small corpus. Querying 10k documents feels like 2-10ms territory, not 900ms.
And how big was the total text in those PDFs?
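One way to sanity-check the ballpark: build a throwaway 10k-document corpus, score it with the rank_bm25 package, and time a single query.

```python
import time
from rank_bm25 import BM25Okapi

# Synthetic filler corpus: 10k short documents.
corpus = [f"document {i} about pdf storage video compression and search".split()
          for i in range(10_000)]
bm25 = BM25Okapi(corpus)

query = "video compression search".split()
start = time.perf_counter()
scores = bm25.get_scores(query)
print(f"scored {len(scores)} docs in {(time.perf_counter() - start) * 1000:.1f} ms")
```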
I'd expect the video to compress best if you sorted the chunks by image similarity somehow, since inter-frame prediction only helps when consecutive frames actually look alike.
Also, isn't there a risk of losing data by doing this, since H.265, for example, is lossy?
I see there's 30% redundancy per document, but I'm not sure every frame in an H.265 file is guaranteed to keep more than 70% of a QR code readable. And if a frame isn't readable, that could mean losing an entire chunk of data.
I'd definitely calculate the probability of losing data if storing text with lossy compression.
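A back-of-envelope version of that calculation, with a made-up failure rate: if each frame independently fails to decode with probability p, the chance of losing at least one chunk out of n grows quickly.

```python
p = 0.001                        # assumed per-frame decode-failure probability (not measured)
n = 10_000                       # number of chunks/frames
p_any_loss = 1 - (1 - p) ** n    # P(at least one chunk is unrecoverable)
print(f"P(losing at least one chunk) = {p_any_loss:.4f}")
```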
Check the indexer: https://github.com/Olow304/memvid/blob/main/memvid/index.py
It's possible to be less efficient, but it takes real creativity. You could print out the QR codes and scan them again, or encode the QR codes in the waveform of an MP3 and take a video of that.
It's really, really bad.