There's no single best way to store information
47 points | 4 hours ago | 8 comments | quantamagazine.org | HN
bob1029
1 hour ago
The best way to store information depends on how you intend to use (query) it.

The query itself represents information. If you can anticipate 100% of the ways in which you intend to query the information (no surprises), I'd argue there might be an ideal way to store it.
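To make that concrete, here is a minimal sketch holding the same records in two layouts, each chosen for one anticipated query. The records and the helper names (`year_of`, `users_between`) are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical records: (user_id, signup_year)
records = [(17, 2021), (4, 2019), (42, 2023), (8, 2021)]

# Layout A: a dict keyed by user_id, suited to exact lookups by id.
by_id = {user_id: year for user_id, year in records}

# Layout B: a list sorted by year, suited to range queries over years.
by_year = sorted((year, user_id) for user_id, year in records)


def year_of(user_id):
    """Exact-match query: one hash lookup in layout A."""
    return by_id[user_id]


def users_between(first_year, last_year):
    """Range query: two binary searches in layout B."""
    lo = bisect_left(by_year, (first_year,))
    hi = bisect_right(by_year, (last_year, float("inf")))
    return [user_id for _, user_id in by_year[lo:hi]]
```

Neither layout answers the other's query well: the dict would need a full scan for a year range, and the sorted list would need one for an id. Knowing the queries up front is what lets you pick.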

alphazard
38 minutes ago
This is exactly right, and the article is clickbait junk.

Given the domain name, I was expecting something about the physics of information storage, and some interesting law of nature. Instead, the article is a bad introduction to data structures.

DixieDev
12 minutes ago
This line of thought works for storage in isolation, but does not hold up if write speed is a concern.
danans
41 minutes ago
Pedantic, but the article is talking about the way we structure/organize information, not how we store it. When I think of the word "store", I think of the physical medium. The way we organize the information is only partially related.
ronsor
36 minutes ago
There are plenty of good enough ways:

* For lossless compression of generic data, gzip or zstd.

* For text, documentation, and information without fancy formatting, Markdown, which is effectively a superset of plain text.

* For small datasets, blobs, objects, and whatnot, JSON.

* For larger datasets and durable storage, SQLite3.

Whenever there's text involved, use UTF-8. Whenever there are dates, use ISO 8601 format (UTC) or Unix timestamps.
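A minimal sketch of those last two rules combined with the JSON one, using only the Python standard library. The record layout and the helper names (`make_record`, `encode`, `decode`) are assumptions for illustration, not a prescribed schema.

```python
import json
from datetime import datetime, timezone


def make_record(text, unix_seconds):
    """Build a JSON-ready dict carrying both timestamp representations."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return {
        "text": text,
        "created_unix": unix_seconds,
        # ISO 8601 with an explicit UTC offset
        "created_iso": moment.isoformat(),
    }


def encode(record):
    """Serialize to JSON, then to UTF-8 bytes (non-ASCII kept as-is)."""
    return json.dumps(record, ensure_ascii=False).encode("utf-8")


def decode(payload):
    """Inverse of encode()."""
    return json.loads(payload.decode("utf-8"))


record = make_record("naïve café ☕", 1_700_000_000)
restored = decode(encode(record))
```

Storing both timestamp forms is redundant and only done here to show that they describe the same instant; in practice you would pick one.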

Following these rules will keep you happy 80% of the time.

nicbou
14 minutes ago
One format I'm missing: storage for conversations and social media posts. Both are complex media (text + images/videos + metadata), and one is actually a collection of such posts.

How would you go about storing those in a somewhat human-readable format? My goal is to archive my chats and social media activity.

notepad0x90
13 minutes ago
Would it be more accurate to say "to store using information, using information"? Since everything ultimately boils down to information, humans trying to store information is a bit recursive.
__MatrixMan__
2 hours ago
There are, however, several objectively bad ways. In "Service Model" (a novel that I recommend), a certain collection of fools decides to sort bits by whether each is a 1 or a 0, ending up with a long list of 0s followed by a long list of 1s.
Rygian
1 hour ago
In a similar vein, someone decided that everyone should have subdirectories under home named "Pictures", "Videos", "Music", "Documents", …
dsvf
1 hour ago
It _does_ open up amazing opportunities for compression though.
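A quick sketch of why, assuming bits are held as a Python string of "0" and "1" characters (the function names are made up): a sorted bit string is fully described by two counts, and that is also exactly why the scheme is useless, since different inputs collapse to the same output.

```python
def sort_bits(bits):
    """Sort a bit string so every 0 precedes every 1."""
    return "".join(sorted(bits))


def compress_sorted(sorted_bits):
    """A sorted bit string is fully described by (zeros, ones)."""
    zeros = sorted_bits.count("0")
    return zeros, len(sorted_bits) - zeros


def decompress_sorted(counts):
    """Rebuild the sorted bit string from its two counts."""
    zeros, ones = counts
    return "0" * zeros + "1" * ones


message_a = "10110010"
message_b = "01001101"

# Two different messages become indistinguishable once sorted.
sorted_a = sort_bits(message_a)
sorted_b = sort_bits(message_b)
```

The round trip through `compress_sorted` and `decompress_sorted` is lossless for the sorted string, but the original message is already gone by then.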
lo_zamoyski
1 hour ago
That depends on the aim. The purpose of something determines how fitting the means are.

Also, let us not confuse "relative" with "not objective". My father is objectively my father, but he is objectively not your father.

pbreit
2 hours ago
Postgres is close.
imhoguy
1 hour ago
I would say SQLite is closer: you find it on every phone, browser, and server. I bet SQLite files will still be readable in 2100. And I love Postgres.
rmwaite
1 hour ago
mjevans
1 hour ago
Or (real) SQLite for reasonably scaled work.

I also like (old) .ini / TOML for small (bootstrap) config files / data exchange blobs a human might touch.

+

Re: PostgreSQL 'unfit' conversations.

I'd like some clearer examples of the desired transactions which don't fit well. After thinking about them in the background a bit I've started to suspect it might be an algorithmic / approach issue obscured by storage patterns that happen to be enabled by some other platforms which work 'at scale' supported by hardware (to a given point).

As an example of a pattern that might not perform well under PostgreSQL, consider lock-heavy multiple updates for flushing a transaction atomically, e.g. bank-transaction-clearance-like tasks. If every single double-entry booking requires its own atomic transaction, that clearly won't scale well in an ACID system. Rather, the smaller grains of sand should be combined into a sandstone block / window of transactions which are processed at the same time and applied during the same overall update.

The most obvious approach to this would be to switch from a no-intermediate-values 'apply deduction and increment atomically' action to a versioned view of the global data state PLUS a 'pending transactions to apply' log / table (either/both can be sharded). At a given moment the transactions can be reconciled; for performance, a cache for 'dirty' accounts can store the non-contested value of available balance.
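A rough sketch of that "pending log plus versioned state" idea, in plain Python rather than PostgreSQL so it stays self-contained. Everything here is hypothetical: the `BatchedLedger` class, integer amounts in minor units, and the choice to reject a whole window if any account would go negative (a real system would more likely reject individual transfers). The 'dirty' account cache is left out.

```python
from collections import defaultdict


class BatchedLedger:
    """Hypothetical sketch: queue transfers, then settle a window in one pass."""

    def __init__(self, balances):
        self.balances = dict(balances)  # last reconciled state
        self.version = 0                # bumped once per reconciliation
        self.pending = []               # the 'pending transactions to apply' log

    def submit(self, debit_account, credit_account, amount):
        """Append a double-entry transfer to the log; balances are untouched."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.pending.append((debit_account, credit_account, amount))

    def reconcile(self):
        """Apply the whole window at once; return the net change per account."""
        net = defaultdict(int)
        for debit_account, credit_account, amount in self.pending:
            net[debit_account] -= amount
            net[credit_account] += amount
        # Simplifying assumption: reject the window if any account goes negative.
        for account, delta in net.items():
            if self.balances.get(account, 0) + delta < 0:
                raise ValueError(f"insufficient funds in {account}")
        for account, delta in net.items():
            self.balances[account] = self.balances.get(account, 0) + delta
        self.pending.clear()
        self.version += 1
        return dict(net)


ledger = BatchedLedger({"alice": 100, "bob": 50})
ledger.submit("alice", "bob", 30)
ledger.submit("bob", "alice", 10)
ledger.submit("alice", "carol", 20)
net_changes = ledger.reconcile()
```

Three transfers touch each account's stored balance once instead of once per booking, which is the point of batching: contention scales with the number of windows, not the number of transfers.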

andix
56 minutes ago
It's always Markdown. Markdown is the best way to store information. ;)
jsight
41 minutes ago
Claude Code vehemently agrees.
andix
35 minutes ago
You're absolutely right!
kittikitti
1 hour ago
Or it's the opposite, where the slowest possible retrieval time is the intended effect, which is the basis of many cryptographic algorithms.
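One reading of this is key stretching for password storage, where the derivation is made deliberately expensive so that guessing is slow. A minimal sketch with the standard library's PBKDF2; the iteration count and the helper names are assumptions, not recommendations.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; higher means slower on purpose


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Derive a slow hash; returns (salt, digest) for storage."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Recompute with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Each check costs the same work for a legitimate login as for an attacker's guess, so the cost is tuned to be tolerable once and painful a billion times.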