Better Than JSON (aloisdeniel.com)
20 points | 1 hour ago | 11 comments
pyrolistical
10 minutes ago
Compressed JSON is good enough and requires less human communication initially.

Sure it will blow up in your face when a field goes missing or value changes type.
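
A minimal sketch of that failure mode in Python (hypothetical payload):

    import json

    # The producer quietly changed "id" from int to string between releases.
    payload = json.loads('{"id": "42"}')
    total = payload["id"] + 1    # TypeError at runtime, not at review time
    email = payload["email"]     # and a missing field is a KeyError in prod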

People who advocate paying the higher cost ahead of time to perfectly type the entire data structure AND propose a process to perform version updates to sync client/server are going to lose most of the time.

The zero cost of starting with JSON is too compelling even if it has a higher total cost due to production bugs later on.

When judging which alternative will succeed, lower perceived human cost beats lower machine cost every time.

This is why JSON is never going away, at least not until something with even lower human communication cost replaces it.

reply
pzmarzly
39 minutes ago
> With Protobuf, that’s impossible.

Unless your servers and clients push at different times and are thus compiled with different versions of your specs, in which case many safety bets are off.

There are ways to be mostly safe (never reuse IDs, use unknown-field-friendly copying methods, etc.), but distributed systems are distributed systems, and protobuf isn't a silver bullet that can solve all the problems on the author's list.

On the upside, it seems like protobuf3 fixed a lot of stuff I used to hate about protobuf2. Issues like:

> if the field is not a message, it has two states:

> - ...

> - the field is set to the default (zero) value. It will not be serialized to the wire. In fact, you cannot determine whether the default (zero) value was set or parsed from the wire or not provided at all

are now gone if you stick to using protobuf3 + `message` keyword. That's really cool.
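
To make that concrete, here is a hedged sketch in Python, assuming a module example_pb2 generated from this hypothetical schema:

    # syntax = "proto3";
    # import "google/protobuf/wrappers.proto";
    # message Item {
    #   int32 plain = 1;                         // scalar: zero vs. unset is lost
    #   google.protobuf.Int32Value wrapped = 2;  // message-typed: presence tracked
    # }
    import example_pb2

    item = example_pb2.Item()
    item.plain = 0                     # indistinguishable from "never set"
    print(item.HasField("wrapped"))    # False: message fields track presence
    item.wrapped.value = 0
    print(item.HasField("wrapped"))    # True, even though the value is zero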

reply
connicpu
24 seconds ago
Regardless of whether you use JSON or Protobuf, the only way to be safe from version tears in your serialization format is to enforce backwards compatibility in your CI pipeline: test that the new version of your service creates responses that are usable by older versions of your clients, and vice versa.
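
A rough sketch of such a check in Python, assuming pinned generated modules old_pb2 and new_pb2 (hypothetical names; in practice the old stubs would be generated under a separate proto package to avoid descriptor-pool conflicts):

    import new_pb2
    import old_pb2

    def test_new_server_old_client():
        # Responses produced by the new schema must parse under the old one.
        new_msg = new_pb2.Response(id=7)
        old_msg = old_pb2.Response()
        old_msg.ParseFromString(new_msg.SerializeToString())
        assert old_msg.id == 7

    def test_old_server_new_client():
        # ...and vice versa.
        old_msg = old_pb2.Response(id=7)
        new_msg = new_pb2.Response()
        new_msg.ParseFromString(old_msg.SerializeToString())
        assert new_msg.id == 7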
reply
brabel
20 minutes ago
No type system survives going through a network.
reply
codewritero
43 minutes ago
I love to see people advocating for better protocols and standards, but given the title I expected the author to present something that would be better in the sense of supporting the same or more use cases with better efficiency and/or ergonomics, and I don't think protobuf does that.

Protobuf has advantages, but due to its strict schema requirement it's missing support for a ton of use cases where JSON thrives.

A much stronger argument could be made for CBOR as a replacement for JSON for most use cases. CBOR has the same schema flexibility as JSON but has a more concise encoding.
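
A rough size comparison in Python, assuming the cbor2 package:

    import json

    import cbor2

    doc = {"id": 12345, "name": "widget", "tags": ["a", "b"], "price": 9.99}
    print(len(json.dumps(doc).encode("utf-8")))  # JSON: plain text, larger
    print(len(cbor2.dumps(doc)))                 # CBOR: same data model, fewer bytes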

reply
port11
39 minutes ago
I think the strict schema of Protobuf might be one of the major improvements, as most APIs don't publish a JSON schema? I've always had to use ajv or superstruct to make sure payloads match a schema; Protobuf doesn't need that (supposedly).
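
That chore looks roughly like this, sketched with Python's jsonschema package (ajv and superstruct play the same role in JS):

    from jsonschema import ValidationError, validate

    schema = {
        "type": "object",
        "properties": {"id": {"type": "integer"}},
        "required": ["id"],
    }

    try:
        validate(instance={"id": "42"}, schema=schema)  # wrong type for "id"
    except ValidationError as err:
        print(err.message)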
reply
youngtaff
22 minutes ago
We need browsers to support CBOR APIs… and it shouldn’t be that hard as they all have internal implementations now
reply
written-beyond
41 minutes ago
Idk, I built a production system and ensured all data transfers, client to server and server to client, were protobuf, and it was a pain.

Technically it sounds really good, but the actual act of managing it is hell. That, or I need a lot of practice to use them; at that point, shouldn't I just use JSON and get on with my life?

reply
Arainach
10 minutes ago
What issues did you have? In my experience, most things that could be called painful with protobuf would be bigger pains with things like JSON.

Making changes to messages in a backwards-compatible way can be annoying, but JSON allowing you to shoot yourself in the foot will take more time and effort to fix when it's corrupting data in prod than protobuf giving you a compile error would.
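
A hedged illustration in Python (field names are hypothetical):

    # The producer renamed "username" to "user_name"; the JSON consumer
    # silently reads None and persists garbage, noticed much later.
    payload = {"user_name": "ada"}
    row = {"username": payload.get("username")}   # {'username': None}

    # With protobuf, the renamed field disappears from the generated class,
    # so the same access fails loudly at compile time (or immediately with
    # an AttributeError in Python) instead of quietly corrupting data.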

reply
brabel
23 minutes ago
Mandatory comment about ASN.1: a protocol from 1984 that already did what Protobuf does, with more flexibility. Yes, it's a bit ugly, but if you stick to the DER encoding it's really not worse than Protobuf at all. Check out the Wikipedia example:

https://en.wikipedia.org/wiki/ASN.1#Example_encoded_in_DER
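
For the curious, that example can be reproduced in a few lines of Python with the pyasn1 package (a hedged sketch):

    from pyasn1.codec.der import encoder
    from pyasn1.type import char, namedtype, univ

    class FooQuestion(univ.Sequence):
        componentType = namedtype.NamedTypes(
            namedtype.NamedType("trackingNumber", univ.Integer()),
            namedtype.NamedType("question", char.IA5String()),
        )

    q = FooQuestion()
    q["trackingNumber"] = 5
    q["question"] = "Anybody there?"
    print(encoder.encode(q).hex())  # 3013020105160e416e79626f64792074686572653f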

Protobuf is ok but if you actually look at how the serializers work, it's just too complex for what it achieves.

reply
zzo38computer
14 minutes ago
I also think ASN.1 DER is better (there are other formats, but in my opinion, DER is the only good one, because BER is too messy). I use it in some of my stuff, and when I can, my new designs also use ASN.1 DER rather than using JSON and Protobuf etc. (Some types are missing from standard ASN.1 but I made up a variant called "ASN.1X" which adds some additional types such as key/value list and some others. With the key/value list type added, it is now a superset of the data model of JSON, so you can convert JSON to ASN.1X DER.)
reply
dgan
3 minutes ago
I honestly looked for an encoder/decoder for a Python/C++ application and couldn't find anything usable; I guess I would need to contact the purchasing department for a license (?), while with protobuf I can make the decision myself, all alone.
reply
bloppe
18 minutes ago
What makes it too complex in your opinion?
reply
pphysch
9 minutes ago
> ASN.1, a protocol from 1984, already did what Protobuf does, with more flexibility.

After working heavily with SNMP across a wide variety of OEMs, this flexibility becomes a downside. Or SNMP/MIBs were specified at the wrong abstraction level, where the ASN.1 flexibility gives mfgs too much power to do insane and unconventional things.

reply
Jemaclus
29 minutes ago
"Better than JSON" is a pretty bold claim, and even though the article makes some great cases, the author is making some trade-offs that I wouldn't make, based on my 20+ year career and experience. The author makes a statement at the beginning: "I find it surprising that JSON is so omnipresent when there are far more efficient alternatives."

We might disagree on what "efficient" means. OP is focusing on computer efficiency, whereas, as you'll see, I tend to optimize for human efficiency (and, let's be clear, JSON is efficient _enough_ for 99% of computer cases).

I think the "human readable" part is often an overlooked pro by hardcore protobuf fans. One of my fundamental philosophies of engineering historically has been "clarity over cleverness." Perhaps the corollary to this is "...and simplicity over complexity." And I think protobuf, generally speaking, falls in the cleverness part, and certainly into the complexity part (with regards to dependencies).

JSON, on the other hand, is ubiquitous, human readable (clear), and simple (little-to-no dependencies).

I've found in my career that there's tremendous value in not needing to execute code to see what a payload contains. I've seen a lot of engineers (including myself, once upon a time!) take shortcuts like using bitwise values and protobufs and things like that to make things faster or to be clever or whatever. And then I've seen those same engineers, or perhaps their successors, find great difficulty in navigating years-old protobufs, when a JSON payload is immediately clear and understandable to any human, technical or not, upon a glance.

I write MUDs for fun, and one of the things that older MUD codebases do is that they use bit flags to compress a lot of information into a tiny integer. To know what conditions a player has (hunger, thirst, cursed, etc), you do some bit manipulation and you wind up with something like 31 that represents the player being thirsty (1), hungry (2), cursed (4), with haste (8), and with shield (16). Which is great, if you're optimizing for integer compression, but it's really bad when you want a human to look at it. You have to do a bunch of math to sort of de-compress that integer into something meaningful for humans.
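
To make that concrete, the de-compression step looks something like this:

    FLAGS = {1: "thirsty", 2: "hungry", 4: "cursed", 8: "haste", 16: "shield"}

    def decode(conditions: int) -> list[str]:
        return [name for bit, name in FLAGS.items() if conditions & bit]

    print(decode(31))  # ['thirsty', 'hungry', 'cursed', 'haste', 'shield']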

Similarly with protobuf, I find that it usually optimizes for the wrong thing. To be clear, one of my other fundamental philosophies about engineering is that performance is king and that you should try to make things fast, but there are certainly diminishing returns, especially in codebases where humans interact frequently with the data. Protobufs make things fast at a cost, and that cost is typically clarity and human readability. Versioning also creates more friction. I've seen teams spend an inordinate amount of effort trying to ensure that both the producer and consumer are using the same versions.

This is not to say that protobufs are useless. They're great for enforcing API contracts at the code level, and they provide those speed improvements OP mentions. There are certain high-throughput use cases where this complexity and relative opaqueness is not only an acceptable trade-off, but the right one to make. But I've found that it's not particularly common, and people reaching for protobufs are often optimizing for the wrong things. Again, clarity over cleverness and simplicity over complexity.

I know one of the arguments is "it's better for situations where you control both sides," but if you're in any kind of team with more than a couple of engineers, this stops being true. Even if your internal API is controlled by "us," that "us" can sometimes span 100+ engineers, and you might as well consider it a public API.

I'm not a protobuf hater. I just think that the vast majority of engineers could go through their careers without ever touching protobufs, never miss them, never need them, and never find themselves in a situation where eking out that extra performance is truly worth the hassle.

reply
Arainach
3 minutes ago
If you want human-readable, there are text representations of protobuf for use at rest (checked-in config files, etc.), while the wire format stays more efficient.
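
For example, a sketch assuming a module config_pb2 generated from a hypothetical Settings message:

    from google.protobuf import text_format

    import config_pb2

    settings = config_pb2.Settings(name="prod", replicas=3)
    text = text_format.MessageToString(settings)  # human-readable, diff-friendly
    restored = text_format.Parse(text, config_pb2.Settings())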

In terms of human effort, a strongly typed schema saves far more time in the long run than one where you have to sanity-check everything.

reply
catchmeifyoucan
26 minutes ago
I wonder if we can write an API w/ JSON the usual way and change the final packaging to send it over protobuf.
reply
bglusman
12 minutes ago
Sure... https://protobuf.dev/programming-guides/json/

I was pushing at one point for us to have some code in our protobuf parsers that would essentially allow reading messages in either JSON or binary format. To be fair, there's some overhead that way from doing some kind of try/catch, but for some use cases I think it's worth it...
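
Something like this sketch, using the official json_format module (api_pb2 is a hypothetical generated module):

    from google.protobuf import json_format
    from google.protobuf.message import DecodeError

    import api_pb2

    def parse_request(raw: bytes) -> api_pb2.Request:
        msg = api_pb2.Request()
        try:
            msg.ParseFromString(raw)                     # binary wire format first
        except DecodeError:
            json_format.Parse(raw.decode("utf-8"), msg)  # fall back to proto3 JSON
        return msg

In practice you'd usually branch on the Content-Type header instead of a try/except, which avoids that overhead.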

reply
pstuart
24 minutes ago
If you're using Go, this framework lets you work with protobufs and gives you a JSON REST-like service for free: https://github.com/grpc-ecosystem/grpc-gateway
reply
wilg
17 minutes ago
One of the best parts of Protobuf is that there's a fully compatible JSON serialization and deserialization spec, so you can offer a parallel JSON API with minimal extra work.
reply
bglusman
11 minutes ago
Yes! Came to the comments to see if that was discussed above. Link: https://protobuf.dev/programming-guides/json/
reply
esafak
40 minutes ago
It's premature and maybe presumptuous of him to be advertising protobufs when he hasn't heard of the alternatives yet.
reply
spagoop
46 minutes ago
Is it just me or is this article insanely confusing? With all due respect to the author, please be mindful of copy editing LLM-assisted writing.

There is a really interesting discussion underneath this as to the limitations of JSON, along with potential alternatives, but I can't help but distrust this writing because of how much it sounds like an LLM.

reply
port11
41 minutes ago
I don't think it's LLM-generated or even assisted. It's kinda like how I write when I don't want to really argue a point but rather get to the good bits.

Seems like the author just wanted to talk about Protobuf without bothering too much about the issues with JSON (though some are mentioned).

reply
dkdcio
39 minutes ago
do you have any evidence that the author used an LLM? focusing on the content, instead of the tooling used to write the content, leads to a lot more productive discussions

I promise you cannot tell LLM-generated content from non-LLM-generated content. what you think you're detecting is poor quality, which is orthogonal to the tooling used

reply
spagoop
28 minutes ago
Fair point. To be constructive here: LLMs seem to love lists and emphasizing random words/phrases with bold. Those two are everywhere. Not a smoking gun, but enough to tune out.

I am not dismissing this as slop, and I actually have no beef with using LLMs to write. But yes, as you call out, I think it's just poorly written, or perhaps I'm not the specific audience for this.

Sorry if this is bad energy, I appreciate the write up regardless.

reply
pavel_lishin
9 minutes ago
> Is it just me or is this article insanely confusing?

I didn't find it confusing.

I found it unconvincing, but the argument itself was pretty clear. I just disagreed with it.

reply
volemo
33 minutes ago
S-expressions exist since 1960, what more do you need? /s
reply