The performance was part of the reason (compared to serializing using JSON) but the main reason was just tooling support for automatic type checking. gRPC can generate types from a schema for all popular languages out there.
We ended up taking another route but I think the idea has merit.
That is exactly what COM/WinRT, XPC, Android Binder, D-BUS are.
Naturally they have several optimisations for local execution.
https://github.com/grpc/grpc-java/blob/master/binder/src/mai...
The overhead is low, and you get best practices like oneway calls and avoiding the transaction limit for free. It also comes with built in security policies for servers and clients.
COM can run over the network (DCOM), inside the same computer on its own process (out-proc), inside the client (in-proc), designed for in-proc but running as out-proc (COM host).
So for max performance, with the caveat of possibly damaging the host, in-proc will do it, and be faster than any kind of sockets.
Ah the good ol' Blaster worm...
Think of the loopback, my programs don't know (or at least shouldn't know) that IPs like 127.0.0.5 are special, but then the kernel knows that messages there are not going to go on any wire and handles that differently.
I think that this factor might be the ultimate source of my discomfort with standards like REST. Things like using HTTP verbs and status codes, and encoding parameters into the request's URL, mean that there's almost not even an option to choose a communication channel that's lighter-weight than HTTP.
Some generalities:
Function call: The developer just calls it. Blocks until completion, errors are due to bad parameters or a resource availability problem. They are handled with exceptions or return-code checks. Tests are also simple function calls. Operationally everything is, to borrow a phrase from aviation regarding non-retractable landing gear, "down and welded".
IPC: Architecturally, and as a developer, you start worrying about your function as a resource. Is the IPC recipient running? It's possible it's not; that's probably treated as fatal and your code just returns an error to its caller. You're more likely to have an m:n pairing between caller and callee instances, so requests will go into a queue. Your code may still block, but with a timeout, which will be a fatal error. Or you might treat it as a co-routine, with the extra headaches of deferred errors. You probably won't do retries. Testing has some more headaches, with IPC resource initialization and tear-down. You'll have to test queue failures. Operations is also a bit more involved, with an additional resource that needs to be baby-sat, and co-ordinated with multiple consumers.
RPC: IPC headaches, but now you need to worry about lost messages, and messages processed but the acknowledgements were lost. Temporary failures need to be faced and re-tried. You will need to think in terms of "best effort", and continually make decisions about how that is managed. You'll be dealing with issues such as at-least-once delivery vs. at-most-once. Consistency issues will need to be addressed much more than with IPC, and they will be thornier problems. Resource availability awareness will seep into everything; application-level back-pressure measures _should_ be built-in. Treating RPC as simple blocking calls will be a continual temptation; if you or less-enlightened team members succumb then you'll have all kinds of flaky issues. Emergent, system-wide behavior will rear its ugly head, and it will involve counter-intuitive interactions (such as bigger buffers reducing throughput). Testing now involves three non-trivial parts--your code, the called code, and the communications mechanisms. Operations gets to play with all kinds of fun toys to deploy, monitor, and balance usage.
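To make the "processed, but the acknowledgement was lost" case concrete, here is a minimal sketch of at-least-once delivery made safe with an idempotency key. Everything in it (`FlakyServer`, `deposit`, `call_with_retry`) is a hypothetical name invented for illustration, and the "network" is just a method call that raises `TimeoutError`.

```python
import uuid


class FlakyServer:
    """Hypothetical server that does the work but loses the first ack."""

    def __init__(self):
        self.balance = 0
        self.seen = {}       # request_id -> cached result (dedup table)
        self.drop_acks = 1   # number of acknowledgements to "lose"

    def deposit(self, request_id, amount):
        # Apply each request id at most once, even if it arrives again.
        if request_id not in self.seen:
            self.balance += amount
            self.seen[request_id] = self.balance
        # The work is done, but the reply never reaches the caller.
        if self.drop_acks > 0:
            self.drop_acks -= 1
            raise TimeoutError("ack lost")
        return self.seen[request_id]


def call_with_retry(server, amount, attempts=3):
    # The same id is reused across retries so the server can deduplicate.
    request_id = str(uuid.uuid4())
    for _ in range(attempts):
        try:
            return server.deposit(request_id, amount)
        except TimeoutError:
            continue  # real code would back off before retrying
    raise RuntimeError("gave up after retries")


server = FlakyServer()
result = call_with_retry(server, 100)
```

Without the dedup table the retry would deposit twice; with it, the caller retries freely and the server still applies the deposit once.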
My original implementation just pinned one GPU to its own thread then used message passing between them in the same process but Nvidia's NCCL library hates this for reasons I haven't fully figured out yet.
I considered gRPC for IPC since I was already using it for the server's API but dismissed it because it was an order of magnitude slower and I didn't want to drag async into the child PIDs.
Serializing the tensors between processes and using the Servo team's ipc-channel crate[0] has worked surprisingly well. If you're using Rust and need a drop-in (ish) replacement for the standard library's channels, give it a shot.
On the other hand, I really like the design of Cap'n Proto, and the library is more lightweight (and hence easier) to compile. But there, it is not clear on which language implementation you can rely other than C++. Also it feels like there are maintainers paid by Google for gRPC, and for Cap'n Proto it's not so clear: it feels like it's essentially Cloudflare employees improving Cap'n Proto for Cloudflare. So if it works perfectly for your use-case, that's great, but I wouldn't expect much support.
All that to say: my preferred choice for that would technically be Cap'n Proto, but I wouldn't dare making my company depend on it. Whereas nobody can fire me for depending on Google, I suppose.
That's correct. At present, it is not anyone's objective to make Cap'n Proto appeal to a mass market. Instead, we maintain it for our specific use cases in Cloudflare. Hopefully it's useful to others too, but if you choose to use it, you should expect that if any changes are needed for your use case, you will have to make those changes yourself. I certainly understand why most people would shy away from that.
With that said, gRPC is arguably weird in its own way. I think most people assume that gRPC is what Google is built on, therefore it must be good. But it actually isn't -- internally, Google uses Stubby. gRPC is inspired by Stubby, but very different in implementation. So, who exactly is gRPC's target audience? What makes Google feel it's worthwhile to have 40ish(?) people working on an open source project that they don't actually use much themselves? Honest questions -- I don't know the answer, but I'd like to.
(FWIW, the story is a bit different with Protobuf. The Protobuf code is the same code Google uses internally.)
(I am the author of Cap'n Proto and also was the one who open sourced Protobuf originally at Google.)
It's at least used for the public Google Cloud APIs. That by itself guarantees a rather large scale, whether they use gRPC in prod or not.
I finally figured out it was a problem with specific pairs of servers. Server A could talk to C, and D, but would time out talking to B. The gRPC call just... wouldn't.
One good thing is you do have the source to everything. After much digging through amazingly opaque code, it became clear there was a problem with a feature we didn't even need. If there are multiple sub-channels between servers A and B, gRPC will bundle them into one connection. It also provides protocol-level in-flight flow limits, both for individual sub-channels and the combined A-B bundle. It does it by using "credits". Every time a message is sent from A to B it decrements the available credit limit for the sub-channel, and decrements another limit for the bundle as a whole. When the message is processed by the recipient process then the credit is added back to the sub-channel and bundle limits. Out of credits? Then you'll have to wait.
The problem was that failed transactions were not credited back. Failures included processing time-outs. With time-outs the sub-channel would be terminated, so that wasn't a problem. The issue was with the bundle. The protocol spec was (is?) silent as to who owned the credits for the bundle, and who was responsible for crediting them back in failure cases. The gRPC code for Go, at the time, didn't seem to have been written or maintained by Google's most-experienced team (an intern, maybe?), and this was simply dropped. The result was the bundle got clogged, and A and B couldn't talk. Comm-level backpressure wasn't doing us any good (we needed full app-level), so for several years we'd just patch new Go libraries and disable it.
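A toy model of the credit leak described above, for anyone trying to picture it. This is not gRPC's or HTTP/2's actual code; `CreditWindow`, `send`, and `run` are hypothetical names, and the limits (2 for the bundle, 1 per sub-channel) are arbitrary.

```python
class CreditWindow:
    """A counter of in-flight messages that are still allowed."""

    def __init__(self, limit):
        self.available = limit

    def acquire(self):
        if self.available == 0:
            return False
        self.available -= 1
        return True

    def release(self):
        self.available += 1


def send(sub, bundle, handler, credit_on_failure):
    """Returns 'ok', 'failed', or 'blocked'."""
    if not bundle.acquire():
        return "blocked"
    if not sub.acquire():
        bundle.release()
        return "blocked"
    try:
        handler()
    except TimeoutError:
        # The bug in the story: this branch returned no credits at all.
        if credit_on_failure:
            sub.release()
            bundle.release()
        return "failed"
    sub.release()
    bundle.release()
    return "ok"


def always_times_out():
    raise TimeoutError


def run(credit_on_failure):
    bundle = CreditWindow(2)
    outcomes = []
    for _ in range(3):
        # A fresh sub-channel each time: the old one was torn down on timeout,
        # which is why only the bundle-level leak matters.
        sub = CreditWindow(1)
        outcomes.append(send(sub, bundle, always_times_out, credit_on_failure))
    return outcomes, bundle.available


leaky = run(credit_on_failure=False)
fixed = run(credit_on_failure=True)
```

In the leaky run the bundle runs dry after two timeouts and the third call never gets sent, which matches the "A and B couldn't talk" symptom. In the fixed run every call still fails, but the bundle keeps its credits.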
gRPC ships with its own networking stack, which is one reason why those libs are heavy. Connect libraries leverage each ecosystem's native networking stack (e.g. net/http in Go, NSURLSession in Swift, etc.), which means any other libraries that work with the standard networking stack interop well with Connect.
It might make sense. Usually, if you're using IPC, you need it to be as fast as possible and there are several solutions that are much faster.
E.g. Kythe (kythe.io) was designed so that its individual language indexers run with a main driver binary written in Go, and then a subprocess binary written in.... whatever. There's a requirement to talk between the two, but it's not really a lot of traffic (relative to e.g. the CPU cost of the subprocess doing compilation).
So what happens in practice is that we used Stubby (like gRPC, except not public), because it was low overhead* to write the handler code for it on both ends, and got us some free other bits as well.
* Except when it wasn't lol. It worked great for the first N languages written in langs with good stubby support. But then weird shit (for Google) crawled out of the weeds that didn't have stubby support, so there's some handwaving going on for the long tail.
I don't see how gRPC could be any worse than that.
(The previous iteration before MQTT used HTTP polling and callbacks worked on top of an SSH reverse tunnel abomination. Using MQTT for IPC was kind of an afterthought. The SSH Cthulhu is still in use for everyday remote management because you cannot do Ansible over MQTT, but we're slowly replacing it with Wireguard. I gotta admit that out of all VPN technologies we've experimented with, SSH transport has been the most reliable one in various hostile firewalled environments.)
It's a 4-tier architecture (clients - front end service - query service - database) auth system, and all communication is over gRPC (except to the database). Jimmy talks about the advantages of having a very clear contract between systems.
There's a ton of really great nitty gritty detail about being super fast with gRPC. https://github.com/planetscale/vtprotobuf for statically sized allocation in protobuf rather than slow reflection-based dynamic sizing. Upcoming memory pooling work to avoid allocations at all. Tons of advantages for observability right out of the box. It's subtle but I also get the impression most gRPC stubs are miserably bad, that Authzed had to go long and far to get away from a lot of gRPC tarpits.
This is one of my favorite talks from 2024, and strongly sold me on how viable gRPC is for internal services. Even if I were doing local multi-process stuff, I would definitely consider gRPC after this talk. The structure & clarity & observability are huge wins, and the performance can be really good if you need it.
https://youtu.be/1PiknT36218#t=12m 12min is the internal cluster details.
>It's subtle but I also get the impression most gRPC stubs are miserably bad, that Authzed had to go long and far to get away from a lot of gRPC tarpits.
They aren't terrible, but they also aren't a user experience you want to deliver directly to your customers.
AFAIK at least on linux there is no difference between using a UDS and a tcp socket connected to localhost.
No idea what "TI Envelope" is, and a Google search doesn't come up with usable results (oh the irony...) - if it's a logging/metric thing, those are hard to get to perform well regardless of socket type. We ended up using batching with mmap'd buffers for crash analysis. (I.e. the mmap part only comes in if the process terminates abnormally, so we can recover batched unwritten bits.)
No, I am just saying that the unix socket is not Brawndo (or maybe it is?), it does not necessarily have what IPCs crave. Sprinkling it into your architecture may or may not be relevant to the efficiency and performance of the result.
We started out discussing AF_UNIX vs. AF_INET6. If you can conceptually use something faster than sockets that's great, but if you're down to a socket, unix domain will generally beat inet domain...
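For reference, a minimal sketch of a unix domain socket round trip, assuming a Unix-like OS where `AF_UNIX` is available. It measures nothing. The point is only that swapping `AF_INET`/`AF_INET6` for `AF_UNIX` changes the address family and the address (a filesystem path instead of host and port), while the rest of the socket code stays the same. The socket path and the `echo_once` helper are made up for the example.

```python
import os
import socket
import tempfile
import threading


def echo_once(server_sock):
    # Accept one connection and send back the upper-cased payload.
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.sock")  # address is a filesystem path

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    client.sendall(b"ping")
    reply = client.recv(1024)

    client.close()
    worker.join()
    server.close()
```

Whether this beats loopback TCP for a given workload is something to benchmark on your own system rather than assume.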
The only way to know is to dig through CLs? Write a test.
There's also automated tooling to compare protobuf schemas for breaking changes.
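The core idea behind such a check fits in a few lines. This is a toy sketch, not any real tool: schemas are hand-written dicts mapping field number to `(name, type)`, and `breaking_changes` is a hypothetical helper. Real checkers parse the actual .proto files and cover many more rules.

```python
def breaking_changes(old, new):
    """Compare two schema versions (field number -> (name, type)).

    Removing a field or changing its type is reported; adding a new
    field number is treated as compatible.
    """
    problems = []
    for number, (name, ftype) in old.items():
        if number not in new:
            problems.append(f"field {number} ({name}) was removed")
        elif new[number][1] != ftype:
            problems.append(
                f"field {number} ({name}) changed type {ftype} -> {new[number][1]}"
            )
    return problems


v1 = {1: ("id", "int64"), 2: ("email", "string")}
v2 = {1: ("id", "string"), 2: ("email", "string"), 3: ("name", "string")}

report = breaking_changes(v1, v2)
```

Here the type change on field 1 is flagged, while the new field 3 passes. A check like this can run in CI so nobody has to dig through change history to find out.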
If you are building something that needs the binary performance that GRPC provides, go for it, but it is not true that there is no extra cost over doing the obvious thing.
No, it by definition does not, because JSON has no schema. Only your application contains and knows the (expected) structure of the data, but you literally cannot know what structure any random blob of JSON objects will have without a separate schema. When you read a random /docs page telling you "the structure of the resulting JSON object from this request is ...", that's just a schema but written in English instead of code. This has big downstream ramifications.
For example, many APIs make the mistake of parsing JSON and only returning some opaque "Object" type, which you then have to map onto your own domain objects, meaning you actually parse every JSON object twice: once into the opaque structure, and once into your actual application type. This has major efficiency ramifications when you are actually dealing with a lot of JSON. The only way to do better than this is to have a schema in some form -- any form at all, even English prose -- so you can go from the JSON text representation directly into your domain type at parse-time. This is part of the reason why so many JSON libraries in every language tend to have some high level way of declaring a JSON object in the host language, typically as some kind of 'struct' or enum, so that they can automatically derive an actually efficient parsing step and skip intermediate objects. There's just no way around it. JSON doesn't have any schema, and that's part of its appeal, but in practice one always exists somewhere.
You can use protobuf in text-based form too, but from what you said, you're probably screwed anyway if your coworkers are just churning stuff and changing the values of fields and stuff randomly. They're going to change the meaning of JSON fields willy nilly too and there will be nothing to stop you from landing back in step 1.
I will say that the quality of gRPC integrations tends to vary wildly based on language though, which adds debt, you're definitely right about that.
Maybe the tools are fantastic now, but I still think being able to debug messages without them is an advantage in almost all systems. You probably don’t need the level of performance GRPC provides.
If you’re using JSON Protobufs why would you add this extra complexity - it will mean messaging is just as slow as using JSON. What are the core advantages of GRPC under these conditions?
That's too easy. What if I give you a 200KiB JSON object with 40+ nested fields that's whitespace stripped and has base64 encoded values? Its "structure" is a red herring. It is not a matter of text or binary. The net result is I still have to use a tool to inspect it, even if that's only something like gron/jq in order to make it actually human readable. But at the end of the day the structure is a concern of the application, I have to evaluate its structure in the context of that application. I don't just look at JSON objects for fun. I do it mostly to debug stuff. I still need the schematic structure of the object to even know what I need to write.
FWIW, I normally use something like grpcurl in order to do curl-like requests/responses to a gRPC endpoint and you can even have it give you the schema for a given service. This has worked quite well IME for almost all my needs, but I accept with this stuff you often have lots of "one-off" cases that you have to cobble stuff together or just get dirty with printf'ing somewhere inside your middleware, etc.
> I would also add the GRPC implementation I used in Javascript (long ago) was not actually checking the types of the field in a lot of cases so rather than being a schema that rejects if some field is not a text field it would just return binary junk. JSON Schema or almost anything else will give you a parsing error instead.
Yes, I totally am with you on this. Many of the implementations just totally suck and JSON is common enough nowadays that you kind of have to at least have something that doesn't completely fall over, if you want to be taken remotely seriously. It's hard to write a good JSON library, but it's definitely harder to write a good full gRPC stack. I 100% have your back on this. I would probably dislike gRPC even more but I'm lucky enough to use it with a "good" toolkit (Rust/Prost.)
> If you’re using JSON Protobufs why would you add this extra complexity - it will mean messaging is just as slow as using JSON. What are the core advantages of GRPC under these conditions?
I mean, if your entire complaint is about text vs binary, not efficiency or correctness, JSON Protobuf seems like it fits your needs. You still get the other benefits of gRPC you'd have anywhere (an honest-to-god schema, better transport efficiency over mandated HTTP/2, some amount of schema-generic middleware, first-class streaming, etc etc.)
FWIW, I don't particularly love gRPC. And while I admit I loathe JSON, I'm mainly pushing back on the notion that JSON has some "schema" or structure. No, it doesn't! Your application has and knows structure. A JSON object is just a big bag of stuff. For all its failings, gRPC having a schema is a matter of it actually putting the correct foot first and admitting that your schema is real, it exists, and most importantly can be written down precisely and checked by tools!
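A small sketch of what "written down precisely and checked by tools" can look like even with plain JSON, assuming the application declares its expected structure as a dataclass. `User` and `parse_user` are hypothetical names, and the check is deliberately simplistic (flat fields, exact built-in types).

```python
import json
from dataclasses import dataclass, fields


@dataclass
class User:
    # The schema the application expects, written down in code.
    id: int
    email: str


def parse_user(text):
    raw = json.loads(text)
    kwargs = {}
    for field in fields(User):
        if field.name not in raw:
            raise ValueError(f"missing field: {field.name}")
        value = raw[field.name]
        if not isinstance(value, field.type):
            raise ValueError(
                f"{field.name}: expected {field.type.__name__}, "
                f"got {type(value).__name__}"
            )
        kwargs[field.name] = value
    return User(**kwargs)


ok = parse_user('{"id": 7, "email": "a@example.com"}')

try:
    parse_user('{"id": "7", "email": "a@example.com"}')
    rejected = False
except ValueError:
    rejected = True
```

The JSON text itself says nothing about whether `id` should be a number or a string. Only the declared `User` type does, which is the schema living in the application.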
Sure, the removal of a field can cause an application level error, but that is probably the most benign form of failure there is. What's worse is when no error occurs and the data is simply reinterpreted to fit the schema. Then your database will slowly fill up with corrupted garbage data and you'll have to restore from a backup.
What you have essentially accomplished in your response is to miss the entire point.
There are also other problems with protobuf in the sense that the savings aren't actually as big as you'd expect. E.g. there is still costly parsing, the data transmitted over the wire isn't significantly smaller unless you have data that is a poor fit for JSON.
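A rough way to see the size point is to hand-encode a two-field message in the protobuf wire format and compare it with compact JSON. The sketch assumes a hypothetical message with an integer field 1 (`id`) and a string field 2 (`note`). The encoder is written by hand for illustration and is not the protobuf library.

```python
import json


def varint(n):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def int_field(number, value):
    # Tag is (field_number << 3) | wire_type; wire type 0 is varint.
    return varint((number << 3) | 0) + varint(value)


def str_field(number, text):
    # Wire type 2 is length-delimited: tag, byte length, then the bytes.
    data = text.encode("utf-8")
    return varint((number << 3) | 2) + varint(len(data)) + data


def sizes(id_value, note):
    pb = int_field(1, id_value) + str_field(2, note)
    js = json.dumps({"id": id_value, "note": note}, separators=(",", ":"))
    return len(pb), len(js.encode("utf-8"))


small = sizes(150, "hi")         # tiny message: field names dominate the JSON
texty = sizes(150, "x" * 1000)   # text-heavy message: payload dominates both
```

For the tiny message the binary form is a fraction of the JSON size, but once the payload is mostly text the two end up within a couple of percent of each other, before any compression is applied.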
- You can encode the protocol buffers as JSON if you want a text based format.
1) Never change the type of a field
2) Never change the semantic meaning of a field
3) If you need a different type or semantics, add a new field
Pretty simple if you ask me.
GRPC for most people is a complete black box with error conditions that are unclear, to me at least. For example, what happens if I have an old schema and I'm not seeing a field? There are loads of things that can be wrong: old services, an old client, even messages not being routed correctly due to networking settings in docker or k8s.
Are you denying there is absolutely tons to learn here and it is trickier to debug and maintain?
The Go part I'm building has been much more solid in contrast.
This solves it: https://github.com/cpcloud/protoletariat
For example, here's the official tutorial for using the async callback interfaces in gRPC: https://grpc.io/docs/languages/cpp/callback/
It encourages you to write code with practices that are quite universally considered bad in modern C++ due to a very high chance of introducing memory bugs, such as allocating objects with new and expecting them to clean themselves up via delete this;. Idiomatic modern C++ would be using smart pointers, or go a completely different route with co-routines and no heap-allocated objects.