> .map() is special. It does not send JavaScript code to the server, but it does send something like "code", restricted to a domain-specific, non-Turing-complete language. The "code" is a list of instructions that the server should carry out for each member of the array.
> But the application code just specified a JavaScript method. How on Earth could we convert this into the narrow DSL? The answer is record-replay: On the client side, we execute the callback once, passing in a special placeholder value. The parameter behaves like an RPC promise. However, the callback is required to be synchronous, so it cannot actually await this promise. The only thing it can do is use promise pipelining to make pipelined calls. These calls are intercepted by the implementation and recorded as instructions, which can then be sent to the server, where they can be replayed as needed.
db.People.Where(p => p.Name == "Joe")
`Where` takes an `Expression<Func<T, bool>> predicate`. It isn't taking the `Func` itself, but an `Expression` of it so that it can look at the code rather than execute it. It can see that it's trying to match the `Name` field to the value "Joe" and translate that into a SQL WHERE clause. Since JS doesn't have this, they have to pass in a special placeholder value and try to record what the code is doing to that value.
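For illustration, here is a minimal sketch of that placeholder/recording idea using a JavaScript Proxy. The types and structure are hypothetical, not Cap'n Web's actual implementation; a real recorder would also capture the shape of the callback's return value.

```
// Record what a callback does to a placeholder value, instead of running it for real.
type Instruction =
  | { op: "get"; path: string[] }
  | { op: "call"; path: string[]; args: unknown[] };

function record(callback: (placeholder: any) => unknown): Instruction[] {
  const instructions: Instruction[] = [];

  function makePlaceholder(path: string[]): any {
    // A function target lets the Proxy trap both property accesses and calls.
    return new Proxy(function () {}, {
      get(_target, prop) {
        if (typeof prop === "symbol") return undefined;
        const next = [...path, prop];
        instructions.push({ op: "get", path: next });
        return makePlaceholder(next);
      },
      apply(_target, _thisArg, args) {
        instructions.push({ op: "call", path, args });
        return makePlaceholder([...path, "()"]);
      },
    });
  }

  callback(makePlaceholder([])); // run the callback exactly once, synchronously
  return instructions;
}

// The recorded instructions can be serialized and replayed server-side per element.
const plan = record(friend => ({ name: friend.name, photo: friend.getPhoto() }));
```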
It feels like C# has an answer to every problem I’ve ever had with other languages - dynamic loading, ADTs with pattern matching, functional programming, whatever this expression tree is, reflection, etc etc. Yet somehow it’s still a niche language that isn't widely used (outside of particular ecosystems).
I've worked only at startups/small businesses since I graduated university and it's all been in C#.
fucking nice ecosystem
You were maybe already getting at it, but as a kitchen sink language the answer is "simplicity". All these diverse language features increase cognitive load when reading code, so it's a complexity/utility tradeoff
If you dare leave the safety of a compiler you'll find that Sublime Merge can still save you when rewriting a whole part of an app. That and manual testing (because automatic testing is also clutter).
If you think it's more professional to have a compiler I'd like to agree but then why did I run into a PHP job when looking for a Typescript one? Not an uncommon unfolding of events.
Granted, I started out on LISP. My version of "easy to read and write" might be slightly masochistic. But I love Perl, and Python and Javascript are definitely "you can jump in and get shit done if you have worked in most languages. It might not be idiomatic, but it'll work"...
It does take twice as many lines of PHP code to match an equivalent Ruby or Python program, though, or more if you add phpdoc and static types, so it is easier to read/write Ruby or Python, but only after learning the details of the language. Ruby's syntax is very expressive but very complex if you don't know it by heart.
Specifically, I'd like to be able to have "inches" as a generic type, where it could be an int, long, float, double. Then I'd also like to have "length" as a generic type where it could be inches as a double, millimeters as a long, etc., etc.
I know they added generic math to the language in .NET 7 (C# 11), so maybe there is a way to do it?
Pi types, existential types and built-in macros to name a few.
I wonder why they don't just do `.toString()` on the mapping function and then parse the resulting Javascript into an AST and figure out property accesses from that. At the very least, that'd allow the code to properly throw an error in the event the callback contains any forbidden or unsupported constructs.
Unfortunately, "every object is truthy" and "every object can be coerced to a string even if it doesn't have a meaningful stringifier" are just how JavaScript works and there's not much we can do about it. If not for these deficiencies in JS itself, then your code would be flagged by the TypeScript compiler as having multiple type errors.
On a slightly less trivial skim, it looks like the intention here isn't to map property-level subsets of the returned data (e.g., only getting the `FirstName` and `LastName` properties of a larger object) so much as to do joins. And it's not data entities being provided to the mapping function but RpcPromises, so individual property values aren't even available anyway.
So I guess I might argue that map() isn't a good name for the function, because it immediately made me think it's for doing a mapping transformation (which is what map() does everywhere else in Javascript) rather than for basically just specifying a join, since you can't really transform the data. But for all I know that's clearer when you're actually using the library, so take what I think with a heaping grain of salt. ;)
That sounds incredibly complicated, and not something we could do in a <10kB library!
The suggestion was to parse _JavaScript_. (That's what `.toString()` on a function does... gives you back the JavaScript.)
select(c for c in Customer if sum(c.orders.total_price) > 1000)
I love the hackiness of it. Along with https://pypi.org/project/pony-stubs/, you get decent static typing as well. It's really quite something.
It generally unrolls as a `for loop` underneath, or in this case LINQ/SQL.
C# was innovative for doing it first in the scope of SQL. I remember the arrival of LINQ... Good times.
Func<..> is a lambda that can only be invoked.
Expression<Func<..>> is an AST of a lambda that can be transformed by your code/library.
One thing we were surprisingly able to do is trace the js spread operation as that is a rare case of something you can intercept in JS.
Kenton, if you are reading this, could you add a series of fake operators (eq, gt, in etc) to provide the capability to trace and perform them remotely?
But also, apps can already do this themselves. Since the record/replay mechanism already intercepts any RPC calls, the server can simply provide a library of operations as part of its RPC API. And now the mapper callback can take advantage of those.
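For example, a rough sketch of what such a server-provided ops library could look like (hypothetical method names; the "capnweb" import path is an assumption):

```
import { RpcTarget } from "capnweb";

// The server chooses which operations to expose; nothing is added to the protocol.
class MyApiServer extends RpcTarget {
  eq(a: unknown, b: unknown): boolean { return a === b; }
  gt(a: number, b: number): boolean { return a > b; }
}

// Client side, inside a map() callback, the comparison stays pipelined:
//   usersPromise.map(user => ({ user, isAdult: api.gt(user.age, 17) }));
```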
I think this is the approach I prefer: leave it up to servers to provide these ops if they want to. Don't extend the protocol with a built-in library of ops.
const isHighValueCustomer = (row: { user: User; order: Order }) =>
row.user.active && row.order.amount > 1000
But if I'm understanding the docs correctly on this point, doesn't this have to be: const isHighValueCustomer = (row: { user: User; order: Order }) =>
and(row.user.active, gt(row.order.amount, 1000))
let friendsWithPhotos = friendsPromise.map(friend => {
  return {friend, photo: friend.has_photo ? api.getUserPhoto(friend.id) : default_photo};
});
Looks totally reasonable, but it's not going to work properly. You might not even realise until it's deployed.

You can't do this in most languages because of if statements, which cannot be analyzed in that way and break the abstraction. You'd either need macro-based function definitions (Lisp, Elixir), bytecode inspection (like in e.g. Pytorch compile), or maybe built-in laziness (Haskell).
Edit: Or full object orientation like in Smalltalk, where if statements are just calls to .ifTrue and .ifFalse on a true/false object, and hence can be simulated.
The only catch is your function needs to have no side effects (other than calling RPC methods). There are a lot of systems out there that have similar restrictions.
I'm trying to understand how well this no-side-effects footgun is defended against.
https://github.com/cloudflare/capnweb/blob/main/src/map.ts#L... seems to indicate that if the special pre-results "record mode" call of the callback raises an error, the library silently bails out (but keeps anything already recorded, if this was a nested loop).
That catches a huge number of things like conditionals on `item.foo` in the map, but (a) it's quite conservative and will fail quite often with things like those conditionals, and (b) if I had `count += 1` in my callback, where count was defined outside the scope, now that's been incremented one extra time, and it didn't raise an error.
React Hooks had a similar problem, with a constraint that hooks couldn't be called conditionally. But they solved their DX by having a convention where every hook would start with `use`, so they could then build linters that would enforce their constraint. And if I recall, their rules-of-hooks eslint plugin was available within days of their announcement.
The problem with `map` is that there are millions of codebases that already use a method called `map`. I'd really, really love to see Cap'n Web use a different method name - perhaps something like `smartMap` or `quickMap` or `rpcMap` - that is more linter-friendly. A method name that doesn't require the linter to have access to strong typing information, to understand that you're mapping over the special RpcPromise rather than a low-level array.
Honestly, it's a really cool engineering solve, with the constraint of not having access to the AST like one has in Python. I do think that with wider adoption, people will find footguns, and I'd like this software to get a reputation for being resilient to those!
client.getAll({userIds}).map((user) => user.updatedAt == new Date().toLocaleString() ? client.photosFor(user.id) : {})
or without the conditional, client.getAll({userIds}).map((user) => client.photos({userId: user.id, since: new Date(user.updatedAt).toLocaleString()}))
Like it has to call toLocaleString on the server, no?

You can't perform computation on a promise. The only thing you can do is pipeline on it.
`user.updatedAt == date` is trying to compare a promise against a date. It won't type check.
`new Date(user.updatedAt)` is passing a promise to the Date constructor. It won't type check.
For any other function accepting a callback, the function on the server will receive an RPC stub, which, when called, makes an RPC back to the caller, calling the original version of the function.
This is usually what you want, and the semantics are entirely normal.
But for .map(), this would defeat the purpose, as it'd require an additional network round-trip to call the callback.
map() works for cases where you don't need to compute anything in the callback, you just want to pipeline the elements into another RPC, which is actually a common case with map().
If you want to filter server-side, you could still accomplish it by having the server explicitly expose a method that takes an array as input, and performs the desired filter. The server would have to know in advance exactly what filter predicates are needed.
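A rough sketch of that server-side filtering idea (hypothetical names; the "capnweb" import path is an assumption):

```
import { RpcTarget } from "capnweb";

interface User { id: string; active: boolean; }

class MyApiServer extends RpcTarget {
  // The predicate is fixed in advance by the server; the client just pipelines into it.
  filterActive(users: User[]): User[] {
    return users.filter(u => u.active);
  }
}
```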
But in the concrete:
* Looking up some additional data for each array element is a particularly common thing to want to do.
* We can support it nicely without having to create a library of operations baked into the protocol.
I really don't want to extend the protocol with a library of operations that you're allowed to perform. It seems like that library would just keep growing and add a lot of bloat and possibly security concerns.
(But note that apps can actually do so themselves. See: https://news.ycombinator.com/item?id=45339577 )
I did a spiritually similar thing in JS and Dart before where we read the text of the function and re-parsed (or used mirrors in Dart) to ensure that it doesn't access any external values.
SturdyRefs are tricky. My feeling is that they don’t really belong in the RPC protocol itself, because the mechanism by which you restore a SturdyRef is very dependent on the platform in which you're running. Cloudflare Workers, for example, may soon support storing capabilities into Durable Object storage. But the way this will work is very tied to the Cloudflare Workers platform. Sandstorm, similarly, had a persistent capability mechanism, but it only made sense inside Sandstorm – which is why I removed the whole notion of persistent capabilities from Cap’n Proto itself.
The closest thing to a web standard for SturdyRefs is OAuth. I could imagine defining a mechanism for SturdyRefs based on OAuth refresh tokens, which would be pretty cool, but it probably wouldn’t actually be what you want inside a specific platform like Sandstorm or Workers.
There's an interesting parallel with ML compilation libraries (TensorFlow 1, JAX jit, PyTorch compile) where a tracing approach is taken to build up a graph of operations that are then essentially compiled (or otherwise lowered and executed by a specialized VM). We're often nowadays working in dynamic languages, so they become essentially the frontend to new DSLs, and instead of defining new syntax, we embed the AST construction into the scripting language.
For ML, we're delaying the execution of GPU/linalg kernels so that we can fuse them. For RPC, we're delaying the execution of network requests so that we can fuse them.
Of course, compiled languages themselves delay the execution of ops (add/mul/load/store/etc) so that we can fuse them, i.e. skip over the round-trip of the interpreter/VM loop.
The power of code as data in various guises.
Another angle on this is the importance of separating control plane (i.e. instructions) from data plane in distributed systems, which is any system where you can observe a "delay". When you zoom into a single CPU, it acknowledges its nature as a distributed system with memory far away by separating out the instruction pipeline and instruction cache from the data. In Cap'n Web, we've got the instructions as the RPC graph being built up.
I just thought these were some interesting patterns. I'm not sure I yet see all the way down to the bottom though. Feels like we go in circles, or rather, the stack is replicated (compiler built on interpreter built on compiler built on interpreter ...). In some respect this is the typical Lispy code is data, data is code, but I dunno, feels like there's something here to cut through...
> We're often nowadays working in dynamic languages, so they become essentially the frontend to new DSLs, and instead of defining new syntax, we embed the AST construction into the scripting language.
And I'd say that TypeScript is the real game-changer here. You get the flexibility of the JavaScript runtime (e.g., how Cap'n Web cleverly uses `Proxy`s) while still being able to provide static types for the embedded DSL you're creating. It’s the best of both worlds.
I've been spending all of my time in the ORM-analog here. Most ORMs are severely lacking on composability because they're fundamentally imperative and eager. A call like `db.orders.findAll()` executes immediately and you're stuck without a way to add operations before it hits the database.
A truly composable ORM should act like the compilers you mentioned: use TypeScript to define a fully typed DSL over the entirety of SQL, build an AST from the query, and then only at the end compile the graph into the final SQL query. That's the core idea I'm working on with my project, Typegres.
If you find the pattern interesting: https://typegres.com/play/
But at the same time, something feels off about it (just conceptually, not trying to knock your money-making endeavor, godspeed). Some of the issues that all of these hit is:
- No printf debugging. Sometimes you want things to be eager so you can immediately see what's happening. If you print and what you see is <RPCResultTracingObject> that's not very helpful. But that's what you'll get when you're in a "tracing" context, i.e. you're treating the code as data at that point, so you just see the code as data. One way of getting around this is to make the tracing completely lazy, so no tracing context at all, but instead you just chain as you go, and something like `print(thing)` or `thing.execute()` actually then ships everything off. This seems like how much of Cap'n Web works except for the part where they embed the DSL, and then you're in a fundamentally different context.
- No "natural" control flow in the DSL/tracing context. You have to use special if/while/for/etc so that the object/context "sees" them. Though that's only the case if the control flow is data-dependent; if it's based on config values that's fine, as long as the context builder is aware.
- No side effects in the DSL/tracing context because that's not a real "running" context, it's only run once to build the AST and then never run again.
Of the various flavors of this I've seen, it's the ML usage I think that's pushed it the furthest out of necessity (for example, jax.jit https://docs.jax.dev/en/latest/_autosummary/jax.jit.html, note the "static*" arguments).
Is this all just necessary complexity? Or is it because we're missing something, not quite seeing it right?
I've spent a lot of time thinking about this in the database context:
> No printf debugging
Yeah, spot on. The solution here would be something like a `toSQL` that lets you inspect the compiled output at any step in the AST construction.
Also, if the backend supports it, you could compile a `printf` function all the way to the backend (this isn't supported in SQL though)
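As a toy illustration of the `toSQL` idea (not Typegres's actual API), a lazy builder can expose its compiled output at any step:

```
class Query {
  constructor(private table: string, private predicates: string[] = []) {}

  where(predicate: string): Query {
    // Nothing executes here; we just extend the AST.
    return new Query(this.table, [...this.predicates, predicate]);
  }

  toSQL(): string {
    const where = this.predicates.length
      ? ` WHERE ${this.predicates.join(" AND ")}`
      : "";
    return `SELECT * FROM ${this.table}${where}`;
  }
}

const q = new Query("orders").where("amount > 1000");
console.log(q.toSQL()); // printf-debug the query at any point in its construction
```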
> No "natural" control flow in the DSL/tracing context
Agreed -- that can be a source of confusion and subtle bugs.
You could have a build rule that actually compiles `if`/`while`/`for` into your AST (instead of evaluating them in the frontend DSL). Or you could have custom lint rules to forbid them in the DSL.
At the same time -- part of what makes query builders so powerful is the ability to dynamically construct queries. Runtime conditionals are what make that possible.
> No side effects in the DSL/tracing context because that's not a real "running" context
Agreed -- similar to the above: this is something that needs to be forbidden (e.g., by a lint rule) or clearly understood before using it.
> Is this all just necessary complexity? Or is it because we're missing something, not quite seeing it right?
My take is that, at least in the SQL case: 100% the complexity is justified.
Big reasons why:

1. A *huge* impediment to productive engineering is context switching. A DSL in the same language as your app (i.e., an ORM) makes the bridge to your application code seamless. (This is similar to the argument for having your entire stack be a single language.)

2. The additional layer of indirection (building an AST) allows you to dynamically construct expressions in a way that isn't possible in SQL. This is effectively adding a (very useful) macro system on top of SQL.

3. In the case of TypeScript, because its type system is so flexible, you can have stronger typing in your DSL than the backend target.
tl;dr is these DSLs can enable better ergonomics in practice and the indirection can unlock powerful new primitives
Python does let you mess around with the AST; however, there is no static typing, and let's just say that the ML ecosystem will <witty example of extreme act> before they adopt static typing. So it's not possible to build these graphs without doing this kind of hacky nonsense.
For another example, torch.compile() works at the Python bytecode level. It basically monkey-patches the PyEval_EvalFrame function evaluator of CPython for all torch.compile-decorated functions. Inside that, it will check for any operators, e.g. BINARY_MULTIPLY, involving torch tensors, and it records that. Any if conditions in the path get translated to guards in the resulting graph. Later, when said guard fails, it recomputes the subgraph with the complementary condition (and any additional conditions) and stores this as an alternative JIT path, and muxes these in the future depending on the two guards in place now.
Jax works by making the function arguments proxies and recording the operations like you mentioned. However, you cannot use normal `if`; you use lax.cond(), lax.while_loop(), etc. As a result, it doesn't recompute the graph when different branches are encountered; it only computes the graph once.
In a language such as C#, Rust, or a statically typed Lisp, you wouldn't need to do any of this monkey business. There's probably already a way in the Rust toolchain to interject at the MIR stage and have your own backend convert these to some Tensor IR.
Maybe totally off but would dependent types be needed here? The runtime value of one “language” dictates the code of another. So you have some runtime compilation. Seems like dependent types may be the language of jit-compiled code.
Anyways, heady thoughts spurred by a most pragmatic of libraries. Cloudflare wants to sell more schlock to the javascripters and we continue our descent into madness. Einsteins building AI connected SaaS refrigerators. And yet there is beauty still within.
Reading this from TFA ...
> Alice and Bob each maintain some state about the connection. In particular, each maintains an "export table", describing all the pass-by-reference objects they have exposed to the other side, and an "import table", describing the references they have received.
> Alice's exports correspond to Bob's imports, and vice versa. Each entry in the export table has a signed integer ID, which is used to reference it. You can think of these IDs like file descriptors in a POSIX system. Unlike file descriptors, though, IDs can be negative, and an ID is never reused over the lifetime of a connection.
> At the start of the connection, Alice and Bob each populate their export tables with a single entry, numbered zero, representing their "main" interfaces.
> Typically, when one side is acting as the "server", they will export their main public RPC interface as ID zero, whereas the "client" will export an empty interface. However, this is up to the application: either side can export whatever they want.
... sounds very similar to how Binder IPC (and soon RPC) works on Android.

I'm surprised how little code is actually involved here, just looking at the linked GitHub repo. Is that really all there is to it? In theory, it shouldn't be too hard to port the server side to another language, right? I'm interested in using it in an Elixir server for a JS/TS frontend.
For that matter, the language porting seems like a pretty good LLM task. Did you use much LLM-generated code for this repo? I seem to recall kentonv doing an entirely AI-generated (though human-reviewed, of course) proof of concept a few months ago.
I don't think LLMs would be capable of writing this library (at least at present). The pieces fit together like a very intricate puzzle. I spent a lot more time thinking about how to do it right, than actually coding.
Very different from my workers-oauth-provider library, where it was just implementing a well-known spec with a novel (yet straightforward) API.
The code might port nicely to another dynamic language, like Python, but I think you'd have a hard time porting it to a statically-typed language. There's a whole lot of iterating over arbitrary objects without knowing their types.
That's just parametric polymorphism.
Those three words are doing a lot of work there.
1. What's the best way to do app deploys that update the RPC semantics? In other words how do you ensure that the client and server are speaking the same version of the RPC? This is a challenge that protos/grpc/avro explicitly sought to solve.
2. Relatedly, what's the best way to handle flaky connections? It seems that the export/import table is attached directly to a stateful WS connection such that if the connection breaks you'd lose the state. In principle there should be nothing preventing a client/server caching this state and reinstantiating it on reconnect. That said, given these tables can contain closures, they're not exactly serializable so you could run into memory issues. Curious if the team has thought about this.
Absolutely mind blowing work!
2. After losing the connection, you'll have to reconnect and reconstruct the objects from scratch. The way I've structured this in an actual React app is, I pass the main RPC stub as an argument to the top-level component. It calls methods to get sub-objects and passes them down to various child components. When the connection is lost, I recreate it, and then pass the new stub into the top-level component, causing it to "rerender" just like any other state change. All the children will fetch the sub-objects they need again.
If you have an object that represents some sort of subscription with a callback, you'll need to design the API so that when initiating the subscription, the caller can specify the last message they saw on the subscription, so that it can pick up where they left off without missing anything.
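A hypothetical interface sketch of that resume-from-last-seen pattern:

```
interface Message { id: string; body: string; }

interface FeedApi {
  // `listener` is passed by reference, so the server calls back into the client.
  // `lastSeenId` lets a reconnecting client pick up exactly where it left off.
  subscribe(listener: (msg: Message) => void, lastSeenId?: string): Promise<void>;
}
```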
Hmm, I suppose we'll need to do a blog post of design patterns at some point...
However, I stumbled over this:
> The fact is, RPC fits the programming model we're used to. Every programmer is trained to think in terms of APIs composed of function calls, not in terms of byte stream protocols nor even REST. Using RPC frees you from the need to constantly translate between mental models, allowing you to move faster.
The fact that this is, in fact, true is what I refer to as "The gentle tyranny of Call/Return"
We're used to it, doing something more appropriate to the problem space is too unfamiliar and so more or less arbitrary additional complexity is...Just Fine™.
https://www.hpi.uni-potsdam.de/hirschfeld/publications/media...
Maybe it shouldn't actually be true. Maybe we should start to expand our vocabulary and toolchest beyond just "composed function calls"? So composed function calls are one tool in our toolchest, to be used when they are the best tool, not used because we have no reasonable alternative.
https://blog.metaobject.com/2019/02/why-architecture-oriente...
What i'm getting at is: For the places where other tools are better (like the UI example), we already have other tools (signals, observables, effects, runes,...). And for the places like client/server-communication: This is kind of where "call/return" usually shines.
The WWW would like a quick word with you. CORBA as well, if it could get a word in.
> we already have other tools (signals, observables, effects, runes,...)
We can build them. We can't express them. We can also build everything out of Turing Machines, or Lambda Calculus or NAND gates.
2. You are making an invalid assumption, which is that we only get to have one tool in our toolbox, and therefore one tool has to "win". Even if function calls were the best tool, they would still not always be the right one.
> With the benefit of hindsight, it’s clear that these properties of structured programs, although helpful, do not go to the heart of the matter. The most important difference between structured and unstructured programs is that structured programs are designed in a modular way. Modular design brings with it great productivity improvements. First of all, small modules can be coded quickly and easily. Second, general-purpose modules can be reused, leading to faster development of subsequent programs. Third, the modules of a program can be tested independently, helping to reduce the time spent debugging.
> However, there is a very important point that is often missed. When writing a modular program to solve a problem, one first divides the problem into subproblems, then solves the subproblems, and finally combines the solutions.
> The ways in which one can divide up the original problem depend directly on the ways in which one can glue solutions together. Therefore, to increase one’s ability to modularize a problem conceptually, one must provide new kinds of glue in the programming language.
-- John Hughes, Why Functional Programming Matters
https://www.cse.chalmers.se/~rjmh/Papers/whyfp.pdf
via
https://blog.metaobject.com/2019/02/why-architecture-oriente...
3. Procedure calls are not particularly composable
See CORBA vs. REST.
One thing about a traditional RPC system where every call is top-level and you pass keys and such on every call is that multiple calls in a sequence can usually land on different servers and work fine.
Is there a way to serialize and store the import/export tables to a database so you can do the same here, or do you really need something like server affinity or Durable Objects?
When using WebSockets, that's the lifetime of the WebSocket.
But when using the HTTP batch transport, a session is a single HTTP request that performs a batch of calls all at once.
So there's actually no need to hold state across multiple HTTP requests or connections, at least as far as Cap'n Web is concerned.
This does imply that you shouldn't design a protocol where it would be catastrophic if the session suddenly disconnected in the middle and you lost all your capabilities. It should be possible to reconnect and reconstruct them.
FWIW the way I've handled this in a React app is, the root stub gets passed in as a prop to the root component, and children call the appropriate methods to get whatever objects they need from it. When the connection is lost, a new one is created, and the new root stub passed into the root component, which causes everything downstream to re-run exactly as you'd want. Seems to work well.
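A minimal sketch of that pattern (hypothetical names; the "capnweb" import path and the app's `MyApi` module are assumptions):

```
import { useEffect, useState } from "react";
import { newWebSocketRpcSession, type RpcStub } from "capnweb";
import type { MyApi } from "./api";

// Returns the current root stub; bump `generation` when a disconnect is detected
// to create a fresh session and re-render everything downstream.
export function useRootApi(url: string, generation: number): RpcStub<MyApi> | null {
  const [api, setApi] = useState<RpcStub<MyApi> | null>(null);

  useEffect(() => {
    const stub: RpcStub<MyApi> = newWebSocketRpcSession(url);
    setApi(() => stub); // updater form, in case the stub itself is callable
  }, [url, generation]);

  return api; // pass this down as a prop; children re-fetch their sub-objects from it
}
```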
RPC SDKs should have session management, otherwise you end up in this situation:
"Any sufficiently complicated gRPC or Cap'n'Proto program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Akka"
It looks like the server affinity is accomplished by using websockets. The http batching simply sends all the requests at once and then waits for the response.
I don't love this because it makes load balancing hard. If a bunch of chatty clients get a socket to the same server, now that server is burdened and potentially overloadable.
Further, it makes scaling in/out servers really annoying. Persistent long lived connections are beasts to deal with because now you have to handle that "what do I do if multiple requests are in flight?".
One more thing I don't really love about this, it requires a timely client. This seems like it might be trivial to DDOS as a client can simply send a stream of push events and never pull. The server would then be burdened to keep those responses around so long as the client remains connected. That seems bad.
Architecturally I don't think it makes sense to support this in a load balancer, you instead want to pass back a "cost" or outright decisions to your load balancing layer.
Also note the "batch-pipelining" example is just a node.js client; this already supports not just browsers as clients, so you could always add another layer of abstraction (the "fundamental theorem of software engineering").
First, I absolutely love Cap'n Proto and the ideas of chaining calls on objects. It's amazing to see what's possible with Cap'n Web.
However, one of the examples compares it to GraphQL, which I think falls a bit short of how enterprises use the Query language in real life.
First, like others mentioned, you'll have N+1 problems for nested lists. That is, if we call comments() on each post and author() on each comment, we absolutely don't want to have one individual call per nested object. In GraphQL, with the data loader pattern, this is just 3 calls.
Second, there's also an element of security. Advanced GraphQL gateways like WunderGraph's are capable of implementing fine-grained rate limiting that prevents a client from asking for too much data. With this RPC object-calling style, we don't have a notion of "Query Plans", so we cannot statically analyze a combination of API calls and estimate the cost before executing them.
Lastly, GraphQL these days is mostly used with Federation. That means a single client talks to a Gateway (e.g. WunderGraph's Cosmo Router) and the Router distributes the calls efficiently across many sub-services (Subgraphs) with a query planner that finds the optimal way to load information from multiple services. While Cap'n Web looks amazing, the reality is that a client would have to talk to many services.
Which brings me to my last point. Instead of going the Cap'n Web vs. GraphQL route, I'd think more about how the two can work together. What if a client could use Cap'n Web to talk to a Federation Router that allows it to interact with entities, the object definitions in a GraphQL Federation system?
I think this is really worth exploring. Not going against other API styles but trying to combine the strengths.
Why is that a problem? As far as I can tell, those calls are all done on the server, where they're cheap normal function calls, and the results are all sent back with 1 roundtrip; because of the pipelining.
In this paradigm, the places where you are calling map() could probably be replaced with explicit getComments() or getCommentsWithAuthors() or two methods that do just one query each.
https://www.sqlite.org/np1queryprob.html
But you are right that this won't work great with traditional databases without significantly more magic on the server side, and in that sense the comparison with GraphQL is... aggressive :)
It is still much better than making all the calls client-side, of course. And there are many use cases where you're not querying a database.
And maybe there can be some fusion between GraphQL server infrastructure and this RPC-oriented syntax that gives people the best of both worlds?
Although it seems to solve one of the problems that GraphQL solved and trpc doesn't (the ability to request nested information from items in a list or properties of an object without changes to server-side code), there is no included solution for the server-side problem this creates, which the data loader pattern was intended to solve: a naive GraphQL server implementation makes a database query per item in a list.
Until the server side tooling for this matures and has equivalents for the dataloader pattern, persisted/allowlist queries, etc., I'll probably only use this for server <-> server (worker <-> worker) or client <-> iframe communication and keep my client <-> server communication alongside more pre-defined boundaries.
However, if your database is sqlite in a Cloudflare Durable Object, and the RPC protocol is talking directly to it, then N+1 selects are actually just fine.
I've been working on this issue from the other side. Specifically, a TS ORM that has the level of composability to make promise pipelining a killer feature out of the box. And analogous to Cap'n Web's use of classes, it even models tables as classes with methods that return composable SQL expressions.
If curious: https://typegres.com/play/
If I run, in client-side Cap'n Web land (from the post):

```
let friendsWithPhotos = friendsPromise.map(friend => {
  return {friend, photo: api.getUserPhoto(friend.id)};
});
```

and I implement my server class naively, the server-side implementation will still call `getUserPhoto` on a materialized friend returned from the database (with a query actually being run) instead of on an intermediate query builder.
@kentonv, I'm tempted to say that in order for a query builder like typegres to do a good job optimizing these RPC calls, the RpcTarget might need to expose the pass by reference control flow so the query builder can decide to never actually run "select id from friends" without the join to the user_photos table, or whatever.
Agreed! If we use `map` directly, Cap'n Web is still constrained by the ORM.
The solution would be what you're getting at -- something that directly composes the query builder primitives. In Typegres, that would look like this:
```
let friendsWithPhotos = friendsPromise.select((f) => ({...f, photo: f.photo()}));
// `photo()` is a scalar subquery -- it could also be a join
```
i.e., use promise pipelining to build up the query on the server.
The idea is that Cap'n Web would allow you to pipeline the Typegres query builder operations. Note this should be possible in other fluent-based query builders (e.g., Kysely/Drizzle). But where Typegres really synergizes with Cap'n Web is that everything is already expressed as methods on classes, so the architecture is capability-ready.
P.S. Thanks for your generous offer to help! My contact info is in my HN profile. Would love to connect.
Have you considered making a sqlite version that works in Durable Objects? :)
Right now I'm focused on Postgres (biggest market-share for full-stack apps). A sqlite version is definitely possible conceptually.
You're right about the bigger picture, though: Cap'n Web + Typegres (or a "Typesqlite" :) could enable the dream dev stack: a SQL layer in the client that is both sandboxed (via capabilities) and fully-featured (via SQL composability).
> as of this writing, the feature set is not exactly the same between the two. We aim to fix this over time, by adding missing features to both sides until they match.
do you think once the two reach parity, that that parity will remain, or more likely that Cap'n Web will trail cloudflare workers, and if so, by what length of time?
[1] https://github.com/cloudflare/capnweb/tree/main?tab=readme-o...
If anything I'd expect Cap'n Web to run ahead of Workers RPC (as it is already doing, with the new pipeline features) because Cap'n Web's implementation is actually much simpler than Workers'. Cap'n Web will probably be the place where we experiment with new features.
You mention that it’s schemaless as if that’s a good thing. Having a well defined schema is one of the things I like about tRPC and zod. Is there some way that you get the benefits of a schema with less work?
Well, except you don't get runtime type checking with TypeScript, which might be something you really want over RPC. For now I actually suggest using zod for type checks, but my dream is to auto-generate type checks based on the TypeScript types...
(I do wish it could be the other way, though: Write only TypeScript, get runtime checks automatically.)
Although perhaps that's not what you mean.
I found these through this https://github.com/moltar/typescript-runtime-type-benchmarks
// Shared interface declaration:
interface MyApi {
hello(name: string): Promise<string>;
}
// On the client:
let api: RpcStub<MyApi> = newWebSocketRpcSession("wss://example.com/api");
// On the server:
class MyApiServer extends RpcTarget implements MyApi {
  async hello(name: string) {
    return `Hello, ${name}!`;
}
}
But my expectation is you'd use Zod to define all your parameter types. Then you'd define your RpcTarget in plain TypeScript, but for the parameters on each method, reference the Zod-derived types.
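A sketch of that arrangement (hypothetical schema and method names; the "capnweb" import path is an assumption):

```
import { z } from "zod";
import { RpcTarget } from "capnweb";

const HelloParams = z.object({ name: z.string() });
type HelloParams = z.infer<typeof HelloParams>;

class MyApiServer extends RpcTarget {
  hello(params: HelloParams): string {
    const { name } = HelloParams.parse(params); // runtime validation at the RPC boundary
    return `Hello, ${name}!`;
  }
}
```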
But it may be tough to justify when we already have working Cap'n Proto implementations speaking the existing protocol, that took a lot of work to build. Yes, the new implementations will be less work than the original, but it's still a lot of work that is essentially running-in-place.
OTOH, it might make it easier for Cap'n Proto RPC to be implemented in more languages, which might be worth it... idk.
That makes sense. There is some opportunity, though, since Cap'n Proto has always lacked a JavaScript RPC implementation. For example, I had always been planning on using the Cap'n Proto OCaml implementation (which had full RPC) and using one of the two mature OCaml->JavaScript frameworks to get a JavaScript implementation. Long story short: not now, but I'd be interested in seeing if Cap'n Web can be ported to OCaml. I suspect other language communities may be interested. Promise chaining is a killer feature and was (previously) difficult to implement. Aside: promise chaining is quite undersold in your blog post; it is co-equal to capabilities in my estimation.
https://github.com/capnproto/capnproto/blob/v2/c%2B%2B/src/c...
That's just the RPC state machine -- the serialization is specified elsewhere, and the state machine is actually schema-agnostic. (Schemas are applied at the edges, when messages are actually received from the app or delivered to it.)
This is the Cap'n Web protocol, including serialization details:
https://github.com/cloudflare/capnweb/blob/main/protocol.md
Now, to be fair, Cap'n Proto has a lot of features that Cap'n Web doesn't have yet. But Cap'n Web's high-level design is actually a lot simpler.
Among other things, I merged the concepts of call-return and promise-resolve. (Which, admittedly, CapTP was doing it that way before I even designed Cap'n Proto. It was a complete mistake on my part to turn them into two separate concepts in Cap'n Proto, but it seemed to make sense at the time.)
What I'd like to do is go back and revise the Cap'n Proto protocol to use a similar design under the hood. This would make no visible difference to applications (they'd still use schemas), but the state machine would be much simpler, and easier to port to more languages.
I love the no-copy serialization and object capabilities, but wow, the RPC protocol is incredibly complex, it took me a while to wrap my head around it, and I often had to refer to the C++ implementation to really get it.
I think Cap'n Web could work pretty well in Python and other dynamically-typed languages. Statically-typed would be a bit trickier (again, in the same sense that they are harder to use JSON in), but the answer there might just be to bridge to Cap'n Proto...
I'm confused. How is this a "protocol" if its core premises rely on very specific implementation of concurrency in a very specific language?
There's been a renaissance in the tools, but now we mainly use them like "REST" endpoints with the type signatures of functions. Programming language features like Future and Optional make it easier to clearly delineate properties like "this might take a while" or "this might fail" whereas earlier in RPC, these properties were kind of hidden.
RPC is "remote procedure call", emphasis on "remote", meaning you always necessarily gonna be serializing/deserializing the information over some kind of wire, between discrete/different nodes, with discrete/distinct address spaces
a client request by definition can't include anything that can't be serialized, serialization is the ground truth requirement for any kind of RPC...
a server doesn't provide "an object" in response to a query, it provides "a response payload", which is at most a snapshot of some state it had at the time of the request, it's not as if there is any expectation that this serialized state is gonna be consistent between nodes
Anyway, the point here is that early RPC systems worked by blocking the calling thread while performing the network request, which was obviously a terrible idea.
Some friends and I still jokingly troll each other in the vein of these, interjecting with "When async programming was discovered in 2008...", or "When memory safe compiled languages were invented in 2012..." and so forth.
Often when something is discovered or invented is far less influential[1] than when it jumps on and hype train.
[1] the discovery is very important for historical and epistemological reasons of course, rewriting the past is bad
Meanwhile Go doesn't have async/await and never will because it doesn't need it; it does greenthreading instead. Java has that too now.
Either way, your code waits on IO like before and does other work while it waits. But instead of the kernel doing the context switching, your runtime does something analogous at a higher layer.
The problem is synchronization becomes extremely hard to reason about. With event loop concurrency, each continuation (callback) becomes effectively a transaction, in which you don't need to worry about anything else modifying your state out from under you. That legitimately makes a lot of things easier.
The Cloudflare Workers runtime actually does both: There's a separate thread for each connection, but within each thread there's an event loop to handle all the concurrent stuff relating to that one connection. This works well because connections rarely need to interact with each other's state, but they need to mess with their own state constantly.
(Actually we have now gone further and stacked a custom green-threading implementation on top of this, but that's really a separate story and only a small incremental optimization.)
But one thing I can't figure out: What would be the syntax for promise pipelining, if you aren't using promises to start with?
Oh, great point! That does seem really hard, maybe even intractable. That's definitely a reason to like cooperative concurrency, huh...
Just to tangent even further, but some ideas:
- Do it the ugly way: add an artificial layer of promises in an otherwise pre-emptive, direct-style language. That's just, unfortunately, quite ugly...
- Use a lazy language. Then everything's a promise! Some Haskell optimizations feel kind of like promise pipelining. But I don't really like laziness...
- Use iterator APIs; that's a slightly less artificial way to add layers of promises on top of things, but still weird...
- Punt to the language: build an RPC protocol into the language, and promise pipelining as a guaranteed optimization. Pretty inflexible, and E already tried this...
- Something with choreographic programming and modal-types-for-mobile-code? Such languages explicitly track the "location" of values, and that might be the most natural way to represent ocap promises: a promise is a remote value at some specific location. Unfortunately these languages are all still research projects...
If some other transaction commits at just the wrong time, it could change the result of some of these queries but not all. The results would not be consistent with each other.
Btw if you really want consistent multi reads, some DBMSes support setting a read timestamp, but the common ones don't.
Well...if you implemented a relational DBMS server without using threads. To my knowledge, no such DBMS exists, so the distinction seems rather academic.
> Btw if you really want consistent multi reads, some DBMSes support setting a read timestamp, but the common ones don't.
Could you elaborate? I can't say I heard of that mechanism. Perhaps you are referring to something like Oracle flashback queries or SQL Server temporal tables?
Normally, I'd use MVCC-based "snapshot" transaction isolation for consistency between multiple queries, though they would need to be executed serially.
My mental model is that it's the caller who decides how a call should be executed (synchronously or asynchronously). A synchronous call is when the caller waits for completion/error; an asynchronous one is when the caller puts the call in the background (whatever that means in that language/context) and handles the return results later. The CSP concurrency model [1] is the closest fit here.
It's not a property of the function to decide how the caller should deal with it. This frustration was partly described in the viral article "What color is your function?" [2], but my main rant about this concurrency approach is that it doesn't match well how we think and reason about concurrent processes, and requires cognitive gymnastics to reason about relatively simple code.
Seeing "async/await/Promises/Futures" being a justification of a "protocol" makes little sense to me. I can totally get that they reimagined how to do RPC with first-class async/await primitives, but that doesn't make it a network "protocol".
[1] https://en.wikipedia.org/wiki/Communicating_sequential_proce...
[2] https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...
Is there a structured concurrency library being used to manage the chained promise calls and lazy evaluation (i.e., when the final promise result is actually awaited) of the chained functions?
If an await call is never added, would function calls continue to build up, taking up more and more memory? I imagine the system would return an error and clear out the stack of calls before it became overwhelmed; what would these errors look like, if they do indeed exist?
Cap'n Web has no dependencies at all. All the chaining is implemented internally. Arguably, this is the main thing the library does; without promise chaining you could cut out more than half the code.
> If an await call is never added, would function calls continue to build up taking up more and more memory
Yes. I recommend implementing rate limits and/or per-session limits on expensive operations. This isn't something the library can do automatically since it has no real idea how expensive each thing is. Note you can detect when the client has released things by putting disposers on your return values, so you can keep count of the resources the client is holding.
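A rough sketch of that counting-via-disposers idea (hypothetical classes; assumes a TypeScript target with `Symbol.dispose` available, and the "capnweb" import path):

```
import { RpcTarget } from "capnweb";

class Thing extends RpcTarget {
  constructor(private onDispose: () => void) { super(); }
  [Symbol.dispose]() { this.onDispose(); } // runs when the client releases the stub
}

class SessionApi extends RpcTarget {
  private live = 0;

  openThing(): Thing {
    if (this.live >= 100) throw new Error("too many open resources for this session");
    this.live++;
    return new Thing(() => { this.live--; });
  }
}
```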
That's why the MessagePort transport is included.
edit: was skimming the github repo https://github.com/cloudflare/capnweb/tree/main?tab=readme-o...
and saw this which answers my question:
> Supports passing functions by reference: If you pass a function over RPC, the recipient receives a "stub". When they call the stub, they actually make an RPC back to you, invoking the function where it was created. This is how bidirectional calling happens: the client passes a callback to the server, and then the server can call it later.
> Similarly, supports passing objects by reference: If a class extends the special marker type RpcTarget, then instances of that class are passed by reference, with method calls calling back to the location where the object was created.
Gonna skim some more to see if i can find some example code.
The part that's most exciting to me is actually the bidirectional calling. Having set this up before via JSON RPC / custom protocol the experience was super "messy" and I'm looking forward to a framework making it all better.
Can't wait to try it out!
OTOH, JSON RPC is extremely simple. Cap'n Web is a relatively complicated and subtle underlying protocol.
Actually the author of JSON RPC suggested that method names could be dynamic, there's nothing in the spec preventing that.
https://groups.google.com/g/json-rpc/c/vOFAhPs_Caw/m/QYdeSp0...
So you could definitely build a cursed object/reference system by packing stuff into method names if you wanted. I doubt any implementations would allow this.
But yes, JSON RPC is very minimal and doesn't really offer much.
Just read about Cap'n Web array .map() [1] -- it's hard to understand where the round-trip is. And that is not a feature, that's a bug -- in reality you want to easily tell what the code does, not hide it.
[1] https://blog.cloudflare.com/capnweb-javascript-rpc-library/#...
You can tell that promise pipelining isn't adding any round trips because you set it all up in a series of statements without any `await`s. At the end you do one `await`. That's your round trip.
Because if I understand correctly, you don't queue the requests and then perform a single request/response cycle (a "round trip"), you send a bunch of requests as they happen with no response expected, then when an await happens, you send a message saying "okay, that's all, please send me the result" and get a response.
In WebSocket mode, yes, you are sending messages with each call. But you're not waiting for anything before sending the next message. It's not a round trip until you await something. As far as round trips are concerned, there is really no difference between sending multiple messages vs. a single batch message, if you are ultimately only waiting for one reply at the end.
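A small sketch of what that looks like in code (hypothetical API methods):

```
declare const api: any;      // an RPC stub
declare const userId: string;

let user = api.getUser(userId);            // no await: a pipelined promise
let profile = user.getProfile();           // call a method on the unresolved result
let photo = api.getPhoto(profile.photoId); // use a property of another unresolved result
console.log(await photo);                  // the single await is the single round trip
```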
Everything appears to be functioning smoothly, but I do miss the ‘transfer’ feature in Comlink. Although it wasn’t a critical feature, it was a nice one.
The best aspect of CapnWeb is that we can reuse most of the code related to clients, servers, and web workers (including Cloudflare Workers).
> Similarly, supports passing objects by reference: If a class extends the special marker type RpcTarget, then instances of that class are passed by reference, with method calls calling back to the location where the object was created.
Can this be relaxed? Having to design the object model ahead of time for RpcTarget is constraining. If we could just attach a ThingClass.prototype[Symbol.for('RpcTarget')] = true then there would be a lot more flexibility, less need to design explicitly for RpcTarget, to use RpcTarget with the objects/classes of 3rd party libraries.
With that said, I do think we ought to support `new RpcStub(myObject)` to explicitly create a stub around an arbitrary class, even if it doesn't extend `RpcTarget`. It would be up to the person writing the `new RpcStub` invocation to verify it's safe.
I'm trying to see if there's something specifically for streaming/generators. I don't think so? Of course you can use callbacks, but you have to implement your own sentinel to mark the end, and other little corner cases. It seems like you can create a callback to an anonymous function, but then the garbage collector probably can't collect that function?
---
I don't see anything about exceptions (though Error objects can be passed through).
---
Looking at array mapping: https://blog.cloudflare.com/capnweb-javascript-rpc-library/#...
I get how it works: remotePromise.map(callback) will invoke the callback to see how it behaves, then make it behave similarly on the server. But it seems awfully fragile... I am assuming something like this would fail (in this case probably silently losing the conditional):
friendsPromise.map(friend => ({friend, lastStatus: friend.isBestFriend ? api.getStatus(friend.id) : null}))
---

The array escape is clever and compact: https://blog.cloudflare.com/capnweb-javascript-rpc-library/#...
---
I think the biggest question I have is: how would I apply this to my boring stateless-HTTP server? I can imagine something where there's a worker that's fairly simple and neutral that the browser connects to, and proxies to my server. But then my server can also get callbacks that it can use to connect back to the browser, and put those callbacks (capability?) into a database or something. Then it can connect to a worker (maybe?) and do server-initiated communication. But that's only good for a session. It has to be rebuilt when the browser network connection is interrupted, or if the browser page is reloaded.
I can imagine building that on top of Cap'n Web, but it feels very complicated and I can equally imagine lots of headaches.
interface Stream extends RpcTarget {
write(chunk): void;
end(): void;
[Symbol.dispose](): void;
}
Note that the dispose method will be called automatically when the caller disposes the stub or when they disconnect the RPC session. The `end()` method is still useful as a way to distinguish a clean end vs. an abort.

In any case, you implement this interface, and pass it over the RPC connection. The other side can now call it back to write chunks. Voila, streaming.
That said, getting flow control right is a little tricky here: if you await every `write()`, you won't fully utilize the connection, but if you don't await, you might buffer excessively. You end up wanting to count the number of bytes that aren't acknowledged yet and hold off on further writes if it goes over some threshold. Cap'n Proto actually has built-in features for this, but Cap'n Web does not (yet).
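A rough sketch of that flow-control idea (hypothetical stub type; a real implementation would use a finer-grained window):

```
// Assume the RPC stub's write() resolves when the callee has acknowledged the chunk.
interface StreamStub {
  write(chunk: Uint8Array): Promise<void>;
  end(): Promise<void>;
}

async function pump(stream: StreamStub, chunks: AsyncIterable<Uint8Array>) {
  const HIGH_WATER_MARK = 1 << 20; // allow ~1 MiB of unacknowledged data in flight
  let inFlight = 0;
  let lastWrite: Promise<void> = Promise.resolve();

  for await (const chunk of chunks) {
    inFlight += chunk.byteLength;
    lastWrite = stream.write(chunk).then(() => { inFlight -= chunk.byteLength; });
    if (inFlight > HIGH_WATER_MARK) {
      // Coarse back-pressure: wait until everything sent so far is acknowledged.
      await lastWrite;
    }
  }
  await lastWrite;
  await stream.end();
}
```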
Workers RPC actually supports sending `ReadableStream` and `WritableStream` (JavaScript types) over RPC. I'd like to support that in Cap'n Web, too, but haven't gotten around to it yet. It'd basically work exactly like above, but you get to use the standard types.
---------------------
Exceptions work exactly like you'd expect. If the callee throws an exception, it is serialized, passed back to the caller, and used to reject the promise. The error also propagates to all pipelined calls that derive from the call that threw.
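For example (hypothetical methods), a rejection flows through the whole pipeline:

```
declare const api: any; // an RPC stub

let user = api.getUser("missing-id");   // suppose this throws on the server
let photo = api.getPhoto(user.photoId); // pipelined on the failed call
try {
  await photo;                          // rejects with the serialized error
} catch (err) {
  console.error("getUser failed:", err);
}
```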
---------------------
The mapper function receives, as its parameter, an `RpcPromise`. So you cannot actually inspect the value, you can only pipeline on it. `friend.isBestFriend ?` won't work, because `friend.isBestFriend` will resolve as another RpcPromise (for the future property). I suppose that'll be considered truthy by JavaScript, so the branch will always evaluate true. But if you're using TypeScript, note that the type system is fully aware that `friend` is type `RpcPromise<Friend>`, so hopefully that helps steer you away from doing any computation on it.
Maybe the best solution is just an eslint plugin. Like this plugin basically warns for the same thing on another type: https://github.com/bensaufley/eslint-plugin-preact-signals
Overloading .map() does feel a bit too clever here, as it has this major difference from Array.map. I'd rather see it as .mapRemote() or something that immediately sticks out.
I can imagine a RpcPromise.filterRemote(func: (p: RPCPromise) => RPCPromise) that only allows filtering on the truthiness of properties; in that case the types really would save someone from confusion.
I guess if the output type of map was something like:
type MapOutput = RpcPromise | MapOutput[] | Record<string, MapOutput>;
map(func: (p: RpcPromise) => MapOutput)
... then you'd catch most cases, because there's no good reason to have any constant/literal value in the return value. Almost every case where there's a non-RpcPromise value is likely some case where a value was calculated in a way that won't work.

Though another case occurs to me that might not be caught by any of this:
result = aPromise.map(friend => ({ ...friend, nickname: getNickname(friend.id, userId) }))
The spread operator is a pretty natural thing to use in this case, and it probably doesn't work on an RpcPromise?

Edit: Downvoted, is this a bad question? The title is generically "web servers"; obviously the content of the post focuses primarily on TypeScript, but I'm trying to determine if there's something unique about this that means it cannot be implemented in other languages. The server-side DSL execution could be difficult to implement, but as it's not strictly JavaScript, I imagine it's not impossible?
* Use Cap'n Proto in your Rust backend. This is what you want in a type-safe language like Rust: generated code based on a well-defined schema.
* We'll build some sort of proxy that, given a Cap'n Proto schema, converts between Cap'n Web and Cap'n Proto. So your frontend can speak Cap'n Web.
But this proxy is just an idea for now. No idea if or when it'll exist.
It's usually best to ignore downvotes. Downvoted comments are noticeably grey. If people feel that's unfair, that'll attract upvotes in my experience.
Fwiw I think it was only once, and I was upvoted after mentioning it. You're right, I could have worded it as something more ambiguous, e.g. "it seems this is unpopular" or whatever, but my edit was in reply to someone's feedback (the downvote), so I usually mention it.
No complaint, just a form of wordless feedback that I was attempting to respond to. Despite such responses being against HN's wishes, heh.
That is, the client is not packaging up all its logic and sending a single blob that describes the fully-chained logic to the server on its initial request. Right?
When I first read it, I was thinking it meant 1 client message and 1 server response. But I think "one round trip" more or less means "1 server message in response to potentially many client messages". That's a fair use of "1 RTT", but it took me a moment to understand.
Just to make that distinction clear from a different angle, suppose the client were _really_ _really_ slow and it did not send the second promise message to the server until AFTER the server had computed the result for promise1. Would the server have already responded to the client with the result? That would be a way to incur multiple RTTs, albeit the application wouldn't care since it's bottlenecked by the client CPU, not the network in this case.
I realize this is unlikely. I'm just using it to elucidate the system-level guarantee for my understanding.
As always, thanks for sharing this, Kenton!
But the client can send all three messages back-to-back without waiting for any replies from the server. In terms of network communications, it's effectively the same as sending one message.
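To make the timing concrete, here's an illustrative sequence (assume `api` is an RPC stub obtained elsewhere; the method names are made up):

// All three calls are recorded and sent back-to-back; nothing is awaited until the
// end, so the server can receive the whole chain before it has replied to anything.
const user = api.getUser(id);           // message 1
const friends = api.getFriends(user);   // message 2, pipelined on message 1's result
const photos = api.getPhotos(friends);  // message 3, pipelined on message 2's result
const result = await photos;            // still one network round trip overall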
See "But how do we solve arrays" part:
> > .map() is special. It does not send JavaScript code to the server, but it does send something like "code", restricted to a domain-specific, non-Turing-complete language. The "code" is a list of instructions that the server should carry out for each member of the array
The client sends the 3 separate calls over in one message, or one message describing some computation (run this function with the result of that function), and the server responds with one payload.
Cap'n Proto is inspired by Protobuf, and Protobuf has gRPC and gRPC-Web.
We used Protobuf/gRPC/gRPC-Web both in the backends and for public endpoints powering React/TS UIs at my last startup. It worked great, particularly with the GCP Kubernetes infrastructure. Basically, both the API and operational aspects were non-problems. However, navigating the dumpster fire around Protobuf, gRPC, and gRPC-Web, with the lack of community leadership from Google, was a clusterfuck.
This said, I'm a bit at a loss with the meaning of "schemaless". You can take different approaches wrt schemas (see Avro vs Protobuf), but you can't fundamentally eschew schemas/types. It's information tied to a communication channel that needs to live somewhere, whether that's explicit, implicit, handled by the RPC layer, passed to the type system, or, worse, pushed all the way to the user/dev. Moreover, schemas tend to evolve, and any protocol needs to take that into account.
Historically, Protobuf has done a good job managing the various tradeoffs here, but I have no experience using Cap'n Proto. I've mostly seen good things about it, though, so perhaps I'm just missing something here.
But Cap'n Web itself does not need to know about any of that. Cap'n Web just accepts whatever method call you make, sends it to the other end of the connection, and attempts to deliver it. The protocol itself has no idea if your invocation is valid or not. That's what I mean by "schemaless" -- you don't need to tell Cap'n Web about any schemas.
With that said, I strongly recommend using TypeScript with Cap'n Web. As always, TypeScript schemas are used for build-time type checking, but are then erased before runtime. So Cap'n Web at runtime doesn't know anything about your TypeScript types.
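For instance, a minimal sketch of what that looks like in practice, assuming the `newWebSocketRpcSession` entry point described in the capnweb readme (the interface and URL are made up):

import { newWebSocketRpcSession } from "capnweb";

// The interface exists purely for the TypeScript compiler; it is erased at runtime,
// and Cap'n Web simply forwards whatever calls you make.
interface MyApi {
  hello(name: string): Promise<string>;
}

const api = newWebSocketRpcSession<MyApi>("wss://example.com/api");
const greeting = await api.hello("World");  // type-checked at build time only
console.log(greeting);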
So it's basically Stubby/gRPC.
From strictly an RPC perspective this makes sense (I guess to the same degree that gRPC would be agnostic to the Protobuf serialization scheme, which IIRC is the case; I also think Stubby was called that for the same reason).
However, that would mean there's:
1. a ton of responsibility on the user/dev, i.e. the same amount that prompted Protobuf to exist, after all.
You basically have the (independent) problem of clients, servers, and data (in flight, or even persisted) getting different versions of the schema.
2. a missed implicit compression opportunity? IDK to what extent this actually happens on the fly or not.
Stubby / gRPC do not support object capabilities, though. I know that's not what you meant but I have to call it out because this is a huuuuuuuge difference between Cap'n Proto/Web vs. Stubby/gRPC.
> a ton of responsibility on the user/dev, i.e. the same amount that prompted Protobuf to exist, after all.
In practice, people should use TypeScript to specify their Cap'n Web APIs. For people working in TypeScript to start with, this is much nicer than having to learn a separate schema format. And the protocol evolution / compatibility problem becomes the same as evolving a JavaScript library API with source compatibility, which is well-understood.
> a missed implicit compression opportunity? IDK to what extent this actually happens on the fly or not.
Don't get me wrong, I love binary protocols for their efficiency.
But there are a bunch of benefits to just using JSON under the hood, especially in a browser.
Note that WebSocket in most browsers will automatically negotiate compression, where the compression context is preserved over the whole connection (not just one message at a time), so if you are sending the same property names a lot, they will be compressed out.
I currently work in a place where the server-to-server API clients are generated based on TypeScript API method return types, and it's... not great. In reality, the types quickly devolve into chains of "extends" over lots of internal types that are often difficult to reason about.
I know that it's possible for the ProtoBuf types to also push their tendrils quite deep into business code, but my personal experience has been a lot less frustrating with that than the TypeScript return type being generated into an API client.
I’ve ended up building similar things over and over again. For example, simplifying the worker-page connection in a browser or between chrome extension “background” scripts and content scripts.
There’s a reason many prefer “npm install” on some simple sdk that just wraps an API.
This also reminds me a lot of MCP, especially the bi-directional nature and capability focus.
Tiny remark for @kentonv if you're reading: it looks like you've got the wrong code sample immediately following the text "Putting it together, a code sequence like this".
The code was supposed to be:
let namePromise = api.getMyName();
let result = await api.hello(namePromise);
console.log(result);
> One benefit of GraphQL was to solve the “waterfall” problem of traditional REST APIs by allowing clients to ask for multiple pieces of data in one query. For example, instead of making three sequential HTTP calls:
GET /user
GET /user/friends
GET /user/friends/photos
…you can write one GraphQL query to fetch it all at once.

Or you could have designed a schema to allow easy tree traversal. Or you could use a recursive CTE.
Huh that sounds a lot like graphql
Building an operation description from the callback inside the `map` is wild. Does that add much in the way of restrictions programmers need to be careful of? I could imagine branching inside that closure, for example, could make things awkward. Reminiscent of the React hook rules.
So it turns out it's actually not easy to mess up in a map callback. The main thing you have to avoid is side effects that modify stuff outside the callback. If you do that, the effect you'll see is those modifications only get applied once, rather than N times. And any stubs you exfiltrate from the callback simply won't work if called later.
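An illustrative sketch of that (assume `api` is a Cap'n Web stub; the method name is made up):

// The callback runs once on the client with a placeholder to record instructions,
// so the side effect below happens once, not once per array element.
let count = 0;
const names = api.listUsers().map(user => {
  count++;           // runs only during the single record pass on the client
  return user.name;  // recorded as a property access, replayed per element
});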
This is like .NET Remoting. Suggest resisting the temptation to use this kind of stuff. It gets very hard to reason about what is going on.
In general, I worry that frameworks like this could be horribly complex; breaking bugs (in the framework) might not show up until late in your development cycle. This could mean that you end up having to rewrite your product sooner than you would like, or that "the framework gets in the way" and cripples product development.
Some things that worry me:
1: The way that callbacks are passed through RPC. This requires a lot of complexity in the framework to implement.
2: Callbacks passed through RPC implies server-side state. I didn't read in detail how this is implemented; but server-side state always introduces a lot of complexity in code and hosting.
---
Personally, if that much server-side state is involved, I think it makes more sense to operate more like a dumb terminal and do more HTML rendering on the server. I'm a big fan of how server-side Blazor does this, but that does require drinking the C# Kool-Aid. On the other hand, server-side Blazor is very mature, has major backing, and is built into two IDEs.
(I.e., you can write quick-and-dirty pages where the UI directly queries the database. Useful for one-offs, prototypes, internal admin pages, "KISS" applications, etc.; i.e., any situation where it's okay for the browser UI to be tightly coupled to your data model.)
This is quite interesting. However, the abysmal pattern I have seen a number of times is:

list = getList(...)
for item in list:
    getItemDetails(item)
Sometimes this is quite hard to undo.
Is the server holding onto some state in memory that this specific client has already authenticated? Or is the API key somehow stored in the new AuthenticatedSession stub on the client side and included in subsequent requests? Or is it something else entirely?
This does mean the server is holding onto state, but remember the state only lasts for the lifetime of the particular connection. (In HTTP batch mode, it's only for the one batch. In WebSocket mode, it's for the lifetime of the WebSocket.)
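A hypothetical sketch of that shape (only `RpcTarget` is assumed from the capnweb package; the classes and helper are made up):

import { RpcTarget } from "capnweb";

declare function lookupUserByApiKey(key: string): { name: string };  // made-up helper

// authenticate() validates the key once and returns a stub to an object that lives in
// server memory only for the duration of this RPC session.
class PublicApi extends RpcTarget {
  authenticate(apiKey: string): AuthenticatedSession {
    const user = lookupUserByApiKey(apiKey);
    return new AuthenticatedSession(user);
  }
}

class AuthenticatedSession extends RpcTarget {
  constructor(private user: { name: string }) { super(); }
  whoami(): string { return this.user.name; }  // no API key needed on later calls
}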
Thanks for the explanation!
let traverse = (dir) => ({
  name: dir.name,
  files: api.ls(dir).map(traverse)  // api.ls() returns [] for files
});
let tree = api.ls("/").map(traverse);
I know that sounds like a cop-out, but this is really true of any protocol, and the RPC protocol itself has no real knowledge of the cost of each operation or how much memory is held, so can't really enforce limits automatically.
The name "Cap'n Proto" came from "capabilities and protobuf". The first, never-released version was based on Protobuf serialization. The first public release (way back on April 1, 2013) had its own, all-new serialization.
There's also a pun with it being a "cerealization protocol" (Cap'n Crunch is a well-known brand of cereal).
That won't prevent me from bragging, as we released Opalang 1.0 in 2011 :)
(but yeah, I was passingly familiar with E! when I designed the capability system of Opalang, so I definitely don't get full bragging rights)
Learn something new every day
Update: Unlike Cap'n Proto, Cap'n Web has no schemas. In fact, it has almost no boilerplate whatsoever. This means it works more like the JavaScript-native RPC system in Cloudflare Workers. https://github.com/cloudflare/capnweb
But my understanding is that neither of them support object-capabilities nor promise pipelining. These are the killer features of Cap'n Web (and Cap'n Proto), which the blog post describes at length.
That said, type checking is called out both in the blog post (in the section on TypeScript) and in the readme (under "Security Considerations"). You probably should use some runtime type checking library, just like you should with traditional JSON inputs.
In the future I'm hoping someone comes up with a way to auto-generate type checks based on TypeScript types.
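Until then, a minimal sketch of the manual approach with a validation library (Zod here, purely as an example; the API class and method are made up):

import { z } from "zod";
import { RpcTarget } from "capnweb";

// Validate at the trust boundary, just as you would with ordinary JSON input.
const Name = z.string().min(1);

class MyApi extends RpcTarget {
  hello(rawName: unknown): string {
    const name = Name.parse(rawName);  // throws if the caller sent something malformed
    return `Hello, ${name}!`;
  }
}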
I'd happily accept a PR adding a CI job to make sure it works and we don't break it.
Is the overhead for calling into WASM too high for a Rust implementation to be feasible?
A Haxe or Dafny implementation would let us generate libraries in multiple languages from the same source.
Also, the audience of this library is very specifically TypeScript developers. If your app is Rust, you'd probably be happier with Cap'n Proto.
Genuinely curious, is the disappointment because it's limited to the JS/TS ecosystem?
My take is that by going all-in on TypeScript, they get a huge advantage: they can skip a separate schema language and use pure TS interfaces as the source of truth for the API.
The moment they need to support multiple languages, they need to introduce a new complex layer (like Protobuf), which forces design into a "lowest common denominator" and loses the advanced TypeScript features that make the approach so powerful in the first place.
That could change with some investments. Haxe is a great toolkit to develop libraries in because it reduces the overhead for each implementation. It would be nice to see some commercial entity invest in Haxe or Dafny (which can also enable verification of the reference implementation).
> The moment they need to support multiple languages, they need to introduce a new complex layer (like Protobuf),
So this just won't be used outside of Node servers then?
Well... I imagine / hope it will be used a lot on Cloudflare Workers, which is not Node-based, it has its own custom runtime.
(I'm the author of Cap'n Web and also the lead developer for Cloudflare Workers.)
What this could've been is a better way to consume external APIs to avoid the SDK boilerplate generation dance. But the primary problems here are access control, potentially malicious clients, and multi-language support, none of which are solved by this system.
In short, if you're working over a network boundary, better keep that explicit. If you want to pretend the network boundary doesn't exist, then let a data sync engine handle the network parts and only write local code. But why would you write code that pretends to be local but is actually over a network boundary? I can't think of a single case where I would want to do that, I'd rather explicitly deal with the network issues so I can clearly see where the boundary is.
[1] https://bytemash.net/posts/i-went-down-the-linear-rabbit-hol...
The promise-passing lazily evaluated approach is also nice -- any debugging woes are solved by just awaiting and logging before the await -- and it solves composability at the server - client layer. The hackiness of `map()` is unfortunate, but that's just how JS is.
However, I don't see this being too useful without there also being composability at the server - database layer. This is notoriously difficult in most databases. I wonder what the authors / others here think about this.
For an example of what I mean:
const user = rpc.getUser(id)
const friends = await rpc.getFriends(user)
Sure beats:

GET /user/id
GET /graph?outbound=id

But, in the end, both cases are running two different SQL queries. Most of the time, when we fuse operations in APIs, we do it all the way down to the SQL layer (with a join):

GET /user/id?include=friends

...which does a join and gets the data in a single query. So while it's a nice programming model for sure, I think in practice we'll end up having a `rpc.getUserAndFriends()` anyways.
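For what it's worth, a hypothetical sketch of what that fused method might look like server-side (the `db.query` helper and table names are made up; only `RpcTarget` is from capnweb):

import { RpcTarget } from "capnweb";

declare const db: { query(sql: string, params: unknown[]): Promise<any[]> };  // made-up helper

class Api extends RpcTarget {
  async getUserAndFriends(id: string) {
    // One joined query instead of two pipelined queries.
    const rows = await db.query(
      `SELECT u.id AS user_id, u.name AS user_name,
              f.id AS friend_id, f.name AS friend_name
         FROM users u
    LEFT JOIN friendships fr ON fr.user_id = u.id
    LEFT JOIN users f ON f.id = fr.friend_id
        WHERE u.id = ?`,
      [id]);
    return {
      user: { id: rows[0].user_id, name: rows[0].user_name },
      friends: rows.filter(r => r.friend_id != null)
                   .map(r => ({ id: r.friend_id, name: r.friend_name })),
    };
  }
}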
I'm not that experienced, so I don't know in how many projects composability at just one layer would actually be enough to solve most composability issues. If it's a majority, then great, but if not, then I don't think this is doing much.
One situation where this actually works that comes to mind is SQLite apps, where multiple queries are more or less OK due to lack of network round trip. Or if your DB is colocated with your app in one of the new fancy datacenters where you get DB RAM to app RAM transfer through some crazy fast network interconnect fabric (RDMA) that's quicker than even local disk sometimes.
Even Cap'n Proto and Protobuf is too much for me.
My particular favorite is this. But then I'm biased coz I wrote it haha.
https://github.com/Foundation42/libtuple
No, but seriously, it has some really nice properties. You can embed JSON-like maps, arrays, and S-expressions recursively. It doesn't care.
You can stream it incrementally or use it in a message-framed form.
And the nicest thing is that the encoding is lexicographically sortable.
interface MyService {
method(): ReturnType;
}
We parse the TypeScript to JSON Schema, then use that to generate runtime validation across both JS implementations and other languages.

TypeScript is a really nice IDL. I didn't want to hitch our wagon to something else like typespec.io, even if that would have given us more things out of the box.
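A rough sketch of the runtime half of that pipeline, assuming a JSON Schema has already been generated from the TypeScript interface (the generator tool and file path are illustrative):

import Ajv from "ajv";
// Assume this file was produced from the interface by a tool such as typescript-json-schema.
import methodParamsSchema from "./generated/MyService.method.params.json";

const ajv = new Ajv();
const validateParams = ajv.compile(methodParamsSchema);

export function checkMethodParams(params: unknown): void {
  if (!validateParams(params)) {
    throw new TypeError("Invalid params: " + ajv.errorsText(validateParams.errors));
  }
}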
Yeah... I've been deep in this problem space myself. The two big points of friction are:
1. Requiring a build step to generate runtime code from the TS types.
2. TS doesn't officially support compiler transforms that would do it.
That said, the two most promising approaches I've found so far:
1. https://github.com/GoogleFeud/ts-runtime-checks -- it does exactly what you describe.
2. https://arktype.io/ -- a very interesting take on the Zod model, but it feels like writing native TypeScript.
Congrats on the launch, really exciting to see a way to get capabilities into the JS ecosystem!
This tool could actually be totally independent from the RPC implementation.
I don't think it's necessary to bake type checks into the deserialization itself, since the RPC system already doesn't make any assumptions about the payloads it is moving around (other than that they are composed of only the types that the deserialization supports, which is a fixed list).
History has shown that, if you want things to be popular, you should choose the latter, but I think the tide has turned enough that the former could be the right choice now. That's also the reason why we use Typescript instead of JS, so mandatory static typing would definitely fit with the zeitgeist.
Honestly, the boundary is the one place where I wouldn't want to not have types.
I.e. imagine you need this:
promise1 = foo.call1();
promise2 = foo.call2();
promise3 = foo.call2(promise1 + promise2);
Can't implement that "+" there unless...

promise1 = foo.call1();
promise2 = foo.call2();
promise3 = foo.add(promise1, promise2);
promise4 = foo.call2(promise3);
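The server-side half of that workaround is at least short to write; a sketch (RpcTarget assumed from the capnweb package, everything else made up):

import { RpcTarget } from "capnweb";

// Expose arithmetic as an RPC method so the intermediate result can be pipelined
// without waiting on the client; this is the add() that `foo` would need.
class FooTarget extends RpcTarget {
  add(a: number, b: number): number { return a + b; }
  // call1(), call2(), etc. would live here too.
}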
You could also make some kind of RpcNumber class so you can use a Proxy to support promise1.add(promise2), but ultimately you don't want to write such classes, or server-side functions for them, on the spot every time. The problem is that even this won't give you conditionals (loops, branches) that run on the server; the server's execution is blocked by the client.
Once you realize that, you realize it's most optimal if both sides exchange command buffers in general, including batched instructions for remote and local calls, plus a standardized expression syntax and library.
What they did with array.map() is cute, but it's not obvious what you can and can't do with it, and most developers will end up tripping up every time they use it, both over-using the feature and under-using it, unaware of what it maps, how, when, and where.
For example, this record-replay can't do any (again...) arithmetic, logic, branching, and so on. It can only record method calls on the Proxy and replay them on the other side, inside simple containers like an object literal.
This is where GraphQL is better because it's an explicit buffer send and an explicit buffer return. The number of roundtrips and what maps how is not hidden.
GraphQL has its own mess of poorly considered features, but I don't think Cap'n Web survives prolonged contact with reality because of how implicit and magical everything is.
When you make an abstraction like this, it needs to work ALL THE TIME, so you don't have to think about it. If it only works in demo examples written by developers who know exactly when the abstraction breaks, real devs won't touch it.
Why not "application/octet-stream" header and sending ArrayBuffer over the network ?
I'm obviously a huge fan of binary serialization; I wrote Cap'n Proto and Protobuf v2 after all.
But when you're working with pure JS, it's hard to be much faster than the built-in JSON implementation, and even if you can beat it, you're only going to get there with a lot of code, and in a browser code footprint often matters more than runtime speed.