While we use Hocuspocus at Tiptap as the collaboration backend for our cloud services, it also works with any Yjs client (Slate, Quill, Monaco, ProseMirror, or your own setup), and Yjs documents aren't limited to text at all. You can sync any structured data through them, and in the meantime we see projects that rely on Hocuspocus without using the Tiptap editor.
We released Hocuspocus v4 under the MIT license a few weeks ago, and the biggest change is that it's no longer tied to Node. The previous versions depended on the ws package, which meant you couldn't run Hocuspocus on Bun, Deno, or Cloudflare Workers. We moved to crossws, a universal websocket adapter, so the same server now runs on Node, Bun, Deno, Cloudflare Workers, and Node with uWebSockets. That also lets you run collaboration at the edge.
The other changes are smaller but are important if you're using Hocuspocus in production:
1. Every core class and hook payload takes a generic Context type now, so the auth/session shape you build in onAuthenticate flows through every other hook with full type safety (defaults to any so existing code doesn't break).
2. Document updates are now processed sequentially per connection through an internal queue, which fixes a correctness bug where async hooks could cause CRDT updates to apply out of order under load.
3. Transaction origins are structured objects now with a source field instead of raw values and there's an isTransactionOrigin() helper for narrowing.
4. Hook payloads use web-standard Request and Headers instead of Node's IncomingMessage.
5. The wire protocol is backward compatible in both directions, so you can roll out servers and providers independently.
If you want to test Hocuspocus: npm install @hocuspocus/server @hocuspocus/provider
Docs at: https://tiptap.dev/docs/hocuspocus
Source at: https://github.com/ueberdosis/hocuspocus
Because running real-time collaboration on Workers or Durable Objects is new in v4, that's the use case we'd most like to hear your questions and feedback on.
So, say a data-privacy conscious prospect is interested a click up from the editor, considers the service, and pokes around. Can't find anywhere clarifying how you cannot even if you are ordered to by warrant see a customer's documents content. You have a sample app for legal; that type of client is going to care about this.
Also not readily seeing how security or auth actually works. Requests over TLS are sufficient for the "end to end military grade encryption" type marketing claims; every site with HTTPS or an S3-type storage can make the same claims about encryption in motion and encryption at rest. That relies on transport and provider. It's more interesting if the content is encrypted against you as the provider, like Apple's Advanced Data Protection for iCloud-stored content (e.g. Messages, Reminders, Bookmarks, iCloud Drive, Notes, Voice Memos…).
Any time a SaaS is asking a firm to keep all their documents on or run them through the SaaS, the data protection story should be stronger than this present security page.
Even Cybersecurity & Infrastructure Security Agency (CISA) might randomly write passwords into a notes document…
Alternatively, say HIPAA and etc. shouldn't be on it yet, and talk about when that is on the roadmap. But security story is generally best when baked into design from start.
1) Materializing documents. Assuming you don't have "live" yjs documents and you only merge diffs with diffUpdate, when one or more user are connected, it's always worth to have the blob in RAM to quickly merge diffs in it and save it periodically; when the usages of a document go away, you save it for the last time and you "ice" it in long term storage, offloading from RAM. I typically use a LRU cache for that. The problem is when too many users are working on too many docs and they all have to fit in RAM. How do you solve that?
2) GC. Again, assuming you don't have live documents but you only merge diffs, those blobs need to be garbage collected to compact them after a while iirc (if the doc is live it's done automatically). This normally is a periodic process that eventually GCs all documents in turn, one after the other. If you handle that, how do you manage to not make your server essentially unpredictable when it comes to compacting big blobs? GC'ing takes a toll on your CPU, and not GC-ing takes a toll on your RAM and secondary storage.