I've been doing reliability for most of my career, and have always been able to hide behind "We're not a bank; if we lose a few requests, it doesn't matter." They can't do that. :)
One advantage that they have is that the market closes, so they can do maintenance that takes the whole system down, but when you're running a global consumer product, it's a lot harder to do that without pushback.
So for most of us, our stress is around zero downtime maintenance, and theirs is around never dropping a request when the system is live.
There are multiple layers of controls and manual interventions, which, while absolutely painful, slow, expensive and shitstorm-conjuring, are ultimately the final authority on some failures.
For example, in payments, every single settlement or clearing anomaly is looked at by a real human and rectified/rebooked manually.
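To make the "a real human looks at every anomaly" step concrete, here is a minimal reconciliation sketch. It assumes a hypothetical `Settlement` record holding only a trade ID and an amount in integer cents; real clearing breaks carry many more fields (dates, counterparties, currencies). It only shows the shape of the process: match records by ID, and route every mismatch to a person instead of auto-correcting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settlement:
    trade_id: str
    amount_cents: int  # integer cents avoid float rounding in money math


def find_breaks(internal, clearing):
    """Compare our booked settlements against the clearing house's records.

    Returns (trade_id, reason) tuples; each one is meant to be queued for a
    human to review and rebook, not fixed automatically.
    """
    ours = {s.trade_id: s for s in internal}
    theirs = {s.trade_id: s for s in clearing}
    breaks = []
    for trade_id in sorted(ours.keys() | theirs.keys()):
        a, b = ours.get(trade_id), theirs.get(trade_id)
        if a is None:
            breaks.append((trade_id, "missing internally"))
        elif b is None:
            breaks.append((trade_id, "missing at clearing"))
        elif a.amount_cents != b.amount_cents:
            breaks.append(
                (trade_id, f"amount mismatch: {a.amount_cents} vs {b.amount_cents}")
            )
    return breaks
```

For example, if our books show `T2` at 5000 cents but the clearing house shows 4999, and the clearing house also has a `T3` we never booked, both come back as breaks for someone to look at.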
So, yeah, the stakes can be really high when you have a couple billion in memory on your server, but -- it's just a system.
And it will fail, and we plan for it to do so.
I've always said that with infinite money we could get 100% uptime, but no one has infinite money. Trading firms are about as close as I can imagine to infinite money though.
I work with a major one and, to be honest, it was obvious from day one that they were incompetent. They employ a huge number of engineers and are unable to deliver basic features at any reasonable pace. Not even remotely close, either (as in: you ask them to do something, they say yes, execs say yes, you get a deadline, the date comes... deployment difficulties, environment not working, and the runaround goes on forever).
I remember the CEO got on a call with us at the start and was patting himself on the back about having no downtime... because they were able to do maintenance when markets shut (and I have heard very bad things about how that goes). But it is a 24/7 world now, and our service is up 24/7, so of course this led to massive issues over time because of the very different expectations around delivery and quality. Our execs were impressed; our engineers said this was a bad sign. And, of course, it turned out that they were total amateurs (to be clear, this is one of the biggest exchanges in the world) and were unable to deliver.
To come back to my original statement: there is a company of 16 people in total that is, from the customers' point of view, delivering features faster than they are. It is difficult to overstate how insane that is.
It depends what you mean by easy. Even if you are using a slow chain, you still have to compete for finite block space, you still have to work out how to do risk and matching fast, etc.
With chains built for exchange use, operating them is easier; that is why they don't require thousands of engineers. But the actual technical capability of the system significantly exceeds that of tradfi exchanges. For example, the risk function runs in real time on-chain, as opposed to EoD settlement. This significantly changes the possible feature set. Once you have built it, it is very easy... so the question is why big exchanges rely so heavily on EoD processes. The answer is: they are bad at engineering.
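The difference between real-time risk and EoD settlement is easiest to see side by side. This is a hypothetical sketch, not how any particular chain or exchange computes margin: it assumes a flat margin rule of a fixed number of basis points on gross notional, with integer prices and collateral. The real-time path rejects an order that would breach margin before it is accepted; the EoD path lets the position stand and only finds the shortfall in a later batch.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    collateral: int  # integer units (e.g. cents) to keep the arithmetic exact
    positions: dict = field(default_factory=dict)  # symbol -> signed quantity


def required_margin(positions: dict, prices: dict, margin_bps: int) -> int:
    # Hypothetical flat rule: margin_bps basis points of gross notional.
    gross = sum(abs(qty) * prices[sym] for sym, qty in positions.items())
    return gross * margin_bps // 10_000


def place_order_realtime(account, symbol, qty, prices, margin_bps=1_000):
    """Real-time style: check risk before each order is accepted."""
    proposed = dict(account.positions)
    proposed[symbol] = proposed.get(symbol, 0) + qty
    if required_margin(proposed, prices, margin_bps) > account.collateral:
        return False  # rejected now, so no order pushes the account past margin at current prices
    account.positions = proposed
    return True


def end_of_day_check(accounts, prices, margin_bps=1_000):
    """EoD style: positions were accepted all day; shortfalls surface in a batch."""
    return [i for i, a in enumerate(accounts)
            if required_margin(a.positions, prices, margin_bps) > a.collateral]
```

With 1,000 units of collateral and a price of 100, buying 50 needs 500 of margin and is accepted; buying 60 more would need 1,100 and is rejected on the spot. An account that already holds 110 without a pre-trade check is only flagged by `end_of_day_check`.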
Very few people want the financial system to be a contractual suicide pact. They want it to be predictable, but when the unpredictable happens, they want retail and institutional investors to be protected (the HFT players can go beat each other up; no one will really cry about them). And the unpredictable can be anything from a power event taking out multiple exchanges in the NJ triangle (Hurricane Sandy), to a cyber-attack (hasn't happened yet), to a flash crash driven by algorithms from multiple HFT firms driving each other nuts (has happened at least once).
So it is not EOD processes as such, but the ability to pause, assess the entire system holistically, and then correct it before it blows up the portfolios of everyone holding a 401k. So even though the exchanges _could_ go to 24/7 trading, I'd be surprised if we moved away from cyclical, 24-hour settlement windows.
I don’t think you have made a case for anything yet.
I have a really hard time believing something decentralized will surpass what is possible within the physical limits of the speed of light, with low-level, assembler-level C++ optimizations and no GC.
Also, given that Hyperliquid's order sequencing is opaque and not open source, and that there is real latency in the consensus, I can't yet believe they have a stable p99 for completed transactions.
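The speed-of-light point can be made with simple arithmetic. A minimal sketch, assuming a refractive index of about 1.47 for optical fiber and ignoring every switching, queuing and processing delay, so these are floors rather than estimates: any consensus step between geographically separated validators pays at least this much, while a single co-located matching engine does not pay it at all.

```python
# Light in optical fiber travels at roughly c / 1.47, about 204,000 km/s,
# which works out to roughly 5 microseconds per kilometre.
C_VACUUM_KM_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47  # typical value for single-mode fiber; an assumption


def one_way_fiber_ms(distance_km: float) -> float:
    """Lower bound on one-way latency over fiber, ignoring all equipment delay."""
    speed_km_s = C_VACUUM_KM_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_km_s * 1000


def consensus_round_ms(distance_km: float, round_trips: int = 1) -> float:
    """Lower bound for a protocol needing `round_trips` message exchanges
    between two validators `distance_km` apart (each is there and back)."""
    return 2 * round_trips * one_way_fiber_ms(distance_km)
```

For two validators 1,000 km apart, a single round trip already costs about 10 ms of pure propagation, far above the microsecond-scale figures that co-located trading systems chase.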
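For readers less used to the term: p99 is the latency that 99% of requests complete within. A minimal sketch using the nearest-rank definition (one of several percentile definitions in use), reading "stability" as how much p99 moves between consecutive windows of samples; the window size is an arbitrary assumption.

```python
import math


def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least p% of
    the samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def p99_per_window(latencies_ms, window):
    """p99 for consecutive, non-overlapping windows of `window` samples
    (a trailing partial window is dropped). A stable p99 means these
    numbers stay flat from window to window."""
    return [percentile_nearest_rank(latencies_ms[i:i + window], 99)
            for i in range(0, len(latencies_ms) - window + 1, window)]
```

Two 50 ms outliers in a window of 100 requests are enough to move that window's p99 from 1 ms to 50 ms, which is why tail latency is a much harsher test than the average.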
If done right, it would be a completely separate system. Separate IP addresses and all.
I imagine you'd have to use shadow execution, where you roll out a full second copy, run every transaction through both, and compare the results. And then, only after a certain time, switch traffic to the new infra and tear down the old.
But you would need a ton of extra hardware (more than double) and a lot of ways to keep data in sync. And of course if you put an LLM or other non-deterministic system in there, that's a whole other can of worms.
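The shadow-execution idea can be sketched in a few lines. This is a minimal illustration, assuming both the old and new paths are pure functions of the transaction (hypothetical `old`/`new` handlers over plain dicts); in a real system the shadow path's side effects (writes, outbound messages) would have to be suppressed or redirected. It also shows why determinism matters: the comparison is only meaningful if the same input always yields the same output.

```python
from typing import Callable, List, Tuple


def shadow_execute(
    txn: dict,
    old: Callable[[dict], dict],
    new: Callable[[dict], dict],
    mismatches: List[Tuple[dict, dict, object]],
) -> dict:
    """Run a transaction through both the live (old) and shadow (new) path.

    Only the old result is returned to the caller; the new path is observed,
    never trusted, until enough traffic has matched to justify a cutover.
    """
    live_result = old(txn)
    try:
        shadow_result = new(txn)
    except Exception as exc:  # a crashing shadow must never affect live traffic
        shadow_result = exc
    if shadow_result != live_result:
        mismatches.append((txn, live_result, shadow_result))
    return live_result
```

Cutover then becomes a data-driven decision: switch traffic only after the mismatch list has stayed empty over enough real volume.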
Like I said, a fun problem to solve. :)
I couldn’t do it. I like infra and all, but it’s just not my cup of tea. It’s definitely true that from a trading POV the trade must be executed. It must settle. It must work. Or capital flight will be huge.
Isn't the plan more like 23/5, as is already the case for several markets?
I can't see the standard sessions moving from 9:30am-4pm weekdays to 24/7. I take it they'd still leave at least one hour off for technical reasons.
If I'm not mistaken, that's the reason several markets are 23/5 and not 24/5: that one hour of downtime is basically for servers/maintenance, right? (Maybe someone can chime in.)
P.S.: I take it there's technically 24/7 trading already, seeing that cryptocurrency exchanges are open 24/7 (I'm not sure, but I think that's the case), but I don't think those do anywhere near the volume of, say, options trading on equities during standard sessions (40 Gbit/s, with peaks over 70 Gbit/s, for the full options feed).
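To put that feed rate in perspective, a back-of-the-envelope sketch. It assumes, purely for illustration, that the 40 Gbit/s figure were sustained for a full 6.5-hour regular session (9:30am to 4pm); real feed rates are bursty, so this gives an order of magnitude, not a measurement.

```python
def session_volume_tb(rate_gbit_s: float, hours: float) -> float:
    """Data volume in terabytes (10**12 bytes) for a sustained bit rate."""
    bits = rate_gbit_s * 1e9 * hours * 3600  # Gbit/s -> bit/s, hours -> seconds
    return bits / 8 / 1e12                   # bits -> bytes -> TB
```

At a sustained 40 Gbit/s, one regular session comes to about 117 TB of market data.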
Every so often a new stock is listed or a stock ticker is changed or a stock is split, etc. There are smaller changes every single day, like to the settlement date of your trade.
It's very convenient to be able to restart all your systems at 5pm, have them all load the updated reference data, and start them again in time for 6pm (or 7pm, or 4am tomorrow...). Even if you trade stocks and options and currencies and futures all over the world, a quirk of the calendar means they're basically all closed between 4 and 5pm Chicago time.
Of course it's possible in principle to build systems where all this is dynamic and you can seamlessly trade with the old configuration at 4:59:59.999 and start trading with the new one a millisecond later. But literally everyone has built systems that don't work this way, that rely on being able to chunk the continuous passage of time into discrete days. It would be painful to rearchitect them all now.
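For contrast, the "dynamic" design can be sketched as effective-dated reference data: every version carries the instant it takes effect, and each lookup uses the version in force at the trade's timestamp. This is a hypothetical, single-threaded illustration; the class name, the ticker, the settlement-cycle strings and the dates are all made up. The hard part in a real system is getting every component to agree on the switch instant, which is exactly what the daily restart gives you for free.

```python
import bisect
from datetime import datetime


class VersionedReferenceData:
    """Reference data (e.g. ticker -> settlement cycle) keyed by the instant
    each version takes effect, instead of one version per trading day."""

    def __init__(self):
        self._effective_from = []  # sorted datetimes
        self._versions = []        # parallel list of dicts

    def publish(self, effective_from: datetime, data: dict) -> None:
        i = bisect.bisect_right(self._effective_from, effective_from)
        self._effective_from.insert(i, effective_from)
        self._versions.insert(i, dict(data))  # shallow copy, detached from the caller's dict

    def lookup(self, at: datetime, key: str):
        # Latest version whose effective time is <= `at`.
        i = bisect.bisect_right(self._effective_from, at) - 1
        if i < 0:
            raise LookupError(f"no reference data in effect at {at}")
        return self._versions[i][key]
```

A version published as effective at 17:00:00 is used for a trade stamped exactly 17:00:00, while a trade at 16:59:59.999 still sees the previous version.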
I heard similar talks from Shopify and others back in the day about their own products, but I always love listening to more.
The clickbait title of "billions of dollars a day" is nothing to praise.
I’ve met some exceptional people: top researchers from top universities in several fields, super well-paid engineers working on products you probably use, some of the best hackers an advanced persistent threat actor could ask for; they’re just people.
I think if you get a collection of competent, thoughtful people together they would come up with similar solutions to the problems discussed in this talk.