If you want a specific question to answer, answer this: why does PTP need hardware timestamping to achieve high precision (where the network card itself assigns timestamps to packets, rather than having the kernel do it as part of TCP/IP processing)? If we use software timestamps, why is microsecond precision the best we can do? If you understand this, it goes a very long way toward understanding the core ideas behind precise clock sync.
Once you have a solid understanding of PTP, look into White Rabbit. They’re able to sync two clocks with sub-ns precision. In case that isn’t obvious, that is absolutely insane.
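For anyone who wants to check their answer, here's a toy sketch (my own illustrative numbers, not any particular stack's code) of the two-way exchange PTP performs, showing why timestamp jitter lands directly in the computed offset:

    def ptp_offset_and_delay(t1, t2, t3, t4):
        # Master sends Sync at t1, slave receives at t2; slave sends
        # Delay_Req at t3, master receives at t4. Assumes a symmetric path.
        offset = ((t2 - t1) - (t4 - t3)) / 2   # slave clock minus master clock
        delay  = ((t2 - t1) + (t4 - t3)) / 2   # one-way path delay
        return offset, delay

    # Ground truth for the toy: slave 500 ns ahead, one-way delay 10 us.
    true_offset, true_delay = 500e-9, 10e-6
    t1 = 0.0
    t2 = t1 + true_delay + true_offset
    t3 = t2 + 1e-3
    t4 = t3 + true_delay - true_offset
    print(ptp_offset_and_delay(t1, t2, t3, t4))      # recovers ~(5e-07, 1e-05)

    # Add 20 us of software-timestamp jitter (interrupts, scheduling) to t2:
    print(ptp_offset_and_delay(t1, t2 + 20e-6, t3, t4))  # offset off by ~10 us

Software timestamps are taken after interrupt handling and scheduling, which adds microseconds to tens of microseconds of noise to t2 and t3; a NIC stamping at the PHY keeps that term down in the nanoseconds, and that's the whole game.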
[1] So do a lot of people. For example, audio engineers. Once, an audio engineer absolutely talked my ear off about PTP. I had no idea that audio people understood clock sync so well, but they do!
Indeed. PTP (various, not-necessarily compatible, versions) is at the core of modern ethernet-based audio networking: Dante (proprietary, PTP: IEEE 1588 v1), AVB (IEEE standard, PTP: 802.1AS), AES67 (AES standard, PTP: IEEE 1588 v2). And now the scope of the AVB protocol stack has been expanded to TSN for industrial and automotive time sensitive network applications.
Sadly, they're generally just a bit too expensive for me to justify it as a toy.
I don't work in trading (though not for lack of trying on my end), so most of the stuff I work on has been a lot more about "logical clocks", which are cool in their own right, but I have always wondered how much more efficient we could be if we had nanosecond-level precision to guarantee that locks are almost always uncontested.
[1] I'm not talking about those clocks that radio to Colorado or Greenwich, I mean the relatively small ones that you can buy that run locally.
This is only true if you use wall clock time as part of your database’s consistency algorithm. Generally I think this is a huge mistake. It’s almost always much easier to swap to a logical clock - which doesn’t care about wall time. And then you don’t have to worry about ntp.
The basic idea is this: event A happened before event B iff A (or something that happened after A) was observed by the node that generated B before B was generated. As a result, you end up with a DAG of events - kind of like git. Some events aren't ordered relative to one another (we say they happened concurrently). If you ever need a global order for all events, you can deterministically pick an arbitrary order for concurrent events by comparing ids or something. And this will give you a total order that will be the same on all peers.
If you make database events work like this, time is a little more complex. (It’s a graph traversal rather than simple numbers). But as a result the system clock doesn’t matter. No need to worry about atomic clocks, skew, drift, monotonicity, and all of that junk. It massively simplifies your system design.
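For a concrete feel, here's a minimal sketch (assuming unique node ids) of that deterministic tiebreak: Lamport-style (counter, node_id) stamps flatten everything into the same total order on every peer. Tracking the DAG itself, so you can detect which events were concurrent, takes vector clocks instead.

    class LamportClock:
        def __init__(self, node_id):
            self.node_id = node_id
            self.counter = 0

        def local_event(self):
            self.counter += 1
            return (self.counter, self.node_id)   # totally ordered stamp

        def on_receive(self, remote_stamp):
            self.counter = max(self.counter, remote_stamp[0]) + 1
            return (self.counter, self.node_id)

    a, b = LamportClock("A"), LamportClock("B")
    e1 = a.local_event()     # (1, 'A')
    e2 = b.on_receive(e1)    # (2, 'B') - causally after e1
    e3 = b.local_event()     # (3, 'B')
    e4 = a.local_event()     # (2, 'A') - concurrent with e2 and e3
    print(sorted([e1, e2, e3, e4]))   # same total order on every peer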
Also I still remember having fun with the "Determine the order of events by saving a tuple containing monotonic time and a strictly monotonically increasing integer as follows" part.
My take on this is that second-level timing is close enough for this. And that all my internal systems need to agree on the time. So if I'm off by 200ms or some blather from the rest of the world, I'm not overly concerned. I am concerned, however, if a random internal system is not synced to my own ntp servers.
This doesn't mean I don't keep our servers synced, just that being off by some manner of ms doesn't bother me inordinately. And when it comes to timing of events, yes, auto-increment IDs or some such are easier to deal with.
Many years later, in 2020, I ended up living in San Francisco, and I had the fortune to meet Leslie Lamport after I sent him a cold email. Lovely and smart guy. This is the text of the first part of that email, just for your curiosity:
Hey Leslie!
You have accompanied me for more than 20 years. I first met your name when studying Lamport timestamps.
And then on, and on, and on, up to a few minutes ago, when I realized that you are also behind the paper and the title of the "Byzantine Generals problem", renamed from the "Albanian" generals at the suggestion of Jack Goldberg. Who is he? [1]
[0]: https://en.wikipedia.org/wiki/Lamport_timestamp
[1]: Jack Goldberg (now retired) was a computer scientist and Lamport's manager at SRI.
This post is about more complicated synchronization for more demanding applications. And it's very good. I'm just marveling at how in my lifetime I went from "no clock is ever set right" to assuming most anything is within a second of true time.
Once you get to international phones, you'll hit places where the phone does not include all timezones, and is specifically missing the actual local timezone. Automatic sync is then typically disabled and the time set by hand so that the displayed time matches local time... even if that means the system time is not correct.
Yes, but imagine your local time is US Pacific time, but you have a phone intended to be sold in Mexico, so your phone only has Mexico time zones and MX Pacific Time has no DST. During part of the year, you can use automatic time sync, but during the summer, you disable automatic sync and set the clock so that the time displayed matches local time. Your epoch time is now an hour ahead of properly synched devices, but whatevs, your phone shows the right time and that's what counts.
I don't think civilian clock synchronization has been an issue for a long time now.
DCF77 and WWVB have been around for more than 50 years. You could use some cheap electronics and get well below millisecond accuracy. GPS has been fully operational for 30 years, but it needs a more expensive device.
I suspect you could even get below 1 sec accuracy using a watch with a hacking movement and listening to radio broadcasts of time beeps / pips.
The first manufactured GPS clock I owned (as in: switch it on and time is shown on a dedicated display) was in a 2007 Honda.
But a firmware bug ruined that clock: https://didhondafixtheclocks.com/
And even after it began displaying the right time again, it had the wrong date. It was offset by years and years, which was OK-ish, but also by several months.
Having the date offset by months caused the HVAC to behave in strange incurable ways because it expected the sun to be in positions where it was not.
But NTP? NTP has never been fickle for me, even in the intermittently-connected dialup days I experienced ~30 years ago: If I can get to the network occasionally, then I can connect to a few NTP servers and keep a local clock reasonably-accurate.
NTP has been resolutely awesome for me.
If I ever get the chance, I'll try to remember to tell the 1995 version of me to watch out for that pesky overflow bug that they might experience with NTP -- two score and 1 year in their future.
Sadly I think the actual antenna and hardware were relatively large since it's a long wave signal, but maybe with SDR it'll all fit on the head of a pin these days.
Unfortunately there's no real way to cheat physics as far as shrinking an antenna goes. With RF antennas, about the best you can do is a major dimension around 1/10th the wavelength of interest.
The wavelength is around 3.8km...
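The arithmetic, for reference (my snippet):

    c = 299_792_458                      # speed of light, m/s
    for name, f_hz in [("WWVB", 60_000), ("DCF77", 77_500)]:
        wavelength = c / f_hz
        print(f"{name}: {wavelength:.0f} m, 1/10th = {wavelength / 10:.0f} m")
    # DCF77 -> ~3868 m (the ~3.8 km above), so even 1/10th is ~387 m.

Which is why practical receivers generally cheat with tiny ferrite loopstick antennas: hugely inefficient, but sufficient for a narrowband time code.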
Probably DCF77 or WWVB.
> I think the actual antenna and hardware were relatively large since it's a long wave signal
Casio has some normal-sized wristwatches that synchronize to DCF77, so the receiver would definitely fit into a stove, microwave, or basically anything.
That’s the radical developer simplicity promised by TrueTime mentioned in the article.
What TrueTime says is that clocks are synchronized within some delta just like NTP, but that delta is significantly smaller thanks to GPS time sync. That enables applications to put tighter bounds on how long they wait to see if a conflict may exist before committing, which is why Spanner is fast. CockroachDB works similarly, but given the logistical challenge of getting GPS receivers into data centers, it instead works to shrink the delta through better NTP-like timekeeping, and generally gets fairly close performance.
https://programmingappliedai.substack.com/p/what-is-true-tim...
> Bounded Uncertainty: TrueTime provides a time interval, [earliest, latest], rather than a single timestamp. This interval represents the possible range of the current time with bounded uncertainty. The uncertainty is caused by clock drift, synchronization delays, and other factors in distributed systems.
This is the part I was referring to. You cannot just compare timestamps and know which happened first. You have to actually handle the case where you don't know whether there's a happens-before relationship between the timestamps. That's a very important distinction.
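A sketch of what that forces on application code (illustrative, not Spanner's actual API): the comparison becomes three-valued.

    from collections import namedtuple

    Interval = namedtuple("Interval", ["earliest", "latest"])

    def happened_before(a, b):
        # True/False when the intervals don't overlap; None = can't tell.
        if a.latest < b.earliest:
            return True
        if b.latest < a.earliest:
            return False
        return None   # overlapping uncertainty: no known happens-before

    print(happened_before(Interval(100, 107), Interval(110, 117)))  # True
    print(happened_before(Interval(100, 107), Interval(105, 112)))  # None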
> External consistency states that Spanner executes transactions in a manner that is indistinguishable from a system in which the transactions are executed serially, and furthermore, that the serial order is consistent with the order in which transactions can be observed to commit. Because the timestamps generated for transactions correspond to the serial order, if any client sees a transaction T2 start to commit after another transaction T1 finishes, the system will assign a timestamp to T2 that is higher than T1's timestamp.
Of course there is always the edge case where two commits have the same commit timestamp. Therefore from the perspective of Spanner, they happen simultaneously and there is no way to determine which happens first. But there is no need to. There is no causality relationship between them. If you insist, you can arbitrarily assign a happens-before relationship in your own code and nothing will break.
If everyone is synced to +/- 20ns, that's great. Then when someone flies over your datacenter with a GPS jammer (purposeful or accidental), this needs to not be a bad day where suddenly database transactions happen out of order, or you have an outage.
The other benefit of building in this uncertainty to the underlying software design is you don't have to have your entire infrastructure on the same hardware stack. If you have one datacenter that's 20yrs old, has no GPS infrastructure, and operates purely on NTP - this can still run the same software, just much more slowly. You might even keep some of this around for testing - and now you have ongoing data showing what will happen to your distributed system if GPS were to go away in a chunk of the world for some sustained period of time.
And in a brighter future, if we're able to synchronize everyone's clocks to +/- 1ns, the intervals just get smaller and we see improved performance without having to rethink the entire design.
Most NTP/PTP appliances have internal clocks that are OCXO or rubidium that have holdover (even for several days).
If time is that important to you then you'll have them, plus perhaps some fibre connections to other sites that are hopefully out of range of the jamming.
I guess it's not inconceivable that eventually there's a global clock network using a White-Rabbit-like protocol over dedicated fibre. But if you have to worry about GPS jamming you probably have to worry about undersea cable cutting too.
TrueTime is the software algorithm for managing the timestamps. It’s agnostic to the accuracy of the underlying time source. If it was inaccurate then you get looser bounds and as you note higher latency. Google already does everything you suggest for TrueTime while also having atomic clocks in places.
* NTP pool server usage requires using DNS
* people have DNSSEC set up, which requires accurate time or it fails
So if your clock is off, you cannot lookup NTP pool servers via DNS, and therefore cannot set your clock.
This sheer stupidity has been discussed with the package maintainers of major distros, and with ntpsec, and the result is a mere shrug. Often the answer is "but doesn't your device have a battery-backed clock?", which is quite unhelpful. Many devices (routers, IoT devices, small boards, older machines, etc.) don't have a battery-backed clock, or alternatively the battery may just have died.
Beyond that, the ntpsec codebase has a horrible bug where if DNS is not available when ntpsec starts, pool server addresses are never, ever retried. So if you have a complete power-fail in a datacentre rack, and your firewalls take a little longer to boot than your machines, you'll have to manually restart ntpsec to even get it to ever sync.
When discussing this bug the ntpsec lads were confused that DNS might not exist at times.
Long story short, make sure you aren't using DNS in any capacity, in NTP configs, and most especially in ntpsec configs.
One good source is just using the IPs provided by NIST. Pool servers may seem fine, but I'd trust IPs assigned to NIST to exist longer than any DNS anyhow. EG, for decades.
I don’t think so, I think it is solved in the general sense. However what Spanner does is unique, and it does use synchronised clocks in order to do it.
However, Spanner does not solve the inter-continental acid database with high write throughput. So I don’t see it as ground breaking. CRDT’s are interesting, I’ve followed your work for a long time, but too constrained to solve this general problem I think.
As a teacher I love the way Judah Levine explains it.
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time...
A regular pulse is emitted from a specialized high-precision device, possibly over a specialized high-precision network.
Enables picosecond accuracy (or at least sub-nano).
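A rough sketch of how such a pulse is typically consumed (made-up gains, not any specific product): at each pulse you measure the local clock's phase error against the edge and steer its frequency with a small PI servo, feeding the output to the oscillator or the kernel (e.g. via adjtimex on Linux).

    KP, KI = 0.7, 0.3    # illustrative servo gains, not tuned values

    def discipline(phase_errors_ns):
        # phase error = local timestamp of the pulse minus the expected edge
        integral = 0.0
        for err in phase_errors_ns:
            integral += err
            yield KP * err + KI * integral   # frequency correction to apply

    for adj in discipline([120, 80, 30, -5, -12, -4, 1]):
        print(f"{adj:+.1f}")    # correction per pulse; error converges to ~0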
I worked on the NTP infra for a very large organization some time ago, and the scariest thing I found was just how bad some of the clocks were on "commodity hardware". But this just added a new parameter for triaging hardware for manufacturer replacement.
This is an ok article but it's just so very superficial. It goes too wide for such a deep subject matter.
In particular I don’t think the intuitions necessary to do distributed computing well would come to someone who snoozed through physics, who never took intro to computer engineering.
Yeah. I was a physics major and it really helped to have had my naive assumptions about time and clocks completely demolished early on by taking classes in special and general relativity. When I eventually found my way into tech, a lot of distributed systems concepts that are difficult for other people (clock sync, indeterminate ordering of events, consensus) came quite naturally because of all that early training.
I think it's no accident that distributed systems theory guru Leslie Lamport had written an unpublished book on General Relativity before he wrote the famous Time, Clocks and the Ordering of Events in a Distributed System paper and the Paxos paper. In the former in particular the analogy to special relativity is quite plain to see.
you buy the hardware, plug it all in, and it works
It's gotten to the point that timing-server vendors I've spoken to have their own test labs where they have to validate network gear, and then publish lists of recommended and tested configurations.
Even some older cards where you'd think the PTP issues would be solved still have weird driver quirks in Linux!
Hot take: I've seen this and enough other badly configured time sync settings that I want to ban system time from robotics systems - time from startup only! If you want to know what the real-world time was for a piece of data afterwards, record what your epoch is once you have a time sync, and compute epoch + start-relative time.
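A minimal sketch of that scheme, assuming Python-level access to a monotonic clock:

    import time

    BOOT_MONO_NS = time.monotonic_ns()   # t=0 for every log record

    def stamp():
        return time.monotonic_ns() - BOOT_MONO_NS

    epoch_anchor = None                  # set once, if/when time sync succeeds

    def record_epoch_anchor():
        global epoch_anchor
        epoch_anchor = (stamp(), time.time_ns())   # (mono, wall) at sync time

    def to_wall_ns(mono_stamp):
        # Post-hoc conversion; only meaningful if an anchor was recorded.
        anchor_mono, anchor_wall = epoch_anchor
        return anchor_wall + (mono_stamp - anchor_mono)

The nice property is that relative ordering and durations within the robot never depend on anyone's NTP configuration; wall-clock time becomes a derived, optional annotation.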
But it doesn’t have to be the first requirement you relax.
https://www.usenix.org/system/files/conference/nsdi18/nsdi18...
But I just watched/listened to a Richard Feynman talk on the nature of time and clocks and the futility of "synchronizing" clocks. So I'm chuckling a bit, in the general sense I mean. Yes, yes, for practical purposes in the same reference frame on Earth, it's difficult but there's hope. Now, in general ... synchronizing two clocks is ... meaningless?
Alice and Bob, in different reference frames, both witness events C and D occurring. Alice says C happened before D. Bob says D happened before C. They're both correct. (And good luck synchronizing your watches, Alice and Bob!)
But when you are moving you may see very closely spaced events in different order, because you’re moving toward Carol but at an angle to Doug. Versus someone else moving toward Doug at an angle to Carol.
It's a little trickier to imagine introducing cause-and-effect though. (Alice sees that C caused D to happen, Bob sees that D caused C to happen).
I think a "light cone" is the thought-experiment to look up here.
In special relativity, time is relative, and when things actually happened can differ between frames. Causally linked events are always seen in the same order. But disconnected events can be seen in different orders depending on the observer's speed.
What are "disconnected events"? In a subtle but still real sense, are not all events causally linked? e.g. gravitationally, magnetically, subatomically or quantumly?
I can understand that our simple minds and computational abilities lead us to consider events "far away" from each other as "disconnected" for practical reasons. But are they really not causally connected in a subtle way?
There are pieces of space time that are clearly, obviously causally connected to each other. And there are far away regions of the universe that are, practically speaking, causally disconnected from things "around here". But wouldn't these causally disjoint regions overlap with each other, stringing together a chain of causality from anywhere to anywhere?
Or is there a complete vacuum of insulation between some truly disconnected events that don't overlap with any other observational light cone or frame of reference at all?
However if they were both triggered by a binary black hole merger, then they're dependent events but not on each other.
But I think the general discussion is more of a "Han shot first" sort: one intelligent system reacting to an action of another intelligent system, and an observer in a different reference frame not being able to discern who started it and who reacted. So I suppose when we have relativistic duels we will have to preserve the role of the "second" as a witness to the events. Or we will have to just shrug and find something else to worry about.
I think you might be confusing events that have some history between them with events that influence each other. Say right now the Martian rover sends a message to Earth and Earth sends a message to it: those aren't causally connected, because neither side knows about the other's message until the light-speed delay has passed.
Yes.
> stringing together a chain of causality from anywhere to anywhere?
No? Causality reaching one edge of a sphere doesn't mean it instantaneously teleports to every point in that same sphere. This isn't a transitive relationship.
> What are "disconnected events"?
The sentence you're responding to seems like a decent definition. Disconnected events are events which might be observed in either order depending on the position of an observer.
We shouldn’t impose a universal timeline just because some future operation might depend on some past one. Dependencies should be explicit and local: if two operations interact, they share a causal scope; if they don’t, they shouldn’t pay the cost of coordination.
For starters, the spacetime interval between two events IS a Lorentz invariant quantity. That could probably be used to establish a universal order for timelike separations between events. I suspect that you could use a reference clock, like a pulsar or something to act as an event against which to measure the spacetime interval to other events, and use that for ordering. Any events separated by a light-like interval are essentially simultaneous to all observers under that measure.
The problem comes for events with a space like or light like separation. In that case, the spacetime interval is still conserved, but I’m not sure how you assign order to them. Perhaps the same system works without modification, but I’m not sure.
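A toy version of that classification (units with c = 1; sign convention is mine):

    def interval_squared(e1, e2):
        # Events are (t, x, y, z) tuples; returns the Lorentz-invariant s^2.
        dt = e2[0] - e1[0]
        dx, dy, dz = (e2[i] - e1[i] for i in (1, 2, 3))
        return dt**2 - (dx**2 + dy**2 + dz**2)

    def separation(e1, e2):
        s2 = interval_squared(e1, e2)
        if s2 > 0:
            return "timelike: all observers agree on the order"
        if s2 == 0:
            return "lightlike: on each other's light cone"
        return "spacelike: observed order depends on the observer"

    print(separation((0, 0, 0, 0), (2, 1, 0, 0)))   # timelike
    print(separation((0, 0, 0, 0), (1, 2, 0, 0)))   # spacelike

The spacelike branch is exactly the problem case described above: the invariant tells you no frame-independent order exists, not what the order is.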
The best approach, imho, is to abandon the concept of a global time. All timestamps are wrt a specific clock. That clock will skew at a rate that varies with time. You can, hopefully, rely on any particular clock being monotonic!
My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time. The fewer stops the better.
I kinda don’t like PTP. Too complicated and requires specialized hardware.
This article only touches on one class of timesync. An entirely separate class is timesync within a device. Your phone is a highly distributed compute system with many chips each of which has their own independent clock source. It’s a pain in the ass.
You also have local timesync across devices such as wearables or robotics. Connecting to a PTP system with GPS and atomic clocks is not ideal (or necessary).
TicSync is cool and useful. https://sci-hub.se/10.1109/icra.2011.5980112
In my view the specialised hardware is just a way to get more accurate transmission and arrival timestamps. That's useful whether or not you use PTP.
> My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time.
This sounds like the "peer to peer" equivalent of PTP. It would require every node to maintain state about its estimate (skew, slew, variance) of every other clock. I like the concept, but obviously it adds complexity to end-stations beyond what PTP requires (i.e. increases the hardware cost of embedded implementations). Such a system would also need to model the network topology, or control routing (as PTP does), because packets traversing different routes to the same host will experience different delay and jitter statistics.
> TicSync is cool
I hadn't seen this before, but I have implemented similar convex-hull based methods for clock recovery. I agree this is obviously a good approach. Thanks for sharing.
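For anyone curious, a minimal sketch of the underlying hull trick (my illustration, not TicSync itself): one-way measurements satisfy recv = offset + skew * send + delay with delay >= 0, so the true line lies below every sample, and the lower convex hull picks out the minimum-delay points you fit against.

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def lower_hull(points):
        # Andrew's monotone chain, lower half only.
        hull = []
        for p in sorted(points):
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    # (send_time, recv_time) pairs; made-up numbers with noisy delays.
    samples = [(0, 5.2), (1, 6.9), (2, 7.4), (3, 8.3), (4, 10.7), (5, 10.4)]
    print(lower_hull(samples))   # the min-delay samples to fit offset/skew to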
Well, it requires having the conversion function for each edge in the traversed path. And such function needs to exist only at the location(s) performing the conversion.
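Concretely, a toy version of that edge-wise conversion (made-up offsets and rates): each edge is an affine map and a path is just composition. A real implementation would also propagate per-hop uncertainty, which is elided here.

    # Each edge maps a timestamp on the source clock to the destination
    # clock: t_dst = rate * t_src + offset. Values are illustrative only.
    edges = {
        ("gps", "hostA"): (2.5e-6, 1 + 8e-6),
        ("hostA", "sensor"): (-1.1e-6, 1 - 3e-6),
    }

    def convert(t, path):
        # Convert t along a path of clock names, e.g. gps -> hostA -> sensor.
        for src, dst in zip(path, path[1:]):
            offset, rate = edges[(src, dst)]
            t = rate * t + offset
        return t

    print(convert(1000.0, ["gps", "hostA", "sensor"]))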
> obviously it adds complexity to end-stations beyond what PTP requires
If you have PTP and it works then stick with it. If you’re trying to timesync a network of wearable devices then you don’t have PTP stamping hardware.
> because packets traversing different routes
Fair callout. It’s probably a more useful model for less internty use cases. Of which there are many!
For example when trying to timesync a collection of different sensors on different devices/microcontrollers.
Roboticists like CanBus and Ethercat. But even that is kinda overkill imho. TicSync can get you tens of microseconds of precision in user space.
?????
I run PTP on everything from RPI's to you name it, over fiber, ethernet, etc.
The main thing hardware gives is filtration of PTP packets or hardware timestamping.
Neither is actually required, though some software has decided to require it.
Additionally, something like 99% of gigabit-or-better chipsets sold since 2012 support it (I210 et al).
How much regularity? If you sent PTP packets with 5 milliseconds of randomness in the scheduling, does that cause real problems? It's still going to have an accurate timestamp.
> instruct the host operating system to disable high-exit-latency sleep features
Why, though? You didn't explain this. As long as the packet got timestamped when it arrived, the CPU can ask the NIC how many nanoseconds ago that was, and correct for how long it was asleep. Right?
I'm sorry, this is just moving the goalposts.
You said "It can't achieve better-than-NTP results without disabling PCI power saving features and deep CPU sleep states."
This is flat wrong, as pointed out.
Now you are pedantically arguing that some NIC's that do PTP hardware timestamping might also use a feature that some operating systems might respect.
That's a very far cry from "It can't achieve better-than-NTP results without disabling PCI power saving features and deep CPU sleep states".
In most cases, people would just say "hey i was wrong about that but there are cases that i think matter where it falls down".
In multicast IP mode, with multiple switches, it requires what anything running multicast between switches/etc would require (IE some form of IGMP snooping or multicast routing or .....)
In unicast IP mode, it requires nothing from your network.
Therefore, i have no idea what it means to "require support on the network".
I have used both ethernet and multicast PTP across a complete mishmash of brands and types and medias of switches, computers, etc, with no issues.
The only thing that "support" might improve is more accurate path delay data through transparent clocks. If both master and slave do accurate hardware timestamping already, and the path between them is constant, it is easily possible to get +-50 nanoseconds without any transparent clock support.
Here are the stats from a random embedded device running PTP that I accessed a second ago:
    Reference ID    : 50545030 (PTP0)
    Stratum         : 1
    Ref time (UTC)  : Sun Dec 28 02:47:25 2025
    System time     : 0.000000029 seconds slow of NTP time
    Last offset     : -0.000000042 seconds
    RMS offset      : 0.000000034 seconds
    Frequency       : 8.110 ppm slow
    Residual freq   : -0.000 ppm
    Skew            : 0.003 ppm
So this embedded ARM device, which is not special in any way, is maintaining time within +-35ns of the grandmaster, and is currently within 30ns of GPS time. The card does not have an embedded hardware PTP clock, but it does do hardware timestamping and filtering.
This grandmaster is an RPI with an intel chipset on it and the PPS input pin being used to discipline the chipset's clock. It stays within +-2ns (usually +-1ns) of GPS time.
Obviously, holdover sucks, but not the point :)
This qualifies as better-than-NTP for sure, and this setup has no network support. No transparent clocks, etc. These machines have multiple media transitions involved (fiber->ethernet), etc.
The main thing transparent clock support provides in practice is dealing with highly variable delay. Either from mode of transport, number of packet processors in between your nodes, etc. Something that causes the delay to be hard to account for.
The ethernet packet processing in ethernet mode is being handled in hardware by the switches and basically all network cards. IP variants would probably be hardware assisted but not fully offloaded on all cards, and just ignored on switches (assuming they are not really routers in disguise).
The hardware timestamping is being done in the card (and the vast majority of ethernet cards have supported PTP hardware timestamping for >1 decade at this point), and works perfectly fine with deep CPU sleep states.
Some don't do hardware filtering, so they essentially are processing more packets than necessary, but .....
> Here’s a video of me explaining this.
Do you need a video? Do we need a 42 minute video to explain this?
I generally agree with Feynman on this stuff. We let explanations be far more complex than they need to be for most things, and it makes the hunt for accidental complexity harder because everything looks almost as complex as the problems that need more study to divine what is actually going on there.
For Spanner to be useful they needed a high transaction rate and in a distributed system that requires very tight grace periods for First Writer Wins. Tighter than you can achieve with NTP or system clocks. That’s it. That’s why they invented a new clock.
Google puts it this way:
Under external consistency, the system behaves as if all transactions run sequentially, even though Spanner actually runs them across multiple servers (and possibly in multiple datacenters) for higher performance and availability.
But that’s a bit thick for people who don’t spend weeks or years thinking about distributed systems.
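A toy sketch of the resulting commit-wait rule (my simplification, not Google's code): assign the latest edge of the clock's uncertainty interval as the commit timestamp, then stall until every in-bounds clock must read past it.

    import time

    EPSILON = 0.007   # assumed clock-uncertainty bound, e.g. ~7 ms on good NTP

    def now_interval():
        t = time.time()
        return (t - EPSILON, t + EPSILON)   # TrueTime-style [earliest, latest]

    def commit_timestamp():
        ts = now_interval()[1]              # take the latest edge as the stamp
        # Commit-wait: don't acknowledge until every clock within bounds has
        # passed ts, so any later transaction gets a strictly larger stamp.
        while now_interval()[0] <= ts:
            time.sleep(EPSILON / 10)
        return ts

    print(commit_timestamp())

The stall is roughly twice the uncertainty bound, which is why shrinking that bound (GPS receivers, atomic clocks) translates directly into transaction rate.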