A few months ago I published https://cpu.land (discussion: https://news.ycombinator.com/item?id=37062422). After cpu.land, I felt a lot of pressure to make another Big Giant Thing but didn't really have anything compelling. So I just hacked away on personal projects and, through some coincidental learning on how the Internet works, ended up hacking together a traceroute program that could live stream to a website from scratch!
I realized I had never seen this sort of thing on the web before, and it was actually a kind of cool and novel way of visualizing the structure of the Internet, so I polished it up and built a pretty site around it. In the process, I learned some really interesting things about BGP and the structure of the Internet, so I melded the traceroute tool with an article sharing that knowledge.
I'm still hacking on this and I'm sure my code will manage to break somehow, so please let me know if you have any suggestions! :)
(Side note: why Rust? I don’t think programming language choice matters that much, but I wanted to quickly write a very dependable low-level program, and I really like Rust’s error handling primitives. Why do you care about this?)
Search for: "looking glass bgp" and you'll find some[1]. One of the first CGI programs I wrote nearly three decades ago (ugh...) was a Perl script that wrapped traceroute and streamed the results via server push[2]. Everything old is new again. :-)
That said, your site has a very nice presentation.
BTW, IPv4 TTL is de jure seconds even though it's de facto a hop count, since no router takes more than a second and the minimum decrement is 1 (except that middleboxes which wish to remain hidden won't decrement at all). Also, Linux/Unix traceroute by default uses UDP to a high-numbered (and usually closed) port for probe packets, since UDP historically is less likely to be dropped/filtered than ICMP.
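For anyone curious what such a UDP probe looks like, here is a minimal Python sketch. It assumes the conventional traceroute base port of 33434; the helper names `make_probe_socket` and `send_probe` are made up for illustration. Reading the ICMP replies, which typically needs a raw socket and elevated privileges, is left out.

```python
import socket

# Conventional first destination port for UDP traceroute probes.
TRACEROUTE_BASE_PORT = 33434


def make_probe_socket(ttl: int) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given IP TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    return sock


def send_probe(dest_ip: str, ttl: int, probe_index: int = 0) -> None:
    """Send one empty UDP probe toward a (hopefully closed) high port.

    Bumping the port per probe is one common way to tell replies apart.
    """
    with make_probe_socket(ttl) as sock:
        sock.sendto(b"", (dest_ip, TRACEROUTE_BASE_PORT + probe_index))
```

Calling `send_probe("192.0.2.1", ttl=1)` would emit a single probe that the first router on the path should discard.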
Aside: asking how traceroute works is one of my interview questions, most people don't know (if they do the question is no good) and many are unable to figure it out from first principles no matter how many questions I answer about TCP/IP. I still think being able to figure it out is a reasonable problem solving question.
1. e.g. https://www.bgplookingglass.com/
Curious what type of roles you interview for, are they networking-centric? Iirc this is CCNA-level material, I'd expect anyone working in networking to be able to describe how traceroute works. I've used it more as a smoke test question than a question that most people don't know.
Even seasoned network engineers get it wrong.
Traceroute is simple. Sure, there's lots you can do to enrich the data you receive (e.g., reverse DNS and geolocation), or you can send multiple sequences to identify equal-cost multipath. But these are not inherent or necessary to perform a traceroute.
And understanding why different protocols exhibit different behavior / observe different metrics, or why some nodes don't send ICMP TTL expired, is important. But that's more in line with what you call "using it to troubleshoot", which is not "how it works."
But "how traceroute works" is simple: First you send a packet with TTL=1, then you send a packet with TTL=2, and so on. That's it, that's how it works.
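That loop can be shown as a toy simulation with no real packets involved. The `path` list, the `probe` helper, and the `*` marker for silent hops are assumptions made up for illustration; a real implementation would send probes and listen for ICMP replies instead.

```python
from typing import List, Optional, Tuple


def probe(path: List[Optional[str]], ttl: int) -> Optional[str]:
    """Return who answers a probe sent with this TTL along a simulated path.

    `path` lists every hop in order and ends with the destination.
    A None entry models a router that never reports expired packets.
    """
    index = min(ttl, len(path)) - 1
    return path[index]


def traceroute(path: List[Optional[str]], max_ttl: int = 30) -> List[Tuple[int, str]]:
    """Send TTL=1, then TTL=2, and so on, until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder = probe(path, ttl)
        hops.append((ttl, responder if responder is not None else "*"))
        if ttl >= len(path):  # the probe lived long enough to reach the destination
            break
    return hops
```

With `path = ["192.0.2.1", None, "198.51.100.7", "203.0.113.9"]` the second hop shows up as `*` and the trace stops at the fourth probe.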
Some candidates throw up their hands immediately, which makes it a short interview. Some candidates already know, which makes it a useless question and we move on to other things. For everyone else, I think it's a good interview question.
1. *"What is a packet?"* - A packet is a data unit sent over a network, encapsulated within protocols like IP and TCP/UDP. It contains both the payload (actual data) and control information such as source and destination IP addresses. Packets enable efficient routing and reassembly at the destination in network communication.
2. *"What is a router?"* - A router is a device that forwards data packets between networks, operating at the network layer. It uses IP addresses, routing tables, and algorithms to determine the best path for packet forwarding, connecting different network types and managing traffic between internal and external networks.
3. *"How does a packet get to the destination?"* - A packet reaches its destination through routing. It passes through routers that consult their routing tables to forward the packet. The packet traverses multiple networks, with IP protocols guiding it towards the destination, ensuring error checking and congestion handling.
4. *"What if there's a routing loop?"* - In a routing loop, a packet is passed continuously between routers. The Time-To-Live (TTL) field in IP packets prevents endless loops by decrementing each time a packet passes a router. If TTL hits zero before the destination is reached, the packet is discarded to prevent network clogging. Routing protocols also have mechanisms to detect and prevent loops.
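The TTL answer in item 4 can be sketched as a small simulation. The `next_hop` table and router names are hypothetical; the only rule modelled is that each router decrements TTL by one and discards the packet when it reaches zero.

```python
from typing import Dict, Tuple


def forward(next_hop: Dict[str, str], start: str, dest: str, ttl: int) -> Tuple[str, str, int]:
    """Follow next_hop entries from the first router `start` toward `dest`.

    Returns (outcome, node where the packet ended up, number of forwards).
    """
    current = start
    forwards = 0
    while current != dest:
        ttl -= 1  # every router decrements before forwarding
        if ttl <= 0:
            # This router discards the packet (and would send ICMP time exceeded).
            return ("ttl_exceeded", current, forwards)
        current = next_hop[current]
        forwards += 1
    return ("delivered", current, forwards)
```

With a loop such as `{"A": "B", "B": "A"}` the packet bounces until the TTL runs out instead of circulating forever.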
Have you never been asked a question in an interview that starts a discussion or has follow-up questions? In isolation it's not a good question, true for most questions, but to initiate something deeper it's good. After the initial explanation of how it works you can get into how you have used it, what kind of issues you have solved with it. Then maybe look at an actual case and give your interpretation of the data. You could get into router hardware architectures, what the control/data planes are, why some drops in the output are not a problem and when they are, ECMP, why bidirectional traceroutes are useful, routing topology, flapping routes, etc.
Not every router will return a TTL expired of course, and many ISPs route IPv4 traffic via RFC 1918 addresses nowadays, so you can get big gaps in their networks, but that applies whether your outbound packet is ICMP, TCP, UDP, or any other type.
Worth remembering that packets with different source and destination ports can route via different paths, so sometimes you need to be aware of the entire ip/port/protocol for src/dst and configuring them properly. Nat can cause problems there too when it changes your source ports.
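A rough sketch of why the ports matter: multipath routers commonly hash the flow's addresses, protocol, and ports to pick among equal-cost next hops. Real devices use their own vendor-specific hash functions; SHA-256 and the `pick_next_hop` name here are stand-ins chosen only for illustration.

```python
import hashlib
from typing import Sequence


def pick_next_hop(src_ip: str, dst_ip: str, protocol: str,
                  src_port: int, dst_port: int,
                  next_hops: Sequence[str]) -> str:
    """Choose one of several equal-cost next hops from the flow's 5-tuple."""
    flow = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    bucket = int.from_bytes(hashlib.sha256(flow).digest()[:4], "big")
    return next_hops[bucket % len(next_hops)]
```

The same 5-tuple always maps to the same next hop, while changing only the source port (as NAT may do) can move the flow onto a different path.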
Here's a link to their videos: https://www.youtube.com/playlist?list=PLO8DR5ZGla8iVN2v3UKkR...
This is the ras' "Troubleshooting with Traceroute" tutorial: https://www.youtube.com/watch?v=WL0ZTcfSvB4&list=PLO8DR5ZGla...
Slides: https://archive.nanog.org/meetings/nanog47/presentations/Sun...
(There are one or two other traceroute tutorials, not sure how different they are from the above, e.g. https://www.youtube.com/watch?v=4dUqVlZ6trU&list=PLO8DR5ZGla... ).
We use the Internet every day. I like folks to have an idea how it works, to be intellectually curious, and to be generally informed about the technology they use.
The traceroute man page explains how it works.
FWIW, my CS degree included a networking class.
...huh. I'm realizing my CS department was pretty weak. but tbf I knew how traceroute worked by high school, so it didn't matter.
thanks! ive been wondering about this for ages!
It just says that outputTTL is (inputTTL - 1). With some exceptions.
[edit: I missed that that RFC is for MPLS but would be interested in your comment anyway; the definitive version seems to be https://datatracker.ietf.org/doc/html/rfc1122]
> The time is measured in units of seconds, but since every module that processes a datagram must decrease the TTL by at least one even if it process the datagram in less than a second, the TTL must be thought of only as an upper bound on the time a datagram may exist.
https://www.rfc-editor.org/rfc/rfc791.html
The equivalent field in IPv6 is named hop limit in recognition of how the TTL field is used in practice with IPv4:
And WOAH, that's really interesting about TTLs. Thanks for sharing! That's awesome and terrifying!
TBF it's neither TCP nor IP, but ICMP :-)
Edit: I meant to write "neither TCP nor UDP" -- but even though I could make the correction I'll leave my error in place.
Huh. TIL. But how would a router know how long a packet took to traverse the hop? The packet doesn't have the information to figure that out … were they expecting people to configure routers to know how far away the previous hop was?
Also >1 second is most of the way to moon. (Yes, yes, speed in a non-vacuum is blah blah and switches blah buffers blah…)
> The time to live is set by the sender to the maximum time the segment is allowed to be in the internetwork system. If the segment is in the internetwork system longer than that the segment should be destroyed. This field should be decreased at each point that the internet header [is] processed to reflect the time spent processing the segment. Even if no local information is available on the time actually spent, the field should be decremented. The time is measured in units of seconds (i.e. the value 1 means one second). Thus the maximum time to live is 255 seconds or 4.25 minutes.
A router would have to keep track of when each packet enters and leaves and then round to the nearest number of seconds.
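As a sketch of the quoted rule: decrement by the time held, but by at least one. The function name and the choice to count whole seconds with a floor are assumptions, since the quoted text does not say how to round.

```python
import math


def ttl_after_hop(ttl: int, seconds_held: float) -> int:
    """Apply the 'at least one, otherwise seconds spent' decrement to a TTL."""
    decrement = max(1, math.floor(seconds_held))
    return max(0, ttl - decrement)


# The 8-bit field tops out at 255, which is the 4.25 minutes in the quote.
MAX_TTL_MINUTES = 255 / 60
```

Because practically every hop holds a packet for far less than a second, the decrement is almost always exactly one, which is how the field ends up behaving as a hop count.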
- I wonder if you could get more accurate results by using TCP or UDP instead of ICMP. I think traditional traceroute has an option to use UDP, mtr [1] can use TCP or UDP, and tcptraceroute [2] can use TCP.
- This would be a perfect fit for some Talking Heads references. "And you may ask yourself, well, how did I get here?" [3]
[1] https://github.com/traviscross/mtr
[2] https://linux.die.net/man/1/tcptraceroute
[3] https://en.wikipedia.org/wiki/Once_in_a_Lifetime_(Talking_He...
2. Wayy ahead of you, check for HTML comments :))
When a network device is busy, ICMP may be dropped entirely.
ICMP is a great tool to establish baseline connectivity, assuming the device responds to it. TCP will provide more accurate results (or UDP given the device responds to UDP packets) if you know a specific port is open.
traceroute uses UDP by default. tracert.exe only uses ICMP.
Rather, what is going on is that most network device data planes are going to punt the "time-to-live exceeded" case for _all_ IP packets to the management plane, and so it will have the problems you describe (latency, throttling).
So why is ICMP still worse? First of all, it lacks the UDP/TCP "port" information that allows it to be flow hashed through different routes. Secondly, if the route hits any firewalls, they often have very different configuration for ICMP than they do for the TCP/UDP that they are configured to let through.
In IPv4, ICMP is used to send TTL exceeded messages regardless of what upper level protocol was used in the packet that expired in transit. UDP, TCP, ICMP, doesn’t matter.
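To make that concrete, here is a hedged sketch of parsing an ICMP time exceeded message (type 11, code 0). The error carries the expired packet's IP header plus at least the first 8 bytes of its payload, which is how a probe can be matched to its reply whatever protocol it used. The function name and the returned dictionary shape are made up for illustration.

```python
import ipaddress
import struct

ICMP_TIME_EXCEEDED = 11       # ICMP type
TTL_EXCEEDED_IN_TRANSIT = 0   # ICMP code within that type


def parse_time_exceeded(icmp: bytes):
    """Describe the expired packet quoted inside an ICMP message, or return None.

    `icmp` starts at the ICMP header (the outer IP header is already stripped).
    """
    if len(icmp) < 8 + 20:
        return None
    icmp_type, icmp_code = struct.unpack("!BB", icmp[:2])
    if icmp_type != ICMP_TIME_EXCEEDED or icmp_code != TTL_EXCEEDED_IN_TRANSIT:
        return None
    inner = icmp[8:]                      # quoted IP header of the expired packet
    header_len = (inner[0] & 0x0F) * 4    # IHL is counted in 32-bit words
    if header_len < 20 or len(inner) < header_len:
        return None
    info = {
        "protocol": inner[9],             # 1 = ICMP, 6 = TCP, 17 = UDP
        "src": str(ipaddress.IPv4Address(inner[12:16])),
        "dst": str(ipaddress.IPv4Address(inner[16:20])),
    }
    transport = inner[header_len:header_len + 8]
    if info["protocol"] in (6, 17) and len(transport) >= 4:
        # TCP and UDP both begin with source port, then destination port.
        info["src_port"], info["dst_port"] = struct.unpack("!HH", transport[:4])
    return info
```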
The slides you linked to are correct, in general, about router slow path behavior, but that isn’t what makes UDP ping “more reliable”. It’s “more reliable”, theoretically, because UDP is less likely to be subject to indiscriminate filtering than ICMP.
Additionally, generally only ICMP sent to or generated from a router goes through the slow path. An IP packet being sent through a router generally goes through that device’s fast path, regardless of the payload type.
No matter which protocol you use, the packet only has one destination, the destination IP, so it does not go through the slow path on any hop on the way just because it's ICMP. When a packet hits the TTL limit on a hop it will go through the slow path, as the CPU will generate the response ICMP. The protocol used for traceroute makes no difference here.
You can use the -P flag to set the protocol (TCP, UDP, ICMP Other, GRE) and -p to set the port number, if applicable.
Microsoft should have stuck to the BSD traceroute. I don't know why they "invented" tracert.exe (which I believe is _based_ on BSD traceroute, like most of those early network tools in Windows). I wonder if this was a Winsock (or STREAMS[0]!) limitation.
[0] https://web.archive.org/web/20151229084950/http://www.kuro5h...
[0] https://learn.microsoft.com/en-us/windows/win32/winsock/tcp-...
Probes to hops beyond will virtually always go through the fast data plane.
<!-- This is not my beautiful house. -->
<!-- This is not my beautiful wife. -->
The kids are all right. :) (Now if only I could figure out how to enable traceroute to work on each hop from a given workstation through a corporate Cisco access switch, core switch, BGP tunnel to AWS transit gateways, and eventually land at the VPC route table on the EC2 instance, then I might actually be able to call myself a network guy.)
- “L3 Switches”, which in essence are routers that do not have discrete ports and instead have a closely coupled switch. Alternatively, the same thing can be viewed as a switch with a powerful enough control-plane CPU that it can do routing (pure L2 managed switches usually have ridiculously underpowered CPUs). This can be and often is combined with some amount of offloading the routing and even more complex upper-layer processing into hardware.
- things like Shortest Path Bridging, which uses an L3-style routing protocol (IIRC it is IS-IS) in order to build L2 FIB tables for L2 switches. The idea there is to not only have (R)STP-like redundant paths for ethernet, but to use the full capacity of such paths when it is available.
On the hardware level it is mostly about accelerating the fast path, where the hardware FIB contains some cache of bit patterns seen in the received frame, where it should be forwarded, and maybe how it should be rewritten. Packets that are not matched by the HW FIB are passed to the CPU to be processed. In a “dumb” bridge (i.e. an L2 switch), such frames would simply be broadcast to all ports.
While the general idea is the same, there is a material difference in silicon complexity between matching the destination MAC of an ethernet frame and matching arbitrary bit patterns somewhere in the TCP header of an IPv6 packet.
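The “dumb” bridge case can be sketched in a few lines. This is a software toy, not how switch silicon works; the class and method names are hypothetical, and the flood here skips the port the frame arrived on.

```python
from typing import Dict, Iterable, List


class LearningBridge:
    """Toy L2 bridge: learn source MACs, forward on a hit, flood on a miss."""

    def __init__(self, ports: Iterable[int]) -> None:
        self.ports: List[int] = list(ports)
        self.table: Dict[str, int] = {}  # MAC address -> port it was last seen on

    def handle_frame(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Return the list of ports the frame should be sent out of."""
        self.table[src_mac] = in_port          # learn where the sender lives
        out_port = self.table.get(dst_mac)
        if out_port is None:
            # Unknown destination: send it everywhere except where it came from.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []                          # destination is on the same segment
        return [out_port]
```

The first frame between two hosts is flooded; once both have spoken, traffic between them goes out of exactly one port.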
Unfortunately so many nodes ignore traceroute packets that it basically said my exit node connected to Linode and then Linode connected to your computer. I have the same experience with forward traceroutes, my router replies, my server replies, and if I'm lucky, one node in my ISP's network. The rest is locked up tight.
traceroute bad.horse
https://itsfoss.com/star-wars-linux/
But when I tested just now it didn't work for me, so your mileage may vary.
tracepath -m128 bad.horse
works just as well. Thanks for the smile to all in this thread from here up!
Keep up the amazing work!
A technical nitpick though: Routes can be asymmetric—going across one path in one direction and another for the opposite. This means that your tool potentially doesn't show the route packets from the user took to reach your server, but rather the route packets took from your server to reach the user. I believe that querying with BGP looking glass tools would allow you to construct the route in either direction, but it is maybe a bit less cool looking than the real-time traceroute that is a result of actual traffic.
I'm posting here because it might be interesting for you. How it was built: https://presentations.clickhouse.com/meetup85/app/index.html
I was reminded of working at a company in 1996... we had windows 95(!) with Trumpet WinSock and a dial-up modem (24k, IIRC). I was just learning how all this stuff worked and fit together. I stumbled on a traceroute screen that would slowly drip out each hop and... it was magical to me. Suddenly realizing the idea of 'a big global network' I'd read about was actually... right at my fingertips, and I could see which computers my traffic was being routed through... that kept me up at night for a while. Not sure I'll say it was life-changing, but it sort of felt like it for a bit at that time :)
The narrative based traceroute in green is something I’ve never seen before. How many providers, like CDNs, did you take the time to map into a narrative?
This feels targeted towards folks who kind of already understand computers. It'd be cool to repackage this in a way that can show non-technical people the stuff they take for granted. The mountains that move on a user’s behalf under every keystroke are humbling.
One of my favorite books on this topic is Interconnections: Bridges, Routers, Switches, and Interconnections by Radia Perlman if you haven’t come across it.
Easier said than done, but don't feel like you have to provide a constant flux of interesting things. That kind of pressure ends up being toxic pretty quickly. Do what you enjoy, if it hits an audience, great, but don't feel like you have to make it happen.
Total side note, your other post (about cpu.land) is at exactly 1337 upvotes now :D https://share.cleanshot.com/ktVWL2pr
How fitting.
https://news.ycombinator.com/item?id=37062422
I'm 17 and wrote this guide on how CPUs run programs (3 months ago)
Kudos and carry on!
It's actually surprisingly easy to get an ASN for yourself and speak BGP. If you find building something like this tool interesting, you should give it a try. I wrote an introduction of sorts earlier (https://qt.ax/asn) if that interests you.
tl;dr in my experience the networks traversed are usually very similar, and the content is relevant and interesting either way around
https://archive.nanog.org/sites/default/files/traceroute-201...
And yes, traffic is routing different ways from my Cairo office to my UK core: London -> Cairo is direct and still suffering massive loss, Cairo -> London is now routing via NTT and seems fine. If they haven't fixed it by tomorrow I might have to change some local prefs.
It's actually impossible. Responses are essentially free-form (if the server responds at all). I tried my hand at this; you can make an ad-hoc "parser" that works for 90% of addresses/domains (or you could, ten years ago when I tried). But the remainder are intractable.
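For a flavour of what such an ad-hoc "parser" looks like, here is a deliberately naive sketch. It assumes `key: value` lines and treats lines starting with `%`, `#`, or `>` as comments; both are guesses that hold for some servers and fail for others, which is exactly the problem described above. The function name is made up.

```python
from typing import Dict, List


def parse_whois(text: str) -> Dict[str, List[str]]:
    """Collect 'key: value' lines from a WHOIS response into lower-cased keys."""
    fields: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "%#>":
            continue                      # blank line or comment-style line
        key, sep, value = line.partition(": ")
        if not sep or not value.strip():
            continue                      # free-form text, not a field
        fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields
```

Anything that doesn't fit the pattern is silently dropped, so a response in an unexpected layout simply comes back as an empty or partial dictionary.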
Nowadays it's much worse; nearly everything is hidden behind privacy shields, which purport to protect PII. But WHOIS records aren't supposed to contain personal information; they're supposed to contain contact information for network operators.
This is ICANN's doing, I'm afraid. ICANN had a rule that networks should provide public WHOIS servers. They never enforced the rule, and now they've scrapped it.
Doesn't a whois have to include email, phone number and physical address? For a company that's not really PII, but I don't understand how it wouldn't be considered personal information in the whois for my personal website.
Not everything runs an RDAP server, though; I do wish ICANN/IANA or whoever would enforce that.
> Nowadays it's much worse; nearly everything is hidden behind privacy shields, which purport to protect PII. But WHOIS records aren't supposed to contain personal information; they're supposed to contain contact information for network operators.
Network operator info can also be PII. My info is PII, but I have a domain name, so putting my info into WHOIS is putting PII into WHOIS.
The privacy guard just forwards everything to me, minus spam.
(If it's a corporation, I don't think there's a good reason to permit privacy guards. But not all domains are owned by BigCos, yet.)
RDAP has the benefit of being JSON, but even then it’s a reaaally crappy format. For example, contacts are represented by the jCard pseudo-standard, which is a JSON version of vCard, and it’s completely awful and hard to deal with. Basically instead of a nice JSON object, it’s arrays in arrays in arrays…
RDAP should get better in the future versions, but I’m not sure registrars will follow in good faith because the initial specs were a bit of a shit show.
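To show the arrays-in-arrays shape of jCard, here is a small sketch that flattens one into a dictionary. Each property is a list of `[name, parameters, type, value...]`; the helper name and the sample contact are invented for illustration, and parameters and types are simply discarded.

```python
from typing import Any, Dict, List


def jcard_to_dict(vcard_array: list) -> Dict[str, List[Any]]:
    """Flatten a jCard ["vcard", [properties...]] array into name -> values."""
    if len(vcard_array) != 2 or vcard_array[0] != "vcard":
        raise ValueError("not a jCard array")
    result: Dict[str, List[Any]] = {}
    for prop in vcard_array[1]:
        name, _params, _value_type, *values = prop
        value = values[0] if len(values) == 1 else values
        result.setdefault(name, []).append(value)
    return result


# Hypothetical contact, shaped like the vcardArray of an RDAP entity.
SAMPLE = ["vcard", [
    ["version", {}, "text", "4.0"],
    ["fn", {}, "text", "Example Registrant"],
    ["email", {}, "text", "hostmaster@example.com"],
    ["adr", {}, "text", ["", "", "123 Example St", "Springfield", "", "00000", "US"]],
]]
```

`jcard_to_dict(SAMPLE)["fn"]` gives `["Example Registrant"]`, while the address is still a positional list whose meaning depends on index.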
ccTLDs don't. Proof by counter-example: .co, .io.
I don't know if that can be enforced, but it'd make my life easier.
(Technically, too, AIUI, it's the registries that run the RDAP servers, not the registrars.)
Yeah, the JSON is a bit of a crap-shoot, but I think it's maybe marginally better than trying to parse raw WHOIS text…? IDK. Probably depends on the exact datum you want to pull out of it.
Could generative AI help out these days? "Here's a whois, give me [the info I want]:"
https://research.cs.washington.edu/networking/astronomy/reve...
paper: http://www.cs.washington.edu/homes/ethan/papers/reverse_trac...
If you think about how IP works, you’ll see that this doesn’t particularly matter but that it can make understanding the routing more difficult.
Boise State University and the University of Idaho are two schools at opposite ends of the state of Idaho. UIdaho in the north is close to Spokane, and almost all of its connectivity comes from Seattle. Boise is closer to Salt Lake, so most of its connectivity comes via Portland or Salt Lake City. The middle of the state between the two schools is mountains, with very, very little large-scale connectivity at all, except there was a small line way back in the day because the UofIdaho had remote classrooms in the southern part of the state. Sometime in the late 90's a network engineer from BSU and one from UofI realized that they both had switches and routing kit in the same building, so they ran an ethernet cable between them.
The effect was catastrophic. It turns out that both networks happily started announcing BGP routes to each other, which in turn announced the connection to the internet as a whole. Suddenly there was a very short jump between networks in Seattle and networks in Salt Lake City. That poor little T1 (iirc) was absolutely getting saturated. But, interestingly, only in one direction. See, Boise announced the route but Idaho didn't, so the traffic was effectively only failing in one direction.
Needless to say, the cable was disconnected, and years later when I worked at the UofIdaho it was still well known that the two networks shouldn't ever be connected again! (Which was ironic because I was working on a program to set up I2 at both universities.)
I'm working on it right now and hopefully will be working better soon! In the meantime I've increased timeouts so loading will be longer but it should work better.
This article from APNIC explains more about mtr and how to read it (plus some interesting details about how MPLS can obscure true paths)
https://blog.apnic.net/2022/03/28/how-to-properly-interpret-....
Also worth noting: It's also sometimes useful to trace with UDP, and many routers will drop ICMP selectively under strain.
Nice article, and excellent presentation!
I was wondering if we'd address this. That was my first thought - how can you do this without initiating ICMP from my side?
> Does running a “reverse traceroute” sacrifice accuracy? A little, actually.
> As I said when describing Internet routing, each device a packet traverses makes a decision about where to send the packet next until it reaches its final destination. If you send a packet in the other direction, the devices might make different routing decisions… and if one device makes one different decision, the rest of the path will certainly be different.
> This reverse traceroute is still helpful. The paths will be roughly the same, likely differing only in terms of which specific routers see your packet.
Sure... But it's pretty common for multi pathed AS' to traverse in all sorts of different ways. My experience (non residential) is that more often than not, the trace and reverse trace were different. Your upstreams and my upstreams have very different commercial agreements, and both peer and transit in multiple places.
Still cool though, well done!
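The asymmetry is easy to reproduce in a toy model where every node picks its own next hop per destination. The node names and the `EXAMPLE_TABLES` contents below are entirely made up; the point is only that the forward and reverse paths come from independent decisions.

```python
from typing import Dict, List

# Hypothetical per-node forwarding tables: node -> {destination: next hop}.
EXAMPLE_TABLES: Dict[str, Dict[str, str]] = {
    "user":      {"server": "isp_a"},
    "isp_a":     {"server": "transit_x", "user": "user"},
    "transit_x": {"server": "server", "user": "isp_a"},
    "transit_y": {"server": "server", "user": "isp_a"},
    "server":    {"user": "transit_y"},
}


def trace_path(tables: Dict[str, Dict[str, str]], src: str, dst: str,
               max_hops: int = 16) -> List[str]:
    """Follow each node's own next-hop choice from src until dst is reached."""
    path = [src]
    current = src
    while current != dst:
        if len(path) > max_hops:
            raise RuntimeError("gave up: too many hops")
        current = tables[current][dst]
        path.append(current)
    return path
```

Here the user reaches the server through `transit_x`, but the server's replies come back through `transit_y`, so a trace run from the server side shows a transit network the user's packets never touched.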
How did you manage to tilt the section header text? I've not seen that done before.
transform: rotate(-2deg) skew(-2deg);
transform-origin: bottom left;
 9  a23-203-147-39.deploy.static.akamaitechnologies.com (23.203.147.39)  36.707 ms  36.783 ms  40.110 ms
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
I announce only IPv6, because I don't currently have access to any v4 blocks, they are expensive and I have little need for one.
Then one packet will get all the echoes in one go instead of having to send a tirade of packets with increasing TTL values.
If you care about time rather than packet count, you can send packets with all reasonable TTL values at once.
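A sketch of the all-at-once idea, using threads to fire every TTL in parallel. The `probe` argument is a hypothetical callable that sends one probe with the given TTL and returns whoever answered (or None); nothing here touches the network on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple


def trace_all_at_once(probe: Callable[[int], Optional[str]], dest: str,
                      max_ttl: int = 30) -> List[Tuple[int, Optional[str]]]:
    """Launch probes for TTL 1..max_ttl concurrently, then trim at the destination."""
    ttls = range(1, max_ttl + 1)
    with ThreadPoolExecutor(max_workers=max_ttl) as pool:
        replies = list(pool.map(probe, ttls))  # results come back in TTL order
    hops = []
    for ttl, reply in zip(ttls, replies):
        hops.append((ttl, reply))
        if reply == dest:
            break  # higher TTLs all reached the destination too
    return hops
```

The total wait is roughly one timeout rather than one per hop, at the cost of sending probes for TTLs that turn out to be unnecessary.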
Just use a better client. Takes about 3 seconds to do an mtr -b over 15 responding hops from a server in London to something like 43.249.179.0 in the south pacific.
Makes me imagine an online programming textbook that could walk you through what your own custom code is doing. Very cool!
I guess it's supposed to do something like this: https://dnschecker.org/online-traceroute.php
A Practical Guide to (Correctly) Troubleshooting with Traceroute by Richard A Steenbergen explains it well
Holy shit. This girl's going places. I just skimmed https://kognise.dev and saw that in addition to the deep understanding of TCP/IP and all 7 layers of the OSI model she appears to possess, she also does front- and back-end development, embedded hardware, mobile apps, and compilers. She also rock climbs, can pilot a Cessna (all by herself), builds robots, and plays (and composes music for) the cello (since she was 5 years old, apparently).
Do I need to keep going? This is nothing short of incredible. If I did 1/10 of the things this kid's already done by the time I kick the bucket I would have lived a full life.
.text h3 {
  margin-top: 60px;
  margin-bottom: -4px;
  transform: rotate(-2deg) skew(-2deg);
  transform-origin: bottom left;
}
Can I trace the location of an AS?