A steam locomotive from 1993 broke my yarn test
165 points
1 day ago
| 18 comments
| blog.cloudflare.com
| HN
bouke
1 day ago
[-]
So the real problem is that Jest just executes whatever `sl` resolves to. The fix they intend to release doesn't address that; it only tries to recognise the train steaming through. How is this acceptable behaviour from a test runner? It looks like a disaster waiting to happen. What if I have `alias sl='rm -rf /'`, as one typically wants to have such a command close at hand?
reply
tlb
1 day ago
[-]
Exec doesn't know about shell aliases. Only what's in the $PATH.

I liked the shell in MPW (Macintosh Programmer's Workshop, pre-NeXT) where common commands had both long names and short ones. You'd type the short ones at the prompt, but use the long, unambiguous ones in scripts.

reply
Kwpolska
1 day ago
[-]
PowerShell has long commands and short aliases, but the aliases can still shadow executables, e.g. the `sc` alias for `Set-Content` shadows `sc.exe` for configuring services. And you only notice when you see no output and weird text files in the current working directory.
reply
szszrk
1 day ago
[-]
The networking crowd probably thinks it's obvious, because of things like the Cisco CLI, or even Mikrotik. Or the "ip" CLI as well, I guess.

I never bothered to check what the origin of that pattern is.

reply
hnlmorg
1 day ago
[-]
I've taken entire web farms offline due to an unexpected expansion of a command on a Cisco load balancer.

The command in question was:

    administer-all-port-shutdown 
(Or something to that effect; it’s been many years now)

And so I went to log in via serial port (like I said, *many* years ago, so this device didn’t have SSH) and didn’t get the prompt I was expecting. So I typed the user name again:

    admin
And shortly afterwards all of our alarms started going off.

The worst part of the story is that this happened twice before I realised what I’d done!

I still maintain that the full command is a stupid name if it means a phrase as common as “admin” can turn your load balancer off. But I also learned a few valuable lessons about being more careful when running commands on Cisco gear.

reply
skykooler
1 day ago
[-]
Theoretically you could do this in Linux by calling /usr/bin/sl or whatever - but since various distros put binaries in different places, that would probably cause more problems than it could solve.
reply
Tractor8626
1 day ago
[-]
No. This is not the real problem. There is nothing you can do if your 'bash', 'ls', 'cat', 'grep', etc. do something they're not supposed to do.

Proper error handling would be helpful though.

reply
Etheryte
1 day ago
[-]
The fact that Jest blindly calls whatever binary is installed as `sl` is downright reckless and that's an understatement. If they need the check, a simple way to avoid the problem would be to install it as a dependency, call `require.resolve()` [0] and Bob's your uncle. If they don't want the bundle size, write a heuristic, surely Meta can afford it. Blindly stuffing strings into exec and hoping it works out is not fine.

[0] https://nodejs.org/api/modules.html#requireresolverequest-op...

reply
Joker_vD
1 day ago
[-]
"That's just, like, your opinion, man". There is another school of thought that postulates that an app should use whatever tools exist in the ambient environment that the user has provided the app with, instead of pulling and using random 4th-party dependencies from who knows where. If I symlinked e.g. "find", or "python3", or "sh", or "sl" to my weird interceptor/preprocessor/trapper script, that most likely means that I do want the apps to use it, damn it, not their own homebrewed versions.

> a simple way to avoid the problem would be to install it as a dependency

I once saw a Makefile that had "apt remove -y [libraries and tools that somehow confuse this Makefile] ; apt install -y [some other random crap]" as a pre-install step, I kid you not. Thankfully, I didn't run it with "sudo make" (as the README suggested) but holy shit, the presumptuousness of some people.

The better way would have been to have "Sapling CLI" explicitly declared as a dependency, and checked for, somehow. But as the whole history of dev experience shows, that's too much to ask of people, and dev containers are, sadly, the sanest and most robust way to go.

reply
Etheryte
1 day ago
[-]
I think where our opinions differ is what boundaries this logic should cross. When I'm in Bash-land, I'm happy that my Bash-isms use the rest of what's available in the Bash env. When I'm in Node, likewise, as this is an expected and desirable outcome. Where this doesn't sit right with me is when a Node-land script crosses this boundary and starts mucking around with things from a different domain.

In general, I would want everything to work by the principle of least surprise, so Node stuff interacts with Node dependencies, Python does Python things, Bash does Bash env, etc. If I need one to interact with the other, I want to be explicit about it, not have some spooky action at a distance.

reply
Joker_vD
1 day ago
[-]
Completely understandable, it's just... it's just not in the cards. A large part of the UNIX ecosystem has not, historically, been kind to this view. Remember autotools/autoconf, makefiles with DESTDIR, and all that similar jazz? People genuinely proposed that stuff as the solution for the management of ambient dependencies. And it takes just one slip-up of "shelling out" (hopefully it's actually "forking off", not literally shelling out) for all kinds of funny business to reappear, and don't even start on the /lib and .so management.
reply
blueflow
1 day ago
[-]
What else should the test runner do?
reply
pavel_lishin
1 day ago
[-]
There must be a better way to tell if a repo is a Sapling repo than by running some arbitrary binary, right?
reply
Symbiote
1 day ago
[-]
For Git one could look for .git/config. There must be something equivalent.
reply
remram
12 hours ago
[-]
.git will be a file and not a directory, if you are in a submodule or a worktree.

You just illustrated why trying to assume the functionality of a third-party app instead of calling it creates even more bugs.
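
For what it's worth, a file-based check can cover the worktree/submodule case if it tests for existence rather than for a directory. A minimal sketch in Python, assuming a marker entry in the working copy is all you want to detect; `find_repo_root` is a hypothetical helper, and the marker name Sapling would need is not confirmed here, so it is left as a parameter:

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, marker: str = ".git") -> Optional[Path]:
    """Walk upward from `start`; return the first directory that contains `marker`.

    exists() is used instead of is_dir() on purpose: in a worktree or a
    submodule, `.git` is a regular file pointing at the real git directory.
    """
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    return None


if __name__ == "__main__":
    # Prints the enclosing checkout root, or None when there is none.
    print(find_repo_root(Path.cwd()))
```

This only answers "is there a marker here"; anything deeper about the repository still means calling the tool itself.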

reply
pasc1878
1 day ago
[-]
Use the full path of sl and don't rely on $PATH, the same way cron and macOS GUI apps do, I assume for this exact reason.
reply
stonegray
1 day ago
[-]
Is the full path guaranteed? For example homebrew, snap, and apt might all put it in different places. $PATH is a useful tool.
reply
pasc1878
1 day ago
[-]
But not in this case where you have two executables with the same name.

You have to know where the tool was installed or else be certain no other sl is on your path.

reply
Joker_vD
1 day ago
[-]
How would knowing the full path help you anyway? It's either in "/usr/bin/sl", or "/usr/local/bin", or "~/.local/bin", now what?

By the way, believe it or not, POSIX compliance requires existence of only two directories (/dev and /tmp) and three files (/dev/console, /dev/null, and /dev/tty) on the system; everything else is completely optional, including existence of /bin, /etc, and /usr.

reply
pasc1878
23 hours ago
[-]
Because you know what you installed and so which sl to use.
reply
Joker_vD
23 hours ago
[-]
But the sl is not invoked by you. It is invoked by some npm module (a 5-times-removed dependency from any side) which hopes that either there is "sl" in the $PATH and it is the Sapling CLI, or there is no "sl" in the $PATH. This module can't use absolute paths because it does not know how the end user's system looks.
reply
pasc1878
18 hours ago
[-]
In that case it is a large security risk, as well as not working, as per the article.
reply
Joker_vD
24 minutes ago
[-]
A program invoking some other program that the user themselves has consciously installed on their system (and put into the PATH) is not a security risk per se; it's literally the UNIX Way™ working as intended.
reply
skipants
1 day ago
[-]
What if the full path is just `/usr/bin/sl`?
reply
pasc1878
1 day ago
[-]
Then you get the sl there, which could be the correct one.
reply
charcircuit
1 day ago
[-]
Finding the full path of sl requires looking at $PATH
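
To make that concrete, here is roughly what the lookup amounts to, as a Python sketch. `resolve_on_path` is a hypothetical name; the standard library's `shutil.which` does the same job. Empty PATH entries (which POSIX treats as the current directory) are skipped here for simplicity:

```python
import os
from typing import Optional


def resolve_on_path(name: str, path: Optional[str] = None) -> Optional[str]:
    """Return the first executable called `name` found in the PATH-style string `path`."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # simplification: ignore empty entries
        candidate = os.path.join(directory, name)
        # First match wins; later directories are never consulted.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(resolve_on_path("sl"))
```

Whichever `sl` sits in the earliest directory is the one you get, and nothing in the lookup says which project it belongs to.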
reply
pasc1878
1 day ago
[-]
Not in this case, as then you find the wrong sl; you need to know where the correct sl was installed.
reply
GTP
1 day ago
[-]
Just from the title, I suspected that Steam Locomotive had something to do with it. So I quickly glanced through the article up to the point where the locomotive shows up. Sometimes there's the idea hanging in my mind to make a version called Slow Locomotive, where the train slows down every time you press ctrl-c.
reply
dullcrisp
1 day ago
[-]
If you press ^Z does it stop entirely?

And do these sorts of ideas ever get you into trouble?

reply
throwanem
1 day ago
[-]
I once reimplemented in Perl Nethack's logic for phase-of-moon and Friday 13th computation and notification, and added the resulting cute little script to the root .profile on our consulting firm's main web hosting boxes.

I didn't get fired when my boss found it by surprise a couple months (and lunar cycles) later, but I did learn a valuable lesson about how one may wisely limit one's exercise of whimsy.

Google took a few years more to achieve the same discovery, as I recall, but presumably this has to do with pedagogical methods involving not as many ex-sergeants.

reply
GTP
21 hours ago
[-]
> If you press ^Z does it stop entirely?

Great idea, if I ever end up doing this I will steal it :D

> And do these sorts of ideas ever get you into trouble?

Not so far, but an idea is never a problem in itself. The problem can be the context. I don't see any issue in publishing a project like this on GitHub, while I see how I could get in trouble if I install it on a corporate server.

reply
fifticon
1 day ago
[-]
As a systems programmer employed for 30+ years, when I read a story like this, I get angry at the highly piled brittle system, not at the guy having sl installed. I am aware there exists a third option of not getting angry in the first place, but I hate opaque, non-robust crap. This smells like everything I hate about front-end tooling: ignorance and arrogance in perfect balance.
reply
ericmcer
1 day ago
[-]
What would you have done differently? They were dependent on SL (which is a facebook source control system written in C) but the user had overwritten the expected path with a shell script. That is not something most engineers would build around... "what if the user is overwriting the path to dependencies with nonsense shell scripts?".

It doesn't feel like something that is entirely the Jest maintainers' fault. I am not sure why Jest needs a source control system, but there are probably decent reasons.

Like if I overwrite `ls` to a shell script that deletes everything on my desktop and then I execute code you wrote that relies on `ls` are you to blame because you didn't validate its behavior before calling it?

reply
MD87
1 day ago
[-]
The difference is that `ls` is specified in POSIX and everyone has roughly the same expectations of what it does.

Nothing specifies what a binary called `sl` does. The user didn't "overwrite" anything. They just had an `sl` binary that was not the `sl` binary Jest expects. Arguably they had the more commonly known binary with that name.

reply
mmlb
1 day ago
[-]
Use the lessons learned from those before us in less heterogeneous days, aka inspect the binaries you're going to call out to for fitness. Things like "check if grep is gnu or bsd" or "check if sl is sapling or steamlocomotive".

I've done that a bit to deal with macOS's crippled bash, for example.
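
A rough sketch of such a fitness check in Python, in case it helps. `output_mentions` is a hypothetical helper; the timeout and the closed stdin are there so that a program which animates or waits for input cannot hang the caller:

```python
import subprocess
import sys
from typing import Sequence


def output_mentions(command: Sequence[str], marker: str, timeout: float = 2.0) -> bool:
    """Run `command` and report whether it exits cleanly and prints `marker`."""
    try:
        result = subprocess.run(
            list(command),
            stdin=subprocess.DEVNULL,      # never wait on the terminal
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,      # version banners sometimes go to stderr
            timeout=timeout,               # give up on anything that keeps running
            encoding="utf-8",
            errors="replace",
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, not executable, or still running after the timeout.
        return False
    return result.returncode == 0 and marker in result.stdout


if __name__ == "__main__":
    # Demo against the running interpreter, which is certain to be installed.
    print(output_mentions([sys.executable, "--version"], "Python"))
```

For the case in the article the call would be something like `output_mentions(["sl", "--version"], "Sapling")`. That Sapling's version output contains that word is an assumption to verify before relying on it.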

reply
ploxiln
1 day ago
[-]
jest (or whatever was trying to auto-detect a "sapling" repo) should take explicit configuration to enable "sapling" or "mercurial" or whatever integration. And not try to run "sl" 16+ times in various modules/threads trying to auto-detect it.

"automagic" things trying to be easy and helpful is really a significant source of my stress fixing software these days.

reply
sixothree
1 day ago
[-]
I hate to say it but choosing to name something sl in the first place is about as arrogant as you can get. I just can’t understand the world in which sl was an acceptable name to use much less an acceptable executable to have a dependency on.
reply
Tractor8626
1 day ago
[-]
Totally happens in C code too. Maybe even more often.

Just today I had Proxmox not working because of an invalid localhost line in /etc/hosts. Or I had a problem logging in to KDE because /etc/shadow was owned by root.

In both cases, only incomprehensible error messages. Luckily the solutions were googleable.

reply
salmonellaeater
1 day ago
[-]
A useful error message would have made this a 1-minute investigation. The "fix" of trying to detect this specific program is much too narrow. The right fix is to change Yarn to print a message about what it was trying to do (check for a Sapling repo) and what happened instead. This is also likely a systemic problem, so a good engineer would go through the whole program and fix other places that need it.
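
As an illustration of the kind of message meant here, a minimal Python sketch (not how Jest or Yarn actually do it). `run_probe` and `ProbeError` are hypothetical names; the point is that every failure path states the intent and what happened instead:

```python
import subprocess
from typing import Sequence


class ProbeError(RuntimeError):
    """A helper command did not behave as expected."""


def run_probe(command: Sequence[str], purpose: str, timeout: float = 5.0) -> str:
    """Run `command`; on failure, say what was being attempted and what happened."""
    shown = " ".join(command)
    try:
        result = subprocess.run(
            list(command),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
            encoding="utf-8",
            errors="replace",
        )
    except subprocess.TimeoutExpired:
        raise ProbeError(
            f"While trying to {purpose}: `{shown}` did not finish within {timeout}s. "
            "Is a different program installed under that name?"
        ) from None
    except OSError as exc:
        raise ProbeError(
            f"While trying to {purpose}: `{shown}` could not be started ({exc})."
        ) from None
    if result.returncode != 0:
        detail = result.stderr.strip()[:200]
        raise ProbeError(
            f"While trying to {purpose}: `{shown}` exited with status "
            f"{result.returncode}: {detail!r}"
        )
    return result.stdout
```

Called as `run_probe(["sl", "root"], "check for a Sapling repo")`, a stray program named `sl` would then produce a message naming both the command and the reason it was run. The `root` subcommand is used purely as an example argument.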
reply
burnte
1 day ago
[-]
I discovered SL in 1999, and forgot about it. I rediscovered it 5 years later when on my personal server I typoed ls as sl and hit enter. A steam locomotive drove across my screen, and I remembered installing it 5 years earlier and laughed my butt off. I wound up pranking myself and it took 5 years to pay off!
reply
pjc50
1 day ago
[-]
Plus points for using strace. It's one of those debugging tools everyone knows about for emergencies that can't be solved at a higher level, and a great convenience of using Linux. The Windows ETW system is much harder to use, and I'm not sure if it's even possible at all under OSX security.
reply
throwway120385
1 day ago
[-]
I have solved an incredible number of problems just by looking at strace output very carefully. Strace combined with Wireshark or tcpdump is an incredible toolset for capturing what a program is doing and what the effect is on either the USB or the NIC.
reply
frizlab
1 day ago
[-]
macOS has dtrace which is actually nicer to use. Cannot be used on all processes when SIP is on though.
reply
pjc50
1 day ago
[-]
Last time I tried SIP prevented me from using it on my own processes, but I may have been holding it wrong.
reply
dontlaugh
1 day ago
[-]
macOS’s Solaris-inspired dtrace is actually nicer, especially the UI.
reply
pjc50
1 day ago
[-]
Is there a guide for how to use this, including the UI, with SIP on?
reply
jntun
1 day ago
[-]
Instruments is implemented under-the-hood with dtrace, that could be what they are referring to.
reply
dontlaugh
1 day ago
[-]
Yes. Most things run well with Instruments attached. I’ve only used the dtrace cli a few times.
reply
mrguyorama
1 day ago
[-]
The chrome folks built https://randomascii.wordpress.com/2015/04/14/uiforetw-window... to improve ETW usability.

You usually don't need that full industrial level tracing though on Windows! Process Monitor is 95% of the solution for most people, and provides very similar functionality to strace, if a lot easier to read.

reply
snovymgodym
1 day ago
[-]
The real story here is that the author and his coworker wasted a bunch of time tracking down this bug because their dev environment was badly set up.

> his system (MacOS) is not affected at all versus mine (Linux)

> nvm use v20 didn't fix it

If you are writing something like NodeJS, 99% of the time it will only ever be deployed server-side on Linux, most likely in a container.

As such, your dev environment should include a dev dockerfile and all of your work should be done from that container. This also has the added benefit of marginally sandboxing the thousands of mystery-meat NPM packages that you will no doubt be downloading from the rest of your machine.

There is zero reason to even mess with a "works on my machine" or a "try a different node version" situation on this kind of NodeJS project. Figure out your dependencies, codify them in your container definition, and move on. Oh, your tests work on MacOS? Great, it could not matter less because you're not deploying there.

Honestly, kind of shocking that a company like Cloudflare wouldn't have more standard development practices in place.

reply
bilekas
1 day ago
[-]
>If you are writing something like NodeJS, 99% of the time it will only ever be deployed server-side on Linux, most likely in a container.

I'm really curious where you're getting this impression from? I for one never run Docker containers on my dual-core Atom server with 4 GB of RAM, but I have a lot of Node services running.

> There is zero reason to even mess with a "works on my machine" or a "try a different node version" situation on this kind of NodeJS project

There are a lot of reasons to investigate these things; in fact, that's what I would expect from larger, more industry-involved companies. Knowing the finer nuances and details of these things can be important. What might seem benign can just as quickly become something really dangerous or important when working at a huge scale such as Cloudflare's.

Edit : BTW I do agree mistakes were made, and the hell that is NPM chain of delivery attacks is terrifying. Those are the points I would focus on more personally.

reply
snovymgodym
1 day ago
[-]
> I'm really curious where you're getting this impression from?

Experience mainly, though perhaps I live in a bubble. My "99%" assertion was more pointed at the "server-side on Linux" part than the "most likely in a container" part.

Really the point I wanted to make was that your development and test environment should be the same as, or as close as possible to, your production environment.

If your app is going to be deployed on Red Hat Enterprise Linux (whether in a container, VM, or bare metal), then don't bother chasing down cryptic NPM errors that arise when you run it on Ubuntu, Mac, or Windows. Just run everything out of a RHEL docker container which mimics your production environment and spend your limited time doing the actual task at hand. It simply is not worth your time to rabbit hole endlessly on NPM errors that happen on an environment you'll never deploy to.

> There are a lot of reasons to investigate these things, ...

Sure, I don't really disagree with that and generally it's good to have a solid understanding of your tools and what lies in the layers below the abstractions that you normally work with. The detective work in the post is solid.

But the thing is that the author was supposed to be learning NodeJS in order to ramp up on a React project. But he got derailed (heh) by this side quest which delayed him being able to do the actual work he set out to do. Whether or not it was worth the time is subjective. But either way, it would not have happened in the first place with better dev environment practices.

reply
bilekas
1 day ago
[-]
> Really the point I wanted to make was that your development and test environment should be the same as, or as close as possible to, your production environment.

I’m really glad to hear that actually, I think you did make that point but it was a bit overlooked with the other points.

About having better Dev environments I think you're also spot on, not just with infrastructure but also with support from other maybe more experienced developers who could identify these things early and knowledge share, for me at least that's one of the main development requirements, if you're not learning, you should be teaching.

reply
throwanem
1 day ago
[-]
The last time I dealt with a non-dockerized Node deployment, at work or at home, was in 2013. That this was also the year of Docker's initial release is no coincidence at all.
reply
bilekas
1 day ago
[-]
I think for production it’s a good move, it just doesn’t feel like a sure assumption that the majority of node services are containerized.
reply
throwanem
1 day ago
[-]
Well, the argument is more that the vast majority of Node services should be containerized, because the potentially large benefit of so doing outweighs the relatively small cost. I can't speak to anyone's assumptions, but I can say I'm inclined to support this argument because my professional experience for many years has been that containerization causes far fewer problems than it solves.
reply
Kwpolska
1 day ago
[-]
Naming your source control tool after a common mistyping of ls is such a Facebook move.
reply
m4rtink
1 day ago
[-]
Yeah! What are they going to do next - call a programming language "go" or something? Even Google would not be that stupid - imagine Googling for that and getting only irrelevant stuff!
reply
12345hn6789
1 day ago
[-]
Go slice array differences golang
reply
computerfriend
1 day ago
[-]
Naming it after a commonly installed program that has been around since 1993 is also some hubris.
reply
mrguyorama
1 day ago
[-]
The reality is that most devs writing code in Facebook were not alive in 93, and certainly weren't Linux admins at that time.

Does Facebook even have any greybeards in the trenches?

reply
wrs
1 day ago
[-]
I had a similar problem where builds were timing out. When I looked at the build log, there was a calendar in it (?!). I eventually figured out a script was calling `date`, and something I had `go install`ed (I think) had a test binary called `date` that was an interactive calendar.
reply
rossdavidh
1 day ago
[-]
I demonstrated that I am not a serious or good programmer by installing steam locomotive on my Linux laptop immediately after reading this.
reply
normie3000
1 day ago
[-]
> git commit, which hooked into yarn test

There's the real wtf. How are you meant to commit a failing test? Or any other kind of work in progress?

reply
zdragnar
1 day ago
[-]
You mark the failing test with "failing". The test runner knows that it might fail but doesn't fail the suite.

I'm not a big fan of git commit hooks, but they can give faster feedback than waiting for a CI runner to point out something that should have been obvious, if you keep them lightweight (such as style linting or compiler warnings).

Edit: replaced "Todo" with "failing" since we're talking about jest specifically: https://jestjs.io/docs/api#testfailingname-fn-timeout

reply
computerfriend
1 day ago
[-]

    git commit -n
reply
normie3000
22 hours ago
[-]
Aha! I have a new alias!
reply
jokoon
1 day ago
[-]
I thought a real steam locomotive was passing next to a data center and crashed the server because of the vibrations of the train.
reply
rrauenza
1 day ago
[-]
I'm trying to recall -- wasn't there someone who had a similar issue with a game? Maybe a (pun not intended) Steam game? They'd try to run their game and something else would launch? Or vice versa?
reply
sureglymop
1 day ago
[-]
Relatable debugging, though after 2 tries I would have moved straight to strace/truss.

Edit: okay I continued reading and that was actually the next step. :)

reply
mzs
1 day ago
[-]
reply
zitterbewegung
1 day ago
[-]
If you were troubleshooting this (and I know what I’m saying is with 20/20 hindsight), why wouldn’t you try to test this on someone else’s machine to see if it is an environment issue? They seemed to get into extensive analysis at that point. Also, I’ve seen Jenkins deployments that have test runners that would run JS unit tests.
reply
polygot
1 day ago
[-]
Would dev containers solve this issue?
reply
tyzoid
1 day ago
[-]
Most likely, yes. Then it wouldn't have mattered that the `sl` package was installed.
reply
WalterBright
1 day ago
[-]
Not about steam locomotives. Disappointed.
reply