Beyond Ctrl-C: The dark corners of Unix signal handling
165 points
4 months ago
| 4 comments
| sunshowers.io
| HN
chrsig
4 months ago
[-]
My favorite signal surprise was running nginx and/or httpd in the foreground and wondering why on earth it quit whenever I resized the window.

Turns out, they use SIGWINCH (which is sent on WINdow CHange) for graceful shutdown.

It's a silly, silly problem.
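
For anyone who has not met SIGWINCH before, here is a minimal Python sketch of its conventional use: re-reading the terminal size after a resize. The names `resize_events` and `on_winch` are made up for illustration, and this is not how nginx or httpd handle the signal.

```python
import shutil
import signal

resize_events = []  # hypothetical log of terminal sizes seen after each resize


def on_winch(signum, frame):
    # SIGWINCH only says "the window changed"; the new size has to be queried.
    resize_events.append(shutil.get_terminal_size())


# The default disposition of SIGWINCH is to ignore it, so a program only
# reacts to resizes once it installs a handler like this one.
signal.signal(signal.SIGWINCH, on_winch)
```

A server that repurposes the same signal for shutdown will therefore see one delivery every time a foreground terminal is resized.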

reply
eadmund
4 months ago
[-]
> Turns out, they use SIGWINCH (which is sent on WINdow CHange) for graceful shutdown.

That’s … that’s even worse than people who send errors with an HTTP 200 response code.

reply
aunderscored
4 months ago
[-]
Disagree. Annoyingly, there is a reasonable case for a 200 that carries an error: if HTTP is your transport but not your application, then 200 says "yes, the message was transferred and understood correctly, here is your response", which may be an error response from the application.
reply
eadmund
4 months ago
[-]
If you’re using HTTP for something other than transferring hypertext — i.e., if your application is not a hypermedia application — then you are doing something just as wrong as encoding IP in DNS packets or email messages. Don’t do that. It’s wrong, even if it is technically interesting.

If, OTOH, your application is a hypermedia application, then returning a success status for errors is just wrong.

reply
aunderscored
4 months ago
[-]
Every JSON API under the sun disagrees, but I do agree in principle. People very much like using HTTP as a JSON (or XML) transfer protocol
reply
sunshowers
4 months ago
[-]
This ship sailed the day the first HTTP proxy was installed, and likely well before that.
reply
andreyvit
4 months ago
[-]
Sorry, what? HTTP is perfectly fine for APIs which are not hypermedia.
reply
Izkata
4 months ago
[-]
For example: Apache (httpd) replaces the 4xx and 5xx response body with its own content instead of whatever you'd returned from an external handler like wsgi. You have to use a 2xx (except for 204) to get a relevant error message back out.
reply
AdieuToLogic
4 months ago
[-]
> For example: Apache (httpd) replaces the 4xx and 5xx response body with its own content instead of whatever you'd returned from an external handler like wsgi.

This is the default behavior. Apache httpd can be configured to produce different responses by way of ErrorDocument[0]. From the documentation:

  Customized error responses can be defined for any HTTP
  status code designated as an error condition - that is,
  any 4xx or 5xx status.
HTH

0 - https://httpd.apache.org/docs/trunk/custom-error.html

reply
jjnoakes
4 months ago
[-]
Even with custom error documents configured in the web server, you still lose the application-specific (and probably request- and error-specific) message generated by the application itself.
reply
Izkata
4 months ago
[-]
Yeah, this is how we ran across it - whoever originally wrote a particular feature was trying to do the right thing by using an HTTP error code, but with a message that would be presented to the user about why that operation failed. A generic response wouldn't work, there were multiple possible reasons all fixable by the user, and tying a whole error code to one specific feature would've probably been a bad idea anyway.
reply
Groxx
4 months ago
[-]
Which is why "you resized the terminal window, clearly you meant to shut down this web server" is even crazier, yes
reply
aunderscored
4 months ago
[-]
Indeed. That is particularly good at violating the principle of least astonishment
reply
thezilch
4 months ago
[-]
That's ... not what most people are doing. People send _application_ errors on HTTP 200 response codes, because HTTP response codes are for HTTP and not applications. Most "REST" libraries and webdev get this wrong, building ever more fragile web services.
reply
ChocolateGod
4 months ago
[-]
Applications using status codes is useful because it can tell browsers and load balancers to not cache the page in a uniform way.
reply
sunshowers
4 months ago
[-]
I don't think the distinction is as clear-cut as you're making it out to be.

For example, HTTP 409 Conflict generally means an application-level conflict (e.g. an optimistic concurrency mechanism detected a conflict).

HTTP 422 Unprocessable Entity is also usually an application-level error (e.g. hash validation failure, or identifier not recognized by the server).
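
A rough Python sketch of that mapping, assuming two made-up exception types (`VersionConflict`, `InvalidPayload`) and a hypothetical `error_response` helper. The status code carries the category, while the body keeps the application-specific detail:

```python
import json


class VersionConflict(Exception):
    """Hypothetical: an optimistic concurrency check failed."""


class InvalidPayload(Exception):
    """Hypothetical: the request parsed, but failed application validation."""


def error_response(exc):
    """Return (status, body) for an application-level failure."""
    if isinstance(exc, VersionConflict):
        status = 409  # Conflict
    elif isinstance(exc, InvalidPayload):
        status = 422  # Unprocessable Entity
    else:
        status = 500  # anything unrecognised is treated as a server error
    body = json.dumps({"error": type(exc).__name__, "detail": str(exc)})
    return status, body
```

Whether the body survives the trip still depends on the intermediaries involved, as discussed elsewhere in this thread.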

reply
LoganDark
4 months ago
[-]
Task failed successfully
reply
chrsig
4 months ago
[-]
y'know...what really is an error, anyway?
reply
thebruce87m
4 months ago
[-]
For what is an error, if not a success at failing?
reply
chrsig
4 months ago
[-]
Exactly. Gotta be happy you got a response at all!
reply
AStonesThrow
4 months ago
[-]
In my day, successful commands output nothing at all, so it would seem that a blank page is the only truly error-free result.
reply
thayne
4 months ago
[-]
Why? That's what SIGTERM is for.
reply
chrsig
4 months ago
[-]
No clue what the decision making process was.

There's a bug report for httpd dating back to 2011[0]. The nginx mailing list also has a grumpy person contemporary with that[1].

My guess is someone thought "httpd is a server running somewhere without a monitor attached, why on earth would it get a SIGWINCH!? surely it's available to use for something completely different", not considering users running it in the foreground during development. Nginx probably followed suit for convention, but that's pure speculation on my part.

Also that was before docker really took off (I'm not sure if it was around in 2011 yet; still in its infancy maybe). Running it in the foreground didn't happen as much yet. People were still using wamp or installing it via apt and restarting via sudo.

[0] https://bz.apache.org/bugzilla/show_bug.cgi?id=50669

[1] https://mailman.nginx.org/pipermail/nginx/2011-August/028640...

reply
hulitu
4 months ago
[-]
> why on earth would it get a SIGWINCH!?

Reminds me of those "/* not reached */" stories.

reply
lolinder
4 months ago
[-]
They use SIGWINCH for gracefully shutting down workers but not the main process [0]. SIGQUIT is used for a graceful shutdown and SIGTERM for a sort of graceful shutdown (with timeouts).

SIGWINCH is apparently used for an online upgrade [1]. Because it only shuts the workers down you can quickly transition back to the old binary and old configuration if there's a problem, even after upgrading the binary or config stored on the hard drive.

I'm sure there are other ways to get a similar capability, but this set of signals is apparently what they came up with.
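
To make the signal assignments described above concrete, here is a toy Python dispatcher. It is only an illustration of the idea, not nginx's code; the action strings and the `actions` list are invented for the example.

```python
import signal

actions = []  # hypothetical record of what the "master" decided to do

# Signal-to-action table following the description above; the wording of
# each action is made up for illustration.
DISPATCH = {
    signal.SIGWINCH: "stop workers gracefully, keep the master running",
    signal.SIGQUIT: "graceful shutdown",
    signal.SIGTERM: "shutdown with timeouts",
}


def on_signal(signum, frame):
    # Record the decision; a real master process would act on it from its
    # main loop rather than inside the handler.
    actions.append(DISPATCH[signum])


for signum in DISPATCH:
    signal.signal(signum, on_signal)
```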

[0] http://nginx.org/en/docs/dev/development_guide.html#processe...

[1] https://www.digitalocean.com/community/tutorials/how-to-upgr...

reply
ibash
4 months ago
[-]
I tried to find out why.

Unfortunately the change that introduces it predates the official release by a few months. And predates the mailing list by about a year:

https://trac.nginx.org/nginx/changeset/5238e93961a189c13eeff...

reply
chrsig
4 months ago
[-]
ok, I found a commit in 2005, coming about because linuxthreads was interfering with the SIGUSR1 signal.

It looks like they wound up making it platform-specific, so BSDs and other Unix-like operating systems might still use SIGUSR1.

https://github.com/apache/httpd/commit/395896ae8d19bbea10f82...

reply
ykonstant
4 months ago
[-]
I don't know whether to laugh or cry.
reply
chrsig
4 months ago
[-]
definitely laugh! life's too short, you'll never get out alive :)
reply
layer8
4 months ago
[-]
> Another common extension is to use what is sometimes called a double Ctrl-C pattern. The first time the user hits Ctrl-C, you attempt to shut down the database cleanly, but the second time you encounter it, you give up and exit immediately.

This is a terrible behavior, because users tend to hit Ctrl-C multiple times without intending anything different than on a single hit (not to mention bouncing key mechanics and short key repeat delays). Unclean exits should be reserved for SIGQUIT (Ctrl-\) and SIGKILL (by definition).

reply
tripdout
4 months ago
[-]
If you don't know about it, sure, but I find it's kind of convenient to get a safe shutdown and then be able to easily say "I don't care, just stop this program" without needing a separate kill -9 command or something.
reply
wombatpm
4 months ago
[-]
Kids these days. Try resetting server windows on an SGI.

Subject: -42- How can I restart the X server? Date: 10 Sep 1995 00:00:01 EST

  To restart the X server (Xsgi) once, do any one of the following
  (in increasing order of brutality):

  - killall -TERM Xsgi
  - hold down the left-Control, left-Shift, F12 and keypad slash keys
    (this is fondly known as the "Vulcan Death Grip")
  - /usr/gfx/stopgfx; /usr/gfx/startgfx
  - reboot

  To restart the X server every time someone logs out of the console,
  edit /var/X11/xdm/xdm-config, change the setting of
  "DisplayManager._0.terminateServer" from "False" to "True" and do
  'killall -HUP xdm'.
reply
layer8
4 months ago
[-]
As I wrote, Ctrl-\ should do the trick. And it’s just not practical having to know which program applies the double pattern, and having to train yourself to not accidentally hit Ctrl-C twice.
reply
__MatrixMan__
4 months ago
[-]
My brush with the double-ctrl-C pattern was in a place that wrote a lot of Java. It was generally frowned on to write any code that relied on signals which windows users can't send, and if I recall, Java made it quite difficult anyhow.

Windows does have a tradition of using ctrl-c to quit though, so SIGINT ends up being one of the few that you can use in both places. It's not pretty, but giving it a different meaning based on how many times you've ordered it seems like a somewhat natural next step, if a hacky one.

reply
bonzini
4 months ago
[-]
In the Meson build system's test harness, a single Ctrl-C terminates the longest running test with a SIGTERM; while three Ctrl-C in a second interrupt the whole run as if you sent the harness a SIGTERM. This was done because it's not uncommon that there are hundreds of tests left to run and you have seen what you want, and it's useful to have an intuitive shortcut for that case.

However, in both cases it's a clean shutdown: all running tests are terminated and the test report is printed.

reply
jcelerier
4 months ago
[-]
> Unclean exits should be reserved for SIGQUIT (Ctrl-\) and SIGKILL (by definition).

I don't know how it works on your keyboard but on french layout, Ctrl-\ is a two-hands, three-fingers, very unpleasant on the wrist, keyboard shortcut. Not a chance I'd use that for such a common operation.

reply
mananaysiempre
4 months ago
[-]
The byte that sends SIGQUIT is very much configurable with stty quit ^X , but unfortunately X has to be a-z or one of \]^_ (that is, 0x41 through 0x5F except 0x5B = [ which would conflict with other uses of ESC = ^[ = 0x1B) because of how the Ctrl modifier traditionally works. Looking at a map of AZERTY, I don’t see any good options, but you may still want to experiment.
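
The traditional Ctrl mapping can be sketched in a few lines of Python. `ctrl` is a hypothetical helper name; it simply keeps the low five bits of a character in the 0x40 through 0x5F range:

```python
def ctrl(char):
    """Return the byte a terminal traditionally sends for Ctrl-<char>."""
    code = ord(char.upper())  # Ctrl-a and Ctrl-A produce the same byte
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"Ctrl-{char} has no traditional control byte")
    return code & 0x1F  # clear the top bits: '\\' (0x5C) becomes 0x1C


# ctrl("\\") is 0x1C (the default quit character) and ctrl("[") is 0x1B (ESC).
```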
reply
jks
4 months ago
[-]
Curiously, on many terminal emulators the following work:

Ctrl-2 = Ctrl-@ = NUL byte

Ctrl-3 = Ctrl-[ = ESC

Ctrl-4 = Ctrl-\ = default for SIGQUIT

Ctrl-5 = Ctrl-] = jump to definition in vim

Ctrl-6 = Ctrl-^ = mosh escape key

Ctrl-7 = Ctrl-_ = undo in Emacs

I think these probably originate in xterm.

reply
cperciva
4 months ago
[-]
I map SIGQUIT to ^Q because that's the easiest to remember.
reply
glandium
4 months ago
[-]
I suppose you never hit CTRL+S by accident?
reply
kzrdude
4 months ago
[-]
stty -ixon

Make sure that thing is disabled

reply
marcosdumay
4 months ago
[-]
I like that Konsole defaults to disabling that thing. And also that there is a visual sign of the terminal being stopped.
reply
icedchai
4 months ago
[-]
Ctrl-S / Ctrl-Q was super useful in the dialup modem days.
reply
cperciva
4 months ago
[-]
Rarely enough that needing to open another terminal and use kill to send a signal doesn't bother me.
reply
remram
4 months ago
[-]
I think the point is that it is not to be a common operation.
reply
jcelerier
4 months ago
[-]
well I don't know, it feels like I must mash ctrl-c twenty times per day on average at least
reply
Sophira
4 months ago
[-]
While on UK keyboards it's the opposite "problem" - the left Ctrl key and the \ key are right next to each other (making it potentially a one-finger operation), which is the opposite of how a US keyboard is laid out (where Ctrl-\ was presumably intended to need to be a two-handed, two-finger operation).
reply
Izkata
4 months ago
[-]
> which is the opposite of how a US keyboard is laid out (where Ctrl-\ was presumably intended to need to be a two-handed, two-finger operation).

We have a right Ctrl, so one-hand two-finger.

reply
Am4TIfIsER0ppos
4 months ago
[-]
When using a keyboard "properly" how are you gonna manage that?
reply
LtWorf
4 months ago
[-]
two handed operations shouldn't exist.
reply
Sophira
4 months ago
[-]
I completely agree - they're very inaccessible. That's why I quoted the word "problem"; it's not actually a problem at all.
reply
mananaysiempre
4 months ago
[-]
stty quit ^] ?
reply
marcosdumay
4 months ago
[-]
It's worse, because there are languages that encode interruption into the error handling functionality, so it's common that people mismanage their errors and programs require several Ctrl-C presses to actually reach the interruption handler.

Which means that you have to memorize a list of "oh, this program needs Ctrl-C 3 times; oh, this program must only receive Ctrl-C once!"... I don't know of any "oh, this program needs Ctrl-C exactly 2 times", but it's an annoying possibility.

reply
wongarsu
4 months ago
[-]
Any software I've come across that uses intentional double ctrl-c shows a message after the first ctrl-c. Something to the effect of "shutting down gracefully, press ctrl-c again for immediate shutdown".

Hence you can just press it once and wait half a second, if no message to this effect appears you can spam ctrl-c.
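
A minimal Python sketch of that pattern, assuming a main loop that polls `shutdown_requested`. The class name `DoubleCtrlC` and the injectable `force_exit` parameter are invented for the example; real programs differ in the details.

```python
import os
import signal
import sys


class DoubleCtrlC:
    """First SIGINT requests a clean shutdown; the second exits at once."""

    def __init__(self, force_exit=os._exit):
        self.shutdown_requested = False
        self._force_exit = force_exit  # injectable so the behaviour can be tested

    def __call__(self, signum, frame):
        if self.shutdown_requested:
            # Second Ctrl-C: skip cleanup. 130 is 128 + SIGINT on typical
            # systems, the exit status shells report for an interrupted job.
            self._force_exit(130)
            return
        self.shutdown_requested = True
        print(
            "Shutting down gracefully; press Ctrl-C again to exit immediately.",
            file=sys.stderr,
        )


handler = DoubleCtrlC()
signal.signal(signal.SIGINT, handler)
```

The message on the first press is what lets a user tell this apart from a program that simply ignores Ctrl-C.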

reply
sunshowers
4 months ago
[-]
Yep, this is generally the pattern.
reply
bcrl
4 months ago
[-]
That shouldn't matter. Your database should be consistent in the face of an unclean exit. ACID has been around for a long time.
reply
Levitating
4 months ago
[-]
They can print a message stating that the program is attempting to quit cleanly but can be forced to quit by pressing Ctrl+C one or more additional times. Unison does this.
reply
sunshowers
4 months ago
[-]
While I agree in spirit, I also want to meet users where they are.
reply
cperciva
4 months ago
[-]
The article doesn't mention the most useful of all signals: SIGINFO, aka "please print to stderr your current status". Very useful for tools like dd and tar.

Probably because Linux doesn't implement it. Worst mistake Linus ever made.

Also, it talks about self-pipe but doesn't mention that self-socket is much better since you can't select on a pipe.

reply
epcoa
4 months ago
[-]
> self-socket is much better since you can't select on a pipe.

This needs further explanation. Why can’t you select on a pipe? You certainly can use select/poll on pipes in general and I’m not sure of any reason in particular they won’t work for the self pipe notification.

It's even right in the original: https://cr.yp.to/docs/selfpipe.html

reply
cperciva
4 months ago
[-]
Oops, brainfart. Sadly it's too late for me to edit that comment.

Yes, you can select just fine on pipes. What I was thinking of is that recv and send don't work on pipes, and asynchronous I/O frameworks typically want to use send/recv rather than write/read because the latter don't have a flags parameter.
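
To illustrate the flags point in Python: with a socket pair, non-blocking behaviour can be requested per call, whereas a pipe read through `os.read` has no flags argument and needs `O_NONBLOCK` set on the descriptor itself. `drain_nonblocking` is a hypothetical helper, and `MSG_DONTWAIT` is assumed to be available (it is on Linux and the BSDs).

```python
import socket


def drain_nonblocking(sock):
    """Read whatever is queued without blocking, using a per-call flag."""
    try:
        return sock.recv(4096, socket.MSG_DONTWAIT)
    except BlockingIOError:
        return b""  # nothing queued right now


# A connected pair of Unix sockets plays the role of the self-pipe.
reader, writer = socket.socketpair()
writer.send(b"!")
```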

reply
sunshowers
4 months ago
[-]
Thanks for the feedback! As the talk and the post both mentioned, I was focusing on signals that work on all Unix platforms. Within the constraints of a 30 minute talk there must be material left on the cutting room floor. (If I started talking about the specifics of various Unix lineages I could fill up a whole day...)

For most users in the real world, self-pipes are sufficient. This includes mio (Tokio's underlying library)'s portable Unix implementation of wakers (how parts of the system tell other parts to wake up).
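
For readers who have not seen the self-pipe pattern, here is a small Python sketch. It leans on `signal.set_wakeup_fd`, which is Python's built-in version of the trick (the interpreter's C-level handler writes the signal number to the pipe); this is an illustration, not how mio implements its wakers. `wait_for_signal` is a made-up helper name.

```python
import os
import select
import signal

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)
os.set_blocking(write_fd, False)  # set_wakeup_fd requires a non-blocking fd

# A Python-level handler must be installed, otherwise the default action
# (terminating the process, in the case of SIGUSR1) still applies.
signal.signal(signal.SIGUSR1, lambda signum, frame: None)
signal.set_wakeup_fd(write_fd)


def wait_for_signal(timeout):
    """Block in select() until a signal arrives; return its number or None."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return None
    return os.read(read_fd, 1)[0]  # one byte per signal: the signal number
```

The read end can sit in the same select or poll set as the program's other descriptors, which is the whole point of the pattern.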

reply
avidiax
4 months ago
[-]
SIGSTOP and SIGCONT are very useful as well.

SIGSTOP is the equivalent of Ctrl-Z in a shell, but you can address it to any process. If you have a server being bogged down, you can stop the offending process temporarily.

SIGCONT undoes SIGSTOP.

The cpulimit tool does this in an automated way so that a process can be limited to use x% of CPU. Nice/renice doesn't keep your CPU from hitting 100% even with an idle priority process, which may be undesirable if it drains battery quickly or makes the cooling fan loud.
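
A crude Python sketch of that duty-cycle idea. This is not cpulimit's actual algorithm; `throttle` and its parameters are invented for the example, and it assumes the caller is allowed to signal `pid`.

```python
import os
import signal
import time


def throttle(pid, run_fraction, period=0.1, cycles=10):
    """Let `pid` run for only `run_fraction` of each period."""
    for _ in range(cycles):
        os.kill(pid, signal.SIGCONT)  # let the process run...
        time.sleep(period * run_fraction)
        os.kill(pid, signal.SIGSTOP)  # ...then freeze it for the rest of the period
        time.sleep(period * (1 - run_fraction))
    os.kill(pid, signal.SIGCONT)  # always leave the process running afterwards
```

Because SIGSTOP cannot be caught or ignored, this works on any process without its cooperation.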

reply
sunshowers
4 months ago
[-]
Note Ctrl-Z is actually SIGTSTP, which is basically "SIGSTOP except the process can install a signal handler for it".

I have a very exciting blog post about debugging a nasty bug with how SIGTSTP works, coming very soon.

reply
fragmede
4 months ago
[-]
dd prints out status when sent SIGUSR1, but yeah that would be cool if other utilities did that as well off SIGINFO.
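
Adding this to your own tools is cheap. A hedged Python sketch: use SIGINFO where the platform defines it and fall back to SIGUSR1 elsewhere (an assumption that mirrors the dd behaviour mentioned here). The `progress` counters and `report_status` are made up for the example.

```python
import signal
import sys

# SIGINFO exists on the BSDs and macOS; Linux does not define it, so fall
# back to SIGUSR1 there.
STATUS_SIGNAL = getattr(signal, "SIGINFO", signal.SIGUSR1)

progress = {"done": 0, "total": 1000}  # hypothetical counters updated by the main loop


def report_status(signum, frame):
    # Print a one-line status to stderr and carry on with the work.
    print(f"{progress['done']}/{progress['total']} records copied", file=sys.stderr)


signal.signal(STATUS_SIGNAL, report_status)
```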
reply
cperciva
4 months ago
[-]
And does ^T map to SIGUSR1? That's the other thing which makes it so useful in BSD.
reply
saagarjha
4 months ago
[-]
You wouldn’t want it to, because the default behavior for SIGUSR1 is to terminate.
reply
cperciva
4 months ago
[-]
Exactly. Whereas on BSD hitting ^T is (a) very likely to print useful information, and (b) if it doesn't do that, won't do anything at all.
reply
efxhoy
4 months ago
[-]
I recently wrote a little data transfer service in python that runs in ECS. When developing it locally it was easy to handle SIGINT: try write a batch, except KeyboardInterrupt, if caught mark the transfer as incomplete and finally commit the change and shut down.

But there’s no exception in python to catch for a SIGTERM, which is what ECS and other service managers send when it’s time to shut down. So I had to add a signal handler. Would have been neat if SIGTERM could be caught like SIGINT with a “native” exception.

reply
mananaysiempre
4 months ago
[-]

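  from signal import SIGTERM, raise_signal, signal
  import sys # for excepthook
  class Terminate(BaseException):
      # Derives from BaseException so a broad "except Exception" won't swallow it.
      pass
  def _excepthook(type, value, traceback):
      if not issubclass(type, Terminate):
          return _prevhook(type, value, traceback)
      # If a Terminate went unhandled, make sure we are killed
      # by SIGTERM as far as wait(2) and friends are concerned.
      signal(SIGTERM, _prevterm)
      raise_signal(SIGTERM)
  # Chain onto whatever excepthook was installed before.
  _prevhook, sys.excepthook = sys.excepthook, _excepthook
  def terminate(signo=SIGTERM, frame=None):
      # Restore the previous disposition first, so a second SIGTERM
      # is not turned into another exception during cleanup.
      signal(SIGTERM, _prevterm)
      raise Terminate
  # Install the handler and remember the previous disposition.
  _prevterm = signal(SIGTERM, terminate)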
reply
Spivak
4 months ago
[-]
I mean you can just have the signal handler throw StopRequested in your Python boilerplate and never think about it again.

One common pattern is raising KeyboardInterrupt from your handler so it's all handled the same.
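
A minimal sketch of that second pattern, assuming the handler is installed in the main thread. `run_transfer` and its three callables are placeholders standing in for the batch-writing logic described upthread, not anyone's real code.

```python
import signal


def raise_keyboard_interrupt(signum, frame):
    # Treat SIGTERM exactly like Ctrl-C so one except clause covers both.
    raise KeyboardInterrupt


signal.signal(signal.SIGTERM, raise_keyboard_interrupt)


def run_transfer(write_batch, mark_incomplete, commit):
    """Hypothetical transfer step; the three callables are placeholders."""
    try:
        write_batch()
    except KeyboardInterrupt:
        mark_incomplete()  # reached for both SIGINT and SIGTERM
    finally:
        commit()
```

The trade-off is that code which catches KeyboardInterrupt to mean "the user pressed Ctrl-C" can no longer tell the two signals apart.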

reply