Why pipes sometimes get "stuck": buffering
389 points | 6 days ago | 32 comments | jvns.ca | HN
Veserv
6 days ago
[-]
The solution is that buffered accesses should almost always flush after a threshold number of bytes or after a period of time if there is at least one byte, “threshold or timeout”. This is pretty common in hardware interfaces to solve similar problems.

In this case, the library that buffers in userspace should set appropriate timers when it first buffers the data. Good choices of timeout parameter are: passed in as argument, slightly below human-scale (e.g. 1-100 ms), proportional to {bandwidth / threshold} (i.e. some multiple of the time it would take to reach the threshold at a certain access rate), proportional to target flushing overhead (e.g. spend no more than 0.1% time in syscalls).

Also note this applies for both writes and reads. If you do batched/coalesced reads then you likely want to do something similar. Though this is usually more dependent on your data channel as you need some way to query or be notified of “pending data” efficiently which your channel may not have if it was not designed for this use case. Again, pretty common in hardware to do interrupt coalescing and the like.
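A minimal sketch of the "threshold or timeout" rule described above, in Python rather than inside any real libc. The class name `ThresholdOrTimeoutWriter` and its parameters are hypothetical. The deadline is armed only on the empty to non-empty transition, and the caller is assumed to call `flush_if_due()` from its own loop (nothing here sets a timer behind the program's back).

```python
import io
import time


class ThresholdOrTimeoutWriter:
    """Buffer writes; flush at `threshold` bytes, or once `timeout` seconds
    have passed since the first byte entered an empty buffer."""

    def __init__(self, sink, threshold=4096, timeout=0.05, clock=time.monotonic):
        self.sink = sink
        self.threshold = threshold
        self.timeout = timeout
        self.clock = clock          # injectable so the behaviour can be tested
        self.buf = bytearray()
        self.deadline = None        # armed only on empty -> non-empty

    def write(self, data):
        if not self.buf and data:
            self.deadline = self.clock() + self.timeout
        self.buf += data
        if len(self.buf) >= self.threshold:
            self.flush()

    def time_until_due(self):
        """How long the caller may block before it should flush; None if empty."""
        if self.deadline is None:
            return None
        return max(0.0, self.deadline - self.clock())

    def flush_if_due(self):
        if self.deadline is not None and self.clock() >= self.deadline:
            self.flush()

    def flush(self):
        if self.buf:
            self.sink.write(bytes(self.buf))
            self.buf.clear()
        self.deadline = None
```

`time_until_due()` is meant to be passed as the timeout of whatever blocking wait the program already performs, so the flush happens synchronously in the program's own control flow.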

reply
toast0
6 days ago
[-]
I think this is the right approach, but any libc setting automatic timers would lead to a lot of tricky problems because it would change expectations.

I/O errors could occur at any point, instead of only when you write. Syscalls everywhere could be interrupted by a timer, instead of only where the program set timers, or when a signal arrives. There's also a reasonable chance of confusion when the application and libc both set timers, depending on how the timers are set (although maybe this isn't relevant anymore... kernel timer APIs look better than I remember). If the application specifically pauses signals for critical sections, that impacts the I/O timers, etc.

There's a need to be more careful in accessing i/o structures because of when and how signals get handled.

reply
Veserv
6 days ago
[-]
You will generally only stall indefinitely if you are waiting for new data. So, you will actually handle almost every use case if your blocking read/wait also respects the timeout and does the flush on your behalf. Basically, do it synchronously at the top of your event loop and you will handle almost every case.
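As a rough illustration of "the blocking wait respects the timeout and flushes on your behalf", here is a cat-like copy loop in Python. The function name `copy_with_idle_flush` and the 50 ms default are assumptions for the sketch. It uses `select.select` only because it is simple; any wait primitive with a timeout would do.

```python
import os
import select


def copy_with_idle_flush(in_fd, out, idle_timeout=0.05):
    """Copy in_fd to the buffered file object `out`, flushing `out`
    whenever the input goes quiet for `idle_timeout` seconds."""
    dirty = False  # True while `out` may hold unflushed bytes
    while True:
        # Only wait with a timeout while there is something to flush;
        # otherwise block indefinitely, exactly like a plain read would.
        timeout = idle_timeout if dirty else None
        readable, _, _ = select.select([in_fd], [], [], timeout)
        if not readable:
            out.flush()   # input stalled: push out what we have
            dirty = False
            continue
        chunk = os.read(in_fd, 65536)
        if not chunk:     # EOF on input
            out.flush()
            return
        out.write(chunk)
        dirty = True
```

Under sustained input the timeout never fires and the output buffer's own size threshold governs flushing, so the throughput case is unchanged.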

You could also relax the guarantee and set a timeout that is only checked during your next write. This still allows unbounded latency, but as long as you do one more write it will flush.

If neither of these works, then your program issues a write and then gets into an unbounded or unreasonably long loop/computation. At that point you can manually flush what is likely the last write your program is ever going to make, which would be a trivial overhead since it is a single write compared to a ridiculously long computation. That or you probably have bigger problems.

reply
toast0
6 days ago
[-]
Yeah, these are all fine to do, but a libc can really only do the middle one. And then, at some cost.

If you're already using an event loop library, I think it's reasonable for that to manage flushing outputs while waiting for reads, but I don't think any of the utilities in this example do; maybe tcpdump does, but I don't know why grep would.

reply
Veserv
5 days ago
[-]
Sure, but the article is talking about grep, not write() or libc implementations.

grep buffers writes with no flush timeout resulting in the problem in the article.

grep should probably not suffer from the problem and can use a write primitive/library/design that avoids such problems with relatively minimal extra complexity and dependencies while retaining the performance advantages of userspace buffering.

Most programs (that are minimizing dependencies so can not pull in a large framework, like grep or other simple utilities) would benefit from using such modestly more complex primitives instead of bare buffered writes/reads. Such primitives are relatively easy to use and understand, being largely a drop-in replacement in most common use cases, and resolve most remaining problems with buffered accesses.

Essentially, this sort of primitive should be your default and you should only reach for lower level primitives in your application if you have a good reason for it and understand the problems the layers were designed to solve.

reply
toast0
5 days ago
[-]
> Sure, but the article is talking about grep, not write() or libc implementations.

Yes, but you said

> In this case, the library that buffers in userspace should set appropriate timers when it first buffers the data

The library that buffers in userspace for grep and tcpdump is almost certainly libc.

reply
Veserv
5 days ago
[-]
Okay, I should have said “a” instead of using “the” when there is no clear antecedent allowing it to be interpreted ambiguously in exactly that single sentence which apparently invalidated the fact that I was obviously talking in generalities of API design and implementation.

It did not even occur to me that anybody would even think this was some sort of statement about whatever libc they use on Linux given that I said just “buffered accesses” with no reference to platform or transport channel.

I thought somebody might think I was talking about just writes, so I deliberately wrote accesses.

I thought somebody would make some sort of pedantic statement if I just said “should” so I wrote “should almost always”.

I thought somebody might think I was talking about write() in particular so I deliberately avoided talking about any specific API to head that off.

In my reply I deliberately said “blocking read/wait” instead of select() or epoll() or io_uring or whatever other thing they use these days to avoid such confusion that it was a specific remedy for a specific library or API.

But, alas, here we are. My pedantry was no match for first contact. You will just have to forgive my inability to consider the dire implications of minor ambiguities.

reply
nine_k
6 days ago
[-]
I don't follow. Using a pipe sets an expectation of some amount of asynchronicity, because we only control one end of the pipe. I don't see a dramatic difference between an error occurring because the process on the other end is having trouble, and one occurring because a timeout handler is trying to push the bytes.

On the reading end, the error may occur at the attempt to read the pipe.

On the writing end, the error may be signaled at the next attempt to write to or close the pipe.

In either case, a SIGPIPE can be sent asynchronously.

What scenario am I missing?

reply
toast0
6 days ago
[-]
> In either case, a SIGPIPE can be sent asynchronously.

My expectation (and I think this is an accurate expectation) is that a) read does not cause a SIGPIPE; read on a widowed pipe returns a zero count read as indication of EOF. b) write on a widowed pipe raises SIGPIPE before the write returns. c) a write to a pipe that is still valid will not raise SIGPIPE later, even if the pipe is then widowed without the data being read.
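These expectations can be checked with a small Python sketch (the function name is made up). One caveat: CPython ignores SIGPIPE by default, so case (b) shows up as a `BrokenPipeError` (EPIPE) from the write call rather than as a signal; the timing is the same, the failure is reported at the write.

```python
import os


def widowed_pipe_behaviour():
    # (a) all write ends closed: read returns b"" (EOF), no signal involved
    r, w = os.pipe()
    os.close(w)
    eof = os.read(r, 1)
    os.close(r)

    # (c) a write while the reader still exists succeeds, and closing the
    # reader afterwards does not retroactively fail that write
    r, w = os.pipe()
    earlier_write_ok = os.write(w, b"x") == 1
    os.close(r)

    # (b) the next write, now to a widowed pipe, fails immediately
    try:
        os.write(w, b"y")
        later_write_failed = False
    except BrokenPipeError:
        later_write_failed = True
    finally:
        os.close(w)

    return eof, earlier_write_ok, later_write_failed
```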

Yes, you could get a SIGPIPE from anywhere at anytime, but unless someone is having fun on your system with random kills, you won't actually get one except immediately after a write to a pipe. With a timer based asynchronous write, this changes to potentially happening any time.

This could be fine if it was well documented and expected, but it would be a mess to add it into the libcs at this point. Probably a mess to add it to basic output buffering in most languages.

reply
asveikau
6 days ago
[-]
I think doing those timeouts transparently would be tricky under the constraints of POSIX and ISO C. It would need to have some cooperation from the application layer.
reply
jart
6 days ago
[-]
The only way you'd be able to do it is by having functions like fputc() call clock_gettime(CLOCK_MONOTONIC_COARSE) which will impose ~3ns overhead on platforms like x86-64 Linux which have a vDSO implementation. So it can be practical sort of although it'd probably be smarter to just use line buffered or unbuffered stdio. In practice even unbuffered i/o isn't that bad. It's the default for stderr. It actually is buffered in practice, since even in unbuffered mode, functions like printf() still buffer internally. You just get assurances whatever it prints will be flushed by the end of the call.
reply
asveikau
6 days ago
[-]
That's just for checking the clock. You'd also need to have a way of getting called back when the timeout expires, after fputc et al are long gone from the stack and your program is busy somewhere else, or maybe blocked.

Timeouts are usually done with signals (a safety nightmare, so no thanks) or an event loop. Hence my thought that you can't do it really transparently while keeping current interfaces.

reply
jart
6 days ago
[-]
Signals aren't a nightmare; it's just that fflush() isn't defined by POSIX as being asynchronous signal safe. You could change all your stdio functions to block signals while running, but then you'd be adding like two system calls to every fputc() call. Smart thing to do would probably be creating a thread with a for (;;) { usleep(10000); fflush(stdout); } loop.
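A Python analogue of that flusher-thread loop, as a sketch only. `start_flusher` is a hypothetical helper, and it assumes `stream` is a binary buffered writer (Python's `io.BufferedWriter` takes an internal lock, which plays the role of the stdio mutex here). Note what happens on error: the thread just stops, because there is nobody to report the failure to.

```python
import threading


def start_flusher(stream, interval=0.01):
    """Flush `stream` every `interval` seconds from a daemon thread.
    Returns (stop_event, thread); set the event to stop the loop."""
    stop = threading.Event()

    def run():
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            try:
                stream.flush()
            except (OSError, ValueError):
                # Broken pipe or closed stream: the error is swallowed here.
                return

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return stop, thread
```

The flusher is not inherited across `fork()`, so a child process that wants the same behaviour would have to start its own.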
reply
asveikau
6 days ago
[-]
Signals are indeed a nightmare. Your example of adding tons of syscalls to make up for lack of safety shows that you understand that to be true.

And no, creating threads to solve this fringe problem in a spin loop with a sleep is not what I'd call "smart". It's unnecessary complexity and in most cases, totally wasted work.

reply
jart
5 days ago
[-]
The smartest thing to do is still probably not buffering. What's wrong with the thread? It would take maybe 15 lines of code to implement. It would be correct without rarely occurring bugs. It doesn't need signals or timers. It wouldn't add overhead to stdio calls. It's a generalized abstraction. You won't need to change your program's event loop code. Create the thread with a tiny 64kb stack and what's not to like? Granted, it would rub me the wrong way if libc did this by default, since I wouldn't want mystery threads appearing in htop for my hello world programs. But for an app developer, this is a sure fire solution.
reply
klempner
5 days ago
[-]
How exactly does this interact with fork()?
reply
jart
4 days ago
[-]
Your libc fork() implementation will lock the stdio mutexes automatically before calling SYS_fork, so the code I wrote in my comment would be fork safe. Your child process would need to spawn the flusher thread again if it's desired though.
reply
asveikau
5 days ago
[-]
> It would take maybe 15 lines of code to implement

This is a very bad reason to justify something. Especially introducing threads. Your response here is like saying "I don't know why people say it's so hard to write multi threaded programs, the thread create API is so simple." It completely misses the point why added complexity can be harmful.

> without rarely occurring bugs.

Except for a glaring thing like "what if fflush gets an I/O error in this background thread"?

> Granted, it would rub me the wrong way if libc did this by default,

This is exactly my point. It needs cooperation from the application layer. It wouldn't make sense to be transparent.

reply
jart
4 days ago
[-]
Almost no one checks for error when printing to stdout. That's why SIGPIPE exists.

Complexity, readability, etc. is the argument people make when they've run out of arguments.

reply
vlovich123
6 days ago
[-]
Typical Linux alarms are based on signals and are very difficult to manage and rescheduling them may have a performance impact since it requires thunking into the kernel. If you use io_uring with userspace timers things can scale much better, but it still requires you to do tricks if you want to support a lot of fast small writes (eg > ~1 million writes per second timer management starts to show up more and more and you have to do some crazy tricks I figured out to get up to 100M writes per second)
reply
Veserv
6 days ago
[-]
You do not schedule a timeout on each buffered write. You only schedule one timeout on the transition from empty to non-empty that is retired either when the timeout occurs or when you threshold flush (you may choose to not clear on threshold flush if timeout management is expensive). So, you program at most one timeout per timeout duration/threshold flush.

The point is to guarantee data gets flushed promptly which only fails when not enough data gets buffered. The timeout is a fallback to bound the flush latency.

reply
vlovich123
6 days ago
[-]
Yes that can work but as I said that has trade offs.

If you flush before the buffer is full, you’re sacrificing throughput. Additionally the timer firing has additional performance degradation, especially if you’re in libc land and only have SIGALRM available.

So when an additional write is added, you want to push out the timer. But arming the timer requires reading the current time among other things, and at rates of 10-20 MHz and up reading the current wall clock gets expensive. Even rdtsc approaches start to struggle at 20-40 MHz. You obviously don’t want to do it on every write, but you want to make sure that you never actually trigger the timer if you’re producing data at a fast enough clip to otherwise fill the buffer within a reasonable time.

Source: I implemented write coalescing in my nosql database that can operate at a few gigahertz for 8 byte writes/s into an in memory buffer. Once the buffer is full or a timeout occurs, a flush to disk is triggered and I net out at around 100M writes/s (sorting the data for the LSM is one of the main bottlenecks). By comparison DBs like RocksDB can do ~2M writes/s and SQLite can do ~800k.

reply
Veserv
6 days ago
[-]
You are not meaningfully sacrificing throughput because the timeout only occurs when you are not writing enough data; you have no throughput to sacrifice. The threshold and timeout should be chosen such that high throughput cases hit the threshold, not the timeout. The timeout exists to bound the worst-case latency of low access throughput.

You only lose throughput in proportion to the handling cost of a single potentially spurious timeout/timeout clear per timeout duration. You should then tune your buffering and threshold to cap that at an acceptable overhead.
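The arithmetic behind that cap, as a tiny sketch. With at most one timeout-triggered flush per timeout period, the worst-case overhead is the flush cost divided by the timeout. The function names and the 2 µs flush cost in the usage note are illustrative assumptions, not measurements.

```python
def worst_case_overhead(flush_cost_s, timeout_s):
    """Upper bound on the fraction of time spent on timeout-driven flushes,
    assuming at most one such flush per timeout period."""
    return flush_cost_s / timeout_s


def timeout_for_budget(flush_cost_s, max_overhead):
    """Smallest timeout that keeps timeout-driven flushes within the budget."""
    return flush_cost_s / max_overhead
```

For example, if a flush were assumed to cost about 2 µs and the budget were 0.1% of time in syscalls, `timeout_for_budget(2e-6, 0.001)` comes out at about 2 ms, comfortably below human-scale latency.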

You should only really have a problem if you want both high throughput and low latency, at which point general solutions are probably not fit for your use case, but you should remain aware of the general principle.

reply
vlovich123
6 days ago
[-]
> You should only really have a problem if you want both high throughput and low latency, at which point general solutions are probably not fit for your use case, but you should remain aware of the general principle.

Yes you’ve accurately summarized the end goal. Generally people want high throughput AND low latency, not to just cap the maximum latency.

The one shot timer approach only solves a livelock risk. I’ll also note that your throughput does actually drop at the same time as the latency spike because your buffer stays the same size but you took longer to flush to disk.

Tuning correctly turns out to be really difficult to accomplish in practice which is why you really want self healing/self adapting systems that behave consistently across all hardware and environments.

reply
YZF
5 days ago
[-]
I would tend to disagree. The buffering here is doing what it's supposed to do. The mix of something that's supposed to be interactive with a contract that's not meant to be interactive is the source of the problem (tail following into a pipe). There's no real problem to solve here. The "hardware" analog is a tank of water that accumulates rainwater and you only move it somewhere when it fills up. I'm not sure what examples you have in mind but time based flushes aren't common in hardware AFAIK.

The proposed fix makes the contract a lot more complicated.

reply
dullcrisp
5 days ago
[-]
How is there no problem to solve when the post clearly identifies a problem, as well as why the behavior is confusing to a user?

“The system is working as it was designed,” is always true but unhelpful.

reply
YZF
5 days ago
[-]
I get that. But it's like complaining why your pliers aren't good at unscrewing a bolt. I'm willing to accept the UX isn't great but `tail -f | grep` is like `vim | grep` or `rogue | grep` (I'm exaggerating to try and make a point). `tail -f` is also if I'm not mistaken a much more recent development vs. the origins of the Unix command line pipes.

So sure, it would maybe be a better UX to be able to combine things and have them work, but there is fundamental tension between building something that's optimized for moving chunks of data and building something that's interactive. And trying to force one into the other, in my humble opinion, is not the solution.

reply
kiitos
5 days ago
[-]
> `tail -f | grep` is like `vim | grep` or `rogue | grep`

I think this position is user-hostile.

`vim` and `rogue` are fully user-interactive programs. The same is not true of `tail -f`, which by default appears to users as a stream of lines.

I understand why, at a technical level, `tail -f | grep` doesn't work in the way that's expected here. But it should! At least, when invoked from a user-interactive shell session -- in that context, a "chunk of data" is clearly expected to be a newline-delimited line, not a buffer of some implicitly-defined size.

reply
dullcrisp
5 days ago
[-]
In fact, `tail -f | grep` works the way they expect. What doesn’t work is `tail -f | grep | grep`.

It’s hard to argue that grep isn’t supposed to work like this when grep tries to work like this. It’s not a fundamental tension, it’s just that isatty(stdout) doesn’t always tell you when you’re running in an interactive terminal.

reply
YZF
5 days ago
[-]
That's a good point. If the proposal is to propagate that all the way through the pipe chain I'd support that. My gut negative reaction is to the complexity of adding flush timers.
reply
Dylan16807
5 days ago
[-]
What are you saying is "not meant to be interactive"? That's not true of pipes in general, or of grep in general.

Or, even if it is true of pipes, then we need an alternate version of a pipe that signals not to buffer, and can be used in all the same places.

It's a real problem either way, it just changes the nature of the problem.

> The proposed fix makes the contract a lot more complicated.

How so? Considering programs already have to deal with terminals, I'm skeptical a way to force one aspect of them would be a big deal.

reply
YZF
5 days ago
[-]
Sure. An alternative for combining interactive terminal applications might be interesting. But I think there is tension between the Unix mechanisms and interactive applications that's not easy to resolve. What's `less | grep` or `vim | grep`... do we need to send input back through the pipe now?

It's one of those things you get used to when you've used Unix-like systems long enough. Yes, it's better when things just work the way someone who is not a power user expects them to work, but that's not always possible and I'd say it's not worth it to try to always meet that goal, especially if it leads to more complexity.

reply
Dylan16807
5 days ago
[-]
> But I think there is tension between the Unix mechanisms and interactive applications that's not easy to resolve.

I would say that the platonic ideal of the pipe Unix mechanism has no buffering, and the buffer is only there as a performance optimization.

> What's `less | grep` or `vim | grep`... do we need to send input back through the pipe now?

Well, this is "interactive" in the timing sense. It still has one-way data flow. That's how I interpreted you and how I used the word.

If you meant truly interactive, then I think you're talking about something unrelated to the post.

reply
YZF
5 days ago
[-]
You're right that it does not take input from the user. But I think you're in agreement with me that not everything is pipe-able.

The buffer in Unix (or rather C?) file output goes back to the beginning of time. It's not the pipe that's buffering.

Anyways, as soon as your mental model of these command line utilities includes the buffering then the behavior makes sense. How friendly it is can be debated. Trying to make it work with timers feels wrong and would introduce more complexity and deviate from some people's mental model.

reply
Dylan16807
5 days ago
[-]
The not-buffering when attached to a terminal also goes back to the beginning of time.

I don't see any particular reason for a pipe to be more like a file than a terminal.

And I don't see why my mental model should be file-like and only file-like.

> Trying to make it work with timers feels wrong and would introduce more complexity and deviate from some people's mental model.

Oh, that's the specific proposed fix you meant. Okay, I can see why you'd dislike that, but I would say that forcing line mode doesn't have those downsides.

reply
worksonmymach
5 days ago
[-]
I prefer a predictable footgun. Your idea is good but it would need to be another flag, so you have to know it exists. Not knowing the semantics is the issue, rather than the semantics themselves.
reply
Twirrim
6 days ago
[-]
This is one of those things where, despite some 20+ years of dealing with *NIX systems, I know it happens, but always forget about it until I've sat puzzled why I've got no output for several moments.
reply
hiatus
6 days ago
[-]
> Some things I didn’t talk about in this post since these posts have been getting pretty long recently and seriously does anyone REALLY want to read 3000 words about buffering?

I personally would.

reply
CoastalCoder
6 days ago
[-]
It depends on the writing.

I've read that sometimes wordy articles are mostly fluff for SEO.

reply
TeMPOraL
6 days ago
[-]
In case of this particular author, those 3000 words would be dense, unbuffered wisdom.
reply
penguin_booze
6 days ago
[-]
Summarizing is one area where I'd consider using AI. I haven't explored what solutions exist yet.
reply
EasyMark
3 days ago
[-]
I think this is one of those things where AI helps a lot. Have it summarize the article, then proofread it. Have a TLDR and a NTLDR (never too long, did read) section.
reply
londons_explore
6 days ago
[-]
I'd like all buffers to be flushed whenever the systemwide CPU becomes idle.

Buffering generally is a CPU-saving technique. If we had infinite CPU, all buffers would be 1 byte. Buffers are a way of collecting together data to process in a batch for efficiency.

However, when the CPU becomes idle, we shouldn't have any work "waiting to be done". As soon as the kernel scheduler becomes idle, all processes should be sent a "flush your buffers" signal.

reply
corank
5 days ago
[-]
An interesting idea! Sending signals to all processes sounds expensive as hell though. All that work just to make a system call again to flush a buffer. Maybe a mechanism can be added to make the kernel aware of the user space buffer so it can directly fetch stuff from there when it's idle? Is it kind of like io_uring https://man7.org/linux/man-pages/man3/io_uring_register_buff...
reply
londons_explore
5 days ago
[-]
> Sending signals to all processes sounds expensive as hell though.

You'd only send signals to processes who had run any code since the CPU was last idle (if the process hasn't executed, it can't have buffered anything).

There could also be some kind of "I have buffered something" flag a process could set on itself, for example at a well-known memory address.

reply
webstrand
5 days ago
[-]
That's a neat idea, probably not all at once though. And it could reduce efficiency for low power systems to be speculatively waking sleeping processes?
reply
o11c
5 days ago
[-]
This article is confusing two different things: "unbuffered" vs "line-buffered".

Unbuffered will gratuitously give you worse performance, and can create incorrect output if multiple sources are writing to the same pipe. (sufficiently-long lines will intermingle anyway, but most real-world output lines are less than 4096 bytes, even including formatting/control characters and characters from supplementary planes)

Line-buffering is the default for terminals, and usually what you want for pipes. Run each command under `stdbuf -oL -eL` to get this. The rare programs that want to do in-line updates already have to do manual flushing so will work correctly here too.

You can see what `stdbuf` is actually doing by running:

  env -i `command -v stdbuf` -oL -eL `command -v env`
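To see the difference line buffering makes on a pipe without involving `stdbuf`, here is a small Python sketch (the helper name is made up). It writes one line into a pipe through a text stream and checks whether the reader can see it before any explicit flush. In Python's text mode, `buffering=1` requests line buffering and `-1` requests the default policy, which is block buffering for anything that is not a terminal.

```python
import os
import select


def line_arrives_promptly(line_buffered):
    """Write one line to a pipe and report whether it is readable
    on the other end before the writer flushes or closes."""
    r, w = os.pipe()
    out = os.fdopen(w, "w", buffering=1 if line_buffered else -1)
    out.write("hello\n")
    ready, _, _ = select.select([r], [], [], 0.1)
    out.close()
    os.close(r)
    return bool(ready)
```

With line buffering the newline triggers the write; with the default policy the line sits in the userspace buffer, which is the "stuck" behaviour from the article.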
reply
BoingBoomTschak
6 days ago
[-]
Also made a post some time ago about the issue: https://world-playground-deceit.net/blog/2024/09/bourne_shel...

About the commands that don't buffer, this is either implementation dependent or even wrong in the case of cat (cf https://pubs.opengroup.org/onlinepubs/9799919799/utilities/c... and `-u`). Massive pain that POSIX never included an official way to manage this.

Not mentioned is input buffering, which gives you this strange result:

  $ seq 5 | { v1=$(head -1); v2=$(head -1); printf '%s=%s\n' v1 "$v1" v2 "$v2"; }
  v1=1
  v2=
The fix is to use `stdbuf -i0 head -1`, in this case.
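The same effect can be reproduced in a few lines of Python, as a sketch with hypothetical helper names. A buffered reader pulls a whole chunk out of the pipe to find the first newline, and whatever it over-read is lost to the next reader of the same descriptor; reading one byte at a time leaves the rest in the pipe.

```python
import os


def read_line_buffered(fd):
    """Read one line through a buffered reader (over-reads from the pipe)."""
    with os.fdopen(os.dup(fd), "rb") as f:
        return f.readline()


def read_line_unbuffered(fd):
    """Read one line a byte at a time, consuming nothing past the newline."""
    out = bytearray()
    while True:
        b = os.read(fd, 1)
        if not b:
            break
        out += b
        if b == b"\n":
            break
    return bytes(out)


def pipe_containing(data):
    """Return the read end of a pipe preloaded with `data` (writer closed)."""
    r, w = os.pipe()
    os.write(w, data)
    os.close(w)
    return r
```

Calling `read_line_buffered` twice on `pipe_containing(b"1\n2\n3\n")` yields the first line and then nothing, while two calls to `read_line_unbuffered` yield the first and second lines, at the price of one read call per byte.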
reply
jagrsw
6 days ago
[-]
I don't believe a process reading from a pipe/socketpair/whatever can enforce such constraints on a writing process (except using heavy hackery like ptrace()). While it might be possible to adjust the pipe buffer size, I'm not aware of any convention requiring standard C I/O to respect this.

In any case, stdbuf doesn't seem to help with this:

  $ ./a | stdbuf -i0 -- cat

  #include <stdio.h>
  #include <unistd.h>
  int main(void) {
   for (;;) {
    printf("n");
    usleep(100000);
   }
  }
reply
BoingBoomTschak
6 days ago
[-]
I'm sorry, but I don't understand what you mean. The issue in your example is the output buffering of a, not the input buffering of cat. You'd need `stdbuf -o0 ./a | cat` there.
reply
calibas
6 days ago
[-]
Buffers are there for good reason, it's extremely slow (relatively speaking) to print output on a screen compared to just writing it to a buffer. Printing something character-by-character is incredibly inefficient.

This is an old problem, I encounter it often when working with UART, and there's a variety of possible solutions:

Use a special character, like a new line, to signal the end of output (line-based).

Use a length-based approach, such as waiting for 8KB of data.

Use a time-based approach, and print the output every X milliseconds.

Each approach has its own strengths and weaknesses, depends upon the application which one works best. I believe the article is incorrect when mentioning certain programs that don't use buffering, they just don't use an obvious length-based approach.

reply
qazxcvbnmlp
6 days ago
[-]
Having a layer or two above the interface aware of the constraint works the best (when possible). Line based approach does this but requires agreement on the character (new line).
reply
akira2501
6 days ago
[-]
Which is exactly why setbuf(3) and setvbuf(3) exist.
reply
PhilipRoman
6 days ago
[-]
Also, it's not just the work needed to actually handle the write on the backend - even just making that many syscalls to /dev/null can kill your performance.
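One way to make the syscall count visible without timing anything: count how many times the underlying "raw" write is invoked for the same stream of one-byte writes, with and without a userspace buffer. This is a sketch; `CountingSink` stands in for a file descriptor and discards the data, much like /dev/null would.

```python
import io


class CountingSink(io.RawIOBase):
    """Raw sink that discards data and counts write calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        return len(b)


def raw_writes_needed(n_bytes, buffer_size):
    """Number of raw writes issued for n_bytes one-byte writes.
    buffer_size == 0 means no userspace buffering at all."""
    sink = CountingSink()
    out = sink if buffer_size == 0 else io.BufferedWriter(sink, buffer_size=buffer_size)
    for _ in range(n_bytes):
        out.write(b"x")
    out.flush()
    return sink.calls
```

Unbuffered, every byte costs one raw write; with a 4096-byte buffer the same 10000 bytes need only a handful.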
reply
NelsonMinar
5 days ago
[-]
I've been doing Unix for 35+ years and have never fully understood how this works. I appreciate the holistic description here, talking about buffering behaviors across a bunch of different systems and components. I definitely learned something from it.
reply
toast0
6 days ago
[-]
> when you press Ctrl-C on a pipe, the contents of the buffer are lost

I think most programs will flush their buffers on SIGINT... But for that to work from a shell, you'd need to deliver SIGINT to only the first program in the pipeline, and I guess that's not how that works.

reply
akdev1l
6 days ago
[-]
The last process gets sigint and everything else gets sigpipe iirc
reply
tolciho
6 days ago
[-]
No, INTR "generates a SIGINT signal which is sent to all processes in the foreground process group for which the terminal is the controlling terminal" (termios(4) on OpenBSD; the other systems that pass for unix these days are similar), as complicated by what exactly is in the foreground process group (use tcgetpgrp(3) to determine that) and what signal masking or handlers those processes have (which can vary over the lifetime of a process, especially for a shell that does job control), or whether some process has disabled ISIG (the terminal being shared "global" state between one or more processes), in which case none of the prior may apply.

  $ make pa re ci
  cc -O2 -pipe    -o pa pa.c
  cc -O2 -pipe    -o re re.c
  cc -O2 -pipe    -o ci ci.c
  $ ./pa | ./re | ./ci > /dev/null
  ^Cci (2) 66241 55611 55611
  pa (2) 55611 55611 55611
  re (2) 63366 55611 55611
So with "pa" program that prints "y" to stdout, and "re" and "ci" that are basically cat(1) except that these programs all print some diagnostic information and then exit when a SIGPIPE or SIGINT is received, here showing that (on OpenBSD, with ksh, at least) a SIGINT is sent to each process in the foreground process group (55611, also being logged is the getpgrp which is also 55611).

  $ kill -l | grep INT
   2    INT Interrupt                     18   TSTP Suspended
reply
toast0
6 days ago
[-]
That makes sense to me, but the article implied everything got a SIGINT, but the last program got it first. Either way, you'd need a different way to ask the shell to do it the other way...

Otoh, do programs routinely flush if they get SIGINFO? dd(1) on FreeBSD will output progress if you hit it with SIGINFO and continue its work, which you can trigger with ctrl+T if you haven't set it differently. But that probably goes to the foreground process, so probably doesn't help. And, there's the whole thing where SIGINFO isn't POSIX and isn't really in Linux, so it's hard to use there...

This article [1] says tcpdump will output the packet counts, so it might also flush buffers, I'll try to check and report a little later today.

[1] https://freebsdfoundation.org/wp-content/uploads/2017/10/SIG...

reply
toast0
6 days ago
[-]
> This article [1] says tcpdump will output the packet counts, so it might also flush buffers, I'll try to check and report a little later today.

I checked, tcpdump doesn't seem to flush stdout on siginfo, and hitting ctrl+T doesn't deliver it a siginfo in the tcpdump | grep case anyway. Killing tcpdump with sigint does work: tcpdump's output is flushed and it closes, and then the grep finishes too, but there's not a button to hit for that.

reply
ctoth
6 days ago
[-]
Feels like a missed opportunity for a frozen pipes joke.

Then again...

Frozen pipes are no joke.

reply
mg
6 days ago
[-]
Related: Why pipes can be indeterministic.

https://www.gibney.org/the_output_of_linux_pipes_can_be_inde...

reply
Thaxll
6 days ago
[-]
TTY, console, shell, stdin/out, buffer, pipe: I wish there was a clear explanation somewhere of how all of those are glued/work together.
reply
MathMonkeyMan
6 days ago
[-]
Here's a resource for at least the first one: https://www.linusakesson.net/programming/tty/
reply
why-el
6 days ago
[-]
Love it.

> this post is only about buffering that happens inside the program, your operating system’s TTY driver also does a little bit of buffering sometimes

and if the TTY is remote, so do the network switches! it's buffering all the way down.

reply
josephcsible
6 days ago
[-]
I've run into this before, and I've always wondered why programs don't just do this: when data gets added to a previously-empty output buffer, make the input non-blocking, and whenever a read comes back with EWOULDBLOCK, flush the output buffer and make the input blocking again. (Or in other words, always make sure the output buffer is flushed before waiting/going to sleep.) Wouldn't this fix the problem? Would it have any negative side effects?
reply
392
2 days ago
[-]
Setting nonblocking takes effect on the file description rather than just the file descriptor. Meaning if your program crashes the description remains in nonblocking mode for the next program, which is not prepared to handle it.
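The description-versus-descriptor point is easy to check from Python. This sketch uses a hypothetical function name and `os.dup` to stand in for "another program holding the same open file": it flips one descriptor to non-blocking and observes the change through the other.

```python
import os


def nonblocking_is_shared_via_dup():
    """Return (blocking_before, blocking_after) as seen through a dup'd
    descriptor when the ORIGINAL descriptor is set non-blocking."""
    r, w = os.pipe()
    alias = os.dup(r)  # new descriptor, same open file description
    try:
        before = os.get_blocking(alias)
        os.set_blocking(r, False)   # change made through `r` only
        after = os.get_blocking(alias)
        return before, after
    finally:
        os.close(alias)
        os.close(r)
        os.close(w)
```

A stdin inherited from a parent process shares its description in the same way, which is why the flag can outlive the program that set it.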

Node.js sets stdin to nonblocking. This is great because it means copy and pasting a shell script containing an npm install into your shell will work, since the description is reset between each program by your terminal. But when those same lines are executed by the bash interpreter directly, processes after npm will randomly fail by failing to read from stdin with a return value they never expected to see. Ask me how I know

reply
Rygian
6 days ago
[-]
Learned two things: `unbuffer` exists, and “unnecessary” cats are just fine :-)
reply
samatman
5 days ago
[-]
I prefer the trivial cat, because the < redirect puts the source in the middle of the pipe.

  cat foo.txt | bar | blah > out.log
vs.

  bar < foo.txt | blah > out.log
It looks more like what it is. Also, with cat you can add another file or use a glob, that's come in handy more than once.

Furthermore, it means the first command isn't special, if I decide I want something else as the first command I just add it. Pure... concatenation. heh.

It's useful to know both ways, I suppose. But "don't use trivial cat" is just one of those self-perpetuating doctrines, there's no actual reason not to do things that way if you want.

reply
delamon
5 days ago
[-]
You can do it like this:

     < foo.txt bar | blah > out.log
reply
samatman
5 days ago
[-]
Hey, TIL.

I like that more, in a way, and less, in a way. The angle bracket is pointing off into nothing but throws `foo.txt` into `bar` anyway, so the control flow seems more messed up than in `bar < foo.txt`.

On the other hand it's structurally a bit more useful, because I can insert a different first stage very easily. But I still can't add a filename or change it to a glob, so cat is still more flexible.

So I'm going to stick to my trivial-catting ways, but thanks for the heads-up.

reply
wrsh07
6 days ago
[-]
I like unnecessary cat because it makes the rest of the pipe reusable across other commands

Eg if I want to test out my greps on a static file and then switch to grepping based on a tail -f command

reply
chatmasta
6 days ago
[-]
Yep. I use unnecessary cats when I’m using the shell interactively, and especially when I’m building up some complex pipeline of commands by figuring out how to do each step before moving onto the next.

Once I have the final command, if I’m moving it into a shell script, then _maybe_ I’ll switch to file redirection.

reply
Dan42
5 days ago
[-]
grep has a --line-buffered option that does the job fine in most cases. Just set in your aliases grep='grep --line-buffered', that way you get the correct behavior when you tail logs piped to a sequence of greps, and you avoid the performance penalty in scripts.
reply
wyuenho
5 days ago
[-]
An equally important but opposite problem with pipes getting stuck is pipes getting broken because some commands at the front of the pipe expect buffering down the pipe. Some years ago I was scratching my head trying to figure out why

  curl ... | grep -q
was giving me a "Failed writing body" error. I knew "grep -q" would close stdin and exit as soon as a match is found, and therefore I needed a buffer in front of grep. But I was on a Mac, which to this day still doesn't come with "sponge" (or stdbuf and unbuffer for that matter), so I had to find a cross-platform command that does a little buffering but not too much, and could handle stdout being closed. So I settled on:

  curl ... | uniq | grep -q
To this day people are still confused why there's not a "sort" in front of uniq and the comment about this cross-platform buffer thing I put in the script.
reply
worksonmymach
5 days ago
[-]
Lol I just realized this problem hit me. I did a docker --follow | grep and got nothing. Then found the data in the docker logs later. Assumed it was a brainfart and, as I was demoing something, didn't investigate; just brushed over it. Thanks jvns!
reply
Joker_vD
6 days ago
[-]
> I think this problem is probably unavoidable – I spent a little time with strace to see how this works and grep receives the SIGINT before tcpdump anyway so even if tcpdump tried to flush its buffer grep would already be dead.

I believe quite a few utilities actually do try to flush their stdout on receiving SIGINT... but as you've said, the other side of the pipe may also very well have received a SIGINT, and nobody does a short-timed wait on stdin on SIGINT: after all, the whole reason you've been sent SIGINT is because the user wants your program to stop working now.

reply
xorcist
5 days ago
[-]
When you think about it, this behaviour makes sense. How else would you implement it? I/O is always block oriented. The reason you don't always notice is that UNIX is so dearly line-based that every layer of the software stack goes to enormous pain to buffer everything to make interactive use convenient. (The article then talks about different types of buffering, but that is not important to the point being made.)

So we are on top of a large pile of optimizations that even make "grep | grep | grep" possible, but we are so spoiled we don't even notice until the pile of abstractions runs too deep and the optimizations don't work anymore!

In this specific case, it is however a strange thing to do. Why would you want to start two processes in order to search for two words? Each with its own memory management, scheduling, buffering etc.? Even if the buffering can be turned off, the other overhead will be noticeable.

If you wanted to search for ten words, would you start ten separate grep processes, each responsible for their own word?

No, you would ask grep to search for lines containing both of these words in any order, like so:

  grep 'thing1.*thing2\|thing2.*thing1'
This is probably what the article alludes to in the "grep -E" example (the -E is there in order to not have to escape the pipe character), but the author forgot to write the whole argument so the example is wrong.

It would be practical to have a special syntax for "these things, in any order" in grep, but sadly that's one of the things missing from original grep. This makes it unnecessarily difficult to construct the argument programmatically with a script. This was one of the things Perl solved with lookahead and lookbehind assertions; you can use these to look ahead from the start of the line to see if all words are present:

  grep -P '^(?=.*?thing1)(?=.*?thing2)'
This syntax is hard to read and understand, but it's the least bad when needed.

(Related: How to search for any one of several words? That's much easier because each match is independent of each other, just give grep several expressions:

  grep -e 'thing1' -e 'thing2'
which has none of the above problems.)
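
For the "construct the argument programmatically" case, here is a minimal sketch that avoids building a regex at all. `match_all` is a made-up helper name, not an existing tool, and it assumes the words are fixed strings rather than regexes and that a POSIX awk is available:

```shell
# match_all WORD...
# Hypothetical helper: print the lines of stdin that contain every WORD
# (fixed strings, in any order), using a single awk process.
match_all() {
    awk '
        BEGIN {
            # Copy the words out of ARGV, then set ARGC to 1 so awk
            # reads stdin instead of treating the words as file names.
            for (i = 1; i < ARGC; i++) words[i] = ARGV[i]
            n = ARGC - 1
            ARGC = 1
        }
        {
            # Skip the line as soon as one word is missing.
            for (i = 1; i <= n; i++)
                if (index($0, words[i]) == 0) next
            print
        }
    ' "$@"
}
```

Usage would be something like `tail -f /some/log/file | match_all thing1 thing2`. Note that awk may still buffer its own output when it is not writing to a terminal, so this removes the extra processes, not the buffering question itself.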
reply
jakub_g
6 days ago
[-]
In our CI we used to have some ruby commands that were piped to prepend "HH:MM:SS" to each line to track progress (because GitLab still doesn't support this out of the box, though it's supposed to land in 17.0), but it would sometimes lead to some logs being flushed with a large delay.

I knew it had something to do with buffers and it drove me nuts, but couldn't find a fix, all solutions tried didn't really work.

(Problem got solved when we got rid of ruby in CI - it was legacy).

reply
BeefWellington
6 days ago
[-]
AFAIK, signal order generally propagates backwards, so the last command run will always receive the signal first, provided it is a foreground command.

But also, the example is not a great one; grepping tcpdump output doesn't make sense given its extensive and well-documented expression syntax. It's obviously just used as an example here to demonstrate buffering.

reply
Joker_vD
6 days ago
[-]
> grepping tcpdump output doesn't make sense given its extensive and well-documented expression syntax

Well. Personally, every time I've tried to learn its expression syntax from its extensive documentation my eyes would start to glaze over after about 60 seconds; so I just stick with grep — at worst, I have to put the forgotten "-E" in front of the pattern and re-run the command.

By the way, and on a slight tangent: if anyone ever wanted grep to output only some part of the captured pattern, like -o but only for the part inside the parentheses, then one way to do it is to use a wrapper like this:

    #!/bin/sh -e

    GREP_PATTERN="$1"
    SED_PATTERN="$(printf '%s\n' "$GREP_PATTERN" | sed 's;/;\\/;g')"
    shift

    grep -E "$GREP_PATTERN" --line-buffered "$@" | sed -r 's/^.*'"$SED_PATTERN"'.*$/\1/g'
Not the most efficient way, I imagine, but it works fine for my use cases (in which I never need more than one capturing group anyway). Example invocation:

    $ xgrep '(^[^:]+):.*:/nonexistent:' /etc/passwd
    nobody
    messagebus
    _apt
    tcpdump
    whoopsie
reply
cafeinux
5 days ago
[-]
I usually use PCRE mode and prepend what I want to be displayed with `\K` (and append a lookahead if needed):

  ~ $ echo "foo bar1 baz foo bar2" | grep -oP 'foo \Kbar\d'
  bar1
  bar2
  ~ $ echo "foo bar1 baz foo bar2" | grep -oP 'foo \Kbar\d(?= baz)'
  bar1
  ~ $ echo "foo bar1 baz foo bar2" | grep -oP 'foo \Kbar\d(?=$)'
  bar2
Of course, it implies using a version of `grep` supporting the `-P` option. Notably, MacOS doesn't by default, although if -P is utterly needed, there are ways to install gnu-grep or modify the command used to achieve the same result. Your way is perhaps more cross-platform, but for my (very personal) use cases, mine is easier to remember and needs no setup.

Edit: worst case, piping to `cut` or `awk` can also be a solution.

reply
Joker_vD
5 days ago
[-]
Ah, I knew about the \K to cut things before the match, but I could never find how to cut away the things after. But it exists and it's (?=... Well, better late than never.

> worst case, piping to `cut` or `awk` can also be a solution.

Yeah, I've used that too, and that's how I ended up writing the script down: constantly piping things through the second filter with yet another stupid regex that needs tinkering as well... isn't there a way to reuse the first regex, somehow? Hmm, don't the patterns in the sed's "substitute" use the same syntax as the grep does?.. They do! How convenient.

reply
chatmasta
6 days ago
[-]
ChatGPT has eliminated this class of problem for me. In fact it’s pretty much all I use it for. Whether it’s ffmpeg, tcpdump, imagemagick, SSH tunnels, Pandas, numpy, or some other esoteric program with its own DSL… ChatGPT can construct the arguments I need. And if it gets it wrong, it’s usually one prompt away from fixing it.
reply
toast0
6 days ago
[-]
> grepping tcpdump output doesn't make sense given its extensive and well-documented expression syntax.

I dunno. It doesn't make sense in the world where everyone makes the most efficient pipelines for what they want; but in that world, they also always remember to use --line-buffered on grep when needed, and the line buffered output option for tcpdump.

In reality, for a short term thing, grepping on the grepable parts of the output can be easier than reviewing the docs to get the right filter to do what you really want. Ex, if you're dumping http requests and you want to see only lines that match some url, you can use grep. Might not catch everything, but usually I don't need to see everything.

reply
two_handfuls
6 days ago
[-]
Maybe that's why my mbp sometimes appears not to see my keyboard input for a whole second even though nothing much is running.
reply
YZF
5 days ago
[-]
Shouldn't be related ... but who knows. I've not noticed this myself (also MBP user).
reply
frogulis
6 days ago
[-]
Hopefully not a silly question: in the original example, even if we had enough log data coming from `tail` to fill up the first `grep` buffer, if the logfile ever stopped being updated, then there would likely be "stragglers" left in the `grep` buffer that were never outputted, right?
reply
fwip
5 days ago
[-]
Yes, that's correct.
reply
chrsw
5 days ago
[-]
I've run into this buffering issue on Unix before and it had me scratching my head for a while. Caution: not all 'awk' implementations behave the same with respect to buffering.
reply
pixelbeat
6 days ago
[-]
Nice article. See also: https://www.pixelbeat.org/programming/stdio_buffering/

It's also worth mentioning a recent improvement we made (in coreutils 8.28) to the operation of the `tail | grep` example in the article. tail now notices if the pipe goes away, so one could wait for something to appear in a log, like:

    tail -f /log/file | grep -q match
    then_do_something
There are lots of gotchas to pipe handling really. See also: https://www.pixelbeat.org/programming/sigpipe_handling.html
reply
mpbart
6 days ago
[-]
Wow I had no idea this behavior existed. Now I’m wondering how much time I’ve wasted trying to figure out why my pipelined greps don’t show correct output
reply
gigatexal
5 days ago
[-]
Julia’s blog is one of the hall of fame blogs. I aspire to share content so enriching one day on my own piece of the internet.
reply
m00x
5 days ago
[-]
Because it's not a truck that you can just dump things on, it's a series of tubes.
reply
kreetx
6 days ago
[-]
From experience, `unbuffer` is the tool to use to turn buffering off reliably.
reply
radarsat1
6 days ago
[-]
Side note maybe, but is there an alternative to chaining two greps?
reply
ykonstant
6 days ago
[-]
The most portable way to do it with minimal overhead is with sed:

  sed -e '/pattern1/!d' -e '/pattern2/!d'
which generalizes to more terms. Easier to remember and just as portable is

  awk '/pattern1/ && /pattern2/'
but now you need to launch a full awk.

For more ways see https://unix.stackexchange.com/questions/55359/how-to-run-gr...

reply
jrpelkonen
5 days ago
[-]
These are good suggestions, but maybe it’s worth noting that one of the solutions offered in the article is not equivalent:

  tail -f /some/log/file |  grep -E 'thing1.*thing2'
This will only match if the subpatterns, i.e. thing1 & thing2, are in this order, and also require that the patterns do not overlap.
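
A quick way to see both differences, as a sketch on made-up input (the `sample` function and the `abc`/`cde` patterns exist only for this illustration):

```shell
# Three made-up input lines: both words in each order, plus a line
# where two patterns overlap ("abc" and "cde" share the "c" in "abcde").
sample() {
    printf '%s\n' 'thing1 then thing2' 'thing2 then thing1' 'abcde'
}

# Ordered regex: matches only the line where thing1 comes first.
ordered=$(sample | grep -E 'thing1.*thing2')

# One independent test per pattern: matches both orders.
any_order=$(sample | awk '/thing1/ && /thing2/')

# Overlap: "abc.*cde" needs two separate c's, so it misses "abcde",
# while testing each pattern independently finds it.
# (|| true because grep exits non-zero when nothing matches.)
overlap_regex=$(sample | grep -E 'abc.*cde' || true)
overlap_both=$(sample | awk '/abc/ && /cde/')

printf 'ordered regex:\n%s\n' "$ordered"
printf 'independent tests:\n%s\n' "$any_order"
printf 'overlap, regex: [%s]  independent: [%s]\n' "$overlap_regex" "$overlap_both"
```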
reply
SG-
6 days ago
[-]
how important is buffering in 2024 on modern ultra fast single user systems I wonder? I'd be interested in seeing it disabled for testing purposes.
reply
akira2501
6 days ago
[-]
> on modern ultra fast single user systems I wonder?

The latency of a 'syscall' is on the order of a few hundred instructions. You're switching to a different privilege mode, with a different memory map, and where your data ultimately has to leave the chip to reach hardware.

It's absurdly important and it will never not be.

reply
chatmasta
6 days ago
[-]
It depends what the consumer is doing with the data as it exits the buffer. If it’s a terminal program printing every character, then it’s going to be slow. Or more generally if it’s any program that doesn’t have its own buffering, then it will become the bottleneck so the slowdown will depend on how it processes input.

Ultimately even “no buffer” still has a buffer, which is the number of bits it reads at a time. Maybe that’s 1, or 64, but it still needs some boundary between iterations.

reply
YZF
5 days ago
[-]
Those ultra fast systems also have ultra fast I/O. Buffering is critical to get good performance e.g. on your NVMe. The difference between writing one character at a time and writing a few megabytes at a time would be many orders of magnitude (x1000? x10000?) enough to make pipe processing of large files be unacceptably slow. Even between processes you want to move large blocks of data otherwise you're just context switching all the time. You can try this by flushing after every character in a toy program and do some sort of chain `toy largefile | toy | toy > mycopy`
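
A rough way to try this without writing the toy program, assuming `dd` is an acceptable stand-in for it: with `bs=1` each stage does one read and one write per byte, and with a larger block size it moves data in chunks. `copy_chain` is a made-up name and the sizes are arbitrary.

```shell
# copy_chain BLOCKSIZE SRC DST
# Hypothetical helper: copy SRC to DST through a three-stage pipeline in
# which every stage reads and writes at most BLOCKSIZE bytes per call.
copy_chain() {
    dd if="$2" bs="$1" 2>/dev/null |
        dd bs="$1" 2>/dev/null |
        dd bs="$1" of="$3" 2>/dev/null
}

# Make a 64 KiB test file; scale count up for a more visible difference.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/largefile" bs=1024 count=64 2>/dev/null

copy_chain 1     "$workdir/largefile" "$workdir/copy_bytewise"
copy_chain 65536 "$workdir/largefile" "$workdir/copy_blocks"

# Both copies should match the source; only the number of syscalls differs.
cmp -s "$workdir/largefile" "$workdir/copy_bytewise" &&
    cmp -s "$workdir/largefile" "$workdir/copy_blocks" &&
    result=identical || result=different
echo "copies are $result"

rm -rf "$workdir"
```

Wrapping each `copy_chain` call in `time` should show the gap. I'd expect the bytewise run to be much slower, but the exact ratio will depend on the system.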
reply
emcell
6 days ago
[-]
one of the reasons why i hate computers :D
reply