We found a bug in the hyper HTTP library
113 points
4 days ago
| 9 comments
| blog.cloudflare.com
| HN
Twey
2 hours ago
[-]
This would have been flagged by Clippy lints `let_underscore_untyped` or `let_underscore_must_use`, which sadly are not enabled by default.
reply
microgpt
44 minutes ago
[-]
Or just by not writing let _ =
reply
Twey
23 minutes ago
[-]
All recurrent people problems are system problems.
reply
pwdisswordfishq
1 hour ago
[-]
Ehh, easy fix

    #[allow(clippy::let_underscore_untyped,clippy::let_underscore_must_use)]
    let _ = self.poll_flush(cx)?;
reply
Twey
1 hour ago
[-]
I said ‘flagged’, not ‘fixed’ :)

You can always write the wrong code if you want it enough. But hopefully a warning would have prompted someone to think harder about this flow.

reply
pwdisswordfishq
52 minutes ago
[-]
But "let _ =" is already an explicit suppression of a must-use warning. Where does this arms race of "no, I really know what I am doing, compiler" versus "no, this really looks like a mistake, programmer" end?
reply
Twey
26 minutes ago
[-]
That's an excellent question I don't have an answer for in general :)

IMHO the goal is usually for the compiler not to make these decisions but to provide the tools for the APIs people build to make them. That's kind of passing the buck, though.

I guess in this case the core problem is that the API for these I/O calls has no representation in the type system for what's happening to the buffer. Proxying it as ‘the programmer must think about this code path’ is a reasonable best-effort but, evidently, sometimes inadequate.

reply
lunar_mycroft
1 hour ago
[-]
You can set the lints to `forbid` instead of `deny`, which means they can't be `allowed` like that.
reply
nesarkvechnep
1 hour ago
[-]
Yeah, but you must know about them and the possible bug first in order to allow them...
reply
Twey
1 hour ago
[-]
Hence ‘sadly’. IMNSHO both of these (or at least _untyped) should be enabled by default. Untyped `let _` is too big a footgun during refactorings.
reply
Joker_vD
1 hour ago
[-]
At which point you wouldn't have written this bug in the first place; or the warnings would trigger immediately, you'd change _ to an actual variable and then remove the warning pragmas because now you don't assign to _.
reply
Twey
1 hour ago
[-]
`Poll` is marked `#[must_use]` so if you were assigning to something other than `_` you'd get a warning that you're ignoring the `Pending` path. The Clippy lint is only for `_` which Rust considers a use by default.
reply
edelbitter
6 hours ago
[-]
Cloudflare does not notice (until a customer complains) that they are sending broken responses at scale? I would have thought they would notice this from sampling and linting a few replies.. just in case they did something like Cloudbleed again.
reply
ramon156
3 hours ago
[-]
Can you get reasonable results without exposing sensitive info? I'm asking because I genuinely have no idea what it's like at their scale
reply
worldsavior
3 hours ago
[-]
> We spent six weeks chasing a nearly invisible bug — a race condition that occurred only under specific conditions — in the hyper library that impacted how the Images binding returned processed image data back to the client. In the end, it took four lines of code to fix it.

That's a long time, must be frustrating.

reply
microgpt
2 hours ago
[-]
Would using Rust have prevented this?
reply
Ygg2
1 minute ago
[-]
No. Anyone expecting that hasn't read No Silver Bullet essay.
reply
geodel
24 minutes ago
[-]
Agree. This is warning to people who thought Rust is optional at cloud scale.
reply
re-thc
2 hours ago
[-]
Isn't this already Rust?
reply
pjmlp
2 hours ago
[-]
That was obviously a joke question, pointing that Rust isn't the solution for everything.
reply
lelanthran
1 hour ago
[-]
Woosh :-)
reply
Cthulhu_
2 hours ago
[-]
The Hyper library in question is a Rust library.

Did you read the article, or are you a "use rust" parrot / bot based on titles?

reply
waysa
2 hours ago
[-]
Sarcasm. (I guess)
reply
frankharv
1 hour ago
[-]
Obviously written by a C freak using a BSD
reply
pseudony
1 hour ago
[-]
So “fearless concurrency” still only happens when one just decides to not be afraid… :)
reply
c0balt
1 hour ago
[-]
This does not appear to be a concurrency bug though?
reply
microgpt
42 minutes ago
[-]
Of course it's a concurrency bug. It races sending data to the kernel against the kernel sending data to the network. If the wrong one wins the bug occurs.
reply
pseudony
1 hour ago
[-]
“ a race condition that occurred only under specific conditions — in the hyper library”
reply
nopurpose
1 hour ago
[-]
Nice writeup, but I don't understand how `curl` didn't trigger bug for them (or any other hyper HTTP server out there), given the explanation in the article.

`curl --http1.1` sends `Connection: Close` so sender (hyper) must attempt to shutdown connection after sending whole body. Surely any network is slower than memory copy into socket kernel buffers, so it must reliably trigger condition "buffer flush can't be done in one go" and thus trigger early TCP shutdown.

reply
Thaxll
2 hours ago
[-]
So much for Rust forcing you to handle errors.
reply
jerf
3 minutes ago
[-]
There's a hidden equivocation there. "Handling" errors, as far as the language is concerned, mean you do something with them, but explicitly discarding them is most definitely a "something".

From a human perspective we can consider that not handling the error.

But the language has no mechanism for "knowing" that discarding the error is wrong. Discarding errors is a fully valid mechanism that we must be able to do in a program because it is sometimes correct. There really isn't even a sensible way to define a way to "force" a user to "handle" errors. The language can only be designed to make it hard to forget to "handle" them somehow in the way the language sees, but it is always possible for the user to incorrectly handle them, of which discarding them when they shouldn't have is only one particularly cognitively-available option but is hardly the full scope of possibilities. Probably isn't even the most common mistake to make, I would imagine there are far more errors that are not handled "correctly" than ones that are spuriously discarded.

Note I keep saying "language" rather than Rust. All a language can do is surface the issue, and Rust does that. It can't force good code. No language can.

reply
Matl
36 minutes ago
[-]
Go does force you too, but it also supports _ as a bypass - because sometimes you do know better. Just not in this case.

Rust never promised it'll let programmers turn off their brain, that's what LLMs are for.

reply
wongarsu
2 hours ago
[-]
You could argue the bug happened exactly because hyper's poll_flush treats flushing some but not all data as a successful return, not an error case.
reply
atoav
2 hours ago
[-]
You could say the exact same thing about safety belts and airbags in cars after someone has died in a crash.

Why even bother with measures that prevent many problems if they won't prevent all of them, right?

reply
chlorion
36 minutes ago
[-]
This is the argument I like too.

It's the same argument anti-vaxers love to make. "Well you can still get covid after getting the shot", which is something I read and heard quite a lot. That doesn't make the thing useless.

Humans are really dumb.

reply
100ms
4 hours ago
[-]
> The failure was caused by a timing-dependent race condition in hyper’s HTTP/1 connection handling. When the reader was slower and the socket buffer filled, poll_flush returned Poll::Pending, but the dispatch loop discarded that result. Hyper then treated the response as complete and shut down the socket while data remained buffered internally, causing the client to receive an EOF before the full body arrived.

https://github.com/hyperium/hyper/issues/4022

Saved you 3000 words

reply
michalc
3 hours ago
[-]
Reminds me of another “slow client”-related bug in gunicorn: https://github.com/benoitc/gunicorn/issues/3334
reply
microgpt
2 hours ago
[-]
That's not even a bug. That's how TCP works. If you keep sending data to a socket the other side has closed, you get RST.
reply
edelbitter
2 hours ago
[-]
In case of plain HTTP over TCP, there is even a hint in the spec about why and how a server might want to avoid fully closing prematurely.

https://datatracker.ietf.org/doc/html/rfc9112#section-9.6 (this was already in https://datatracker.ietf.org/doc/html/rfc7230#section-6.6)

reply
NooneAtAll3
38 minutes ago
[-]
what is RST?
reply
algoth1
3 hours ago
[-]
I wonder if this bug was found via project glasswing
reply
re-thc
2 hours ago
[-]
> I wonder if this bug was found via project glasswing

Did you read how they said it took weeks? Would run out of tokens at that rate...

reply