In most HTTP server implementations from other languages I've worked with I recall having to either:
- explicitly define the Content-Length up-front (clients then usually don't like it if you send too little and servers don't like it if you send too much)
- have a single "write" operation with an object where the Content-Length can be figured out quite easily
- turn on chunking myself and handle the chunk writing myself
I don't recall having seen the kind of automatic chunking described in the article before (and I'm not too sure whether I'm a fan of it).
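For anyone who hasn't looked at what that automatic chunking actually puts on the wire: each write becomes a hex length line, the payload, and a CRLF, and a zero-length chunk terminates the body. A toy sketch (invented helper names; note that `data.length` only equals the byte count for single-byte characters, so real code would use something like `Buffer.byteLength`):

```javascript
// One chunk of the chunked transfer coding: hex byte count, CRLF, payload, CRLF.
function encodeChunk(data) {
  return data.length.toString(16) + "\r\n" + data + "\r\n";
}

// A zero-length chunk terminates the body.
function endChunks() {
  return "0\r\n\r\n";
}

// "Hello, world!" sent as two separate writes frames up as:
const wire = encodeChunk("Hello, ") + encodeChunk("world!") + endChunks();
// wire === "7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n"
```

Which is exactly why the server can start sending before it knows the total size: the framing carries the length, chunk by chunk.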
This approach makes sense from the API standpoint because the caller generally has no idea whether chunked encoding is necessary, or even that it exists. Honestly that's less confusing than what express.js does with the middleware function: `app.get("/", (req, res) => { ... })` and `app.get("/", (req, res, next) => { ... })` behave differently because it tries to infer the presence of `next` by probing `Function.prototype.length`.
But one of the major exceptions, if not THE major exception, is this: buffering and flushing work automagically, and a lot of PHP devs end up massively blindsided by it at some point.
PS: with the rise of modern PHP and its high-quality object-based frameworks, this becomes less and less true
PS2: I am not in ANY way saying anything good or bad or superior or inferior about any dev here, just a difference in approach
PHP 75.8%
Ruby 6.0%
....
But yeah the moment you ended up wanting to do anything advanced, you were doing your own buffer on top of that anyway, or disabling it and going raw.
I can't say I miss those days! Or this platform for that matter.
Entirely unrelated, but the older I get, the more it seems like exposing the things under ".prototype" as parts of the object was probably a mistake. If I'm not mistaken, that is reflection, and it feels like JS reaches for reflection much more often than other languages. I think in part because it's a native part of the object rather than a reflection library, so it feels like less of an anti-pattern.
I never really thought about this, but it does explain how optional arguments without a default value work in Typescript. How very strange of a language decision.
> To be clear, distinguishing different types based on arity would have been okay if JS was statically typed or `Function` exposed more thorough information about its signature.
I actually like this less in a system with better typing. I don't personally think it's a good tradeoff to dramatically increase the complexity of the types just to avoid having a separate method to register a chunked handler. It would make more sense to me to have "app.get()" and "app.getChunked()", or some kind of closure that converts a chunked handler to something app.get() will allow, like "app.get(chunked((req, res, next) => {}))".
The typing effectively becomes part of the control flow of the application, which is something I tend to prefer avoiding. Data modelling should model the domain, code should implement business logic. Having data modelling impact business logic feels like some kind of recursive anti-pattern, but I'm not quite clever enough to figure out why it makes me feel that way.
This feels like a completely random swipe at an unrelated feature of a JavaScript framework, and I'm not even sure that it's an accurate swipe.
The entire point of Function.length (slight nit: Function.prototype.length is different and is always zero) is to check the arity of the function [0]. There's no "tries to": if your middleware function accepts three arguments then it will have a length of 3.
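For what it's worth, the edge cases are still worth knowing: `length` only counts declared parameters before the first default or rest parameter, which is exactly why arity sniffing is fragile:

```javascript
// Function length reports declared parameters:
const a = (req, res) => {};
const b = (req, res, next) => {};
// a.length === 2, b.length === 3

// But defaults and rest parameters stop the count, so two functions that
// accept the "same" arguments can report different arities:
const c = (req, res, next = undefined) => {};
const d = (...args) => {};
// c.length === 2, d.length === 0
```

So the check is well-defined, but it measures the declaration, not what the function actually does with its arguments.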
Aside from that, I've also done a bunch of digging and can't find any evidence that they're doing [1]. Do you have a source for the claim that this is what they're doing?
[0] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
[1] https://github.com/search?q=repo%3Aexpressjs%2Fexpress%20%22...
I'm surprised to see that it's now gone too! The exact commit is [1], which happened before Express.js 4.7, and you can search for the variable name `arity` in any previous version to see what I was talking about. It seems that my memory was slightly off as well, my bad. The correct description would be that older versions of Express.js used to distinguish "error" callbacks from normal router callbacks by their arities, so `(req, res)` and `(req, res, next)` would have been thankfully okay, while any extra argument added by accident would effectively disable that callback without any indication. It was a very good reason for me to be surprised and annoyed at the time.
[1] https://github.com/expressjs/express/commit/76e8bfa1dcb7b293...
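A rough paraphrase of that arity-based dispatch (this is illustrative, not the actual Express source — the real router does the equivalent checks inline when deciding whether to invoke a layer):

```javascript
// Classify a middleware function by its declared arity, the way the old
// router effectively did:
function classify(fn) {
  if (fn.length === 4) return "error";     // (err, req, res, next)
  if (fn.length > 4) return "never runs";  // silently skipped, no warning
  return "normal";                         // (req, res) or (req, res, next)
}
// classify((req, res) => {})            -> "normal"
// classify((req, res, next) => {})      -> "normal"
// classify((err, req, res, next) => {}) -> "error"
// classify((a, b, c, d, e) => {})       -> "never runs"
```

The "never runs" branch is the footgun being described: one accidental extra parameter and your handler is quietly reclassified.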
[0] https://github.com/pillarjs/router/blob/2e7fb67ad1b0c1cd2d9e...
If you want to go that route, it's not Function.length, either—which is different and is always 1 (barring potential future spec changes that change the arity of the Function global).
We had a small router/firewall thing at a previous company that had a web interface, but for some reason its Content-Length header had an off-by-one error. IIRC Chrome handled this okay (once the connection was closed it would display the content) while Firefox would hang waiting for that one extra byte that never came.
However, you better be right! I just found a bug in some really old code that was gzipping every response when it was appropriate (ie, asked for, textual, etc). But it was ignoring the content-length header! So, if it was set manually, it would then be wrong after compression. That caused insidious bugs for years. The fix, obviously, was to just delete that manual header if the stream was going to be compressed.
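The fix described above boils down to something like this (hypothetical middleware shape, not the actual code from that system): if the body is about to be compressed, any manually-set Content-Length describes the uncompressed bytes and has to go.

```javascript
// Drop a stale Content-Length when the body will be recompressed;
// let chunked framing carry the length instead.
function prepareHeaders(headers, willCompress) {
  const out = { ...headers };
  if (willCompress) {
    delete out["content-length"];          // describes pre-compression size: now wrong
    out["transfer-encoding"] = "chunked";  // framing supplies the length
  }
  return out;
}
```

Same lesson as the off-by-one router above: a Content-Length that doesn't match the bytes actually sent produces exactly these hangs and truncations.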
> That caused insidious bugs for years.
A lot of people here could probably benefit professionally from hearing about what the bugs were. Knowing what to identify in the future could be really helpful. Thanks.

Sounds like a powerful bug you have, potentially.
but it might not come: HTTP has sent more than one file per connection for decades, so the browser might just read the beginning of the next reply as the missing byte, and then the next reply will be corrupt as well.
I'm pretty sure something like this can cause some form of HTTP desync in a loadbalancer/proxy setup.
https://www.oreilly.com/library/view/high-performance-web/97...
Optimizing per-packet really improves things but has gotten very difficult with SSL and now QUIC. I'm not sure Google ever got the front page down to a single packet (would love a reference!) but it definitely paid very close attention to every byte and details of TCP performance.
I was curious but the most recent data I could find was from 2017 when there was a mix of CDNs at initcwnd=10 and initcwnd>10:
https://www.cdnplanet.com/blog/initcwnd-settings-major-cdn-p...
Currently Linux still follows RFC6928 and defaults to initcwnd=10:
https://github.com/torvalds/linux/blob/v6.11/include/net/tcp...
(pre-ecmascript versions of JS not investigated)
EcmaScript 1(1997) = JavaScript 1.1 - missing many ES3 features (see below), of which exceptions are the unrecoverable thing.
EcmaScript 2(1998) - minimal changes, mostly deprecations and clarifications of intent, reserve Java keywords
EcmaScript 3(1999) - exceptions, regexes, switch, do-while, instanceof, undefined, strict equality, encodeURI* instead of escape, several methods on Object/String/Array/Date
EcmaScript 4(2003) - does not exist; the effort was eventually abandoned after a committee implosion
EcmaScript 5(2009) - strict mode, getters/setters, JSON, remove reservations of many Java keywords, add reservation for let/yield, debugger, many static functions of Object, Array.isArray, many Array methods, String().trim method, Date.now, Date().toISOString, Date().toJSON
EcmaScript 5.1(2011) - I did not notice any changes compared to ES5, likely just wording changes. This is the first one that's available in HTML rather than just PDF.
EcmaScript 6(2015) - classes, let/const, symbols, modules (in theory; it's $CURRENTYEAR and there are still major problems with them in practice), and all sorts of things (not listed)
EcmaScript 11(2020) - bigint, globalThis
If it were up to me, I'd restrict the web to ES3 with ES5 library features, let/const from ES6, and bigint/globalThis from ES2020. That gives correctness and convenience without tempting people to actually try to write complex logic in it.

There are still pre-ES6 implementations in the wild (not for the general web obviously) ... from what I've seen they're mostly ES5, sometimes with a few easy ES6 features added.
This is why in 2024 you still must use XMLHttpRequest instead of fetch() when progress reporting is needed: fetch() cannot do progress reporting on compressed streams.
2. You can iterate through the uncompressed response bytes with a ReadableStream.
Please explain how you would produce a progress percentage from these?
If you need progress on a text file, then don't compress it while downloading. Text files that are small won't really need progress or compression.
If you're sending a large amount of data that can be compressed, then zip it before sending it, download-with-progress (without http compression), and then unzip the file in the browser and do what you need with the contents.
I'm sure there are probably other ways to handle it too.
Yes, there is. The simplest solution is called XMLHttpRequest, which has a "progress" event that correctly reports the response body bytes.
You can even make a wrapper for XMLHttpRequest and call it "myFetch()" if you insist.
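A minimal sketch of that wrapper idea (the name `myFetch` is just the hypothetical from above). The key point is that XHR's progress event gives you `loaded`/`total` for the response body, so the percentage math is trivial when the length is computable:

```javascript
// Progress math: null means indeterminate (no usable total).
function percent(loaded, total) {
  return total > 0 ? Math.round((loaded / total) * 100) : null;
}

// Promise wrapper around XMLHttpRequest with a progress callback.
// Browser-only, of course: XMLHttpRequest doesn't exist in Node.
function myFetch(url, onProgress) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", url);
    xhr.onprogress = (e) =>
      onProgress(e.lengthComputable ? percent(e.loaded, e.total) : null);
    xhr.onload = () => resolve(xhr.response);
    xhr.onerror = reject;
    xhr.send();
  });
}
```

Usage would be `myFetch(url, (p) => updateBar(p))` — with the caveat that on a compressed stream `lengthComputable` may be false and you're back to an indeterminate spinner.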
Not sure about you, but to me "XMLHttpRequest" in my request handling code feels less dirty than "x-amz-meta-". But to each their own I guess.
Check CloudFlare, Akamai, ...
Or we can keep using XMLHttpRequest.
Tough choice.
Also in 2018, some fun where when downloading a file, browsers report bytes written to disk vs content-length, which is wildly out when you factor in gzip https://x.com/jaffathecake/status/996720156905820160
It may be better now but a huge number of libraries and frameworks would either include the terminating NULL byte in the count but not send it, or not include the terminator in the count but include it in the stream.
https://notes.benheater.com/books/web/page/multipart-forms-a...
Buffering can be appropriate for small responses; or at least convenient. But for bigger responses this can be error prone. If you do this right, you serve the first byte of the response to the user before you read the last byte from wherever you are reading (database, file system, S3, etc.). If you do it wrong, you might run out of memory. Or your user's request times out before you are ready to respond.
This is a thing that's gotten harder with non-blocking frameworks. Spring Boot in particular can be a PITA on this front if you use it with non-blocking IO. I had some fun figuring that out some years ago. Using Kotlin makes it slightly easier to deal with low level Spring internals (fluxes and what not).
Sometimes the right answer is that it's too expensive to figure out the content length, or a content hash. Whatever you do, you need to send the headers with that information before you send anything else. And if you need to read everything before you can calculate that information and send it, your choices are buffering or omitting that information.
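The buffering-vs-streaming distinction above can be shown with a toy model (synchronous generator standing in for a real async DB cursor or S3 read, but the shape is the same):

```javascript
// Stand-in for a database cursor / file read / S3 stream.
function* source() {
  yield "row1\n";
  yield "row2\n";
}

// Buffering: can't emit the first byte until the source is exhausted,
// and holds the entire response in memory meanwhile.
function buffered(write) {
  let all = "";
  for (const piece of source()) all += piece;
  write(all);
}

// Streaming: forwards each piece as it arrives, so time-to-first-byte
// doesn't depend on total response size.
function streamed(write) {
  for (const piece of source()) write(piece);
}
```

And this is exactly the headers problem: the streaming version has to commit to its headers before it has seen the last byte, so Content-Length (or a content hash) is either precomputed, bought with buffering, or omitted.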
This is the #1 most common mistake made by a "web framework".
Before $YOU jump up with a list of exceptions, it slowly gets better over time, and it has been getting better for a while, and there are many frameworks in the world, so the list that get it right is quite long. But there's still a lot of frameworks out there that assume this, that consider streaming to be the "exception" rather than non-streaming being a special case of streaming, and I still see new people make this mistake with some frequency, so the list of frameworks that still incorporate this mistake into their very core is also quite long.
My favorite is when I see a new framework sit on top of something like Go that properly streams, and it actively wrecks the underlying streaming capability to turn an HTTP response into a string.
Streaming properly is harder in the short term, but writing a framework where all responses are strings becomes harder in the long term. You eventually hit the wall where that is no longer feasible, but then, fixing it becomes very difficult.
Simply not sending a content-length is often the right answer. In an API situation, whatever negative consequences there are are fairly muted. The real problem I encounter a lot is when I'm streaming out some response from some DB query and I encounter a situation that I would have yielded a 500-type response for after I've already streamed out some content. It can be helpful to specify in your API that you may both emit content and an error and users need to check both. For instance, in the common case of dumping JSON, you can spec a top-level {"results": [...], "error": ...} as your return type, stream out a "results", but if a later error occurs, still return an "error" later. Arguably suboptimal, but requiring all errors to be known up front in a streaming situation is impossible, so... suboptimal wins over impossible.
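A sketch of that `{"results": [...], "error": ...}` envelope (names illustrative): commit to the envelope up front, stream results as they come, and if an error surfaces mid-stream, report it in the trailer position instead of a 500 that can no longer be sent.

```javascript
// Stream rows into a JSON envelope; a mid-stream failure becomes a
// non-null "error" field rather than an impossible status-code change.
function streamResults(rows, write) {
  write('{"results":[');
  let error = null;
  let first = true;
  try {
    for (const row of rows()) {
      write((first ? "" : ",") + JSON.stringify(row));
      first = false;
    }
  } catch (e) {
    error = e.message; // too late for a 500: bytes are already on the wire
  }
  write('],"error":' + JSON.stringify(error) + "}");
}

// A consumer must check both fields:
let body = "";
streamResults(
  function* () { yield { id: 1 }; throw new Error("db timeout"); },
  (s) => { body += s; }
);
// body === '{"results":[{"id":1}],"error":"db timeout"}'
```

Suboptimal, as the comment says, but it degrades gracefully: partial results plus an explicit error beats a truncated body the client can't distinguish from success.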
e.g. I drafted this a long time ago, because if you generate something live and send it in a streaming fashion, you can't have progress reporting, since you don't know the final size in bytes, even though server-side you know how far into generating you are.
This was used for multiple things like generating CSV exports from a bunch of RDBMS records, or compressed tarballs from a set of files, or a bunch of other silly things like generating sequences (Fibonacci, random integers, whatever...), that could take "a while" (as in, long enough that it's friendly to report progress).
https://github.com/lloeki/http-chunked-progress/blob/master/...
To me the more interesting question is how web servers receive an incoming request. You want to be able to read the whole thing into a single buffer, but you don't know how long it's going to be until you actually read some of it. I learned recently that libc has a way to "peek" at some data without removing it from the recv buffer (recv() with the MSG_PEEK flag)... I'm curious if this is ever used to optimize the receive process?
It's not. Like, hell no. That is so complex. Multiplexing, underlying TCP specifications, Server Push, Stream prioritization (vs priorization!), encryption (ALPN or NPN?), extensions like HSTS, CORS, WebDAV or HLS, ...
It's a great protocol, nowhere near simple.
> Basically, it’s a text file that has some specific rules to make parsing it easier.
Nope; since HTTP/2 that is just a textual representation, not the real "on the wire" protocol. HTTP/2 is 10 years old now.
The whole section on cache is "reality based," and it's only gotten worse as the years have moved on.
Anyway, back in the day Content-Length was one of the fields you were never supposed to trust. There's really no reason to trust it now either, but I suppose you can use it as a hint for how much buffer to allocate. But of course, the actual content may exceed that length, which means that if you did it incorrectly you'd copy the incoming request data past the end of the buffer.
So even today, don't trust Content-Length.
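The "hint, not a promise" approach boils down to something like this (invented helper; the cap and the copy bound are the whole point):

```javascript
// Use Content-Length only to size an initial buffer, clamped to a hard cap.
// The read loop must still stop at the buffer's end no matter what the
// header claimed.
function safeAllocSize(contentLengthHeader, maxBytes) {
  const n = Number.parseInt(contentLengthHeader ?? "", 10);
  if (!Number.isFinite(n) || n < 0) return 0; // absent or garbage: don't trust
  return Math.min(n, maxBytes);               // never allocate more than the cap
}
```

Treating the clamped value as a starting size, and bounds-checking every copy against the buffer you actually allocated, is what prevents the overflow scenario described above.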
HTTP/1.0 is simple. HTTP/1.1 is undoubtedly more complex but manageable.
The statement that HTTP is simple is just not true. Even if Go makes it look easy.
So when the article says "All HTTP requests look something like this", that's false. That's not a big deal in itself, but it spreads the idea that HTTP is easy, and it's not.
You can absolutely assume that http 1.1 will work on basically anything; websockets are more finicky even now, and certainly were back in the day.
Whaaaaa??? We should eliminate this non-JS-filled nonsense immediately!
Websocket is a different protocol that is started up via HTTP.
For example, some websocket servers don't pass back errors to the client (AWS). That makes it quite difficult to, say, retry on the client side.
Chunked encoding is used by video players - so you can request X bytes of a video file. That means you don't have to download the whole file, and if the user closes the video you didn't waste bandwidth. There are likely more uses of it.
Just a nitpick, but what you describe here is byte range requests. They can be used with or without chunked encoding, which is a separate thing.
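To make the distinction concrete: a range request is the client asking for a byte slice via the Range header, and the server answering 206 Partial Content with a Content-Range. Illustrative helpers (not from the article):

```javascript
// Build a Range request header for an inclusive byte slice.
function rangeHeader(start, end) {
  return `bytes=${start}-${end}`;
}

// Parse the server's Content-Range answer, e.g. "bytes 0-1048575/31457280"
// (total may be "*" when unknown).
function parseContentRange(value) {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value);
  if (!m) return null;
  return {
    start: Number(m[1]),
    end: Number(m[2]),
    total: m[3] === "*" ? null : Number(m[3]),
  };
}
// rangeHeader(0, 1048575) -> "bytes=0-1048575"  (first MiB of the video)
```

Whether that 206 response body then travels with a Content-Length or with chunked encoding is an independent choice, which is the nitpick's point.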